A Simple Fix for Garbled Text When Reading and Writing CSV Files


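Today's post is a short summary of how to deal with garbled text in CSV files. You may have run into this yourself: you open a CSV file in Excel and all the Chinese text shows up as gibberish.
If you then open the file manually in Notepad++, change the encoding to UTF-8, and save it, Excel displays the file correctly.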

With Python, a few lines of code are enough to automate the process above.
First, import three modules:


# coding: utf-8
# @author: zhenguo
# @date: 2020-12-16
# @describe: functions about automatic file processing
import pandas as pd
import os
import chardet

The chardet module detects the file's encoding, pandas reads the file using that encoding, and the result is saved in xlsx format.


Get the encoding of the file filename:

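def get_encoding(filename):
    """Return the detected encoding of the file."""
    # Read raw bytes so chardet can inspect them before any decoding happens
    with open(filename, 'rb') as f:
        return chardet.detect(f.read())['encoding']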
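chardet returns a statistical guess, and its 'encoding' value can be None when it cannot decide. As a hedged fallback that is not part of the original script, one could try a short list of candidate encodings in order and keep the first that decodes cleanly. The helper name guess_encoding and the candidate list are assumptions made for illustration.

```python
def guess_encoding(data, candidates=("utf-8", "gb18030")):
    """Return the first candidate that decodes data without error, else None.

    Hypothetical helper: the candidate list is an assumption that suits
    files which are either UTF-8 or one of the GB family of encodings.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    sample = "中文,数据\n".encode("gbk")
    print(guess_encoding(sample))
```

UTF-8 is tried first because it is strict: GBK-encoded Chinese text almost never happens to be valid UTF-8, whereas the GB codecs accept many byte sequences and would mask a UTF-8 file if tried first.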

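Save the result as an xlsx file, which stores its text as Unicode. Garbled csv, xls, and xlsx files are supported as input.
Note that when the input is a csv file, it must be saved in xlsx format: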

def to_utf8(filename):
    """Re-save filename as a readable .xlsx file."""
    root, ext = os.path.splitext(filename)
    if ext == '.csv':
        # chardet may return None when it cannot guess; fall back to utf-8
        encoding = get_encoding(filename) or 'utf-8'
        if 'gb' in encoding.lower():
            # GBK is a superset of GB2312, so it covers both
            df = pd.read_csv(filename, engine='python', encoding='GBK')
        else:
            df = pd.read_csv(filename, engine='python', encoding='utf-8')
        df.to_excel(root + '.xlsx')
    elif ext == '.xls' or ext == '.xlsx':
        # read_excel takes no encoding argument in recent pandas versions,
        # so the file is read as-is and written out as .xlsx
        df = pd.read_excel(filename)
        df.to_excel(root + '.xlsx')
    else:
        print('only support csv, xls, xlsx format')

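The to_utf8 function converts a single file. The batch_to_utf8 function below converts, in one pass, every file under the directory path whose extension is ext_name: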

def batch_to_utf8(path, ext_name='csv'):
    """Convert every garbled file under path with extension ext_name into a readable file."""
    for file in os.listdir(path):
        if os.path.splitext(file)[1] == '.' + ext_name:
            to_utf8(os.path.join(path, file))
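The extension check in batch_to_utf8 is an exact string comparison, so a file named DATA.CSV would be skipped, and a directory whose name ends in .csv would be passed on to to_utf8. A minimal sketch of a more forgiving file listing, using pathlib, might look like the following. The name find_files is hypothetical and not part of the original script.

```python
from pathlib import Path


def find_files(path, ext_name="csv"):
    """Return regular files directly under path whose suffix matches
    ext_name, ignoring case. Subdirectories are not searched."""
    wanted = "." + ext_name.lower()
    return sorted(
        p for p in Path(path).iterdir()
        if p.is_file() and p.suffix.lower() == wanted
    )


if __name__ == "__main__":
    # List the csv files in the current directory
    for p in find_files("."):
        print(p)
```

Each returned path could then be handed to to_utf8, for example with to_utf8(str(p)).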

Calling batch_to_utf8:

if __name__ == '__main__':
    # Convert every csv file in the current directory to an xlsx file
    batch_to_utf8('.')
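The manual Notepad++ step described at the start can also be mirrored directly, keeping the file as a CSV instead of producing an xlsx file. Excel generally recognizes a UTF-8 CSV when the file begins with a byte order mark, which Python's utf-8-sig codec writes. The sketch below uses only the standard library. The name reencode_csv and the default source encoding are assumptions for illustration.

```python
def reencode_csv(src, dst, src_encoding="gbk"):
    """Rewrite the text file src as UTF-8 with a BOM at dst.

    Hypothetical helper: src_encoding must be the file's real encoding,
    for example the value reported by an encoding detector.
    """
    # newline="" keeps the original line endings untouched on read and write
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(text)
```

Writing to a separate dst path leaves the original file intact, which is useful if the source encoding was guessed wrongly.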

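Garbled text comes up often when reading and writing files. The to_utf8 and batch_to_utf8 functions in this post should handle the common cases, so if you run into the problem later, try using these two functions directly.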