Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - pandas to_csv: ascii can't encode character

I'm trying to read a dataframe from a pipe-delimited file and write it back out. Some of the characters are non-ASCII (accented letters and similar), and the write fails when I try to encode the output as ASCII.

import pandas as pd

df = pd.read_csv('filename.txt', sep='|', encoding='utf-8')
# <do stuff that produces newdf>
newdf.to_csv('output.txt', sep='|', index=False, encoding='ascii')

-------

  File "<ipython-input-63-ae528ab37b8f>", line 21, in <module>
    newdf.to_csv(filename,sep='|',index=False, encoding='ascii')

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1344, in to_csv
    formatter.save()

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1551, in save
    self._save()

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1652, in _save
    self._save_chunk(start_i, end_i)

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1678, in _save_chunk
    lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)

  File "pandas\lib.pyx", line 1075, in pandas.lib.write_csv_rows (pandas\lib.c:19767)

UnicodeEncodeError: 'ascii' codec can't encode character '\xb4' in position 7: ordinal not in range(128)

If I change to_csv to use UTF-8 encoding instead, I can't read the file back in properly:

newdf.to_csv('output.txt',sep='|',index=False,encoding='utf-8')
pd.read_csv('output.txt', sep='|')

> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 2: invalid start byte

My goal is to have a pipe-delimited file that retains the accents and special characters.

Also, is there an easy way to figure out which line read_csv is breaking on? Right now I don't know how to get it to show me the bad character(s).
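
On that last point, one possible approach (a sketch, not something pandas provides) is to open the file in binary mode and try decoding it line by line, recording every line that fails. The helper name `find_undecodable_lines` and the sample file are made up for illustration. Line numbers here are physical lines, so they can differ from CSV row numbers if quoted fields contain newlines.

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, byte_offset, raw_line) for each line that fails to decode.

    Hypothetical helper for illustration; not part of pandas.
    """
    bad = []
    with open(path, "rb") as fh:  # binary mode, so nothing is decoded on read
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first bad byte within this line
                bad.append((lineno, exc.start, raw))
    return bad


# Made-up sample file: the third line holds a lone 0xb4 byte,
# which is not valid UTF-8 on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as fh:
    fh.write(b"id|name\n1|ok\n2|caf\xb4e\n")

bad_lines = find_undecodable_lines(sample_path)
for lineno, offset, raw in bad_lines:
    print(f"line {lineno}, byte offset {offset}: {raw!r}")
```

A lone 0xb4 byte is the acute accent in Latin-1 and Windows-1252, so a file that trips on it may be in one of those encodings rather than UTF-8. That is only a guess about the file in question; passing `encoding='latin-1'` to `read_csv` is one thing to try.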

1 Reply

Check the answer here

It's a much simpler solution:

newdf.to_csv('filename.csv', encoding='utf-8')
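
Applied to the pipe-delimited case in the question, that suggestion might look like the sketch below: write with `encoding='utf-8'`, then pass the same separator and encoding to `read_csv`. The sample data and the temporary path are invented for illustration. Whether this fixes the original read error depends on the file being read really being UTF-8, which the thread does not confirm.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data standing in for the asker's dataframe.
newdf = pd.DataFrame({"name": ["José", "Zoë", "caf´e"], "value": [1, 2, 3]})

path = os.path.join(tempfile.mkdtemp(), "output.txt")

# Write pipe-delimited UTF-8 so the accented characters are kept.
newdf.to_csv(path, sep="|", index=False, encoding="utf-8")

# Read back with the same separator and the same encoding.
roundtrip = pd.read_csv(path, sep="|", encoding="utf-8")

print(roundtrip)
```

If the read still fails on a byte such as 0xb4, the file being read was probably not written as UTF-8 in the first place.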
