Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to convert utf-8 fancy quotes to neutral quotes

I'm writing a little Python script that parses Word docs and writes to a CSV file. However, some of the docs contain UTF-8 characters that my script can't process correctly.

Fancy quotes show up quite often (u'\u201c'). Is there a quick and easy (and smart) way of replacing those with neutral ASCII quotes, so I can just write line.encode('ascii') to the CSV file?

I have tried to find the left quote and replace it:

val = line.find(u'\u201c')
if val >= 0: line[val] = '"'

But to no avail:

TypeError: 'unicode' object does not support item assignment

Is what I've described a good strategy? Or should I just set up the CSV to support UTF-8 (though I'm not sure whether the application that will be reading the CSV accepts UTF-8)?

Thank you

1 Reply

You can use the Unidecode package to automatically convert all Unicode characters to their nearest pure ASCII equivalent.

from unidecode import unidecode
line = unidecode(line)

This will handle both directions of double quotes as well as single quotes, em dashes, and other things that you probably haven't discovered yet.
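
If only a handful of characters need handling and installing a third-party package is not an option, the standard library may be enough. The sketch below assumes Python 3 and uses a hand-picked translation table; the table contents and the name neutralize_quotes are illustrative choices, not part of Unidecode. Because strings are immutable, it builds a new string rather than assigning to an index, which is what raised the TypeError in the question.

```python
# Translation table covering only common "smart" punctuation (an assumed,
# hand-picked list). Characters not listed here are left untouched.
SMART_PUNCTUATION = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
})


def neutralize_quotes(text):
    """Return a new string with curly quotes and dashes replaced by ASCII."""
    return text.translate(SMART_PUNCTUATION)


line = "\u201cHello\u201d \u2014 it\u2019s fine"
print(neutralize_quotes(line))
```

Any character missing from the table passes through unchanged, so a later line.encode('ascii') can still fail on text such as accented letters.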

Edit: a comment points out that if your language isn't English, you may find ASCII too restrictive. Here's an adaptation of the above code that uses a whitelist of characters that shouldn't be converted.

>>> from unidecode import unidecode
>>> whitelist = set('µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ')
>>> line = '\u201cRésumé\u201d'
>>> print(line)
“Résumé”
>>> line = ''.join(c if c in whitelist else unidecode(c) for c in line)
>>> print(line)
"Résumé"

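The question also raises the other option: writing the CSV as UTF-8 instead of converting. Here is a minimal sketch, assuming Python 3 and that the consuming application can read UTF-8. The function name write_csv_utf8, the sample rows, and the file name are hypothetical.

```python
import csv
import os
import tempfile


def write_csv_utf8(path, rows):
    """Write rows to path as a UTF-8 CSV, keeping non-ASCII characters intact."""
    # newline="" lets the csv module control line endings itself.
    # "utf-8-sig" prepends a byte order mark, which some spreadsheet
    # applications use to detect UTF-8; plain "utf-8" omits it.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        csv.writer(handle).writerows(rows)


# Hypothetical data and file name, for illustration only.
rows = [["title", "text"], ["cv", "\u201cR\u00e9sum\u00e9\u201d"]]
path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_csv_utf8(path, rows)

# Read the file back to confirm the characters survive the round trip.
with open(path, newline="", encoding="utf-8-sig") as handle:
    print(list(csv.reader(handle)))
```

Whether the byte order mark helps or hurts depends on the program reading the file, so it is worth testing with that application before settling on an encoding.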