Process text to and from Unicode at the I/O boundaries of your program using open
with the encoding
parameter. Make sure to use the (hopefully documented) encoding of the file being read. The default encoding varies by OS (specifically, locale.getpreferredencoding(False)
is the encoding used), so I recommend always explicitly using the encoding
parameter for portability and clarity (Python 3 syntax below):
with open(filename, 'r', encoding='utf8') as f:
text = f.read()
# process Unicode text
with open(filename, 'w', encoding='utf8') as f:
f.write(text)
If still using Python 2 or for Python 2/3 compatibility, the io
module implements open
with the same semantics as Python 3's open
and exists in both versions:
import io
with io.open(filename, 'r', encoding='utf8') as f:
text = f.read()
# process Unicode text
with io.open(filename, 'w', encoding='utf8') as f:
f.write(text)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…