Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
705 views
in Technique[技术] by (71.8m points)

unicode - Difference between open and codecs.open in Python

There are two ways to open a text file in Python:

f = open(filename)

And

import codecs
f = codecs.open(filename, encoding="utf-8")

When is codecs.open preferable to open?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since Python 2.6, a good practice is to use io.open(), which also takes an encoding argument, like the now obsolete codecs.open(). In Python 3, io.open is an alias for the open() built-in. So io.open() works in Python 2.6 and all later versions, including Python 3.4. See docs: http://docs.python.org/3.4/library/io.html

Now, for the original question: when reading text (including "plain text", HTML, XML and JSON) in Python 2 you should always use io.open() with an explicit encoding, or open() with an explicit encoding in Python 3. Doing so means you get correctly decoded Unicode, or get an error right off the bat, making it much easier to debug.

Pure ASCII "plain text" is a myth from the distant past. Proper English text uses curly quotes, em-dashes, bullets, € (euro signs) and even diaeresis (¨). Don't be na?ve! (And let's not forget the Fa?ade design pattern!)

Because pure ASCII is not a real option, open() without an explicit encoding is only useful to read binary files.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...