Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


unicode - How to make Python 3 print() output UTF-8

How can I make Python 3 (3.1) print("Some text") to stdout in UTF-8, or how can I output raw bytes?

Test.py

import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)

Output (in CP1257; I replaced the non-ASCII characters with their byte values, written as [xNN]):

utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point passing encoded text to print (it always shows only the representation of the bytes, not the real bytes), and it seems impossible to output bytes at all, because print always encodes its output with sys.stdout.encoding.

For example: print(chr(255)) throws an error:

Traceback (most recent call last):
  File "Test.py", line 1, in <module>
    print(chr(255));
  File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>
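
The same failure can be reproduced without a CP1257 console by encoding the character explicitly. This is only a minimal sketch, assuming Python's bundled cp1257 codec; the helper name try_encode is hypothetical and not part of the original post:

```python
def try_encode(text, encoding, errors="strict"):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None

# chr(255) is U+00FF, which has no slot in the CP1257 table.
print(try_encode(chr(255), "cp1257"))             # strict mode fails, so None
print(try_encode(chr(255), "cp1257", "replace"))  # the codec substitutes b'?'
print(try_encode(chr(255), "utf-8"))              # UTF-8 has an encoding for U+00FF
```

print() hits the strict case because it encodes with sys.stdout.encoding and the default error handler.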

By the way, print(TestText == TestText2.decode("utf8")) prints False, although the printed output looks the same.
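
The False result is probably not an encoding effect: in Test.py the byte string begins with "Test2" while the text string begins with "Test", so the decoded values differ by that one character. A small check using shortened copies of the two literals (the variable names here are for illustration only):

```python
text = "Test - \u0101\u0100"                # a str containing two non-ASCII letters
same_bytes = b"Test - \xc4\x81\xc4\x80"     # the same text, UTF-8 encoded
other_bytes = b"Test2 - \xc4\x81\xc4\x80"   # note the extra "2"

print(text == same_bytes.decode("utf8"))    # True
print(text == other_bytes.decode("utf8"))   # False
```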


How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

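def printRAW(*Text):
    # Open file descriptor 1 (stdout) as a UTF-8 text stream.
    # closefd=False keeps the descriptor itself open when the wrapper is closed.
    with open(1, 'w', encoding='utf8', closefd=False) as RAWOut:
        print(*Text, file=RAWOut)

printRAW("Cool", TestText)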

Output (now it prints in UTF-8):

Cool Test - āĀēĒčČ..šŠūŪžŽ

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)

Now I'm looking for a better solution, if there is one...


1 Reply


Clarification:

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8; it is a Unicode string (str) in Python 3.x.
TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.

To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:

import sys
sys.stdout.buffer.write(TestText2)
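
If the goal is to keep using print() instead of calling write() with bytes, one option is to wrap the same buffer in a text layer that encodes to UTF-8. A minimal sketch, assuming the standard io.TextIOWrapper; the helper name utf8_writer is hypothetical, and an in-memory io.BytesIO stands in for sys.stdout.buffer so the resulting bytes can be inspected:

```python
import io

def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # line_buffering=True pushes the bytes through at every newline.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)

fake_stdout = io.BytesIO()          # stand-in for sys.stdout.buffer
out = utf8_writer(fake_stdout)
print("Cool", chr(252), file=out)   # chr(252) is U+00FC
out.flush()
captured = fake_stdout.getvalue()   # begins with b'Cool \xc3\xbc'
print(captured)
```

For the real console this would be sys.stdout = utf8_writer(sys.stdout.buffer). On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") changes the encoding in place, and setting the PYTHONIOENCODING environment variable before starting the interpreter is another way to override what Python picks up from the console or locale.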
