Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Converting UTF-16 to UTF-8

I'm loading a string from a file. When I print out the string with:

print my_string
print binascii.hexlify(my_string)

I get:

2DF5
0032004400460035

This means the string is UTF-16 encoded. I would like to convert it to UTF-8 so that the above code produces this output:

2DF5
32444635

I've tried:

my_string.decode('utf-8')

Which outputs:

32004400460035

EDIT:

Here's a quick sample:

    import binascii

    hello = 'hello'.encode('utf-16')
    print hello
    print binascii.hexlify(hello)

    hello = hello[2:].decode('utf-8')
    print hello
    print binascii.hexlify(hello)

Which produces this output:

??hello
fffe680065006c006c006f00
hello
680065006c006c006f00

Expected output would be:

??hello
fffe680065006c006c006f00
hello
68656c6c6f

1 Reply


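Your string appears to have been encoded using utf-16be (big-endian, with no byte order mark): in the hex dump, the 00 high byte of each character comes first.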

In [9]: s = "2DF5".encode("utf-16be")
In [11]: print binascii.hexlify(s)
0032004400460035
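
To see why the byte order matters, here is a small sketch that compares hex dumps of the same text under the big-endian and little-endian codecs. It is written for Python 3, which is an assumption on my part (the session above is Python 2), and the variable names are only illustrative:

```python
import binascii

text = "2DF5"

# Big-endian: the zero high byte of each ASCII character comes first.
be_hex = binascii.hexlify(text.encode("utf-16-be")).decode("ascii")

# Little-endian: the zero high byte comes second.
le_hex = binascii.hexlify(text.encode("utf-16-le")).decode("ascii")

print(be_hex)  # 0032004400460035
print(le_hex)  # 3200440046003500
```

The first dump matches the bytes in the question, which is what points to utf-16be rather than utf-16le.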

So, to convert it to utf-8, you first need to decode it into a unicode object, then encode that as utf-8:

In [14]: uni = s.decode("utf-16be")
In [15]: uni
Out[15]: u'2DF5'

In [16]: utf = uni.encode("utf-8")
In [17]: utf
Out[17]: '2DF5'

or, in one step:

In [13]: s.decode("utf-16be").encode("utf-8")
Out[13]: '2DF5'
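
The same decode-then-encode idea can be sketched as a small helper that also covers the `'hello'.encode('utf-16')` sample from the question, where the data starts with a byte order mark (BOM). This is a rough sketch for Python 3, not a definitive implementation: `utf16_to_utf8` and its `default` parameter are hypothetical names, and falling back to big-endian when no BOM is present is an assumption based on the bytes shown in the question.

```python
import binascii
import codecs


def utf16_to_utf8(raw, default="utf-16-be"):
    """Re-encode UTF-16 bytes as UTF-8 (hypothetical helper)."""
    # A leading BOM states the byte order; the plain "utf-16" codec
    # reads it and strips it from the decoded text.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")
    else:
        # No BOM: fall back to the caller-supplied byte order (an assumption).
        text = raw.decode(default)
    return text.encode("utf-8")


no_bom = binascii.unhexlify("0032004400460035")
with_bom = binascii.unhexlify("fffe680065006c006c006f00")

no_bom_hex = binascii.hexlify(utf16_to_utf8(no_bom)).decode("ascii")
with_bom_hex = binascii.hexlify(utf16_to_utf8(with_bom)).decode("ascii")

print(no_bom_hex)    # 32444635
print(with_bom_hex)  # 68656c6c6f
```

Slicing off the first two bytes and calling `decode('utf-8')`, as in the question, only removes the BOM and leaves the 00 padding bytes in place; decoding with a UTF-16 codec is what actually collapses each two-byte unit into one character.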
