python - Evaluate UTF-8 literal escape sequences in a string in Python3

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a string of the form:

s = '\\xe2\\x99\\xac'

I would like to convert this to the character ♬ by evaluating the escape sequences. However, everything I've tried either results in an error or prints out garbage. How can I force Python to convert the escape sequences into a literal Unicode character?

What I've read elsewhere suggests that the following line of code should do what I want, but it results in a UnicodeEncodeError.

print(bytes(s, 'utf-8').decode('unicode-escape'))

I also tried the following, which has the same result:

import codecs
print(codecs.getdecoder('unicode_escape')(s)[0])

Both of these approaches produce the string 'â\x99¬', which print is subsequently unable to handle.

In case it makes any difference, the string is being read in from a UTF-8 encoded file and will ultimately be output to a different UTF-8 encoded file after processing.

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:58:54+0000

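Calling .decode('unicode-escape') gives you the three-character string '\xe2\x99\xac' (displayed as 'â\x99¬'), in which each character's code point equals one byte of the UTF-8 encoding of ♬.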

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape')
'â\x99¬'
>>> _ == '\xe2\x99\xac'
True
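To see why this intermediate result is not yet the desired character, it may help to inspect its code points. The following is a small sketch (the variable names `intermediate` and `code_points` are only illustrative) showing that each `\xNN` escape was turned into a single code point U+00NN rather than being treated as one byte of a UTF-8 sequence:

```python
# The input holds literal backslashes, as it would when read from a file.
s = '\\xe2\\x99\\xac'
intermediate = s.encode().decode('unicode-escape')

# Each \xNN escape became one code point U+00NN.
code_points = [hex(ord(ch)) for ch in intermediate]

print(len(s), len(intermediate))  # 12 3
print(code_points)                # ['0xe2', '0x99', '0xac']
```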

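That string still has to be decoded as UTF-8. Since only bytes can be decoded, encode it first with latin1 (also known as iso-8859-1), which maps each code point from 0 to 255 to the byte with the same value and so preserves the bytes.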

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
'♬'
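The chain above could be wrapped in a small helper. This is only a sketch: the name `decode_utf8_escapes` is hypothetical, and it assumes every escape in the input is a `\xNN` escape that spells out valid UTF-8 bytes. An escape such as `\u266c` would produce a code point above 255 and make the latin1 step raise UnicodeEncodeError, and byte sequences that are not valid UTF-8 would raise UnicodeDecodeError.

```python
def decode_utf8_escapes(text: str) -> str:
    """Evaluate literal \\xNN escapes that spell out UTF-8 bytes.

    Hypothetical helper; assumes the escapes form valid UTF-8.
    """
    # 1. unicode-escape turns each \xNN into the code point U+00NN.
    evaluated = text.encode('utf-8').decode('unicode-escape')
    # 2. latin1 maps code points 0-255 back to bytes with the same values.
    raw_bytes = evaluated.encode('latin1')
    # 3. Those bytes are the real UTF-8 data, so decode them as such.
    return raw_bytes.decode('utf-8')


result = decode_utf8_escapes('note: \\xe2\\x99\\xac!')
print(result == 'note: \u266c!')  # True
```

For the file-to-file case described in the question, the same function could be applied to each line read from the input file before writing it to the output file, with both files opened using encoding='utf-8'.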
