Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

html - ValueError: unichr() arg not in range(0x10000) (narrow Python build)

I am trying to convert the HTML entity &#976918; to a Unicode character. When I try the following:

unichr(int(976918))

I get this error:

ValueError: unichr() arg not in range(0x10000) (narrow Python build)

It seems 976918 (0xEE816) is outside the range that unichr() accepts on this build.
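If moving to Python 3 is an option, the limitation goes away: since Python 3.3 (PEP 393) there is no narrow/wide build split, so chr() accepts any code point up to 0x10FFFF. A minimal sketch, assuming Python 3.4+ (where html.unescape() was added):

```python
import html

code_point = 976918  # the value from the entity &#976918;

# On Python 3.3+ str has no narrow/wide split, so chr() accepts any
# code point up to 0x10FFFF and returns a single-character string.
char = chr(code_point)
print(hex(code_point), len(char))  # 0xee816 1

# html.unescape() resolves numeric character references directly.
decoded = html.unescape("&#976918;")
print(decoded == char)  # True
```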

1 Reply

You can decode a string containing a Unicode escape (\U followed by 8 hex digits, zero-padded) using the "unicode-escape" codec:

>>> s = "\U%08x" % 976918
>>> s
'\\U000ee816'

>>> c = s.decode('unicode-escape')
>>> c
u'\U000ee816'
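The same unicode-escape trick carries over to Python 3, except that the escape text is a str and must first be encoded to ASCII bytes before the unicode_escape codec can decode it. A minimal sketch (Python 3; the function name char_from_code_point is illustrative, not from the original answer):

```python
def char_from_code_point(code_point):
    """Build a \\UXXXXXXXX escape and decode it with the unicode_escape codec."""
    # "\\U" is a literal backslash followed by U; %08x zero-pads to 8 hex digits.
    escape = "\\U%08x" % code_point
    # unicode_escape decodes bytes, so encode the ASCII escape text first.
    return escape.encode("ascii").decode("unicode_escape")


c = char_from_code_point(976918)
print(ascii(c))  # '\U000ee816'
```

On Python 3, chr() gives the same result more directly; this version only mirrors the Python 2 approach step by step.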

On a narrow build, c is stored as a UTF-16 surrogate pair:

>>> list(c)
[u'\udb7a', u'\udc16']
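The two values in the list(c) output follow the standard UTF-16 surrogate formula: subtract 0x10000, then split the remaining 20 bits into a high and a low 10-bit half. A sketch in Python 3 (the helper names to_surrogates and from_surrogates are illustrative). On a narrow Python 2 build, unichr(high) + unichr(low) should yield the same two-unit string as the decoded escape, since both values are below 0x10000.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(976918)
print(hex(high), hex(low))  # 0xdb7a 0xdc16
```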

When encoding, this surrogate pair is correctly treated as a single code point:

>>> c.encode('utf-8')
'\xf3\xae\xa0\x96'

>>> '\xf3\xae\xa0\x96'.decode('utf-8')
u'\U000ee816'
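The four bytes come from the 4-byte UTF-8 form used for code points above U+FFFF: the 21 payload bits are spread 3 + 6 + 6 + 6 over a lead byte 11110xxx and three continuation bytes 10xxxxxx. A hand-rolled Python 3 sketch for illustration only (the name utf8_encode is hypothetical; real code should just call .encode('utf-8')):

```python
def utf8_encode(code_point):
    """Hand-rolled UTF-8 encoder, for illustration only."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:                      # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                     # 2 bytes: 5 + 6 bits
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                   # 3 bytes: 4 + 6 + 6 bits
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 3 + 6 + 6 + 6 bits
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


print(utf8_encode(976918))  # b'\xf3\xae\xa0\x96'
```

Unlike str.encode('utf-8'), this sketch does not reject lone surrogates (U+D800..U+DFFF); it only shows the bit layout.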
