Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - how to show Chinese characters instead of Unicode escapes

This is my code:

import re
from whoosh.analysis import RegexAnalyzer

# Match either a single CJK character or a word that may contain internal dots.
rex = RegexAnalyzer(re.compile(ur"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)"))
a = [token.text for token in rex(u"hi 中 000 中文测试中文 there 3.141 big-time under_score")]

self.render_template('index.html', {'a': a})

and it shows this on the web page:

[u'hi', u'\u4e2d', u'000', u'\u4e2d', u'\u6587', u'\u6d4b', u'\u8bd5', u'\u4e2d', u'\u6587', u'there', u'3.141', u'big', u'time', u'under_score']

But I want to show the Chinese characters themselves, so I changed it to this:

a=[(token.text).encode('utf-8') for token in rex(u"hi 中 000 中文测试中文 there 3.141 big-time under_score")]

and it shows:

['hi', '\xe4\xb8\xad', '000', '\xe4\xb8\xad', '\xe6\x96\x87', '\xe6\xb5\x8b', '\xe8\xaf\x95', '\xe4\xb8\xad', '\xe6\x96\x87', 'there', '3.141', 'big', 'time', 'under_score']

So how can I show the Chinese characters in my code?

Thanks.

1 Reply

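By default, printing a built-in container such as a list shows the repr() of each of its elements, which is why the non-ASCII characters appear as escapes. If you want the str()/unicode() of each element instead, you need to iterate over the sequence yourself and build the output string.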

a = u"['" + u"', '".join(token.text for token in ...) + u"']"
print a
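
As a rough, self-contained sketch of the same idea (not the poster's exact code): the token list below is a hard-coded stand-in for the analyzer output, and format_tokens is a hypothetical helper name introduced only for illustration. It joins the unicode strings directly, so repr() is never applied to the individual elements.

```python
# -*- coding: utf-8 -*-

def format_tokens(tokens):
    """Build a list-like display string without calling repr() on each token.

    `tokens` is assumed to be an iterable of unicode strings, such as the
    token.text values produced by the analyzer in the question.
    """
    return u"['" + u"', '".join(tokens) + u"']"


# Hard-coded stand-in for the analyzer output (an assumption for this sketch).
tokens = [u"hi", u"中", u"000", u"中", u"文", u"there"]

# `display` is one unicode string with the characters left unescaped.
display = format_tokens(tokens)
```

The resulting string could presumably be handed to the template in place of the raw list, for example {'a': display}, so that the page renders the characters rather than the list's repr().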
