Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


unicode - Python: the same character does not compare equal

I have text in my database, and I send some text from xhr to my view. The find function fails to find some Unicode characters.

I want to find selected text using:

text.find(selection)

but sometimes the variable 'selection' contains a character like this:

ę  # from xhr: unichr(281)

whereas the variable 'text' contains:

ę  # in the db: two characters, unichr(101) + unichr(808)

They are just different forms of the same character. How can I make .find work more reliably here?


1 Reply


Here unicodedata.normalize might help you.

Basically, if you normalize both the data coming from the db and your selection to the same form, you should get better results when using str.find, str.__contains__ (i.e. in), str.index, and friends.

>>> import unicodedata
>>> u1 = chr(281)             # precomposed: e with ogonek
>>> u2 = chr(101) + chr(808)  # 'e' followed by a combining ogonek
>>> print(u1, u2)
ę ę
>>> u1 == u2
False
>>> unicodedata.normalize('NFC', u2) == u1
True

NFC stands for Normalization Form C (canonical composition). The Python documentation for unicodedata.normalize describes the other possible forms (NFD, NFKC and NFKD).
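
To tie this back to the original .find call, here is a minimal sketch of the idea. The helper name find_normalized and the sample string are made up for illustration, and it assumes Python 3 strings (on Python 2 you would use unichr and unicode objects instead). Both sides are normalized to the same form before searching:

```python
import unicodedata

def find_normalized(text, selection, form="NFC"):
    """Hypothetical helper: normalize both strings to one form, then search."""
    norm_text = unicodedata.normalize(form, text)
    norm_selection = unicodedata.normalize(form, selection)
    # Note: the returned index refers to the normalized text, not the original.
    return norm_text.find(norm_selection)

# Sample data (an assumption): decomposed form as stored in the db,
# precomposed form as sent via xhr.
db_text = "zi" + chr(101) + chr(808) + "ba"
selection = chr(281)

print(db_text.find(selection))              # -1: plain find misses the match
print(find_normalized(db_text, selection))  # 2: found after normalization
```

One thing to keep in mind: because normalization can change the length of the string, the index returned here may not line up with positions in the un-normalized text. If you need offsets into the stored text, it is probably simpler to normalize once when writing to the db.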

