glyph - Find characters that are similar glyphically in Unicode?


posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)


Let's say I have the characters ú, ù, and ü. All of them look similar to the English letter U.

Is there some list or algorithm to do this:

Given ú, ù, or ü, return the English U.
Given an English U, return the list of all U-similar characters.

I'm not sure whether the code point of a Unicode character is the same across all fonts. If it is, I suppose there could be an easy and efficient way to do this?

UPDATE

If you're using Ruby, the unicode-confusable gem may help in some cases.



1 Reply

深蓝 · Answer 1 · 2021-10-23T18:38:58+0000

It is very unclear what you are asking for here.

There are characters whose canonical decompositions all start with the same base character: e, é, ê, ē, ě, … or s, ś, š, ş, ….
There are characters whose compatibility decompositions all include a particular character: e, ｅ, … or s, ｓ, … or R, Ｒ, ….
There are characters that just happen to look alike in some fonts: ß and β, or 3 and З, or ɣ and γ, or F and Ϝ, or B and Β and В, or ○ and 0 and O, or 1 and l and I and Ⅰ and | and ∣, ….
Characters that are the same case-insensitively, like s and S and ſ, or ss and Ss and SS and ß and ẞ, ….
Characters that all have the same numeric value, like all these for the value 1: 1, ①, ⑴, ⒈, Ⅰ, ⅰ, 一, ㈠, ….
Characters that all have the same primary collation strength, like all these that are the same as d: D, d, Ｄ, ｄ, …. Note that some such characters are not accessible through any kind of decomposition, but only through the DUCET/UCA values: they can be equated to d only through a primary UCA strength comparison, and the same holds for certain variants of z, c, etc.
Characters that are the same in certain locales, like æ and ae, or ä and ae, or å and aa, or MacKinley and McKinley, …. Note that locale can make a really big difference, since in some locales c and ç are treated as the same character while in others they are not; similarly for n and ñ, or a and á, ….
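
Several of the categories above can be probed with a standard library alone. The following is a minimal Python sketch, not part of the original answer. The helper names (`canonical_base`, `compatibility_form`, `same_caseless`) are made up for illustration, and the sketch covers only the canonical, compatibility, case-insensitive, and numeric-value cases. Look-alike glyphs, collation strength, and locale rules need other data sources and are not handled here.

```python
import unicodedata

def canonical_base(ch: str) -> str:
    """Return the first code point of the canonical (NFD) decomposition."""
    return unicodedata.normalize("NFD", ch)[0]

def compatibility_form(ch: str) -> str:
    """Return the compatibility (NFKD) decomposition without combining marks."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def same_caseless(a: str, b: str) -> bool:
    """Compare two strings using full Unicode case folding."""
    return a.casefold() == b.casefold()

# U+00FA (u with acute) and U+00FC (u with diaeresis) share the base "u".
print(canonical_base("\u00fa"), canonical_base("\u00fc"))
# U+FF45 is the fullwidth "e"; its compatibility decomposition is plain "e".
print(compatibility_form("\uff45"))
# U+00DF (sharp s) folds to "ss", so it matches "SS" case-insensitively.
print(same_caseless("\u00df", "SS"))
# U+2460 (circled one) and U+2160 (Roman numeral one) both have value 1.
print(unicodedata.numeric("\u2460"), unicodedata.numeric("\u2160"))
```

Note that `canonical_base` preserves case: it maps ú to a lowercase u, so getting from ú to an uppercase U needs an extra case conversion.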

Some of these can be handled. Some cannot. All require different approaches depending on different needs.
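
As one example of a case that can be handled, the reverse lookup from the question (all characters built on a given base letter) can be sketched by scanning every code point. This assumes that "similar" means "has a canonical decomposition starting with that letter", which is only one of the readings listed above. The function name `chars_with_base` is hypothetical, and the sketch is written in Python.

```python
import sys
import unicodedata
from typing import List

def chars_with_base(base: str) -> List[str]:
    """Return code points (other than `base`) whose canonical (NFD)
    decomposition starts with `base`."""
    matches = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, which are not characters
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if ch != base and decomposed[0] == base:
            matches.append(ch)
    return matches

u_like = chars_with_base("U")
print(len(u_like), "".join(u_like[:10]))
```

The result depends on the Unicode version bundled with the interpreter. Because only canonical decompositions are checked, letters from other scripts that merely look like U are not found, and lowercase forms need a separate call with "u".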

What is your real goal?
