java - Separating Unicode ligature characters


深蓝 · Answer 1 · 2021-10-23T19:56:18+0000

U+FB00 (LATIN SMALL LIGATURE FF) is a compatibility character. Unicode normally does not encode ligatures as separate code points, on the grounds that whether and when a ligature is used is a layout decision and should not affect how the text is stored. The few ligature code points that do exist are there for round-trip compatibility with older encodings that represent ligatures as separate characters.
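To see what "compatibility character" means in practice, here is a small sketch that normalizes U+FB00 under all four forms supported by java.text.Normalizer. The class name LigatureForms and the helper allForms are illustrative only. Because U+FB00 has a compatibility decomposition but no canonical one, NFC and NFD should leave the single ligature unchanged, while NFKC and NFKD should turn it into the two-letter string "ff".

```java
import java.text.Normalizer;
import java.util.EnumMap;
import java.util.Map;

public class LigatureForms {
    // Normalize the input under every form that java.text.Normalizer supports.
    static Map<Normalizer.Form, String> allForms(String input) {
        Map<Normalizer.Form, String> results = new EnumMap<>(Normalizer.Form.class);
        for (Normalizer.Form form : Normalizer.Form.values()) {
            results.put(form, Normalizer.normalize(input, form));
        }
        return results;
    }

    public static void main(String[] args) {
        String ff = "\uFB00"; // U+FB00 LATIN SMALL LIGATURE FF
        allForms(ff).forEach((form, result) ->
                // Print the form name, the normalized text, and its length in UTF-16 code units
                System.out.println(form + ": " + result + " (length " + result.length() + ")"));
    }
}
```

Only the "K" (compatibility) forms split the ligature, which is why the canonical forms alone are not enough here.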

Luckily, the Unicode Character Database records which characters each such ligature stands for (as a compatibility decomposition mapping), and most capable string-handling systems have that data built in.

In Java, use the java.text.Normalizer class with the NFKC form (NFKD gives the same result for this character):

import java.text.Normalizer;

String ff = "\uFB00"; // U+FB00 LATIN SMALL LIGATURE FF
String normalized = Normalizer.normalize(ff, Normalizer.Form.NFKC);
System.out.println(ff + " = " + normalized);

This will print

ﬀ = ff
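Note that NFKC applied to a whole string also rewrites every other compatibility character it contains, for example superscript digits and full-width letters. If you only want ligatures split, one option is to normalize just those code points and copy everything else through. The sketch below assumes that only the Latin ligatures U+FB00 to U+FB06 (ff, fi, fl, ffi, ffl, long s t, st) should be touched; the class LigatureSplitter and the method splitLatinLigatures are hypothetical names, not part of any library.

```java
import java.text.Normalizer;

public class LigatureSplitter {
    // Assumption: only the Latin ligatures U+FB00..U+FB06 should be split.
    private static final int FIRST_LATIN_LIGATURE = 0xFB00;
    private static final int LAST_LATIN_LIGATURE = 0xFB06;

    static String splitLatinLigatures(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (cp >= FIRST_LATIN_LIGATURE && cp <= LAST_LATIN_LIGATURE) {
                // NFKC applies the compatibility decomposition, e.g. U+FB01 becomes "fi"
                out.append(Normalizer.normalize(ch, Normalizer.Form.NFKC));
            } else {
                // Every other code point is copied unchanged
                out.append(ch);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "final flow" spelled with the fi and fl ligatures, followed by x and a superscript two
        String input = "\uFB01nal \uFB02ow, x\u00B2";
        System.out.println(splitLatinLigatures(input));
    }
}
```

With this approach the superscript two survives as-is, whereas running NFKC over the whole string would have turned it into a plain "2". Other ligature ranges (for example the Armenian ligatures in the same Unicode block) would need to be added explicitly if you want them split too.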
