Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


unicode - Java charAt used with characters that have two code units

From Core Java, vol. 1, 9th ed., p. 69:

The character ℤ requires two code units in the UTF-16 encoding. Calling

String sentence = "ℤ is the set of integers"; // for clarity; not in book
char ch = sentence.charAt(1);

doesn't return a space but the second code unit of ℤ.

But it seems that sentence.charAt(1) does return a space. For example, the if statement in the following code evaluates to true.

String sentence = "ℤ is the set of integers";
if (sentence.charAt(1) == ' ')
    System.out.println("sentence.charAt(1) returns a space");

Why?

I'm using JDK SE 1.7.0_09 on Ubuntu 12.10, if it's relevant.


1 Reply


It sounds like the book is saying that 'ℤ' lies outside the Basic Multilingual Plane (BMP), but in fact it is inside it.

Java uses UTF-16, with surrogate pairs for characters that are not in the BMP. Since 'ℤ' (U+2124) is in the BMP, it is represented by a single code unit. In your example, sentence.charAt(0) will return 'ℤ', and sentence.charAt(1) will return ' '.
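A quick way to check this yourself is sketched below. The class and method names (BmpCharDemo, codeUnitsOfFirstChar) are made up for illustration, and the character is written as a Unicode escape so the result does not depend on the source file encoding.

```java
public class BmpCharDemo {
    // Number of UTF-16 code units used by the first code point of s.
    // (Hypothetical helper, not part of any library.)
    static int codeUnitsOfFirstChar(String s) {
        return Character.charCount(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+2124 (DOUBLE-STRUCK CAPITAL Z) written as an escape.
        String sentence = "\u2124 is the set of integers";

        System.out.println(codeUnitsOfFirstChar(sentence));             // 1
        System.out.println(sentence.charAt(0) == '\u2124');             // true
        System.out.println(sentence.charAt(1) == ' ');                  // true
        System.out.println(Character.isSurrogate(sentence.charAt(0)));  // false
    }
}
```

Character.isSurrogate is available from Java 7 onward, so this should run on the JDK mentioned in the question.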

A character represented by a surrogate pair is made up of two code units. If the sentence began with such a character, sentence.charAt(0) would return the first code unit (the high surrogate), and sentence.charAt(1) would return the second code unit (the low surrogate).
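For comparison, here is a small sketch of the behaviour the book describes, using a character that really is outside the BMP. U+1D546 (MATHEMATICAL DOUBLE-STRUCK CAPITAL O) is my choice of example, not something from the question, and the class and helper names are hypothetical.

```java
public class SurrogatePairDemo {
    // Builds a sentence whose first character is the given code point.
    // (Hypothetical helper for this example.)
    static String sentenceStartingWith(int codePoint) {
        return new String(Character.toChars(codePoint)) + " is a set";
    }

    public static void main(String[] args) {
        // U+1D546 is a supplementary character, so it needs two code units.
        String s = sentenceStartingWith(0x1D546);

        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        System.out.println(s.charAt(2) == ' ');                     // true
        System.out.println(Integer.toHexString(s.charAt(0)));       // d835
        System.out.println(Integer.toHexString(s.charAt(1)));       // dd46
    }
}
```

Here the space only shows up at index 2, because indices 0 and 1 are taken by the two halves of the surrogate pair.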

See http://docs.oracle.com/javase/6/docs/api/java/lang/String.html:

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
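If you want to index by character rather than by code unit, String also has code point methods. The following is only a sketch: codePointAtIndex is a hypothetical helper, and the sample string (starting with U+1D546) is my own example rather than the one from the book.

```java
public class CodePointIndexDemo {
    // Returns the i-th code point of s (0-based), counting a surrogate
    // pair as a single character. (Hypothetical helper.)
    static int codePointAtIndex(String s, int i) {
        int unitIndex = s.offsetByCodePoints(0, i); // code point index -> char index
        return s.codePointAt(unitIndex);
    }

    public static void main(String[] args) {
        // U+1D546 written as its surrogate pair, followed by " is a set".
        String s = "\uD835\uDD46 is a set";

        System.out.println(s.length());                                   // 11
        System.out.println(s.codePointCount(0, s.length()));              // 10
        System.out.println(codePointAtIndex(s, 1) == ' ');                // true
        System.out.println(Integer.toHexString(codePointAtIndex(s, 0)));  // 1d546
    }
}
```

length() counts code units while codePointCount counts characters, which is why the two numbers differ by one for this string.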

