Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - 😃 (and other Unicode characters) in identifiers not allowed by g++

I am 😞 to find that I cannot use 😃 as a valid identifier with g++ 4.7, even with the -fextended-identifiers option enabled:

int main(int argc, const char* argv[])
{
  const char* 😃 = "I'm very happy";
  return 0;
}

main.cpp:3:3: error: stray ‘\360’ in program
main.cpp:3:3: error: stray ‘\237’ in program
main.cpp:3:3: error: stray ‘\230’ in program
main.cpp:3:3: error: stray ‘\203’ in program
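
The four octal values in these errors (360, 237, 230, 203) look like the UTF-8 bytes of U+1F603, which g++ is rejecting one at a time. A minimal sketch to check that reading follows; `utf8_encode` and `octal_escapes` are hypothetical helper names made up for this illustration, not part of GCC or the standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Surrogate code points are not rejected; this is only an illustration.
std::vector<unsigned char> utf8_encode(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // largest valid Unicode code point
    auto byte = [](std::uint32_t v) { return static_cast<unsigned char>(v); };
    std::vector<unsigned char> out;
    if (cp < 0x80) {                 // 1 byte: plain ASCII
        out.push_back(byte(cp));
    } else if (cp < 0x800) {         // 2 bytes
        out.push_back(byte(0xC0 | (cp >> 6)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {       // 3 bytes: rest of the BMP
        out.push_back(byte(0xE0 | (cp >> 12)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else {                         // 4 bytes: supplementary planes
        out.push_back(byte(0xF0 | (cp >> 18)));
        out.push_back(byte(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper: format bytes as backslash-octal escapes,
// the same notation g++ uses in its "stray" diagnostics.
std::string octal_escapes(const std::vector<unsigned char>& bytes) {
    std::string s;
    char buf[8];
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(b));
        s += buf;
    }
    return s;
}
```

Calling `octal_escapes(utf8_encode(0x1F603))` gives the string `\360\237\230\203`, matching the four errors in order.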

After some googling, I discovered that UTF-8 characters are not yet supported in identifiers, but that a universal-character-name should work. So I converted my source to:

int main(int argc, const char* argv[])
{
  const char* \U0001F603 = "I'm very happy";
  return 0;
}

main.cpp:3:15: error: universal character \U0001F603 is not valid in an identifier

So apparently 😃 isn't a valid identifier character. However, the standard specifically allows characters from the range 10000-1FFFD in Annex E.1 and doesn't disallow it as an initial character in E.2.

My next effort was to see if any other allowed Unicode characters worked, but none that I tried did. Not even the ever-important PILE OF POO (💩) character.

So, for the sake of meaningful and descriptive variable names, what gives? Does -fextended-identifiers do as it advertises or not? Is it only supported in the very latest build? And what kind of support do other compilers have?


1 Reply


As of 4.8, gcc does not support characters outside of the BMP in identifiers, which seems to be an unnecessary restriction. In addition, gcc only supports the very restricted set of characters described in ucnid.tab, which is based on C99 and C++98 (it does not seem to have been updated to C11 and C++11 yet).
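
To make the mismatch concrete, here is a small sketch that classifies a code point against the two conditions in play: lying outside the BMP (which gcc 4.8 rejects), and lying in the 10000-1FFFD range that the question cites from Annex E.1. The function names are made up for this example, and only that one range is checked; the full Annex E.1 table contains many more ranges that are not reproduced here.

```cpp
#include <cstdint>

// True if the code point lies outside the Basic Multilingual Plane,
// i.e. above U+FFFF.
constexpr bool outside_bmp(std::uint32_t cp) {
    return cp > 0xFFFF;
}

// True if the code point falls in the 10000-1FFFD range quoted in the
// question. Assumption: this is only one of the Annex E.1 ranges, so a
// false result does not mean the standard forbids the character.
constexpr bool in_cited_e1_range(std::uint32_t cp) {
    return cp >= 0x10000 && cp <= 0x1FFFD;
}

// U+1F603 (the OP's character) satisfies both conditions: the cited
// range allows it, yet it is outside the BMP.
static_assert(outside_bmp(0x1F603) && in_cited_e1_range(0x1F603),
              "U+1F603 is a supplementary-plane character in the cited range");
```

Both U+1F603 and U+1F4A9 (PILE OF POO) land in the cited range while being outside the BMP, which is consistent with the standard accepting them and gcc 4.8 rejecting them.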

As described in the manual, -fextended-identifiers is experimental, so there is a higher chance that it won't work as expected.


Edit:

GCC has supported the C11 character set since 4.9.0 (svn r204886, to be precise), so the OP's second piece of code, using \U0001F603, does work. I still can't get the actual code using 😃 to work with GCC 8.2 on https://gcc.godbolt.org, even with -finput-charset=UTF-8 (you may want to follow this bug report, provided by @DanielWolf).

Meanwhile both pieces of code work on clang 3.3 without any options other than -std=c++11.

