Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Is it possible to get GCC to compile UTF-8 with BOM source files?

I develop cross-platform C++ using Microsoft Visual Studio on Windows and GCC on Ubuntu Linux.

In Visual Studio I can use Unicode symbols like "π" and "²" in my code. Visual Studio always saves the source files as UTF-8 with a BOM (byte order mark).

For example:

// A = π.r²
double π = 3.14;

GCC happily compiles these files only if I remove the BOM first. If I do not remove the BOM, I get errors like these:

wwga_hydutils.cpp:28:9: error: stray ‘\317’ in program

wwga_hydutils.cpp:28:9: error: stray ‘\200’ in program

Which brings me to the question:

Is there a way to get GCC to compile UTF-8 files without first removing the BOM?


I'm using:

  • Windows 7
  • Visual Studio 2010

and:

  • Ubuntu Oneiric 11.10
  • GCC 4.6.1 (as provided by apt-get install gcc)

Edit:

As the first commenter pointed out, my problem was not the BOM but the non-ASCII characters outside of string constants. GCC does not accept non-ASCII characters in symbol names, but it turns out that GCC is fully compatible with UTF-8 with BOM.
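
One way to check this conclusion is to look at the bytes themselves. GCC reports stray bytes in octal, and the UTF-8 encoding of "π" is the two bytes 0xCF 0x80, which are 317 and 200 in octal, while a UTF-8 BOM is 0xEF 0xBB 0xBF. The helper below is only an illustrative sketch (the name octal_bytes is made up for this example, it is not part of GCC) that prints non-ASCII bytes the way they appear in the error message:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte >= 0x80 as a backslash followed by
// three octal digits, similar to GCC's "stray" diagnostics. ASCII bytes are
// copied unchanged.
std::string octal_bytes(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (c < 0x80) {
            out += static_cast<char>(c);
            continue;
        }
        char buf[5];  // backslash + 3 octal digits + NUL
        std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

Called on the UTF-8 bytes of "π" ("\xCF\x80") it returns \317\200, matching the two errors above. Called on a BOM ("\xEF\xBB\xBF") it returns \357\273\277, which never showed up in the error output.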

1 Reply

According to the GCC Wiki, UTF-8 characters in identifiers aren't supported yet. You can use -fextended-identifiers and preprocess your code to convert the identifiers to UCNs (universal character names). From the linked page:

perl -pe 'BEGIN { binmode STDIN, ":utf8"; } s/(.)/ord($1) < 128 ? $1 : sprintf("\\U%08x", ord($1))/ge;'
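
If Perl is not available, the same transformation can be sketched in C++. This is only an approximation of the one-liner above, not something taken from the GCC Wiki: the function name to_ucn is made up, the input is assumed to be well-formed UTF-8 (malformed bytes are passed through untouched), and, as a deliberate difference from the Perl version, a leading BOM is left in place rather than escaped.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: replace every non-ASCII UTF-8 sequence in `src`
// with a \UXXXXXXXX universal character name. Like the Perl one-liner, it
// does not distinguish identifiers from comments or string literals.
std::string to_ucn(const std::string& src) {
    std::string out;
    std::size_t i = 0;

    // Design choice for this sketch: keep a leading UTF-8 BOM as-is.
    if (src.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        out.append(src, 0, 3);
        i = 3;
    }

    while (i < src.size()) {
        unsigned char lead = static_cast<unsigned char>(src[i]);
        if (lead < 0x80) {  // plain ASCII: copy unchanged
            out += src[i++];
            continue;
        }

        // Sequence length and payload bits carried by the lead byte.
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }

        // Fold in the continuation bytes (each must look like 10xxxxxx).
        bool ok = len != 0 && i + len <= src.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(src[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) {  // not valid UTF-8: pass the byte through
            out += src[i++];
            continue;
        }

        char buf[11];  // "\U" + 8 hex digits + NUL
        std::snprintf(buf, sizeof buf, "\\U%08x", static_cast<unsigned>(cp));
        out += buf;
        i += len;
    }
    return out;
}
```

For the line from the question, to_ucn applied to the UTF-8 bytes of "double π = 3.14;" gives "double \U000003c0 = 3.14;".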

See also "g++ unicode variable name" and "Unicode Identifiers and Source Code in C++11?"
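
For reference, here is roughly what the example from the question looks like once the identifier is written as a UCN by hand. This is a sketch under assumptions: circle_area is a made-up function for illustration, and on GCC 4.6 the file would need to be compiled with -fextended-identifiers as described above.

```cpp
// \u03c0 is the universal character name for π (U+03C0).
// Assumption: GCC 4.6 only accepts this with -fextended-identifiers.
const double \u03c0 = 3.14;

// A = π.r²  (non-ASCII text inside a comment is not an identifier)
double circle_area(double r) {
    return \u03c0 * r * r;
}
```

The short form \u03c0 and the eight-digit form \U000003c0 produced by the Perl command name the same character.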

