Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Printing Unicode characters in C++

I'm trying to write a simple command line app to teach myself Japanese, but can't seem to get Unicode characters to print. What am I missing?

#include <cstdlib>
#include <iostream>
using namespace std;

int main()
{
        wcout << L"こんにちは世界\n";
        wcout << L"Hello World\n";
        system("pause");
}

In this example only "Press any key to continue" is displayed. Tested on Visual C++ 2013.


1 Reply


This is not easy on Windows. Even when you manage to get the text to the Windows console you still need to configure cmd.exe to be able to display Japanese characters.


#include <iostream>

int main() {
  std::cout << "こんにちは世界\n";
}

This works fine on any system where:

  • The compiler's source and execution encodings include the characters.
  • The output device (e.g., the console) expects text in the same encoding as the compiler's execution encoding.
  • A font with the appropriate characters is available (usually not a problem).

Most platforms these days use UTF-8 by default for all these encodings and so can support the entire Unicode range with code similar to the above. Unfortunately Windows is not one of these platforms.
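One way to see which execution encoding the compiler actually used is to dump the bytes of a narrow string literal. The following is a minimal sketch; `hex_bytes` is a hypothetical helper name, not something from the original answer.

```cpp
#include <cstdio>
#include <string>

// Returns the bytes of s as space-separated lowercase hex, e.g. "e3 81 93".
// Hypothetical helper for inspecting how a string literal was encoded.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `std::puts(hex_bytes("こ").c_str());` should print `e3 81 93` when the execution encoding is UTF-8. Any other byte sequence suggests the compiler converted the literal to a different encoding, which the console would then have to match.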

wcout << L"こんにちは世界\n";

In this line the string literal data is (at compile time) converted from the source encoding to the execution wide encoding and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where things go wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.

So the conversion results in an error, putting wcout into a bad state. The error has to be cleared before wcout will function again, which is why the second print statement does not output anything.


You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately, the encoding needed to support the entire Unicode range this way is UTF-8, and although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.

For example:

wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));

SetConsoleOutputCP(CP_UTF8);

wcout << L"こんにちは世界\n";

Here wcout will correctly convert the string to UTF-8, and if the output were written to a file instead of the console then the file would contain the correct UTF-8 data. However the Windows console, even though configured here to accept UTF-8 data, simply will not accept UTF-8 data written in this way.


There are a few options:

  • Avoid the standard library entirely:

    DWORD n;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);
    
  • Use a non-standard magical incantation that will break standard code:

    #include <fcntl.h>
    #include <io.h>
    
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::wcout << L"こんにちは世界\n";
    

    After setting this mode std::cout << "Hello, World"; will crash.

  • Use a low level IO API along with manual conversion:

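    #include <codecvt>
    #include <cstdio>
    #include <locale>
    
    SetConsoleOutputCP(CP_UTF8);
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    // to_bytes returns std::string; puts needs a C string and appends its own newline
    std::puts(convert.to_bytes(L"こんにちは世界").c_str());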
    

Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes. Seven little boxes, for the given string.

[Image: Little Boxes]

You can copy the text out of cmd.exe and into notepad.exe or whatever to see the correct glyphs.
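For the manual conversion option, note that `std::wstring_convert` and `<codecvt>` are deprecated as of C++17. A hand-written UTF-16 to UTF-8 encoder is one possible replacement. The sketch below is not part of the original answer, and `utf16_to_utf8` is a hypothetical name. It assumes unpaired surrogates should be replaced with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Encodes a UTF-16 string as UTF-8.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {  // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {  // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {  // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On Windows, where wchar_t holds UTF-16 code units, a wide string `ws` could be passed as `std::u16string(ws.begin(), ws.end())`, and the result written with `std::fputs(..., stdout)` after `SetConsoleOutputCP(CP_UTF8)`. The same font limitation applies as with the other options.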

