Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


windows - What is the most correct way to set the encoding in C++?

What is the best way to set the encoding in C++?

I am used to working with Unicode (wchar_t, wstring, wcin, wcout and L" ... "), and I save my source files in UTF-8.

At the moment I use MinGW on Windows 7 and run my program in the Windows console (cmd.exe), but sometimes I use gcc on GNU/Linux and run the program in a Linux console with UTF-8 encoding.

I want to compile the same source on both Windows and Linux, and I want all Unicode symbols to be input and output correctly.

Whenever I ran into another encoding problem, I googled it and found widely differing advice: setlocale(LC_ALL, "") and setlocale(LC_ALL, "xx_XX.UTF-8"), std::setlocale(LC_ALL, "") and std::setlocale(LC_ALL, "xx_XX.UTF-8") from <clocale>, SetConsoleCP() and SetConsoleOutputCP() from <windows.h>, and many, many others.

In the end I got tired of this shamanism, so I want to ask you: what is the correct way to set the encoding?


1 Reply


"I need any Unicode symbol/string to be input and output correctly."

This is certainly possible, although making the Windows command prompt console properly Unicode-aware takes some special magic. I seriously doubt that any of the implementations of the standard library functions are going to do this, unfortunately.

You'll find a number of questions about this on Stack Overflow. Basically, the console uses what is called (somewhat erroneously) the "OEM" code page by default. You want to change that to the UTF-8 code page, which is identified by the constant CP_UTF8. To do this, you'll need to call both the SetConsoleCP function (to set the input code page) and the SetConsoleOutputCP function (to set the output code page). The code would look something like this:

if (!SetConsoleCP(CP_UTF8))
{
    // An error occurred; handle it. Call GetLastError() for more information.
    // ...
}
if (!SetConsoleOutputCP(CP_UTF8))
{
    // An error occurred; handle it. Call GetLastError() for more information.
    // ...
}
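Once the output code page is UTF-8, the console interprets the bytes your program writes as UTF-8 sequences. To make that concrete, here is a minimal sketch of how a single Unicode code point maps to UTF-8 bytes. The helper name EncodeUtf8 is hypothetical and not part of the original answer or of any Windows API; it is only meant to illustrate the encoding.

```cpp
#include <string>

// Hypothetical helper (an illustration, not a library function):
// encodes one Unicode code point as a UTF-8 byte sequence.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string EncodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Surrogate code points are not valid on their own.
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;
        }
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, EncodeUtf8(0x20AC) yields the three bytes E2 82 AC for the euro sign, which a console set to CP_UTF8 should display as a single character, provided the font has a glyph for it.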

For extra robustness, you might also want to make sure that the UTF-8 code page is supported first, before trying to set and use it. You would do that by calling the IsValidCodePage function. For example:

if (IsValidCodePage(CP_UTF8))
{
    // We're all good, so set the console code page...
}
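Putting the validity check and the two code page calls together, a small wrapper might look like the sketch below. The function name SetConsoleToUtf8 is hypothetical. The non-Windows branch simply assumes the terminal already uses UTF-8, which is an assumption on my part and not something the code verifies.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: tries to switch the console to UTF-8.
// Returns true on success.
bool SetConsoleToUtf8()
{
#ifdef _WIN32
    // Make sure the UTF-8 code page is available on this system.
    if (!IsValidCodePage(CP_UTF8)) {
        return false;
    }
    // Both the input and the output code page have to be changed.
    // On failure, call GetLastError() for more information.
    return SetConsoleCP(CP_UTF8) != 0 && SetConsoleOutputCP(CP_UTF8) != 0;
#else
    // Assumption: the terminal already uses UTF-8, so there is nothing to do.
    return true;
#endif
}
```

You would call this once at the start of main(), before any console input or output, and fall back to plain ASCII output if it returns false.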

You will also have to change the font from the default ("Raster Fonts") to one that contains the requisite Unicode glyphs, e.g., Lucida Console or Consolas. That's trivial to do using the SetCurrentConsoleFontEx function.

Unfortunately, this function does not exist in versions of Windows prior to Vista. If you absolutely need to support these older operating systems, the only thing I know to do is to call the undocumented SetConsoleFont function. Normally, I would advise strongly against using undocumented functions, but I think it's less of a problem here since you would only be using it in old versions of the operating system. You know those aren't going to change. On the newer versions where it is available, you call the supported function. Sample untested code:

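// Some toolchains only declare CONSOLE_FONT_INFOEX when targeting Vista or later.
#define _WIN32_WINNT 0x0600
#include <windows.h>

bool IsWinVistaOrLater()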
{
    OSVERSIONINFOEX osvi;
    osvi.dwOSVersionInfoSize = sizeof(osvi);
    GetVersionEx(reinterpret_cast<LPOSVERSIONINFO>(&osvi));

    if (osvi.dwPlatformId == VER_PLATFORM_WIN32_NT)
    {
        return osvi.dwMajorVersion >= 6;
    }
    return false;
}

void SetConsoleToUnicodeFont()
{
    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    if (IsWinVistaOrLater())
    {
        // Call the documented function.
        typedef BOOL (WINAPI * pfSetCurrentConsoleFontEx)(HANDLE, BOOL, PCONSOLE_FONT_INFOEX);
        HMODULE hMod = GetModuleHandle(TEXT("kernel32.dll"));
        pfSetCurrentConsoleFontEx pfSCCFX = (pfSetCurrentConsoleFontEx)GetProcAddress(hMod, "SetCurrentConsoleFontEx");

        CONSOLE_FONT_INFOEX cfix;
        cfix.cbSize       = sizeof(cfix);
        cfix.nFont        = 12;
        cfix.dwFontSize.X = 8;
        cfix.dwFontSize.Y = 14;
        cfix.FontFamily   = FF_DONTCARE;
        cfix.FontWeight   = 400;  // normal weight
        // FaceName is always a WCHAR array, even in non-Unicode builds.
        lstrcpyW(cfix.FaceName, L"Lucida Console");

        pfSCCFX(hConsole,
                FALSE, /* set font for current window size */
                &cfix);
    }
    else
    {
        // There is no supported function on these older versions,
        // so we have to call the undocumented one.
        typedef BOOL (WINAPI * pfSetConsoleFont)(HANDLE, DWORD);
        HMODULE hMod = GetModuleHandle(TEXT("kernel32.dll"));
        pfSetConsoleFont pfSCF = (pfSetConsoleFont)GetProcAddress(hMod, "SetConsoleFont");
        pfSCF(hConsole, 12);
    }
}

Notice that I've left adding the required error checking as an exercise for the reader. The focus here is on technique and readability; cluttering it up with error handling would just confuse matters.

I have no idea how to do any of this on Linux. I suspect it's a lot less work, since people tell me the OS uses UTF-8 internally. Either way, you're on your own for that; making Windows purr is enough work for one answer!
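For what it's worth, the setlocale(LC_ALL, "") call mentioned in the question is the usual starting point on Linux. The sketch below is not from the original answer and I have not tested it on a range of systems. It adopts the locale from the environment and then asks POSIX nl_langinfo(CODESET) whether the resulting character set is UTF-8. The names IsUtf8Codeset and UseEnvironmentLocale are hypothetical.

```cpp
#include <cctype>
#include <clocale>
#include <string>
#ifndef _WIN32
#include <langinfo.h>  // POSIX: nl_langinfo, CODESET
#endif

// Hypothetical helper: returns true if a codeset name denotes UTF-8.
// Accepts spellings such as "UTF-8", "utf8" and "Utf-8".
bool IsUtf8Codeset(const char* name)
{
    if (name == nullptr) {
        return false;
    }
    std::string normalized;
    for (const char* p = name; *p != '\0'; ++p) {
        if (*p == '-') {
            continue;  // ignore hyphens so "UTF-8" and "UTF8" compare equal
        }
        normalized += static_cast<char>(
            std::tolower(static_cast<unsigned char>(*p)));
    }
    return normalized == "utf8";
}

#ifndef _WIN32
// Hypothetical helper: adopts the locale from the environment (LANG, LC_*)
// and reports whether its character set is UTF-8.
bool UseEnvironmentLocale()
{
    if (std::setlocale(LC_ALL, "") == nullptr) {
        return false;  // the environment names a locale that is not installed
    }
    return IsUtf8Codeset(nl_langinfo(CODESET));
}
#endif
```

If UseEnvironmentLocale() returns false, the terminal is probably not configured for UTF-8, and wide-character output may be converted lossily or fail.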

