Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


utf 8 - C#: Cycle through encodings

I am reading files in various formats and languages, and I am currently using a small encoding library to attempt to detect the proper encoding (http://www.codeproject.com/KB/recipes/DetectEncoding.aspx).

It's pretty good, but it still misses occasionally (on multilingual files, for example).

Most of my potential users have very little understanding of encoding (the best I can hope for is "it has something to do with characters") and are very unlikely to be able to choose the right encoding from a list, so I would like to let them cycle through different encodings, just by clicking a button, until the right one is found.

Display problems? Click here to try a different encoding! (Well that's the concept anyway)

What would be the best way to implement something like that?


Edit: Looks like I didn't express myself clearly enough. By "cycling through the encoding", I don't mean "how to loop through encodings?"

What I meant was "how to let the user try different encodings in sequence without reloading the file?"

The idea is more like this: Let's say the file is loaded with the wrong encoding. Some strange characters are displayed. The user would click a "Next encoding" or "Previous encoding" button, and the string would be converted to a different encoding. The user just needs to keep clicking until the right encoding is found (whatever encoding looks good to the user will do fine). As long as the user can click "next", he has a reasonable chance of solving his problem.

What I have found so far involves converting the string to bytes using the current encoding, then converting the bytes to the next encoding, converting those bytes into chars, then converting the char into a string... Doable, but I wonder if there isn't an easier way to do that.

For instance, if there was a method that would read a string and return it using a different encoding, something like "render(string, encoding)".


Thanks a lot for the answers!


1 Reply


Read the file as bytes and then use the Encoding.GetString method.

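        // Requires: using System; using System.Text;
        byte[] data = System.IO.File.ReadAllBytes(path);  // read the raw bytes once

        // Decode the same bytes with different encodings; no reload needed.
        Console.WriteLine(Encoding.UTF8.GetString(data));
        Console.WriteLine(Encoding.UTF7.GetString(data));  // UTF7 is marked obsolete in .NET 5 and later
        Console.WriteLine(Encoding.ASCII.GetString(data));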

This way you only have to load the file once. You can apply any encoding to the original bytes of the file. The user can select the correct one, and you can use the result of Encoding.GetEncoding(...).GetString(data) for further processing.
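
To connect this with the "Next encoding" / "Previous encoding" buttons from the question, here is a minimal sketch of one possible wrapper. The class name EncodingCycler, its Next and Previous methods, and the candidate list are all hypothetical names chosen for illustration; they are not part of .NET. The only framework calls used are Encoding.GetString and Encoding.GetEncoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of .NET): keeps the raw file bytes and
// decodes them with one candidate encoding at a time.
public sealed class EncodingCycler
{
    private readonly byte[] _data;
    private readonly IReadOnlyList<Encoding> _candidates;
    private int _index;

    public EncodingCycler(byte[] data, IReadOnlyList<Encoding> candidates)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (candidates == null || candidates.Count == 0)
            throw new ArgumentException("At least one candidate encoding is required.", nameof(candidates));
        _data = data;
        _candidates = candidates;
    }

    // The encoding currently selected.
    public Encoding Current => _candidates[_index];

    // Always decode from the original bytes, never from a previously decoded string.
    public string Text => Current.GetString(_data);

    // Handlers for the "Next encoding" and "Previous encoding" buttons.
    public string Next() => Move(+1);
    public string Previous() => Move(-1);

    private string Move(int step)
    {
        int n = _candidates.Count;
        _index = ((_index + step) % n + n) % n;  // wrap around in both directions
        return Text;
    }
}

public static class Demo
{
    public static void Main()
    {
        // "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
        byte[] data = { 0xC3, 0xA9 };

        // Assumed candidate list; a real application would pick its own.
        var candidates = new[] { Encoding.UTF8, Encoding.GetEncoding("iso-8859-1") };
        var cycler = new EncodingCycler(data, candidates);

        Console.WriteLine(cycler.Current.WebName + ": " + cycler.Text);

        string next = cycler.Next();
        Console.WriteLine(cycler.Current.WebName + ": " + next);
    }
}
```

Because every step decodes the untouched byte array, a wrong guess never damages the text, which is not guaranteed when a decoded string is converted back to bytes and re-decoded. On .NET Core and later, legacy code pages such as Windows-1252 are typically only available after registering CodePagesEncodingProvider, so the candidate list above sticks to encodings that are available by default.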

