utf 8 - How do I correct the character encoding of a file?

Question

Welcome To Ask or Share your Answers For Others

utf 8 - How do I correct the character encoding of a file?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

utf 8 - How do I correct the character encoding of a file?

I have an ANSI encoded text file that should not have been encoded as ANSI as there were accented characters that ANSI does not support. I would rather work with UTF-8.

Can the data be decoded correctly or is it lost in transcoding?

What tools could I use?

Here is a sample of what I have:

?§ ??

I can tell from context (caf?? should be café) that these should be these two characters:

? é

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:07:02+0000

Follow these steps with Notepad++

1- Copy the original text

2- In Notepad++, open new file, change Encoding -> pick an encoding you think the original text follows. Try as well the encoding "ANSI" as sometimes Unicode files are read as ANSI by certain programs

3- Paste

4- Then to convert to Unicode by going again over the same menu: Encoding -> "Encode in UTF-8" (Not "Convert to UTF-8") and hopefully it will become readable

The above steps apply for most languages. You just need to guess the original encoding before pasting in notepad++, then convert through the same menu to an alternate Unicode-based encoding to see if things become readable.

Most languages exist in 2 forms of encoding: 1- The old legacy ANSI (ASCII) form, only 8 bits, was used initially by most computers. 8 bits only allowed 256 possibilities, 128 of them where the regular latin and control characters, the final 128 bits were read differently depending on the PC language settings 2- The new Unicode standard (up to 32 bit) give a unique code for each character in all currently known languages and plenty more to come. if a file is unicode it should be understood on any PC with the language's font installed. Note that even UTF-8 goes up to 32 bit and is just as broad as UTF-16 and UTF-32 only it tries to stay 8 bits with latin characters just to save up disk space

Categories

utf 8 - How do I correct the character encoding of a file?

utf 8 - How do I correct the character encoding of a file?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags