Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
743 views
in Technique[技术] by (71.8m points)

character encoding - Text file with 0D 0D 0A line breaks

A customer is sending me a .csv file where the line breaks are made up of the sequence 0xD 0xD 0xA. As far as I know line breaks are either 0xA from Mac or Unix or 0xD 0xA from Windows.

Is the 0xD 0xD 0xA any known encoding? Is there any known sequence of savings that corrupts a file's line endings that causes this (I think the customer uses a Mac)?

The file doesn't start with any encoding markers, it starts with the text contents directly. The text is displayed correctly if opened with code page 1252.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The CRCRLF is known as result of a Windows XP notepad word wrap bug.

For future reference, here's an extract of relevance from the linked blog:

When you press the Enter key on Windows computers, two characters are actually stored: a carriage return (CR) and a line feed (LF). The operating system always interprets the character sequence CR LF the same way as the Enter key: it moves to the next line. However when there are extra CR or LF characters on their own, this can sometimes cause problems.

There is a bug in the Windows XP version of Notepad that can cause extra CR characters to be stored in the display window. The bug happens in the following situation:

If you have the word wrap option turned on and the display window contains long lines that wrap around, then saving the file causes Notepad to insert the characters CR CR LF at each wrap point in the display window, but not in the saved file.

The CR CR LF characters can cause oddities if you copy and paste them into other programs. They also prevent Notepad from properly re-wrapping the lines if you resize the Notepad window.

You can remove the CR CR LF characters by turning off the word wrap feature, then turning it back on if desired. However, the cursor is repositioned at the beginning of the display window when you do this.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...