Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

shell - Sort ignores an apostrophe - sometimes (except when it is the only column used); WHY?

This happens to me both on Linux and on cygwin, so I suspect it is not a bug. Still, I don't understand it. Can anyone explain?

Consider the following file. It is tab-delimited, and the apostrophe is a regular apostrophe. (I created it with cat to rule out non-printing characters as the source of the problem.)

$cat > temp
cat     1389
cat'    1747
ca't    3175
cat     46848484
ca't    720

$sort temp
<gives the exact same output as cat temp>

$sort -k1,1 temp
cat     1389
cat     46848484
cat'    1747
ca't    3175
ca't    720

Why do I have to ignore the second column in order to sort correctly?

1 Reply

I pulled up the manual for sort and noticed the following:

* WARNING * The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

As it turns out, each locale defines its own collation rules, that is, how strings are ordered lexicographically. This makes a lot of sense, but for some reason the result is surprising for files with more than one field...

(see also:)
Unusual behaviour of linux's sort command
Why does the sort command sort differently if there are trailing fields?

There are a couple of things you can do:

You can sort naively by byte value using

LC_ALL="C" sort temp

This compares raw byte values, so the result is predictable, but it might not be the order you actually want (for example, all uppercase ASCII letters sort before lowercase ones).
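
For the sample file from the question, byte ordering can be checked with the sketch below. It assumes the two columns are separated by a single tab (the question only says "tab-delimited") and recreates the file with printf so that the tabs are explicit.

```shell
# Recreate the sample file; \t writes a literal tab between the columns.
printf "cat\t1389\ncat'\t1747\nca't\t3175\ncat\t46848484\nca't\t720\n" > temp

# Compare whole lines by raw byte value, bypassing the locale's collation rules.
LC_ALL=C sort temp
# Output (columns are tab-separated):
#   ca't    3175
#   ca't    720
#   cat     1389
#   cat     46848484
#   cat'    1747
```

The apostrophe (byte 0x27) is smaller than the letter t (0x74) but larger than the tab (0x09), so both ca't lines come first and cat' comes last.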

You could try to get sort to do a more basic lexicographical ordering by setting the locale to C and telling it you want dictionary ordering:

LC_ALL="C" sort -d temp
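
A sketch of what -d does with the sample data, again assuming a single tab between the columns. With -d, only blanks and alphanumeric characters take part in the comparison, so the apostrophes are skipped and every line effectively starts with "cat" followed by a tab; the digits of the second column then decide the order.

```shell
# Recreate the sample file; \t writes a literal tab between the columns.
printf "cat\t1389\ncat'\t1747\nca't\t3175\ncat\t46848484\nca't\t720\n" > temp

# Dictionary order in the C locale: apostrophes are ignored during comparison.
LC_ALL=C sort -d temp
# Output (columns are tab-separated):
#   cat     1389
#   cat'    1747
#   ca't    3175
#   cat     46848484
#   ca't    720
```

This is the same order the question reports for plain sort temp. That suggests, as an inference rather than something the manual states, that the locale's collation rules skip punctuation in a similar way when comparing whole lines.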

To have sort output your locale information and highlight the sort key, you can use

sort --debug temp
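
A minimal sketch of using --debug together with an explicit key, assuming GNU sort (other implementations may not support --debug) and a single tab between the columns. The exact annotation format depends on the coreutils version, so no output is shown here.

```shell
# Recreate the sample file; \t writes a literal tab between the columns.
printf "cat\t1389\ncat'\t1747\nca't\t3175\ncat\t46848484\nca't\t720\n" > temp

# Mark the part of each line that is used as the sort key.
# Diagnostics, including the sorting rules in effect, go to standard error.
sort --debug -k1,1 temp
```

Running the same command without -k1,1 shows how much of each line takes part in the comparison when no key is given.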

Personally I'm really curious to know what rule is being specified that makes sort behave unintuitively across multiple fields.

They're supposed to specify correct lexicographic order in the given language and dialect. Do the locales' functions simply not handle the multiple-field case at all, or are they applying some different interpretation of the "meaning" of the line?

