Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
781 views
in Technique[技术] by (71.8m points)

sorting - How do we sort faster using unix sort?

We are sorting a 5GB file with 37 fields and sort it with 5 keys. The big file is composed of 1000 files of 5MB each.

After 190 minutes it still hasn't finished.

I am wondering if there are other methods to speed up the sorting. We choose unix sort because we don't want it to use up all the memory, so any memory based approach is not okay.

What is the advantage of sorting each files independently, and then use -m option to merge sort it?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Buffer it in memory using -S. For example, to use (up to) 50% of your memory as a sorting buffer do:

sort -S 50% file

Note that modern Unix sort can sort in parallel. My experience is that it automatically uses as many cores as possible. You can set it directly using --parallel. To sort using 4 threads:

sort --parallel=4 file

So all in all, you should put everything into one file and execute something like:

sort -S 50% --parallel=4 file

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...