algorithm - Given a file, find the ten most frequently occurring words as efficiently as possible

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

This is apparently an interview question (I found it in a collection of interview questions), but even if it's not, it's pretty cool.

We are told to do this efficiently on all complexity measures. I thought of creating a HashMap that maps the words to their frequency. That would be O(n) in time and space complexity, but since there may be lots of words we cannot assume that we can store everything in memory.
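
The in-memory approach described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `top_words`, the `WORD_RE` tokenizer, and the choice to lowercase words are assumptions that are not part of the question. It counts words in a hash map and then selects the top `k` with a bounded heap instead of sorting every distinct word.

```python
import heapq
import re
from collections import Counter

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def top_words(lines, k=10):
    """Return the k most frequent words as (word, count) pairs."""
    counts = Counter()
    for line in lines:
        # One pass over the input: O(n) time, O(m) space for m distinct words.
        counts.update(WORD_RE.findall(line.lower()))
    # nlargest tracks only k candidates at a time, so selection costs
    # O(m log k) rather than the O(m log m) of a full sort.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
```

Any iterable of lines works as input, so an open file object can be passed directly, for example `with open(path) as f: top_words(f)`.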

I must add that nothing in the question says that the words cannot be stored in memory, but what if that were the case? If that's not the case, then the question does not seem as challenging.
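
If the distinct words really did not fit in memory, one common workaround is hash partitioning. The sketch below is hedged: `top_words_partitioned`, `num_buckets`, and the tokenizer are hypothetical names and choices, and it assumes that the distinct words of any single bucket do fit in memory. Every occurrence of a word is written to the same bucket file, so a word's count within its bucket is its global count, and the per-bucket top `k` lists can be merged safely.

```python
import heapq
import os
import re
import tempfile
import zlib
from collections import Counter

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def top_words_partitioned(lines, k=10, num_buckets=64):
    """Top-k words using on-disk buckets; one bucket is counted at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(num_buckets)]
        buckets = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            # Pass 1: route each word to a bucket chosen by a stable hash.
            for line in lines:
                for word in WORD_RE.findall(line.lower()):
                    i = zlib.crc32(word.encode("utf-8")) % num_buckets
                    buckets[i].write(word + "\n")
        finally:
            for b in buckets:
                b.close()

        # Pass 2: count each bucket independently and keep a running top k.
        best = []
        for p in paths:
            with open(p, encoding="utf-8") as f:
                counts = Counter(w.rstrip("\n") for w in f)
            best = heapq.nlargest(
                k, best + list(counts.items()), key=lambda kv: kv[1]
            )
        return best
```

The trade-off is one extra write and one extra read of the data in exchange for a peak memory footprint of roughly one bucket's distinct words plus `k` candidates.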

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:26:58+0000

Optimizing for my own time:

sort file | uniq -c | sort -nr | head -10

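Possibly followed by awk '{print $2}' to strip the counts. Note that this pipeline counts whole lines, so it assumes the file has one word per line; otherwise split the text into words first, for example with tr -cs '[:alnum:]' '\n' < file in place of reading the file directly.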
