Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
507 views
in Technique[技术] by (71.8m points)

algorithm - Find median value from a growing set

I came across an interesting algorithm question in an interview. I gave my answer but not sure whether there is any better idea. So I welcome everyone to write something about his/her ideas.

You have an empty set. Now elements are put into the set one by one. We assume all the elements are integers and they are distinct (according to the definition of set, we don't consider two elements with the same value).

Every time a new element is added to the set, the set's median value is asked. The median value is defined the same as in math: the middle element in a sorted list. Here, specially, when the size of set is even, assuming size of set = 2*x, the median element is the x-th element of the set.

An example: Start with an empty set, when 12 is added, the median is 12, when 7 is added, the median is 7, when 8 is added, the median is 8, when 11 is added, the median is 8, when 5 is added, the median is 8, when 16 is added, the median is 8, ...

Notice that, first, elements are added to set one by one and second, we don't know the elements going to be added.

My answer.

Since it is a question about finding median, sorting is needed. The easiest solution is to use a normal array and keep the array sorted. When a new element comes, use binary search to find the position for the element (log_n) and add the element to the array. Since it is a normal array so shifting the rest of the array is needed, whose time complexity is n. When the element is inserted, we can immediately get the median, using instance time.

The WORST time complexity is: log_n + n + 1.

Another solution is to use link list. The reason for using link list is to remove the need of shifting the array. But finding the location of the new element requires a linear search. Adding the element takes instant time and then we need to find the median by going through half of the array, which always takes n/2 time.

The WORST time complexity is: n + 1 + n/2.

The third solution is to use a binary search tree. Using a tree, we avoid shifting array. But using the binary search tree to find the median is not very attractive. So I change the binary search tree in a way that it is always the case that the left subtree and the right subtree are balanced. This means that at any time, either the left subtree and the right subtree have the same number of nodes or the right subtree has one node more than in the left subtree. In other words, it is ensured that at any time, the root element is the median. Of course this requires changes in the way the tree is built. The technical detail is similar to rotating a red-black tree.

If the tree is maintained properly, it is ensured that the WORST time complexity is O(n).

So the three algorithms are all linear to the size of the set. If no sub-linear algorithm exists, the three algorithms can be thought as the optimal solutions. Since they don't differ from each other much, the best is the easiest to implement, which is the second one, using link list.

So what I really wonder is, will there be a sub-linear algorithm for this problem and if so what will it be like. Any ideas guys?

Steve.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your complexity analysis is confusing. Let's say that n items total are added; we want to output the stream of n medians (where the ith in the stream is the median of the first i items) efficiently.

I believe this can be done in O(n*lg n) time using two priority queues (e.g. binary or fibonacci heap); one queue for the items below the current median (so the largest element is at the top), and the other for items above it (in this heap, the smallest is at the bottom). Note that in fibonacci (and other) heaps, insertion is O(1) amortized; it's only popping an element that's O(lg n).

This would be called an "online median selection" algorithm, although Wikipedia only talks about online min/max selection. Here's an approximate algorithm, and a lower bound on deterministic and approximate online median selection (a lower bound means no faster algorithm is possible!)

If there are a small number of possible values compared to n, you can probably break the comparison-based lower bound just like you can for sorting.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...