Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
691 views
in Technique[技术] by (71.8m points)

algorithm - Can an array be grouped more efficiently than sorted?

While working on example code for an algorithm question, I came across the situation where I was sorting an input array, even though I only needed to have identical elements grouped together, but not in any particular order, e.g.:

{1,2,4,1,4,3,2} → {1,1,2,2,4,4,3} or {1,1,2,2,3,4,4} or {3,1,1,2,2,4,4} or ...

Which made me wonder: is it possible to group identical elements in an array together more efficiently than by sorting the array?

On the one hand, the fact that elements don't need to be moved to a particular location means more freedom to find an order which requires fewer swaps. On the other hand, keeping track of where every element in a group is located, and what the optimal final location is, may need more calculations than simply sorting the array.

A logical candidate would be a type of counting sort, but what if the array length and/or value range were impractically large?

For the sake of argument, let's say the array is large (e.g. a million elements), contains 32-bit integers, and the number of identical elements per value could be anything from 1 to a million.


UPDATE: For languages with support for dictionaries, Salvador Dali's answer is obviously the way to go. I'd still be interested in hearing about old-fashioned compare-and-swap methods, or methods which use less space, if there are any.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since you asked about comparison-based methods, I'm going to make the usual assumptions that (1) elements can be compared but not hashed (2) the only resource of interest is three-way operations.

In an absolute sense, it's easier to group than to sort. Here's a grouping algorithm for three elements that uses one comparison (sorting requires three). Given an input x, y, z, if x = y, then return x, y, z. Otherwise, return x, z, y.

Asymptotically, however, both grouping and sorting require Omega(n log n) comparisons. The lower bound technique is information-theoretic: we prove that, for every grouping algorithm expressed as a decision tree, there are 3^Omega(n log n) leaves, which implies that the height of the tree (and hence the worst-case running time of the algorithm) is Omega(n log n).

Fix an arbitrary leaf of the decision tree where no input elements are found to be equal. The input positions are partially ordered by the inequalities found.

Suppose to the contrary that i, j, k are pairwise incomparable input positions. Letting x = input[i], y = input[j], z = input[k], the possibilities x = y < z and y = z < x and z = x < y are all consistent with what the algorithm has observed. This cannot be, since it is impossible for the one order chosen by the leaf to put x next to y next to z next to x. We conclude that the partial order has no antichain of cardinality three.

By Dilworth's theorem, the partial order has two chains that cover the whole input. By considering all possible ways to merge these chains into a total order, there are at most n choose m ≤ 2^n permutations that map to each leaf. The number of leaves is thus at least n!/2^n = 3^Omega(n log n).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...