Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

big o - What is the time and space complexity of method retainAll when used on HashSets in Java?

For example in the code below:

public int commonTwo(String[] a, String[] b)
{
    Set<String> common = new HashSet<>(Arrays.asList(a));
    common.retainAll(new HashSet<>(Arrays.asList(b)));
    return common.size();
} 

1 Reply

Let's peruse the code. The method retainAll is inherited from AbstractCollection and (at least in OpenJDK) looks essentially like this:

public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> it = iterator();
    while (it.hasNext()) {
        if (!c.contains(it.next())) {
            it.remove();
            modified = true;
        }
    }
    return modified;
}

There is one big thing to note here: we loop over this.iterator() and call c.contains. So the time complexity is n calls to c.contains, where n = this.size(), plus at most n calls to it.remove().

The important thing is that the contains method is called on the other Collection, so the complexity depends on the complexity of the other Collection's contains.
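To make the point about who receives the contains calls concrete, here is a minimal sketch. CountingCollection and RetainAllCallCount are hypothetical names made up for this illustration (they are not JDK classes), and the behaviour shown assumes the OpenJDK implementation quoted above.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

final class RetainAllCallCount {

    // Hypothetical helper (not part of the JDK): wraps a collection
    // and counts how often contains() is invoked on it.
    static final class CountingCollection<E> extends AbstractCollection<E> {
        private final Collection<E> delegate;
        int containsCalls = 0;

        CountingCollection(Collection<E> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean contains(Object o) {
            containsCalls++;
            return delegate.contains(o);
        }

        @Override
        public Iterator<E> iterator() {
            return delegate.iterator();
        }

        @Override
        public int size() {
            return delegate.size();
        }
    }

    // Returns how many times retainAll called contains() on its argument.
    static int containsCallsFor(String[] a, String[] b) {
        Set<String> common = new HashSet<>(Arrays.asList(a));
        CountingCollection<String> other =
                new CountingCollection<>(new HashSet<>(Arrays.asList(b)));
        common.retainAll(other);
        return other.containsCalls;
    }

    public static void main(String[] args) {
        String[] a = {"a", "b", "c", "d"};
        String[] b = {"b", "d", "x"};
        System.out.println(containsCallsFor(a, b));
    }
}
```

With the inherited AbstractCollection.retainAll, the count should equal the number of elements in the receiving set (four here), regardless of how large b is.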

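So the retainAll call in:

Set<String> common = new HashSet<>(Arrays.asList(a));
common.retainAll(new HashSet<>(Arrays.asList(b)));

is O(a.length), as HashSet.contains and HashSet.remove are both O(1) (expected, assuming a reasonable hash function). Building the two sets adds O(a.length + b.length) on top of that.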

If you were to call

common.retainAll(Arrays.asList(b));

then, due to the O(n) contains on Arrays.ArrayList (the list type returned by Arrays.asList), this would become O(a.length * b.length). In other words, by spending O(b.length) copying the array into a HashSet you actually make the call to retainAll much faster.
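The difference between the two arguments can be observed with a rough timing sketch like the one below. RetainAllArgumentDemo, its method names and the generated test data are assumptions made for illustration; the timings are not a proper benchmark (no warm-up, a single run) and will vary by machine and JVM.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RetainAllArgumentDemo {

    // HashSet argument: each contains() is an expected O(1) hash lookup.
    static int commonViaSet(String[] a, String[] b) {
        Set<String> common = new HashSet<>(Arrays.asList(a));
        common.retainAll(new HashSet<>(Arrays.asList(b)));
        return common.size();
    }

    // List argument: each contains() is a linear scan over b.
    static int commonViaList(String[] a, String[] b) {
        Set<String> common = new HashSet<>(Arrays.asList(a));
        common.retainAll(Arrays.asList(b));
        return common.size();
    }

    // Hypothetical test data: "s0", "s<step>", "s<2*step>", ...
    static String[] multiples(int count, int step) {
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            out[i] = "s" + (i * step);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] a = multiples(10_000, 2);
        String[] b = multiples(10_000, 3);

        long t0 = System.nanoTime();
        int viaSet = commonViaSet(a, b);
        long t1 = System.nanoTime();
        int viaList = commonViaList(a, b);
        long t2 = System.nanoTime();

        System.out.println("HashSet argument: " + viaSet + " common, "
                + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("List argument:    " + viaList + " common, "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both variants return the same count; only the cost of each contains call differs, so the gap should widen as b grows.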

As far as space complexity goes, retainAll itself requires no additional space (beyond the Iterator), but your invocation is actually quite expensive space-wise: it allocates two new HashSet instances, each of which is backed by a fully fledged HashMap.

Two further things can be noted:

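  1. There is no reason to allocate a HashSet from the elements in a. A cheaper collection that also has O(1) removal from the middle through its iterator, such as a LinkedList, can be used (cheaper in memory and also in build time, as no hash table is built). Note that, unlike a HashSet, a LinkedList keeps duplicate elements of a, so the resulting size differs if a contains repeats.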
  2. The result of retainAll is being lost, as you create new collection instances and only return common.size().
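The first point could be sketched roughly as follows. CommonTwoLinkedList is a hypothetical class name, and this is only one possible reading of the suggestion: b is still copied into a HashSet for fast lookups, while a goes into a LinkedList, so repeated elements of a are counted once per occurrence.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

final class CommonTwoLinkedList {

    static int commonTwo(String[] a, String[] b) {
        // Only b needs hashing: it is the collection contains() is called on.
        Set<String> lookup = new HashSet<>(Arrays.asList(b));

        // LinkedList inherits retainAll from AbstractCollection and supports
        // O(1) removal through its iterator, with no hash table to build.
        List<String> common = new LinkedList<>(Arrays.asList(a));
        common.retainAll(lookup);

        // Assumption: duplicates in a are acceptable or absent; each
        // occurrence that also appears in b is counted.
        return common.size();
    }

    public static void main(String[] args) {
        System.out.println(commonTwo(
                new String[] {"a", "b", "c"},
                new String[] {"b", "c", "d"}));
    }
}
```

If a may contain repeats and each distinct string should be counted only once, the original HashSet-based version is the one that matches that requirement.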
