Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
412 views
in Technique[技术] by (71.8m points)

algorithm - Which is faster, Hash lookup or Binary search?

When given a static set of objects (static in the sense that once loaded it seldom if ever changes) into which repeated concurrent lookups are needed with optimal performance, which is better, a HashMap or an array with a binary search using some custom comparator?

Is the answer a function of object or struct type? Hash and/or Equal function performance? Hash uniqueness? List size? Hashset size/set size?

The size of the set that I'm looking at can be anywhere from 500k to 10m - incase that information is useful.

While I'm looking for a C# answer, I think the true mathematical answer lies not in the language, so I'm not including that tag. However, if there are C# specific things to be aware of, that information is desired.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

For very small collections the difference is going to be negligible. At the low end of your range (500k items) you will start to see a difference if you're doing lots of lookups. A binary search is going to be O(log n), whereas a hash lookup will be O(1), amortized. That's not the same as truly constant, but you would still have to have a pretty terrible hash function to get worse performance than a binary search.

(When I say "terrible hash", I mean something like:

hashCode()
{
    return 0;
}

Yeah, it's blazing fast itself, but causes your hash map to become a linked list.)

ialiashkevich wrote some C# code using an array and a Dictionary to compare the two methods, but it used Long values for keys. I wanted to test something that would actually execute a hash function during the lookup, so I modified that code. I changed it to use String values, and I refactored the populate and lookup sections into their own methods so it's easier to see in a profiler. I also left in the code that used Long values, just as a point of comparison. Finally, I got rid of the custom binary search function and used the one in the Array class.

Here's that code:

class Program
{
    private const long capacity = 10_000_000;

    private static void Main(string[] args)
    {
        testLongValues();
        Console.WriteLine();
        testStringValues();

        Console.ReadLine();
    }

    private static void testStringValues()
    {
        Dictionary<String, String> dict = new Dictionary<String, String>();
        String[] arr = new String[capacity];
        Stopwatch stopwatch = new Stopwatch();

        Console.WriteLine("" + capacity + " String values...");

        stopwatch.Start();

        populateStringArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Populate String Array:      " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        populateStringDictionary(dict, arr);

        stopwatch.Stop();
        Console.WriteLine("Populate String Dictionary: " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        Array.Sort(arr);

        stopwatch.Stop();
        Console.WriteLine("Sort String Array:          " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchStringDictionary(dict, arr);

        stopwatch.Stop();
        Console.WriteLine("Search String Dictionary:   " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchStringArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Search String Array:        " + stopwatch.ElapsedMilliseconds);

    }

    /* Populate an array with random values. */
    private static void populateStringArray(String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            arr[i] = generateRandomString(20) + i; // concatenate i to guarantee uniqueness
        }
    }

    /* Populate a dictionary with values from an array. */
    private static void populateStringDictionary(Dictionary<String, String> dict, String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            dict.Add(arr[i], arr[i]);
        }
    }

    /* Search a Dictionary for each value in an array. */
    private static void searchStringDictionary(Dictionary<String, String> dict, String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            String value = dict[arr[i]];
        }
    }

    /* Do a binary search for each value in an array. */
    private static void searchStringArray(String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            int index = Array.BinarySearch(arr, arr[i]);
        }
    }

    private static void testLongValues()
    {
        Dictionary<long, long> dict = new Dictionary<long, long>(Int16.MaxValue);
        long[] arr = new long[capacity];
        Stopwatch stopwatch = new Stopwatch();

        Console.WriteLine("" + capacity + " Long values...");

        stopwatch.Start();

        populateLongDictionary(dict);

        stopwatch.Stop();
        Console.WriteLine("Populate Long Dictionary: " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        populateLongArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Populate Long Array:      " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchLongDictionary(dict);

        stopwatch.Stop();
        Console.WriteLine("Search Long Dictionary:   " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchLongArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Search Long Array:        " + stopwatch.ElapsedMilliseconds);
    }

    /* Populate an array with long values. */
    private static void populateLongArray(long[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            arr[i] = i;
        }
    }

    /* Populate a dictionary with long key/value pairs. */
    private static void populateLongDictionary(Dictionary<long, long> dict)
    {
        for (long i = 0; i < capacity; i++)
        {
            dict.Add(i, i);
        }
    }

    /* Search a Dictionary for each value in a range. */
    private static void searchLongDictionary(Dictionary<long, long> dict)
    {
        for (long i = 0; i < capacity; i++)
        {
            long value = dict[i];
        }
    }

    /* Do a binary search for each value in an array. */
    private static void searchLongArray(long[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            int index = Array.BinarySearch(arr, arr[i]);
        }
    }

    /**
     * Generate a random string of a given length.
     * Implementation from https://stackoverflow.com/a/1344258/1288
     */
    private static String generateRandomString(int length)
    {
        var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        var stringChars = new char[length];
        var random = new Random();

        for (int i = 0; i < stringChars.Length; i++)
        {
            stringChars[i] = chars[random.Next(chars.Length)];
        }

        return new String(stringChars);
    }
}

Here are the results with several different sizes of collections. (Times are in milliseconds.)

500000 Long values...
Populate Long Dictionary: 26
Populate Long Array: 2
Search Long Dictionary: 9
Search Long Array: 80

500000 String values...
Populate String Array: 1237
Populate String Dictionary: 46
Sort String Array: 1755
Search String Dictionary: 27
Search String Array: 1569

1000000 Long values...
Populate Long Dictionary: 58
Populate Long Array: 5
Search Long Dictionary: 23
Search Long Array: 136

1000000 String values...
Populate String Array: 2070
Populate String Dictionary: 121
Sort String Array: 3579
Search String Dictionary: 58
Search String Array: 3267

3000000 Long values...
Populate Long Dictionary: 207
Populate Long Array: 14
Search Long Dictionary: 75
Search Long Array: 435

3000000 String values...
Populate String Array: 5553
Populate String Dictionary: 449
Sort String Array: 11695
Search String Dictionary: 194
Search String Array: 10594

10000000 Long values...
Populate Long Dictionary: 521
Populate Long Array: 47
Search Long Dictionary: 202
Search Long Array: 1181

10000000 String values...
Populate String Array: 18119
Populate String Dictionary: 1088
Sort String Array: 28174
Search String Dictionary: 747
Search String Array: 26503

And for comparison, here's the profiler output for the last run of the program (10 million records and lookups). I highlighted the relevant functions. They pretty closely agree with the Stopwatch timing metrics above.

Profiler output for 10 million records and lookups

You can see that the Dictionary lookups are much faster than binary search, and (as expected) the difference is more pronounced the larger the collection. So, if you have a reasonable hashing function (fairly quick with few collisions), a hash lookup should beat binary search for collections in this range.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...