Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
608 views
in Technique[技术] by (71.8m points)

dictionary - How to handle hash collisions for Dictionaries in Swift

TLDR

My custom structure implements the Hashable Protocol. However, when hash collisions occur while inserting keys in a Dictionary, they are not automatically handled. How do I overcome this problem?

Background

I had previously asked this question How to implement the Hashable Protocol in Swift for an Int array (a custom string struct). Later I added my own answer, which seemed to be working.

However, recently I have detected a subtle problem with hashValue collisions when using a Dictionary.

Most basic example

I have simplified the code down as far as I can to the following example.

Custom structure

struct MyStructure: Hashable {

    var id: Int

    init(id: Int) {
        self.id = id
    }

    var hashValue: Int {
        get {
            // contrived to produce a hashValue collision for id=1 and id=2
            if id == 1 {
                return 2 
            }
            return id
        }
    }
}

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

Note the global function to overload the equality operator (==) in order to conform to the Equatable Protocol, which is required by the Hashable Protocol.

Subtle Dictionary key problem

If I create a Dictionary with MyStructure as the key

var dictionary = [MyStructure : String]()

let ok = MyStructure(id: 0)            // hashValue = 0
let collision1 = MyStructure(id: 1)    // hashValue = 2
let collision2 = MyStructure(id: 2)    // hashValue = 2

dictionary[ok] = "some text"
dictionary[collision1] = "other text"
dictionary[collision2] = "more text"

print(dictionary) // [MyStructure(id: 2): more text, MyStructure(id: 0): some text]
print(dictionary.count) // 2

the equal hash values cause the collision1 key to be overwritten by the collision2 key. There is no warning. If such a collision only happened once in a dictionary with 100 keys, then it could easily be missed. (It took me quite a while to notice this problem.)

Obvious problem with Dictionary literal

If I repeat this with a dictionary literal, though, the problem becomes much more obvious because a fatal error is thrown.

let ok = MyStructure(id: 0)            // hashValue = 0
let collision1 = MyStructure(id: 1)    // hashValue = 2
let collision2 = MyStructure(id: 2)    // hashValue = 2

let dictionaryLiteral = [
    ok : "some text",
    collision1 : "other text",
    collision2 : "more text"
]
// fatal error: Dictionary literal contains duplicate keys

Question

I was under the impression that it was not necessary for hashValue to always return a unique value. For example, Mattt Thompson says,

One of the most common misconceptions about implementing a custom hash function comes from ... thinking that hash values must be distinct.

And the respected SO user @Gaffa says that one way to handle hash collisions is to

Consider hash codes to be non-unique, and use an equality comparer for the actual data to determine uniqueness.

In my opinion, the question Do swift hashable protocol hash functions need to return unique values? has not been adequately answered at the time of this writing.

After reading the Swift Dictionary question How are hash collisions handled?, I assumed that Swift automatically handled hash collisions with Dictionary. But apparently that is not true if I am using a custom class or structure.

This comment makes me think the answer is in how the Equatable protocol is implemented, but I am not sure how I should change it.

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

Is this function called for every dictionary key lookup or only when there is a hash collision? (Update: see this question)

What should I do to determine uniqueness when (and only when) a hash collision occurs?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

Note the global function to overload the equality operator (==) in order to conform to the Equatable Protocol, which is required by the Hashable Protocol.

Your problem is an incorrect equality implementation.

A hash table (such as a Swift Dictionary or Set) requires separate equality and hash implementations.

hash gets you close to the object you're looking for; equality gets you the exact object you're looking for.

Your code uses the same implementation for hash and equality, and this will guarantee a collision.

To fix the problem, implement equality to match exact object values (however your model defines equality). E.g.:

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.id == rhs.id
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...