Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
800 views
in Technique[技术] by (71.8m points)

haskell - Updating record when referenced by multiple data structures

Suppose I have a record, e.g. Person, and I want to be able to look this person up through multiple data structures. Maybe there's an index by name, another index by the person's zip code, and another index by the person's current latitude and longitude. And maybe many more data structures. All of which exist because I need to efficiently look up a person or persons with different criteria.

If I just need to read a person's attributes, this is no problem. But now suppose I need to look up a person using one of these data structures and then update the person's data.

In an OOP language, each data structure would point to the same person in memory. So when you update one, you're implicitly updating the referents of the other data structures as well. This is pretty much the definition of side effects and impurity. I know it's totally counter to the Haskell paradigm, and I'm not expecting Haskell to work this way.

So, what is the Haskell-ish way to do it? To be clear, the problem is this: I look up a person by one data structure, and I pass that person (and maybe some other arbitrary data) into a function of type ArbitraryData -> Person -> Person. How do I propagate this change across all the various lookup structures?

As a relative newcomer to Haskell, my first instinct is to reconstruct every lookup structure with the newly updated person, every time I update a person. But that seems like a lot of ceremony, with plenty of opportunity for me to screw up in a way GHC can't detect, and not at all elegant. Haskell is known for its elegance, and I can't imagine it lacks an elegant solution to such a common and basic problem. So I think I'm missing something.

For reference, this question expands on some of the issues I was discussing in the following questions:

Multiple lookup structures for same data: Memory duplication?

Identity of simulation objects in Haskell

Edit

One solution that just crossed my mind: Don't maintain a copy of each lookup structure in your main state. Just keep one single list of all persons in existence, and that's the only thing we need to update when we update a person. Every time you need to lookup by, say, zip code, pass the list of all persons into a function that generates the efficient by-zip-code data structure. Then perform the lookup on the result.

I don't know if this would be efficient. If it results in the CPU actually recomputing the lookup structure on each use, it's unacceptable. But I know Haskell sometimes can avoid reevaluating identical expressions. Unfortunately, I still haven't figured out when this is the case. So I don't know if this approach is viable.

So in other words: Can I write my functions as if they're computing the lookup each time, when in fact GHC will optimize it away for cases where the underlying data hasn't changed? Because that would be a very elegant solution to the problem I've identified above.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since I answered this, a few people in #haskell on Freenode recommended alternative, premade solutions:


You can make a data structure that contains your lookup tables, as well as a Vector of actual Persons. The lookup tables will give you an Int or a list of Ints (rather than a Person or a list of Persons) which is the index into the Vector Person. For example:

data PersonStuff = PersonStuff {
                                 persons              :: Vector Person,
                                 firstNameLookupTable :: LookupTable Name,
                                 ...
                               }

data LookupTable a = LookupTable {
                                   table  :: Map a Int,
                                   update :: Person -> Person -> Map a Int -> Map a Int
                                 }

The update function is given the old Person, the updated Person, and will update the table only if the relevant details have changed. When a Person is modified through the convenient PersonStuff functions you'll write, those functions will update all the lookup tables for you, returning a new PersonStuff with all associated data. This makes for a pure data structure with quick lookups.

You can make functions like updatePeopleWithFirstName :: Name -> (Person -> Person) -> PersonStuff -> PersonStuff that will get all the people with a first name, apply a Person -> Person to each of them, modify their entries in the Vector, and use the update functions to update all of the lookupTables.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...