Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
367 views
in Technique[技术] by (71.8m points)

caching - Can I use a counter in a database Many-to-Many field to reduce lookups?

I am trying to figure out the fastest way to access data stored in a junction object. The example below is analagous to my problem, but with a different context, because the actual dataset I am dealing with is somewhat unintuitive in its relationships.

We have 3 classes: User, Product, and Rating. User has a many-to-many relationship to Product with Rating as the junction/'through' class.

The Rating object stores the answers to several questions which are integer ratings on a scale of 1-5 (Example questions: How is the quality of the Product, how is the value of the Product, how user-friendly is the Product). For simplification assume every User rates every Product they buy.

Now here is the calculation I want to perform: For a User, calculate the average rating of all the Products they have bought (that is, the average rating from all other Users, one of which will be from this User themself). Then we can tell the user "On average, you buy products rated 3/5 for value by all customers who bought that product".

The simple and slow way is just to iterate over all of a user's review objects. If we assume that each user has bought a small (<100) number of products, and each product has n ratings, this is O(100n) = O(n).

However, I could also do the following: On the Product class, keep a counter of the number of Rating s that selected each number (e.g. how many User s rated this product 3/5 for value). If you increment that counter every time a Product is rated, then computing the average for a given Product just requires checking the 5 counters for each Rating criteria.

Is this a valid technique? Is it commonly employed/is there a name for it? It seems intuitive to me, but I don't know enough about databases to tell whether there's some fundamental flaw or not.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is normal. It is ultimately caching: encoding of state redundantly to benefit some patterns of usage at the expense of others. Of course it's also a complexification.

Just because the RDBMS data structure is relations doesn't mean you can't rearrange how you are encoding state from some straightforward form. Eg denormalization.

(Sometimes redundant designs (including ones like yours) are called "denormalized" when they are not actually the result of denormalization and the redundancy is not the kind that denormalization causes or normalization removes. Cross Table Dependency/Constraint in SQL Database Indeed one could reasonably describe your case as involving normalization without preserving FDs (functional dependencies). Start with a table with a user's id & other columns, their ratings (a relation) & its counter. Then ratings functionally determines counter since counter = select count(*) from ratings. Decompose to user etc + counter, ie table User, and user + ratings, which ungroups to table Rating. )


Do you have a suggestion as to the best term to use when googling this

A frequent comment by me: Google many clear, concise & specific phrasings of your question/problem/goal/desiderata with various subsets of terms & tags as you may discover them with & without your specific names (of variables/databases/tables/columns/constraints/etc). Eg 'when can i store a (sum OR total) redundantly in a database'. Human phrasing, not just keywords, seems to help. Your best bet may be along the lines of optimizing SQL database designs for performance. There are entire books ('amazon isbn'), some online ('pdf'). (But maybe mostly re queries). Investigate techniques relevant to warehousing, since an OLTP database acts as an input buffer to an OLAP database, and using SQL with big data. (Eg snapshot scheduling.)

PS My calling this "caching" (so does tag ) is (typical of me) rather abstract, to the point where there are serious-jokes that everything in CS is caching. (Googling... "There are only two hard problems in Computer Science: cache invalidation and naming things."--Phil Karlton.) (Welcome to both.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...