Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
118 views
in Technique[技术] by (71.8m points)

sql - Given a set of IDs, Return the Subset of Orders with only those IDs

Given a set of product_ids, what are the order_ids that only have those product_ids?

For the example below, I'm only wanting order_ids that have some combination of (a,b,c). I have 2 tables like below:

"transactions" table:

order_id | product_id |
---------+-------------
    1    |    a       |
    1    |    b       |
    2    |    a       |
    2    |    X       |
    3    |    a       |
    3    |    b       |
    3    |    c       |
    ...  |    ...     |
    999  |    Y       |

"products" table:

product_id |
------------
     a     |
     b     |
     c     |
     d     |
     X     |
     Y     |
     ...   |
     ZZZ   |

Desired Output has 2 order_ids with expected table output:

order_id |
----------
    1    |
    3    |

Notice that order_id == 2 is removed although it has product_id == a but because it has product_id == X then it should be removed.

Therefore it's not a simple:

SELECT DISTINCT(order_id)
FROM transactions
WHERE product_id IN (a, b, c)
question from:https://stackoverflow.com/questions/65895818/given-a-set-of-ids-return-the-subset-of-orders-with-only-those-ids

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Typically, there is an orders table to go with that, with exactly one row per order.

If we can further assume that there is always at least one transaction for every order, this would do the job:

SELECT o.id
FROM   orders o
WHERE  NOT EXISTS (
   SELECT FROM transactions  -- SELECT list can be empty for EXISTS test
   WHERE  order_id = o.id
   AND    product_id <> ALL ('{a,b,c}')
   );

That's good for very common product_id's or long lists.

For short lists or rare products, it will be faster to start with a positive selection first. Like:

SELECT order_id
FROM  (
   SELECT DISTINCT order_id
   FROM   transactions
   WHERE  product_id = ANY ('{a,b,c}')
   ) t
WHERE  NOT EXISTS (
   SELECT FROM transactions
   WHERE  order_id = t.order_id
   AND    product_id <> ALL ('{a,b,c}')
   );

An index on (product_id) is essential for performance. Better yet, a multicolumn index on (product_id, order_id), plus another one on (order_id, product_id). See:

The manual about array literals:

About the ANY and ALL constructs:


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...