Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
291 views
in Technique[技术] by (71.8m points)

postgresql - postgres only using sorted index for querying 5m records table

I'm having some issues with a 5M record table. it has a few indexes.

this is the one being used:

CREATE INDEX emails_ordered_created_and_keys
ON vsko_mailer_api_prod.emails USING btree
("$$meta.created" ASC NULLS LAST, key ASC NULLS LAST)
TABLESPACE pg_default;

I'm doing a query like this:

SELECT
    "key",
    "job",
    "sentDate",
    "scheduledDate",
    "status",
    "recipient",
    "opens",
    "clicks",
    "smtpEvents",
    "$$meta.deleted",
    "$$meta.created",
    "$$meta.modified",
    "$$meta.version",
    "$$meta.deleted",
    "$$meta.created",
    "$$meta.modified"
FROM
    "emails"
WHERE
    "emails"."$$meta.deleted" = FALSE
    AND ("emails"."$$meta.created" > '2018-02-13T14:30:35.679075Z'
        OR ("emails"."$$meta.created" = '2018-02-13T14:30:35.679075Z'
            AND "emails"."key" > '8c0a3151-bf17-490f-8124-d93f7482624f'))
ORDER BY
    "$$meta.created" ASC,
    "key" ASC
LIMIT '500'

The problem is that it is only using the emails_ordered_created_and_keys index.

"Limit  (cost=0.11..192.72 rows=500 width=764) (actual time=1.122..1.701 rows=500 loops=1)"
"  ->  Index Scan using emails_ordered_created_and_keys on emails  (cost=0.11..1966725.65 rows=5105459 width=764) (actual time=1.120..1.664 rows=500 loops=1)"
"        Filter: ((NOT ""$$meta.deleted"") AND ((""$$meta.created"" > '2018-02-13 14:30:35.679075+00'::timestamp with time zone) OR ((""$$meta.created"" = '2018-02-13 14:30:35.679075+00'::timestamp with time zone) AND (key > '8c0a3151-bf17-490f-8124-d93f7482624f'::uuid))))"
"        Rows Removed by Filter: 1000"
"Planning Time: 0.297 ms"
"Execution Time: 1.749 ms" 

readable version here: https://explain.depesz.com/s/ajW2

as I'm going down the table, by the time I'm at 2019, it has to filter out so many rows that the query takes ages.

I'm not sure what index i would need for this case. I have an index on $$meta.created as well. (example: https://explain.depesz.com/s/Ju9O)

question from:https://stackoverflow.com/questions/65896487/postgres-only-using-sorted-index-for-querying-5m-records-table

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Change the query to

WHERE NOT emails."$$meta.deleted" = FALSE
  AND (emails."$$meta.created", emails.key) >
      ('2018-02-13T14:30:35.679075Z', '8c0a3151-bf17-490f-8124-d93f7482624f')

and use this index:

CREATE INDEX ON emails ("$$meta.created", key)
   WHERE NOT emails."$$meta.deleted";

Explanation: OR is a performance problem. By rewriting it as a lexicographical comparison of pairs, you get rid of the OR, and a simple index scan finds the rows efficiently. PostgreSQL is no longer tempted to use the index that supports the ORDER BY.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...