elasticsearch - Elastic Search - Scroll behavior

Question

Welcome To Ask or Share your Answers For Others

elasticsearch - Elastic Search - Scroll behavior

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:57:46+0000

Using from/size is the default and easiest way to paginate results. By default, it only works up to a size of 10000. You can increase that limit, but it is not advised to go too far because deep pagination will decrease the performance of your cluster.

The scroll API will allow you to paginate over all your data. The way it works is by creating a search context (i.e. a snapshot of the data at the time your start scrolling) and then you'll get a cursor to paginate over all your data. When done, you can close the search context. The created search context has an associated cost (requires state, hence memory), hence this way of paginating is not suited to real-time pagination (more for batch-like pagination).

There is another way of scrolling over all the data without the additional cost of creating a dedicated search context every time, and it's called search_after. In this flavor, the idea is to sort your data, and then use the sort values as lightweight cursors. It can have some drawbacks, for instance, if you're constantly indexing new data, you might run the risk of missing new data that would have appeared on a previous "page".

In 7.10, there is going to be yet another way of paginating data, which is called Point in Time search (PIT). Here the idea is again to create a context so that you can return hits as rapidly as possible and aggregations (a bit later) in two distinct calls.

UPDATE

7.10 got released on Nov 11th, 2020, and Point in Time searches are now available, too.

Categories

elasticsearch - Elastic Search - Scroll behavior

elasticsearch - Elastic Search - Scroll behavior

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags