Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


elasticsearch - How do I retrieve more than 10000 results/events in Elasticsearch?

Example query:

GET hostname:port/myIndex/_search
{
    "size": 10000,
    "query": {
        "term": { "field": "myField" }
    }
}

I have been using the size option, with the index setting raised to:

index.max_result_window = 100000
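index.max_result_window is a dynamic index setting, so it can be changed on an existing index through the update-settings API (PUT /<index>/_settings). Below is a minimal Python sketch using only the standard library. It assumes a hypothetical cluster at http://localhost:9200 without authentication, and build_max_window_request is a hypothetical helper name. It only builds the request; passing it to urllib.request.urlopen would actually send it. Index names must be lowercase in Elasticsearch, hence myindex.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust for your deployment.
ES_URL = "http://localhost:9200"


def build_max_window_request(index: str, window: int) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request that raises
    index.max_result_window, the cap on from + size for regular searches."""
    body = json.dumps({"index": {"max_result_window": window}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_max_window_request("myindex", 100000)
    print(req.get_method(), req.full_url)
    # To actually apply the setting against a live cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Raising this window only moves the ceiling for from/size searches; every hit up to from + size still has to be collected, so very large values get expensive.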

But if my query matches 650,000 documents, for example, or even more, how can I retrieve all of the results in one GET?

I have read about the scroll API, from/size, and pagination, but none of them delivers more than 10K results.

This is the example from the Elasticsearch forum that I have been using:

GET /_search?scroll=1m

Can anybody provide an example that retrieves all of the documents matching a GET search query?


1 Reply


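Scroll is the way to go if you want to retrieve a large number of documents, i.e. well over the default limit of 10,000 results (index.max_result_window), which can be raised. Note that scroll does not return everything in a single GET: it returns the results in batches of size documents, and you keep requesting the next batch until none are left.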

The first request specifies the query you want to run and the scroll parameter, i.e. how long the search context should be kept alive between requests (1 minute in the example below):

POST /index/_search?scroll=1m
{
    "size": 1000,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
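For illustration, the same first request built with Python's standard library. This is a minimal sketch assuming a hypothetical cluster at http://localhost:9200 without authentication; build_initial_scroll_request is a hypothetical helper name. The scroll keep-alive goes in the URL, while size in the body is the batch size per round trip, not a cap on the total number of hits.

```python
import json
import urllib.request
from typing import Any, Dict

ES_URL = "http://localhost:9200"  # hypothetical cluster address


def build_initial_scroll_request(index: str, query: Dict[str, Any],
                                 batch_size: int = 1000,
                                 keep_alive: str = "1m") -> urllib.request.Request:
    """Build (but do not send) POST /<index>/_search?scroll=<keep_alive>.

    The response to this request carries the first batch of hits
    and the _scroll_id needed for the next call."""
    body = {"size": batch_size, "query": query}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search?scroll={keep_alive}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_initial_scroll_request("index", {"match": {"title": "elasticsearch"}})
    print(req.get_method(), req.full_url)
```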

In the response to that first call, you get a _scroll_id that you need to use to make the second call:

POST /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" 
}
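A matching sketch of the follow-up call, plus the clear-scroll request (DELETE /_search/scroll) that releases the search context once you are done instead of waiting for the keep-alive to expire. Same assumptions as before: a hypothetical http://localhost:9200 address, and build_scroll_request / build_clear_scroll_request are hypothetical helper names that only build the requests.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical cluster address


def _json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the cluster."""
    return urllib.request.Request(
        url=ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def build_scroll_request(scroll_id: str, keep_alive: str = "1m") -> urllib.request.Request:
    """POST /_search/scroll: fetch the next batch and renew the keep-alive."""
    return _json_request("POST", "/_search/scroll",
                         {"scroll": keep_alive, "scroll_id": scroll_id})


def build_clear_scroll_request(scroll_id: str) -> urllib.request.Request:
    """DELETE /_search/scroll: free the search context when finished."""
    return _json_request("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```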

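Each subsequent response also contains a _scroll_id (it may change between calls, so always use the most recent one) for the next call. Repeat until you have retrieved as many documents as you need, or until a response comes back with an empty hits array, which means all results have been returned.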

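In pseudocode, it looks roughly like this:

# first request: run the query and open a scroll context
response = request('POST /index/_search?scroll=1m', query)
docs = response.hits.hits
scroll_id = response._scroll_id

# subsequent requests: keep scrolling until a batch comes back empty
while (true) {
   response = request('POST /_search/scroll', scroll_id)
   scroll_id = response._scroll_id
   if (response.hits.hits is empty) break
   docs.push(...response.hits.hits)
}

# release the search context once done
request('DELETE /_search/scroll', scroll_id)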
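A runnable Python version of the scroll loop is sketched below. To stay independent of any particular client library, it assumes a hypothetical send(method, path, body) callable that performs the HTTP request and returns the parsed JSON response; scroll_all is likewise a hypothetical name. It stops at the first empty batch and clears the scroll context in a finally block, so the context is released even if a request fails midway.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_all(send: Send, index: str, query: Dict[str, Any],
               batch_size: int = 1000, keep_alive: str = "1m") -> List[Dict[str, Any]]:
    """Collect every hit matching `query` using the scroll API."""
    # First request: run the query and open a scroll context.
    response = send("POST", f"/{index}/_search?scroll={keep_alive}",
                    {"size": batch_size, "query": query})
    scroll_id = response.get("_scroll_id")
    docs: List[Dict[str, Any]] = []
    try:
        # An empty hits array means every result has been returned.
        while response["hits"]["hits"]:
            docs.extend(response["hits"]["hits"])
            response = send("POST", "/_search/scroll",
                            {"scroll": keep_alive, "scroll_id": scroll_id})
            # Always keep the most recent _scroll_id; it may change between calls.
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        if scroll_id is not None:
            # Free the search context instead of waiting for keep_alive to expire.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
    return docs
```

Accumulating 650,000 hits in one list keeps them all in memory; for result sets that large, processing each batch as it arrives (for example by turning the loop into a generator) may be preferable.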

UPDATE:

Please refer to the following answer, which is more accurate about the best approach to deep pagination: Elastic Search - Scroll behavior

