Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


elasticsearch - Refresh vs flush

If a new document is indexed into an Elasticsearch index, it becomes available for searching about 1 second after the index operation. However, the document can be made searchable immediately by calling the _flush or _refresh operation on the index. What is the difference between these two operations? The result seems to be the same for both: the document is immediately searchable.

What exactly does each of these operations do?

The ES documentation does not seem to cover this in depth.


1 Reply


The answer that you got is correct, but I think it's worth elaborating a bit more.

A refresh effectively calls a reopen on the Lucene index reader, so that the point-in-time snapshot of the data that you can search on gets updated. This feature is part of the Lucene near real-time API.

An Elasticsearch refresh makes your documents available for search, but it doesn't ensure that they are written to persistent storage: it doesn't call fsync, and thus doesn't guarantee durability. What makes your data durable is a Lucene commit, which is far more expensive.
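To make the fsync point concrete, here is a minimal Python sketch of the difference between handing data to the operating system and asking it to push that data to the storage device. It is only an analogy for the refresh/commit distinction, not Lucene code; the function and file names are made up for illustration.

```python
import os
import tempfile


def write_buffered(path: str, data: bytes) -> None:
    """Write data and hand it to the OS; it may still sit in the page cache."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # Python buffer -> OS, but not necessarily onto the disk


def write_durable(path: str, data: bytes) -> None:
    """Write data and ask the OS to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # returns once the OS reports the data as written


# Hypothetical file names, chosen only to mirror the two operations.
tmp_dir = tempfile.mkdtemp()
fast_path = os.path.join(tmp_dir, "refreshed.bin")
safe_path = os.path.join(tmp_dir, "committed.bin")

write_buffered(fast_path, b"doc-1")
write_durable(safe_path, b"doc-1")
```

Both files can be read back right away, which is why the two operations look the same from the outside. The difference only shows up after a power loss or OS crash, when the first file may be missing or incomplete.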

While you can afford to call Lucene reopen every second, you cannot do the same with Lucene commit.

With Lucene alone, you can then have new documents available for search in near real-time by calling reopen often, but you still need to call commit to ensure the data is written to disk and fsynced, and thus safe.

Elasticsearch solves this "problem" by adding a transaction log per shard (each shard is effectively a Lucene index), where write operations that have not been committed yet are stored. The transaction log is fsynced and safe, so you get durability at any point in time, even for documents that have not been committed yet. You can search on documents in near real-time because refresh happens automatically every second, and you can also be sure that if something bad happens, the transaction log can be replayed to restore documents that would otherwise be lost. The nice thing about the transaction log is that it can be used internally for other things, for instance to provide real-time get by id.
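The interplay between the transaction log, refresh, and recovery can be sketched with a toy model. This is a hypothetical in-memory simulation, not Elasticsearch code: `ToyShard` and its methods are invented for illustration, a plain list stands in for the fsynced log, and commits are left out entirely.

```python
class ToyShard:
    """Hypothetical in-memory model of one shard; not Elasticsearch code."""

    def __init__(self, translog=None):
        # The translog stands in for the fsynced on-disk log, so it is the
        # only state that survives a simulated crash.
        self.translog = list(translog or [])
        self.buffer = {}      # indexed but not yet searchable
        self.searchable = {}  # what the current reader snapshot can see
        # Recovery: replay every logged operation into the buffer.
        for doc_id, doc in self.translog:
            self.buffer[doc_id] = doc

    def index(self, doc_id, doc):
        self.translog.append((doc_id, doc))  # logged first, so it is safe
        self.buffer[doc_id] = doc

    def refresh(self):
        # Reopen the reader: buffered documents become visible to search.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, term):
        return sorted(i for i, d in self.searchable.items() if term in d)

    def get(self, doc_id):
        # Real-time get: the newest logged version wins, even before refresh.
        for logged_id, doc in reversed(self.translog):
            if logged_id == doc_id:
                return doc
        return self.searchable.get(doc_id)


shard = ToyShard()
shard.index("1", "refresh vs flush")
before_refresh = shard.search("flush")  # [] : not visible to search yet
realtime_doc = shard.get("1")           # found through the translog
shard.refresh()
after_refresh = shard.search("flush")   # ["1"]

# Simulate a crash: in-memory state is gone, only the translog survives.
recovered = ToyShard(translog=shard.translog)
recovered.refresh()
after_recovery = recovered.search("flush")  # ["1"]
```

In this sketch the document is safe from the moment `index` returns, searchable only after `refresh`, and retrievable by id in between.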

An Elasticsearch flush effectively triggers a Lucene commit and also empties the transaction log, since once data is committed at the Lucene level, durability can be guaranteed by Lucene itself. Flush is exposed as an API too and can be tuned, although that is usually not necessary. A flush happens automatically depending on how many operations have been added to the transaction log, how big they are, and when the last flush happened.
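For reference, both operations can be triggered by hand with a POST to the index's `_refresh` or `_flush` endpoint. Below is a minimal sketch using only the Python standard library. The base URL `http://localhost:9200`, the index name `my-index`, and the helper names are assumptions for illustration, and the sketch assumes a cluster without authentication.

```python
import json
import urllib.request


def operation_url(base_url: str, index: str, operation: str) -> str:
    """Build the URL for an index-level operation such as _refresh or _flush."""
    if operation not in ("_refresh", "_flush"):
        raise ValueError(f"unsupported operation: {operation}")
    return f"{base_url.rstrip('/')}/{index}/{operation}"


def call_operation(base_url: str, index: str, operation: str) -> dict:
    """POST to the operation endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        operation_url(base_url, index, operation), method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Assumed values; adjust them for your own cluster.
BASE_URL = "http://localhost:9200"
INDEX = "my-index"

refresh_url = operation_url(BASE_URL, INDEX, "_refresh")
flush_url = operation_url(BASE_URL, INDEX, "_flush")

# With a cluster running, these would perform the actual calls:
# call_operation(BASE_URL, INDEX, "_refresh")  # make recent writes searchable
# call_operation(BASE_URL, INDEX, "_flush")    # Lucene commit, translog emptied
```

If the only goal is to see a freshly indexed document in search results, `_refresh` is the lighter of the two calls; `_flush` additionally pays for the Lucene commit.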

