elasticsearch - Refresh vs flush

Question

Welcome To Ask or Share your Answers For Others

elasticsearch - Refresh vs flush

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:26:56+0000

The answer that you got is correct but I think it's worth to elaborate a bit more.

A refresh effectively calls a reopen on the lucene index reader, so that the point in time snapshot of the data that you can search on gets updated. This lucene feature is part of the lucene near real-time api.

An elasticsearch refresh makes your documents available for search, but it doesn't make sure that they are written to disk to a persistent storage, as it doesn't call fsync, thus doesn't guarantee durability. What makes your data durable is a lucene commit, which is way more expensive.

While you can call lucene reopen every second, you cannot do the same with lucene commit.

Through lucene you can then have new documents available for search in near real-time by calling reopen pretty often, but you still need to call commit to ensure data is written to disk and fsynced, thus safe.

Elasticsearch solves this "problem" by adding a transaction log per shard (effectively a lucene index), where write operations that have not been committed yet are stored. The transaction log is fsynced and safe, thus you obtain durability at any point in time, even for documents that have not been committed yet. You can search on documents in near real-time as refresh happens automatically every second, and you can also be sure that if something bad happens the transaction log can be replayed to restore eventually lost documents. The nice thing about the transaction log is that it can be used internally for other things, for instance to provide real-time get by id.

An elasticsearch flush effectively triggers a lucene commit, and empties also the transaction log, since once data is committed on the lucene level, durability can be guaranteed by lucene itself. Flush is exposed as an api too and can be tweaked, although usually that is not necessary. Flush happens automatically depending on how many operations get added to the transaction log, how big they are, and when the last flush happened.

Categories

elasticsearch - Refresh vs flush

elasticsearch - Refresh vs flush

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags