Apart from transformations, the following reasons can be considered for having Firehose in front of AWS ES:
- Better control over streaming data
Since Elasticsearch has a limit on the write queue size, a burst of data lasting even a few seconds can cause ES to reject writes
that it cannot queue in time. In that case, you end up losing
the rejected data as well.
However, when Firehose is kept in front, it handles the retries for you, so there is less chance of data loss.
- Firehose is one-way to ES
Your ES cluster might contain confidential data, and if you
allow users to make POST requests (required for some writes), you
might expose the cluster to more users than necessary. Firehose can
help you limit that: give the writing applications/users access
only to the FH stream instead of the ES cluster.
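
As a rough illustration of the second point, here is a minimal sketch of an IAM policy that lets a producer write to a single Firehose delivery stream and grants nothing on the ES domain itself. The helper name `firehose_write_only_policy`, the region, the account ID and the stream name `logs-to-es` are placeholders I made up for the example; adjust them to your setup.

```python
import json


def firehose_write_only_policy(region, account_id, stream_name):
    """Build an IAM policy document allowing writes to one Firehose stream only.

    Hypothetical helper for illustration; it only assembles the JSON document.
    """
    # ARN of the delivery stream the producer is allowed to write to.
    stream_arn = (
        f"arn:aws:firehose:{region}:{account_id}:deliverystream/{stream_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Write-only Firehose actions; no es:* actions are granted.
                "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
                "Resource": stream_arn,
            }
        ],
    }


# Placeholder values, not taken from any real account.
policy = firehose_write_only_policy("us-east-1", "123456789012", "logs-to-es")
print(json.dumps(policy, indent=2))
```

A producer with only this policy attached can push records into the stream but cannot query or modify the ES cluster directly; Firehose's own delivery role is the only identity that needs write access to the domain.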