Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

logstash + elasticsearch : reloads the same data

I managed to get Logstash (1.3.1) to send data to Elasticsearch (0.9.5).

My Logstash config file is:

input {
  file {
    path => ["D:/apache-tomcat-7.0.5/logs/*.*"]
  }
}

output {
  stdout { }
  elasticsearch_http {
    host => "localhost"
    port => 9200
  }
}

The data is stored in ES under the index logstash-2013.12.xx.

However, if I restart Logstash, let's say the next day, the same data is reloaded into a new index. If I restart it again, the document count in the index doubles.

It seems that Logstash is re-reading the data and ES is duplicating the documents.

Is there a way to stop Logstash from reloading the data, to stop ES from duplicating it, or both?


1 Reply

I ran into this issue with Logstash 1.3.3 as well. The relevant bug report in the Logstash Jira is LOGSTASH-429, "File Input - .sincedb file is broken on Windows". Boyd Meier has also created a patch for it.

This patch has been pulled into Jordan Sissel's ruby-filewatch git repository for inclusion in a later version of Logstash, but it hasn't made it into a release yet.

The problem stems from Logstash identifying files by inode, which is always reported as 0 on Windows. Boyd Meier's patch uses the Windows File ID as the file identifier instead, which bypasses the issue. The File ID stays the same until the file is deleted from the volume.
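
To illustrate why an inode of 0 is a problem, here is a minimal sketch in Python. It is not Logstash's actual code: it assumes the sincedb behaves like a map from file identifier to the byte offset already read, and the file names, offsets, and the second File ID are made up for the example.

```python
# Simplified model of a sincedb: a map from file identifier to the byte
# offset already read. This is an illustration, not Logstash's actual code.

def record_offsets(files, identify):
    """Build a sincedb-like dict from (name, metadata, offset) tuples."""
    sincedb = {}
    for _name, meta, offset in files:
        sincedb[identify(meta)] = offset  # equal keys overwrite each other
    return sincedb

# Hypothetical metadata for two log files as seen on Windows: the inode is
# reported as 0 for both, while the File IDs differ.
files = [
    ("catalina.log", {"inode": 0, "file_id": "1717916447-2604966-851968"}, 4096),
    ("localhost.log", {"inode": 0, "file_id": "1717916447-2604966-851969"}, 512),
]

by_inode = record_offsets(files, lambda m: m["inode"])
by_file_id = record_offsets(files, lambda m: m["file_id"])

print(len(by_inode))    # both files collapse into a single entry
print(len(by_file_id))  # one entry per file
```

Under this model, keying by inode leaves only one saved position for all files, so after a restart the other files have no usable offset. Keying by File ID keeps one entry per file.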

If you're comfortable doing a bit of patching, you can apply the change from Jordan Sissel's ruby-filewatch git repository yourself. I have only just patched 1.3.3 and am still testing it against test log files; the steps were:

  1. Download the ruby-filewatch zip file from GitHub (Jordan Sissel's ruby-filewatch git repository)
  2. Unzip the downloaded zip file into a new directory
  3. Edit line 10 of ruby-filewatch\lib\filewatch\tail.rb, which reads require "JRubyFileExtension.jar". I had to change it to require "java/JRubyFileExtension.jar"; otherwise I got an error that the jar file could not be found when Logstash tried to read a file. For reference, the whole line then reads: require "java/JRubyFileExtension.jar" if defined? JRUBY_VERSION
  4. Open the logstash-1.3.3-flatjar.jar file in 7-Zip
  5. Drag and drop the java directory from ruby-filewatch into the root folder in 7-Zip
  6. Drag and drop all the files from the ruby-filewatch\lib\filewatch folder into the filewatch folder in 7-Zip, overwriting any existing files
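
If you would rather script steps 4 to 6 than drag files around in 7-Zip, something like the following Python sketch might work. It is an untested-against-Logstash illustration, not part of the original patch: patch_jar and its parameters are hypothetical names, and it writes a new jar instead of modifying the original in place, since Python's zipfile module cannot overwrite existing entries.

```python
import os
import zipfile

def patch_jar(jar_path, filewatch_root, out_path):
    """Write a copy of jar_path to out_path with ruby-filewatch files overlaid.

    filewatch_root is the unzipped ruby-filewatch directory (steps 1 to 3
    are assumed to be done already). Returns the archive names replaced/added.
    """
    overlay = {}  # archive name -> file on disk

    # Step 5: the java directory goes into the root of the jar.
    java_dir = os.path.join(filewatch_root, "java")
    for dirpath, _dirs, names in os.walk(java_dir):
        for name in names:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, filewatch_root).replace(os.sep, "/")
            overlay[rel] = full

    # Step 6: files from lib/filewatch go into the jar's filewatch folder.
    lib_dir = os.path.join(filewatch_root, "lib", "filewatch")
    for name in os.listdir(lib_dir):
        full = os.path.join(lib_dir, name)
        if os.path.isfile(full):
            overlay["filewatch/" + name] = full

    with zipfile.ZipFile(jar_path) as src, \
         zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy every entry that is not being replaced.
        for info in src.infolist():
            if info.filename not in overlay:
                dst.writestr(info, src.read(info.filename))
        # Then add the patched files.
        for arcname, full in overlay.items():
            dst.write(full, arcname)
    return sorted(overlay)
```

A call might look like patch_jar("logstash-1.3.3-flatjar.jar", "ruby-filewatch-master", "logstash-1.3.3-flatjar-patched.jar"), where the directory and output names are placeholders for wherever you unzipped the repository. Writing to a separate output file also leaves the original jar available as a fallback.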

Now when you run it against multiple log files, you should find that the sincedb file contains more than one entry, with entries that look similar to 1717916447-2604966-851968 0 2 428312038. If you're having trouble finding the sincedb file and haven't set sincedb_path in your config file, it is in the home directory of the user running the jar. If that is your own user, you can get there quickly with Windows key + R -> %USERPROFILE% -> OK.
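
To check the sincedb entries, a small Python sketch like the one below can split a line into its fields. The field names are my assumption about the layout (file identifier, major device number, minor device number, byte offset already read); SincedbEntry and parse_sincedb_line are hypothetical helpers, not part of Logstash.

```python
from typing import NamedTuple

class SincedbEntry(NamedTuple):
    file_id: str    # inode on Unix; the File ID string with the patch on Windows
    dev_major: int  # assumed: major device number
    dev_minor: int  # assumed: minor device number
    position: int   # assumed: byte offset already read from the file

def parse_sincedb_line(line):
    """Split one whitespace-separated sincedb line into its four fields."""
    file_id, major, minor, position = line.split()
    return SincedbEntry(file_id, int(major), int(minor), int(position))

entry = parse_sincedb_line("1717916447-2604966-851968 0 2 428312038")
print(entry.file_id, entry.position)
```

If every line in your sincedb file starts with a different identifier, the patch is most likely doing its job; a single line starting with 0 would suggest the unpatched code is still in use.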

As always, take care when patching and test thoroughly before deploying to production systems.

