Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

elasticsearch - grok regex in logstash to parse and extract field

I am trying to extract certain fields from a single message field. I want to do this with a grok regex in Logstash so that I can view the fields in Kibana.

My log event looks like this:

[2021-01-06 12:10:40] ApiLogger.INFO: API log data: {"endpoint":"/rest/thre_en/V1/temp-carts/13cEIQqUb6cUfxB/tryer-inform","http_method":"GET","payload":[],"user_id":0,"user_type":4,"http_response_code":200,"response":"{"pay_methods":[{"code":"frane","title":"R2 Partial redeem"}],"totals":{"grand_total":0,"base_grand_total":0}}

The full log contains more key-value pairs. Basically, I need the following information:

  1. timestamp (I am able to get this)
  2. log level (I am able to get this) => for the log level, I only want INFO, not the entire ApiLogger.INFO
  3. endpoint
  4. http_method
  5. user_id
  6. user_type
  7. http_response_code
  8. response

I am not able to get items 3-8. I tested it, and the problem seems to be the colon (:). This is what I tried in the grok debugger:

%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):

I tried URI and other patterns, but they did not work, maybe because of the colon.

Question from: https://stackoverflow.com/questions/65600601/grok-regex-in-logstash-to-parse-and-extract-field


1 Reply


You can use

%{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: (?<API>\w+ \w+ \w+):\s*%{GREEDYDATA:json_field}

Then, you can parse json_field with the JSON filter.
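
As a rough sketch of how the two steps might fit together in a pipeline, the filter section below runs the grok pattern above and then hands json_field to the json filter. The target name api is a hypothetical choice, and the sketch assumes the text after "API log data:" is valid JSON (that is, the inner quotes of response are escaped in the real log). If it is not, the json filter tags the event with _jsonparsefailure instead of parsing it.

```conf
filter {
  grok {
    # Split the log prefix from the JSON payload; the payload lands in json_field.
    match => { "message" => "%{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: (?<API>\w+ \w+ \w+):\s*%{GREEDYDATA:json_field}" }
  }
  json {
    # Parse the payload into structured fields.
    source => "json_field"
    # "api" is an assumed name; omit target to put the keys at the event root.
    target => "api"
    # Drop the raw string once it has been parsed successfully.
    remove_field => ["json_field"]
  }
}
```

With a target set like this, the parsed keys should show up in Kibana as api.endpoint, api.http_method, api.user_id, and so on.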

If you want to play around with regex, remember that a regex engine processes a string from left to right by default. To capture several fields with one regular expression, you must make sure the engine can "walk" all the way from one field to the next. If you know which patterns and which types of chars occur between the two, great: match them explicitly. If not, you can only rely on .* (%{GREEDYDATA}) or .*? (%{DATA}).

So, as an exercise, you might have a look at

%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):\s*\{"endpoint":"(?<endpoint>[^"]*)","http_method":"(?<http_method>[A-Z]++).*?"user_id":(?<user_id>[0-9]++).*?"user_type":(?<user_type>[0-9]++).*?"http_response_code":(?<http_response_code>[0-9]++).*?"response":"(?<response>.*)"

Note the ++ in [0-9]++ and the .*? patterns between the fields. The ++ possessive quantifier makes sure the engine does not backtrack into the pattern it modifies when the subsequent patterns fail to match: [0-9]++ grabs a sequence of digits and does not give any of them back, so if the subsequent patterns fail, the whole match fails. .*? simply matches zero or more chars other than line break chars, as few as possible. The last .* is greedy, because it must match as many chars other than line break chars as possible, up to the last " on the line.

See the regex demo.
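
If you go with the explicit regex instead, one way to wire it into a pipeline is sketched below. The pattern contains double quotes, so the sketch wraps it in single quotes. The mutate block is an addition that is not part of the pattern above; it is included on the assumption that you want numeric fields, since named captures come out of grok as strings.

```conf
filter {
  grok {
    # Single-quoted so the double quotes inside the pattern need no escaping.
    match => { "message" => '%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):\s*\{"endpoint":"(?<endpoint>[^"]*)","http_method":"(?<http_method>[A-Z]++).*?"user_id":(?<user_id>[0-9]++).*?"user_type":(?<user_type>[0-9]++).*?"http_response_code":(?<http_response_code>[0-9]++).*?"response":"(?<response>.*)"' }
  }
  mutate {
    # Optional: turn the numeric captures into integers for Kibana.
    convert => {
      "user_id" => "integer"
      "user_type" => "integer"
      "http_response_code" => "integer"
    }
  }
}
```

Note that this variant still captures the whole ApiLogger.INFO in loglevel, because it keeps %{JAVACLASS:loglevel} from the pattern above.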

