-*- org-mode -*-

* How to parse single-line JSON logs for ElasticSearch Cloud
This is going to be a brief blog post, but I wanted to jot down a few things, as solving this "easy" issue took me the better part of 4 hours.
** The Problem
I have a number of weirdly formatted logs that developers would like to be able to easily search through and get insights from. The developers control this log format,
but it's an embedded environment and it's "non-trivial" to modify the format. I wrote a Perl script that reads in these developer logs and regexes out
the key fields I'm interested in, transforming them like so (fake data):

#+BEGIN_SRC shell
# Original log line
# LOG LEVEL # DATE & TIME # FUNCTION NAME/LINE NUMBER # LOG MESSAGE
[DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message

# PARSED LOG LINE
{"log_level":"Debug","timestamp":"2020-09-10T13:59:23","function_name":"some_function_name","line_number":"166","message":"some log message"}
#+END_SRC
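
The Perl script itself isn't reproduced here, but the transformation is simple enough to sketch. Below is a rough Python equivalent, not the original script: the regex and the =parse_line= name are my own assumptions, based only on the sample line above, so a real log format may need a looser pattern.

#+BEGIN_SRC python
import json
import re
from datetime import datetime

# Assumed layout, taken from the sample line above:
# [LEVEL] YYYY/M/D - HH:MM:SS | function_name LINE: message
LOG_LINE = re.compile(
    r"^\[(?P<level>\w+)\]\s+"
    r"(?P<date>\d{4}/\d{1,2}/\d{1,2}) - (?P<time>\d{1,2}:\d{2}:\d{2})\s+\|\s+"
    r"(?P<function>\S+)\s+(?P<line>\d+):\s*(?P<message>.*)$"
)


def parse_line(line):
    """Return the log line as single-line JSON, or None if it doesn't match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # Normalize the date so the timestamp comes out in ISO 8601 form.
    timestamp = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y/%m/%d %H:%M:%S"
    )
    record = {
        "log_level": match["level"].capitalize(),
        "timestamp": timestamp.isoformat(),
        "function_name": match["function"],
        "line_number": match["line"],  # kept as a string, as in the sample
        "message": match["message"],
    }
    # Compact separators keep each record on one line with no extra spaces.
    return json.dumps(record, separators=(",", ":"))


print(parse_line("[DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message"))
#+END_SRC

Lines that don't match the pattern come back as =None=, so the caller can decide whether to drop them or pass them through untouched.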

After setting up this log parser and filebeat, I started shipping these logs into a hosted ElasticSearch cloud instance. To my surprise, the JSON fields were
not indexed, meaning I couldn't perform KQL searches like =timestamp:2020-09*= to get all log lines from that month.

** The Solution
To Elastic's credit, it's actually incredibly simple to get this behavior with filebeat. All I needed to do was add the following to the =/etc/filebeat/filebeat.yml=
file under the =processors= field (this is on filebeat versions 7.x):

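#+BEGIN_SRC yaml
processors:
  - decode_json_fields:
      # Event fields that hold JSON strings to decode. Filebeat puts the raw
      # log line in "message", so that entry is likely the only one needed.
      fields: ["line_number","message","timestamp","function_name","log_level"]
      process_array: false   # leave arrays alone
      max_depth: 1           # decode only the top level
      target: ""             # merge decoded keys into the root of the event
      overwrite_keys: false  # keep existing event keys on conflict
      add_error_key: true    # record an error on the event if decoding fails
#+END_SRC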

The relevant documentation can be found here: https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html
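
To get a feel for what these options do to an event, here is a small Python sketch of the idea. It is not Filebeat's implementation. The =decode_json_fields= function below is a hypothetical stand-in that only mimics =target: ""=, =overwrite_keys=, and =add_error_key= with a shallow merge, and the shape of the =error= value is my own simplification. It assumes the raw JSON line arrives in the =message= field.

#+BEGIN_SRC python
import json


def decode_json_fields(event, fields, overwrite_keys=False, add_error_key=True):
    """Decode JSON strings found in `fields` and merge them into the event root."""
    for field in fields:
        raw = event.get(field)
        if not isinstance(raw, str):
            continue  # missing or non-string fields are skipped
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError as exc:
            if add_error_key:
                # Simplified error record; the real processor's layout may differ.
                event["error"] = {"message": str(exc), "field": field}
            continue
        if not isinstance(decoded, dict):
            continue  # simplification: only JSON objects are merged
        for key, value in decoded.items():
            # With overwrite_keys=False, keys already on the event win.
            if overwrite_keys or key not in event:
                event[key] = value
    return event


event = {
    "message": '{"log_level":"Debug","timestamp":"2020-09-10T13:59:23",'
               '"function_name":"some_function_name","line_number":"166",'
               '"message":"some log message"}'
}
decode_json_fields(event, fields=["message"])
print(sorted(event))
#+END_SRC

After the call, =timestamp=, =log_level=, and the other keys sit at the top level of the event, which is what lets them be indexed as separate fields. Because =overwrite_keys= is false in this sketch, the original =message= field keeps the raw JSON line instead of being replaced by the inner message.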

After creating a new index in ElasticSearch and ingesting logs into it, KQL searches on the JSON fields worked as expected.

The reason I'm writing this blog post is that it took me hours to find this documentation, as there seem to be about 1000 different ways to get this
functionality, with a number of different caveats or options depending on your use case. I may just be showing my inexperience with ElasticSearch here,
but I decided to write something brief about this because it took me a while to track down.

Note: This post isn't a knock against Elastic and their products. They solve a complex issue and give users a lot of options for how to manage, index, and
search their data. Given all those options, though, grokking the documentation can become time-consuming, and I wanted to try to offer a shortcut.