1 How to parse single line JSON logs for ElasticSearch Cloud
This is going to be a brief blog post, but I wanted to jot down a few things, as solving this "easy" issue took me the better part of four hours.
1.1 The Problem
I have a number of oddly formatted logs that developers would like to be able to search through easily and get insights from. The developers control this log format, but it's an embedded environment and modifying the format is non-trivial. I wrote a Perl script that reads these developer logs and uses regexes to extract the key fields I'm interested in, transforming them like so (fake data):
# Original log line
# LOG LEVEL # DATE & TIME # FUNCTION NAME/LINE NUMBER # LOG MESSAGE
[DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message

# PARSED LOG LINE
{"log_level":"Debug","timestamp":"2020-09-10T13:59:23","function_name":"some_function_name","line_number":"166","message":"some log message"}
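The post doesn't include the Perl script itself, so here is a rough Python sketch of the same transformation. The regex, the group names, and the `parse_line` helper are my own illustrative guesses based on the example line above, not the author's actual code:

```python
import json
import re

# Hypothetical pattern for lines like:
# [DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message
LOG_RE = re.compile(
    r"\[(?P<level>\w+)\]\s+"
    r"(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})\s+-\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+\|\s+"
    r"(?P<func>\w+)\s+(?P<line>\d+):\s+(?P<msg>.*)"
)

def parse_line(line):
    """Turn one raw log line into a single-line JSON document."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # leave unmatched lines for manual inspection
    # Zero-pad the date so the timestamp sorts and searches cleanly.
    timestamp = "{}-{:02d}-{:02d}T{}".format(
        m.group("year"), int(m.group("month")), int(m.group("day")), m.group("time")
    )
    return json.dumps({
        "log_level": m.group("level").capitalize(),
        "timestamp": timestamp,
        "function_name": m.group("func"),
        "line_number": m.group("line"),
        "message": m.group("msg"),
    })
```

The one-JSON-object-per-line output is what makes the Filebeat setup below work: Filebeat ships each line as a single event.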
After setting up this log parser and Filebeat, I started shipping these logs into a hosted Elasticsearch Cloud instance. To my surprise, the JSON fields were not indexed, meaning I couldn't perform KQL searches like timestamp:2020-09* to get all log lines from that month.
1.2 The Solution
To Elastic's credit, it's actually quite simple to get this behavior with Filebeat. All I needed to do was add the following under the processors field in the /etc/filebeat/filebeat.yml file (this is on Filebeat 7.x):
processors:
  - decode_json_fields:
      fields: ["line_number", "message", "timestamp", "function_name", "log_level"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: false
      add_error_key: true
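For intuition, here is a rough Python model of what decode_json_fields does with target: "": decode any JSON string found in the named event fields and merge the resulting keys into the root of the event, so they become individually indexed and searchable. This is a simplification for illustration, not Filebeat's actual implementation, and the event shape and helper name are my own:

```python
import json

def decode_json_fields(event, fields, target="", overwrite_keys=False):
    """Simplified model of Filebeat's decode_json_fields processor."""
    for field in fields:
        value = event.get(field)
        if not isinstance(value, str):
            continue
        try:
            decoded = json.loads(value)
        except ValueError:
            # Roughly what add_error_key: true surfaces on bad input.
            event["error"] = "failed to decode json"
            continue
        if not isinstance(decoded, dict):
            continue
        # target: "" merges decoded keys into the event root.
        dest = event if target == "" else event.setdefault(target, {})
        for key, val in decoded.items():
            if overwrite_keys or key not in dest:
                dest[key] = val
    return event

# A raw Filebeat event: the whole parsed JSON line lives in "message".
event = {"message": '{"log_level":"Debug","timestamp":"2020-09-10T13:59:23"}'}
decode_json_fields(event, ["message"])
# event now also carries log_level and timestamp as top-level fields.
```

Once the keys sit at the top level of each event, Elasticsearch indexes them as fields, which is what makes KQL queries such as timestamp:2020-09* work.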
The relevant documentation can be found here: https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html
After creating a new index in Elasticsearch and ingesting logs into it, the expected KQL behavior worked.
The reason I'm writing this post is that it took me hours to find this documentation: there seem to be a thousand different ways to get this functionality, each with different caveats or options depending on your use case. I may just be showing my inexperience with Elasticsearch here, but I decided to write something brief because the answer took me a while to track down.
Note: This post isn't a knock against Elastic and their products. They solve a complex problem and give users a lot of options for how to manage, index, and search their data. Given all of those options, though, grokking the documentation can become time-consuming, and I wanted to offer a shortcut.