Table of Contents

1 How to parse single line JSON logs for ElasticSearch Cloud

This is going to be a brief blog post, but I wanted to jot down a few things, as solving this "easy" issue took me the better part of 4 hours.

1.1 The Problem

I have a number of weirdly formatted logs that developers would like to be able to easily search through and get insights from. The developers control this log format, but it's an embedded environment and it's "non-trivial" to modify the format. I wrote a Perl script that reads in these developer logs and regexes out the key fields I'm interested in, transforming them like so (fake data):

# Original log line
# LOG LEVEL # DATE & TIME      # FUNCTION NAME/LINE NUMBER # LOG MESSAGE
[DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message

# PARSED LOG LINE
{"log_level":"Debug","timestamp":"2020-09-10T13:59:23","function_name":"some_function_name","line_number":"166","message":"some log message"}
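The actual transformation is done in Perl, but the idea can be sketched in a few lines of Python. The regex below is a hypothetical pattern I wrote to match the example format above; the real script may differ:

```python
import json
import re

# Hypothetical pattern for lines like:
# [DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message
LOG_RE = re.compile(
    r"\[(?P<level>\w+)\]\s+"
    r"(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})\s+-\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+\|\s+"
    r"(?P<func>\w+)\s+(?P<line>\d+):\s+"
    r"(?P<msg>.*)"
)

def parse_line(line):
    """Turn one raw log line into a single-line JSON string, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    # Zero-pad month/day into an ISO-8601-ish timestamp.
    timestamp = "{}-{:02d}-{:02d}T{}".format(
        m["year"], int(m["month"]), int(m["day"]), m["time"]
    )
    return json.dumps({
        "log_level": m["level"].capitalize(),
        "timestamp": timestamp,
        "function_name": m["func"],
        "line_number": m["line"],
        "message": m["msg"],
    }, separators=(",", ":"))

print(parse_line("[DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message"))
```

Each parsed line is then written out one JSON object per line, which is what filebeat picks up.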

After setting up this log parser and filebeat, I started processing these logs into a hosted ElasticSearch cloud instance. To my surprise, the JSON fields were not indexed, meaning I couldn't perform KQL searches like timestamp:2020-09* to get all log lines from that month.

1.2 The Solution

To Elastic's credit, it's actually incredibly simple to get this behavior with filebeat: all I needed to do was add the following to the /etc/filebeat/filebeat.yml file under the processors field (this is on filebeat 7.x):

processors:
  - decode_json_fields:
      fields: ["line_number","message","timestamp","function_name","log_level"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: false
      add_error_key: true
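To make the options above concrete, here is a rough Python sketch of what the processor does to each event (this is my reading of the documented semantics, not filebeat's actual implementation): each named field that holds a JSON string is decoded, and with target: "" the decoded keys are merged into the root of the event, skipping keys that already exist when overwrite_keys is false.

```python
import json

def decode_json_fields(event, fields, target="", overwrite_keys=False):
    # Conceptual sketch of filebeat's decode_json_fields processor.
    for field in fields:
        raw = event.get(field)
        if not isinstance(raw, str):
            continue
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError:
            continue  # (add_error_key would record an error field here)
        if not isinstance(decoded, dict):
            continue
        for key, value in decoded.items():
            # With overwrite_keys: false, existing event keys win.
            if overwrite_keys or key not in event:
                event[key] = value
    return event

# filebeat puts each raw line into the "message" field of the event:
event = {"message": '{"log_level":"Debug","timestamp":"2020-09-10T13:59:23","message":"some log message"}'}
decode_json_fields(event, ["message"])
```

After this runs, log_level and timestamp exist as top-level event fields (and so get indexed), while the original "message" field is left untouched because overwrite_keys is false.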

The relevant documentation can be found here: https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html

After creating a new index in ElasticSearch and ingesting logs to this new index, the expected KQL behavior worked.

The reason I'm making this blog post is that it took me hours to find this documentation, as there seem to be about 1,000 different ways to get this functionality, each with different caveats or options depending on your use case. I may just be showing my inexperience with ElasticSearch here, but I decided to write something brief about this because it took me a while to track down.

Note: This post isn't a knock against Elastic and their products. They solve a complex issue and give users a lot of options for how to manage, index, and search their data. Given all those options, though, grokking the documentation can become time consuming, and I wanted to try and offer a shortcut.

Author: Simon Watson

Created: 2022-01-14 Fri 10:55