Today, one of the easiest ways to ship logs from a Kubernetes cluster is by using Fluent Bit.

Fluent Bit

Fluent Bit is the lightweight sibling of Fluentd. It is written in C, uses fewer resources, and is a great fit for running as a DaemonSet in Kubernetes to ship pod logs.

Fluent Bit also enriches logs it collects from pods in Kubernetes using a built-in filter called kubernetes, which adds the following information:

  • Pod Name
  • Namespace
  • Container Name
  • Container ID
  • Pod ID
  • Labels
  • Annotations

The first four are parsed from the log tag (which Fluent Bit derives from the log file name), and the last three are fetched from the Kubernetes API server. Fluent Bit caches this metadata locally, so there isn’t a big overhead on the API server. These fields are appended to each log record collected from the pods, making the logs easier to search.
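For example, a record enriched by the kubernetes filter looks roughly like this (all field values here are illustrative):

```json
{
  "log": "GET /healthz 200\n",
  "stream": "stdout",
  "kubernetes": {
    "pod_name": "myapp-6d4cf56db6-xvz9p",
    "namespace_name": "team1",
    "container_name": "myapp",
    "docker_id": "0123abcd4567",
    "pod_id": "a1b2c3d4-e5f6-7890-abcd-ef0123456789",
    "labels": { "app": "myapp" },
    "annotations": { "kubernetes.io/psp": "default" }
  }
}
```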

Elasticsearch

Elasticsearch is one of Fluent Bit’s outputs and is easily configured to work with AWS’s managed Elasticsearch Service.

Elasticsearch is one of the most popular log storage and analysis platforms today, so if you haven’t tried it, you should. Elastic, the company behind Elasticsearch, offers a full logging stack as well as a cloud-based solution.


I decided to write this post after two teams I manage started logging their application logs differently, which eventually caused many field conflicts in our Elasticsearch index mappings, and logs started going missing when searching in Kibana. The reason is simple: Elasticsearch can’t index the same field with two different types (for example, string and int, or in our case, string and object), so documents that conflict with the existing mapping are rejected and never show up in search results.
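To illustrate the conflict, suppose two services log a field with the same name but different shapes (a hypothetical example):

```json
{"message": "request handled", "response": "OK"}
{"message": "request handled", "response": {"code": 200, "body": "OK"}}
```

Whichever document arrives first fixes the type of response in the index mapping; once it is mapped as a string, documents with an object value fail indexing with a mapper_parsing_exception, and those logs never appear in Kibana.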

This happened when we decided to start logging in JSON so it would be parsed into fields in Elasticsearch, allowing us to use our logs more effectively.

Docker’s logging driver defaults to json-file, which writes each log line as a JSON object, and Fluent Bit’s docker JSON parser decodes it back into structured fields.
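A raw line in a json-file log looks like this (timestamp illustrative); note the application’s own JSON is escaped inside the log field, where it can be decoded into fields in turn:

```json
{"log":"{\"level\":\"info\",\"msg\":\"started\"}\n","stream":"stdout","time":"2019-01-01T12:00:00.000000000Z"}
```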

So, you can get an almost out‑of‑the‑box logging system by using the right tools with the right configurations, which I am about to demonstrate.


Setup

Following Fluent Bit’s guide for log shipping on Kubernetes, we first apply these resources:

kubectl create namespace logging
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-service-account.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-binding.yaml

This will create a new namespace called logging and the appropriate permissions for Fluent Bit to query the API server for metadata.

The next step is the ConfigMap:

kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-configmap.yaml

This ConfigMap is generic: it uses a tail INPUT plugin, the kubernetes FILTER, and an es OUTPUT to Elasticsearch, with the docker JSON parser applied to container logs.
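The relevant sections of that generic ConfigMap look roughly like this (abridged sketch, not the full file):

```
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443

[OUTPUT]
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
```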

How it works

Kubernetes creates symlinks in /var/log/containers that point to the Docker log files in /var/lib/docker/containers. The symlink names encode metadata: each file in /var/log/containers is named podName_namespaceName_containerName-containerID.log.
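Because the metadata is encoded in the file name with underscores as separators, it can be recovered by simple string splitting. A quick sketch (the file name is hypothetical):

```shell
# Hypothetical file name from /var/log/containers/
f="myapp-6d4cf56db6-xvz9p_team1_myapp-0123abcd.log"

pod="${f%%_*}"     # everything before the first underscore: the pod name
rest="${f#*_}"     # drop pod name and first underscore
ns="${rest%%_*}"   # everything before the next underscore: the namespace

echo "$pod $ns"
```

This naming scheme is exactly what makes the per-namespace Path wildcards used later in this post possible.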

Fluent Bit tails those logs, tags them with kube.*, and records its read position in its own local DB so it doesn’t re-read logs after a restart. The kubernetes filter then matches the kube.* tag, enriches the records with pod metadata, and applies the docker JSON parser. Finally, the es OUTPUT plugin, which matches * (all logs), ships them to Elasticsearch.

The ConfigMap references two environment variables:

  1. FLUENT_ELASTICSEARCH_HOST
  2. FLUENT_ELASTICSEARCH_PORT

We will address those in the DaemonSet.

Download the DaemonSet file from https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds.yaml

Now update those environment variables from the defaults (elasticsearch and 9200) to your Elasticsearch endpoint. If you use AWS Elasticsearch Service, the host will look like elasticsearch-domain-name-absdefg-2-abcdefg4.eu-west-2.es.amazonaws.com and the port is 443, which also means TLS must be enabled in the es OUTPUT section.
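In the DaemonSet manifest, the change is just the env section of the Fluent Bit container (the domain name below is illustrative):

```yaml
env:
  - name: FLUENT_ELASTICSEARCH_HOST
    value: "elasticsearch-domain-name-absdefg-2-abcdefg4.eu-west-2.es.amazonaws.com"
  - name: FLUENT_ELASTICSEARCH_PORT
    value: "443"
```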

Then apply the DaemonSet and you’re set.

The result of this deployment is Fluent Bit collecting logs from all Kubernetes namespaces, including kube-system, and sending them to a single daily index named logstash-YYYY.MM.DD, which is the default naming when Logstash_Format is set to On in the OUTPUT config.

This was the first deployment, but when my teams started logging differently and their logs couldn’t share the same index mapping, I had to split them into different indices. Of course, different teams use different namespaces in our Kubernetes cluster.

For example, team1 uses the team1 namespace and team2 uses the team2 namespace, so I decided to split the logs for each namespace and have them in different indices with different index mappings.

To do that, I modified the ConfigMap as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    [INPUT]
        Name              tail
        Tag               team1.*
        Path              /var/log/containers/*_team1_*.log
        Parser            docker
        DB                /var/log/flb_kube_team1.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [INPUT]
        Name              tail
        Tag               team2.*
        Path              /var/log/containers/*_team2_*.log
        Parser            docker
        DB                /var/log/flb_kube_team2.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [FILTER]
        Name                kubernetes
        Match               team1.*
        Kube_Tag_Prefix     team1.var.log.containers.

    [FILTER]
        Name                kubernetes
        Match               team2.*
        Kube_Tag_Prefix     team2.var.log.containers.

    [OUTPUT]
        Name            es
        Match           team1.*
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Index           team1-logs
        Type            _doc

    [OUTPUT]
        Name            es
        Match           team2.*
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Index           team2-logs
        Type            _doc

  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

I split collection into two inputs, one per namespace, with two outputs to different indices; each input keeps the kubernetes filter (the Kube_Tag_Prefix setting tells the filter how to strip the custom tag back to a file name), and the Host and Port come from the same environment variables set in the DaemonSet. In Kibana, I created two index patterns:

  1. team1-*
  2. team2-*

And now, no more conflicts in Elasticsearch.

In this example, I am not collecting kube-system logs; you can add them with the same logic. This allows me to control how and what exactly is being shipped to our logging system. It gives me the freedom to ship logs to different outputs as well.
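For example, kube-system logs could be captured with the same pattern (a sketch following the team inputs above; the index name is up to you):

```
[INPUT]
    Name              tail
    Tag               system.*
    Path              /var/log/containers/*_kube-system_*.log
    Parser            docker
    DB                /var/log/flb_kube_system.db

[OUTPUT]
    Name            es
    Match           system.*
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Index           system-logs
    Type            _doc
```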