Today, one of the easiest ways to ship logs from a Kubernetes cluster is by using Fluent Bit.

Fluent Bit

Fluent Bit is like the little brother of Fluentd: it is written in C and consumes fewer resources, which makes it the best fit for running as a DaemonSet in Kubernetes to ship pod logs.

Fluent Bit also enriches the logs it collects from pods in Kubernetes, using a built-in filter called kubernetes that adds the following information to each record:

  • Pod Name
  • Namespace
  • Container Name
  • Container ID
  • Pod ID
  • Labels
  • Annotations

The first four are extracted from the log tag (the log file name) and the last three are fetched from the Kubernetes API server. The data is kept in Fluent Bit's cache, so there won't be a big overhead on the API server. These fields are then appended to each log record collected from the pods, making the logs much easier to search.
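For example (all values here are made up), an enriched record ends up with a kubernetes section alongside the original log line, roughly like this:

{
  "log": "GET /healthz 200\n",
  "stream": "stdout",
  "kubernetes": {
    "pod_name": "myapp-6f8c9b7d4-xk2lp",
    "namespace_name": "default",
    "container_name": "myapp",
    "docker_id": "4f1d0c9e2b7a",
    "pod_id": "0a1b2c3d-4e5f-11e9-b8f1-0a1b2c3d4e5f",
    "labels": {"app": "myapp"},
    "annotations": {"prometheus.io/scrape": "true"}
  }
}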

Elasticsearch

Elasticsearch is one of the Fluent Bit outputs and is easily configured to work with the AWS Elasticsearch Service, a managed Elasticsearch offering.

Elasticsearch is one of the most popular logging and log analysis platforms to date, so if you haven't tried it, you should. The company behind Elasticsearch is Elastic, and they offer a full logging stack as well as a cloud-based solution.


I decided to write this post after two different teams I manage started logging their application logs differently, which eventually gave our Elasticsearch index mapping a lot of field conflicts, and logs started going missing when searching for them in Kibana. The reason is quite simple: Elasticsearch can't index a field that arrives with two different types, like string and int, or in our case, string and object.
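To illustrate (the field names here are hypothetical): if one team logs

{"level": "info", "response": "ok"}

and the other team logs

{"level": "info", "response": {"code": 200, "body": "ok"}}

then the second document conflicts with the string mapping Elasticsearch already created for response, and it gets rejected.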

It happened when we decided to start logging in JSON format, so that the fields would be parsed in Elasticsearch and allow us to use our logs more productively.

The JSON parsing is done by Fluent Bit's docker parser (a parser of type json), which matches the json-file format that Docker uses as its default logging driver.
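As a sketch of what that looks like on disk (timestamp and message are made up), each line Docker writes is itself a JSON object, with the application's JSON payload sitting inside the log field as a string:

{"log":"{\"level\":\"info\",\"msg\":\"user logged in\",\"user_id\":42}\n","stream":"stdout","time":"2019-03-01T12:34:56.789Z"}

Fluent Bit's docker parser decodes the outer object, and with Merge_Log On in the kubernetes filter the inner JSON payload is parsed as well and merged into the record as individual fields.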

So basically, you can get an almost out-of-the-box logging system just by using the right tools with the right configuration, which I am about to demonstrate.


Set up

Following the Fluent Bit guide for log shipping from Kubernetes, we first apply these:

kubectl create namespace logging
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-service-account.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-binding.yaml

This will create a new namespace called logging and set up the appropriate permissions for Fluent Bit to query the API server for metadata.

The next step is the ConfigMap:

kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-configmap.yaml

This ConfigMap is generic: it uses a tail INPUT plugin, a kubernetes FILTER, and an es OUTPUT to Elasticsearch, with the docker JSON PARSER for container logs.
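Roughly, the relevant sections of that ConfigMap look like the following (trimmed and paraphrased from memory; check the file at the URL above for the exact, current contents):

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Merge_Log           On
    K8S-Logging.Parser  On

[OUTPUT]
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Retry_Limit     False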

How it works

Kubernetes creates symlinks that point from /var/log/containers back to the Docker logs in /var/lib/docker/containers. Along the way, the symlink name accumulates information: the pod name, then the namespace, then the container name and container ID. The result is that the log files in /var/log/containers are named podName_namespaceName_containerName-containerID.log.
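For example (all names are hypothetical, and the real container ID is much longer), a pod myapp-6f8c9b7d4-xk2lp in the default namespace with a container called myapp would produce a symlink such as:

/var/log/containers/myapp-6f8c9b7d4-xk2lp_default_myapp-4f1d0c9e2b7a.log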

Fluent Bit tails those logs, tags them with kube.*, and keeps a position marker in its own local DB. After collection, the kubernetes FILTER matches everything tagged kube.*, enriches the logs, and tries to parse the log content with the docker parser, which is of type json. After the logs are enriched and parsed, the OUTPUT plugin that matches * (all collected logs) ships them to Elasticsearch.

In the ConfigMap, two environment variables are referenced:

  1. FLUENT_ELASTICSEARCH_HOST
  2. FLUENT_ELASTICSEARCH_PORT

We will set those in the DaemonSet.

Download the DaemonSet file from https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds.yaml

Now update those environment variables from their defaults (elasticsearch and 9200) to your Elasticsearch domain URL and port. If you use the AWS Elasticsearch Service, the URL will look like elasticsearch-domain-name-absdefg-2-abcdefg4.eu-west-2.es.amazonaws.com and the port is 443.
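In the DaemonSet's container spec, the env section would then look roughly like this (the host value is just the placeholder domain from above):

        env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch-domain-name-absdefg-2-abcdefg4.eu-west-2.es.amazonaws.com"
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "443"

Note that with port 443 you will likely also need TLS enabled on the es OUTPUT (tls On), depending on your Fluent Bit version and configuration.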

Then apply the edited DaemonSet and you're set.
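Assuming you saved the edited file as fluent-bit-ds.yaml, that is simply:

kubectl create -f fluent-bit-ds.yaml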

The result of this deployment is Fluent Bit collecting the logs from all Kubernetes namespaces, including kube-system and so on, and sending them to a single daily index named logstash-yyyy.mm.dd, which is the default name when Logstash_Format is set to On in the OUTPUT config.

This was the first deployment, but now that my teams log differently and their logs can't live in the same index mapping, I had to split them into different indices. Conveniently, the different teams use different namespaces in our Kubernetes cluster.

For the example, team1 uses the team1 namespace and team2 uses the team2 namespace, so I decided to split the logs by namespace and send them to different indices, each with its own index mapping.

To do that, I had to modify the ConfigMap file (a sketch of the change follows below).

I decided to split into two INPUTs covering only my two namespaces and two OUTPUTs to different indices, and in Kibana I configured two index patterns:

  1. team1-*
  2. team2-*
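A minimal sketch of what that split might look like, assuming the two namespaces above (the tags, paths and prefixes are illustrative, and the SERVICE and PARSERS sections stay as in the original ConfigMap):

# Two INPUTs, one per team namespace, selected by the log file name pattern
[INPUT]
    Name              tail
    Tag               team1.*
    Path              /var/log/containers/*_team1_*.log
    Parser            docker
    DB                /var/log/flb_team1.db
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On

[INPUT]
    Name              tail
    Tag               team2.*
    Path              /var/log/containers/*_team2_*.log
    Parser            docker
    DB                /var/log/flb_team2.db
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On

# A kubernetes FILTER per tag prefix, so metadata enrichment keeps working
[FILTER]
    Name                kubernetes
    Match               team1.*
    Kube_Tag_Prefix     team1.var.log.containers.
    Merge_Log           On

[FILTER]
    Name                kubernetes
    Match               team2.*
    Kube_Tag_Prefix     team2.var.log.containers.
    Merge_Log           On

# Two OUTPUTs, each writing to its own daily index
[OUTPUT]
    Name            es
    Match           team1.*
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Logstash_Prefix team1
    Retry_Limit     False

[OUTPUT]
    Name            es
    Match           team2.*
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Logstash_Prefix team2
    Retry_Limit     False

With Logstash_Prefix set per team, the daily indices become team1-yyyy.mm.dd and team2-yyyy.mm.dd, which is exactly what the two Kibana index patterns above match.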

And now, no more conflicts in Elasticsearch.

In this example I am not collecting kube-system logs, but you can add them with the same logic. This lets me control exactly how and what is being shipped to our logging system, and it gives me the freedom to ship logs to different outputs as well.