Today, one of the easiest ways to ship logs from a Kubernetes cluster is by using Fluent Bit.
Fluent Bit
Fluent Bit is the lightweight sibling of Fluentd. It is written in C, uses fewer resources, and is a great fit for running as a DaemonSet in Kubernetes to ship pod logs.
Fluent Bit also enriches logs it collects from pods in Kubernetes using a built-in filter called kubernetes, which adds the following information:
- Pod Name
- Namespace
- Container Name
- Container ID
- Pod ID
- Labels
- Annotations
The first four come from the log tag (which is built from the log file name) and the last three from the Kubernetes API server. The results are cached locally by Fluent Bit, so there isn’t a big overhead on the API server. These fields are appended to each log record collected from the pods, making them easier to search.
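For reference, this enrichment is configured with a FILTER section like the one below. This is a minimal sketch based on the defaults in the official fluent-bit-kubernetes-logging ConfigMap; your exact options may differ.

[FILTER]
    # built-in enrichment filter; matches the kube.* tag created by the tail input
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    # if the log field contains JSON, merge it into top-level fields
    Merge_Log           On
    K8S-Logging.Parser  On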
Elasticsearch
Elasticsearch is one of Fluent Bit’s outputs and is easily configured to work with AWS’s managed Elasticsearch Service.
Elasticsearch is one of the most popular logging and log analysis platforms to date, so if you haven’t tried it, you should. The company behind Elasticsearch is Elastic, and they offer a full logging stack and a cloud‑based solution.
I decided to write this post after two teams I manage started logging their application logs differently, which eventually caused many field conflicts in our Elasticsearch index mappings, and logs started going missing when searching in Kibana. The reason is simple: Elasticsearch can’t index a field with two different types (for example, string and int), or in our case, string and object.
This happened when we decided to start logging in JSON so it would be parsed into fields in Elasticsearch, allowing us to use our logs more effectively.
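To make the conflict concrete, imagine two services logging the same field name with different types (these are hypothetical log lines, not our actual ones):

{"level": "error", "error": "connection timed out"}
{"level": "error", "error": {"code": 500, "reason": "upstream"}}

Once the first document maps error as a string, documents where error is an object are rejected by the index, which is exactly how logs end up missing when you search in Kibana.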
JSON parsing is done by Fluent Bit’s JSON parser, and the Docker logging driver defaults to json-file.
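A line written by the json-file driver looks roughly like this (a simplified illustration with made-up field names):

{"log":"{\"level\":\"info\",\"msg\":\"user created\",\"user_id\":42}\n","stream":"stdout","time":"2019-05-06T12:34:56.789Z"}

Fluent Bit’s standard docker parser decodes the outer JSON, and with Merge_Log On in the kubernetes filter the inner JSON string in log is parsed too, so level, msg, and user_id become searchable fields in Elasticsearch. The parser itself, as shipped in the stock parsers.conf, is essentially:

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L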
So, you can get an almost out‑of‑the‑box logging system by using the right tools with the right configurations, which I am about to demonstrate.
Set up
Following Fluent Bit’s guide for log shipping on Kubernetes, we first apply these resources:
kubectl create namespace logging
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-service-account.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-binding.yaml
This will create a new namespace called logging and the appropriate permissions for Fluent Bit to query the API server for metadata.
The next step is the ConfigMap:
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-configmap.yaml
This ConfigMap is generic: it uses a tail INPUT plugin, a kubernetes FILTER, and an es OUTPUT to Elasticsearch, with a JSON PARSER for Docker container logs.
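Condensed, the relevant INPUT and OUTPUT sections look roughly like this. It is a sketch of the upstream ConfigMap, not a verbatim copy, so check the file itself for the exact values:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Skip_Long_Lines   On
    Refresh_Interval  10

[OUTPUT]
    Name             es
    Match            *
    Host             ${FLUENT_ELASTICSEARCH_HOST}
    Port             ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format  On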
How it works
Kubernetes creates symlinks from the Docker logs in /var/lib/docker/containers to /var/log/containers. Along the way the file name gains the pod name and the namespace, so the logs in /var/log/containers are named podName_namespaceName_containerName-containerID.log.
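For example, a pod named myapp-7d4b9c-xyz (a hypothetical name) running in the team1 namespace ends up with a log file like /var/log/containers/myapp-7d4b9c-xyz_team1_myapp-<container-id>.log, which is what makes the per-namespace Path globs used later (such as *_team1_*.log) possible.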
Fluent Bit tails those logs, tags them with kube.*, and keeps track of its position in its own local DB. After collecting them, the kubernetes filter matches the kube.* tag to enrich the logs and applies the Docker parser of type json.
After enriching and parsing, the OUTPUT plugin that matches * (all logs) ships them to Elasticsearch.
The ConfigMap references two environment variables:
- FLUENT_ELASTICSEARCH_HOST
- FLUENT_ELASTICSEARCH_PORT
We will address those in the DaemonSet.
Download the DaemonSet file from https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds.yaml
Now update those environment variables from the defaults elasticsearch and 9200 to your Elasticsearch domain URL and port.
If you use AWS Elasticsearch Service, the URL will look like elasticsearch-domain-name-absdefg-2-abcdefg4.eu-west-2.es.amazonaws.com and the port is 443.
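In the DaemonSet’s container spec that boils down to something like this, using the example AWS domain above (adjust to your own):

        env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch-domain-name-absdefg-2-abcdefg4.eu-west-2.es.amazonaws.com"
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "443"

Note that if you go through port 443 you will most likely also need tls On in the es OUTPUT section, since the connection is over HTTPS.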
Then apply the DaemonSet and you’re set.
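Assuming you saved the edited file locally as fluent-bit-ds.yaml (the name is up to you):

kubectl create -f fluent-bit-ds.yaml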
The result of this deployment is Fluent Bit collecting logs from all Kubernetes namespaces, including kube-system, and sending them to a single index named logstash-yyyy-mm-dd, which is the default name when Logstash_Format is set to On in the OUTPUT config.
This was the first deployment, but when my teams started logging differently and their logs couldn’t share the same index mapping, I had to split them into different indices. Of course, different teams use different namespaces in our Kubernetes cluster.
For example, team1 uses the team1 namespace and team2 uses the team2 namespace, so I decided to split the logs by namespace and keep them in different indices with different index mappings.
To do that, I modified the ConfigMap as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    [INPUT]
        Name              tail
        Tag               team1.*
        Path              /var/log/containers/*_team1_*.log
        Parser            docker
        DB                /var/log/flb_kube_team1.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [INPUT]
        Name              tail
        Tag               team2.*
        Path              /var/log/containers/*_team2_*.log
        Parser            docker
        DB                /var/log/flb_kube_team2.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [OUTPUT]
        Name   es
        Match  team1.*
        Host   elasticsearch
        Port   9200
        Index  team1-logs
        Type   _doc

    [OUTPUT]
        Name   es
        Match  team2.*
        Host   elasticsearch
        Port   9200
        Index  team2-logs
        Type   _doc
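One caveat: fluent-bit.conf above still points at parsers.conf for the docker parser, so keep that key from the original ConfigMap. To roll the change out, reapply the ConfigMap and recreate the Fluent Bit pods so they pick it up; for example, assuming you saved it as fluent-bit-configmap.yaml, the DaemonSet kept its default name fluent-bit, and you are on a recent kubectl:

kubectl apply -f fluent-bit-configmap.yaml
kubectl rollout restart daemonset/fluent-bit -n logging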
I decided to split into two inputs covering only my namespaces and two outputs to different indices, and in Kibana I configured two index patterns:
team1-*
team2-*
And now, no more conflicts in Elasticsearch.
In this example, I am not collecting kube-system logs; you can add them with the same logic. This allows me to control how and what exactly is being shipped to our logging system.
It gives me the freedom to ship logs to different outputs as well.