The implementation combines a custom Fluent Bit log collector with GCloud Logging (formerly known as Stackdriver).

GKE GCloud Integration

GKE supports logging out of the box for two domains:

  • System

  • Workloads

While system logs don't need any special formatting and can be used as provided, logs from workloads (mainly in the -apps- clusters) need customization so that we can filter them effectively and use a structured log format in the GCloud Logging Explorer.

Using a custom Fluent Bit deployment

The motivation for running a custom Fluent Bit daemonset is that the out-of-the-box logging does not support customized configuration. If we want a customized configuration, we have to use a custom deployment. This gives us the advantage of:

  • using custom filters (for 3rd party applications like Kafka, etc.)

  • filtering logs at the pod level via annotations (an annotated pod will not send logs to GCP)

  • defining which parser to use at the pod level

  • masking sensitive data (credit card numbers, customer names, etc.) before it is sent to GCloud

How Fluent Bit works in general (see the attached flb_pipeline.png):

Our custom setup:

Configuration can be found and edited here: https://github.com/SafiBank/SaFiMono/blob/main/devops/charts/fluentbit/templates/configmap.yaml
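For orientation, here is a minimal sketch of the pipeline shape defined there (simplified and not authoritative; the tag name and input parser are assumptions, the configmap linked above is the source of truth):

[INPUT]
    Name    tail
    Tag     kube.*
    Path    /var/log/containers/*.log
    Parser  docker

[FILTER]
    # enriches records with pod metadata and honors the
    # fluentbit.io/exclude and fluentbit.io/parser annotations
    Name                kubernetes
    Match               kube.*
    K8S-Logging.Exclude On
    K8S-Logging.Parser  On

[OUTPUT]
    # ships the records to GCloud Logging
    Name      stackdriver
    Match     kube.*
    resource  k8s_container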

Formatting of logs

By default, all logs are parsed with the built-in docker & containerd parsers, followed by the Kubernetes filter, which parses JSON logs into the structured format accepted by GCloud Logging. That's why our Micronaut containers will use the log formatter described by Yuetong Yang:

https://github.com/recoyang/box-service

In other cases, it is sufficient to log to stdout or stderr and the logs will be picked up. If they are not in proper JSON format, they will still be logged, just in an unstructured way.
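For example (a hedged illustration: the transactionId field is made up, and the severity mapping assumes the stackdriver output is configured with severity_key severity, the usual GKE convention), a single stdout line like

{"severity":"ERROR","message":"payment declined","transactionId":"tx-42"}

ends up in Cloud Logging as a LogEntry with severity ERROR, with the remaining fields available under jsonPayload for filtering.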

Container-level customization

Excluding

Logs can be excluded at the container level with an annotation:

metadata:
  annotations:
    fluentbit.io/exclude: "true"

With this annotation, logs are excluded by the Kubernetes filter in our Fluent Bit chain and are not sent to the GCloud sink.
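As a usage sketch (the Deployment name and image are hypothetical), note that the annotation belongs on the pod template, not on the Deployment's own metadata:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: noisy-batch-job
spec:
  replicas: 1
  selector:
    matchLabels:
      app: noisy-batch-job
  template:
    metadata:
      labels:
        app: noisy-batch-job
      annotations:
        # read by the Fluent Bit kubernetes filter; this pod's logs never reach GCloud
        fluentbit.io/exclude: "true"
    spec:
      containers:
        - name: app
          image: example.com/noisy-batch-job:latest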

Explicit Parser definition

If required, we can use an explicit parser definition for logs from a particular container; this parser is applied after the Kubernetes filter in our chain.

metadata:
  annotations:
    fluentbit.io/parser: apache

Here is the documentation for parser configuration: https://docs.fluentbit.io/manual/pipeline/parsers/configuring-parser
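For reference, a named parser is defined in the parsers file roughly like this (a simplified version of the apache parser that ships with Fluent Bit's default parsers.conf):

[PARSER]
    Name        apache
    Format      regex
    Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*))?" (?<code>[^ ]*) (?<size>[^ ]*)$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z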

Data anonymization / masking

For this purpose, we have defined a custom Lua function with some examples:

function replace_sensitive_info(tag, timestamp, record)
  -- records without a "log" field are passed through unmodified
  if record["log"] == nil then
    return 0, timestamp, record
  end
  -- mask social security number
  record["log"] = string.gsub(record["log"], "%d%d%d%-+%d%d%-+%d%d%d%d", "xxx-xx-xxxx")
  -- mask credit card number
  record["log"] = string.gsub(record["log"], "%d%d%d%d *%d%d%d%d *%d%d%d%d *%d%d%d%d", "xxxx xxxx xxxx xxxx")
  -- mask email address
  record["log"] = string.gsub(record["log"], "[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+", "user@email.tld")
  -- return code 1 tells Fluent Bit the record was modified
  return 1, timestamp, record
end
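The function takes effect once it is registered as a lua filter in the chain; a minimal sketch (the functions.lua script name is an assumption, the real wiring lives in the configmap linked above):

[FILTER]
    Name    lua
    Match   kube.*
    script  functions.lua
    call    replace_sensitive_info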

GCloud Logs Explorer

For reading logs, we will use the built-in GCP functionality, which is described in detail here:

https://cloud.google.com/logging/docs/view/logs-explorer-interface

This is a sample query:

resource.type="k8s_container"
resource.labels.cluster_name="safi-fluentbit-II"
resource.labels.container_name="box-service"
resource.labels.namespace_name="logging-istio"
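Since our logs are structured, queries can also filter on severity and on individual payload fields; for example (transactionId is the same hypothetical field used above):

resource.type="k8s_container"
resource.labels.container_name="box-service"
severity>=ERROR
jsonPayload.transactionId="tx-42"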

And this is an example of a structured log with masking applied:

IAM Permissions

The project level is the lowest level at which permissions for accessing logs can be defined. Only a basic setup is implemented, assigning pre-defined roles per member per project in an automated way, in:

https://github.com/SafiBank/SaFiMono/blob/main/devops/terraform/_files/users.yaml

as follows:

users:
  - name: "Peter Kmec"
    gcp_email: peter.kmec@vacuumlabs.com
    gcp:
      projects_iam:
        - project: safi-env-dev-apps
          roles:
            - roles/logging.viewer
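Under the hood, an entry like this should result in a plain project-level IAM binding; a hand-written Terraform equivalent would look roughly like this (a sketch only; the actual automation iterates over users.yaml):

resource "google_project_iam_member" "peter_logging_viewer" {
  project = "safi-env-dev-apps"
  role    = "roles/logging.viewer"
  member  = "user:peter.kmec@vacuumlabs.com"
}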

The description of the predefined GCP roles is here: https://cloud.google.com/logging/docs/access-control

It is possible to implement permission control with higher granularity at the field level, but at the time of writing there was no request for that.

Alerting

We can set up automatic alerts based on log values; this requires two resources plus a notification channel. The setup is described at https://cloud.google.com/logging/docs/alerting/log-based-alerts#lba-by-api

For terraform automation we need:

  • google_logging_metric => the metric definition; this collects metric values over time

  • google_monitoring_alert_policy => defines the conditions under which an alert is triggered and where notifications should be sent

  • google_monitoring_notification_channel => the channel for alert notifications

An example implementation is here:

https://github.com/SafiBank/SaFiMono/blob/main/devops/terraform/tf-environments/log_alerts.tf
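For illustration, a condensed sketch of how the three resources fit together (resource names, the log filter, and the threshold are made-up placeholders; see the linked file for the real definitions):

# channel for alert notifications (an email channel is assumed here)
resource "google_monitoring_notification_channel" "ops_email" {
  display_name = "Ops email"
  type         = "email"
  labels = {
    email_address = "ops@example.com"
  }
}

# log-based metric: counts ERROR lines from the box-service container
resource "google_logging_metric" "box_service_errors" {
  name   = "box-service-errors"
  filter = "resource.type=\"k8s_container\" AND resource.labels.container_name=\"box-service\" AND severity>=ERROR"
  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}

# alert policy: fires when the metric exceeds the threshold and notifies the channel
resource "google_monitoring_alert_policy" "box_service_errors" {
  display_name = "box-service error rate"
  combiner     = "OR"
  conditions {
    display_name = "too many errors"
    condition_threshold {
      filter          = "metric.type=\"logging.googleapis.com/user/box-service-errors\" AND resource.type=\"k8s_container\""
      comparison      = "COMPARISON_GT"
      threshold_value = 10
      duration        = "300s"
      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_DELTA"
      }
    }
  }
  notification_channels = [google_monitoring_notification_channel.ops_email.id]
}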

Attachments:

overview-new[1].svg (image/svg+xml)
image-20220627-110655.png (image/png)
flb_pipeline.png (image/png)
image-20220627-115840.png (image/png)