Logging is implemented with a combination of a custom fluentbit log collector and GCloud Logging (formerly known as Stackdriver).
GKE GCloud Integration
GKE supports logging out of the box for 2 domains:
System
Workloads
System logs do not need any special formatting and can be used as provided. Logs from workloads (mainly in -apps- clusters) need customization so that we can filter them effectively and use the structured log format in GCloud Logging Explorer.
Using custom fluentbit deployment
The motivation for a custom fluentbit daemonset is that the out-of-the-box logging does not support customized configuration; to customize it, we have to run our own deployment. This gives us the following advantages:
using custom filters (for 3rd party applications like kafka, etc.)
filtering logs on pod level via annotations (an annotated pod will not send logs to GCP)
defining which filter to use on pod level
masking sensitive data before it is sent to GCloud (credit card numbers, customer names, etc.)
How fluentbit works in general:
Our custom setup:
Configuration can be found and edited here: https://github.com/SafiBank/SaFiMono/blob/main/devops/charts/fluentbit/templates/configmap.yaml
Formatting of logs
By default, all logs are parsed with the built-in docker and containerd parsers, followed by the Kubernetes filter, which parses JSON logs into the structured format accepted by GCloud Logging. That is why our micronaut containers will use the log formatter described by Yuetong Yang:
https://github.com/recoyang/box-service
In other cases it is sufficient to log to stdout or stderr and the logs will be picked up. If they are not in proper JSON format, they will still be logged, but in an unstructured way.
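As an illustration of "one JSON object per line on stdout", here is a minimal Python sketch for a non-micronaut workload. The field names `severity` and `message` and the helper `format_log` are assumptions made for this example; which keys end up as structured fields depends on the parser and output settings in our fluentbit configmap.

```python
import json
import sys


def format_log(severity: str, message: str, **fields) -> str:
    """Serialize one log record as a single-line JSON object.

    The 'severity' and 'message' keys are illustrative assumptions,
    not keys mandated by our fluentbit configuration.
    """
    record = {"severity": severity, "message": message, **fields}
    # json.dumps escapes embedded newlines, so one record stays on one line
    return json.dumps(record, sort_keys=True)


def log(severity: str, message: str, **fields) -> None:
    # The container runtime captures stdout; fluentbit tails the resulting file
    sys.stdout.write(format_log(severity, message, **fields) + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    log("INFO", "service started", port=8080)
```

A line that is not valid JSON would still be collected, only without the structured fields.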
Container-Level customization
Excluding
Logs can be excluded on container level with annotation:
metadata:
  annotations:
    fluentbit.io/exclude: "true"
With this annotation, logs are excluded by the Kubernetes filter in our fluentbit chain and are not sent to the GCloud sink.
Explicit Parser definition
If required, we can define an explicit parser for logs from a particular container. It is applied after the Kubernetes filter in our chain.
metadata:
  annotations:
    fluentbit.io/parser: apache
Here is the documentation for parser configuration: https://docs.fluentbit.io/manual/pipeline/parsers/configuring-parser
Data anonymization / masking
For this purpose we have defined a custom Lua function with some example patterns:
function replace_sensitive_info(tag, timestamp, record)
    -- mask social security number
    record["log"] = string.gsub(record["log"], "%d%d%d%-+%d%d%-+%d%d%d%d", "xxx-xx-xxxx")
    -- mask credit card number
    record["log"] = string.gsub(record["log"], "%d%d%d%d *%d%d%d%d *%d%d%d%d *%d%d%d%d", "xxxx xxxx xxxx xxxx")
    -- mask email address
    record["log"] = string.gsub(record["log"], "[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+", "user@email.tld")
    -- return code 1 tells fluentbit that the record was modified
    return 1, timestamp, record
end
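To try the masking patterns outside of fluentbit, the three rules above can be approximated in Python. This is only a sketch: the regular expressions are hand-translated from the Lua patterns (Lua's `%w` and `%a` are locale-dependent, so ASCII classes are an assumption here), and `mask_sensitive` is a hypothetical helper name, not part of our chart.

```python
import re

# Hand-translated approximations of the Lua patterns, applied in the same order
MASKS = [
    # social security number, e.g. 123-45-6789
    (re.compile(r"\d{3}-+\d{2}-+\d{4}"), "xxx-xx-xxxx"),
    # credit card number, 4 groups of 4 digits with optional spaces
    (re.compile(r"\d{4} *\d{4} *\d{4} *\d{4}"), "xxxx xxxx xxxx xxxx"),
    # email address
    (re.compile(r"[A-Za-z0-9+.\-_]+@[A-Za-z0-9+.\-_]+\.[A-Za-z]{2,}"), "user@email.tld"),
]


def mask_sensitive(line: str) -> str:
    """Apply every masking rule to one log line and return the result."""
    for pattern, replacement in MASKS:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(mask_sensitive("card 4111 1111 1111 1111 for john.doe@example.com"))
```

Note that the credit card rule matches any 16 consecutive digits, so long numeric identifiers would be masked as well.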
GCloud Logs-explorer
For reading logs, we will use the built-in functionality of GCP, which is described in detail here:
https://cloud.google.com/logging/docs/view/logs-explorer-interface
This is a sample query:
resource.type="k8s_container"
resource.labels.cluster_name="safi-fluentbit-II"
resource.labels.container_name="box-service"
resource.labels.namespace_name="logging-istio"
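When the same query is needed for several containers, the filter text can be generated. The following Python sketch only builds the query string shown above; `build_filter` is a hypothetical helper, and the result is meant to be pasted into Logs Explorer, where comparisons without an explicit operator are combined with AND.

```python
def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_filter(cluster: str, container: str, namespace: str) -> str:
    """Return a Logs Explorer query for one container, one clause per line."""
    clauses = [
        ("resource.type", "k8s_container"),
        ("resource.labels.cluster_name", cluster),
        ("resource.labels.container_name", container),
        ("resource.labels.namespace_name", namespace),
    ]
    return "\n".join(f"{field}={quote(value)}" for field, value in clauses)


if __name__ == "__main__":
    print(build_filter("safi-fluentbit-II", "box-service", "logging-istio"))
```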
And this is an example structured log with some masking on:
IAM Permissions
The project level is the lowest level at which permissions for accessing logs can be defined. Only a basic setup is implemented: pre-defined roles are assigned per member per project in an automated way in:
https://github.com/SafiBank/SaFiMono/blob/main/devops/terraform/_files/users.yaml
as follows:
users:
  - name: "Peter Kmec"
    gcp_email: peter.kmec@vacuumlabs.com
    gcp:
      projects_iam:
        - project: safi-env-dev-apps
          roles:
            - roles/logging.viewer
The predefined GCP roles are described here: https://cloud.google.com/logging/docs/access-control
It is possible to implement permission control with higher granularity, at the field level, but at the time of writing there was no request for that.
Alerting
We can set up automatic alerts based on log values. This requires 2 resources plus a notification channel. The setup is described at https://cloud.google.com/logging/docs/alerting/log-based-alerts#lba-by-api
For terraform automation we need:
google_logging_metric
=> metric definition, this will collect metric values over time
google_monitoring_alert_policy
=> definition of conditions when alert is triggered and where notification should be sent
google_monitoring_notification_channel
=> channel for alert notifications
Example of implementation is here:
https://github.com/SafiBank/SaFiMono/blob/main/devops/terraform/tf-environments/log_alerts.tf
Attachments:
image-20220627-110655.png (image/png)
flb_pipeline.png (image/png)
image-20220627-115840.png (image/png)