Self-managed Monitoring Stack


Task

Design and deploy a monitoring solution that will enable us to:

  1. Choose a solution that is cloud agnostic while offering broad support for the tools we are using.

  2. Have a view of multi-cluster metrics, including (but not limited to) Kubernetes state metrics, workloads, node metrics, container scanning metrics, ArgoCD metrics, etc.

  3. Provide unified monitoring through a “single pane of glass”, with the ability to filter and view each environment (dev, stage, production), multiple clusters, and services outside of the GKE/GCP ecosystem.

  4. Create a repeatable/reusable monitoring architecture which can easily be redeployed in any new environment.

  5. Select and take advantage of tools/software capable of building a highly available monitoring stack.


Monitoring Stack Diagram

Below is a high-level monitoring stack diagram (Monitoring-Stack-001) which illustrates the current folders and projects and where the different parts of the monitoring stack are or will be deployed.

Zooming in, diagram Monitoring-Stack-002 illustrates how Grafana uses Thanos to query metrics from the different clusters' Prometheus instances.

Finally, diagram Monitoring-Stack-003 provides the bigger picture of how Thanos manages the different metrics sources (stores) and sends data to object storage while maintaining high availability.


Kubernetes Prometheus Stack

This is deployed using the Helm chart packaged by Bitnami, which deploys the following components (each listed with its purpose and its Kubernetes kind):

Prometheus Operator

The main purpose of this operator is to simplify and automate the configuration and management of the Prometheus monitoring stack running on a Kubernetes cluster. Essentially it is a custom controller that monitors the new object types introduced through the following CRDs:

  • Prometheus: defines the desired Prometheus deployments as a StatefulSet

  • Alertmanager: defines a desired Alertmanager deployment

  • ServiceMonitor: declaratively specifies how groups of Kubernetes services should be monitored

  • PodMonitor: declaratively specifies how groups of pods should be monitored

  • Probe: declaratively specifies how groups of ingresses or static targets should be monitored

  • PrometheusRule: defines a desired set of Prometheus alerting and/or recording rules

  • AlertmanagerConfig: declaratively specifies subsections of the Alertmanager configuration

Kind: Deployment
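
For illustration, a minimal PrometheusRule handled by the operator could look like the following (the rule name, namespace, label selector and alert expression are hypothetical placeholders, not taken from our stack):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert-rules          # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus         # assumption: must match the operator's rule selector
spec:
  groups:
    - name: example.rules
      rules:
        - alert: HighPodRestartRate
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"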

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

Kind: StatefulSet
Related Documentation

https://prometheus.io/docs/introduction/overview/

https://docs.openshift.com/container-platform/4.8/rest_api/monitoring_apis/prometheus-monitoring-coreos-com-v1.html

Alertmanager

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

Currently Disabled and not deployed in our Monitoring Stack

Node Exporters

Node exporter is an official Prometheus exporter for capturing all the Linux system-related metrics.

It collects all the hardware and Operating System level metrics that are exposed by the kernel.

Kind: DaemonSet

Kube State Metrics

kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. It is not focused on the health of the individual Kubernetes components, but rather on the health of the various objects inside, such as deployments, nodes and pods.

Kind: Deployment

https://github.com/kubernetes/kube-state-metrics

Service Monitors

The Prometheus Operator includes a Custom Resource Definition that allows the definition of the ServiceMonitor. The ServiceMonitor is used to define an application you wish to scrape metrics from within Kubernetes; the controller acts on the ServiceMonitors we define and automatically builds the required Prometheus configuration.

Within the ServiceMonitor we specify the Kubernetes labels that the Operator uses to identify the Kubernetes Service, which in turn identifies the Pods that we wish to monitor.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
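
Building on the apiVersion and kind above, a complete ServiceMonitor might look like the following sketch (the application name, label selector, namespace and port name are hypothetical placeholders):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                  # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus         # assumption: must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-app               # labels on the Kubernetes Service we wish to scrape
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: http-metrics             # named port on the Service exposing /metrics
      path: /metrics
      interval: 30s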

See the ArgoCD ApplicationSet yaml definition here and an actual sample ArgoCD Application here, deployed in the monitoring dev cluster.


MultiCluster Monitoring with Thanos

What is Thanos and why did we choose it?

The Thanos project turns Prometheus into a highly available metrics platform with unlimited metrics storage. This article about Thanos is a great and easy read to better understand the limitations of Kube Prometheus and how Thanos aims to solve them.

The three key features of Thanos are as follows:

  1. Global query view of metrics.

  2. Unlimited retention of metrics.

  3. High availability of components, including Prometheus.

Components

Following the KISS and Unix philosophies, Thanos is made of a set of components with each filling a specific role.

  • Sidecar: connects to Prometheus, reads its data for query and/or uploads it to cloud storage.

  • Store Gateway: serves metrics inside of a cloud storage bucket.

  • Compactor: compacts, downsamples and applies retention on the data stored in cloud storage bucket.

  • Receiver: receives data from Prometheus’s remote-write WAL, exposes it and/or uploads it to cloud storage.

  • Ruler/Rule: evaluates recording and alerting rules against data in Thanos for exposition and/or upload.

  • Querier/Query: implements Prometheus’s v1 API to aggregate data from the underlying components.

  • Query Frontend: implements Prometheus’s v1 API and proxies requests to Query, while caching the response and optionally splitting queries by day.

Deployment with Sidecar:
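
As a rough sketch of how the sidecar runs next to Prometheus (in our setup the Bitnami chart wires this up automatically when thanos.create is true; the image tag, paths and ports below are illustrative assumptions):

containers:
  - name: thanos-sidecar
    image: quay.io/thanos/thanos:v0.28.0                 # illustrative version
    args:
      - sidecar
      - --tsdb.path=/prometheus                          # Prometheus TSDB volume shared with the sidecar
      - --prometheus.url=http://localhost:9090           # Prometheus running in the same pod
      - --objstore.config-file=/etc/thanos/objstore.yml  # object storage (GCS bucket) config
      - --grpc-address=0.0.0.0:10901                     # Store API consumed by the Thanos query layer
      - --http-address=0.0.0.0:10902                     # metrics/health endpoint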

More info can be found in the official Thanos documentation.


Grafana

Grafana is an open-source solution for data analytics and visualization: it helps us make sense of large volumes of metrics and monitor our apps through customizable dashboards.

We chose Grafana to create and visualize dashboards from the metrics we pull from GKE clusters and other sources through Thanos, which communicates with each cluster via the Thanos sidecar running alongside Prometheus.

Below is a snapshot from our Grafana Dev Kubernetes Global Dashboard.


Monitoring as Code

How each of the above components is currently deployed via Argo CD and Terraform

The monitoring GKE cluster was deployed first, as this is the cluster where Thanos and Grafana will be hosted. Each of our environments (dev, stage, production) will have a dedicated monitoring GKE cluster, and all GKE clusters in all environments will have Prometheus (with the Thanos sidecar) installed, all in the monitoring namespace.

Pre-requisites:

  1. Add the required project and APIs here.

  2. Monitoring CIDR Network was added here for dev env.

  3. Monitoring GKE resource is then added and created through Terraform here.

Continuous Deployment

Prometheus with Thanos sidecar

Prometheus is deployed as a Helm chart via an ArgoCD ApplicationSet. See the yaml definition here.

If you inspect the yaml file, you will notice that each cluster, as defined in the list generator, has the following:

externalLabels - Prometheus allows the configuration of “external labels” of a given Prometheus instance. These are meant to globally identify the role of that instance. As Thanos aims to aggregate data across all instances, providing a consistent set of external labels becomes crucial!

thanos.create - Set to true to deploy Thanos sidecar along with Prometheus

Some features, such as alertmanager, blackboxExporter and coreDns, have been set to enabled: false.

As this Prometheus is deployed in the same namespace and the same cluster as Thanos, the Service type is left at the default, which is ClusterIP.

For Prometheus in the other clusters in the same Shared VPC, a Service of type LoadBalancer with the annotation networking.gke.io/load-balancer-type: "Internal" has been added to allow the Thanos query layer to reach the Store APIs exposed by the Thanos sidecar.
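
A hedged sketch of the per-cluster Helm values this translates to (key names follow the Bitnami kube-prometheus chart as used in our ApplicationSet; the cluster label below is illustrative):

prometheus:
  externalLabels:
    cluster: safi-dev-apps                                    # illustrative label used by Thanos to tell sources apart
  thanos:
    create: true                                              # deploy the Thanos sidecar along with Prometheus
    service:
      type: LoadBalancer                                      # only for clusters outside the monitoring cluster
      annotations:
        networking.gke.io/load-balancer-type: "Internal"      # internal LB within the Shared VPC
alertmanager:
  enabled: false
blackboxExporter:
  enabled: false
coreDns:
  enabled: false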

Thanos

Thanos and Grafana are deployed through an ArgoCD Application, via a Kustomize overlay that uses public Helm charts as sources.

SaFiMono/devops/argocd/environments/dev/monitoring

├── base
│   ├── grafana-dashboards.yaml
│   ├── grafana.yaml
│   └── thanos.yaml
├── grafana
│   ├── Chart.yaml
│   └── values.yaml
├── kustomization.yaml
└── thanos
    ├── Chart.yaml
    └── values.yaml

3 directories, 8 files

If we inspect this file, SaFiMono/devops/argocd/environments/dev/monitoring/thanos/values.yaml, we will see the following:

thanos:
  objstoreConfig: |-
    type: GCS
    config:
      bucket: safi-thanos-dev
  querier:
    stores:
      # safi-cicd
      - 172.19.0.223:10901
      # safi-dev-apps
      - 172.16.47.237:10901
      # safi-dev-tyk
      - 172.16.96.59:10901
      # safi-dev-hcv
      - 172.16.64.13:10901
      # safi-dev-monitoring
      - kube-prometheus-prometheus-thanos.monitoring.svc:10901
  bucketweb:
    enabled: true
    serviceAccount:
      annotations:
        iam.gke.io/gcp-service-account: safi-thanos-gcs-dev@safi-env-dev-monitoring.iam.gserviceaccount.com
  compactor:
    enabled: true
    serviceAccount:
      annotations:
        iam.gke.io/gcp-service-account: safi-thanos-gcs-dev@safi-env-dev-monitoring.iam.gserviceaccount.com
  storegateway:
    enabled: true
    serviceAccount:
      annotations:
        iam.gke.io/gcp-service-account: safi-thanos-gcs-dev@safi-env-dev-monitoring.iam.gserviceaccount.com
  ruler:
    enabled: true
    alertmanagers:
      - http://prometheus-operator-alertmanager.monitoring.svc.cluster.local:9093
    config: |-
      groups:
        - name: "metamonitoring"
          rules:
            - alert: "PrometheusDown"
              expr: absent(up{prometheus="monitoring/prometheus-operator"})

From the values.yaml above, we can see that:

  1. We are using Google Cloud Storage as our object storage for metrics data, which the sidecars upload and the Store Gateway serves. The bucket and its configuration are deployed via this terraform code.

  2. The store endpoints are the services exposed via the internal GKE load balancer by the Thanos sidecar of each cluster's Prometheus.

  3. The service accounts indicated in the values.yaml above are also created by Terraform via this tf file.

Grafana with kiwigrid sidecar

Grafana is deployed using Grafana Helm Chart with the following values.yaml.

If we inspect the values.yaml we will see that:

  • The datasource is pointed to the Thanos Query Frontend.

  • Ingress is enabled via Traefik, with Cert Manager for TLS.

  • The kiwigrid sidecar container is enabled. This allows us to watch for any ConfigMap in the monitoring namespace with the label grafana_dashboard=true and automatically load the dashboard JSON from that ConfigMap into Grafana (see the example after this list).
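
For example, a dashboard ConfigMap picked up by the sidecar could look like this (the ConfigMap name and the dashboard JSON are placeholders):

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboard            # hypothetical name
  namespace: monitoring
  labels:
    grafana_dashboard: "true"        # label the kiwigrid sidecar watches for
data:
  example-dashboard.json: |
    { "title": "Example Dashboard", "panels": [] }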

Grafana Dashboards

Grafana Dashboards is a custom Helm chart created to automate uploading Grafana dashboards as ConfigMaps.

How it works
  1. This private helm chart in our Safi Chart Museum is installed in our Monitoring Cluster in the monitoring namespace where Grafana is also deployed.

  2. The Grafana-Dashboards chart is deployed via this Application on ArgoCD.

  3. New dashboards in JSON format can be added to this dashboards folder.

Make sure to bump the chart version in Chart.yaml so ArgoCD can pick up the change and continuously deploy it to the target cluster as a ConfigMap.
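
For instance, the relevant part of Chart.yaml might look like this (chart name and version are placeholders):

apiVersion: v2
name: grafana-dashboards             # hypothetical chart name
version: 1.2.3                       # bump on every dashboard change so ArgoCD detects and syncs the new revision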

References

Tickets:

  • Epic: SM-1032

  • Story: SM-2460

  • Task: SM-2466