Overview

Metrics are quantitative measurements of a system's state. They provide a near real-time stream of data that tells operators and stakeholders what the system is doing and how healthy it is. They can be used to track aspects such as resource usage, response times, and error rates.
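As a minimal illustration of what a quantitative measurement looks like, the sketch below derives an error rate and a mean response time from a handful of request records. The `Request` type, its fields, and the choice to count 5xx statuses as errors are assumptions made for this example and are not taken from any of our services.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Request:
    """One handled request (hypothetical shape, for illustration only)."""
    duration_ms: float
    status: int


def error_rate(requests: list[Request]) -> float:
    """Fraction of requests that ended with a 5xx status (assumed error definition)."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)


def mean_duration_ms(requests: list[Request]) -> float:
    """Average response time in milliseconds; 0.0 when there are no requests."""
    if not requests:
        return 0.0
    return mean(r.duration_ms for r in requests)
```

Real monitoring systems usually compute such values over a sliding time window rather than over a fixed list, but the quantities being tracked are the same.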

Where to collect metrics from

In a distributed system, metrics can be collected from multiple sources, including servers, applications, and services. The major sources we collect metrics from are:

  1. Micro-services

  2. Kafka

  3. Database

  4. ThoughtMachine

  5. Temporal

  6. Mobile App

  7. GCP resources

  8. Infra resources

Tech Stack

The metrics tech stack is based mainly on Prometheus. For more details, please refer to Self-managed Monitoring Stack.
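Prometheus collects metrics by scraping an HTTP endpoint that returns plain text in its exposition format. The sketch below hand-renders a single counter in that format so the shape of the data is visible. It uses only the standard library; the function names, the metric name `http_requests_total`, and its labels are hypothetical. In practice a service would more likely use an official Prometheus client library, and which one our services use is not covered here.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str,
                   samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one counter metric in the Prometheus text exposition format.

    `samples` is a list of (labels, value) pairs; labels may be empty.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between calls.
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric and label values, for illustration only.
    print(render_counter(
        "http_requests_total",
        "Total HTTP requests.",
        [({"method": "GET", "status": "200"}, 3)],
    ), end="")
```

Each sample line carries the metric name, an optional set of labels, and the current value; Prometheus stores one time series per unique combination of name and labels.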