SaFi Bank Space : Google Cloud Monitoring - POC

Task: Initial Presentation Deck
Resources/PR: https://docs.google.com/presentation/d/1xTFg2_U1yAT5pdAUMl7F4APb-zjwmgJXpUxNMl5S24w/edit?usp=sharing
Tickets: SM-1424 - PoC of monitoring (Resolved)

Task: PoC on Monitoring - Google Cloud
Resources/PR: See the subtasks of user story SM-1424 - PoC of monitoring (Resolved)
Tickets: SM-1424 - PoC of monitoring (Resolved)

Task: PoC on Google Managed Prometheus
Resources/PR: See ticket SM-2036 - Implement dashboards (Cancelled) for linked PRs and commits
Tickets: SM-1425 - Implement monitoring to dev env base on PoC (Cancelled); SM-2036 - Implement dashboards (Cancelled)

Decision:

As of July 14, 2022, Google Managed Prometheus (GMP) may not be mature enough to meet our monitoring needs.

See the issue raised about prometheus-engine and the answers from the Google developer: https://github.com/GoogleCloudPlatform/prometheus-engine/issues/278

Known issues when using the Google Managed Prometheus frontend (prometheus-engine), as answered by the Google developer in the issue linked above

The /alerts endpoint is not implemented, by design.

When using a regular Prometheus server, your scrape configs, data, queries, and alerts are all colocated on the same server. However, we've decoupled all these in GMP - scrape configs live on the managed collectors, data lives in Monarch and queries are executed against Monarch, and rules live on the rule-evaluator components that live within your clusters (which executes queries against Monarch and writes back results).

The GMP data source that Grafana uses is hooked up to Monarch so that it can execute queries against the global data store. However, as rules are not installed into Monarch, the /alerts endpoint wouldn't have any data even if implemented.

You can get alerts by running kubectl describe Rules/ClusterRules/GlobalRules in each namespace where you have them installed.
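
The per-namespace check described in that reply can be scripted. The following is a minimal sketch, not a tested procedure: it only builds the `kubectl describe` command lines for the three rule kinds named in the thread. The helper name `describe_commands` and the namespaces in the example are hypothetical, and passing `--namespace` for every kind simply follows the wording of the reply.

```python
import shlex

# Rule kinds exactly as named in the reply; they are passed to kubectl as-is.
RULE_KINDS = ("Rules", "ClusterRules", "GlobalRules")


def describe_commands(namespaces, kinds=RULE_KINDS):
    """Return one `kubectl describe` argv list per (namespace, kind) pair."""
    return [
        ["kubectl", "describe", kind, "--namespace", ns]
        for ns in namespaces
        for kind in kinds
    ]


if __name__ == "__main__":
    # Hypothetical namespaces, for illustration only.
    for argv in describe_commands(["monitoring", "default"]):
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv lists can be handed to `subprocess.run` on a machine that has cluster access.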

I agree that a global status page showing what rules you have and their current state would be useful - we'll chew this over.

Re: /rules, the instance of Grafana that runs queries against Monarch is not hooked up to anything that runs rules. There is no way to get rules into Grafana right now.

Re: the 400, Can you hover over the pink ! and tell me what the error returned by Grafana is?

Re: the Grafana error "1m0", that's a known issue and we're working on it: #247

Re: /rules, that page simply doesn't work in a Grafana instance that is hooked up to Monarch. Given the architecture of GMP, it's unlikely to ever work. We will have to come up with a replacement way to see what rules are installed and what the status of each is, likely within Google Cloud Console.

Re: /query_exemplars, we are working on exemplars this half, so that will be functional soon enough.
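
For quick reference, the endpoint status reported in the replies above can be captured in a small lookup. This is only a sketch that paraphrases the thread as of July 2022: the `/api/v1/` prefix is the standard Prometheus HTTP API path, while the base URL in the example and the helper names `endpoint_url` and `status_report` are assumptions made for illustration.

```python
from urllib.parse import urljoin

# Endpoint status on the GMP frontend, paraphrased from the replies above.
ENDPOINT_STATUS = {
    "alerts": "not implemented, by design",
    "rules": "does not work against Monarch; unlikely to ever work",
    "query_exemplars": "not functional yet; work in progress",
}


def endpoint_url(base_url, endpoint):
    """Build the Prometheus HTTP API URL for an endpoint under /api/v1/."""
    return urljoin(base_url.rstrip("/") + "/", f"api/v1/{endpoint}")


def status_report(base_url):
    """Return one 'URL: status' line per endpoint discussed in the thread."""
    return [
        f"{endpoint_url(base_url, name)}: {status}"
        for name, status in ENDPOINT_STATUS.items()
    ]


if __name__ == "__main__":
    # Hypothetical frontend address, for illustration only.
    for line in status_report("http://localhost:9090"):
        print(line)
```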

Decision Date: July 14, 2022

  • At this point, we have decided to use self-managed (OSS) Prometheus together with Prometheus Operator, Grafana (visualization), and Thanos (persistent data management) as the monitoring stack for our infrastructure.