Tasks | Resources/PR | Tickets |
---|---|---|
Initial Presentation Deck | https://docs.google.com/presentation/d/1xTFg2_U1yAT5pdAUMl7F4APb-zjwmgJXpUxNMl5S24w/edit?usp=sharing | SM-1424 - PoC of monitoring Resolved |
PoC on Monitoring - Google Cloud | See subtasks in user stories SM-1424 - PoC of monitoring Resolved | SM-1424 - PoC of monitoring Resolved |
PoC on Google Managed Prometheus | See ticket SM-2036 - Implement dashboards Cancelled for linked PRs and commits | SM-1425 - Implement monitoring to dev env base on PoC Cancelled |
Decision:
GMP may not be mature enough to support our monitoring needs at this time (July 14, 2022).
See the issue we raised about prometheus-engine and the answers from the Google developers: https://github.com/GoogleCloudPlatform/prometheus-engine/issues/278
Known issues when using the Google Managed Prometheus frontend (prometheus-engine):
The /alerts endpoint is not implemented, by design.
When using a regular Prometheus server, your scrape configs, data, queries, and alerts are all colocated on the same server. However, we've decoupled all these in GMP - scrape configs live on the managed collectors, data lives in Monarch and queries are executed against Monarch, and rules live on the rule-evaluator components that live within your clusters (which executes queries against Monarch and writes back results).
The GMP data source that Grafana uses is hooked up to Monarch so that it can execute queries against the global data store. However, as rules are not installed into Monarch, the /alerts endpoint wouldn't have any data even if implemented.
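To make the decoupling concrete, the following toy model (our own illustration, not part of the reply) records which component handles each concern in the two setups, using only the components named above. The dictionary keys and the `colocated` and `alerts_visible_to_query_frontend` helpers are hypothetical names. It shows why an /alerts view on the Monarch-backed query path has nothing to report: the rules live on a different component.

```python
# Toy model of where each monitoring concern lives. Component names come from
# the GMP reply; the keys and helpers are hypothetical, for illustration only.
SELF_MANAGED_PROMETHEUS = {
    "scrape_configs": "Prometheus server",
    "data": "Prometheus server",
    "queries": "Prometheus server",
    "rules": "Prometheus server",
}

GOOGLE_MANAGED_PROMETHEUS = {
    "scrape_configs": "managed collectors (in cluster)",
    "data": "Monarch",
    "queries": "Monarch",
    # The rule-evaluator queries Monarch and writes results back, but the rule
    # definitions themselves stay in the cluster.
    "rules": "rule-evaluator (in cluster)",
}


def colocated(layout, concern_a, concern_b):
    """Return True if both concerns are handled by the same component."""
    return layout[concern_a] == layout[concern_b]


def alerts_visible_to_query_frontend(layout):
    """An /alerts view on the query path can only report rules that the
    query-executing component itself knows about."""
    return colocated(layout, "queries", "rules")


if __name__ == "__main__":
    for name, layout in [("self-managed", SELF_MANAGED_PROMETHEUS),
                         ("GMP", GOOGLE_MANAGED_PROMETHEUS)]:
        print(name, alerts_visible_to_query_frontend(layout))
```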
You can get alerts by running kubectl describe Rules/ClusterRules/GlobalRules in each namespace where you have them installed.
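Until a global status page exists, the workaround in the reply can be scripted. The sketch below is our own, not from the thread, and rests on assumptions: that the rule CRDs are registered under the `monitoring.googleapis.com` API group, and that `Rules` is namespaced while `ClusterRules` and `GlobalRules` are cluster-scoped (so those two are described once rather than per namespace). The function names are hypothetical; running it requires `kubectl` on PATH and a kubeconfig for the target cluster.

```python
import shutil
import subprocess

# Assumption: GMP rule CRDs live in the monitoring.googleapis.com API group.
# Rules is namespaced; ClusterRules and GlobalRules are assumed cluster-scoped.
NAMESPACED_KINDS = ("rules.monitoring.googleapis.com",)
CLUSTER_SCOPED_KINDS = (
    "clusterrules.monitoring.googleapis.com",
    "globalrules.monitoring.googleapis.com",
)


def build_describe_commands(namespaces):
    """Return the kubectl argv lists needed to inspect every rule resource."""
    commands = [
        ["kubectl", "describe", kind, "--namespace", ns]
        for ns in namespaces
        for kind in NAMESPACED_KINDS
    ]
    commands += [["kubectl", "describe", kind] for kind in CLUSTER_SCOPED_KINDS]
    return commands


def describe_all_rules(namespaces):
    """Run each command and return {command string: output or error text}."""
    if shutil.which("kubectl") is None:
        raise RuntimeError("kubectl not found on PATH")
    results = {}
    for argv in build_describe_commands(namespaces):
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[" ".join(argv)] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


if __name__ == "__main__":
    # Only prints the commands; call describe_all_rules() to actually run them.
    for argv in build_describe_commands(["default", "monitoring"]):
        print(" ".join(argv))
```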
I agree that a global status page showing what rules you have and their current state would be useful - we'll chew this over.
Re: /rules, the instance of Grafana that runs queries against Monarch is not hooked up to anything that runs rules. There is no way to get rules into Grafana right now.
Re: the 400, can you hover over the pink ! and tell me what error Grafana returns?
Re: the Grafana error "1m0", that's a known issue and we're working on it: #247
Re: /rules, that page simply doesn't work in a Grafana instance that is hooked up to Monarch. Given the architecture of GMP, it's unlikely to ever work. We will have to come up with a replacement way to see what rules are installed and what the status of each is, likely within Google Cloud Console.
Re: /query_exemplars, we are working on exemplars this half, so that will be functional soon enough.
Decision Date: July 14, 2022
- At this point, we have decided to use self-managed (OSS) Prometheus together with Prometheus Operator, Grafana (visualization), and Thanos (for managing persistent data) as the monitoring stack for our infrastructure.
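One practical consequence of this choice is that the standard Prometheus HTTP API is available again, including `GET /api/v1/alerts`, the endpoint that the GMP frontend does not implement. The sketch below assumes a reachable Prometheus server at a base URL you supply; `fetch_alerts`, `summarize_alerts`, and the alert names in the sample body are hypothetical, and the sample is hand-written in the documented response shape, not real data.

```python
import json
import urllib.request
from collections import Counter


def fetch_alerts(base_url, timeout=10):
    """GET <base_url>/api/v1/alerts from a self-managed Prometheus server."""
    url = base_url.rstrip("/") + "/api/v1/alerts"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_alerts(payload):
    """Count active alerts by state ("pending" or "firing") in an
    /api/v1/alerts response body."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus returned status {payload.get('status')!r}")
    alerts = payload.get("data", {}).get("alerts", [])
    return Counter(alert.get("state", "unknown") for alert in alerts)


if __name__ == "__main__":
    # Hand-written example body (hypothetical alert names, not real data).
    example = {
        "status": "success",
        "data": {"alerts": [
            {"labels": {"alertname": "HighErrorRate"}, "state": "firing"},
            {"labels": {"alertname": "DiskFillingUp"}, "state": "pending"},
            {"labels": {"alertname": "PodCrashLooping"}, "state": "firing"},
        ]},
    }
    print(dict(summarize_alerts(example)))
```

Keeping the HTTP call and the parsing separate means the parsing logic can be checked without a running server.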
Page Title | Decisions |
---|---|
Customer profile view - Consent Management | |
Google Cloud Monitoring - POC | |