eCore Banking - handover part 1 - notes

Part 1: November 24, 2022

Recordings: https://advancegroup.larksuite.com/drive/folder/fldushkYIDbPSadK5zhUgz6bHvc?from=space_persnoal_filelist

Attendees:

Lucky La Torre (Unlicensed) BharathKumar D Gnanasekaran Gajendiran Fol Justin Lacsina (Unlicensed) Regin Villamor (Unlicensed) Joebert Jacaba (Unlicensed) Pavol Antalík (Unlicensed) Peter Kmec (Unlicensed) Peter Luknár (Unlicensed)

Thought Machine in Sandbox Environments

  • What are the main differences between TM 3, 4, 5 and 6 (in terms of how they were installed)?

    • None, no differences. All TM sandboxes followed this guide: Thought Machine for SandBox TM-3 (Manual Installation)

    • Yes, all of them are sandbox installations

    • Are TM SB 5 and TM SB 6 now pointed to HashiCorp Vault Enterprise, the CICD Vault, or their own HashiCorp Vault installed in the same TM GKE cluster?

      • Answer: Each uses its own HashiCorp Vault installed in the same GKE cluster via the official Helm chart (helm repo add hashicorp https://helm.releases.hashicorp.com); a minimal install sketch follows.
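
      • A minimal install sketch, assuming the open-source hashicorp/vault Helm chart in dev mode; the namespace, release name and chart version are placeholders, not taken from the actual setup:

        helm repo add hashicorp https://helm.releases.hashicorp.com
        helm repo update
        helm install vault hashicorp/vault \
          --namespace vault --create-namespace \
          --version <chart-version> \
          --set server.dev.enabled=true   # for HA mode instead: --set server.ha.enabled=true --set server.ha.replicas=2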

    • The Kafka being used by TM SB 5 and 6 right now is its own Kafka, installed locally (in its own GKE cluster) via Helm

      • it is deployed using the TM Vault installer

    • Available documentation

    • Installation of TM Vault

      • we are still using dev mode, not HA mode

      • for the sandboxes we are using dev mode

        • one pod for every TM service component

      • for environment installations (which also include dev) we are using HA mode (replicas >= 2)

    • TM Sandbox 3 - safi-sandbox-tm3

      • Being utilized by which env?

        • Dev still uses it for some integration testing

      • What is the version - v3.3.1?

        • yes, the same version for all sandboxes

      • Do the components differ from the other sandbox TMs?

        • No, the components are the same across all sandboxes

    • TM Sandbox 4 - safi-sandbox-tm4

      • Being utilized by which env?

        • epfs stage

      • What is the version - v3.3.1?

        • same version across all sandboxes

    • TM Sandbox 5 - safi-sandbox-tm5

      • Being utilized by which env?

        • brave

      • What is the version - v3.3.1?

        • same for all sandboxes

    • TM Sandbox 6 - safi-sandbox-tm6

      • Being utilized by which env?

        • this is being prepped for the new stage environment

      • What is the version - v3.3.1?

        • same version

    • TM Vault Stack - Components

      • TM Vault vs Hashicorp Vault vs CICD Vault

      • 💡 TM VAULT IS NOT THE SAME AS HASHICORP VAULT; TM Vault is the Thought Machine platform itself.

        • TM calls their platform Vault - DO NOT BE CONFUSED. 🫣

        • TM Vault uses its own HashiCorp Vault

          • it has its own HashiCorp Vault (open source) running in the same GKE cluster

        • HCV uses a Google Cloud Storage bucket as the secrets storage backend (a config sketch follows at the end of this block)

        • HCV also uses another GCS bucket for backups

        • a CronJob backs up the secrets bucket into the backup bucket (details in the Hashicorp vault section below)

        • one TM component (vault-documents-webserver) uses a persistent volume

        • Why are we not utilizing HashiCorp Vault Enterprise in any of the TM Vault sandbox clusters?

          • Are there any compatibility issues?

          • we are not running the Enterprise edition of HCV - we only purchased the HCV Enterprise license to get support from HashiCorp

        • compatibility - we are using HashiCorp Vault version 1.6 because it is a requirement of TM 3.3.1 (applies to all sandboxes 3-6)

        • what do we store here?

          • all authentication-related entries are stored here
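
        • A sketch of the GCS storage stanza in the HashiCorp Vault server configuration (the bucket name is a placeholder; per the Hashicorp vault section below, the sandboxes use a PVC instead, so this applies to the GCS-backed instance):

          # Vault server configuration excerpt: GCS bucket as the storage backend
          storage "gcs" {
            bucket = "<hcv-secrets-bucket>"
          }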

      • TM Vault still in Sandbox

        • Why are we still in a sandbox environment?

          • because the sandbox installation is where we can most easily install and test

          • also because we are upgrading from version 3.3.1 to 4.4.1?

      • TM Vault and its Kafka

        • Which Kafka are we using in each of the sandboxes?

          • we are using the internal Kafka (internal to TM Vault); see the verification sketch at the end of this Kafka block

            • installation of the kafka itself is included in the vault installer as a parameter

          • kafka-init

            • creates 5600 ACLs

          • so far there have been no issues with the internal Kafka in the sandboxes

            • but it is NOT SUPPORTED by TM, so using the internal Kafka is not recommended

          • the internal Kafka uses persistent volumes in GKE (for both Kafka and ZooKeeper)

            • storage class - persistent disk

        • What are the plans once we launch it in a non-sandbox environment?

          • Are we going to use the Confluent Cloud Kafka for TM Vault?

            • in Dev we are testing an external Kafka (in this context, a Kafka that is not included in the TM Vault installer)
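
        • A quick verification sketch for the internal Kafka; the namespace, pod name and exact CLI names (kafka-topics vs kafka-topics.sh) depend on the actual deployment and image, so treat them as assumptions:

          # list the persistent volume claims created for Kafka and ZooKeeper
          kubectl get pvc -n <tm-namespace> | grep -Ei 'kafka|zookeeper'
          # list topics and count ACLs from inside a broker pod
          # (may additionally need --command-config with client credentials)
          kubectl exec -n <tm-namespace> <kafka-broker-pod> -- kafka-topics --bootstrap-server localhost:9092 --list
          kubectl exec -n <tm-namespace> <kafka-broker-pod> -- kafka-acls --bootstrap-server localhost:9092 --list | wc -l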

      • TM Vault and Monitoring

        • Please include in the doc how we can enable the observability package (+ Pavol Antalík (Unlicensed))

        • Are we using Google Managed Monitoring?

          • No, because TM vault comes with its own observability package

        • TM vault comes with its own observability package

          • comes with its own images of Grafana and Thanos

          • it is a package that can be installed with the Vault installer

          • it includes metrics on TM's internal workings as well as Kubernetes (GKE) metrics

            • the included Thanos service can be added as a new datasource in the existing Grafana (a provisioning sketch follows at the end of this monitoring block)

            • Does TM expose Prometheus metrics?

            • Possibly; we would need to unpack their observability package to check. The easier route is to just use their observability package, or to contact TM support for further recommendations.

          • We could perhaps keep our own Grafana and use the TM Thanos as another datasource.

          • For persistent volumes

            • Thanos and Prometheus create PVCs - see the sandbox GKE clusters for more details.

        • Can we implement the same monitoring stack we have with other GKE clusters in TM Vault GKE?

          • This should be possible by adding Thanos as another datasource in each environment's Grafana, but it has not been tested in our TM Vault sandboxes or even in the TM Vault we are using in the Dev env.

          • Tempo

            • We need to check which monitoring tools are included in the TM Observability Package, as it may already include some tracing.

            • Pavol Antalík (Unlicensed): in the document, can you please add the credentials and URLs for the Grafana instances running in Sandbox 5, Sandbox 6 and in Dev.
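
        • A sketch of adding the TM Thanos as a Grafana datasource via a datasource provisioning file; the service name, namespace and port are assumptions (Thanos Query exposes a Prometheus-compatible API):

          apiVersion: 1
          datasources:
            - name: tm-thanos
              type: prometheus
              access: proxy
              url: http://<thanos-query-service>.<tm-namespace>.svc.cluster.local:9090
              isDefault: false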

      • TM Vault and Tracing

        • not installed

        • We need to check which monitoring/tracing tools are included in the TM Observability Package, as it may already include some tracing.

      • TM Vault and Logging

        • We are using Google Cloud Logging; GKE Cloud Logging is enabled in the sandboxes and in Dev.

        • Can we use Promtail in the same cluster?

          • will there be any issues if we implement the same Promtail setup as on the rest of the GKE clusters?

            • We haven't really tested it, but most likely we can use Promtail in TM since these are GKE clusters as well (a deployment sketch follows)
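
        • A deployment sketch, assuming the grafana/promtail Helm chart pushing to an existing Loki endpoint; the Loki URL and namespace are placeholders, and the values layout can differ between chart versions:

          helm repo add grafana https://grafana.github.io/helm-charts
          helm repo update
          helm upgrade --install promtail grafana/promtail \
            --namespace promtail --create-namespace \
            --set "config.clients[0].url=http://<loki-gateway>/loki/api/v1/push"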

      • TM Vault and CICD Pipeline

        • If we need to upgrade/update any images in the TM GKE cluster, will there be any issues if we integrate it with the existing Argo CD cluster?

          • if we need to update the container images, there is a GitHub Actions workflow that copies all the images from the TM registry to the SAFI repos (an illustrative sketch follows below)

          • No automated way of installing TM so far

            • we have asked support whether there are Helm charts, but honestly it is not yet automation friendly

            • as the Vault installer deploys hundreds of unique pods
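
          • The actual copy logic lives in the GHA workflow; purely as an illustration, copying a single image boils down to something like this (registry hosts, image name and tag are placeholders):

            SRC=<tm-registry>/<image>:<tag>
            DST=<safi-registry>/<image>:<tag>
            docker pull "$SRC"
            docker tag "$SRC" "$DST"
            docker push "$DST"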

      • TM Vault and Cert Manager

        • We are still utilizing Let's Encrypt and we have not encountered the rate-limiting issues here so far. We have not utilized ZeroSSL here, but we don't see any reason why we can't in the future (a sample ClusterIssuer sketch follows).
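
        • A sample Let's Encrypt ClusterIssuer sketch; the issuer name and email are placeholders, and the HTTP-01 solver assumes the nginx ingress class used in the sandboxes:

          apiVersion: cert-manager.io/v1
          kind: ClusterIssuer
          metadata:
            name: letsencrypt-prod
          spec:
            acme:
              server: https://acme-v02.api.letsencrypt.org/directory
              email: <ops-email>
              privateKeySecretRef:
                name: letsencrypt-prod-account-key
              solvers:
                - http01:
                    ingress:
                      class: nginx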

      • TM Vault and Ingress Controller

        • Why can't we use Traefik? Why are we using nginx-ingress?

          • for the sandboxes we are using NGINX, as this was the default before

          • for Dev we are going to use Traefik

      • TM and ISTIO

        • it is not a required component, but it is enabled and installed in all TM sandboxes and in TM Dev

        • in all sandboxes Istio comes with the Vault installer as an additional parameter

        • functionality-wise, we did not do any manual configuration of the Istio that came with the installer; everything is default

      • TM and Database

        • Cloud SQL PostgreSQL 13

          • It is configurable, but all sandboxes use the default postgres user of the PostgreSQL database

          • We haven't tried a custom DB user so far, and there may or may not be issues if we do, so it is better to get in touch with TM Support for assistance.

          • Prior to the installation, the DB instance should be created first (a gcloud sketch follows below).
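
          • A sketch of creating the Cloud SQL instance up front; the instance name, project, region and tier are assumptions:

            gcloud sql instances create <tm-db-instance> \
              --project=<gcp-project> \
              --database-version=POSTGRES_13 \
              --region=<region> \
              --tier=<machine-tier>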

TM Version 4.4.1 testing

  • we are testing this version of TM in the dev env

    • project is safi-env-dev-tms

      • It is installed in a gke cluster

      • We are testing an external Kafka for this installation (a Kafka that is not part of the TM Vault installer, regardless of whether it is a managed or self-managed Kafka cluster)

      • testing whether the configuration is all correct

      • http://ops.tms.dev.safibank.online/

      • documentation is not yet complete, but there are starter docs for the TM 4 k8s operator

    • we are testing the installation using the k8s operator

      • Pavol Antalík (Unlicensed): if the installation process is different for this TM (not the same process as what we did for the sandbox TMs), can you document the process that you followed, as there are a couple of manual steps as well?

    • Hashicorp vault is installed in a different gke cluster

      • The version is 1.9

      • TM Vault 4.4.1 is compatible with HashiCorp Vault version 1.9 and can use up to KV secrets engine version 2 (example commands below)
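
      • Example HCV commands for KV version 2; the mount path and secret path are placeholders:

        vault secrets enable -path=secret -version=2 kv
        vault kv put secret/tm/example username=demo
        vault kv get secret/tm/example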

    • What are the open issues with the upgrade?

      • when we issue the install command with Kafka as a parameter, the Vault installer finishes but not all topics are created (an operator issue?)

      • are we testing Confluent Cloud Kafka here?

        • Not yet

    • Traefik installation on TM with observability package enabled

    • CURRENTLY, it is still the internal Kafka

      • fyi

      • there are issues aligning the Kafka authentication methods with the TM authentication methods

      • several authentication methods are being tested (for Kafka authentication); a client-properties sketch follows at the end of this section

        • PLAIN / plaintext - TM uses this

        • SASL_SSL - the Confluent operator uses this

        • OAuth - this is the target mode (supported by both Confluent Cloud Kafka and TM)

      • in order to use OAuth with Confluent Cloud we have to go through Okta OIDC; we needed to recreate the Confluent Cloud cluster as a dedicated Confluent Cloud Kafka cluster, which is already done.

        • types of cluster

          • Basic (which we had been using) only provides the PLAIN auth method

          • for OAuth to work, we switched to a dedicated Confluent Cloud Kafka cluster

            • recreating the cluster was the only way and we have already done that (done). Peter Kmec (Unlicensed), can you confirm that there are no more issues related to this new dedicated Kafka cluster?
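
      • A client-properties sketch of the authentication modes discussed above; endpoints and credentials are placeholders, and a complete OAUTHBEARER setup also needs a login callback handler, which is omitted here:

        # PLAIN over SASL_SSL (what the basic Confluent Cloud cluster offered)
        security.protocol=SASL_SSL
        sasl.mechanism=PLAIN
        sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<api-key>" password="<api-secret>";

        # OAuth / OAUTHBEARER via Okta OIDC (target mode on the dedicated cluster)
        security.protocol=SASL_SSL
        sasl.mechanism=OAUTHBEARER
        sasl.oauthbearer.token.endpoint.url=https://<okta-domain>/oauth2/<auth-server-id>/v1/token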

Hashicorp vault

  • Existing doc - Hashicorp Vault

    • For all sandbox TMs, the HashiCorp Vault instances are deployed manually in the same GKE cluster using Helm

    • Sandboxes - we use a PVC as the secrets storage backend

    • For Dev TM - we use a GCS bucket as the secrets storage backend

      • There is a CronJob running to back up the contents of this bucket to another bucket - see argocd/environments/dev/hcv/vault/templates/cronjob.yaml (a sketch of the core command follows below)

      • Dockerfile is available in docker/hashicorp_vault/vault_gcs_backup
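
      • The real job is the CronJob manifest referenced above; as an illustration only, the core of such a backup boils down to something like this (bucket names are placeholders):

        gsutil -m rsync -r gs://<dev-hcv-secrets-bucket> gs://<dev-hcv-backup-bucket>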

    • any manual config?

    • CICD vault is installed and setup by Aleksandr Kanaev (Unlicensed)

    • Environment Vaults - argocd/environments/brave/hcv/vault/values.yaml

      • Why are we running Vault instances in each environment (each running in its own GKE cluster, one per env: brave, stage, new stage, etc.)?

      • The reason is so that developers can integrate directly with Vault in the environment (via Micronaut)

      • The initial plan is for the applications and the Thought Machine installation specific to that environment to use these HCVs, not the CICD Vault.

    • Future plans on multiple vaults?

      • each env should have its own single HCV to be used by the apps and TM

      • one for CICD and one per env (a total of 4 if the envs are dev, staging, and prod)

    • Outstanding issues with TM in Dev using version 4.4.1 - Please confirm Pavol Antalík (Unlicensed)

      • Do both the internal and external Kafka have issues right now?

        • Internal - the StatefulSet pods are restarting? Do you know what the reason is?

      • External Kafka - do we still have any issues here that we know of or have tested, now that the Confluent Cloud Kafka is running on a dedicated cluster?

    • Peter Kmec (Unlicensed) Are we using the HCV Enterprise license in any of the HCV clusters? Which one? All? Did we enable any Enterprise features in any of the HCVs?

      • Not yet.

3 stages in setting up the HCV

Tokens handover

  • Root tokens for all the hashicorp vaults