eCore Banking - handover part 1 - notes
Part 1: November 24, 2022
Recordings: https://advancegroup.larksuite.com/drive/folder/fldushkYIDbPSadK5zhUgz6bHvc?from=space_persnoal_filelist
Attendees:
Lucky La Torre (Unlicensed) BharathKumar D Gnanasekaran Gajendiran Fol Justin Lacsina (Unlicensed) Regin Villamor (Unlicensed) Joebert Jacaba (Unlicensed) Pavol Antalík (Unlicensed) Peter Kmec (Unlicensed) Peter Luknár (Unlicensed)
Thought Machine in Sandbox Environments
What are the main differences between TM 3, 4, 5, and 6 (in terms of how they were installed)?
None. No differences. All TM sandboxes followed the same process documented in Thought Machine for SandBox TM-3 (Manual Installation)
Yes, all of them are in sandbox
Are TM SB 5 and TM SB 6 now pointed to HashiCorp Vault Enterprise, the CICD Vault, or their own HashiCorp Vault installed in the same TM GKE cluster?
Answer: It is their own HashiCorp Vault, installed in the same GKE cluster using the Helm chart
helm repo add hashicorp https://helm.releases.hashicorp.com
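For reference, a minimal sketch of how such an in-cluster install could look with the official chart; the release name, namespace, and pinned image tag below are assumptions, not the exact values used in the sandboxes:
helm repo update
# Install HashiCorp Vault into the TM GKE cluster; TM 3.3.1 requires the 1.6 line (exact patch tag is an assumption)
helm install hcv hashicorp/vault \
  --namespace hashicorp-vault --create-namespace \
  --set server.image.tag=1.6.3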
The Kafka being used by TM 5 & 6 right now is its own locally installed Kafka (running on the TM's own GKE cluster)
installed using the TM Vault installer
Available Documentations
Thought Machine for SandBox TM-3 (Manual Installation)
Is this the same installation process used for installing TM Sandbox 5 and 6?
Answer: Yes
Are we involved with smart contracts in TM? Not actively, but see Deploying and testing TM resources for more info
Vendor Documentation
There's a Lark group (TM Working Group); for support, see Thought Machine - Service Procedure Framework
https://docs.thoughtmachine.net/
How are you accessing this? It seems like we need their Okta? Pavol Antalík (Unlicensed)
According to TM Support, we have an internal doc that should be included with the Vault installation. Can you share the URL?
Message From TM Support, see below
Alternatively, we will encourage that you should use the internal documentation URL that is shipped with the Vault release. This will have the version of documentation that specifically matches the version that you have installed. The URL is defined in the documentation.endpoint within the values.yaml.
How to access TM
For Dev
login credentials
Installation of TM Vault
we are still using dev mode and not HA
for the sandboxes we are using dev mode
one pod for every TM service component
for the environments (which also includes Dev) we are using HA mode (replicas >= 2); a quick replica check is sketched below
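A quick way to confirm which mode a given TM cluster is running (the namespace below is an assumption; replace it with the actual TM namespace) - dev mode should show 1 replica per component, HA mode 2 or more:
# List TM deployments with their configured replica counts
kubectl get deployments -n tm-vault \
  -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas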
TM Sandbox 3 -
safi-sandbox-tm3
Being utilized by which env?
Dev still uses it for some integration testing
What is the version - v3.3.1?
yes same for all sandboxes
Do the components differ from the other sandbox TMs?
No. They are all the same across all sandboxes
TM Sandbox 4 -
safi-sandbox-tm4
Being utilized by which env?
epfs stage
What is the version - v3.3.1?
same version across all sandboxes
TM Sandbox 5 -
safi-sandbox-tm5
Being utilized by which env?
brave
What is the version - v3.3.1?
same for all sb
TM Sandbox 6 -
safi-sandbox-tm6
Being utilized by which env?
this is being prepped for new stage
What is the version - v3.3.1?
same version
TM Vault Stack - Components
TM Vault vs Hashicorp Vault vs CICD Vault
💡TM VAULT IS NOT THE SAME AS HASHICORP VAULT; TM Vault is the Thought Machine platform itself.
TM calls their platform Vault, DO NOT BE CONFUSED. 🫣
TM Vault uses its own HashiCorp Vault
it has its own HashiCorp Vault (open source) running in the same GKE cluster
HCV is using a Google Cloud Storage bucket for the secrets
HCV is also using another GCS bucket for backups
is there a script or a cronjob for the backup? (see the HashiCorp Vault section below: a cronjob backs up the bucket)
one component of TM uses a persistent volume (vault-documents-webserver)
Why are we not utilizing HashiCorp Vault Enterprise in any of the TM Vault sandbox clusters?
Are there any compatibility issues?
we're not using the Enterprise edition for HCV - we only purchased the HCV Enterprise license for support from HashiCorp
compatibility - we are using HashiCorp Vault version 1.6 because this is a requirement of TM 3.3.1 (for all sandboxes 3-6)
what do we store here?
all authentication-related entries are stored here
TM Vault still in Sandbox
Why are we still in a sandbox environment?
because the sandbox installation is where we can easily install and test
and because we are upgrading from version 3.3.1 to 4.4.1?
TM Vault and its Kafka
Which Kafka are we using in each of the sandboxes?
we are using internal kafka (internal to tm vault)
installation of the kafka itself is included in the vault installer as a parameter
kafka-init
creates 5600 acls
so far no issues on internal kafka in sandbox
but IT'S NOT SUPPORTED by TM, so it's not recommended to use the internal Kafka
the internal Kafka uses persistent volumes in GKE (for Kafka and ZooKeeper)
storage class - persistent disk (a quick check is sketched below)
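As a quick check (the namespace is an assumption), the PVCs and storage class backing the internal Kafka and ZooKeeper can be inspected with:
# PVCs created by the Kafka/ZooKeeper statefulsets
kubectl get pvc -n kafka
# Storage classes available in the cluster (expecting a persistent-disk backed class)
kubectl get storageclass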
What are the plans once we launch it in a non-sandbox environment?
Are we going to use the Confluent Cloud Kafka for TM Vault?
in Dev we are testing an external Kafka (in this context, a Kafka that is not included in the TM Vault installer)
TM Vault and Monitoring
Please include in the doc - how we can enable the observability package Pavol Antalík (Unlicensed)
Are we using Google Managed Monitoring?
No, because TM Vault comes with its own observability package
TM Vault comes with its own observability package
it comes with its own images of Grafana and Thanos
it's a package that can be installed with the Vault installer
it includes metrics on TM's internal workings as well as Kubernetes (GKE) metrics
the Thanos service that is included can be added as a new datasource in the existing Grafana
Does TM provide Prometheus metrics?
Maybe, if we look into and unpack their observability package. The easier route is to use their observability package as-is, or contact TM Support for more recommendations.
We could have our own Grafana and use the TM Thanos as another datasource, perhaps.
For persistent volumes
Thanos and Prometheus create PVCs - see the sandbox GKE clusters for more details.
Can we implement the same monitoring stack we have on other GKE clusters in the TM Vault GKE?
This is possible by adding Thanos as another datasource in each environment's Grafana, but it has not been tested in our TM Vault sandboxes or even in the TM Vault we are using in the Dev env (a sketch is below).
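A minimal sketch of adding the TM Thanos as a Prometheus-type datasource to an existing Grafana via the Grafana HTTP API; the Grafana host, credentials, and Thanos query address are assumptions, and this has not been tested against our TM clusters:
# Register the TM Thanos query endpoint as a new Prometheus datasource
curl -s -X POST "https://<grafana-host>/api/datasources" \
  -u admin:<password> \
  -H "Content-Type: application/json" \
  -d '{"name": "TM Thanos", "type": "prometheus", "access": "proxy", "url": "http://<thanos-query-service>:9090"}'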
Tempo
We may need to check which monitoring tools are included in the TM Observability Package, as it may include some tracing already.
Pavol Antalík (Unlicensed): in the document, can you add the credentials and URLs for the Grafana running in Sandbox 5, Sandbox 6, and the one in Dev, please?
TM Vault and Tracing
not installed
We may need to check which monitoring/tracing tools are included in the TM Observability Package, as it may include some tracing already.
TM Vault and Logging
We are using Google Cloud Logging; GKE Google Cloud Logging is enabled in the sandboxes and Dev.
Can we use Promtail in the same cluster?
Will there be any issues if we implement the same Promtail setup as on the rest of the GKE clusters?
We haven't really tested this, but most likely we can use Promtail in TM since these are GKE clusters as well (a sketch is below)
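If we do try it, a minimal sketch of installing Promtail into the TM GKE cluster with the grafana/promtail chart; the namespace, Loki push URL, and the exact chart value path are assumptions and may differ by chart version:
helm repo add grafana https://grafana.github.io/helm-charts
# Point Promtail at the existing Loki endpoint (URL is a placeholder)
helm upgrade --install promtail grafana/promtail \
  --namespace monitoring --create-namespace \
  --set "config.clients[0].url=http://<loki-gateway>/loki/api/v1/push"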
TM Vault and CICD Pipeline
If we need to upgrade/update any images in the TM GKE cluster, will there be any issues if we integrate it with the existing ArgoCD cluster?
if we need to update the container images, there is a GHA that should get all the images from the TM registry to the SaFi repos
https://github.com/SafiBank/SaFiMono/blob/main/.github/workflows/mirror-vault-4.4.1-tm-images.yml
But this GHA pulls all the images from the TM private container registry and mirrors them to our container registry (a manual one-image mirror is sketched at the end of this section).
There is no ArgoCD implementation in place yet for updating individual TM Vault container images.
We haven't done any single-image update on any TM component, nor any major or minor version upgrades, as everything so far has been a clean install
There is no automated way of installing TM so far
we have asked support if there are Helm charts but, to be honest, it's not yet automation friendly
as the Vault installer deploys hundreds of unique pods
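The manual one-image mirror mentioned above is only a sketch in case we ever need to move a single image outside the GHA; the registry paths, image name, and tag are placeholders, not actual values:
# Pull from the TM private registry, retag, and push to our registry
docker pull <tm-registry>/<image>:<tag>
docker tag <tm-registry>/<image>:<tag> <safi-registry>/<image>:<tag>
docker push <safi-registry>/<image>:<tag>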
TM Vault and Cert Manager
We are still utilizing Let's Encrypt and we are not encountering the rate-limiting issues here so far. We haven't utilized ZeroSSL here, but we don't see any reason why we can't in the future.
TM Vault and Ingress Controller
Why can't we use Traefik? Why are we using nginx-ingress?
for the sandboxes we're using nginx as this was the default before
for Dev we are going to use Traefik
TM and ISTIO
it's not a required component, but it is enabled and installed in all TM sandboxes and in TM Dev
in all sandboxes Istio comes with the Vault installer as an additional parameter
functionality-wise, we didn't do any manual config on the Istio that came with the installer; everything is default (a quick check is sketched below)
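A quick check of the default Istio install that came with the installer, assuming it uses the standard istio-system namespace:
# Control-plane and gateway pods deployed by the installer defaults
kubectl get pods -n istio-system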
TM and Database
Cloud SQL Postgres 13
It's configurable, but all sandboxes are using postgres, the default user of the PG database.
We haven't tried a different custom DB user so far, and there may or may not be issues we could encounter, so it's better to get in touch with TM Support for assistance.
Prior to the installation, we should create the DB instance first (a sketch is below).
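A minimal sketch of creating that Cloud SQL Postgres 13 instance before running the installer; the instance name, region, and machine tier are assumptions:
gcloud sql instances create <instance-name> \
  --database-version=POSTGRES_13 \
  --region=<region> \
  --tier=db-custom-2-7680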
TM Version 4.4.1 testing
we are testing this version of TM in the dev env
project is safi-env-dev-tms
It is installed in a gke cluster
We are testing an external Kafka for this installation (a Kafka that is not part of the TM Vault installer, regardless of whether it's a managed or self-managed Kafka cluster)
testing whether the configuration is all correct
doc not yet complete - but there are starter docs for the TM 4 K8s operator
we are testing installation using the K8s operator
Pavol Antalík (Unlicensed): If the installation process is different for this TM (not the same installation process as what we did for the sandbox TMs), can you document the process that you followed, as there are a couple of manual steps as well?
HashiCorp Vault is installed in a different GKE cluster
The version is 1.9
This TM Vault 4.4.1 is compatible with HashiCorp Vault version 1.9 and can use up to KV secrets engine version 2 (a sketch is below)
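A minimal sketch of enabling and exercising a KV v2 secrets engine on that HCV 1.9 instance; the mount path and example keys are assumptions:
# Mount a KV version 2 engine and write/read a test secret
vault secrets enable -path=secret -version=2 kv
vault kv put secret/example foo=bar
vault kv get secret/example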
What are the open issues with the upgrade?
when we issue the install command with Kafka as a parameter, the Vault installer finishes but not all topics are created (operator issue?) - a topic-listing check is sketched below
are we testing Confluent Cloud Kafka here?
Not yet
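The topic-listing check mentioned above, as a sketch under assumptions (the pod name, namespace, and exact name of the topics tool depend on the Kafka image in use):
# List the topics actually created after the installer finishes
kubectl exec -n <kafka-namespace> <kafka-broker-pod> -- \
  kafka-topics --bootstrap-server localhost:9092 --list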
Traefik installation on TM with observability package enabled
Are there any issues, Pavol Antalík (Unlicensed)? Kindly document them under the 4.4.1 installation doc.
CURRENTLY, it's still the internal Kafka
fyi
issues with Kafka authentication methods and TM authentication methods
there are authentication methods being tested (for Kafka authentication)
TM uses PLAIN or PLAINTEXT
SASL_SSL - the Confluent operator uses this
OAuth - this is the target mode (supported for CC Kafka and TM)
in order to use OAuth with Confluent Cloud we have to go through Okta OIDC; we needed to recreate the Confluent Cloud cluster as a dedicated Confluent Cloud Kafka cluster, which is already done.
types of cluster
Basic (which we were using) only provides the PLAIN auth method
for OAuth to work, we switched to a dedicated Confluent Cloud Kafka cluster
recreating the cluster is the only way and we already did that (done). Peter Kmec (Unlicensed), can you confirm that there are no more issues related to this new dedicated Kafka cluster?
HashiCorp Vault
Existing doc - Hashicorp Vault
In the TMs for all sandboxes, all HashiCorp Vault instances are deployed in the same GKE cluster manually using Helm
Sandboxes - we use a PVC for the secrets storage backend
For Dev TM - we use a GCS bucket for the secrets storage backend
There is a cronjob running to back up the contents of this bucket to another bucket - see argocd/environments/dev/hcv/vault/templates/cronjob.yaml (a sketch of the backup step is below)
The Dockerfile is available in docker/hashicorp_vault/vault_gcs_backup
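The backup step sketched below is only illustrative; the authoritative definition is the cronjob.yaml referenced above, and the bucket names are placeholders:
# Sync the HCV secrets bucket into the backup bucket
gsutil -m rsync -r gs://<hcv-secrets-bucket> gs://<hcv-backup-bucket>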
any manual config?
initialization of HashiCorp Vault (a sketch of the init commands is below)
where are the keys now? Pavol Antalík (Unlicensed) will share these for all HCVs that are deployed for Dev and the sandboxes
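The manual initialization mentioned above typically looks like this; the key shares/threshold shown are the Vault defaults, and the generated unseal keys and root token are exactly what needs to be handed over and stored safely:
# Initialize the new HCV instance and unseal it
vault operator init -key-shares=5 -key-threshold=3
# Repeat with different unseal keys until the threshold is reached
vault operator unseal <unseal-key>
vault login <root-token>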
The CICD Vault was installed and set up by Aleksandr Kanaev (Unlicensed)
Environment Vaults - argocd/environments/brave/hcv/vault/values.yaml
Why are we running Vault instances in each environment? (each running on its own GKE cluster, one for each env: brave, stage, new stage, etc.)
The reason is for developers to integrate directly with Vault in the environment (Micronaut)
A request was made by Bogdan Popa (Unlicensed) and Adam Brček (Unlicensed) to Pavol Antalík (Unlicensed) before, but we are not sure if there is any active development to utilise these separate Vaults running on their own GKE clusters (or what the next plans for these are).
The initial plan is for the applications and the Thought Machine specific to that environment to use these HCVs and not the CICD Vault.
Future plans on multiple Vaults?
each env should have its own single HCV to be used by apps and TM
one for CICD and one per env (total of 4 if the envs are dev, staging, prod)
Outstanding issues with TM in Dev using version 4.4.1 - please confirm, Pavol Antalík (Unlicensed)
Do both internal and external Kafka have issues right now?
Internal - StatefulSet pods are restarting? Do you know what the reason is?
External Kafka - do we still have any issues here that we know of or have tested, since the Confluent Cloud Kafka is now running in a dedicated cluster?
Peter Kmec (Unlicensed): Are we using the HCV Enterprise license in any of the HCV clusters? Which one? All? Did we enable any Enterprise feature in any of the HCVs?
Not yet.
3 stages in setting up the HCV
set up infra - GKE - and deploy HCV using Argo - all in code, see below
terraform/tf-env-hcvault-infra
argocd/environments/brave/hcv
generating the tokens needs to be done manually
There is Terraform code for configuring the roles and policies (a sketch of running it is below)
Is this still accurate or already obsolete, Pavol Antalík (Unlicensed)? devops/terraform/tf-env-hcvault-config/vault_tfc_policy.tf
Which policy is being used for devops/terraform/tf-env-hcvault-config/auth_kubernetes_tms.tf, Pavol Antalík (Unlicensed)?
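A hedged sketch of running that config Terraform once a token is available; the Vault address and token are assumptions, and the workspace/backend configuration should be checked before applying:
export VAULT_ADDR=https://<hcv-address>
export VAULT_TOKEN=<token-from-manual-init>
cd devops/terraform/tf-env-hcvault-config
terraform init
terraform plan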
Tokens handover
Root tokens for all the HashiCorp Vaults
Where can we find these?
Pavol Antalík (Unlicensed) to share these