SaFi Bank Space : Confluent Kafka Implementation

Organization / Environment management

The organization in confluent.io is set up manually.

All environments are managed in the dispatcher Terraform project, where, according to the list of defined environments, a Confluent environment is created with its respective environment-owner role. Credentials for this role are propagated to the respective tf-workspace environment in exactly the same way as for GCP projects.
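A minimal sketch of what this per-environment loop can look like with the official Confluent Terraform provider (the variable, resource names, and role name are illustrative, not the actual dispatcher code):

variable "environments" {
  type    = list(string)
  default = ["dev"] # stage is currently disabled due to resource costs
}

resource "confluent_environment" "env" {
  for_each     = toset(var.environments)
  display_name = each.key
}

resource "confluent_service_account" "env_owner" {
  for_each     = toset(var.environments)
  display_name = "${each.key}-env-owner"
}

# Bind the environment-owner role to the per-environment service account
resource "confluent_role_binding" "env_owner" {
  for_each    = confluent_environment.env
  principal   = "User:${confluent_service_account.env_owner[each.key].id}"
  role_name   = "EnvironmentAdmin"
  crn_pattern = each.value.resource_name
}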

Currently only the dev environment is created (due to resource costs); to add stage, the limiting conditions have to be removed.

Org / environment user & role management for DevOps users is automated and configured via https://github.com/SafiBank/SaFiMono/blob/main/devops/terraform/_files/users_devops.yaml#L363-L369

Environment user & role management for non-DevOps users is automated and configured via

https://github.com/SafiBank/SaFiMono/blob/5cefb3c0ea66ae113d7f1b720ceb2aa84c0a6cf2/devops/terraform/_files/users_confluent.yaml#L2 in the users-org collection. This is for the org level, in case a dev needs a permission like Environment Admin or similar.

Please note these are roles and users on the org / environment level, not the cluster level.

Developer access to a particular cluster, with read permission on everything, is handled here:

https://github.com/SafiBank/SaFiMono/blob/5cefb3c0ea66ae113d7f1b720ceb2aa84c0a6cf2/devops/terraform/_files/users_confluent.yaml#L12 in the users-cluster collection.
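A hypothetical sketch of what entries in users_confluent.yaml may look like; the field names below are illustrative, so check the linked file for the real schema:

users-org:
  - email: jane.doe@safibank.online    # hypothetical user needing an org/env-level role
    role: EnvironmentAdmin

users-cluster:
  - email: john.dev@safibank.online    # hypothetical user needing cluster read access
    cluster: dev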

The overall procedure for adding a new user involves manual steps (users can't be created via TF yet):

  1. The user changes / adds their permissions with a PR and assigns it to DevOps

  2. DevOps takes the email from the PR and creates the user account manually, without any permissions

  3. DevOps accepts the review to let the user know that the invitation email was sent

  4. The user accepts the invitation; after that, the user can merge the PR, which will trigger a TF run

Please note that if the user merges the PR and the Terraform run triggers before the user accepts the invitation, the run will fail.

Networking & Clusters

Networking & clusters are set up in each environment's particular tf project.

In order to use dedicated clusters in our shared-VPC network, some specific settings had to be put in place.

For that, VPC peering with the Confluent network is needed, and for encryption with a CMK in KMS we have to retrieve the environment's Confluent group id, so that we can bind the role for KMS usage. There is no documented way to obtain this value via the API (it is still defined in the old API, which is converted to the new format and is deprecated). For now, this value has to be retrieved from the Confluent console after the environment is created, via the 'add cluster' wizard. Do not actually create the cluster; just get to the point where the wizard gives you the group id value.
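Once retrieved, the group id can be wired into the key binding. A minimal sketch, assuming the group id is passed in as a variable and the CMK already exists in the project (resource names are illustrative):

variable "confluent_group_id" {
  description = "Confluent environment group, copied manually from the add-cluster wizard"
  type        = string
}

# Allow the Confluent group to use our customer-managed key for cluster encryption
resource "google_kms_crypto_key_iam_member" "confluent_cmk" {
  crypto_key_id = google_kms_crypto_key.kafka.id # assumed to be defined elsewhere
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "group:${var.confluent_group_id}"
}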

After that, the Kafka cluster, the schema registry, and the respective API and cluster keys are created and propagated to Vault.
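For illustration, a sketch of the cluster and key resources using the official Confluent provider and the Vault provider; the display names, region, Vault path, and the references to the network and service account resources are assumptions, not the actual project code:

resource "confluent_kafka_cluster" "main" {
  display_name = "safi-${var.env_name}"
  availability = "SINGLE_ZONE"
  cloud        = "GCP"
  region       = "asia-southeast1"

  dedicated {
    cku = 1
  }

  environment {
    id = data.terraform_remote_state.dispatcher.outputs.confluent_env[var.env_name].id
  }

  network {
    id = confluent_network.peered.id # the VPC-peered Confluent network from above
  }
}

resource "confluent_api_key" "cluster_key" {
  display_name = "kafka-${var.env_name}"

  owner {
    id          = confluent_service_account.operator.id
    api_version = confluent_service_account.operator.api_version
    kind        = confluent_service_account.operator.kind
  }

  managed_resource {
    id          = confluent_kafka_cluster.main.id
    api_version = confluent_kafka_cluster.main.api_version
    kind        = confluent_kafka_cluster.main.kind

    environment {
      id = data.terraform_remote_state.dispatcher.outputs.confluent_env[var.env_name].id
    }
  }
}

# Propagate the key to Vault so services can consume it
resource "vault_generic_secret" "kafka_key" {
  path = "secret/${var.env_name}/confluent/kafka" # hypothetical path

  data_json = jsonencode({
    api_key    = confluent_api_key.cluster_key.id
    api_secret = confluent_api_key.cluster_key.secret
  })
}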

Access to private confluent cluster

As the clusters are set up as private, a proxy had to be set up. Currently the proxy is set up for the Meiro connection only: all services have access via the shared VPC, and all devs have access via VPN. There is temporary public access for Meiro's devs; this has to be cancelled when it is no longer needed, after which Meiro will use this proxy with a private IP from their VPC. To use this proxy, its public IP is published in the Terraform outputs, and the local /etc/hosts has to be changed to point to the proxy IP, like this:

34.143.151.255 pkc-7kqno.asia-southeast1.gcp.confluent.cloud
34.143.151.255 pkac-qzjr6.asia-southeast1.gcp.confluent.cloud
34.143.151.255 b0-pkc-7kqno.asia-southeast1.gcp.confluent.cloud
34.143.151.255 b1-pkc-7kqno.asia-southeast1.gcp.confluent.cloud
34.143.151.255 b2-pkc-7kqno.asia-southeast1.gcp.confluent.cloud
34.143.151.255 b3-pkc-7kqno.asia-southeast1.gcp.confluent.cloud
34.143.151.255 b4-pkc-7kqno.asia-southeast1.gcp.confluent.cloud
34.143.151.255 b5-pkc-7kqno.asia-southeast1.gcp.confluent.cloud

This proxy is needed only for access from outside our GCP shared VPC.

Custom terraform provider

For manipulating the schema registry and creating schema API keys, a custom Terraform provider is used. This provider is deployed in the TFC private registry.

The source code, along with manual build and deployment instructions, is described in

https://github.com/SafiBank/safi-tf-providers/tree/main/terraform-provider-ccloud

This is a temporary solution until these resources are migrated to the official Confluent Terraform provider, or until a proper automated way of building & deploying is set up (GitHub Actions).

Issue on official provider - https://github.com/confluentinc/terraform-provider-confluent/issues/146

Once this issue is solved and a new version of the provider is released, the API keys can be recreated with the new official resources and this custom provider can be deleted.

Usage of the provider and its resources:

_main.tf:

terraform {
  required_version = ">= 1.1.2"

  required_providers {
    safi-ccloud = {
      source  = "app.terraform.io/safi/ccloud"
      version = "0.0.8"
    }
  }
}

_providers.tf:

## SaFi Confluent
provider "safi-ccloud" {
  cloud_api_key    = var.confluent_cloud_api_key
  cloud_api_secret = var.confluent_cloud_api_secret
}

(using the same keys as the official provider is OK)

example.tf:

resource "safi-ccloud_api_key" "registry_key" {
  logical_clusters = [
    confluent_schema_registry_cluster.package[0].id
  ]
  environment_id = data.terraform_remote_state.dispatcher.outputs.confluent_env[var.env_name].id
}

Confluent Operator

The operator is used to manage granular artifacts of Kafka clusters, like connectors, topics, schemas, and the RBAC based on them. The current Terraform provider integration is not at an acceptable level, which is why this approach was chosen as the better fit.

The operator is installed via an ArgoCD application:

https://github.com/SafiBank/SaFiMono/tree/main/devops/argocd/environments/brave/confluent-kafka/confluent-operator

The operator handles creation of Kafka artefacts in Confluent Cloud. For the connection it uses the Kafka API and Schema Registry API keys assigned to the Operator-Manager service account created per environment. For all possible configuration properties of all CRDs, the best way is to use

https://docs.confluent.io/operator/current/co-configure-overview.html#use-kubectl-to-examine-cp-crds

and list the particular CRD on K8s, as shown below.
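For example, assuming kubectl access to the cluster where the operator runs:

# List the Confluent Platform CRDs installed by the operator
kubectl get crds | grep platform.confluent.io

# Show all configurable fields of a particular CRD, e.g. KafkaTopic
kubectl explain kafkatopics.spec --recursive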

Topics & Schema management

Topic & schema generation is based on a single source-of-truth configuration file:

https://github.com/SafiBank/SaFiMono/blob/main/common/schema/topicSchemasDefinitions.json

This file is symlinked, together with the /schemas folder containing the schema definitions, into

https://github.com/SafiBank/SaFiMono/tree/main/devops/argocd/environments/brave/confluent-kafka/topics

which is a Helm chart deployed by ArgoCD. Each template contains a loop that iterates through the config file and creates the topic & schema CRs appropriately.
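A simplified sketch of such a template loop; the value names are illustrative, while the KafkaTopic CR itself follows the Confluent for Kubernetes CRD:

{{- range $topic := .Values.topicSchemasDefinitions }}
---
apiVersion: platform.confluent.io/v1beta1
kind: KafkaTopic
metadata:
  name: {{ $topic.name }}
spec:
  partitionCount: {{ $topic.partitions | default 6 }}
  configs:
    cleanup.policy: {{ $topic.cleanupPolicy | default "delete" | quote }}
{{- end }}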

For advanced topic configuration consult

https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html#ak-topic-configurations-for-cp

Kafka Connect & Connectors

In a similar fashion, Connect is deployed, for instance:

https://github.com/SafiBank/SaFiMono/tree/main/devops/argocd/environments/brave/confluent-kafka/kafka-connect

which can contain one or more Kafka Connect clusters. How, and whether, to use a separate Connect depends on the use case; one benefit would be separate scaling for individual connectors. The desired replica count is set in the Connect resource itself. For the list of successfully installed connectors, visit (Ably, for instance):

https://kafka-connect-ably.apps.brave.safibank.online/connectors

For debugging & the status of a connector, visit:

https://kafka-connect-ably.apps.brave.safibank.online/connectors?expand=status&expand=info

For each Kafka Connect, multiple connectors can be installed:

https://github.com/SafiBank/SaFiMono/tree/main/devops/argocd/environments/brave/confluent-kafka/connectors

Again, a Helm chart is used to take advantage of looping over the topic configuration file, for instance to derive which topic is assigned to which connector (as with Ably).
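A minimal sketch of a Connector CR as such a chart would render it; the connector class, topic, and names are hypothetical, not the actual Ably configuration:

apiVersion: platform.confluent.io/v1beta1
kind: Connector
metadata:
  name: ably-sink                       # hypothetical connector name
spec:
  class: "com.ably.kafka.connect.ChannelSinkConnector" # assumed connector class
  taskMax: 1
  connectClusterRef:
    name: kafka-connect-ably            # the Connect cluster deployed above
  configs:
    topics: "example-topic"             # hypothetical topic from the config file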

OAUTH - Okta integration

The whole OAuth mechanism is implemented via Terraform resources here:

https://github.com/SafiBank/SaFiMono/blob/c8b833b026083e471c3c56b7c8af8e4bdfc58579/devops/terraform/tf-dispatcher-okta-config/okta_confluent.tf#L1

All settings are done programmatically; the pictures are just illustrations of the settings.

The basic part is the configuration of the default Okta auth server:

Please note that this is part of a package which is still in trial and will stop working after 15 days (as of 2.12.2022).
In this default auth server, a custom scope, confluent, is configured; this is later used by clients, as sketched below.
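A sketch of how this scope can be defined with the Okta Terraform provider; the data source lookup and description are illustrative:

# Look up the default auth server shipped with Okta
data "okta_auth_server" "default" {
  name = "default"
}

# Custom scope requested by all Confluent clients
resource "okta_auth_server_scope" "confluent" {
  auth_server_id = data.okta_auth_server.default.id
  name           = "confluent"
  description    = "Scope for Confluent Cloud access" # assumed description
  consent        = "IMPLICIT"
}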

This scope is used for all Confluent access, no matter the client. For this access, an identity provider is set up on the Confluent side:

This is set for the whole Confluent organization, no matter the environment or scope.

Next, there are groups of clients which need to connect to Confluent; depending on the use case, there are currently two:

  • ThoughtMachine

  • Meiro

They are separated for security reasons, so that credential data is not shared. Another use case could be the applications which are currently using SASL/PLAIN, and/or the connectors/operator which are also using SASL/PLAIN atm. For each such use case an identity pool is created, so that access rights are separated and each group of entities uses its own pool (see the sketch below).
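A sketch of the Confluent-side resources with the official provider; the issuer URL, claim, and filter are illustrative assumptions:

resource "confluent_identity_provider" "okta" {
  display_name = "Okta"
  description  = "Okta default auth server"
  issuer       = "https://safibank.okta.com/oauth2/default"        # assumed Okta org URL
  jwks_uri     = "https://safibank.okta.com/oauth2/default/v1/keys"
}

resource "confluent_identity_pool" "meiro" {
  identity_provider {
    id = confluent_identity_provider.okta.id
  }
  display_name   = "Meiro"
  description    = "Identity pool for Meiro clients"
  identity_claim = "claims.sub"
  filter         = "claims.cid == \"meiro-client-id\"" # illustrative filter on the Okta client id
}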

To reflect this in Okta, there are two applications created in a similar manner:
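A sketch of one such application using the Okta Terraform provider; the label and auth method are assumptions:

resource "okta_app_oauth" "meiro" {
  label                      = "confluent-meiro" # hypothetical application label
  type                       = "service"         # machine-to-machine client
  grant_types                = ["client_credentials"]
  response_types             = ["token"]
  token_endpoint_auth_method = "client_secret_basic"
}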