Data Platform Architecture

The overall Data Management Landscape Plan consists of different activities such as implementing Data Governance, Architecture, and so on. The image below shows how these activities relate to each other. Overall Data Governance will be implemented across the landscape to ensure that each implementation is done properly and remains consistent throughout the journey. A Metadata Management process will be implemented to align the structures within the Data Landscape, ensuring that any changes are accounted for and that downstream processes are not affected. This also helps in setting standards for our Reference Data, building our Master Data, and properly identifying and understanding the Transaction Data. With all of this in place, we can properly implement our models based on the business requirements, which in turn dictate how our data marts and the warehouse will be built. It is also important to define how we operationalize and store our data, enabling seamless integration and interoperability with the different systems within the company.

To further understand the overall Data Landscape, let’s first review the current data flow.

| System | How to get Data? | Type of Data | Remarks |
| --- | --- | --- | --- |
| Genesys | API | Agent Call Logs, Customer Interactions | Based on the current license, we can't connect directly to the database. The current plan is to use the 16 reports they can provide and get the data from there. |
| Meiro | Still under discussion | Customer Segments | |
| BOFE | Kafka | Customer Details | |
| Core Banking | Kafka | Customer Details, Customer Transactions | |
| Bank Microservices | Kafka | Customer Details, Customer Transactions | |
| Slacker | Kafka / Data Lake | Risk Details | |
| Loxon | Kafka | Collection Details | |
| Firebase | Data Lake | Device Activity | |
| JIRA | API | Customer Complaint Tickets | |
| OSP | Kafka | Customer Onboarding Details | |
| Tracker Files in Lark | Data Lake (TBD) | Various files such as the Manpower Tracker, Sales Tracker, etc. | Some teams plan to use Lark Sheets to track different activities within their department. |
| ZipHR | Data Lake (TBD) | Employee Details | System is still being planned for implementation. |
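Several of the sources above publish their data through Kafka. As a minimal sketch of how such a topic could be landed into the Raw zone of the data lake, the example below consumes a Core Banking transactions topic and writes batches as newline-delimited JSON to a GCS bucket. The topic, broker, and bucket names are placeholders, not confirmed values, and the confluent-kafka and google-cloud-storage clients are assumed.

```python
# Sketch only: land a Kafka topic into the Raw zone as newline-delimited JSON.
# Topic, broker, and bucket names below are assumed placeholders.
import json
from datetime import datetime, timezone

from confluent_kafka import Consumer
from google.cloud import storage

RAW_BUCKET = "safi-datalake-raw"                 # assumed Raw-zone bucket
TOPIC = "corebanking.customer.transactions"      # assumed topic name

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",           # assumed broker address
    "group.id": "datalake-raw-ingest",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
consumer.subscribe([TOPIC])

gcs = storage.Client()
bucket = gcs.bucket(RAW_BUCKET)

batch = []
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        batch.append(msg.value().decode("utf-8"))
        if len(batch) >= 1000:
            # Write each batch to the Raw zone, partitioned by ingest date.
            now = datetime.now(timezone.utc)
            blob = bucket.blob(f"{TOPIC}/dt={now:%Y-%m-%d}/{now:%H%M%S%f}.json")
            blob.upload_from_string("\n".join(batch), content_type="application/json")
            consumer.commit()
            batch.clear()
finally:
    consumer.close()
```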

Notes:

  • Transactions flow through Kafka, while communication between microservices is done via REST APIs.

  • An existing Airflow instance is used to schedule jobs (see the DAG sketch after this list).

  • HashiCorp Terraform and Vault are currently used for infrastructure as code (IaC) and secrets management.
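As a minimal sketch of a scheduled job on the existing Airflow deployment, the DAG below loads one day's Raw-zone landing files into BigQuery. The bucket, object prefix, and destination table are assumed placeholder names; the GCSToBigQueryOperator comes from the apache-airflow-providers-google package.

```python
# Sketch only: daily load of Raw-zone landing files into BigQuery via Airflow.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="raw_to_trusted_customer_transactions",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",        # one run per landing-date partition
    catchup=False,
) as dag:
    load_raw_to_bq = GCSToBigQueryOperator(
        task_id="load_raw_transactions",
        bucket="safi-datalake-raw",    # assumed Raw-zone bucket
        source_objects=["corebanking.customer.transactions/dt={{ ds }}/*.json"],
        destination_project_dataset_table="trusted.customer_transactions",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )
```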

Based on the current requirements, we plan to build the data lake as follows.

  • Raw - contains the raw data as ingested. A specific retention period will be set before data is moved to the archive, ensuring that the bank stores data for the required number of years.

  • Archive - long-term storage. Data can easily be reloaded to BigQuery (BQ) as necessary (see the reload sketch after this list) and must be kept in a compressed format for cost reasons.

  • Trusted Store - contains the raw data with data quality rules applied.

  • Staging/Processed Store - staging layer for temporary use per job or per day. Reusing the same processed store ensures that duplicate processing costs are not incurred.

  • Data Marts - formed based on reporting and product requirements.
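As a minimal sketch of reloading an archived extract back into BigQuery, the example below assumes the Archive layer is a GCS bucket holding gzip-compressed newline-delimited JSON; the bucket path, dataset, and table names are illustrative placeholders.

```python
# Sketch only: reload a gzip-compressed archive extract into BigQuery.
# Bucket path, dataset, and table names are assumed placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,                     # infer schema for the ad-hoc reload
    write_disposition="WRITE_TRUNCATE",  # rebuild the restore table each run
)

# gzip-compressed NDJSON is decompressed by BigQuery automatically on load.
uri = "gs://safi-datalake-archive/customer_transactions/2022/*.json.gz"

load_job = client.load_table_from_uri(
    uri,
    "analytics_restore.customer_transactions",  # assumed dataset.table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
table = client.get_table("analytics_restore.customer_transactions")
print(f"Reloaded {table.num_rows} rows")
```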