The overall Data Management Landscape Plan consists of several activities, such as implementing Data Governance, Data Architecture, and so on. The image below shows how these activities relate to each other. Data Governance will be applied across the entire landscape to ensure that each implementation is done properly and remains consistent throughout the journey.

A Metadata Management process will be implemented to align the structures within the data landscape, ensuring that any changes are accounted for and that downstream processes are not affected. This also helps in setting standards for our Reference Data, building our Master Data, and properly identifying and understanding our Transaction Data. With all of this in place, we can build our models based on the business requirements, which in turn dictate how our data marts and the warehouse will be built. It is also important to define how we operationalize and store our data, enabling seamless integration and interoperability with the different systems within the company.
To further understand the overall Data Landscape, let’s first discover the current data flow.
System | How to Get Data | Type of Data | Remarks |
Genesys | API | Agent Call Logs, Customer Interactions | Based on the current license, we cannot connect directly to the database. The current plan is to use the 16 reports they can provide and extract the data from there. |
Meiro | Still under discussion | Customer Segments | |
BOFE | Kafka | Customer Details | |
Core Banking | Kafka | Customer Details, Customer Transactions | |
Bank Microservices | Kafka | Customer Details, Customer Transactions | |
Slacker | Kafka / Data Lake | Risk Details | |
Loxon | Kafka | Collection Details | |
Firebase | Data Lake | Device Activity | |
JIRA | API | Customer Complaint Tickets | |
OSP | Kafka | Customer Onboarding Details | |
Tracker Files in Lark | Data Lake (TBD) | Various files such as Manpower Tracker, Sales Tracker, etc. | Some teams plan to use Lark Sheets to track different activities within their departments. |
ZipHR | Data Lake (TBD) | Employee Details | System implementation is still being planned. |
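The source inventory above can also be captured in code as a small ingestion registry, so pipeline code dispatches each system to the right connector and undecided sources fail loudly instead of silently. This is an illustrative sketch, not an existing API: the `IngestionMethod` enum and `route()` helper are assumed names, and sources marked TBD or still under discussion are deliberately left out of the registry.

```python
from enum import Enum


class IngestionMethod(Enum):
    API = "api"
    KAFKA = "kafka"
    DATA_LAKE = "data_lake"


# Source inventory mirroring the table above. Systems whose ingestion
# method is still TBD (Meiro, Lark trackers, ZipHR) are intentionally
# omitted until a decision is made.
SOURCE_REGISTRY = {
    "Genesys": IngestionMethod.API,        # via the 16 provided reports
    "BOFE": IngestionMethod.KAFKA,
    "Core Banking": IngestionMethod.KAFKA,
    "Bank Microservices": IngestionMethod.KAFKA,
    "Slacker": IngestionMethod.KAFKA,      # also lands files in the lake
    "Loxon": IngestionMethod.KAFKA,
    "Firebase": IngestionMethod.DATA_LAKE,
    "JIRA": IngestionMethod.API,
    "OSP": IngestionMethod.KAFKA,
}


def route(system: str) -> IngestionMethod:
    """Return the ingestion method for a known source system."""
    try:
        return SOURCE_REGISTRY[system]
    except KeyError:
        raise ValueError(f"No ingestion method decided yet for {system!r}")
```

Keeping this registry next to the pipeline code gives one place to update when, for example, Meiro's ingestion method is finalized.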
Notes:
Transactions flow through Kafka. Communication between microservices is done via REST API.
There is an existing Airflow instance to schedule jobs.
Currently using HashiCorp Terraform and Vault for infrastructure as code (IaC) and secrets management.
Based on the current requirements, we plan to build the data lake as follows.
Raw - contains the raw, unmodified data. A specific retention period will be set before data is moved to Archive. This ensures the bank stores data for the required number of years in the retention period.
Archive - long-term storage. Can easily be reloaded into BQ as necessary. Must be in a compressed format for cost reasons.
Trusted Store - contains the Raw data with data quality rules applied.
Staging/Processed Store - staging layer for temporary use per job or per day. Reusing the same processed store ensures we do not incur duplicate processing costs.
Data Marts - built based on report and product requirements.
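The Raw-to-Archive retention rule above can be sketched as a small selection function that an Airflow job might call daily. The date-partitioned layout, the helper name, and the seven-year window are assumptions for illustration only; the actual retention period must come from the bank's compliance requirements.

```python
from datetime import date, timedelta

# Assumed retention window; the real value must be confirmed with compliance.
RETENTION_DAYS = 365 * 7


def partitions_to_archive(partition_dates, today, retention_days=RETENTION_DAYS):
    """Return the Raw date partitions that are past retention and should be
    compressed and moved to Archive.

    partition_dates -- iterable of date objects, one per Raw partition
    today           -- the date the job runs
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)
```

A scheduled job would feed this the partition listing from the Raw zone, then compress and copy the returned partitions to Archive before deleting them from Raw.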
Attachments:
image-20230222-140927.png (image/png)
Data Architecture Diagrams-Data Flow.drawio.png (image/png)
Data Architecture Diagrams-Data Overall Plan.drawio.png (image/png)
Data Architecture Diagrams-Data Platform Structure.drawio.png (image/png)
Data Architecture Diagrams-Data Platform Structure.drawio (1).png (image/png)