Background
This document describes the Data team's plans and implementation of its data engineering framework. Most of the pipelines follow an ELT pattern: raw, often unstructured, data is loaded immediately after extraction and transformed later. This approach lets the team ingest and process large amounts of data at a faster rate.
The table below summarizes the services used for the major steps in the pipeline. Detailed implementation of each part can be found in the links.
| Process | Source | Description | Link |
|---|---|---|---|
| Extract | API | Cloud Functions will be used for batch extracts | |
| Extract | Confluent Kafka | Kafka Sink Connector (maintained by the IT team) | |
| Load | BigQuery (Data Lake) | Cloud Functions will be used | |
| Transform | API and Kafka | dbt Core will be used | |
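For illustration, the sketch below shows how an HTTP-triggered Cloud Function could perform the batch extract from the API and the raw load into the BigQuery data lake. The project, endpoint, table, and secret names are placeholders, not the team's actual configuration.

```python
"""Minimal sketch of a batch extract-and-load Cloud Function (ELT style)."""
import requests
import functions_framework
from google.cloud import bigquery, secretmanager

PROJECT_ID = "my-project"                      # placeholder project
API_URL = "https://api.example.com/v1/orders"  # placeholder endpoint
TABLE_ID = "my-project.raw_lake.orders"        # placeholder data-lake table


def get_api_key(secret_name: str) -> str:
    """Read the API key from Secret Manager instead of hard-coding it."""
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{PROJECT_ID}/secrets/{secret_name}/versions/latest"
    return client.access_secret_version(name=name).payload.data.decode("utf-8")


@functions_framework.http
def extract_and_load(request):
    """Extract a batch of records from the API and append them, untransformed,
    to the BigQuery data-lake table; transformation is left to dbt."""
    api_key = get_api_key("orders-api-key")  # placeholder secret name
    resp = requests.get(
        API_URL, headers={"Authorization": f"Bearer {api_key}"}, timeout=60
    )
    resp.raise_for_status()
    rows = resp.json()  # assumes the API returns a JSON array of records

    bq = bigquery.Client(project=PROJECT_ID)
    job_config = bigquery.LoadJobConfig(
        autodetect=True,  # let BigQuery infer the schema of the raw data
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = bq.load_table_from_json(rows, TABLE_ID, job_config=job_config)
    job.result()  # wait for the load job to finish
    return f"Loaded {len(rows)} rows into {TABLE_ID}"
```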
Architecture
The diagram below illustrates the tools used at each step of the data pipeline:
Data Engineering Best Practices
Code Modularity - Python scripts reuse the same functions for the following services: BigQuery, API requests, Cloud Storage, and Secret Manager (see the sketch after this list).
Standardized Naming Convention
Version Control - all code is stored in GitHub
Ensure Data Quality
Pipeline Monitoring
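To make the code-modularity point concrete, below is a minimal sketch of the kind of shared helper module the pipeline scripts could import. All function names and signatures are illustrative assumptions, not the team's actual code.

```python
"""Illustrative shared helpers for BigQuery, API requests, Cloud Storage,
and Secret Manager; names and signatures are placeholders."""
import requests
from google.cloud import bigquery, secretmanager, storage


def get_secret(project_id: str, secret_name: str) -> str:
    """Read the latest version of a secret from Secret Manager."""
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_name}/versions/latest"
    return client.access_secret_version(name=name).payload.data.decode("utf-8")


def fetch_json(url: str, token: str, timeout: int = 60) -> list:
    """Call an API endpoint and return its JSON payload."""
    resp = requests.get(
        url, headers={"Authorization": f"Bearer {token}"}, timeout=timeout
    )
    resp.raise_for_status()
    return resp.json()


def upload_to_gcs(bucket_name: str, blob_name: str, data: str) -> None:
    """Stage a raw payload in Cloud Storage."""
    bucket = storage.Client().bucket(bucket_name)
    bucket.blob(blob_name).upload_from_string(data)


def load_json_to_bq(table_id: str, rows: list) -> None:
    """Append raw JSON rows to a BigQuery table, auto-detecting the schema."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_json(rows, table_id, job_config=job_config).result()
```

Keeping these calls in one module means each extract or load script only wires together the helpers it needs, rather than re-implementing authentication and client setup per pipeline.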