The overall Data Management Landscape Plan consists of several activities, such as implementing Data Governance, Data Architecture, and so on. The image below shows how these activities relate to each other. Data Governance will be applied across the entire landscape to ensure that each implementation is done properly and remains consistent throughout the journey.

A Metadata Management process will be implemented to align the structures within the data landscape, ensuring that any changes are accounted for and that downstream processes are not affected. This also helps in setting standards for our Reference Data, building our Master Data, and properly identifying and understanding our Transaction Data. With all of this in place, we can build our models based on the business requirements, which in turn dictate how our data marts and the warehouse will be built. It is also important to define how we operationalize and store our data, enabling seamless integration and interoperability with the different systems within the company.
To further understand the overall Data Landscape, let’s first discover the current data flow.
System | How to Get Data | Type of Data | Remarks |
Genesys | API | Agent Call Logs, Customer Interactions | Based on the current license, we cannot connect directly to the database. The current plan is to use the 16 reports they can provide and extract the data from there. |
Meiro | Still under discussion | Customer Segments | |
BOFE | Kafka | Customer Details | |
Core Banking | Kafka | Customer Details, Customer Transactions | |
Bank Microservices | Kafka | Customer Details, Customer Transactions | |
Slacker | Kafka / Data Lake | Risk Details | |
Loxon | Kafka | Collection Details | |
Firebase | Data Lake | Device Activity | |
JIRA | API | Customer Complaint Tickets | |
OSP | Kafka | Customer Onboarding Details | |
Tracker Files in Lark | Data Lake (TBD) | Various files such as Manpower Tracker, Sales Tracker, etc. | Some teams plan to use Lark Sheets to track different activities within their departments. |
ZipHR | Data Lake (TBD) | Employee Details | System implementation is still being planned. |
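The source inventory above can also be captured in code as a small ingestion registry, so pipeline code dispatches each system to the right connector and undecided sources fail loudly instead of silently. This is an illustrative sketch, not an existing API: the `IngestionMethod` enum and `route()` helper are assumed names, and sources marked TBD or still under discussion are deliberately left out of the registry.

```python
from enum import Enum


class IngestionMethod(Enum):
    API = "api"
    KAFKA = "kafka"
    DATA_LAKE = "data_lake"


# Source inventory mirroring the table above. Systems whose ingestion
# method is still TBD (Meiro, Lark trackers, ZipHR) are intentionally
# omitted until a decision is made.
SOURCE_REGISTRY = {
    "Genesys": IngestionMethod.API,        # via the 16 provided reports
    "BOFE": IngestionMethod.KAFKA,
    "Core Banking": IngestionMethod.KAFKA,
    "Bank Microservices": IngestionMethod.KAFKA,
    "Slacker": IngestionMethod.KAFKA,      # also lands files in the lake
    "Loxon": IngestionMethod.KAFKA,
    "Firebase": IngestionMethod.DATA_LAKE,
    "JIRA": IngestionMethod.API,
    "OSP": IngestionMethod.KAFKA,
}


def route(system: str) -> IngestionMethod:
    """Return the ingestion method for a known source system."""
    try:
        return SOURCE_REGISTRY[system]
    except KeyError:
        raise ValueError(f"No ingestion method decided yet for {system!r}")
```

Keeping this registry next to the pipeline code gives one place to update when, for example, Meiro's ingestion method is finalized.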
Notes:
Transactions flow through Kafka. Communication between microservices is done via REST API.
There is an existing Airflow instance to schedule jobs.
Currently using HashiCorp Terraform and Vault for infrastructure as code (IaC) and secrets management.
Based on the current requirements, we plan to build the data lake as follows.
Raw - contains the raw, unmodified data. A specific retention period will be set before data is moved to Archive. This ensures the bank stores data for the required number of years in the retention period.
Archive - long-term storage. Can easily be reloaded into BQ as necessary. Must be in a compressed format for cost reasons.
Trusted Store - contains the Raw data with data quality rules applied.
Staging/Processed Store - staging layer for temporary use per job or per day. Reusing the same processed store ensures we do not incur duplicate processing costs.
Data Marts - built based on report and product requirements.
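The Raw-to-Archive retention rule above can be sketched as a small selection function that an Airflow job might call daily. The date-partitioned layout, the helper name, and the seven-year window are assumptions for illustration only; the actual retention period must come from the bank's compliance requirements.

```python
from datetime import date, timedelta

# Assumed retention window; the real value must be confirmed with compliance.
RETENTION_DAYS = 365 * 7


def partitions_to_archive(partition_dates, today, retention_days=RETENTION_DAYS):
    """Return the Raw date partitions that are past retention and should be
    compressed and moved to Archive.

    partition_dates -- iterable of date objects, one per Raw partition
    today           -- the date the job runs
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)
```

A scheduled job would feed this the partition listing from the Raw zone, then compress and copy the returned partitions to Archive before deleting them from Raw.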
Attachments:
image-20230222-140927.png (image/png)
Data Architecture Diagrams-Data Flow.drawio.png (image/png)
Data Architecture Diagrams-Data Overall Plan.drawio.png (image/png)
Data Architecture Diagrams-Data Platform Structure.drawio.png (image/png)
Data Architecture Diagrams-Data Platform Structure.drawio (1).png (image/png)