Background

This document describes the Data team's framework for transforming data in the data lake. dbt is a data transformation tool that uses SQL scripts and YAML (.yml) files to transform tables. It also supports data engineering best practices such as code modularity, portability, CI/CD, data quality checks, lineage, and documentation. The team will use dbt Core, the open-source command-line tool, and will run it on GCP.

In GCP, the team will use Cloud Run to run the dbt scripts. Cloud Run is fully managed, so there are no VMs to set up or maintain. Its pricing is based on request time, so the team is only charged while the dbt service is running.

Below is the diagram of the architecture and the GCP services that will be used.

Service            Description
Cloud Scheduler    Triggers the Cloud Run service on a schedule
Cloud Build        Builds the dbt container images
Cloud Run          Compute for running the dbt scripts

Architecture

Project Structure

raw-to-dataquality
├── analyses
├── macros
├── models
│   └── genesys
│       ├── conversation
│       │   ├── marts
│       │   │   └── prd_genesys_conversations_v1.sql
│       │   ├── staging
│       │   │   └── stg_genesys_conversations_v1.sql
│       │   └── schema_genesys_conversations.yml
│       ├── evaluation
│       │   ├── marts
│       │   │   └── prd_genesys_evaluations_v1.sql
│       │   ├── staging
│       │   │   └── stg_evaluations.sql
│       │   └── schema_genesys_evaluations.yml
│       └── user_detail
├── seeds
├── snapshots
├── tests
├── dbt_project.yml
├── profiles.yml
├── script.sh
└── invoke.go

dbt_project.yml - This file is the main configuration file for your project.

profiles.yml - This file contains the database connection that dbt will use to connect to the data warehouse.

script.sh - This file contains the shell commands that the service runs.

invoke.go - This file contains the dbt package that will be deployed to Cloud Run.

models - This folder contains all data models in your project (including SQL scripts).
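
The contents of invoke.go are not shown in this document. As an illustration only, the sketch below assumes it follows a common Cloud Run pattern: a small Go HTTP server that runs script.sh each time it receives a request (for example, one sent by Cloud Scheduler). The names runCommand and scriptHandler are hypothetical, as is the choice to return the script output in the response body.

```go
package main

import (
	"log"
	"net/http"
	"os"
	"os/exec"
)

// runCommand (hypothetical helper) executes a command and returns its
// combined stdout and stderr as a string.
func runCommand(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}

// scriptHandler (hypothetical) runs script.sh once per incoming request.
// Assumption: script.sh sits in the container's working directory.
func scriptHandler(w http.ResponseWriter, r *http.Request) {
	out, err := runCommand("/bin/sh", "script.sh")
	if err != nil {
		// Log the script output so failures show up in the service logs.
		log.Printf("script.sh failed: %v\n%s", err, out)
		http.Error(w, "script.sh failed", http.StatusInternalServerError)
		return
	}
	log.Printf("script.sh succeeded:\n%s", out)
	w.Write([]byte(out))
}

func main() {
	http.HandleFunc("/", scriptHandler)

	// Cloud Run passes the listening port in the PORT environment variable;
	// fall back to 8080 for local runs.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```

With a wrapper like this, a non-zero exit code from script.sh becomes an HTTP 500 response, so the caller can tell that the run failed.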
