Background
High-quality data are a precondition for analyzing and using big data. This document lists the data quality checks the data team applies to each data source used by the company.
Checklist
| Dimension | Elements | Indicators |
| --- | --- | --- |
| Uniqueness | Is there a unique identifier in the data source? | Check for a unique identifier in the data source |
| | How are changes made and processed? | Check for update date/change date fields |
| | How far back in the data set will changes be made? | |
| Timeliness | Does the time interval from data collection and processing to release meet requirements? | Check documentation for the data update frequency |
| | Are the data reported as soon as possible after collection? | |
| Integrity | Is the data format clear? | Correct metadata definitions; required fields are not nullable |
| | Data are consistent with content integrity | Units are appropriate and defined; value ranges are valid; timezones are consistent |
| | Data are consistent with, or verifiable against, other data sources | Lookup tables or foreign key constraints |
| Completeness | How to identify whether the data are complete | Compare the record count of the source and the extracted data |
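The completeness indicator above can be written as a dbt singular test. This is a minimal sketch, not the team's actual implementation; the source name `('genesys', 'conversations')` is a hypothetical placeholder, while `raw_genesys_conversations_v1` is the model used in the examples below.

```sql
-- Singular test: returns rows (i.e. fails) when the extracted model's
-- record count differs from the source's record count.
-- The source name ('genesys', 'conversations') is illustrative only.
with source_count as (
    select count(*) as n from {{ source('genesys', 'conversations') }}
),
extracted_count as (
    select count(*) as n from {{ ref('raw_genesys_conversations_v1') }}
)
select s.n as source_rows, e.n as extracted_rows
from source_count s
cross join extracted_count e
where s.n != e.n
```

A test like this passes only when it returns zero rows, which is dbt's convention for singular tests.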
Data Quality Checks in dbt
Most of the data quality checks are implemented in dbt. Below are sample checks applied by the data team.
1. not_null
```yaml
models:
  - name: raw_genesys_conversations_v1
    columns:
      - name: conversationId
        tests:
          - not_null
```
2. unique
```yaml
models:
  - name: raw_genesys_conversations_v1
    columns:
      - name: conversationId
        tests:
          - unique
```
3. accepted_values
```yaml
models:
  - name: raw_genesys_conversations_v1
    columns:
      - name: direction
        tests:
          - accepted_values:
              values: ['inbound', 'outbound']
```
4. relationships
```yaml
models:
  - name: raw_genesys_conversations_v1
    columns:
      - name: user_id
        tests:
          - relationships:
              to: ref('raw_genesys_users_v1')
              field: id
```
5. Singular Tests
```sql
select evaluationId
from {{ ref('evaluations') }}
where answer.score < 0
```
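Singular tests can also cover the "value ranges are valid" indicator from the checklist above. The following is a minimal sketch: the column `duration_ms` is a hypothetical example and does not come from this document.

```sql
-- Singular test for a value-range check:
-- returns (and therefore fails on) any conversation with a negative duration.
-- 'duration_ms' is an illustrative column name, not taken from this document.
select conversationId
from {{ ref('raw_genesys_conversations_v1') }}
where duration_ms < 0
```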