Data can be streamed from Kafka directly into BigQuery tables using native connectors, which removes the need to set up and maintain separate streaming ingest pipelines.
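As a minimal sketch of how such a connector might be registered, the snippet below posts a BigQuery sink connector configuration to the Kafka Connect REST API. The connector class shown is the commonly used WePay/Confluent open-source BigQuery sink; the Connect URL, connector name, topic, GCP project, dataset, and credential path are all hypothetical placeholders, and exact config keys vary by connector version.

    # Sketch: register a Kafka -> BigQuery sink connector via the
    # Kafka Connect REST API. All names and paths are hypothetical;
    # config keys vary by connector version.
    import json
    import urllib.request

    CONNECT_URL = "http://localhost:8083/connectors"  # assumed Connect endpoint

    connector = {
        "name": "bq-sink-transactions",  # hypothetical connector name
        "config": {
            # Commonly used open-source BigQuery sink connector class.
            "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
            "topics": "transactions.payments.v1",           # hypothetical topic
            "project": "my-gcp-project",                    # hypothetical GCP project
            "defaultDataset": "Transactions",               # dataset named after the domain
            "keyfile": "/secrets/bq-service-account.json",  # hypothetical key path
            "sanitizeTopics": "true",   # make topic names valid BigQuery table names
            "autoCreateTables": "true",
        },
    }

    req = urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())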

BigQuery Schema

Currently, each BigQuery table inherits its schema from the corresponding Kafka topic. These tables are the raw dump that serves as the raw data layer in BigQuery, and they then go through further ETL for business users and other downstream functions.
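As a hedged illustration of one such downstream ETL step, the sketch below uses the google-cloud-bigquery client to materialize a curated table from a raw dump table. The project, dataset, table, and column names are hypothetical placeholders for whatever the actual ETL defines.

    # Sketch: one downstream ETL step that derives a curated table from a
    # raw Kafka-dump table. Dataset/table/column names are hypothetical.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # assumed project ID

    sql = """
    CREATE OR REPLACE TABLE `my-gcp-project.Transactions_curated.payments` AS
    SELECT
      payment_id,
      customer_id,
      amount,
      TIMESTAMP_MILLIS(event_time_ms) AS event_ts  -- example type cleanup
    FROM `my-gcp-project.Transactions.transactions_payments_v1`
    WHERE payment_id IS NOT NULL  -- drop malformed raw records
    """

    client.query(sql).result()  # blocks until the job completes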

The current naming convention for Kafka topics dumped into BigQuery tables is as follows. Topics are grouped into datasets: each BigQuery dataset is named after the respective business function/domain already outlined in the Kafka topics (e.g. Transactions, Customers), and the individual tables within each dataset inherit the names of the Kafka topics in that domain (illustrated in the sketch after the list below).

  • Dataset Name: <domain name>

  • Table Name: <kafka topic name>
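As a minimal sketch of this convention, the function below maps a domain and Kafka topic name to a fully qualified BigQuery table ID. The project, domain, and topic names are hypothetical, and BigQuery's naming rules for datasets and tables are approximated with a simple character substitution.

    # Minimal sketch of the <domain name>.<kafka topic name> convention.
    # Project, domain, and topic names below are hypothetical examples.
    import re

    def bigquery_table_id(project: str, domain: str, topic: str) -> str:
        """Map a Kafka topic to its raw-dump table ID: <project>.<dataset>.<table>.

        Dataset = business domain, table = Kafka topic name, with characters
        BigQuery does not allow in dataset/table names (e.g. '.' or '-')
        replaced by underscores.
        """
        def sanitize(name: str) -> str:
            return re.sub(r"[^A-Za-z0-9_]", "_", name)
        return f"{project}.{sanitize(domain)}.{sanitize(topic)}"

    # e.g. prints "my-gcp-project.Transactions.transactions_payments_v1"
    print(bigquery_table_id("my-gcp-project", "Transactions", "transactions.payments.v1"))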

Sample screenshot of the <domain name>.<kafka topic name> convention:

Current BigQuery Datasets: