Challenge 3: Dataform for automation
Introduction
Although we’ve only dealt with 3 tables so far, our data model has many more tables, and we have to perform multiple SQL operations to process the data. Doing this manually is error-prone and labor intensive. Wouldn’t it be great if we could automate this by developing and operationalizing scalable data transformation pipelines in BigQuery using SQL? Enter Dataform 🙂
Description
Create a new Dataform repository and update its settings to use the BQ DWH Dataform Service Account. Override the workspace compilation settings so that the Project ID points to your project. Then link the repository to this GitHub repository, using HTTPS as the protocol, `main` as the default branch name, and the provided `git-secret` as the secret to connect.
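The compilation override can also be set in the repository's settings file. As a minimal sketch (the exact file name and keys depend on your Dataform core version: 3.x uses `workflow_settings.yaml`, while older versions use `dataform.json` with keys such as `defaultDatabase`; the project ID, location, and version below are placeholders):

```yaml
# workflow_settings.yaml — illustrative values, adjust to your environment
defaultProject: your-project-id   # must point to YOUR project
defaultLocation: us-central1
defaultDataset: curated
dataformCoreVersion: 3.0.0
```

Setting `defaultProject` here has the same effect as overriding the Project ID in the workspace compilation settings.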
After configuring the Dataform repository, create a new development workspace, fix any compilation errors, and execute the pipeline actions tagged `staging`.
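For context, a Dataform pipeline selects actions to run by tag, and each action is defined in a SQLX file whose `config` block declares its type, target dataset, and tags. A minimal sketch of what one of the staging definitions might look like (the source table reference and column names are illustrative assumptions, not the actual repository contents):

```sql
-- definitions/staging/stg_address.sqlx (illustrative)
config {
  type: "table",        -- materialize as a BigQuery table
  schema: "curated",    -- target dataset
  tags: ["staging"]     -- picked up when executing with the "staging" tag
}

-- Column and source names below are assumptions for the sketch.
SELECT
  address_id,
  address_line1,
  city,
  state_province_id
FROM ${ref("address")}  -- ${ref(...)} resolves a declared source or upstream action
```

Executing the pipeline with the tag `staging` runs every action carrying that tag, in dependency order.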
Note
The provided `git-secret` has a dummy value; it'll be ignored when you pull from the repository. When you link a Git repository, Dataform clones that repository locally (in your Dataform development workspace) and you can commit your changes to your local copy. Normally you'd be able to push those changes to the remote (either `main` or a different branch), but since the provided secret doesn't have any write permissions, you won't be able to do that for this exercise.
Success Criteria
- There’s a successful execution of the provided Dataform pipeline for the `staging` tables.
- The following 12 tables have been created in the `curated` dataset:
  - stg_address
  - stg_country_region
  - stg_credit_card
  - stg_customer
  - stg_order_status
  - stg_person
  - stg_product
  - stg_product_category
  - stg_product_subcategory
  - stg_sales_order_detail
  - stg_sales_order_header
  - stg_state_province
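One way to check the second criterion is to list the staging tables via BigQuery's `INFORMATION_SCHEMA` views. A sketch, assuming your project ID stands in for the placeholder below:

```sql
-- Should return the 12 stg_* tables once the staging pipeline has run
SELECT table_name
FROM `your-project-id.curated.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'stg_%'
ORDER BY table_name;
```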