Challenge 3: Dataform for automation
Introduction
Although we’ve only dealt with 3 tables so far, our data model has many more tables, and we have to perform multiple SQL operations to process the data. Doing this manually is error-prone and labor-intensive. Wouldn’t it be great if we could automate this by developing and operationalizing scalable data transformation pipelines in BigQuery using SQL? Enter Dataform 🙂
Description
Create a new Dataform repository and update its settings to use the BQ DWH Dataform Service Account, then override the workspace compilation settings to ensure that the Project ID points to your project. Next, link it to this GitHub repository, using `HTTPS` as the protocol, `main` as the default branch name, and the provided `git-secret` as the secret to connect.
After configuring the Dataform repository, create a new Development Workspace, solve any errors, and execute the pipeline with the tag `staging`.
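For orientation, each action in a Dataform pipeline is defined in a SQLX file whose `config` block declares its target dataset and tags; an execution filtered on the `staging` tag only runs the actions carrying that tag. Below is a minimal sketch of what such a definition might look like; the file path, the `address` source declaration, and the column list are illustrative assumptions, not the repository's actual code:

```sqlx
config {
  type: "table",        // materialize the query result as a BigQuery table
  schema: "curated",    // target dataset; the project ID comes from the compilation settings
  tags: ["staging"]     // lets an execution filtered on the "staging" tag pick this action up
}

-- Hypothetical staging query; the real repository defines its own columns.
SELECT
  AddressID AS address_id,
  City      AS city
FROM
  ${ref("address")}     -- resolves to a declared source table at compile time
```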
Note: The provided `git-secret` has a dummy value; it’ll be ignored when you pull from the repository. When you link a Git repository, Dataform clones that repository locally (in your Dataform Development Workspace) and you can commit your changes to your local copy. Normally you’d be able to push those changes to the remote (either `main` or a different branch), but since the provided secret doesn’t have any write permissions, you won’t be able to do that for this exercise.
Success Criteria
- There’s a successful execution of the provided Dataform pipeline for the `staging` tables.
- The following 12 tables have been created in the `curated` dataset (see the verification query after this list):
  - `stg_address`
  - `stg_country_region`
  - `stg_credit_card`
  - `stg_customer`
  - `stg_order_status`
  - `stg_person`
  - `stg_product`
  - `stg_product_category`
  - `stg_product_subcategory`
  - `stg_sales_order_detail`
  - `stg_sales_order_header`
  - `stg_state_province`
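To check the second criterion, you can list the staging tables with a query against BigQuery’s `INFORMATION_SCHEMA`. A minimal sketch, assuming the dataset is named `curated`; replace `your-project-id` with your actual project ID:

```sql
-- List the staging tables created by the Dataform execution.
SELECT
  table_name
FROM
  `your-project-id.curated.INFORMATION_SCHEMA.TABLES`
WHERE
  table_name LIKE 'stg_%'
ORDER BY
  table_name;
-- Expect 12 rows, from stg_address through stg_state_province.
```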