Challenge 3: Dataform for automation


Introduction

Although we’ve only dealt with 3 tables so far, our data model has many more, and we have to perform multiple SQL operations to process the data. Doing this manually is error-prone and labor-intensive. Wouldn’t it be great if we could automate it by developing and operationalizing scalable data transformation pipelines in BigQuery using SQL? Enter Dataform 🙂

Description

Create a new Dataform repository, update its settings to use the BQ DWH Dataform Service Account, and override the workspace compilation settings so that the project ID points to your project. Then link it to this GitHub repository, using HTTPS as the protocol, main as the default branch name, and the provided git-secret as the secret to connect.
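The compilation overrides live in the repository's Settings page, but the same defaults can also be declared in the repository's `dataform.json` file. A minimal sketch, where the project ID and location are placeholders you would replace with your own values (the dataset name `curated` is assumed here to match the curated dataset mentioned below):

```json
{
  "defaultDatabase": "your-project-id",
  "defaultSchema": "curated",
  "defaultLocation": "us-central1",
  "assertionSchema": "dataform_assertions"
}
```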

After configuring the Dataform repository, create a new development workspace, resolve any compilation errors, and execute the pipeline with the tag staging.
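Each table in a Dataform pipeline is defined by a SQLX file, and the tags listed in its config block are what let you execute just the staging subset. A minimal sketch of what such a definition might look like — the column names and the `${ref(...)}` source are illustrative, not the repository's actual code:

```sqlx
config {
  type: "table",
  schema: "curated",
  tags: ["staging"]
}

-- Select and rename columns from the raw source table
SELECT
  AddressID AS address_id,
  City AS city,
  ModifiedDate AS modified_date
FROM
  ${ref("address")}
```

Selecting the tag staging at execution time runs every action carrying that tag, which is how all 12 staging tables get built in one invocation.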

Note The provided git-secret has a dummy value; it's ignored when you pull from the repository. When you link a Git repository, Dataform clones that repository locally (into your Dataform development workspace) and you can commit your changes to that local copy. Normally you'd then push those changes to the remote (either main or a different branch), but since the provided secret doesn't grant write permissions, you won't be able to do that in this exercise.

Success Criteria

  • There’s a successful execution of the provided Dataform pipeline for the staging tables.
  • The following 12 tables have been created in the curated dataset:
    • stg_address
    • stg_country_region
    • stg_credit_card
    • stg_customer
    • stg_order_status
    • stg_person
    • stg_product
    • stg_product_category
    • stg_product_subcategory
    • stg_sales_order_detail
    • stg_sales_order_header
    • stg_state_province
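Once the execution succeeds, you can check the success criteria from the BigQuery console with an INFORMATION_SCHEMA query. This assumes the curated dataset is literally named `curated` and that `your-project-id` is replaced with your own project:

```sql
SELECT table_name
FROM `your-project-id.curated.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'stg_%'
ORDER BY table_name;
```

The query should return the 12 staging tables listed above.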

Learning Resources
