Challenge 4: Dimensional modeling

Previous Challenge Next Challenge

Introduction

Dimensional modeling is a data warehousing technique that organizes data into fact tables containing measurements and dimension tables, which provide context for those measurements. This structure makes data analysis efficient and intuitive, allowing users to easily understand and query data related to specific business events.

Description

We’re going to create a star schema by extracting dimension tables and a fact table from the staging tables that have been created in the previous challenge. First you need to create another dataset and call it dwh.

We have already provided the code for the dimension tables, first run the pipeline for the tag dimension to generate the dimension tables. Then create a new fact_sales.sqlx file in the same folder as the dimension tables, configure it with the tag fact, order_date as the partition column and product_key as the clustering column. Include the following columns in the the fact table:

  • sales_key (surrogate key built out of sales_order_id and sales_order_detail_id)
  • product_key (surrogate key built out of product_id, to be used for clustering)
  • customer_key (surrogate key built out of customer_id)
  • credit_card_key (surrogate key built out of credit_card_id)
  • ship_address_key (surrogate key built out of ship_to_address_id)
  • order_status_key (surrogate key built out of status)
  • order_date_key (surrogate key built out of order_date)
  • order_date (to be used for the partitioning)
  • unit_price
  • unit_price_discount
  • cost_of_goods_sold (retrieved from stg_products table, standard_cost column)
  • order_quantity
  • gross_revenue (calculated by multiplying unit_price with order_quantity)
  • gross_profit (calculated by subtracting discounts and costs of goods sold from gross_revenue)

Once the configuration is complete run the Dataform pipeline with the tag fact and commit your changes.

Note If you’ve created the fact table with no or a different partition column, you’ll have to drop it first manually before you can run the Dataform pipeline with the fact tag.

Success Criteria

  • There is a new BigQuery dataset dwh in the same region as the other datasets.
  • There’s a successful execution of the provided Dataform pipeline for the fact tag.
  • There are dimension tables and a new partitioned and clustered fact table, fact_sales in the dwh dataset, with the columns as specified above having in total 121317 rows.

Learning Resources

Tips

  • Find out which staging tables have the sales_order_id and sales_order_details_id as their id columns. Those tables will be the basis of your fact table (and you’ll need one more table to complete your join).
  • Note that the discount column is a factor between 0 and 1 (0 being no discount and 1 being a 100% discount).

Previous Challenge Next Challenge