Challenge 4: Dimensional modeling
Previous Challenge Next Challenge
Introduction
Dimensional modeling is a data warehousing technique that organizes data into fact tables containing measurements and dimension tables, which provide context for those measurements. This structure makes data analysis efficient and intuitive, allowing users to easily understand and query data related to specific business events.
Description
We’re going to create a star schema by extracting dimension tables and a fact table from the staging tables that have been created in the previous challenge. First you need to create another dataset and call it dwh
.
We have already provided the code for the dimension tables, first run the pipeline for the tag dimension
to generate the dimension tables. Then create a new fact_sales.sqlx
file in the same folder as the dimension tables, configure it with the tag fact
, order_date
as the partition column and product_key
as the clustering column. Include the following columns in the the fact table:
sales_key
(surrogate key built out ofsales_order_id
andsales_order_detail_id
)product_key
(surrogate key built out ofproduct_id
, to be used for clustering)customer_key
(surrogate key built out ofcustomer_id
)credit_card_key
(surrogate key built out ofcredit_card_id
)ship_address_key
(surrogate key built out ofship_to_address_id
)order_status_key
(surrogate key built out ofstatus
)order_date_key
(surrogate key built out oforder_date
)order_date
(to be used for the partitioning)unit_price
unit_price_discount
cost_of_goods_sold
(retrieved fromstg_products
table,standard_cost
column)order_quantity
gross_revenue
(calculated by multiplyingunit_price
withorder_quantity
)gross_profit
(calculated by subtracting discounts and costs of goods sold fromgross_revenue
)
Once the configuration is complete run the Dataform pipeline with the tag fact
and commit your changes.
Note If you’ve created the fact table with no or a different partition column, you’ll have to drop it first manually before you can run the Dataform pipeline with the
fact
tag.
Success Criteria
- There is a new BigQuery dataset
dwh
in the same region as the other datasets. - There’s a successful execution of the provided Dataform pipeline for the
fact
tag. - There are dimension tables and a new partitioned and clustered fact table,
fact_sales
in thedwh
dataset, with the columns as specified above having in total 121317 rows.
Learning Resources
- Creating tables with Dataform
- Create table partitions and clusters with Dataform
- Defining additional table configurations with Dataform
- Introduction to partitioned tables
- Introduction to clustered tables
Tips
- Find out which staging tables have the
sales_order_id
andsales_order_details_id
as their id columns. Those tables will be the basis of your fact table (and you’ll need one more table to complete your join). - Note that the discount column is a factor between 0 and 1 (0 being no discount and 1 being a 100% discount).