Challenge 1: The open foundation
Introduction
Before we can analyze data, we need to lay the groundwork. Unlike standard BigQuery native tables, Iceberg tables store data in open formats (Parquet, Avro, ORC) in blob storage, such as Cloud Storage buckets, while BigQuery manages the metadata. This separation of compute and storage is the heart of the Lakehouse architecture.
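This separation shows up directly in the table DDL: the column definitions and metadata live in BigQuery, while `storage_uri` points at the bucket that holds the actual data files. A minimal sketch (the dataset, connection, bucket, and column names below are placeholders, not part of the challenge data):

```shell
# Illustrative only: an Iceberg table whose data files live in Cloud Storage
# while BigQuery manages the metadata. All names/URIs are placeholders.
bq query --use_legacy_sql=false '
CREATE TABLE sales.products (
  product_id INT64,
  name STRING,
  price NUMERIC
)
WITH CONNECTION `us-central1.my_biglake_conn`
OPTIONS (
  file_format = "PARQUET",
  table_format = "ICEBERG",
  storage_uri = "gs://my-lakehouse-bucket/products"
);'
```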
Description
In BigQuery, we aim to have two datasets: one for marketing and one for sales. Your task is to create the sales dataset and the necessary tables.
You have been provided with raw sales data stored in a bucket on Google Cloud Storage. In this bucket, you will find the data needed to create tables for orders, order_items, and products. First, create a new Cloud Storage bucket to serve as our data lake storage. Then, configure a BigLake connection (Cloud Resource) to allow BigQuery to access this bucket. Next, create the sales BigQuery dataset and three Iceberg tables named products, orders, and order_items, with their data stored in the bucket you created earlier. Finally, load the raw data into the Iceberg tables.
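The steps above can be sketched with the `gcloud` and `bq` CLIs. The project ID, region, bucket, and connection names here are assumptions for illustration; substitute your own values:

```shell
# Sketch of the setup steps; all names are placeholders.
PROJECT=my-project
REGION=us-central1
BUCKET=gs://my-lakehouse-bucket

# 1. Create the Cloud Storage bucket that will serve as data lake storage.
gcloud storage buckets create "$BUCKET" --location="$REGION"

# 2. Create the BigLake (Cloud Resource) connection.
bq mk --connection --location="$REGION" --project_id="$PROJECT" \
  --connection_type=CLOUD_RESOURCE my_biglake_conn

# 3. Create the sales dataset in the same location.
bq mk --location="$REGION" --dataset "$PROJECT:sales"

# 4. After creating each Iceberg table (see the DDL pattern in the docs),
#    load the raw files into it, e.g. assuming Parquet source data:
bq query --use_legacy_sql=false '
LOAD DATA INTO sales.products
FROM FILES (
  format = "PARQUET",
  uris = ["gs://raw-sales-data/products/*.parquet"]
);'
```

Note that the raw-data bucket and file format in step 4 are assumptions; check the provided bucket for the actual layout.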
The schemas of the three tables are as follows:
Schema files
You can download the schema of these tables by using the following links. The schemas are in JSON format:
Success Criteria
- A Cloud Storage bucket exists in the same region as your dataset.
- A BigLake Cloud Resource connection is successfully created and has the required permissions to access the bucket.
- The tables `products`, `orders`, and `order_items` exist, and their `table_format` is listed as `ICEBERG`.
- Querying the Iceberg tables returns data.
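One way to verify the criteria above, assuming the dataset and table names from this challenge:

```shell
# The table metadata should report ICEBERG as the table format.
bq show --format=prettyjson sales.products | grep -i iceberg

# Querying each table should return data.
bq query --use_legacy_sql=false 'SELECT COUNT(*) AS n FROM sales.order_items;'
```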
Learning Resources
- Introduction to BigLake tables
- Create and set up a Cloud resource connection
- Create Iceberg tables in BigQuery
Tips
- Pay close attention to regions. Your bucket, connection, and dataset must all be in the same location (e.g., `us-central1`).
- You will need to grant specific IAM roles on your storage bucket to the service account created by the BigLake connection.
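The IAM grant in the second tip can be sketched as follows. The service account address is a placeholder; copy the real one from the connection's details page (or `bq show --connection`):

```shell
# Grant the connection's service account access to the data lake bucket.
# Bucket name and service account are placeholders.
gcloud storage buckets add-iam-policy-binding gs://my-lakehouse-bucket \
  --member="serviceAccount:bqcx-...@gcp-sa-bigquery-condel.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```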