Challenge 1: The open foundation

Introduction

Before we can analyze data, we need to lay the groundwork. Unlike standard BigQuery native tables, Iceberg tables store their data in open formats (Parquet, Avro, ORC) in object storage, such as Cloud Storage buckets, while BigQuery manages the metadata. This separation of compute and storage is the heart of the Lakehouse architecture.

Description

In BigQuery, we aim to have two datasets: one for marketing and one for sales. Your task is to create the sales dataset and the necessary tables.

You have been provided with raw sales data stored in a Google Cloud Storage bucket. In this bucket, you will find the data needed to create tables for orders, order_items, and products. First, create a new Cloud Storage bucket to serve as your data lake storage. Then, configure a BigLake connection (Cloud Resource) to allow BigQuery to access this bucket. Next, create the sales BigQuery dataset and three Iceberg tables named products, orders, and order_items, stored within the bucket you created earlier. Finally, load the raw data into the Iceberg tables.
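The first three steps can be sketched with the gcloud and bq CLIs. The project ID, bucket name, and connection name below (my-project, my-lakehouse-bucket, lakehouse-conn) are placeholders; substitute your own values and region.

```shell
# Create the bucket that will hold the Iceberg table data.
gcloud storage buckets create gs://my-lakehouse-bucket \
    --project=my-project \
    --location=us-central1

# Create a BigLake (Cloud Resource) connection in the same region.
bq mk --connection \
    --project_id=my-project \
    --location=us-central1 \
    --connection_type=CLOUD_RESOURCE \
    lakehouse-conn

# Note the serviceAccountId in the output; it will need access to the bucket.
bq show --connection my-project.us-central1.lakehouse-conn

# Create the sales dataset, again in the same region.
bq mk --dataset --location=us-central1 my-project:sales
```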

The schemas of the three tables are as follows:

Schema files

You can download the schema of these tables by using the following links. The schemas are in JSON format:
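As a sketch of the remaining steps, the DDL below creates one Iceberg table and loads data into it. The column list, connection name, bucket, and source URIs are illustrative placeholders; the real columns come from the JSON schema files above.

```shell
# Illustrative DDL for one Iceberg table (columns are placeholders --
# use the provided JSON schema files for the real definitions).
bq query --use_legacy_sql=false '
CREATE TABLE sales.products (
  product_id INT64,
  name STRING,
  price NUMERIC
)
WITH CONNECTION `us-central1.lakehouse-conn`
OPTIONS (
  file_format = "PARQUET",
  table_format = "ICEBERG",
  storage_uri = "gs://my-lakehouse-bucket/products"
);'

# Load the raw files into the table (path and format are placeholders).
bq query --use_legacy_sql=false '
LOAD DATA INTO sales.products
FROM FILES (
  format = "CSV",
  uris = ["gs://raw-sales-data/products/*.csv"]
);'
```

Repeat for orders and order_items, pointing each table's storage_uri at its own prefix within the bucket.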

Success Criteria

  • A Cloud Storage bucket exists in the same region as your dataset.
  • A BigLake Cloud Resource connection is successfully created and has the required permissions to access the bucket.
  • The tables products, orders and order_items exist, and their table_format is listed as ICEBERG.
  • Querying the Iceberg tables returns data.
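One way to check these criteria is to inspect the table DDL via INFORMATION_SCHEMA, where an Iceberg table's OPTIONS include table_format = 'ICEBERG', and to run a quick row count. Dataset and table names follow the challenge; adjust if yours differ.

```shell
# Confirm the three tables exist and inspect their DDL (the OPTIONS clause
# should show table_format = 'ICEBERG').
bq query --use_legacy_sql=false '
SELECT table_name, ddl
FROM sales.INFORMATION_SCHEMA.TABLES
WHERE table_name IN ("products", "orders", "order_items");'

# Spot-check that querying an Iceberg table returns data.
bq query --use_legacy_sql=false 'SELECT COUNT(*) AS row_count FROM sales.orders;'
```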

Learning Resources

Tips

  • Pay close attention to Regions. Your Bucket, Connection, and Dataset must all be in the same location (e.g., us-central1).
  • You will need to grant specific IAM roles on your Storage Bucket to the Service Account created by the BigLake connection.
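For the second tip, the grant can look like the following; the bucket name and the service-account email are placeholders (look up the real email with `bq show --connection PROJECT.LOCATION.CONNECTION`).

```shell
# Grant the BigLake connection's service account read/write access to the
# bucket (member and bucket are placeholders).
gcloud storage buckets add-iam-policy-binding gs://my-lakehouse-bucket \
    --member="serviceAccount:bqcx-1234-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
```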
