Challenge 2: Data interoperability across Lakehouse


Introduction

A cornerstone of the Google Cloud Lakehouse architecture is true interoperability: the ability to join disparate datasets without the latency or cost of data movement. Through BigLake, Google Cloud provides a unified interface that allows BigQuery to query Apache Iceberg tables alongside native BigQuery tables in a single SQL statement.

This architecture not only streamlines analytics but also serves as a robust foundation for AI integration. By storing data in open formats like Iceberg, you can feed cleaned, governed data directly into BigQuery’s AI capabilities without ever moving the underlying bits.

Description

Using the Parquet file found in the raw data bucket on Cloud Storage, create a native BigQuery table called users in the marketing dataset. You should not need to define the table schema manually, since BigQuery can infer it automatically from the Parquet file.
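One way to do this is with a `LOAD DATA` statement, which reads the Parquet file's embedded schema so no column definitions are required. The bucket and object names below are placeholders; substitute the actual values from your environment:

```sql
-- Create (or replace) the native table from Parquet on Cloud Storage.
-- Schema is inferred from the Parquet file itself.
LOAD DATA OVERWRITE marketing.users
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://<your-raw-data-bucket>/users.parquet']  -- placeholder path
);
```

The same result can be achieved from the console with "Create table from Google Cloud Storage", or with the `bq load --source_format=PARQUET` command.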

Leverage the BigLake metadata layer to join your new users table with the existing Iceberg tables. Write a single SQL query to aggregate the following metrics by country:

  • Total number of orders
  • Total number of ordered items
  • Total sales price
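A query along the following lines can produce these metrics. The Iceberg table names (orders, order_items) and column names (order_id, user_id, sale_price) are illustrative assumptions; adjust them to match the tables in your environment:

```sql
-- Join the native users table with the Iceberg tables,
-- then aggregate the three metrics per country.
-- Table and column names below are assumptions, not givens.
SELECT
  u.country,
  COUNT(DISTINCT o.order_id) AS total_orders,        -- total number of orders
  COUNT(oi.order_id)         AS total_ordered_items, -- total number of ordered items
  SUM(oi.sale_price)         AS total_sales_price    -- total sales price
FROM marketing.users AS u
JOIN marketing.orders AS o
  ON o.user_id = u.id
JOIN marketing.order_items AS oi
  ON oi.order_id = o.order_id
GROUP BY u.country;
```

Because BigLake exposes the Iceberg tables through the same SQL surface as native tables, no data movement or separate engine is needed for this join.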

Success Criteria

  • A BigQuery native table called users is created with 100,000 rows.
  • The query output contains one row per country, and each row has three columns containing the correct metrics.

Learning Resources
