Challenge 2: Data interoperability across Lakehouse


Introduction

A cornerstone of the Google Cloud Lakehouse architecture is true interoperability: the ability to join disparate datasets without the latency or cost of data movement. Through BigLake, Google Cloud provides a unified interface that allows BigQuery to query Apache Iceberg tables alongside native BigQuery tables in a single SQL statement.

This architecture not only streamlines analytics but also serves as a robust foundation for AI integration. By storing data in open formats like Iceberg, you can feed cleaned, governed data directly into BigQuery’s AI capabilities without ever moving the underlying bits.

Description

Using the Parquet file found in the raw data bucket on Cloud Storage, create a native BigQuery table called users in the marketing dataset. You should not need to define the table schema manually, since BigQuery can infer it automatically from the Parquet file.
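One way to do this is with a `LOAD DATA` statement, which reads the Parquet file's embedded schema so no column definitions are required. The bucket and object names below are placeholders; substitute the actual values from your environment:

```sql
-- Create (or replace) the native table from Parquet on Cloud Storage.
-- Schema is inferred from the Parquet file itself.
LOAD DATA OVERWRITE marketing.users
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://<your-raw-data-bucket>/users.parquet']  -- placeholder path
);
```

The same result can be achieved from the console with "Create table from Google Cloud Storage", or with the `bq load --source_format=PARQUET` command.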

Leverage the BigLake metadata layer to join your new users table with the existing Iceberg tables. Write a single SQL query to aggregate the following metrics by country:

  • Total number of orders
  • Total number of ordered items
  • Total sales price
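A query along the following lines can produce these metrics. The Iceberg table names (orders, order_items) and column names (order_id, user_id, sale_price) are illustrative assumptions; adjust them to match the tables in your environment:

```sql
-- Join the native users table with the Iceberg tables,
-- then aggregate the three metrics per country.
-- Table and column names below are assumptions, not givens.
SELECT
  u.country,
  COUNT(DISTINCT o.order_id) AS total_orders,        -- total number of orders
  COUNT(oi.order_id)         AS total_ordered_items, -- total number of ordered items
  SUM(oi.sale_price)         AS total_sales_price    -- total sales price
FROM marketing.users AS u
JOIN marketing.orders AS o
  ON o.user_id = u.id
JOIN marketing.order_items AS oi
  ON oi.order_id = o.order_id
GROUP BY u.country;
```

Because BigLake exposes the Iceberg tables through the same SQL surface as native tables, no data movement or separate engine is needed for this join.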

Success Criteria

  • A BigQuery native table called users is created with 100,000 rows.
  • The query output contains one row per country, and each row has three columns containing the correct metrics.

Learning Resources
