Challenge 6: Preparing the Context Layer

Target Persona: Data Engineer / AI Architect
Estimated Duration: 40 minutes

Introduction

Before creating your conversational AI agents, you need to build a centralized context layer. This layer ensures your agents understand your business terminology, operational metadata, and data asset structures, leading to higher accuracy and fewer hallucinations.

Task 6.1: Technical Metadata Enrichment

  1. Navigate to your BigQuery dataset.
  2. Enrich your chosen data assets (such as disneyland_reviews) by adding schema descriptions to columns.
  3. Ensure critical columns like your vector embeddings, image analysis JSON fields, and source URIs have clear technical metadata descriptions.
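The enrichment in steps 2 and 3 can also be scripted instead of typed into the console. The following is a minimal sketch, not a prescribed solution: the dataset name `my_dataset` and the column names (`review_embedding`, `image_analysis`, `source_uri`) are placeholders and will differ from your actual schema. It only builds BigQuery `ALTER COLUMN ... SET OPTIONS` statements as strings, which you could then paste into the query editor.

```python
# Hypothetical column names and descriptions; replace with your real schema.
COLUMN_DESCRIPTIONS = {
    "review_embedding": "Vector embedding of the review text, used for semantic search.",
    "image_analysis": "JSON output of the image analysis step for the attached photo.",
    "source_uri": "Cloud Storage URI of the source object this row was derived from.",
}


def build_description_ddl(table: str, descriptions: dict[str, str]) -> list[str]:
    """Return one ALTER TABLE statement per column description."""
    statements = []
    for column, text in descriptions.items():
        # Escape backslashes and double quotes so the text stays a valid SQL string literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        statements.append(
            f'ALTER TABLE `{table}` ALTER COLUMN {column} '
            f'SET OPTIONS (description = "{escaped}");'
        )
    return statements


if __name__ == "__main__":
    # "my_dataset" is an assumed dataset name.
    for stmt in build_description_ddl("my_dataset.disneyland_reviews", COLUMN_DESCRIPTIONS):
        print(stmt)
```

Keeping the descriptions in one dictionary makes them easy to review and to reapply if the table is recreated.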

Task 6.2: Business Glossary Alignment

To align raw technical structures with organizational understanding, you must map your catalog to a standardized business vocabulary.

  1. Glossary Creation: Create a centralized Business Glossary for the Disneyland analytics ecosystem.
  2. Core Definitions: Define core business terms inside the glossary, such as “Rollercoaster”, “Premium visitor”, or “Buffet Dining Category”. For example, “Premium visitor”: a visitor who left more than 2 reviews.
  3. Asset Mapping to BigQuery: Link these business terms directly to their corresponding BigQuery columns to map technical metadata to business language.
  4. Asset Mapping to AlloyDB: Map some terms to AlloyDB assets as well to see how Knowledge Catalog integrates with both operational and analytical databases.
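To make the relationship between a term, its definition, and the linked columns concrete, here is a small local sketch. It does not call any catalog API; the `GlossaryTerm` class, the column path `my_dataset.disneyland_reviews.reviewer_id`, and the sample reviewer IDs are all hypothetical. It shows how the example definition of “Premium visitor” (more than 2 reviews) could be applied to data once the term is mapped to a column.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    # Fully qualified columns the term is linked to (hypothetical paths).
    linked_columns: list[str] = field(default_factory=list)


premium_visitor = GlossaryTerm(
    name="Premium visitor",
    definition="A visitor who left more than 2 reviews",
    linked_columns=["my_dataset.disneyland_reviews.reviewer_id"],
)


def premium_visitors(reviewer_ids: list[str], threshold: int = 2) -> set[str]:
    """Apply the glossary definition: visitors with more than `threshold` reviews."""
    counts = Counter(reviewer_ids)
    return {visitor for visitor, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Made-up reviewer IDs, one entry per review.
    sample = ["a", "a", "a", "b", "b", "c"]
    print(premium_visitor.name, "->", premium_visitors(sample))
```

Writing the definition down in an executable form is one way to check that the business wording ("more than 2") is unambiguous before it goes into the glossary.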

Task 6.3: Automated Profiling & Quality

To discover data effectively, you need to understand how values are distributed in each column and which quality rules they satisfy.

  1. Run a data profile scan on the disneyland_reviews table. Analyze the results.
  2. Define and run an automated data quality scan with different rule types (profile-based, predefined generic, custom, etc.).
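As a rough illustration of what these two scans report, the sketch below computes a few profile statistics and one range rule locally in plain Python. It is not the managed scan itself; the sample ratings and the assumed valid range of 1 to 5 are invented for the example.

```python
from typing import Any, Optional


def profile_column(values: list[Optional[Any]]) -> dict[str, Any]:
    """Compute a few statistics of the kind a data profile scan reports."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def check_range(values: list[Optional[float]], low: float, high: float) -> float:
    """Range rule: return the fraction of non-null values inside [low, high]."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0  # nothing to violate the rule
    passed = sum(1 for v in non_null if low <= v <= high)
    return passed / len(non_null)


if __name__ == "__main__":
    # Made-up sample; 7 is deliberately outside the assumed 1-5 range.
    ratings = [5, 4, None, 3, 5, 7]
    print(profile_column(ratings))
    print("range rule pass ratio:", check_range(ratings, 1, 5))
```

A profile-based rule is typically derived from statistics like these, for example by turning the observed minimum and maximum into a range check with a pass threshold.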

Task 6.4: Automated GCS Metadata Generation

Set up an automated extraction pipeline to handle documentation and unstructured assets within your environment.

Tip: This can be done in the BigQuery Metadata Curation tab.

  1. Configure the pipeline to analyze your unstructured gs://ghacks-disneyland-on-gcp/ bucket.
  2. Automatically generate and attach metadata tags (such as language, document type, target audience, and revision date) to the PDF assets. Use semantic inference for better results.

Task 6.5: LookupContext API Integration

Once your technical, business, and object storage metadata are established, wire them into your execution layer for application discovery.

  1. Use the LookupContext API to fetch operational and structural context dynamically. Test the API against a standard BigQuery data asset and an AlloyDB transactional database table. You can use Python or the REST API.
  2. Verify that the API returns detailed, low-latency context maps that an LLM agent can ingest to understand the underlying database schemas and table relationships.
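Once you have a response in hand, a quick local check can confirm that it covers both databases and can be flattened into text for an agent prompt. The sketch below does not call the API, and the payload shape (`entries`, `system`, `columns`), the `bookings` table, and all column names are assumptions made up for illustration; adapt the field names to the JSON you actually receive.

```python
import json

# Hypothetical payload shape; the real LookupContext response will differ.
SAMPLE_RESPONSE = json.loads("""
{
  "entries": [
    {"system": "BIGQUERY", "name": "disneyland_reviews",
     "columns": [{"name": "review_text", "description": "Free-text visitor review"}]},
    {"system": "ALLOYDB", "name": "bookings",
     "columns": [{"name": "booking_id", "description": "Primary key of the booking"}]}
  ]
}
""")


def systems_covered(payload: dict) -> set[str]:
    """Return the set of source systems present in the payload."""
    return {entry["system"] for entry in payload.get("entries", [])}


def to_prompt_context(payload: dict) -> str:
    """Flatten the payload into plain text that can be placed in an LLM prompt."""
    lines = []
    for entry in payload.get("entries", []):
        lines.append(f'{entry["system"]} table {entry["name"]}:')
        for col in entry.get("columns", []):
            lines.append(f'  - {col["name"]}: {col.get("description", "")}')
    return "\n".join(lines)


if __name__ == "__main__":
    assert {"BIGQUERY", "ALLOYDB"} <= systems_covered(SAMPLE_RESPONSE)
    print(to_prompt_context(SAMPLE_RESPONSE))
```

Checking coverage before building the prompt helps catch the case where only one of the two databases was returned.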

Success Criteria

To validate this challenge, you must demonstrate the following:

  • Show the enriched schema descriptions for your target tables directly within the BigQuery Console.
  • Provide a summary or export of the linked terms inside your centralized Disneyland Business Glossary.
  • Show the successful pipeline logs or sample metadata tags generated for the PDF assets in Cloud Storage.
  • Provide the API JSON response payload from a successful LookupContext call showing the multi-database schema mapping.
