Challenge 6: Preparing the Context Layer
| Target Persona: Data Engineer / AI Architect | Estimated Duration: 40 minutes |
Introduction
Before creating your conversational AI agents, you need to build a centralized context layer. This layer ensures your agents understand your business terminology, operational metadata, and data asset structures, leading to higher accuracy and fewer hallucinations.
Task 6.1: Technical Metadata Enrichment
- Navigate to your BigQuery dataset.
- Enrich your chosen data assets (such as disneyland_reviews) by adding schema descriptions to columns.
- Ensure critical columns like your vector embeddings, image analysis JSON fields, and source URIs have clear technical metadata descriptions.
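If you prefer scripting the descriptions over typing them in the console, one option is to generate BigQuery DDL statements of the form ALTER TABLE ... ALTER COLUMN ... SET OPTIONS (description = ...). The following is a minimal sketch, not the required approach: the dataset name my_dataset and the column names review_embedding, image_analysis, and source_uri are hypothetical placeholders, so substitute the names from your own table.

```python
def column_description_ddl(table: str, descriptions: dict) -> list:
    """Build one ALTER COLUMN statement per column description."""
    statements = []
    for column, text in descriptions.items():
        # Escape backslashes and single quotes for a BigQuery string literal.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        statements.append(
            f"ALTER TABLE `{table}` ALTER COLUMN `{column}` "
            f"SET OPTIONS (description = '{escaped}');"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical dataset and column names, for illustration only.
    ddl = column_description_ddl(
        "my_dataset.disneyland_reviews",
        {
            "review_embedding": "Vector embedding of the review text",
            "image_analysis": "JSON output of the image analysis step",
            "source_uri": "Cloud Storage URI of the source object",
        },
    )
    for statement in ddl:
        print(statement)
```

The printed statements can be pasted into the BigQuery query editor or run with the client of your choice.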
Task 6.2: Business Glossary Alignment
To align raw technical structures with organizational understanding, you must map your catalog to a standardized business vocabulary.
- Glossary Creation: Create a centralized Business Glossary for the Disneyland analytics ecosystem.
- Core Definitions: Define core business terms inside the glossary, such as “Rollercoaster”, “Premium visitor”, or “Buffet Dining Category”. For example, “Premium visitor”: a visitor who left more than 2 reviews.
- Asset Mapping to BigQuery: Link these business terms directly to their corresponding BigQuery columns to map technical metadata to business language.
- Asset Mapping to AlloyDB: Map some terms to AlloyDB assets as well to see how Knowledge Catalog integrates equally with operational and analytical databases.
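A glossary definition is most useful when it is precise enough to compute. As a rough illustration, the “Premium visitor” example (a visitor who left more than 2 reviews) could be expressed as follows. The visitor identifiers are hypothetical; the sketch assumes your reviews table has some column that identifies the reviewer.

```python
from collections import Counter

# "More than 2 reviews", taken from the glossary example.
PREMIUM_REVIEW_THRESHOLD = 2


def premium_visitors(review_visitor_ids):
    """Return the set of visitors with more than the threshold number of reviews.

    review_visitor_ids holds one entry per review: the ID of the visitor
    who wrote it (a hypothetical column, adapt to your schema).
    """
    counts = Counter(review_visitor_ids)
    return {
        visitor
        for visitor, n_reviews in counts.items()
        if n_reviews > PREMIUM_REVIEW_THRESHOLD
    }


if __name__ == "__main__":
    reviews = ["v1", "v1", "v1", "v2", "v2", "v3"]
    print(premium_visitors(reviews))
```

Note that a visitor with exactly 2 reviews does not qualify, because the definition says "more than 2". Stating boundary cases like this in the glossary entry avoids ambiguity when an agent later interprets the term.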
Task 6.3: Automated Profiling & Quality
Understanding the distribution of values in a column, and the quality rules that apply to it, makes the data easier to discover and trust.
- Run a data profile scan on the disneyland_reviews table. Analyze the results.
- Define and run an automatic data quality scan with different rule types (profile-based, predefined generic, custom, etc.).
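To build intuition for the scan results, the sketch below computes a few typical profile statistics and one range-style quality rule locally in plain Python. It is only an illustration of the concepts and does not call the managed scan service; the sample rating values and the 1 to 5 range are assumptions made for the example.

```python
def profile_column(values):
    """Compute simple profile statistics for one column of values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "row_count": total,
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def range_rule_pass_ratio(values, low, high):
    """Fraction of non-null values that fall within [low, high]."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0  # Nothing to violate the rule.
    passing = sum(1 for v in non_null if low <= v <= high)
    return passing / len(non_null)


if __name__ == "__main__":
    ratings = [5, 4, None, 5, 7]  # Hypothetical sample; 7 is out of range.
    print(profile_column(ratings))
    print(range_rule_pass_ratio(ratings, 1, 5))
```

A profile-based rule is essentially the second function with bounds taken from the first: the observed minimum, maximum, or null ratio becomes the threshold that future data is checked against.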
Task 6.4: Automated GCS Metadata Generation
Set up an automated extraction pipeline to handle documentation and unstructured assets within your environment.
Tip: This can be done in the BigQuery Metadata Curation tab.
- Configure the pipeline to analyze your unstructured gs://ghacks-disneyland-on-gcp/ bucket.
- Automatically generate and attach metadata tags (such as language, document type, target audience, and revision date) to the PDF assets. Use semantic inference for better results.
Task 6.5: LookupContext API Integration
Once your technical, business, and object storage metadata are established, wire them into your execution layer for application discovery.
- Use the LookupContext API to fetch operational and structural context dynamically. Test the API against a standard BigQuery data asset and an AlloyDB transactional database table. You can use Python or a REST API.
- Verify that the API returns detailed, low-latency context maps that an LLM agent can ingest to understand the underlying database schemas and table relationships.
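Once you have a response, it can help to check how an agent would actually consume it. The sketch below flattens a context payload into plain text suitable for an LLM prompt. The payload shape (the keys tables, name, source, columns, type, and description) is invented for illustration and is not the actual LookupContext response schema; adapt the keys to whatever your call returns.

```python
def render_context(context: dict) -> str:
    """Flatten a context payload into prompt-ready text.

    The expected keys are a hypothetical shape, not the real API schema.
    """
    lines = []
    for table in context.get("tables", []):
        lines.append(f"Table {table['name']} ({table['source']})")
        for column in table.get("columns", []):
            # Fall back to a visible marker so missing metadata is easy to spot.
            description = column.get("description") or "no description"
            lines.append(
                f"  - {column['name']} ({column['type']}): {description}"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical payload; column names are placeholders.
    sample = {
        "tables": [
            {
                "name": "disneyland_reviews",
                "source": "BigQuery",
                "columns": [
                    {
                        "name": "source_uri",
                        "type": "STRING",
                        "description": "Cloud Storage URI of the source object",
                    },
                    {"name": "rating", "type": "INT64"},
                ],
            }
        ]
    }
    print(render_context(sample))
```

Columns that render as "no description" are a quick signal that the enrichment from Task 6.1 has gaps worth filling before handing the context to an agent.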
Success Criteria
To validate this challenge, you must demonstrate the following:
- Show the enriched schema descriptions for your target tables directly within the BigQuery Console.
- Provide a summary or export of the linked terms inside your centralized Disneyland Business Glossary.
- Show the successful pipeline logs or sample metadata tags generated for the PDF assets in Cloud Storage.
- Provide the API JSON response payload from a successful LookupContext call showing the multi-database schema mapping.