Challenge 2: Creating the agentic database layer

Previous Challenge Next Challenge

Target Persona: DBA / Database Developer Estimated Duration: 40 minutes Prerequisite: Challenge 1

Introduction

In this challenge, you will establish the agentic database layer directly on top of your AlloyDB instance. This involves configuring QueryData to guarantee predictable SQL generation for common queries (NL2SQL), and exposing advanced database capabilities (like hybrid search and semantic filtering) as SQL functions.

By preparing these operational capabilities now, you lay the foundation for the agent to interact securely and deterministically with your data.

Description

QueryData is an intelligent natural language-to-SQL (NL2SQL) engine in AlloyDB that translates conversational questions into precise, secure SQL queries. Rather than just matching static “golden queries”, it uses a Context Set (a JSON file defining your schema’s business logic) to dynamically construct queries using:

  • Query Templates: Parameterized QA pairs (using $1, $2) that allow the engine to generalize query patterns.
  • Query Facets: Reusable filters (e.g., mapping “highly rated” to rating >= 4) that can be dynamically appended.
  • Value Search Queries: Background queries (exact, fuzzy, or semantic) that map user-typed entities to actual database values.

For more details, see the AlloyDB QueryData Documentation.

Task 2.1: Enable the AlloyDB Data API

QueryData and downstream API integration require the AlloyDB Data API (also known as the executesql API) to be explicitly enabled on your AlloyDB instance. Run the following curl command in your Cloud Shell to enable it:

curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://alloydb.googleapis.com/v1alpha/projects/\${GOOGLE_CLOUD_PROJECT}/locations/<YOUR_REGION>/clusters/disney-cluster/instances/disney-instance?updateMask=dataApiAccess" \
  -d '{"dataApiAccess": "ENABLED"}'

(Note: Replace <YOUR_REGION> with the region where your cluster is deployed, e.g. europe-west1. The project ID is automatically resolved via the GOOGLE_CLOUD_PROJECT environment variable in Cloud Shell.)

Task 2.2: Register IAM Users in AlloyDB

To interact with the database, QueryData relies on IAM Database Authentication. This means that each user that needs to use querydata must be registered as a database user and given privileges. We will just use personal accounts for querydata.

To register your user, follow these steps:

  1. Log in using Basic Authentication:
    • In the Google Cloud Console, navigate to AlloyDB > Clusters and select your cluster.
    • Click AlloyDB Studio in the left menu.
    • Sign in to the disney database using Basic Authentication (Username: postgres, Password: <your_cluster_password>).
  2. Register the IAM User:
    • In your AlloyDB console, navigate to Users
    • Click Add user account
    • Select Cloud IAM
    • Enter the email you use to log into the Google Cloud Console in the User ID field
    • Click Save

Task 2.3: Set Up the QueryData Context Set in AlloyDB

  1. Create the JSON Context Set File: Create a file named querydata_disney_context.json on your local system. Fill in the SQL templates and queries according to the requirements:

     {
       "templates": [
         {
           "nlQuery": "Show available attractions in Disneyland Paris",
           "sql": "/* TODO: Write SQL to select name, description from public.disneyland_attractions filtered by branch */",
           "intent": "List all attractions for a specific park branch",
           "manifest": "List attractions by branch",
           "parameterized": {
             "parameterized_intent": "Show available attractions in $1",
             "parameterized_sql": "/* TODO: Write the parameterized SQL using $1 */"
           }
         },
         {
           "nlQuery": "Find reviews with rating 5 for Space Mountain",
           "sql": "/* TODO: Write SQL to join reviews and attractions on branch, filtered by attraction name and rating */",
           "intent": "Get reviews with a specific rating for a named attraction",
           "manifest": "Get reviews by attraction and rating",
           "parameterized": {
             "parameterized_intent": "Find reviews with rating $2 for $1",
             "parameterized_sql": "/* TODO: Write the parameterized SQL using $1 (name) and $2 (rating) */"
           }
         },
         {
           "nlQuery": "Average rating of attractions in California Adventure",
           "sql": "/* TODO: Write SQL to select the average rating from public.disneyland_reviews filtered by branch */",
           "intent": "Calculate the average review rating for a specific branch",
           "manifest": "Average rating by branch",
           "parameterized": {
             "parameterized_intent": "Average rating of attractions in $1",
             "parameterized_sql": "/* TODO: Write the parameterized SQL using $1 */"
           }
         }
       ],
       "facets": [
         {
           "sql_snippet": "/* TODO: Write SQL snippet to filter rating >= 4 */",
           "intent": "highly rated reviews",
           "manifest": "Filter reviews by a minimum rating threshold",
           "parameterized": {
             "parameterized_intent": "reviews with rating greater than or equal to $1",
             "parameterized_sql_snippet": "/* TODO: Write the parameterized SQL snippet using $1 */"
           }
         }
       ],
       "value_searches": [
         {
           "query": "SELECT DISTINCT T.name as value, 'public.disneyland_attractions.name' as columns, 'Attraction Name' as concept_type, (1.0 - ts_rank(to_tsvector('english', T.name), plainto_tsquery('english', $value))) as distance, '{}'::text as context FROM public.disneyland_attractions T WHERE to_tsvector('english', T.name) @@ plainto_tsquery('english', $value)",
           "concept_type": "Attraction Name",
           "description": "Full-text search for attraction names"
         },
         {
           "query": "SELECT DISTINCT T.branch as value, 'public.disneyland_attractions.branch' as columns, 'Branch Name' as concept_type, (1.0 - ts_rank(to_tsvector('english', T.branch), plainto_tsquery('english', $value))) as distance, '{}'::text as context FROM public.disneyland_attractions T WHERE to_tsvector('english', T.branch) @@ plainto_tsquery('english', $value)",
           "concept_type": "Branch Name",
           "description": "Full-text search for park branches"
         }
       ]
     }
    
  2. Upload Context Set to AlloyDB (via IAM Authentication):

    • In the Google Cloud Console, open AlloyDB Studio again.
    • This time, sign in using IAM Authentication (this will use your logged-in Google account, which you registered in Task 2.2). You can click on the “Switch user/database” button (icon with a person and database) to switch to IAM authentication.
    • In the left sidebar, check for Context sets.
    • Click Create context set, name it disney-context, and upload the querydata_disney_context.json file.
    • Validate the setup by using the Test context set feature with variations of your natural language templates.

Task 2.4: Expose AlloyDB AI Operators

You will prepare SQL queries that leverage AlloyDB’s advanced AI features to expose them as tools.

 Tip Remember to switch back to basic authentication (logging in as the postgres user) in AlloyDB Studio before executing the SQL queries in this task.

  1. Hybrid Search (ScaNN + FTS): Before setting up hybrid search, you need to install the required extensions in AlloyDB.

    What is Hybrid Search (ai.hybrid_search)? Hybrid search combines the strengths of semantic search (vector similarity) and keyword search (Full-Text Search) to deliver highly relevant results. While vector search excels at understanding the conceptual meaning of a query, keyword search ensures exact matches (like names or specific terms) aren’t missed. The ai.hybrid_search function, provided by the ai extension, executes both searches in parallel and merges their results using Reciprocal Rank Fusion (RRF) to produce a single, unified relevance score.

    Run the following SQL commands in AlloyDB Studio to install the extensions and create the necessary indexes:

     -- Install required extensions
     CREATE EXTENSION IF NOT EXISTS alloydb_scann CASCADE;
     CREATE EXTENSION IF NOT EXISTS rum CASCADE;
    
     -- Index RUM for Keyword FTS
     CREATE INDEX IF NOT EXISTS attractions_tsvector_idx ON disneyland_attractions USING RUM (description_tsvector rum_tsvector_ops);
    
     -- Index ScaNN for Vector Cosine Similarity
     CREATE INDEX IF NOT EXISTS attractions_vector_idx ON disneyland_attractions USING scann (embedding cosine) WITH (num_leaves=10);
    

    After creating the indexes, try running a hybrid search query using the ai.hybrid_search function to see it in action (try searching for a “thrilling space roller coaster”).

    This hybrid search capability will be mapped later directly to an MCP tool. Save it as a SQL file for later reference.

  2. Semantic Filtering (google_ml.if): Create a SQL function to easily evaluate if a specific attraction is suitable or safe based on a guest’s natural language profile (e.g., checking if a specific ride is safe for pregnant women or suitable for toddlers).

    What is Semantic Filtering (google_ml.if)? Semantic filtering allows you to evaluate complex, natural language conditions directly within your SQL queries. Unlike vector similarity searches that rank results by closeness, google_ml.if uses a generative AI model to evaluate a prompt against your data on a row-by-row basis, returning a simple boolean (true or false). This is extremely powerful for ad-hoc safety or suitability checks where pre-defined categories or labels do not exist.

     Tip Before writing the reusable SQL function, it is highly recommended to build and run a standalone test query in AlloyDB Studio. This allows you to verify the behavior of google_ml.if on a single attraction (for example, testing if “Space Mountain” is suitable for “pregnant women”) before embedding it in the function.

    Create the SQL function:

     CREATE OR REPLACE FUNCTION check_attraction_suitability(attraction_name TEXT, suitability_profile TEXT)
     RETURNS TABLE(name TEXT, description TEXT) AS $$
       SELECT name, description 
       FROM disneyland_attractions 
       WHERE name = attraction_name
         AND /* TODO: Add the google_ml.if operator logic here to evaluate description against suitability_profile */;
     $$ LANGUAGE SQL;
    

Success Criteria

To validate this challenge, you must demonstrate the following:

  • Show a screenshot or validation proof of the QueryData displaying a successful natural-language-to-SQL translation.
  • Provide the SQL DDL definition for the check_attraction_suitability function in AlloyDB Studio.
  • Provide the SQL query you used to run and test the hybrid search (ScaNN + FTS) in AlloyDB Studio, along with its results.
  • Provide the SQL query showing how you called and tested the check_attraction_suitability function in AlloyDB Studio, along with its results.

Previous Challenge Next Challenge