Challenge 4: Image classification and brochure RAG


Target Persona: AI Developer / Data Scientist
Estimated Duration: 45 minutes

Introduction

Disneyland managers have a collection of visitor photos and PDF brochures. In this challenge, you will organize this unstructured data in BigQuery, classify the images, and build a Retrieval-Augmented Generation (RAG) pipeline based on Vector Search to query the park brochures in natural language.

Description

Task 4.1: Multimodal Image Classification

You have a GCS bucket containing park photos: gs://ghacks-disneyland-on-gcp/attraction_parc_photos/.

  1. Create a BigQuery Object Table pointing to the GCS bucket.
  2. Create a remote model in BigQuery pointing to a multimodal model (e.g., gemini-2.5-flash).
  3. Use AI.GENERATE_TEXT to pass the image URIs to the model with a prompt asking: “Is this image from a Disneyland park? Answer with a JSON object containing keys ‘is_disneyland’ (boolean) and ‘reason’ (string).”
  4. Save the structured results into a table images_classification.
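
The model returns its answer as text, so it can help to validate that text before it is stored in images_classification. The following is a minimal sketch in Python, not the challenge solution: it assumes the reply is a JSON object that may be wrapped in a Markdown code fence, and the helper name parse_classification is hypothetical.

```python
import json
import re


def parse_classification(raw: str) -> dict:
    """Parse a model reply into {'is_disneyland': bool, 'reason': str}.

    Assumption: the reply is a JSON object, optionally wrapped in a
    Markdown code fence (three backticks, with or without a 'json' tag).
    """
    text = raw.strip()
    # Remove an optional opening fence and an optional closing fence.
    text = re.sub(r"^`{3}(?:json)?\s*", "", text)
    text = re.sub(r"\s*`{3}$", "", text)

    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("is_disneyland"), bool):
        raise ValueError("'is_disneyland' must be a boolean")
    if not isinstance(data.get("reason"), str):
        raise ValueError("'reason' must be a string")
    # Keep only the two keys requested in the prompt.
    return {"is_disneyland": data["is_disneyland"], "reason": data["reason"]}
```

Rows that fail this kind of check are worth inspecting by hand, since a malformed reply usually points to a prompt that needs tightening.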

Task 4.2: Streamlined PDF Document Processing

We want our assistant to answer detailed questions based on the official park brochures (PDFs) located in gs://ghacks-disneyland-on-gcp/disneyland_brochures/. You can choose between two options.

For both options, first create an Object Table in BigQuery pointing to the brochures bucket.

Option 1: AI.SEARCH with OBJECTREF

  • Use AI.SEARCH to find “Where can I find a buffet-style Tex-Mex meal?”

Option 2: Chunking, embeddings, and Vector Search

For a more streamlined pipeline, split the PDFs into chunks, generate an embedding for each chunk, and then use Vector Search to find the chunks most similar to a question:

  • Extract chunks of the PDF files (using AI.GENERATE, AI.PARSE_DOCUMENT, or a UDF).
  • Generate embeddings for each text chunk using a remote BQML embedding model (gemini-embedding-001).
  • Store the chunks and their vector embeddings in a table brochure_embeddings.
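
To make the chunking step concrete, here is a minimal sketch in Python of a fixed-size word chunker with overlap, the kind of logic a UDF could implement. It illustrates one common strategy rather than the method the challenge prescribes; the function name chunk_text and the default sizes (200 words, 40 words of overlap) are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks tend to give more precise matches but less context per match, so the sizes are worth tuning against the brochure content.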

Task 4.3: Intelligent Search and Response with AI.SEARCH & AI.GENERATE_TEXT

  1. Perform a vector search over the brochure_embeddings table. Find the most relevant document chunks for the question: “Where can I find a buffet-style Tex-Mex meal?” (or French: “Où manger un repas tex-mex à volonté ?”).
  2. Pass the retrieved chunks as context along with the question to gemini-2.5-flash using AI.GENERATE_TEXT to generate a grounded, accurate response.
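
The two steps above can be sketched outside BigQuery to show what they compute. The Python below is a conceptual illustration under stated assumptions, not the SQL expected for the challenge: it ranks chunks by cosine distance (one of the distance types BigQuery's VECTOR_SEARCH supports) and assembles a grounded prompt. The function names, the prompt wording, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, rows, k=3):
    """rows is a list of (chunk_text, embedding) pairs; return the k closest chunks."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query_embedding, row[1]))
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one grounded prompt."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy data: made-up 2-D embeddings standing in for real model output.
rows = [
    ("placeholder chunk A", [1.0, 0.0]),
    ("placeholder chunk B", [0.0, 1.0]),
    ("placeholder chunk C", [0.9, 0.1]),
]
best = top_k_chunks([1.0, 0.0], rows, k=2)
prompt = build_prompt("Where can I find a buffet-style Tex-Mex meal?", best)
```

Instructing the model to rely only on the supplied context is what keeps the answer grounded in the brochures rather than in the model's general knowledge.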

Success Criteria

To validate this challenge, you must demonstrate the following:

  • Show the BigQuery Object Tables created for both the photos and the PDFs.
  • Show the results from the images_classification table displaying at least 5 images, their classification, and the reason.
  • Show the final SQL query performing the vector search and RAG generation, along with the grounded text response answering the Tex-Mex question.
