Challenge 4: Image classification and brochure RAG
Target Persona: AI Developer / Data Scientist
Estimated Duration: 45 minutes
Introduction
Disneyland managers have a collection of visitor photos and PDF brochures. In this challenge, you will organize this unstructured data in BigQuery and classify the images with a multimodal model. You will also build a Retrieval-Augmented Generation (RAG) pipeline, based on vector search, that answers natural-language questions about the park brochures.
Description
Task 4.1: Multimodal Image Classification
You have a GCS bucket containing park photos: gs://ghacks-disneyland-on-gcp/attraction_parc_photos/.
- Create a BigQuery Object Table pointing to the GCS bucket.
- Create a remote model in BigQuery pointing to a multimodal model (e.g., gemini-2.5-flash).
- Use AI.GENERATE_TEXT to pass the image URIs to the model with a prompt asking: “Is this image from a Disneyland park? Answer with a JSON object containing keys ‘is_disneyland’ (boolean) and ‘reason’ (string).”
- Save the structured results into a table images_classification.
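The model returns its answer as text, so the reply must be parsed before it can fill the is_disneyland and reason columns of images_classification. The Python sketch below shows that parsing step. It assumes the reply is a JSON object that may be wrapped in a Markdown code fence, which Gemini models often add but do not guarantee. The helper name parse_classification and the sample replies are hypothetical. In BigQuery, the same extraction can be written in SQL with JSON_VALUE on the output column.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal one


def parse_classification(raw: str) -> dict:
    """Parse a model reply into {'is_disneyland': bool, 'reason': str}.

    Hypothetical helper: assumes the reply is a JSON object, optionally
    wrapped in a Markdown code fence with a language tag on the first line.
    """
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (including any language tag) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and the closing fence, if present.
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    data = json.loads(text)
    is_disneyland = data.get("is_disneyland")
    if not isinstance(is_disneyland, bool):
        raise ValueError(f"'is_disneyland' must be a boolean, got {is_disneyland!r}")
    return {"is_disneyland": is_disneyland, "reason": str(data.get("reason", ""))}


# Hypothetical replies: one bare JSON object and one wrapped in a fence.
plain = '{"is_disneyland": true, "reason": "A castle and park signage are visible."}'
fenced = FENCE + 'json\n{"is_disneyland": false, "reason": "An ordinary city street."}\n' + FENCE

for reply in (plain, fenced):
    print(parse_classification(reply))
```

Rejecting a non-boolean is_disneyland, instead of coercing strings like "yes", makes malformed model output visible early rather than silently polluting the table.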
Task 4.2: Streamlined PDF Document Processing
We want our assistant to answer detailed questions based on the official park brochures (PDFs) located in gs://ghacks-disneyland-on-gcp/disneyland_brochures/. First, create an Object Table in BigQuery pointing to the brochures bucket. Then choose one of the two options below.
Option 1: AI.SEARCH with OBJECTREF
- Use AI.SEARCH over the brochures to find the content relevant to the question “Where can I find a buffet-style Tex-Mex meal?”
Option 2: Chunking, Embeddings, and Vector Search
Alternatively, build the pipeline step by step: split the PDFs into chunks, generate an embedding for each chunk, and then use vector search to find the chunks most similar to a question:
- Extract text chunks from the PDF files (using AI.GENERATE, AI.PARSE_DOCUMENT, or a UDF).
- Generate embeddings for each text chunk using a remote BQML embedding model (gemini-embedding-001).
- Store the chunks and their vector embeddings in a table brochure_embeddings.
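Whichever extraction method you pick, long brochure text is usually cut into overlapping pieces before embedding. Overlap means that content split at one boundary keeps some surrounding context in the neighboring chunk. The Python sketch below shows fixed-size character chunking with overlap. The function name chunk_text and the default sizes are illustrative assumptions, not values required by the challenge or by gemini-embedding-001.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most `size` characters (illustrative helper).

    Consecutive chunks share `overlap` characters, so content cut at a
    boundary keeps some context in the following chunk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each resulting chunk would become one row of brochure_embeddings, stored next to its embedding vector. Splitting on sentence or paragraph boundaries instead of raw character counts is a common refinement.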
Task 4.3: Intelligent Search and Response with AI.SEARCH & AI.GENERATE_TEXT
- Perform a vector search over the brochure_embeddings table to find the document chunks most relevant to the question “Where can I find a buffet-style Tex-Mex meal?” (or, in French, “Où manger un repas tex-mex à volonté ?”).
- Pass the retrieved chunks as context, together with the question, to gemini-2.5-flash using AI.GENERATE_TEXT to generate a grounded, accurate response.
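Conceptually, retrieval ranks the stored chunk embeddings by their distance to the question's embedding and keeps the top k. Generation then places those chunks in the prompt ahead of the question. The Python sketch below mirrors that flow on toy data so the logic is easy to inspect. It assumes cosine distance (one of the distance types BigQuery's VECTOR_SEARCH supports) and uses tiny hand-made vectors in place of real gemini-embedding-001 output. The function names, sample chunks, and prompt wording are hypothetical. In BigQuery, the ranking would typically be done by VECTOR_SEARCH and the generation by AI.GENERATE_TEXT.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / norm


def top_k(query_vec: list[float], rows: list[dict], k: int = 2) -> list[dict]:
    """Return the k rows whose 'embedding' is closest to query_vec (brute force)."""
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r["embedding"]))[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt: numbered context chunks, then the question."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy stand-ins for rows of brochure_embeddings (hypothetical text, 3-d vectors).
rows = [
    {"chunk": "Restaurant A serves a Tex-Mex buffet.", "embedding": [0.9, 0.1, 0.0]},
    {"chunk": "Parade times are listed at the park entrance.", "embedding": [0.0, 0.2, 0.9]},
    {"chunk": "Restaurant B offers all-you-can-eat dishes.", "embedding": [0.7, 0.3, 0.1]},
]
question_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of the Tex-Mex question

best = top_k(question_vec, rows, k=2)
prompt = build_prompt("Where can I find a buffet-style Tex-Mex meal?", [r["chunk"] for r in best])
print(prompt)
```

Instructing the model to answer only from the supplied context, and to say when that context is insufficient, is what keeps the response grounded. Sorting every row is fine for a toy list, whereas VECTOR_SEARCH can use a vector index on a large table.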
Success Criteria
To validate this challenge, you must demonstrate the following:
- Verify the BigQuery Object Tables created for both the photos and the PDFs.
- Show the results from the images_classification table, displaying at least 5 images with their classification and reason.
- Show the final SQL query performing the vector search and RAG generation, along with the grounded text response answering the Tex-Mex question.