Challenge 3: Sentiment and wait-time forecasting


Target Persona: Data Scientist / Data Analyst
Estimated Duration: 45 minutes
Note: Can be parallelized immediately using raw CSVs in BigQuery.

Introduction

Analyzing visitor sentiment and forecasting ride wait times are crucial to improving the guest experience. In this challenge, you will use BigQuery ML and the new BigQuery Studio Data Science Agent to classify the sentiment of reviews automatically, train a time-series forecasting model to predict future wait times, and use BigQuery AI functions to classify and rank attractions by intensity.

Description

Task 3.1: Automated Sentiment Analysis with BQ Studio Data Science Agent

Rather than writing Python code from scratch, you will leverage the new Data Science Agent in BigQuery Studio to accelerate your analysis.

 Important Dependency Note: This task queries the disneyland_reviews table in BigQuery, which is replicated from AlloyDB. This requires Challenge 1 (specifically the Datastream replication in Task 1.3) to be completed first.

  1. Open the Data Science Agent panel in BigQuery Studio.
  2. Using natural language, prompt the agent to write a SQL query or a Python notebook that classifies the sentiment of the reviews in disneyland_reviews into Positive, Negative, or Neutral.
  3. The agent should suggest using AI.GENERATE_TEXT or AI.GENERATE with a Gemini model (e.g., gemini-2.5-flash) to perform the sentiment classification.
  4. Run the generated query on a sample of 100 reviews and save the results into a new table reviews_sentiment_analysis.
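
As a point of comparison for whatever the agent generates, here is a minimal Python sketch of the kind of statement it might produce. It assumes the review text is in a column named review_text, the tables live in a dataset called my_dataset, and a Cloud resource connection named us.my_connection exists. All three are placeholders, not names taken from this challenge. The run_query helper additionally needs the google-cloud-bigquery package and valid credentials.

```python
# Sketch only. The dataset, column, and connection names are placeholders
# (assumptions); adjust them to match your project.

LABELS = ("Positive", "Negative", "Neutral")


def build_sentiment_sql(
    dataset: str = "my_dataset",
    text_col: str = "review_text",
    connection_id: str = "us.my_connection",
    endpoint: str = "gemini-2.5-flash",
    sample_size: int = 100,
) -> str:
    """Return a CREATE TABLE statement that labels a sample of reviews."""
    labels = ", ".join(LABELS)
    prompt = (
        f"Classify the sentiment of this review as exactly one of: {labels}. "
        "Reply with the label only. Review: "
    )
    return f"""
CREATE OR REPLACE TABLE `{dataset}.reviews_sentiment_analysis` AS
SELECT
  {text_col},
  AI.GENERATE(
    ('{prompt}', {text_col}),
    connection_id => '{connection_id}',
    endpoint => '{endpoint}'
  ).result AS sentiment
FROM (
  -- Take the sample in a subquery so only these rows are sent to the model.
  SELECT {text_col}
  FROM `{dataset}.disneyland_reviews`
  LIMIT {sample_size}
)
""".strip()


def run_query(sql: str) -> None:
    """Execute one statement; requires google-cloud-bigquery and credentials."""
    from google.cloud import bigquery  # lazy import: the builder works without it

    bigquery.Client().query(sql).result()  # block until the job finishes


if __name__ == "__main__":
    print(build_sentiment_sql())
```

Printing the statement first makes it easy to compare against the agent's output before spending any model calls. The model may still return labels with stray whitespace or casing, so a quick check of the distinct values in the sentiment column is worthwhile.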

Task 3.2: Time-Series Wait Time Forecasting

We want our guest assistant to predict wait times for any hour of the day.

  1. Load the historical wait times dataset from: gs://ghacks-disneyland-on-gcp/waiting_time.csv into a BigQuery table named waiting_times.
  2. Use BigQuery ML to build a time-series forecast. You can choose either:
    • ARIMA_PLUS: The classic, fast statistical forecasting model, trained with a CREATE MODEL statement.
    • TimesFM: Google’s state-of-the-art foundation model for time-series forecasting. It is pretrained, so you call it directly through AI.FORECAST without a training step.
  3. Forecast the wait times for all attractions for the next 24 hours in 30-minute intervals, and save the results in a table named forecasted_waiting_times.
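
The steps above could be sketched in Python as follows, using the TimesFM route. The column names ts, wait_minutes, and attraction and the dataset name my_dataset are assumptions, since the CSV schema is not described here; inspect the loaded table and substitute the real names. The horizon is simply the number of 30-minute steps in 24 hours.

```python
# Sketch only. Column names (ts, wait_minutes, attraction) and the dataset
# name are assumptions; check the real CSV header before running.

GCS_URI = "gs://ghacks-disneyland-on-gcp/waiting_time.csv"


def horizon_steps(hours: int = 24, interval_minutes: int = 30) -> int:
    """Number of forecast points needed to cover `hours` at the given interval."""
    total_minutes = hours * 60
    if interval_minutes <= 0 or total_minutes % interval_minutes:
        raise ValueError("interval must be positive and divide the window evenly")
    return total_minutes // interval_minutes


def build_load_sql(dataset: str = "my_dataset") -> str:
    """LOAD DATA statement for step 1; the schema is auto-detected."""
    return f"""
LOAD DATA OVERWRITE `{dataset}.waiting_times`
FROM FILES (format = 'CSV', uris = ['{GCS_URI}'])
""".strip()


def build_forecast_sql(
    dataset: str = "my_dataset",
    ts_col: str = "ts",
    value_col: str = "wait_minutes",
    id_col: str = "attraction",
) -> str:
    """CREATE TABLE statement that forecasts every attraction with AI.FORECAST."""
    steps = horizon_steps()  # 24 hours at 30-minute intervals
    return f"""
CREATE OR REPLACE TABLE `{dataset}.forecasted_waiting_times` AS
SELECT *
FROM AI.FORECAST(
  TABLE `{dataset}.waiting_times`,
  data_col => '{value_col}',
  timestamp_col => '{ts_col}',
  id_cols => ['{id_col}'],
  horizon => {steps}
)
""".strip()


if __name__ == "__main__":
    print(build_load_sql())
    print(build_forecast_sql())
```

The step count only spans 24 hours if the history is itself sampled every 30 minutes, because the forecast interval follows the spacing of the input data; if the raw data uses another granularity, aggregate it first. For the ARIMA_PLUS route, the analogous pair is a CREATE MODEL statement with model_type = 'ARIMA_PLUS' followed by ML.FORECAST with the same horizon.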

Task 3.3: Ride Clustering (Intensity & Popularity)

To better classify our rides, we will group attractions into logical categories by thrill level.

  1. Build a query that aggregates statistics for each attraction: average wait time, total review count, and average rating.
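  2. Use AI.CLASSIFY to categorize rides based on their descriptions into one of three magical categories: [easy-peasy, thrilling, extreme]. Save the results in a table named thrill_class, with the category in a column named class.
  3. Use AI.SCORE to rate each attraction's thrill level and order the attractions by it, where Rank 10 is the most extreme and Rank 1 is the least. Save the results in a table named thrill_score, with the score in a column named rank.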
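
A minimal Python sketch of steps 2 and 3 follows. It assumes a source table named attractions with name and description columns, a dataset called my_dataset, and a connection named us.my_connection; these are all hypothetical placeholders. The output table and column names (thrill_class.class and thrill_score.rank) come from the success criteria below.

```python
# Sketch only. The source table (attractions), its columns (name,
# description), the dataset, and the connection are assumptions.

CATEGORIES = ("easy-peasy", "thrilling", "extreme")


def build_classify_sql(
    dataset: str = "my_dataset",
    connection_id: str = "us.my_connection",
) -> str:
    """CREATE TABLE statement that assigns one category per attraction."""
    categories = ", ".join(f"'{c}'" for c in CATEGORIES)
    return f"""
CREATE OR REPLACE TABLE `{dataset}.thrill_class` AS
SELECT
  name,
  AI.CLASSIFY(
    description,
    categories => [{categories}],
    connection_id => '{connection_id}'
  ) AS `class`
FROM `{dataset}.attractions`
""".strip()


def build_score_sql(
    dataset: str = "my_dataset",
    connection_id: str = "us.my_connection",
) -> str:
    """CREATE TABLE statement that scores thrill level from 1 to 10."""
    prompt = (
        "Rate the thrill level of this theme park ride on a scale from "
        "1 (least extreme) to 10 (most extreme): "
    )
    return f"""
CREATE OR REPLACE TABLE `{dataset}.thrill_score` AS
SELECT
  name,
  AI.SCORE(
    ('{prompt}', description),
    connection_id => '{connection_id}'
  ) AS `rank`
FROM `{dataset}.attractions`
""".strip()


if __name__ == "__main__":
    print(build_classify_sql())
    print(build_score_sql())
```

The score table is stored unordered; to list attractions from most to least extreme, add ORDER BY `rank` DESC when querying thrill_score.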

Success Criteria

To validate this challenge, you must demonstrate the following:

  • Show the prompt and the resulting SQL/Python code generated by the Data Science Agent.
  • Show the trained ARIMA_PLUS model, or the AI.FORECAST call if you chose TimesFM, and a query displaying the forecasted wait times for the next 24 hours.
  • Verify the creation and content of the following two tables:
    • thrill_class: containing a column class with the extracted category.
    • thrill_score: containing a column rank with the numerical thrill rank score.
