Challenge 3: Sentiment and wait-time forecasting
| Target Persona: Data Scientist / Data Analyst | Estimated Duration: 45 minutes | Note: Can be parallelized immediately using raw CSVs in BigQuery. |
Introduction
Analyzing visitor sentiment and forecasting ride waiting times are crucial to improving the guest experience. In this challenge, you will use BigQuery ML and the new BigQuery Studio Data Science Agent to perform automated sentiment classification on reviews, train a time-series forecasting model to predict future waiting times, and build unsupervised classification and ranking models to categorize attractions by intensity.
Description
Task 3.1: Automated Sentiment Analysis with BQ Studio Data Science Agent
Rather than writing Python code from scratch, you will leverage the new Data Science Agent in BigQuery Studio to accelerate your analysis.
Important Dependency Note: This task queries the `disneyland_reviews` table in BigQuery, which is replicated from AlloyDB. This requires Challenge 1 (specifically the Datastream replication in Task 1.3) to be completed first.
- Open the Data Science Agent panel in BigQuery Studio.
- Using natural language, prompt the agent to write a SQL query or a Python notebook that classifies the sentiment of the reviews in `disneyland_reviews` into `Positive`, `Negative`, or `Neutral`.
- The agent should suggest using `AI.GENERATE_TEXT` or `AI.GENERATE` with a Gemini model (e.g., `gemini-2.5-flash`) to perform the sentiment classification.
- Run the generated query on a sample of 100 reviews and save the results into a new table `reviews_sentiment_analysis`.
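For orientation, the kind of query the agent might produce can be sketched as a small Python helper that builds the SQL. This is a minimal sketch, not the agent's actual output: the column names `review_id` and `review_text`, the dataset name, and the connection ID are assumptions, and the `AI.GENERATE` argument names follow our reading of the BigQuery documentation and should be checked against the current reference.

```python
def build_sentiment_query(dataset: str, connection_id: str, sample_size: int = 100) -> str:
    """Build BigQuery SQL that labels a sample of reviews and stores the result."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Assumed column names: review_id, review_text. Adjust to the replicated schema.
    return f"""
CREATE OR REPLACE TABLE `{dataset}.reviews_sentiment_analysis` AS
WITH sampled_reviews AS (
  SELECT review_id, review_text
  FROM `{dataset}.disneyland_reviews`
  LIMIT {sample_size}
)
SELECT
  review_id,
  review_text,
  AI.GENERATE(
    ('Classify the sentiment of this review as exactly one word, Positive, Negative, or Neutral: ', review_text),
    connection_id => '{connection_id}',
    endpoint => 'gemini-2.5-flash'
  ).result AS sentiment
FROM sampled_reviews
""".strip()


# Hypothetical dataset and connection names, for illustration only.
sentiment_sql = build_sentiment_query("my_dataset", "us.my_connection")
print(sentiment_sql)
```

The printed statement can be pasted into the BigQuery Studio editor. Limiting the sample in a CTE keeps the model calls to 100 rows.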
Task 3.2: Time-Series Wait Time Forecasting
We want our guest assistant to predict wait times for any hour of the day.
- Load the historical wait times dataset from `gs://ghacks-disneyland-on-gcp/waiting_time.csv` into a BigQuery table named `waiting_times`.
- Use BigQuery ML to train a time-series forecasting model. You can choose either:
  - ARIMA_PLUS: The classic, fast statistical forecasting model.
  - TimesFM: Google's state-of-the-art foundation model for time-series forecasting (using `AI.FORECAST`).
- Forecast the wait times for all attractions for the next 24 hours in 30-minute intervals, and save the results in a table named `forecasted_waiting_times`.
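A 24-hour forecast at 30-minute intervals corresponds to 48 forecast steps. The sketch below works out that horizon and assembles the three statements for the ARIMA_PLUS route (load, train, forecast). The column names `observed_at`, `wait_minutes`, and `attraction`, the model name `wait_time_arima`, and the dataset name are assumptions, since the CSV schema is not given here. The horizon counts steps at the frequency of the training series, so 48 steps only equal 24 hours if the historical data is itself half-hourly.

```python
from typing import List


def forecast_steps(hours: int, interval_minutes: int) -> int:
    """Number of forecast points needed to cover `hours` at the given interval."""
    total_minutes = hours * 60
    if interval_minutes <= 0 or total_minutes % interval_minutes != 0:
        raise ValueError("interval must be positive and divide the window evenly")
    return total_minutes // interval_minutes


def build_forecast_statements(dataset: str, horizon: int) -> List[str]:
    """Return the load, train, and forecast statements in execution order."""
    load = f"""
LOAD DATA OVERWRITE `{dataset}.waiting_times`
FROM FILES (
  format = 'CSV',
  uris = ['gs://ghacks-disneyland-on-gcp/waiting_time.csv'],
  skip_leading_rows = 1
)""".strip()
    # Assumed columns: observed_at (timestamp), wait_minutes (value), attraction (series id).
    train = f"""
CREATE OR REPLACE MODEL `{dataset}.wait_time_arima`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'observed_at',
  time_series_data_col = 'wait_minutes',
  time_series_id_col = 'attraction',
  horizon = {horizon}
) AS
SELECT observed_at, wait_minutes, attraction
FROM `{dataset}.waiting_times`""".strip()
    forecast = f"""
CREATE OR REPLACE TABLE `{dataset}.forecasted_waiting_times` AS
SELECT *
FROM ML.FORECAST(
  MODEL `{dataset}.wait_time_arima`,
  STRUCT({horizon} AS horizon, 0.9 AS confidence_level)
)""".strip()
    return [load, train, forecast]


steps = forecast_steps(hours=24, interval_minutes=30)
statements = build_forecast_statements("my_dataset", steps)  # hypothetical dataset name
for statement in statements:
    print(statement + ";\n")
```

Using `time_series_id_col` trains one series per attraction in a single model, which matches the requirement to forecast all attractions at once.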
Task 3.3: Ride Clustering (Intensity & Popularity)
To better classify our rides, we will group attractions into logical clusters using unsupervised learning.
- Build a query that aggregates statistics for each attraction: average wait time, total review count, and average rating.
- Use `AI.CLASSIFY` to categorize rides based on their descriptions into one of three magical categories: `[easy-peasy, thrilling, extreme]`.
- Use `AI.SCORE` to compare and order attractions based on a thrill level, where Rank 10 is the most extreme and Rank 1 is the least.
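The classification and scoring steps above could look roughly like the following. Treat this as a sketch under stated assumptions: the source table `attractions` with columns `attraction` and `description` is hypothetical (the challenge does not say where ride descriptions live), the connection ID is a placeholder, and the `AI.CLASSIFY` and `AI.SCORE` argument names reflect our understanding of the BigQuery documentation and should be verified before use.

```python
CATEGORIES = ["easy-peasy", "thrilling", "extreme"]


def sql_string_array(values) -> str:
    """Render a Python list of plain strings as a SQL array literal."""
    return "[" + ", ".join(f"'{value}'" for value in values) + "]"


def build_thrill_class_query(dataset: str, connection_id: str) -> str:
    """SQL that writes thrill_class with a `class` column."""
    # Hypothetical source table `attractions` with columns attraction, description.
    return f"""
CREATE OR REPLACE TABLE `{dataset}.thrill_class` AS
SELECT
  attraction,
  AI.CLASSIFY(
    description,
    categories => {sql_string_array(CATEGORIES)},
    connection_id => '{connection_id}'
  ) AS `class`
FROM `{dataset}.attractions`""".strip()


def build_thrill_score_query(dataset: str, connection_id: str) -> str:
    """SQL that writes thrill_score with a `rank` column (10 = most extreme)."""
    return f"""
CREATE OR REPLACE TABLE `{dataset}.thrill_score` AS
SELECT
  attraction,
  AI.SCORE(
    ('Rate the thrill level of this ride from 1 (least intense) to 10 (most extreme): ', description),
    connection_id => '{connection_id}'
  ) AS `rank`
FROM `{dataset}.attractions`""".strip()


# Hypothetical dataset and connection names, for illustration only.
class_sql = build_thrill_class_query("my_dataset", "us.my_connection")
score_sql = build_thrill_score_query("my_dataset", "us.my_connection")
print(class_sql + ";\n")
print(score_sql + ";")
```

Stating the scale in the scoring prompt keeps the output aligned with the Rank 1 to Rank 10 convention. The output columns are named `class` and `rank` to match the tables checked in the success criteria.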
Success Criteria
To validate this challenge, you must demonstrate the following:
- Show the prompt and the resulting SQL/Python code generated by the Data Science Agent.
- Show the trained forecasting model and a query displaying the forecasted wait times for the next 24 hours.
- Verify the creation and content of the following two tables:
  - `thrill_class`: containing a column `class` with the extracted category.
  - `thrill_score`: containing a column `rank` with the numerical thrill rank score.
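One way to check the table requirement is to list the columns from `INFORMATION_SCHEMA.COLUMNS` and compare them with what is expected. The helper below is an illustrative sketch: the dataset name is an assumption, and the rows passed to `missing_columns` stand in for the `(table_name, column_name)` pairs the query would return.

```python
from typing import Iterable, List, Tuple

# Table -> required column, taken from the success criteria.
EXPECTED_COLUMNS = {"thrill_class": "class", "thrill_score": "rank"}


def build_columns_query(dataset: str) -> str:
    """SQL listing the columns of the two result tables."""
    return f"""
SELECT table_name, column_name
FROM `{dataset}.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name IN ('thrill_class', 'thrill_score')""".strip()


def missing_columns(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return 'table.column' entries that are expected but absent from rows."""
    found = set(rows)
    return [
        f"{table}.{column}"
        for table, column in sorted(EXPECTED_COLUMNS.items())
        if (table, column) not in found
    ]


print(build_columns_query("my_dataset"))  # hypothetical dataset name
# Example with made-up query results: only thrill_class has been created.
print(missing_columns([("thrill_class", "attraction"), ("thrill_class", "class")]))
```

An empty list means both tables exist with the required columns. The contents still need a manual look, for example with a `SELECT * ... LIMIT 10` on each table.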