Challenge 4: BigQuery ❤ LLMs
Introduction
So far we’ve used the Gemini APIs from the Vertex AI Python SDK. They can also be called through BigQuery, and this challenge is all about using BigQuery to run an LLM.
Description
Before we start using the LLMs you’ll need to store the outputs of the Cloud Function in BigQuery. The first step is to create a BigQuery dataset called articles (in multi-region US) and a table called summaries with the following columns: uri, name, title and summary.
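One way to create the dataset and table is with DDL statements. The sketch below only builds those statements; it assumes every column is a STRING (the challenge names the columns but not their types), and the helper names build_ddl and run_ddl are made up for illustration. run_ddl additionally assumes the google-cloud-bigquery package and default credentials are available.

```python
"""Sketch: DDL for the articles dataset and the summaries table."""

DATASET = "articles"
TABLE = "summaries"
COLUMNS = ("uri", "name", "title", "summary")


def build_ddl(dataset: str = DATASET, table: str = TABLE) -> list:
    """Return the two DDL statements, dataset first."""
    # Assumption: all columns are STRING; the challenge does not state types.
    column_defs = ", ".join(f"{col} STRING" for col in COLUMNS)
    return [
        f"CREATE SCHEMA IF NOT EXISTS {dataset} OPTIONS (location = 'US')",
        f"CREATE TABLE IF NOT EXISTS {dataset}.{table} ({column_defs})",
    ]


def run_ddl() -> None:
    """Execute the statements; needs google-cloud-bigquery and credentials."""
    # Imported lazily so the sketch can be loaded without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    for statement in build_ddl():
        client.query(statement).result()  # wait for each job to finish


if __name__ == "__main__":
    for statement in build_ddl():
        print(statement + ";")
```

The printed statements can also be pasted into the BigQuery console instead of calling run_ddl.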
We’ve already provided the code in the Cloud Function to store the results in the newly created table; just uncomment the call to store_results_in_bq.
Once the table is there, configure BigQuery to use an LLM and run a query that categorizes each paper in the articles.summaries table based on its summary. Make sure that the LLM generates one of the following categories: Astrophysics, Mathematics, Computer Science, Economics and Quantitative Biology.
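As a rough illustration of what such a query could look like, the sketch below builds a BigQuery ML statement around ML.GENERATE_TEXT. It is not the reference solution. It assumes a remote model has already been created over a Vertex AI connection (with CREATE MODEL ... REMOTE WITH CONNECTION); the model name articles.llm, the prompt wording and the helper names are all hypothetical. The output column ml_generate_text_llm_result is the one produced when flatten_json_output is TRUE.

```python
"""Sketch: a BigQuery ML query asking a remote LLM for one category per paper."""

from typing import Optional

CATEGORIES = (
    "Astrophysics",
    "Mathematics",
    "Computer Science",
    "Economics",
    "Quantitative Biology",
)

# Hypothetical name: a remote model created beforehand over a Vertex AI connection.
MODEL = "articles.llm"


def build_query(model: str = MODEL) -> str:
    """Return SQL that prompts the model once per row of articles.summaries."""
    options = ", ".join(CATEGORIES)
    # The instruction contains no single quotes, so it is safe inside a SQL literal.
    instruction = (
        "Classify the following paper summary into exactly one of these "
        f"categories: {options}. Answer with the category name only. Summary: "
    )
    return f"""
SELECT
  title,
  TRIM(ml_generate_text_llm_result) AS category
FROM
  ML.GENERATE_TEXT(
    MODEL `{model}`,
    (
      SELECT title, CONCAT('{instruction}', summary) AS prompt
      FROM `articles.summaries`
    ),
    STRUCT(0.0 AS temperature, TRUE AS flatten_json_output)
  )
""".strip()


def normalize_category(raw: str) -> Optional[str]:
    """Map a raw model answer onto an allowed category, or None if it does not match."""
    cleaned = raw.strip().strip(".\"'`*").casefold()
    for category in CATEGORIES:
        if cleaned == category.casefold():
            return category
    return None


if __name__ == "__main__":
    print(build_query())
```

Listing the allowed categories in the prompt and using a low temperature is one way to keep the answers within the fixed set; normalize_category is an optional client-side check for answers that come back with stray whitespace or punctuation.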
Upload the following papers to the Cloud Storage bucket and run your SQL query in BigQuery to show the title and category of each paper:
- Astrophysics
- Astrophysics
- Computer Science
- Computer Science
- Economics
- Economics
- Mathematics
- Mathematics
- Quantitative Biology
- Quantitative Biology
Warning
Currently GenAI models have a rate limit of 60 calls per minute. Since every page of a document is a single call, processing more than 60 pages within a minute can hit this limit. None of the provided examples has more than 60 pages, but if you add them all at the same time you’ll reach it.
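If you control the code that makes the calls, a small client-side throttle is one way to stay under such a quota. The sketch below is a generic sliding-window limiter, not part of the provided Cloud Function; the class name RateLimiter and its parameters are made up for illustration, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
"""Sketch: keep calls under a per-minute quota with a sliding window."""

import time
from collections import deque


class RateLimiter:
    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls  # calls allowed per window
        self.period = period        # window length in seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()      # start times of calls still in the window

    def wait(self) -> None:
        """Block until one more call fits inside the current window."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Window is full: sleep until the oldest call expires.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Calling wait() immediately before each model call (one per page) spaces the calls out so that no more than max_calls are started within any one period.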
Success Criteria
- Running the SQL query yields the following results:

  | Title | Category |
  | --- | --- |
  | From particles to orbits: precise dark matter density profiles using dynamical information | Astrophysics |
  | Bayesian inference methodology to characterize the dust emissivity at far-infrared and submillimeter frequencies | Astrophysics |
  | Computing Twin-Width Parameterized by the Feedback Edge Number | Computer Science |
  | A 4-approximation algorithm for min max correlation clustering | Computer Science |
  | Reconstructing supply networks | Economics |
  | Student debt and behavioral bias: a trillion dollar problem | Economics |
  | Singularities and clusters | Mathematics |
  | Dynamics of automorphism groups of projective surfaces: classification, examples and outlook | Mathematics |
  | Solvent constraints for biopolymer folding and evolution in extraterrestrial environments | Quantitative Biology |
  | Full-Atom Protein Pocket Design via Iterative Refinement | Quantitative Biology |
Learning Resources
- Creating BigQuery datasets and tables
- BigQuery LLM support
Tips
- You could download and upload the papers manually, but you can also consider using wget and gsutil from Cloud Shell.
- If you get errors when using wget, change its user-agent parameter.
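If you would rather script both steps in Python than type the wget and gsutil commands by hand, the sketch below shows the same idea: send a browser-like User-Agent header when downloading, then hand the file to gsutil cp. The user-agent string, the helper names and any URL or bucket name you pass in are placeholders, not values from this challenge.

```python
"""Sketch: fetch a paper with a custom User-Agent, then copy it to a bucket."""

import shutil
import subprocess
import urllib.request

# Assumption: any browser-like string; some servers reject default client agents.
USER_AGENT = "Mozilla/5.0"


def build_request(url: str) -> urllib.request.Request:
    """Create a request that carries the custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download(url: str, destination: str) -> None:
    """Stream the response body to a local file."""
    with urllib.request.urlopen(build_request(url)) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


def gsutil_copy_command(local_path: str, bucket: str) -> list:
    """Build the gsutil command that copies a local file into the bucket."""
    return ["gsutil", "cp", local_path, f"gs://{bucket}/"]


def upload(local_path: str, bucket: str) -> None:
    """Run gsutil; requires the Cloud SDK, as available in Cloud Shell."""
    subprocess.run(gsutil_copy_command(local_path, bucket), check=True)
```

Calling download followed by upload for each paper does the same job as a wget and gsutil pair run from Cloud Shell.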