Challenge 4: BigQuery ❤ LLMs

Introduction

So far we’ve used the Gemini APIs from the Vertex AI Python SDK. The same models can also be called from BigQuery, and this challenge is all about using BigQuery to run an LLM.

Description

Before we start using the LLMs, you’ll need to store the outputs of the Cloud Function in BigQuery. The first step is to create a BigQuery dataset called articles (in the US multi-region) and, in it, a table called summaries with the following columns: uri, name, title and summary.
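As a minimal sketch, the dataset and table could be created with two DDL statements. The column type STRING is an assumption (the challenge only names the columns), and run_statements is a hypothetical helper that needs the google-cloud-bigquery package and default credentials. The same statements can also be pasted into the BigQuery console.

```python
DATASET = "articles"
TABLE = "summaries"
COLUMNS = ["uri", "name", "title", "summary"]


def build_ddl(column_type: str = "STRING") -> list[str]:
    """Return the DDL statements for the dataset and the table."""
    # Assumption: every column is a STRING; the challenge only names the columns.
    column_defs = ", ".join(f"{name} {column_type}" for name in COLUMNS)
    return [
        f"CREATE SCHEMA IF NOT EXISTS {DATASET} OPTIONS (location = 'US')",
        f"CREATE TABLE IF NOT EXISTS {DATASET}.{TABLE} ({column_defs})",
    ]


def run_statements(statements: list[str]) -> None:
    """Execute the statements with the BigQuery client library (hypothetical helper)."""
    # Imported here so the module loads even without google-cloud-bigquery installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default project and credentials
    for sql in statements:
        client.query(sql).result()  # wait for each job to finish


if __name__ == "__main__":
    for statement in build_ddl():
        print(statement)
```

Calling run_statements(build_ddl()) from Cloud Shell should be enough to create both objects, provided the active project is the one used for the challenge.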

We’ve already provided the code in the Cloud Function to store the results in the newly created table; just uncomment the call to store_results_in_bq.

Once the table is there, configure BigQuery to use an LLM and run a query that categorizes each paper in the articles.summaries table based on its summary. Make sure that the LLM generates exactly one of the following categories for each paper: Astrophysics, Mathematics, Computer Science, Economics or Quantitative Biology.
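One possible shape for this step, sketched under assumptions: a remote model is created over a BigQuery connection to Vertex AI, and ML.GENERATE_TEXT is called with a prompt that lists the allowed categories. The connection name, model name and endpoint below are hypothetical placeholders, and the prompt wording is only one of many that might work. The functions just build the SQL text so it can be inspected before running it in BigQuery.

```python
CATEGORIES = [
    "Astrophysics",
    "Mathematics",
    "Computer Science",
    "Economics",
    "Quantitative Biology",
]

# Hypothetical names: replace them with your own connection, model and a
# model endpoint that is currently supported in your project.
CONNECTION = "us.llm_connection"
MODEL = "articles.llm"
ENDPOINT = "gemini-pro"


def build_model_ddl() -> str:
    """Return the statement that registers the remote LLM in BigQuery."""
    return (
        f"CREATE OR REPLACE MODEL `{MODEL}`\n"
        f"REMOTE WITH CONNECTION `{CONNECTION}`\n"
        f"OPTIONS (ENDPOINT = '{ENDPOINT}')"
    )


def build_prompt_prefix() -> str:
    """Return the instruction that is prepended to every summary."""
    options = ", ".join(CATEGORIES)
    # Keep this text free of single quotes: it is embedded in a SQL literal.
    return (
        "Classify the following paper summary into exactly one of these "
        f"categories: {options}. Answer with the category name only. Summary: "
    )


def build_category_query() -> str:
    """Return a query that yields the title and category of each paper."""
    return (
        "SELECT title, TRIM(ml_generate_text_llm_result) AS category\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{MODEL}`,\n"
        f"  (SELECT title, CONCAT('{build_prompt_prefix()}', summary) AS prompt\n"
        "   FROM articles.summaries),\n"
        "  STRUCT(0.0 AS temperature, TRUE AS flatten_json_output)\n"
        ")\n"
        "ORDER BY category"
    )


if __name__ == "__main__":
    print(build_model_ddl())
    print(build_category_query())
```

A temperature of 0.0 and an explicit list of allowed answers are used here to make the output as repeatable as possible. The model may still return something outside the list, so it is worth checking the category column after running the query.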

Upload the following papers to the Cloud Storage bucket, then run your SQL query in BigQuery to show the title and category of each paper.

Warning
Currently GenAI models have a rate limit of 60 calls per minute. Since every page of a document is a single call, you might run into this limit if you process more than 60 pages within a minute. None of the provided examples has more than 60 pages, but if you add them all at the same time you’ll hit the limit.
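One way to stay under the limit is to upload the papers in groups whose total page count does not exceed 60, pausing between groups. The sketch below assumes you know the page count of each file and that each upload triggers one call per page. The file names are made up, and upload stands for whatever function you use to copy a file to the bucket.

```python
import time
from typing import Callable


def plan_batches(page_counts: dict[str, int], limit: int = 60) -> list[list[str]]:
    """Group documents so that each batch stays within the per-minute call limit."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, pages in page_counts.items():
        if pages > limit:
            raise ValueError(f"{name} has {pages} pages, more than the limit of {limit}")
        # Start a new batch when adding this document would exceed the limit.
        if current and used + pages > limit:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += pages
    if current:
        batches.append(current)
    return batches


def upload_in_batches(
    batches: list[list[str]],
    upload: Callable[[str], object],
    pause_seconds: float = 60.0,
) -> None:
    """Upload batch by batch, pausing before every batch except the first."""
    for index, batch in enumerate(batches):
        if index > 0:
            time.sleep(pause_seconds)  # give the per-minute quota time to reset
        for name in batch:
            upload(name)


if __name__ == "__main__":
    # Made-up page counts, for illustration only.
    print(plan_batches({"a.pdf": 30, "b.pdf": 25, "c.pdf": 20, "d.pdf": 40}))
```

Because processing itself takes time, a pause of exactly 60 seconds may not always be enough, so a slightly longer pause is a reasonable precaution.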

Success Criteria

  • Running the SQL query yields the following results:

    | Title | Category |
    |-------|----------|
    | From particles to orbits: precise dark matter density profiles using dynamical information | Astrophysics |
    | Bayesian inference methodology to characterize the dust emissivity at far-infrared and submillimeter frequencies | Astrophysics |
    | Computing Twin-Width Parameterized by the Feedback Edge Number | Computer Science |
    | A 4-approximation algorithm for min max correlation clustering | Computer Science |
    | Reconstructing supply networks | Economics |
    | Student debt and behavioral bias: a trillion dollar problem | Economics |
    | Singularities and clusters | Mathematics |
    | Dynamics of automorphism groups of projective surfaces: classification, examples and outlook | Mathematics |
    | Solvent constraints for biopolymer folding and evolution in extraterrestrial environments | Quantitative Biology |
    | Full-Atom Protein Pocket Design via Iterative Refinement | Quantitative Biology |

Learning Resources

Tips

  • You could download and upload the papers manually, but consider using wget and gsutil from Cloud Shell instead.
  • If wget returns errors, change its user agent with the --user-agent option.
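The two tips above can be combined in a small script that runs from Cloud Shell. This is only a sketch: the URL and bucket name in the example are placeholders, the user-agent value is an assumption (any browser-like string may do), and the file name is taken from the last segment of the URL path.

```python
import subprocess
from pathlib import PurePosixPath
from urllib.parse import urlparse

USER_AGENT = "Mozilla/5.0"  # assumption: a browser-like value


def build_commands(url: str, bucket: str) -> list[list[str]]:
    """Return the wget and gsutil commands for a single paper."""
    # Use the last path segment of the URL as the local file name.
    filename = PurePosixPath(urlparse(url).path).name
    return [
        ["wget", f"--user-agent={USER_AGENT}", "-O", filename, url],
        ["gsutil", "cp", filename, f"gs://{bucket}/"],
    ]


def transfer(urls: list[str], bucket: str) -> None:
    """Download each paper and copy it to the bucket, stopping on the first error."""
    for url in urls:
        for command in build_commands(url, bucket):
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder URL and bucket name, for illustration only.
    for command in build_commands("https://example.com/papers/sample.pdf", "my-bucket"):
        print(" ".join(command))
```

If a URL does not end in a file name with a .pdf extension, you may want to pass an explicit name to -O instead of deriving it from the URL.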
