Challenge 2: First steps into the LLM realm

Introduction

Let’s get started with a simple objective: extracting the title of a document using an LLM. LLMs work on text, so the first step in our process is to extract the text from the PDF documents. We’ve already implemented that functionality for you in the provided Cloud Run Function, using the Cloud Vision API. Go ahead and have a look at the extract_text_from_document function to understand where and how the results are stored. With those results in hand, we can look into extracting the title from the text content of the document.

Description

For this challenge we’ll use Gemini to determine the title (including any subtitle) of the uploaded document in a cost-effective way. We’ve already provided the skeleton of the function extract_title_from_text; all you need to do is come up with the correct prompt and set the right values for the placeholder (in the format function) to pass the document content to your prompt. Once you’ve made your changes, redeploy the Cloud Run Function.
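To make the placeholder mechanics concrete, here is a minimal sketch of a prompt template filled in with Python’s str.format. It is not the provided skeleton: the template wording, the placeholder name text, the helper build_prompt and the 2000-character cut-off are all assumptions made for illustration. The idea is that a title normally appears near the start of a document, so only the beginning needs to be sent.

```python
# Hypothetical template: the wording and the placeholder name "text" are
# assumptions, not the contents of the provided skeleton.
PROMPT_TEMPLATE = (
    "Extract the title of the document below, including any subtitle.\n"
    "Respond with the title only, on a single line.\n\n"
    "Document:\n{text}"
)

# Assumed cut-off: titles normally appear near the start of a document.
MAX_CHARS = 2000


def build_prompt(document_text: str, max_chars: int = MAX_CHARS) -> str:
    """Fill the template with only the first max_chars characters."""
    snippet = document_text[:max_chars]
    return PROMPT_TEMPLATE.format(text=snippet)
```

Calling build_prompt(full_text) returns a single string that can be passed to the model. Whether this wording yields exactly the expected titles is something to verify yourself, for example in Vertex AI Studio.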

Success Criteria

  • Fewer than 2,500 tokens are used to determine the title.
  • The following papers should yield the corresponding titles, which you can see in the Logs section of the Cloud Run Function. Make sure that only the title is output:

    Paper         Title
    LOFAR paper   The LOFAR Two-Metre Sky Survey (LOTSS) VI. Optical identifications for the second data release
    PEARL paper   PEARLS: Near Infrared Photometry in the JWST North Ecliptic Pole Time Domain Field
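A quick way to sanity-check the token criterion before deploying is a rough character-based estimate. The sketch below assumes about four characters per token, which is only a rule of thumb for English text, and it assumes that the budget covers both the prompt and a small reserve for the model’s answer. The helper names and the reserve of 100 tokens are hypothetical; the authoritative count is whatever the model reports for the actual request.

```python
import math

TOKEN_BUDGET = 2500    # limit stated in the success criteria
CHARS_PER_TOKEN = 4    # rough heuristic, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_budget(prompt: str, reserved_for_output: int = 100) -> bool:
    """Check that the estimated prompt size plus an output reserve stays under the budget."""
    return estimate_tokens(prompt) + reserved_for_output < TOKEN_BUDGET
```

For example, a 4,000-character prompt is estimated at 1,000 tokens and fits, while a 10,000-character prompt is estimated at 2,500 tokens and does not.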

Learning Resources

Tips

  • You can edit and redeploy the Cloud Run Function from the Console.
  • You can test your prompts using Vertex AI Studio.
  • You can get the content of a PDF file by opening it in a PDF reader and copying the text. Alternatively, if you’re comfortable with the CLI and enjoy experimenting with jq, you can run gsutil cat and jq from Cloud Shell against the JSON files in the staging bucket.
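If you prefer Python over jq for the last tip, the following sketch pulls the text out of one of the JSON files. It assumes the files follow the Cloud Vision asynchronous file annotation output layout, where each entry in responses carries a fullTextAnnotation with a text field; check extract_text_from_document to confirm what is actually written to the staging bucket. The function name text_from_vision_json is hypothetical.

```python
import json


def text_from_vision_json(raw_json: str) -> str:
    """Concatenate the page texts from a Cloud Vision output JSON string.

    Assumes the layout {"responses": [{"fullTextAnnotation": {"text": ...}}]}.
    """
    data = json.loads(raw_json)
    pages = []
    for response in data.get("responses", []):
        # Pages without detected text may lack the annotation entirely.
        annotation = response.get("fullTextAnnotation", {})
        pages.append(annotation.get("text", ""))
    return "\n".join(pages)
```

You can feed it the contents of a file downloaded from the bucket, for example text_from_vision_json(open("output.json").read()), where output.json is a placeholder file name.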
