Challenge 4: Sprinkle some AI on it

Introduction

The way we consume live media is evolving. Viewers no longer just want to watch; they want to engage, analyze, and get real-time insights. The Gemini Live API is a powerful new tool that enables developers to build real-time, interactive experiences by processing live streams of video and audio. This opens up a world of possibilities for a new generation of live media applications.

Your challenge is to build an innovative application using the Gemini Live API that transforms a live media stream from a Formula E race into an intelligent and interactive experience. Your application will process the live video and audio to provide new value to the audience in real time.

Tools Used

Gemini Live API
Norsk Studio (with AI components)

Description

First we must start with a prompt. The key is to instruct the model on its role and what to look for in the live stream.

For example:

You are a cricket match statistician. I will send you video from a match. For every ball bowled, report on the batsman's current score and the bowler's statistics.

Design a prompt that instructs the model to watch the feed of a Formula E race, look for overtakes, and explain what happened in each one.
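As a starting point, the role-plus-task structure of the cricket example can be sketched in Python. This is only an illustration: the wording of each part and the names `ROLE`, `TASK`, `GUARDRAIL`, and `build_system_prompt` are hypothetical, and the resulting text is not a reference answer. Expect to iterate on it once you see what the model actually reports.

```python
# Hypothetical prompt parts: the wording is an illustration, not a reference answer.
ROLE = "You are a Formula E race commentator."
TASK = (
    "I will send you live video and audio from a race. "
    "Watch for overtakes. For every overtake, report which car gained the position, "
    "which car lost it, and briefly explain how the move happened."
)
GUARDRAIL = "If you do not see an overtake, say nothing rather than guessing."


def build_system_prompt(role: str, task: str, guardrail: str) -> str:
    """Join the parts into one block of text for the System Instructions field."""
    return " ".join(part.strip() for part in (role, task, guardrail))


SYSTEM_PROMPT = build_system_prompt(ROLE, TASK, GUARDRAIL)
print(SYSTEM_PROMPT)
```

Keeping the role, the task, and the guardrail separate makes it easier to change one of them at a time while you compare the commentary the model produces.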

Log in to your Norsk AI instance, replacing the placeholder in the URL with your actual project ID:

URL: https://gemini.endpoints.[your_project_id].cloud.goog
Your coach will provide a username and password
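If it helps to avoid typos, the placeholder substitution in the URL above can be written out as a small Python helper. The function name `norsk_studio_url` and the sample project ID are made up for illustration; only the URL pattern comes from this guide.

```python
def norsk_studio_url(project_id: str) -> str:
    """Fill the project ID into the URL pattern given in this challenge."""
    project_id = project_id.strip()
    # Guard against pasting the placeholder itself, brackets included.
    if not project_id or "[" in project_id or "]" in project_id:
        raise ValueError("Pass your real project ID, without the square brackets")
    return f"https://gemini.endpoints.{project_id}.cloud.goog"


# "my-project-123" is a made-up ID; use the one for your own project.
print(norsk_studio_url("my-project-123"))
```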

Just as in the last challenge, you will start with a blank canvas in Norsk. On the left is the Component Library.

Add a Camera Feed component and a Gemini AI component to your canvas and connect them.

Configure the Gemini AI Component:

Give it a name
Set the API to Live
Update the System Instructions by replacing the default text with the prompt you designed.

Now we need to get a Gemini API key:

Go to the Google Cloud Console, search for the Gemini API page, and enable the API
On that same page, create credentials of type: API key
Provide a name for your API key, for example: Gemini API Key
In the API restrictions section, select the Restrict key radio button
In the filter, select Generative Language API and Vertex AI
A new API key is created. Save this key; you’ll be using it soon.

Next, we will deploy and run the pipeline:

SSH into the Norsk AI instance: find its VM in the Google Cloud Console and click the SSH button to open a terminal window.
Add the API key to Norsk’s environment by editing this file:
- /var/norsk-studio/norsk-studio-docker/env/studio-env
- Find the GOOGLE_API_KEY variable and paste in your API key
Restart Norsk by issuing this command:
```
sudo systemctl restart norsk
```
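The edit to the `studio-env` file described above (made before the restart) can also be expressed as a small Python sketch, for anyone who prefers to script it rather than use a text editor. It assumes the file holds one `NAME=value` entry per line, which this guide does not confirm, so check the real file first. The helper name `set_env_var` and the sample contents are hypothetical.

```python
def set_env_var(env_text: str, name: str, value: str) -> str:
    """Return env_text with `name` set to `value`, appending the line if it is absent.

    Assumes one NAME=value entry per line (an assumption about the file format).
    """
    prefix = name + "="
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = prefix + value
            replaced = True
    if not replaced:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


# Made-up sample contents; the real file will contain other variables.
sample = "LOG_LEVEL=info\nGOOGLE_API_KEY=\n"
print(set_env_var(sample, "GOOGLE_API_KEY", "your-api-key-here"), end="")
```

The function works on text rather than on the file itself, so you can try it on a copy before touching the real file, which will most likely need `sudo` to write.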

And finally, let’s explore the output we are getting.

Go back to Norsk Studio. Click the Play button to observe the console output from the Gemini AI component and see the live commentary.

Success Criteria

Created a new Gemini API Key
Designed a prompt to look for overtakes in a Formula E race
Connected the Gemini Live API to your video feed, including its commentary/audio track
Real-time commentary is being produced by Gemini

Learning Resources

Getting Started with Gemini Live API
