Challenge 4: Sprinkle some AI on it
Introduction
The way we consume live media is evolving. Viewers no longer just want to watch; they want to engage, analyze, and get real-time insights. The Gemini Live API is a powerful new tool that enables developers to build real-time, interactive experiences by processing live streams of video and audio. This opens up a world of possibilities for a new generation of live media applications.
Your challenge is to build an innovative application using the Gemini Live API that transforms a live media stream from a Formula E race into an intelligent and interactive experience. Your application will process the live video and audio to provide new value to the audience in real time.
Tools Used
- Gemini Live API
- Norsk Studio (with AI components)
Description
First we must start with a prompt. The key is to instruct the model on its role and what to look for in the live stream.
For example:
You are a cricket match statistician. I will send you video from a match. For every ball bowled, report on the batsman's current score and the bowler's statistics.
Design a prompt that watches the feed of a Formula E race, looks for overtakes, and explains what happened.
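One way to think about the prompt is as a few reusable parts: a role, a description of the input, the event to watch for, and what to report. The following is a minimal sketch of that idea, not part of Norsk or the Gemini API; the function `build_system_prompt` and its parameter names are hypothetical. It reassembles the cricket example above, and the same structure could be filled in for a Formula E feed.

```python
def build_system_prompt(role: str, input_description: str, trigger: str, report: str) -> str:
    """Assemble a system instruction from four parts (hypothetical helper)."""
    parts = [
        f"You are {role}.",                          # the role the model plays
        f"I will send you {input_description}.",     # what the model receives
        f"For every {trigger}, report on {report}.", # the event and the output
    ]
    return " ".join(parts)


# Rebuild the cricket example from its parts.
prompt = build_system_prompt(
    role="a cricket match statistician",
    input_description="video from a match",
    trigger="ball bowled",
    report="the batsman's current score and the bowler's statistics",
)
print(prompt)
```

Keeping the parts separate makes it easier to change one thing at a time (for example, the event to watch for) while you experiment with the live output.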
Log into your Norsk AI instance, replacing the placeholder in the URL with your actual project ID:
- URL: https://gemini.endpoints.[your_project_id].cloud.goog
- Your coach will provide a username and password
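The placeholder substitution in the URL can be sketched as follows. This is an illustrative helper only; the name `norsk_studio_url` is hypothetical, and the check reflects the general Google Cloud project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). The project ID in the usage line is made up.

```python
import re

# General Google Cloud project ID format: 6-30 characters, lowercase letters,
# digits, and hyphens; must start with a letter and must not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def norsk_studio_url(project_id: str) -> str:
    """Fill the project ID into the URL template from this challenge (hypothetical helper)."""
    if not PROJECT_ID_PATTERN.fullmatch(project_id):
        raise ValueError(f"not a valid project ID: {project_id!r}")
    return f"https://gemini.endpoints.{project_id}.cloud.goog"


# "my-project-123" is an example value, not a real project.
print(norsk_studio_url("my-project-123"))
```

The validation catches the most common slip, which is pasting the URL with the square-bracket placeholder still in it.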
Just as in the last challenge, you will start with a blank canvas in Norsk. On the left is the Component Library.
Add a Camera Feed component and a Gemini AI component to your canvas and connect them.
Configure the Gemini AI Component:
- Give it a name
- Set the API to Live
- Update the System Instructions by replacing the default text with the prompt you designed.
Now we need to get a Gemini API Key
- In the Google Cloud Console, search for the Gemini API page and enable the API
- On that same page, create credentials of type: API key
- Provide a name for your API key, for example: Gemini API Key
- In the API restrictions section, select the Restrict key radio button
- In the filter, select Generative Language API and Vertex AI
- A new API key is created. Save this key; you’ll be using it soon.
Next we will deploy and run the pipeline
- SSH into the Norsk AI instance: find its VM in the Google Cloud Console and click the SSH button to open a terminal window.
- Add the API key to Norsk’s environment by editing this file:
/var/norsk-studio/norsk-studio-docker/env/studio-env
- Find the GOOGLE_API_KEY variable and paste in your API key
- Restart Norsk by issuing this command:
sudo systemctl restart norsk
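The edit to the studio-env file described in the steps above can be done in any text editor. If you prefer to script it, the sketch below shows the idea on an in-memory string. It assumes the file uses plain `NAME=value` lines, which is not confirmed here; the helper `set_env_var`, the sample file contents, and the placeholder key value are all hypothetical.

```python
def set_env_var(env_text: str, name: str, value: str) -> str:
    """Return env_text with NAME=value set (hypothetical helper).

    Replaces an existing assignment of `name`, or appends one if none exists.
    Assumes plain NAME=value lines; comment lines starting with '#' are left alone.
    """
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comments untouched
        key, sep, _ = stripped.partition("=")
        if sep and key.strip() == name:
            lines[i] = f"{name}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


# Made-up sample contents; the real file may look different.
sample = "# Norsk Studio settings\nGOOGLE_API_KEY=\nOTHER_SETTING=1\n"
updated = set_env_var(sample, "GOOGLE_API_KEY", "YOUR_API_KEY_HERE")
print(updated)
```

Working on a string first lets you inspect the result before writing anything back to the instance. Whichever way you edit the file, Norsk still needs the restart above to pick up the new value.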
And finally, let’s explore the output we are getting.
Go back to Norsk Studio. Click the Play button to observe the console output from the Gemini AI component and see the live commentary.
Success Criteria
- Created a new Gemini API Key
- Designed a prompt to look for overtakes in a Formula E race
- Connected the Gemini Live API to your video feed and its commentary/audio feed
- Real-time commentary is being produced by Gemini