Practical SRE

Welcome to the Practical SRE (Site Reliability Engineering) gHack! In this hands-on session, you’ll step into the role of SREs and Product Owners for the Movie Guru GenAI app—a cutting-edge application that helps users find movies using natural language queries powered by AI. Your mission is to ensure that Movie Guru delivers a smooth, reliable, and responsive experience for its users.

The Movie Guru app’s backend is already running in your cloud environment. It has been pre-instrumented, and a load generator sends it traffic in the background, so it continuously produces a wealth of metrics. As you work on the challenges, you’ll have data reflecting the app’s performance and user interactions, allowing you to make informed decisions throughout the workshop.

By the end of this workshop, you’ll have developed a comprehensive reliability framework for Movie Guru, gaining practical SRE skills that can be applied to real-world systems.

Remember, if a challenge uses a term you don’t understand, look at the Learning Resources section at the bottom of the challenge text. If the answer isn’t there, Google can be your best friend.

Learning Objectives

In this hack you will learn how to:

  1. Identify user journeys.
  2. Identify your stakeholders in an organization.
  3. Design realistic SLOs.
  4. Understand metrics dashboards in Google Cloud Monitoring.
  5. Create SLOs in Google Cloud Monitoring.
  6. Create alerts.
  7. Apply SRE best practices.

Challenges

Prerequisites

  • Your own GCP project with the Editor IAM role.
  • kubectl command line tool
  • gcloud command line tool
  • Note: We recommend using Cloud Shell to run the challenges, as it has all the necessary tooling already installed.
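
If you are not using Cloud Shell, it can help to confirm that the command line tools listed above are available before starting. The following is a minimal sketch, not part of the official hack material: the helper name `missing_tools` and the `REQUIRED_TOOLS` list are illustrative assumptions, and the check only looks for the executables on your PATH (it does not verify versions or authentication).

```python
import shutil

# Tools named in the prerequisites list (assumed to be on the PATH when installed).
REQUIRED_TOOLS = ["kubectl", "gcloud"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are installed.")
```

If the script reports a missing tool, either install it or switch to Cloud Shell as recommended above.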

Contributors

  • Manasa Kandula
  • Steve McGhee