Challenge 8: What’s really UP, doc?
Prerequisites
- Connect to the GKE cluster from the Cloud Shell terminal by pasting the following command.
$GKE_CONNECTION_STRING ## This should print out the connection string command that will give your terminal credentials to connect to the GKE cluster
- Deploy a new frontend version (your devs have made a change and you want to have a new version running alongside the old one)
kubectl apply -f <(curl -s https://raw.githubusercontent.com/MKand/movie-guru/refs/heads/ghack-sre/k8s/frontend-v2.yaml)
- Reset the backend server
Note: With this command we’re priming the backend that generates metrics to behave in a specific way. This simulates your colleagues making some changes that might have broken/fixed a few things.
## Check if the BACKEND_ADDRESS env variable is set in your environment before you do this.
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "ChatSuccess": 0.95,
    "ChatSafetyIssue": 0.1,
    "ChatEngaged": 0.70,
    "ChatAcknowledged": 0.15,
    "ChatRejected": 0.05,
    "ChatUnclassified": 0.1,
    "ChatSPositive": 0.6,
    "ChatSNegative": 0.1,
    "ChatSNeutral": 0.2,
    "ChatSUnclassified": 0.1,
    "LoginSuccess": 0.999,
    "StartupSuccess": 0.95,
    "PrefUpdateSuccess": 0.99,
    "PrefGetSuccess": 0.999,
    "LoginLatencyMinMS": 10,
    "LoginLatencyMaxMS": 200,
    "ChatLatencyMinMS": 906,
    "ChatLatencyMaxMS": 4683,
    "StartupLatencyMinMS": 400,
    "StartupLatencyMaxMS": 1000,
    "PrefGetLatencyMinMS": 153,
    "PrefGetLatencyMaxMS": 348,
    "PrefUpdateLatencyMinMS": 363,
    "PrefUpdateLatencyMaxMS": 645
  }' \
  $BACKEND_ADDRESS/phase
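The comment on the reset command asks you to confirm that BACKEND_ADDRESS is set before you send the request. A minimal sketch of that check follows; the function name `check_backend_address` is a hypothetical helper, not part of the challenge tooling.

```shell
# Hypothetical helper: succeeds and prints the value if BACKEND_ADDRESS
# is set and non-empty; otherwise prints an error and returns 1.
check_backend_address() {
  if [ -z "${BACKEND_ADDRESS:-}" ]; then
    echo "error: BACKEND_ADDRESS is not set or is empty" >&2
    return 1
  fi
  echo "BACKEND_ADDRESS=${BACKEND_ADDRESS}"
}
```

Run `check_backend_address` in the same terminal before the curl command. If it reports an error, the request would be sent to the bare path `/phase` with no host, and curl would fail.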
Introduction
The Calm Before the Storm: You settle in for another day of SRE serenity, casually monitoring the dashboards and basking in the glow of Movie Guru’s stable performance. Suddenly, your peaceful morning is shattered by a frantic colleague from customer support.
“Mayday! Mayday!” they exclaim, bursting into your cubicle. “Users are reporting that Movie Guru is acting up! They can’t seem to use the website properly!”
Description
- Look at your SLO dashboards to spot issues (wait a few minutes before you check).
- Investigate the issue:
  - To get to the bottom of this mystery, open a new incognito/private browser window and navigate to the Movie Guru frontend.
  - Refresh the page a few times and see if you spot something wrong.
- Compare your observations with the data displayed on the dashboards. What discrepancies do you notice?
- Explain the reason for the difference between what users are reporting and what the dashboards are showing.
- How can you improve your monitoring to better reflect the actual user experience?
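If you want to make the "refresh a few times" step repeatable from the terminal, one option is to request the page several times and count the HTTP status codes you get back. The sketch below assumes a variable `FRONTEND_URL` holding your Movie Guru frontend address; that variable and both function names are hypothetical and not part of the challenge setup. Status codes only reveal failures that are visible at the HTTP level, so a page can return 200 and still be broken for the user; the browser check remains the primary observation.

```shell
# Count how often each HTTP status code appears (one code per input line).
tally_status_codes() {
  awk 'NF { count[$1]++ } END { for (code in count) print code, count[code] }' | sort
}

# Request the given URL N times (default 20) and print one status code
# per request. curl reports 000 when it received no HTTP response at all.
probe_frontend() {
  url="$1"
  attempts="${2:-20}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    curl -s -o /dev/null --max-time 10 -w '%{http_code}\n' "$url" || true
    i=$((i + 1))
  done
}

# Usage (FRONTEND_URL is a placeholder you must set yourself):
#   probe_frontend "$FRONTEND_URL" 20 | tally_status_codes
```

Each output line is a status code followed by the number of times it occurred, which you can set against what the SLO dashboards report for the same period.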
Success Criteria
- Identify the monitoring gap
- Pinpoint the potential cause
- Propose solutions to improve monitoring
- [Optional] Dive deeper: Investigate further and discover the root cause