Challenge 6: SLOs on the dashboard
Prerequisites
Run the following command in the Cloud Shell terminal.
Note: This command primes the backend that generates metrics to behave in a specific way. You’re simulating fixes made to the app after 1-2 months of work.
# Check if the BACKEND_ADDRESS env variable is set in your environment before you do this.
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"ChatSuccess": 0.95,
"ChatSafetyIssue": 0.1,
"ChatEngaged": 0.70,
"ChatAcknowledged": 0.15,
"ChatRejected": 0.05,
"ChatUnclassified": 0.1,
"ChatSPositive": 0.6,
"ChatSNegative": 0.1,
"ChatSNeutral": 0.2,
"ChatSUnclassified": 0.1,
"LoginSuccess": 0.999,
"StartupSuccess": 0.95,
"PrefUpdateSuccess": 0.99,
"PrefGetSuccess": 0.999,
"LoginLatencyMinMS": 10,
"LoginLatencyMaxMS": 200,
"ChatLatencyMinMS": 906,
"ChatLatencyMaxMS": 4683,
"StartupLatencyMinMS": 400,
"StartupLatencyMaxMS": 1000,
"PrefGetLatencyMinMS": 153,
"PrefGetLatencyMaxMS": 348,
"PrefUpdateLatencyMinMS": 363,
"PrefUpdateLatencyMaxMS": 645
}' \
"${BACKEND_ADDRESS}/phase"
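The comment at the top of the command asks you to check that BACKEND_ADDRESS is set. A minimal sketch of such a check follows; `require_env` is a hypothetical helper name, not something provided by the lab.

```shell
# Hypothetical helper: fail early if a required environment variable is empty or unset.
require_env() {
  var_name="$1"
  # Look up the variable whose name is stored in $1 (POSIX-compatible indirect lookup).
  eval "var_value=\${$var_name:-}"
  if [ -z "$var_value" ]; then
    echo "Error: $var_name is not set" >&2
    return 1
  fi
}

# Usage: run the guard before the curl command.
# require_env BACKEND_ADDRESS && echo "Backend: $BACKEND_ADDRESS"
```

If the guard reports an error, set the variable and rerun the command; otherwise curl would post to a URL with no host.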
Introduction
This challenge is about setting up achievable Service Level Objectives (SLOs) for the app in the Cloud Monitoring suite. These are targets you expect to meet in 1-2 months, after making a few improvements to the app.
Description
- Create a service in the UI
Note: You can also create these via the API. Check Tips in Learning Resources for creating services via the API.
- Go to the SLOs tab in the monitoring suite. This is where you’ll define and manage your SLOs.
- Click on + Define Service > Custom Service.
- Give it a Display name.
- Create 4 SLOs
Use the SLO targets you defined in the previous challenge. For now, use request-based SLI calculations, NOT window-based ones.
- Chat Latency:
- Metric: movieguru_chat_latency_milliseconds_bucket (look under the prometheus targets > movieguru section)
- Chat Engagement:
- Metric: movieguru_chat_outcome_counter_total (Filter: Outcome=Engaged)
- Remarks: Ideally we would like to use Outcome=Engaged and Outcome=Acknowledged to indicate that the user finds the response relevant, but we will stick to just Engaged for now.
- [Optional] If you want to use a filter that incorporates both Engaged and Acknowledged, use the monitoring API to create the SLO (see example).
- Main Page Load Latency:
- Metric: movieguru_startup_latency_milliseconds_bucket (measured at the startup endpoint)
- Main Page Load Success Rate:
- Metric: This requires finding the ratio of two metrics: movieguru_startup_success_total and movieguru_startup_attempts_total.
- Remarks: Since the UI doesn’t support combining metrics, you’ll need to use the Cloud Monitoring API to define this SLO.
Success Criteria
- You have all the SLOs created.
- You have created at least 1 SLO through the Monitoring API.
Learning Resources
Why are we creating services again?
In the context of creating GCP SLOs and services, creating a service doesn’t mean building the service itself from scratch. It just means defining a group of SLOs under a single service umbrella.
Why are some error budgets negative?
The SLIs of the service are measured from the start of the lab, when the app was performing badly. This means that your error budget was already being consumed before the SLOs were created. The budgets will reset once the compliance window reaches its end.
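To see how a budget can go negative, here is a small worked calculation for a request-based SLO. It is a sketch under simple assumptions: `error_budget_remaining` is a hypothetical helper, the numbers are made up, and Cloud Monitoring's own reporting may differ in detail. The goal must be below 1.

```shell
# Hypothetical helper: remaining error budget for a request-based SLO, as a percentage.
# Arguments: total requests, good requests, SLO goal as a fraction (for example 0.7).
error_budget_remaining() {
  awk -v total="$1" -v good="$2" -v goal="$3" 'BEGIN {
    allowed_bad = (1 - goal) * total   # bad requests the SLO tolerates
    actual_bad  = total - good         # bad requests actually observed
    printf "%.1f%%\n", 100 * (allowed_bad - actual_bad) / allowed_bad
  }'
}

# Illustrative numbers only: 1000 requests, 600 good, 70% goal.
error_budget_remaining 1000 600 0.7   # prints -33.3%
```

With a 70% goal, 300 of the 1000 requests may be bad. Here 400 were bad, so the budget is overspent by a third and shows up as negative.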
Tips
See below for the high-level steps for creating services and SLOs via the API.
Use the "Setting SLOs with API" page as a reference for finding the right commands for the following steps.
- Create an access token.
- Create a service with a name like movieguru-backend (you can use a pre-existing service, but then you need to reference its ID. For this step, it’s easier to create a new one.)
- Create an SLO definition.
- Create the SLO from the definition.
Example
If creating a new service, run this step. If reusing an existing service, skip this step.
# Make sure the env variable PROJECT_ID is set.
echo $PROJECT_ID
# Get an access token
ACCESS_TOKEN=$(gcloud auth print-access-token)
# Give the service a name
SERVICE_DISPLAY_NAME="my-first-service"
# Create a custom service definition
CREATE_SERVICE_POST_BODY=$(cat <<EOF
{
"displayName": "${SERVICE_DISPLAY_NAME}",
"custom": {},
"telemetry": {}
}
EOF
)
# POST to create the service
# The URL is quoted so that the shell does not treat the "?" as a glob character.
curl --http1.1 --header "Authorization: Bearer ${ACCESS_TOKEN}" --header "Content-Type: application/json" -X POST -d "${CREATE_SERVICE_POST_BODY}" "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/services?service_id=${SERVICE_DISPLAY_NAME}"
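Creating an SLO requires the service's unique ID. Besides the service page, one option is to list the project's services through the API and read the ID from each service's `name` field. The sketch below assumes PROJECT_ID and ACCESS_TOKEN are set as in the commands above; `list_services` and `service_id_from_name` are hypothetical helper names.

```shell
# Hypothetical helper: list the services in the project via the Cloud Monitoring API.
list_services() {
  curl --http1.1 --header "Authorization: Bearer ${ACCESS_TOKEN}" \
    "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/services"
}

# Hypothetical helper: the unique service ID is the last path segment
# of a service's "name" field in the response.
service_id_from_name() {
  echo "${1##*/}"
}

# Example with a made-up resource name:
service_id_from_name "projects/123456/services/my-first-service"   # prints my-first-service
```

Calling `list_services` returns JSON in which each entry carries a `name` and a `displayName`; match on the display name you chose, then pass the `name` value to `service_id_from_name`.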
- Create an SLO:
- Get the unique ID of the service from the service page. This is often different from the service display name.
- Run the following command.
# Make sure the env variable PROJECT_ID is set.
echo $PROJECT_ID
# Unique Service ID of an existing service
SERVICE_ID="<service UNIQUE id>"  # Replace the placeholder with your service's unique ID
# Get an access token
ACCESS_TOKEN=$(gcloud auth print-access-token)
# Create an SLO definition
CHAT_ENGAGEMENT_SLO_POST_BODY=$(cat <<EOF
{
"displayName": "70% - Chat Engagement Rate - Rolling 24 Hour",
"goal": 0.7,
"rollingPeriod": "86400s",
"serviceLevelIndicator": {
"requestBased": {
"goodTotalRatio": {
"goodServiceFilter": "metric.type=\"prometheus.googleapis.com/movieguru_chat_outcome_counter_total/counter\" resource.type=\"prometheus_target\" metric.labels.Outcome=monitoring.regex.full_match(\"Engaged|Acknowledged\")",
"totalServiceFilter": "metric.type=\"prometheus.googleapis.com/movieguru_chat_outcome_counter_total/counter\" resource.type=\"prometheus_target\""
}
}
}
}
EOF
)
# POST the SLO definition
curl --http1.1 --header "Authorization: Bearer ${ACCESS_TOKEN}" --header "Content-Type: application/json" -X POST -d "${CHAT_ENGAGEMENT_SLO_POST_BODY}" "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/services/${SERVICE_ID}/serviceLevelObjectives"
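The same request shape can be adapted for the Main Page Load Success Rate SLO, where the good and total counts come from two different metrics. This is a sketch rather than the reference solution. It assumes the two startup metrics use the same `prometheus.googleapis.com/<name>/counter` type pattern as the chat metric above, so confirm the exact metric types in Metrics Explorer first. The helper names and the display name are hypothetical, and the goal is left as an argument so that you can pass the target you chose in the previous challenge.

```shell
# Hypothetical helper: print an SLO definition for the Main Page Load Success Rate.
# Argument: the SLO goal as a fraction.
# Assumption: metric types follow the pattern used in the chat engagement example.
build_startup_success_slo_body() {
  cat <<EOF
{
  "displayName": "Main Page Load Success Rate - Rolling 24 Hour",
  "goal": $1,
  "rollingPeriod": "86400s",
  "serviceLevelIndicator": {
    "requestBased": {
      "goodTotalRatio": {
        "goodServiceFilter": "metric.type=\"prometheus.googleapis.com/movieguru_startup_success_total/counter\" resource.type=\"prometheus_target\"",
        "totalServiceFilter": "metric.type=\"prometheus.googleapis.com/movieguru_startup_attempts_total/counter\" resource.type=\"prometheus_target\""
      }
    }
  }
}
EOF
}

# Hypothetical helper: POST an SLO definition (first argument) to an existing service.
# Assumes PROJECT_ID, SERVICE_ID and ACCESS_TOKEN are set as in the commands above.
post_slo() {
  curl --http1.1 --header "Authorization: Bearer ${ACCESS_TOKEN}" \
    --header "Content-Type: application/json" -X POST -d "$1" \
    "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/services/${SERVICE_ID}/serviceLevelObjectives"
}

# Usage, with 0.9 as a placeholder goal:
# post_slo "$(build_startup_success_slo_body 0.9)"
```

Splitting the body builder from the POST lets you inspect the JSON before sending it, and `post_slo` can be reused for any other SLO definition.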