Challenge 6: Monitor your models

Introduction

Over time, training data can stop being representative of what the model sees in production because of changing demographics, shifting trends and similar factors. To catch skew or drift in feature distributions, or even in the predictions themselves, you need to monitor your model's performance continuously.

If you’ve chosen the online inferencing path, continue with Online Monitoring; otherwise skip ahead to the Batch Monitoring section.

Online Monitoring

Description

Vertex AI Endpoints provide Model Monitoring capabilities, which you will configure for this challenge. Turn on training-serving skew detection for your model and use an hourly granularity for alerts. Create a new notification channel that uses Pub/Sub messages and configure it to use a new Pub/Sub topic.
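If you'd rather script the setup than click through the Cloud Console, the sketch below outlines one way to do it with the Python clients, assuming recent versions of google-cloud-pubsub, google-cloud-monitoring and google-cloud-aiplatform. The project ID, region, endpoint ID, bucket path, target column and e-mail address are placeholders you'd replace with your own values.

```python
from google.cloud import aiplatform, monitoring_v3, pubsub_v1
from google.cloud.aiplatform import model_monitoring

PROJECT = "your-project-id"        # placeholder
REGION = "us-central1"             # placeholder
ENDPOINT_ID = "1234567890"         # endpoint from the previous challenge (placeholder)
TOPIC = "model-monitoring-alerts"  # name for the new Pub/Sub topic (placeholder)

# 1. Create the Pub/Sub topic that will receive the alerts.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, TOPIC)
publisher.create_topic(request={"name": topic_path})

# 2. Create a Cloud Monitoring notification channel of type "pubsub" pointing at
#    the topic. The Monitoring service agent also needs publish permission on it.
channels = monitoring_v3.NotificationChannelServiceClient()
channel = channels.create_notification_channel(
    name=f"projects/{PROJECT}",
    notification_channel=monitoring_v3.NotificationChannel(
        {
            "type": "pubsub",
            "display_name": "model-monitoring-pubsub",
            "labels": {"topic": topic_path},
        }
    ),
)

# 3. Create the Model Monitoring job: training-serving skew detection against the
#    Challenge 1 baseline, evaluated on an hourly schedule.
aiplatform.init(project=PROJECT, location=REGION)
endpoint = aiplatform.Endpoint(ENDPOINT_ID)

skew = model_monitoring.SkewDetectionConfig(
    data_source="gs://your-bucket/sample.csv",  # baseline training data (placeholder)
    target_field="target",                      # label column in the baseline (placeholder)
    data_format="csv",
)

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="endpoint-monitoring",
    endpoint=endpoint,
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=1.0),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # hourly
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["you@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(skew_detection_config=skew),
)
```

The SDK's alert_config above only wires up e-mail alerts; the Pub/Sub notification channel can then be attached to the monitoring job in the Cloud Console once the job exists.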

Send at least 10K prediction requests to collect monitoring data.
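To reach 10K requests quickly, you can loop over the endpoint with batched payloads. A minimal sketch, using hypothetical feature names that you would replace with your model's actual input schema:

```python
import random

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")  # placeholders
endpoint = aiplatform.Endpoint("1234567890")                        # endpoint ID (placeholder)

def make_instance() -> dict:
    # Hypothetical features; deliberately sample outside the training ranges so
    # the skew detector has something to flag.
    return {
        "feature_a": random.uniform(100.0, 200.0),
        "feature_b": random.choice(["rare_category", "common_category"]),
    }

BATCH_SIZE = 100  # send 100 instances per call instead of 10,000 single requests
for _ in range(10_000 // BATCH_SIZE):
    endpoint.predict(instances=[make_instance() for _ in range(BATCH_SIZE)])
```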

Success Criteria

  1. Show that Model Monitoring is running successfully for the endpoint created in the previous challenge.
  2. Show that there’s a new Pub/Sub topic and a Pub/Sub notification channel for the Model Monitoring job.
  3. By default, Model Monitoring keeps request/response data in a BigQuery dataset; find and show that data (see the query sketch after this list).
  4. No code was modified.
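For the third item, a quick way to inspect the logged request/response data is a BigQuery query. The dataset and table names below (model_deployment_monitoring_<ENDPOINT_ID>.serving_predict) are an assumption based on the naming the service currently uses; confirm the exact names in the BigQuery console for your project.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder

# Dataset/table names are assumptions based on current Model Monitoring naming;
# check the BigQuery console if they differ in your project.
query = """
SELECT *
FROM `your-project-id.model_deployment_monitoring_1234567890.serving_predict`
LIMIT 10
"""
for row in client.query(query).result():
    print(dict(row))
```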

Tips

  • You can use the sample.csv file from Challenge 1 as the baseline data.
  • You can use the same tool you used for the previous challenge to generate the requests; make sure to include some data with a different distribution than the training data (see the sketch after this list).
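One way to produce requests with a shifted distribution is to start from sample.csv and perturb a feature before sending the rows to the endpoint. A rough sketch, with hypothetical column names:

```python
import pandas as pd

# Load the Challenge 1 baseline and shift a numeric column so the serving data no
# longer matches the training distribution; column names are placeholders.
df = pd.read_csv("sample.csv")
df["numeric_feature"] = df["numeric_feature"] * 3 + 100
instances = df.drop(columns=["target"]).to_dict(orient="records")
# Send `instances` in batches with endpoint.predict(...) as in the earlier sketch.
```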

Learning Resources

Batch Monitoring

Description

Vertex AI Batch Prediction jobs provide Model Monitoring capabilities as well. Create a new Batch Prediction job with monitoring turned on, using BigQuery input and output tables, and keep the default values for the alert thresholds. Create a new notification channel that uses Pub/Sub messages and configure it to use a new Pub/Sub topic.
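The Pub/Sub topic and notification channel can be created the same way as in the online sketch. For the job itself, recent google-cloud-aiplatform releases expose model monitoring parameters on BatchPredictionJob.create; the sketch below assumes such a version and uses placeholder project, model, dataset and column names.

```python
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="your-project-id", location="us-central1")  # placeholders
model = aiplatform.Model("your-model-resource-id")                  # model from the earlier challenge

# Skew detection against the Challenge 1 baseline; leaving skew_thresholds unset
# keeps the default alert thresholds.
objective = model_monitoring.ObjectiveConfig(
    skew_detection_config=model_monitoring.SkewDetectionConfig(
        data_source="gs://your-bucket/sample.csv",  # baseline training data (placeholder)
        target_field="target",                      # label column (placeholder)
        data_format="csv",
    )
)

# NOTE: the model_monitoring_* parameters are an assumption that requires a
# recent SDK version; older releases do not accept them.
job = aiplatform.BatchPredictionJob.create(
    job_display_name="batch-prediction-with-monitoring",
    model_name=model.resource_name,
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://your-project-id.your_dataset.input_table",  # placeholder
    bigquery_destination_prefix="bq://your-project-id.your_dataset",  # placeholder
    model_monitoring_objective_config=objective,
    model_monitoring_alert_config=model_monitoring.EmailAlertConfig(
        user_emails=["you@example.com"]  # placeholder
    ),
)
```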

Success Criteria

  1. There’s a new Batch Prediction job with monitoring turned on.
  2. Show that there’s a new Pub/Sub topic and a Pub/Sub notification channel for the Model Monitoring job.
  3. As batch inferencing will again take roughly 10 minutes, it’s sufficient to show a properly configured job.
  4. No code was modified.

Tips

  • You can use the sample.csv file from Challenge 1 as the baseline training data.
  • You can use the same data you used for the previous challenge to run the batch predictions; make sure to include some data with a different distribution than the training data.

Learning Resources
