Implementing MLOps on GCP

Introduction

In this hack, you’ll implement the full lifecycle of an ML project. We’ll provide you with a sample code base and you’ll work on automating continuous integration (CI), continuous delivery (CD), and continuous training (CT) for a machine learning (ML) system.

MLOps Overview
Picture is from this article

There’s no coding involved: we’ve already prepared the code to train a simple scikit-learn model. It could have been any other framework, since the model code has no dependencies on any Google services or libraries.

We’re using the New York Taxi dataset to build a RandomForestClassifier that predicts whether the tip for a trip will be more than 20% of the fare.
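To make the prediction target concrete, here is a minimal sketch of how such a binary label could be derived from a trip record. The field names `fare_amount` and `tip_amount`, the `Trip` class and the `is_big_tip` helper are assumptions for illustration, not taken from the provided code base, and the handling of zero fares is just one possible choice.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    # Hypothetical field names; the hack's dataset may use different columns.
    fare_amount: float
    tip_amount: float


def is_big_tip(trip: Trip, threshold: float = 0.20) -> int:
    """Return 1 if the tip is more than `threshold` of the fare, else 0."""
    if trip.fare_amount <= 0:
        # No meaningful ratio for zero or negative fares (assumed: negative class).
        return 0
    return int(trip.tip_amount > threshold * trip.fare_amount)


trips = [Trip(10.0, 2.5), Trip(10.0, 2.0), Trip(0.0, 1.0)]
labels = [is_big_tip(t) for t in trips]
print(labels)  # [1, 0, 0]
```

A label like this is what a classifier such as RandomForestClassifier would be trained to predict from the other trip features.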

The first step is all about exploration: running that code in an interactive environment for development and experimentation.

Then we’ll store that code in a version control system so the whole team has access to it and we can keep track of all changes.

After that, in Challenge 3, we’ll automate continuous integration and the building of packages through build pipelines.

Challenge 4 is all about data-to-model pipelines, orchestrating data extraction, validation, preparation, model training, evaluation and validation.
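The order of those stages, and the idea that a failed check stops the run, can be sketched in plain Python. This is only an illustration under stated assumptions: the step functions, the hard-coded rows and the majority-class "model" are hypothetical stand-ins, and a managed pipeline would run each stage as its own component rather than as a local function call.

```python
def extract(ctx):
    # Stand-in for reading from a real source; rows are (fare, tip) pairs.
    ctx["rows"] = [(10.0, 2.5), (8.0, 1.0), (20.0, 5.0), (15.0, 0.0)]
    return ctx


def validate_data(ctx):
    # Gate: stop the pipeline if the data looks wrong.
    if not ctx["rows"] or any(fare <= 0 for fare, _ in ctx["rows"]):
        raise ValueError("data validation failed")
    return ctx


def prepare(ctx):
    # Derive the binary label: tip is more than 20% of the fare.
    ctx["labels"] = [int(tip > 0.2 * fare) for fare, tip in ctx["rows"]]
    return ctx


def train(ctx):
    # Placeholder "model": always predict the majority class.
    ctx["model"] = int(sum(ctx["labels"]) * 2 >= len(ctx["labels"]))
    return ctx


def evaluate(ctx):
    hits = sum(1 for y in ctx["labels"] if y == ctx["model"])
    ctx["accuracy"] = hits / len(ctx["labels"])
    return ctx


def validate_model(ctx):
    # Gate: only approve the model if it meets the quality bar.
    ctx["approved"] = ctx["accuracy"] >= ctx["min_accuracy"]
    return ctx


STEPS = [extract, validate_data, prepare, train, evaluate, validate_model]


def run_pipeline(min_accuracy=0.5):
    ctx = {"min_accuracy": min_accuracy}
    for step in STEPS:
        ctx = step(ctx)
    return ctx


result = run_pipeline()
print(result["accuracy"], result["approved"])  # 0.5 True
```

Passing state through a single dictionary keeps the sketch short; real pipelines usually pass explicit, typed artifacts between stages instead.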

Once the model has been trained, in Challenge 5 we’ll deploy that model to an API endpoint for real-time inferencing, or opt for the batch option and run batch inferencing.

Challenge 6 is all about monitoring that endpoint (or the batch predictions) and detecting any drift or skew between training data and inferencing data.
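As a rough intuition for what detecting drift or skew means, the sketch below compares the distribution of one numeric feature in the training data with the same feature at inferencing time. It uses the population stability index (PSI), a common drift statistic chosen here purely for illustration; it is not necessarily what a managed monitoring service computes, and the `psi` helper, the bin count and the sample data are all assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two numeric samples.

    Bins are derived from the range of `expected` (the training sample).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


training = [float(v) for v in range(100)]
same = list(training)
shifted = [v + 50.0 for v in training]

print(psi(training, same))     # 0.0 for identical samples
print(psi(training, shifted))  # a positive score: the distribution moved
```

A score like this only becomes useful once a threshold is agreed on; where that threshold sits is a judgment call that depends on the feature and on how sensitive the model is to it.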

And finally, in Challenge 7 we’ll bring all these things together by tapping into model monitoring and triggering re-training when the model starts to misbehave.

Learning Objectives

This hack will help you explore the following tasks:

  • Using Cloud Source Repositories for version control
  • Using Cloud Build for automating continuous integration and delivery
  • Using Vertex AI for
    • Exploration through an interactive environment
    • Training on diverse hardware
    • Model registration
    • Managed pipelines
    • Model serving
    • Model monitoring

The instructions are minimal, meaning that you need to figure things out :) That’s by design.

Challenges

Prerequisites

  • Knowledge of Python
  • Knowledge of Git
  • Basic knowledge of GCP
  • Access to a GCP environment

Contributors

  • Murat Eken