Challenge 8: Cloud Composer for orchestration
Introduction
Running the Dataform pipelines manually works, but it’s not very practical. We’d rather automate this process and run it periodically. Although Dataform provides plenty of functionality to automate and schedule pipeline runs, we’re going to use a more flexible orchestrator that can also run additional steps that might not be part of the Dataform pipelines, such as pulling data from source systems or running inference with the model we created in the last challenge.
This challenge is all about Cloud Composer, which is basically a managed version of the well-known Apache Airflow framework. We’ll use it to schedule and run our complete pipeline.
Note: Google Cloud offers many different orchestration services. See the documentation for more information and guidance on which one to pick for your specific needs.
Description
We’ve already created a Cloud Composer environment for you. You need to configure and run the provided DAG (Directed Acyclic Graph, which is basically a collection of tasks organized with dependencies and relationships) on that environment. The DAG is scheduled to run daily at midnight. It first pulls source data from different source systems (although in our case a dummy operator illustrates the idea), then runs the Dataform pipeline to generate all of the required tables, and finally runs the latest version of our churn model on our customer base to predict which customers will churn, storing the predictions in a new BigQuery table.
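To make the idea of "tasks organized with dependencies" concrete, here is a minimal sketch in plain Python that uses the standard library's `graphlib` instead of Airflow itself. The task names are hypothetical and only mirror the three steps described above; the provided DAG may name and structure its tasks differently.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (assumptions), mirroring the three steps of the DAG.
# Each key maps a task to the set of tasks that must finish before it starts.
dependencies = {
    "pull_source_data": set(),
    "run_dataform_pipeline": {"pull_source_data"},
    "predict_churn": {"run_dataform_pipeline"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle,
    # which is why the graph has to be acyclic.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" >> ".join(order))
```

An orchestrator such as Airflow does essentially this bookkeeping for you: a task only starts once everything it depends on has finished, and a cyclic dependency is rejected because no valid order exists.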
Find the DAGs bucket for the Cloud Composer environment and copy the provided DAG into the correct location. Update the environment variables of the Cloud Composer environment to refer to the correct Dataform repository and use the tag v1.0.3 as the Git reference.
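As a rough sketch of what these steps could look like from the command line, the Python snippet below only builds and prints `gcloud` commands; it does not execute anything. The environment name, location, DAG file path, and especially the environment variable names are placeholders (assumptions): the real variable names are whatever the provided DAG reads, so check its source. The `gcloud` flags reflect our understanding of the CLI, so verify them with `gcloud composer environments --help` before running anything.

```python
import shlex

# All values below are placeholders (assumptions), not names from the challenge.
ENVIRONMENT = "my-composer-env"
LOCATION = "us-central1"
DAG_FILE = "path/to/provided_dag.py"
ENV_VARS = {
    "DATAFORM_REPOSITORY_ID": "my-dataform-repo",  # hypothetical variable name
    "DATAFORM_GIT_REF": "v1.0.3",  # hypothetical name; the tag is from the challenge
}


def describe_dags_bucket_cmd(environment, location):
    """Command that prints the gs:// prefix of the environment's DAGs folder."""
    return ["gcloud", "composer", "environments", "describe", environment,
            "--location", location, "--format", "get(config.dagGcsPrefix)"]


def update_env_vars_cmd(environment, location, variables):
    """Command that sets or updates environment variables on the environment."""
    pairs = ",".join(f"{key}={value}" for key, value in variables.items())
    return ["gcloud", "composer", "environments", "update", environment,
            "--location", location, f"--update-env-variables={pairs}"]


def import_dag_cmd(environment, location, dag_file):
    """Command that copies a local DAG file into the environment's DAGs folder."""
    return ["gcloud", "composer", "environments", "storage", "dags", "import",
            "--environment", environment, "--location", location,
            "--source", dag_file]


# The environment variables come before the upload, because the DAG starts
# automatically as soon as Airflow discovers it.
commands = [
    describe_dags_bucket_cmd(ENVIRONMENT, LOCATION),
    update_env_vars_cmd(ENVIRONMENT, LOCATION, ENV_VARS),
    import_dag_cmd(ENVIRONMENT, LOCATION, DAG_FILE),
]

for cmd in commands:
    print(shlex.join(cmd))
```

The same steps can be done through the Cloud Console; neither route modifies any code.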
Note: It might take a few minutes for Airflow to discover the DAG, so be patient :) Once the DAG is discovered, it starts automatically, so make sure to configure the environment variables before you upload the DAG.
Success Criteria
- There’s a new DAG that’s triggered every day at midnight.
- There’s at least one successful run of the DAG.
- No code was modified.