APACHE-2.0 License
This repository contains two notebooks that demonstrate how to automatically produce a new AutoML model whenever a new dataset comes in. The project uses Vertex AI in general, and specifically Vertex Managed Dataset, Vertex Pipeline, Vertex AutoML, Cloud Storage, and Cloud Function in Google Cloud Platform.
There are two notebooks for this project. Everything can be set up by running each cell in the notebooks. The only thing you need to do manually is to set up the IAM permissions.
For Vertex Pipeline, we need the Vertex Admin, Cloud Storage Viewer, and Cloud Storage Editor permissions. Some ML components need to access the managed dataset, and the pipeline itself is stored in a GCS (Google Cloud Storage) bucket, so the listed permissions are required. These can be set up on the compute service account, since Vertex Pipeline uses Compute Engine to run each component of the ML pipeline. Also, don't forget to enable the compute service account and the Vertex AI API.
For Cloud Function, we need the Vertex Admin permission and Cloud Build enabled. Since the Docker image that the Cloud Function is based on is built by Cloud Build, we need the Cloud Build API enabled. Also, the Cloud Function triggers the Vertex Pipeline, so we need the Vertex Admin permission.
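The trigger step described above can be sketched as a small Cloud Function entry point. This is a minimal sketch, not the notebook's actual code: it assumes a background function fired by a Cloud Storage event (whose payload carries `bucket` and `name`), and the project ID, region, bucket paths, display name, and the `new_data_uri` pipeline parameter are all hypothetical placeholders.

```python
from typing import Any, Dict, Optional

# Hypothetical settings; replace them with your own project values.
PROJECT_ID = "my-project"
REGION = "us-central1"
PIPELINE_SPEC_URI = "gs://my-bucket/pipeline_spec.json"
PIPELINE_ROOT = "gs://my-bucket/pipeline-root"


def build_job_config(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Turn a Cloud Storage event payload into PipelineJob arguments.

    Returns None when the event lacks a bucket or an object name.
    """
    bucket = event.get("bucket")
    name = event.get("name")
    if not bucket or not name:
        return None
    return {
        "display_name": "automl-retrain",  # hypothetical name
        "template_path": PIPELINE_SPEC_URI,
        "pipeline_root": PIPELINE_ROOT,
        # "new_data_uri" is an assumed pipeline parameter name.
        "parameter_values": {"new_data_uri": f"gs://{bucket}/{name}"},
    }


def trigger_pipeline(event: Dict[str, Any], context: Any = None) -> None:
    """Background Cloud Function entry point for a Cloud Storage event."""
    config = build_job_config(event)
    if config is None:
        return
    # Imported lazily so build_job_config can be tested without the SDK.
    from google.cloud import aiplatform

    aiplatform.init(project=PROJECT_ID, location=REGION)
    aiplatform.PipelineJob(**config).submit()
```

Keeping the argument-building logic separate from the SDK call makes it possible to check the event handling locally before deploying.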
One notebook builds a Kubeflow Pipeline with two additional custom components that determine whether a dataset already exists. The entire notebook produces the pipeline spec JSON file and puts it in the GCS bucket. The other notebook covers the Cloud Function, and you can deploy it directly from within the notebook. Lastly, it tries to simulate the continuous adaptation scenario by adding each subset of the data sequentially.
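The dataset-existence check and the sequential-subset simulation described above can be illustrated with plain Python. This is only a sketch of the idea under stated assumptions: the function names, the dataset name `my-dataset`, the file names, and the "create" versus "import" actions are hypothetical, and the real notebooks would call Vertex AI and Cloud Storage rather than work on in-memory lists.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def choose_dataset_action(existing_names: Sequence[str], target_name: str) -> str:
    """Return "import" if a dataset named target_name exists, else "create"."""
    return "import" if target_name in existing_names else "create"


def split_into_subsets(items: Sequence[T], n_subsets: int) -> Iterator[List[T]]:
    """Yield n_subsets contiguous chunks whose sizes differ by at most one."""
    if n_subsets <= 0:
        raise ValueError("n_subsets must be positive")
    base, extra = divmod(len(items), n_subsets)
    start = 0
    for i in range(n_subsets):
        size = base + (1 if i < extra else 0)
        yield list(items[start:start + size])
        start += size


# Simulate data arriving over time: each subset stands for one upload.
existing: List[str] = []  # stand-in for the datasets that already exist
actions: List[Tuple[str, int]] = []
files = [f"img_{i}.jpg" for i in range(7)]  # hypothetical file names
for subset in split_into_subsets(files, 3):
    actions.append((choose_dataset_action(existing, "my-dataset"), len(subset)))
    if "my-dataset" not in existing:
        existing.append("my-dataset")  # the first upload creates the dataset
```

In this sketch the first subset leads to a "create" decision and every later subset to an "import" decision, which mirrors the continuous adaptation scenario of new data arriving into an existing dataset.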