GCP Model Serving
After data preparation and model development, we are at the final stage of the ML workflow: model serving.
It involves two steps:
- Model deployment
- Model monitoring
Model management spans this entire workflow, handling the underlying machine learning infrastructure.
Model deployment
We have two options:
- Deploy the model to an endpoint for real-time predictions (often called online predictions); best when immediate, low-latency results are needed
- Request a batch prediction job directly from the model resource; best when no immediate response is required, since no endpoint is needed (both options are sketched below)
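As a rough illustration, here is how both options can look with the Vertex AI Python SDK (`google-cloud-aiplatform`). The project ID, model ID, Cloud Storage paths, and instance fields are placeholders; the exact instance format depends on your model.

```python
from google.cloud import aiplatform

# Placeholder project, region, and model ID -- substitute your own.
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")

# Option 1: online predictions -- deploy the model to an endpoint,
# then send individual low-latency requests.
endpoint = model.deploy(machine_type="n1-standard-4")
prediction = endpoint.predict(instances=[{"feature_a": 1.0, "feature_b": 0.5}])
print(prediction.predictions)

# Option 2: batch predictions -- request a job directly from the model
# resource; no endpoint is created.
batch_job = model.batch_predict(
    job_display_name="demo-batch-prediction",
    gcs_source="gs://my-bucket/input.jsonl",
    gcs_destination_prefix="gs://my-bucket/batch-output",
    machine_type="n1-standard-4",
)
batch_job.wait()
```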
Deploying the model off-cloud is also possible. This approach is generally adopted when the model needs to run in a specific environment to reduce latency, ensure privacy, or enable offline functionality.
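For the off-cloud path, one option (assuming the model type supports export) is to pull the trained artifacts out of Vertex AI and serve them in your own environment. A minimal sketch, with placeholder IDs and paths:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")

# Export the model artifacts to Cloud Storage, then download and serve them
# on-premises or at the edge. The supported export_format_id values depend
# on the model type (e.g., "tf-saved-model" for some TensorFlow models).
model.export_model(
    export_format_id="tf-saved-model",
    artifact_destination="gs://my-bucket/exported-model",
)
```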
Model monitoring
Once the model is deployed and begins making predictions or generating content, it is important to monitor its performance.
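As a sketch of what this can look like in practice, Vertex AI offers model monitoring jobs that watch a deployed endpoint for training-serving skew and prediction drift. The endpoint ID, thresholds, training data path, and email address below are illustrative placeholders:

```python
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# Compare live traffic against the training data to detect skew,
# and against earlier traffic to detect drift (placeholder thresholds).
objective_config = model_monitoring.ObjectiveConfig(
    skew_detection_config=model_monitoring.SkewDetectionConfig(
        data_source="gs://my-bucket/training-data.csv",
        skew_thresholds={"feature_a": 0.01},
        target_field="label",
    ),
    drift_detection_config=model_monitoring.DriftDetectionConfig(
        drift_thresholds={"feature_a": 0.05},
    ),
)

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="demo-monitoring-job",
    endpoint=endpoint,
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # hours
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    objective_configs=objective_config,
)
```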
The backbone of automating the ML workflow on Vertex AI is a toolkit called Vertex AI Pipelines. It automates, monitors, and governs machine learning systems by orchestrating the workflow in a serverless manner.
With Vertex AI Workbench, a managed notebook environment, you can define your own pipeline using the Kubeflow Pipelines (KFP) or TFX SDKs, as in the sketch below.
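A minimal sketch of defining and running a pipeline with the KFP SDK; the component body, project ID, and bucket paths are placeholders standing in for real pipeline steps:

```python
from kfp import compiler, dsl
from google.cloud import aiplatform

# A toy component standing in for a real training step.
@dsl.component
def train_model(learning_rate: float) -> str:
    # Placeholder: train a model and return its artifact URI.
    return "gs://my-bucket/model"

@dsl.pipeline(name="demo-pipeline")
def demo_pipeline(learning_rate: float = 0.01):
    train_model(learning_rate=learning_rate)

# Compile the pipeline to a job spec, then run it serverlessly
# on Vertex AI Pipelines.
compiler.Compiler().compile(pipeline_func=demo_pipeline, package_path="pipeline.json")

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="demo-pipeline",
    template_path="pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",
)
job.run()
```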