GCP Data Preparation
Data preparation is the first stage of the ML workflow. In this stage, you upload data and then prepare it for model training with feature engineering.
The data can come from Cloud Storage, BigQuery, or even your local machine.
AutoML supports four types of data:
- image: classify images (single-label or multi-label), detect objects, and segment images
- tabular: solve regression, classification, or forecasting problems
- text: classify text, extract entities, and analyze sentiment
- video: recognize actions in videos, classify videos, and track objects
After the data is loaded, the next step is preparing it for model ingestion with feature engineering.
A feature is a factor (an input variable) that contributes to the prediction.
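To make the idea concrete, here is a minimal sketch in plain Python of turning one raw record into features. The record fields (`pickup_time`, `distance_km`) and the function name `engineer_features` are hypothetical and chosen only for illustration. They are not part of any Vertex AI API.

```python
from datetime import datetime


def engineer_features(record: dict) -> dict:
    """Derive model-ready features from one raw record (hypothetical schema)."""
    pickup = datetime.fromisoformat(record["pickup_time"])
    return {
        # Hour of day often carries signal that a raw timestamp hides.
        "pickup_hour": pickup.hour,
        # weekday() returns 0 for Monday, so Saturday and Sunday are 5 and 6.
        "is_weekend": pickup.weekday() >= 5,
        # Numeric values can pass through unchanged.
        "distance_km": record["distance_km"],
    }


raw = {"pickup_time": "2024-03-09T18:30:00", "distance_km": 4.2}
features = engineer_features(raw)
print(features)
```

Here the raw timestamp is replaced by two derived values (`pickup_hour`, `is_weekend`) that a model can use more directly.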
Vertex AI provides a service called Vertex AI Feature Store, which is a centralized repository to manage, serve, and share features. It aggregates the features from different sources in BigQuery and makes them available for both real-time (often called online) and batch (often called offline) serving.
The workflow to set up serving via Vertex AI Feature Store is:
- Prepare data in BigQuery
- Register the data source
- Configure the connection to the data source
- Serve latest features
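The steps above can be sketched with a toy in-memory model. This is not the Vertex AI SDK: the class `ToyFeatureStore`, its methods, and the sample columns are hypothetical stand-ins, and a plain list of rows plays the role of a BigQuery table. The sketch only illustrates the idea that online serving returns the latest feature values per entity, while batch serving reads the full history.

```python
class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store (not a real API)."""

    def __init__(self):
        self._sources = {}  # source name -> all rows (the "BigQuery table")
        self._online = {}   # source name -> {entity_id: latest row}

    def register_source(self, name, rows):
        # Register the data source (here, just a list of dict rows).
        self._sources[name] = rows

    def sync(self, name):
        # Connect the source to online serving: keep the newest row per entity.
        latest = {}
        for row in sorted(self._sources[name], key=lambda r: r["timestamp"]):
            latest[row["entity_id"]] = row
        self._online[name] = latest

    def read_online(self, name, entity_id):
        # Serve the latest feature values for one entity (None if unknown).
        return self._online[name].get(entity_id)

    def read_batch(self, name):
        # Batch (offline) serving: return every historical row.
        return list(self._sources[name])


# Prepare data: rows that would normally live in a BigQuery table.
rows = [
    {"entity_id": "user_1", "timestamp": 1, "clicks_7d": 3},
    {"entity_id": "user_1", "timestamp": 2, "clicks_7d": 5},
    {"entity_id": "user_2", "timestamp": 1, "clicks_7d": 0},
]

store = ToyFeatureStore()
store.register_source("user_activity", rows)
store.sync("user_activity")

latest = store.read_online("user_activity", "user_1")
history = store.read_batch("user_activity")
```

In this toy model, `latest` holds only the newest row for `user_1`, while `history` holds all three rows, which mirrors the online versus batch distinction described earlier.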
Features in the Feature Store are:
- shareable for training and serving
- reusable
- scalable
- easy to use