8.2.1. Vertex AI (Google Cloud Platform)

8.2.1.1. Overview

8.2.1.1.1. Introduction

8.2.1.1.1.1. ML workflow

Data preparation
- EDA
  - For small and medium datasets: Vertex AI Workbench
  - For large datasets: Dataproc Serverless Spark
- Storage
- Feature engineering: Vertex AI managed dataset
- Labeling
Model training
- Training
  - AutoML: train without writing code or preparing data splits (tabular, image, text, video)
  - Custom training: full control over the training process (ML framework, your own training code, hyperparameter tuning options)
  - Model Garden: choose pretrained models (including open-source models) to test, customize, and deploy
  - Generative AI: access Google's large generative AI models for multiple modalities (text, code, images, speech), then tune and deploy them
- Tuning
  - Hyper-parameters:
    - For simple custom-trained models: Custom tuning jobs
    - For complex ML models: Vertex AI Vizier
  - Algorithms:
    - Multi-algorithms training: Vertex AI Experiments
    - TensorFlow: Vertex AI TensorBoard
  - Manage model versions
    - Register the version of trained models: Vertex AI Model Registry
- Evaluation (list of Model evaluation)
  - Evaluation in Vertex AI Model Registry
  - Evaluation in Vertex AI Pipelines
Model serving and monitoring
- Serving: deploy model to production and get predictions
  - For Custom training models
    - Real-time online predictions: prebuilt or custom containers
    - Batch predictions
  - BigQuery ML
- Manage features: Vertex AI Feature Store (for Tabular)
- Explain model: Vertex Explainable AI
- Monitoring: Vertex AI Model Monitoring (detects training-serving skew and prediction drift, and sends alerts when incoming prediction data skews too far from the training baseline)

8.2.1.1.1.2. Training and deployment options

AutoML lets you train tabular, image, text, or video data without writing code or preparing data splits.
Custom training gives you complete control over the training process, including using your preferred ML framework, writing your own training code, and choosing hyperparameter tuning options.
Model Garden lets you discover, test, customize, and deploy Vertex AI and select open-source (OSS) models and assets.
Generative AI gives you access to Google’s large generative AI models for multiple modalities (text, code, images, speech). You can tune Google’s LLMs to meet your needs, and then deploy them for use in your AI-powered applications.

8.2.1.1.1.3. Ways to interact with Vertex AI

Google Cloud console (graphical user interface)
Google Cloud command-line interface (gcloud CLI, `gcloud ai` commands)
Terraform support for Vertex AI
Vertex AI SDK for Python
Vertex AI REST API

8.2.1.1.2. Setup environment

Create project and enable billing
Enable Vertex AI API
Install the Google Cloud CLI.
To initialize the gcloud CLI, run the following command:

gcloud init

Update and install gcloud components:

gcloud components update  
gcloud components install beta

Grant the required IAM roles (see the documentation).
Install the Vertex AI SDK for Python (the `google-cloud-aiplatform` package).
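
As a quick sanity check after these steps, a short Python sketch can report whether the gcloud CLI and the Vertex AI SDK are visible from the current environment, and then initialize the SDK. The function names `check_prerequisites` and `init_vertex_ai` and the default region `us-central1` are illustrative assumptions, not part of the setup steps above.

```python
import shutil


def check_prerequisites() -> dict:
    """Report which setup pieces are visible from this Python process."""
    status = {"gcloud_cli": shutil.which("gcloud") is not None}
    try:
        # Provided by `pip install google-cloud-aiplatform`.
        from google.cloud import aiplatform  # noqa: F401

        status["vertex_ai_sdk"] = True
    except ImportError:
        status["vertex_ai_sdk"] = False
    return status


def init_vertex_ai(project_id: str, location: str = "us-central1") -> None:
    """Point the SDK at a project and region (both values are placeholders)."""
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=location)


print(check_prerequisites())
```

If both entries are `True`, calling `init_vertex_ai("your-project-id")` should be enough to start using the SDK, provided authentication is already configured.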

8.2.1.1.3. Training methods

Choosing a training method

Criterion	AutoML	BigQuery ML	Custom training
Characteristics	Minimal technical effort; quickly prototype models and explore new datasets	Train models on your BigQuery data directly in BigQuery using SQL; use SQL to get batch predictions	Create a training application optimized for your targeted outcome; full control over training application functionality
Data science expertise required	No	No	Yes
Programming ability required	No	SQL (to build and evaluate the model)	Yes
Time to trained model	Lower: less data preparation is required, and no development is needed	Lower: no need to build the infrastructure for batch predictions or model training, because BigQuery ML uses the BigQuery computational engine, which speeds up training, evaluation, and prediction	Higher: more data preparation is required, and training application development is needed
Limits on machine learning objectives	Yes (only AutoML's predefined objectives)	Yes	No
Manually optimize model performance with hyperparameter tuning	No (hyperparameter tuning is automated)	Yes (BigQuery ML supports hyperparameter tuning when training models with `CREATE MODEL` statements)	Yes
Control over the training environment	Limited: you can specify the number of node hours to train for and allow early stopping (for `tabular` and `image`)	No	Yes: Compute Engine machine type, disk size, ML framework, number of nodes
Limits on data size	Yes (see the guides on preparing image, tabular, text, and video training data)	Yes (based on quotas)	Unmanaged datasets: no; managed datasets: same as AutoML
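
To make the trade-offs in the table easier to apply, the following sketch encodes them as a small decision helper. It is a simplified reading of the table, not an official rule: the function name, its parameters, and the order of the checks are all assumptions made for illustration.

```python
def suggest_training_method(
    has_ds_expertise: bool,
    can_write_training_code: bool,
    knows_sql: bool,
    data_in_bigquery: bool,
    needs_custom_objective: bool = False,
) -> str:
    """Return a rough suggestion based on the comparison table."""
    # Only custom training has no limits on ML objectives.
    if needs_custom_objective:
        return "Custom training"
    # Data already in BigQuery plus SQL skills points to BigQuery ML.
    if data_in_bigquery and knows_sql:
        return "BigQuery ML"
    # Full control pays off only with both expertise and coding ability.
    if has_ds_expertise and can_write_training_code:
        return "Custom training"
    # Otherwise prefer the option with the least technical effort.
    return "AutoML"


print(suggest_training_method(False, False, True, True))
```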

8.2.1.1.4. Notebook tutorials

List of notebook tutorials

8.2.1.2. Vertex AI Notebook

There are two ways to use notebooks:

- Colab Enterprise:
  - Sharing and collaboration: easily share notebooks with other users, Google groups, or Google Workspace domains.
  - Infrastructure management: no infrastructure to manage; Colab Enterprise automatically provisions runtimes and shuts them down when they are not needed.
  - Google Cloud service integration: integrates with services such as Vertex AI and BigQuery.
- Vertex AI Workbench:
  - High customizability: supports many types of Jupyter notebook instances, and you can add conda environments.
  - Data integration: access data from Cloud Storage and BigQuery directly in JupyterLab.
  - Scheduling and cost management: run notebooks on a schedule and shut down automatically when idle.

Feature	Colab Enterprise	Vertex AI Workbench
Environment	Managed, collaborative	Customizable, developer-focused
Infrastructure Management	Serverless, managed by Google	User-controlled, flexible
Collaboration	Excellent, with IAM control	Good, with GitHub integration
Compute Provisioning	Automatic	User-configurable
Data Integration	Seamless with Google Cloud services	Seamless with Google Cloud services
Code Completion	Inline	Inline
Customization	Limited	Extensive
GPU Support	✓	✓
Conda Environments	✗	✓
Custom Containers	✗	✓
Automated Notebook Runs	✗	✓
Idle Shutdown	Automatic	Configurable
Persistent Storage	✗	✓
Access to VM	✗	✓
Original Jupyter UI	Modified	Retained

When to use each:

Colab Enterprise: when you need easy sharing and collaboration and do not want to manage infrastructure.
Vertex AI Workbench: when you need extensive customization and deep integration with Google Cloud data services.
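
The yes/no rows of the comparison table above can be turned into a small lookup that lists which option satisfies a set of requirements. This is only a sketch: the feature keys and the function name `notebooks_supporting` are hypothetical, and the support values are copied from the table.

```python
from typing import Dict, List, Set

# Feature support copied from the comparison table (True = supported).
FEATURES: Dict[str, Dict[str, bool]] = {
    "Colab Enterprise": {
        "gpu": True,
        "conda_environments": False,
        "custom_containers": False,
        "automated_runs": False,
        "persistent_storage": False,
        "vm_access": False,
    },
    "Vertex AI Workbench": {
        "gpu": True,
        "conda_environments": True,
        "custom_containers": True,
        "automated_runs": True,
        "persistent_storage": True,
        "vm_access": True,
    },
}


def notebooks_supporting(required: Set[str]) -> List[str]:
    """Return the notebook options that support every required feature."""
    return [
        name
        for name, supported in FEATURES.items()
        if all(supported[feature] for feature in required)
    ]


print(notebooks_supporting({"gpu", "persistent_storage"}))
```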

8.2.1.2.1. Colab Enterprise

Key Features:

🔗 Share and Collaborate: Easily share notebooks with individuals, Google groups, or entire Google Workspace domains. Access control is handled through Google Cloud’s IAM.
🌐 Managed Compute: Colab Enterprise takes care of provisioning and managing compute resources. It starts runtimes when needed and shuts them down when not in use.
✅ Google Cloud Integration: Seamlessly work with Google Cloud services like Vertex AI and BigQuery from within your notebook.
✨ Inline Code Completion: Write code faster with suggestions that pop up as you type.
Runtime: a compute resource that runs the code in a notebook.
Runtime template: a configuration used to tune a runtime's performance, cost, and other characteristics to the workload.

See the machine type and disk type documentation to select resources suited to your purpose.

Cons:

Less suited to heavy workloads, such as long-running tasks or cases where data must persist on the machine's disk after the runtime is turned off (released).
Limited control over the environment.

Pricing is based on the resources used over time, including:

Compute Engine: the virtual machine that runs the notebook
Storage: data and source code
Networking: communication between the notebook and other services

8.2.1.2.2. Vertex AI Workbench

Key Features:

👨🏻‍💻 Access to the VM: Unlike Colab Enterprise, you get full access to the virtual machine itself, allowing for in-depth configuration tailored to your specific needs. You can integrate more easily with your GCP environment based on IAM.
📦 Persistent Storage: Data isn’t lost when the machine restarts, as the VM’s disk is retained, ensuring your data remains intact.
☑ Controlling Instance Types: Choose from several types of instances, including N2 CPU or any GPU offering that GCP has.
🤏 Preinstalled Packages and GPU Support: All instances come with JupyterLab and a suite of deep learning packages like TensorFlow and PyTorch, with GPU support available.
</> GitHub Integration: Sync your notebooks with GitHub for version control and collaboration.
💾 Custom Environments and Containers: Add conda environments or create custom containers to tailor your setup to specific needs, so you don’t need to install dependencies every time a team member wants to launch a new machine.
👾 Data Integration: Access Cloud Storage and BigQuery directly from JupyterLab by identifying either as the user working on the notebook or as a service account.
🛠️ Automated Notebook Runs and Idle Shutdowns: Schedule notebook runs and automatically shut down idle instances to manage costs effectively.
🖥️ Original Jupyter UI: Workbench retains more of the original Jupyter UI, providing a cleaner and more familiar interface for users accustomed to Jupyter notebooks.

Pricing is based on the resources used over time, including:

CPU, RAM, and GPU (if used): charged only while the instance is running
Storage (boot disk and data disk): always charged, even when the instance is shut down, because the data is still stored on the disk
Workbench management fees: charged only while the instance is running

Tips
- Scheduled tasks (such as scheduled notebook runs) still execute while the instance is shut down, and you are charged for the resources used during those executions.
- Persistent storage cost is based on the amount of provisioned disk space, not the space actually used, so choose a disk size that fits your needs.
- Prefer storing data in Cloud Storage buckets, where you are charged only for the data you actually store ("used storage"), which is a more flexible way to pay for storage.
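
The pricing rules above (compute and management fees only while running, disk always) can be sketched as a small estimator. All rates in the example are hypothetical placeholders chosen for easy arithmetic; they are not Google Cloud prices, and the function name and parameters are assumptions.

```python
def estimate_monthly_cost(
    running_hours: float,
    compute_rate_per_hour: float,
    management_fee_per_hour: float,
    disk_gb: float,
    disk_rate_per_gb_month: float,
) -> float:
    """Estimate one month of Workbench cost from the three pricing components."""
    # Compute and management fees accrue only while the instance is running.
    compute = running_hours * compute_rate_per_hour
    management = running_hours * management_fee_per_hour
    # Provisioned disk is billed for the whole month, running or not.
    storage = disk_gb * disk_rate_per_gb_month
    return round(compute + management + storage, 2)


# Hypothetical rates: compare an always-on instance with one used 160 hours.
always_on = estimate_monthly_cost(730, 0.50, 0.05, 100, 0.04)
work_hours = estimate_monthly_cost(160, 0.50, 0.05, 100, 0.04)
print(always_on, work_hours)
```

With these placeholder rates, the storage component is identical in both cases, which illustrates why idle shutdown reduces the bill but never brings it to zero.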

8.2.1.2.2.1. Setup Instances

8.2.1.2.2.1.1. Create an instance

8.2.1.2.2.1.2. Instance shutdown

Shutdown events:

Manual shutdown (you stop the instance yourself)
Idle shutdown: there has been no kernel activity for the specified idle time period

Running a cell or printing new output to a notebook counts as activity and resets the idle shutdown timer.

Billing:

While your instance is shut down, there are no CPU or GPU usage charges except for scheduled executions that run during the shutdown
Disk storage charges still apply while your instance is shut down. For more information, see Pricing.

Automated shutdown: by default, the instance shuts down after being idle for a specified time period.

Scheduled executions: scheduled executions still run while the instance is shut down.

gcloud CLI config:

Create instance

gcloud workbench instances create INSTANCE_NAME --metadata=idle-timeout-seconds=86400

Update instance

gcloud workbench instances update INSTANCE_NAME --metadata=idle-timeout-seconds=86400
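
When the idle timeout needs to be set from a script, the commands above can be assembled in Python. This is a minimal sketch: the helper name `idle_timeout_command` and the instance name `my-instance` are hypothetical, and the optional `--location` argument is included on the assumption that no default location is configured for the gcloud CLI.

```python
import shlex
from typing import List, Optional


def idle_timeout_command(
    instance_name: str,
    idle_timeout_seconds: int,
    action: str = "update",
    location: Optional[str] = None,
) -> List[str]:
    """Build the gcloud command that sets the idle shutdown timeout."""
    if action not in ("create", "update"):
        raise ValueError("action must be 'create' or 'update'")
    if idle_timeout_seconds <= 0:
        raise ValueError("idle_timeout_seconds must be positive")
    cmd = [
        "gcloud", "workbench", "instances", action, instance_name,
        f"--metadata=idle-timeout-seconds={idle_timeout_seconds}",
    ]
    if location:
        cmd.append(f"--location={location}")
    return cmd


# Three hours of inactivity before shutdown.
cmd = idle_timeout_command("my-instance", 3 * 60 * 60)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```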

8.2.1.2.2.1.3. Change the machine type and configure GPUs

8.2.1.2.2.1.4. Migrate your data to a new Vertex AI Workbench instance

8.2.1.2.2.1.5. Remote SSH

8.2.1.2.2.1.6. Limitation

8.2.1.2.2.2. Schedule notebook runs

Set scheduler

Next to your instance's name, click Open JupyterLab.
In the File Browser, double-click the example notebook file to open it.
Click the Execute button.
In the Submit notebooks to Executor dialog, in the Type field, select Schedule-based recurring executions.

By default, the executor runs your notebook file every hour at the 00 minute of the hour.
In Advanced options, enter a name for your bucket in the Cloud Storage bucket field, and then click Create and select. The executor stores your notebook output in the Cloud Storage bucket.
Click Submit. Your notebook file runs automatically on the schedule that you set.
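
The default schedule (every hour at minute 00, which corresponds to the cron expression `0 * * * *`) can be previewed with a few lines of Python. The helper name `next_hourly_runs` is hypothetical; it only computes timestamps and does not talk to the executor.

```python
from datetime import datetime, timedelta
from typing import List


def next_hourly_runs(now: datetime, count: int = 3) -> List[datetime]:
    """Return the next `count` top-of-the-hour times strictly after `now`."""
    first = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return [first + timedelta(hours=i) for i in range(count)]


runs = next_hourly_runs(datetime(2024, 1, 1, 10, 15))
for run in runs:
    print(run.isoformat())
```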

View, share, and import an executed notebook file

8.2.1.2.2.3. Connect to data

8.2.1.2.2.3.1. BigQuery Table

8.2.1.2.2.3.1.1. Browse BigQuery resources

In JupyterLab, open BigQuery in Notebooks. The BigQuery pane lists the available projects and datasets.

8.2.1.2.2.3.1.2. Query with the BigQuery magic command

To use these magics, you must first register them. Run the %load_ext magic in a Jupyter notebook cell.

%load_ext google.cloud.bigquery

The %%bigquery magic runs a SQL query and returns the results as a pandas DataFrame

%%bigquery  
SELECT name, SUM(number) as count  
FROM `bigquery-public-data.usa_names.usa_1910_current`  
GROUP BY name  
ORDER BY count DESC  
LIMIT 10

Assign the query results to a variable

%%bigquery df
SELECT name, SUM(number) as count  
FROM `bigquery-public-data.usa_names.usa_1910_current`  
GROUP BY name  
ORDER BY count DESC  
LIMIT 10

df

Explicitly specify a project

project_id = 'your-project-id'

%%bigquery --project $project_id  
SELECT name, SUM(number) as count  
FROM `bigquery-public-data.usa_names.usa_1910_current`  
GROUP BY name  
ORDER BY count DESC  
LIMIT 10

Run a parameterized query

params = {"limit": 10}

%%bigquery --params $params  
SELECT name, SUM(number) as count  
FROM `bigquery-public-data.usa_names.usa_1910_current`  
GROUP BY name  
ORDER BY count DESC  
LIMIT @limit
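
Outside a notebook, the same `@limit` placeholder can be bound through the BigQuery client library. The sketch below assumes that simple Python values map onto BigQuery scalar types as shown; the helper names `infer_bq_type`, `describe_params`, and `run_parameterized_query` are hypothetical, and only scalar parameters are covered.

```python
from typing import Any, Dict, List, Tuple


def infer_bq_type(value: Any) -> str:
    """Map a Python scalar to a BigQuery type name (simplified assumption)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"Unsupported parameter type: {type(value).__name__}")


def describe_params(params: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Turn {"limit": 10} into [("limit", "INT64", 10)]."""
    return [(name, infer_bq_type(value), value) for name, value in params.items()]


def run_parameterized_query(sql: str, params: Dict[str, Any], project_id: str):
    """Run `sql` with named parameters and return a pandas DataFrame."""
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, type_, value)
            for name, type_, value in describe_params(params)
        ]
    )
    client = bigquery.Client(project=project_id)
    return client.query(sql, job_config=job_config).to_dataframe()


print(describe_params({"limit": 10}))
```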

Get a summary of data

%bigquery_stats bigquery-public-data.google_trends.top_terms

After running for some time, an image appears with various statistics on each of the 7 variables in the top_terms table.

[Figure: International top terms overview of statistics.]

8.2.1.2.2.3.1.3. Query with the BigQuery client library

from typing import Iterator, Optional, Union

import pandas as pd
from google.cloud import bigquery


class BigqueryConnector:
    def __init__(self, project_id):
        self.project_id = project_id
        self.client = bigquery.Client(project=project_id)

    def read_query(
        self, query: str, chunk_size: Optional[int] = None
    ) -> Union[Iterator[pd.DataFrame], pd.DataFrame]:
        """
        Executes a BigQuery query and returns an iterator of pandas DataFrames if chunk_size is provided,
        otherwise returns a single pandas DataFrame.
        """
        query_job = self.client.query(query)
        result = query_job.result(page_size=chunk_size)
        return (
            result.to_dataframe_iterable()
            if chunk_size
            else result.to_dataframe()
        )

    def read_table(self, table_id):
        table = self.read_query(f"SELECT * FROM `{table_id}`")
        return table

    def write_bq(self, dataframe, table_id, if_exists="append", schema=None):
        write_mode = (
            "WRITE_TRUNCATE" if if_exists == "replace" else "WRITE_APPEND"
        )
        schema = (
            [
                bigquery.SchemaField(
                    name,
                    type_.upper(),
                    "NULLABLE" if mode is None else mode.upper(),
                )
                for name, type_, mode in schema
            ]
            if schema is not None
            else []
        )
        job_config = bigquery.LoadJobConfig(
            schema=schema,
            write_disposition=write_mode,
        )
        job = self.client.load_table_from_dataframe(
            dataframe, table_id, job_config=job_config
        )
        job.result()
        
    def update_bq(self, table_id, update_value, conditions=None):
        """Set the `update_value` columns on rows where every `conditions` column matches."""

        def to_literal(value):
            # Quote strings (escaping backslashes and single quotes); leave other values as-is.
            if isinstance(value, str):
                escaped = value.replace("\\", "\\\\").replace("'", "\\'")
                return f"'{escaped}'"
            return value

        # `conditions=None` avoids a shared mutable default; the caller's dicts are not modified.
        set_statement = ", ".join(
            f"{k} = {to_literal(v)}" for k, v in update_value.items()
        )
        where_clause = "".join(
            f" AND {k} = {to_literal(v)}" for k, v in (conditions or {}).items()
        )
        sql = f"""
        UPDATE `{table_id}`
        SET
        {set_statement}
        WHERE
        1 = 1 {where_clause}
        """
        job = self.client.query(sql)
        job.result()

    def create_table(
        self, table_id, fields, partition_by=None, cluster_by=None
    ):
        schema = [
            bigquery.SchemaField(
                i["name"],
                i["type"].upper(),
                mode=(i["mode"] if "mode" in i else "NULLABLE"),
            )
            for i in fields
        ]
        table = bigquery.Table(table_id, schema=schema)
        if partition_by:
            partitioning = bigquery.TimePartitioning(
                type_=bigquery.TimePartitioningType.DAY,
                field=partition_by,
            )
            table.time_partitioning = partitioning
        if cluster_by:
            table.clustering_fields = cluster_by
        self.client.create_table(table, exists_ok=True)
        print(f"Created table '{table_id}' successfully.")
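
The `update_bq` method above builds its SQL by interpolating values into the statement text. One possible alternative, sketched here, keeps the values out of the SQL and passes them as named query parameters instead. The function name `build_update` and the `set_`/`where_` parameter prefixes are hypothetical; column names are still interpolated, so they must come from trusted code.

```python
from typing import Any, Dict, Optional, Tuple


def build_update(
    table_id: str,
    update_value: Dict[str, Any],
    conditions: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return an UPDATE statement with @placeholders and the values to bind."""
    if not update_value:
        raise ValueError("update_value must not be empty")
    params: Dict[str, Any] = {}
    set_parts = []
    for column, value in update_value.items():
        params[f"set_{column}"] = value
        set_parts.append(f"{column} = @set_{column}")
    where_parts = []
    for column, value in (conditions or {}).items():
        params[f"where_{column}"] = value
        where_parts.append(f"{column} = @where_{column}")
    # BigQuery requires a WHERE clause on UPDATE; TRUE matches every row.
    where_sql = " AND ".join(where_parts) if where_parts else "TRUE"
    set_sql = ", ".join(set_parts)
    sql = f"UPDATE `{table_id}` SET {set_sql} WHERE {where_sql}"
    return sql, params


sql, params = build_update("project.dataset.table", {"status": "done"}, {"id": 7})
print(sql)
print(params)
```

The returned `params` can then be bound with `bigquery.ScalarQueryParameter` objects in a `bigquery.QueryJobConfig(query_parameters=...)` passed to `client.query`.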

8.2.1.2.2.3.2. Cloud Storage buckets

To mount and then access a Cloud Storage bucket, do the following:

In JupyterLab, make sure the File Browser tab is selected.
In the left sidebar, click the Mount shared storage button. If you don’t see the button, drag the right side of the sidebar to expand the sidebar until you see the button.
In the Bucket name field, enter the Cloud Storage bucket name that you want to mount.
Click Mount.
Your Cloud Storage bucket appears as a folder in the File browser tab of the left sidebar. Double-click the folder to open it and browse the contents.
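
Besides mounting through the JupyterLab UI, bucket contents can also be listed from code. The sketch below assumes the `google-cloud-storage` package is installed; the helper names `list_bucket_files` and `make_storage_client` are hypothetical, and the client is passed in as an argument so the listing logic is easy to try with a stand-in object.

```python
from typing import List


def list_bucket_files(client, bucket_name: str, prefix: str = "") -> List[str]:
    """Return the object names in a bucket, optionally filtered by prefix.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    blobs = client.list_blobs(bucket_name, prefix=prefix or None)
    return [blob.name for blob in blobs]


def make_storage_client(project_id: str):
    """Create a real Cloud Storage client (requires google-cloud-storage)."""
    from google.cloud import storage

    return storage.Client(project=project_id)


# Example (needs credentials and an existing bucket):
# client = make_storage_client("your-project-id")
# print(list_bucket_files(client, "your-bucket-name", prefix="data/"))
```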

8.2.1.2.2.4. GitHub integration

8.2.1.2.2.5. Maintain

8.2.1.2.2.5.1. Add a new conda environment

To use pip inside the conda environment:

conda install pip
pip install <PACKAGE>
pip install -r requirements.txt

8.2.1.2.2.5.2. Modify a conda kernel

Vertex AI Workbench instances come with pre-installed frameworks such as PyTorch and TensorFlow. If you need a different version, you can modify the libraries by using pip in the relevant conda environment.

For example, if you want to upgrade PyTorch:

# Check the name of the conda environment for PyTorch
conda env list

# Activate the environment for PyTorch
conda activate pytorch

# Display the PyTorch version
python -c "import torch; print(torch.__version__)"

# Make sure to use pip from the conda environment for PyTorch
# This should be `/opt/conda/envs/pytorch/bin/pip`
which pip

# Upgrade PyTorch
pip install --upgrade torch
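
After upgrading, it can help to confirm from inside a notebook that the kernel is using the expected environment and package version. This is a small sketch using only the standard library; the helper name `package_version` is hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def package_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path shows which conda environment backs this kernel.
print("Interpreter:", sys.executable)
print("torch:", package_version("torch"))
```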

8.2.1.2.2.5.3. Delete a conda kernel

Some conda packages add default kernels to your environment when the packages are installed. For example, when you install R, conda might also add a python3 kernel. This can cause a duplication of kernels in your environment. To avoid duplicated kernels, delete the default kernel before you create a new kernel with the same name.

rm -rf /opt/conda/envs/CONDA_ENVIRONMENT_NAME/share/jupyter/kernels/python3
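
Before deleting anything, it may be useful to see which kernel names appear more than once. The sketch below scans a list of kernel directories and reports names found in several of them. The function names are hypothetical, and the directories to scan (for example an environment's `share/jupyter/kernels` folder and `~/.local/share/jupyter/kernels`) are assumptions about where kernels are installed on your instance.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def find_kernel_dirs(kernel_roots: Iterable[Path]) -> Dict[str, List[Path]]:
    """Map each kernel name to the directories that define it."""
    found: Dict[str, List[Path]] = defaultdict(list)
    for root in kernel_roots:
        if not root.is_dir():
            continue  # skip locations that do not exist on this machine
        for kernel_dir in sorted(root.iterdir()):
            # A kernel is a folder containing a kernel.json spec file.
            if (kernel_dir / "kernel.json").is_file():
                found[kernel_dir.name].append(kernel_dir)
    return dict(found)


def duplicated_kernels(kernel_roots: Iterable[Path]) -> Dict[str, List[Path]]:
    """Return only the kernel names defined in more than one place."""
    return {
        name: dirs
        for name, dirs in find_kernel_dirs(kernel_roots).items()
        if len(dirs) > 1
    }


roots = [Path.home() / ".local/share/jupyter/kernels"]
print(duplicated_kernels(roots))
```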

8.2.1.2.2.6. Monitor

8.2.1.2.2.7. Control access

8.2.1.2.2.8. Troubleshooting

8.2.1.2.2.9. Usage Tips

8.2.1.2.2.9.1. Idle Shutdown 😴

Purpose: automatically turns off your notebook or virtual machine when you haven’t used it for a while

Save Money: When your notebook sits idle, it’s still using resources that you’re paying for. With auto-shutdown, you avoid those costs by having the system shut down on its own. This can really cut down on expenses (especially when using GPUs like A100).
Make the Most of Resources: Cloud providers have a limited number of resources to go around. If your notebook is just sitting there doing nothing, it’s using up space that could be used by others. Auto-shutdown helps free up those resources for everyone to use, making the cloud system work better for everyone.
Eco-Friendly: Fewer idle notebooks mean less energy is being used. This is good for the environment because it helps reduce the energy needed to run data centers, which in turn lowers the carbon footprint.

8.2.1.2.2.9.2. Add tags/labels 🏷️

Purpose: label instances and services in Google Cloud to organize resources better.

Control Access: Tags allow you to set specific access controls and permissions based on them.
Save Money: Tags help you track costs. You can set alerts for stuff with certain tags, so you know how much you’re spending.
Stay Organized: Tags group things based on where they belong, like “production,” “development,” or by team. It keeps everything in order.
Manage Operations: Tags make it easier for tools that work with Google Cloud to organize resources. This is especially useful for keeping track of what’s happening, reporting, and watching over resources.
Find Things Quickly: In the Google Cloud Console or using the gcloud tool, tags help you spot things fast.
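
Assuming this section refers to Google Cloud labels (key-value pairs attached to a resource), a short check can catch invalid labels before they are applied. The sketch encodes a simplified, ASCII-only version of the label rules (lowercase letters, digits, underscores, and dashes; at most 63 characters; keys start with a lowercase letter; at most 64 labels per resource). The function name `label_problems` is hypothetical.

```python
import re
from typing import Dict, List

# Simplified ASCII-only patterns; international characters are not covered.
_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
_VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")


def label_problems(labels: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means the labels look valid."""
    problems = []
    if len(labels) > 64:
        problems.append("more than 64 labels")
    for key, value in labels.items():
        if not _KEY_RE.fullmatch(key):
            problems.append(f"invalid key: {key!r}")
        if not _VALUE_RE.fullmatch(value):
            problems.append(f"invalid value for {key!r}: {value!r}")
    return problems


print(label_problems({"env": "development", "team": "ml-platform"}))
print(label_problems({"Env": "Prod"}))
```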

8.2.1.2.2.9.3. Update the Python version

Purpose: change to a different Python version

Steps in the terminal

Open a terminal.
Create a Python environment called py311 using the conda create command:

conda create -n py311 python=3.11 --yes

Once it is created, activate it:

conda activate py311

Install the IPython kernel package (ipykernel), which lets you run Python code interactively and display the output within a notebook:

conda install ipykernel

Register the environment as a Jupyter kernel:

ipython kernel install --name "py311" --user

Steps with a bash script

Create a bash file: create_conda_env.sh

#!/bin/bash

# Check if the correct number of arguments are provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 environment_name python_version"
    exit 1
fi

ENV_NAME=$1
PYTHON_VERSION=$2

# Make `conda activate` available in this non-interactive shell
source "$(conda info --base)/etc/profile.d/conda.sh"

# Create a new conda environment with the provided name and Python version
conda create -n "$ENV_NAME" python="$PYTHON_VERSION" --yes

# Activate the new environment
conda activate "$ENV_NAME"

# Install ipykernel in the activated environment
conda install ipykernel --yes

# Install the environment as an IPython kernel
ipython kernel install --name "$ENV_NAME" --user

Execute bash file

# give it execute permissions
chmod +x create_conda_env.sh

# run it in a terminal
./create_conda_env.sh py311 3.11

# If you work on a GPU instance, you can install the CUDA toolkit version that matches your setup
conda install cudatoolkit=CUDA_VERSION -y

8.2.1.2.2.10. Notebook example

8.2.1.3. Model workflow development

BigQuery Type	JSON Type	Example value
String	String	"abc"
Integer	Integer	1
Float	Float	1.2
Numeric	Float	4925.000000000
Boolean	Boolean	true
Timestamp	String	"2019-01-01 23:59:59.999999+00:00"
Date	String	"2018-12-31"
Time	String	"23:59:59.999999"
DateTime	String	"2019-01-01T00:00:00"
Record	Object	{"A": 1, "B": 2}
Repeated Type	Array[Type]	[1, 2]
Nested Record	Object	{"A": {"a": 0}, "B": 1}
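
One way to mirror this table in Python is a small converter that turns typical values (as returned by a database client) into JSON-compatible ones. This is a sketch under the assumption that Numeric values arrive as `Decimal`, Timestamp values as timezone-aware `datetime`, and DateTime values as naive `datetime`; the function name `to_json_value` is hypothetical.

```python
import json
from datetime import date, datetime, time, timezone
from decimal import Decimal
from typing import Any


def to_json_value(value: Any) -> Any:
    """Convert a Python value to its JSON form following the table above."""
    if isinstance(value, Decimal):
        return float(value)  # Numeric -> Float
    # datetime must be checked before date because it is a subclass of date.
    if isinstance(value, datetime):
        # Timestamp (timezone-aware) uses a space separator and an offset;
        # DateTime (naive) uses the ISO "T" separator.
        return str(value) if value.tzinfo else value.isoformat()
    if isinstance(value, (date, time)):
        return value.isoformat()  # Date / Time -> String
    if isinstance(value, dict):
        return {k: to_json_value(v) for k, v in value.items()}  # Record -> Object
    if isinstance(value, (list, tuple)):
        return [to_json_value(v) for v in value]  # Repeated -> Array
    return value  # str, int, float, bool and None are already JSON-compatible


row = {
    "ts": datetime(2019, 1, 1, 23, 59, 59, 999999, tzinfo=timezone.utc),
    "d": date(2018, 12, 31),
    "t": time(23, 59, 59, 999999),
    "dt": datetime(2019, 1, 1),
    "n": Decimal("4925.000000000"),
    "rec": {"A": {"a": 0}, "B": 1},
    "rep": [1, 2],
}
converted = to_json_value(row)
print(json.dumps(converted))
```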