7.2.1. Vertex AI (Google Cloud Platform)#
7.2.1.1. Overview#
7.2.1.1.1. Introduction#
7.2.1.1.1.1. ML workflow#
Data preparation
EDA
For small and medium datasets: Vertex AI Workbench
For large datasets: Dataproc Serverless Spark
Storage
Feature engineering: Vertex AI managed dataset
Labeling
Model training
Training
AutoML: train without writing code or preparing data splits (tabular, image, text, video)
Custom training: control over the training process (ML framework, own training code, hyperparameter tuning options)
Model Garden: choose a pretrained model (including open-source models) to test, customize, and deploy.
Generative AI: access Google's large generative AI models for multiple modalities (text, code, images, speech), then tune and deploy them
Tuning
Hyper-parameters:
For simple custom-trained models: Custom tuning jobs
For complex ML models: Vertex AI Vizier
Algorithms:
Multi-algorithm training: Vertex AI Experiments
TensorFlow: Vertex AI TensorBoard
Manage model versions
Register the version of trained models: Vertex AI Model Registry
Evaluation (list of Model evaluation)
Evaluation in Vertex AI Model Registry
Evaluation in Vertex AI Pipelines
Model serving and monitoring
Serving: deploy model to production and get predictions
For Custom training models
Real-time online predictions: prebuilt or custom containers
Manage features: Vertex AI Feature Store (for Tabular)
Explain model: Vertex Explainable AI
Monitoring: Vertex AI Model Monitoring (detects training-serving skew and prediction drift, and sends you alerts when the incoming prediction data skews too far from the training baseline)
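To make the skew and drift idea concrete, here is a minimal sketch that compares the category shares of a training baseline with recent serving data, using the largest per-category difference (L-infinity distance) as a score. The function names, the sample data, and the 0.2 threshold are assumptions made for illustration; this is not Vertex AI Model Monitoring's implementation, only one simple way to think about "too far from the baseline".

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the total, e.g. {"red": 0.5, "blue": 0.5}."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def linf_distance(baseline, serving):
    """Largest absolute difference in share across all categories."""
    base = category_shares(baseline)
    serve = category_shares(serving)
    categories = set(base) | set(serve)
    return max(abs(base.get(c, 0.0) - serve.get(c, 0.0)) for c in categories)


def should_alert(baseline, serving, threshold=0.2):
    """Hypothetical alert rule: flag when the distance exceeds the threshold."""
    return linf_distance(baseline, serving) > threshold


training = ["red", "red", "blue", "blue"]  # 50% red, 50% blue
serving = ["red", "red", "red", "blue"]    # 75% red, 25% blue
drift = linf_distance(training, serving)   # 0.25
print(drift, should_alert(training, serving))
```

Comparing the baseline with itself gives a distance of 0, so no alert would be raised in that case.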
7.2.1.1.1.2. Training and deployment option#
AutoML lets you train tabular, image, text, or video data without writing code or preparing data splits.
Custom training gives you complete control over the training process, including using your preferred ML framework, writing your own training code, and choosing hyperparameter tuning options.
Model Garden lets you discover, test, customize, and deploy Vertex AI and select open-source (OSS) models and assets.
Generative AI gives you access to Google’s large generative AI models for multiple modalities (text, code, images, speech). You can tune Google’s LLMs to meet your needs, and then deploy them for use in your AI-powered applications.
7.2.1.1.1.3. Ways to interact with Vertex AI#
Google Cloud console (graphical user interface)
Google Cloud command-line interface (CLI) (gcloud ai command)
Terraform support for Vertex AI (Terraform)
Vertex AI SDK for Python (Python)
Vertex AI API REST (API)
7.2.1.1.2. Setup environment#
Create project and enable billing
Enable Vertex AI API
Install the Google Cloud CLI.
To initialize the gcloud CLI, run the following command:
gcloud init
Update and install gcloud components:
gcloud components update
gcloud components install beta
Add role (read doc)
Install Vertex AI SDK for Python
7.2.1.1.3. Training methods#
| | AutoML | BigQuery ML | Custom training |
|---|---|---|---|
| Characteristic | Minimal technical effort | Train models using your BigQuery data directly in BigQuery using SQL commands | Create a training application optimized for your targeted outcome |
| DS expertise | No | No | Yes |
| Programming ability | No | SQL (build, evaluate model) | Yes |
| Time to trained model | Lower | Lower | Higher |
| Limits on machine learning objectives | Yes (only AutoML’s predefined objectives) | Yes | No |
| Manually optimize model performance with hyperparameter tuning | No (automated hyperparameter tuning) | Yes. BigQuery ML supports hyperparameter tuning when training ML models | Yes |
| Control aspects of the training environment | Limited | No | Yes |
| Limits on data size | Yes | Yes (based on quotas) | For unmanaged datasets: No |
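The comparison above can be read as a small decision rule. The sketch below encodes one possible reading of it; the function name and its three boolean parameters are hypothetical, and real projects will weigh more factors (data size limits, time to a trained model, team skills).

```python
def suggest_training_method(
    needs_custom_objective: bool,
    needs_environment_control: bool,
    data_in_bigquery_and_sql_ok: bool,
) -> str:
    """Suggest a training method from the characteristics in the table."""
    # Only custom training has no limits on ML objectives and gives
    # full control over the training environment.
    if needs_custom_objective or needs_environment_control:
        return "Custom training"
    # BigQuery ML trains directly on BigQuery data using SQL.
    if data_in_bigquery_and_sql_ok:
        return "BigQuery ML"
    # Otherwise the option with minimal technical effort.
    return "AutoML"


print(suggest_training_method(False, False, True))
```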
7.2.1.1.4. Notebook tutorials#
7.2.1.2. Vertex AI Notebook#
There are two ways to work with notebooks:
Colab Enterprise:
Sharing and collaboration: Easily share notebooks with other users, Google groups, or a Google Workspace domain.
Infrastructure management: No infrastructure to manage; Colab Enterprise automatically provisions runtimes and shuts them down when they are no longer needed.
Google Cloud integration: Integrates with services such as Vertex AI and BigQuery.
Vertex AI Workbench:
High customizability: Supports many types of Jupyter notebook instances and lets you add conda environments.
Data integration: Access data from Cloud Storage and BigQuery directly in JupyterLab.
Scheduling and cost management: Can run notebooks on a schedule and shut down automatically when idle.
| Feature | Colab Enterprise | Vertex AI Workbench |
|---|---|---|
| Environment | Managed, collaborative | Customizable, developer-focused |
| Infrastructure Management | Serverless, managed by Google | User-controlled, flexible |
| Collaboration | Excellent, with IAM control | Good, with GitHub integration |
| Compute Provisioning | Automatic | User-configurable |
| Data Integration | Seamless with Google Cloud services | Seamless with Google Cloud services |
| Code Completion | Inline | Inline |
| Customization | Limited | Extensive |
| GPU Support | ✓ | ✓ |
| Conda Environments | ✗ | ✓ |
| Custom Containers | ✗ | ✓ |
| Automated Notebook Runs | ✗ | ✓ |
| Idle Shutdown | Automatic | Configurable |
| Persistent Storage | ✗ | ✓ |
| Access to VM | ✗ | ✓ |
| Original Jupyter UI | Modified | Retained |
When to use which:
Colab Enterprise: When you need easy sharing and collaboration and do not want to manage infrastructure.
Vertex AI Workbench: When you need extensive customization and deep integration with Google Cloud data services.
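As a rough illustration of the comparison above, the sketch below picks a notebook product from a list of required features. The feature labels are hypothetical names for the rows marked ✗ for Colab Enterprise in the table; they are not API identifiers.

```python
# Features that the table marks as available only in Vertex AI Workbench.
# The label strings are made up for this example.
WORKBENCH_ONLY = frozenset({
    "conda_environments",
    "custom_containers",
    "automated_notebook_runs",
    "persistent_storage",
    "vm_access",
})


def suggest_notebook_product(required_features):
    """Return (product, reasons) for a collection of required feature labels."""
    reasons = sorted(set(required_features) & WORKBENCH_ONLY)
    if reasons:
        return "Vertex AI Workbench", reasons
    return "Colab Enterprise", []


print(suggest_notebook_product(["gpu_support", "persistent_storage"]))
```

Features that both products support, such as GPU support, do not influence the result.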
7.2.1.2.1. Colab Enterprise#
(doc)
Key Features:
🔗 Share and Collaborate: Easily share notebooks with individuals, Google groups, or entire Google Workspace domains. Access control is handled through Google Cloud’s IAM.
🌐 Managed Compute: Colab Enterprise takes care of provisioning and managing compute resources. It starts runtimes when needed and shuts them down when not in use.
✅ Google Cloud Integration: Seamlessly work with Google Cloud services like Vertex AI and BigQuery from within your notebook.
✨ Inline Code Completion: Write code faster with suggestions that pop up as you type.
Runtime: a compute resource that runs the code in a notebook
Runtime template: configures a runtime’s performance, cost, and other characteristics to match the workload.
See Machine type & disk type to select resources that suit your purpose
Cons:
Less suited to heavy workloads: not ideal for long-running tasks, or when you want data to persist on the machine’s disk after the runtime is turned off (or released, in this case)
No control over the environment
Pricing: based on how long resources are used, including:
Compute Engine: the virtual machine that runs the notebook
Storage: data + source code
Networking: communication between the notebook and other services
7.2.1.2.2. Vertex AI Workbench#
Key Features:
👨🏻💻 Access to the VM: Unlike Colab Enterprise, you get full access to the virtual machine itself, allowing for in-depth configuration tailored to your specific needs. You can integrate more easily with your GCP environment based on IAM.
📦 Persistent Storage: Data isn’t lost when the machine restarts, as the VM’s disk is retained, ensuring your data remains intact.
☑ Controlling Instance Types: Choose from several types of instances, including N2 CPU or any GPU offering that GCP has.
🤏 Preinstalled Packages and GPU Support: All instances come with JupyterLab and a suite of deep learning packages like TensorFlow and PyTorch, with GPU support available.
</> GitHub Integration: Sync your notebooks with GitHub for version control and collaboration.
💾 Custom Environments and Containers: Add conda environments or create custom containers to tailor your setup to specific needs, so you don’t need to install dependencies every time a team member wants to launch a new machine.
👾 Data Integration: Access Cloud Storage and BigQuery directly from JupyterLab by identifying either as the user working on the notebook or as a service account.
🛠️ Automated Notebook Runs and Idle Shutdowns: Schedule notebook runs and automatically shut down idle instances to manage costs effectively.
🖥️ Original Jupyter UI: Workbench retains more of the original Jupyter UI, providing a cleaner and more familiar interface for users accustomed to Jupyter notebooks.
Pricing: based on how long resources are used, including:
CPU + RAM + GPU (if used): charged only while the instance is running
Storage (boot disk + data disk): always charged, even when the instance is shut down, because the data is still stored on the disk
Workbench management fees: charged only while the instance is running
Tips
Scheduled tasks (such as a notebook run scheduled on the instance) still execute even when the instance is shut down, and you are charged for the resources used during those executions
Persistent storage cost is based on the amount of provisioned disk space, not the amount used, so choose a disk size that matches your needs.
Store data in Cloud Storage (buckets) where possible: you are charged for the amount of data you actually store in the bucket. This is called “used storage” and is a more flexible way to pay for storage.
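The billing rules above boil down to simple arithmetic: compute and management fees scale with running hours, while provisioned disk is billed regardless. The sketch below works through that with made-up rates; every number and parameter name is a placeholder, so check the current pricing page before relying on any figure.

```python
def estimate_monthly_cost(
    running_hours,
    disk_gb,
    compute_rate_per_hour,
    management_fee_per_hour,
    disk_rate_per_gb_month,
):
    """Split a monthly estimate into running-time and always-on charges."""
    # Compute (CPU + RAM + GPU) and management fees apply only while running.
    compute = running_hours * (compute_rate_per_hour + management_fee_per_hour)
    # Provisioned disk is charged even while the instance is shut down.
    storage = disk_gb * disk_rate_per_gb_month
    return {"compute": compute, "storage": storage, "total": compute + storage}


# Hypothetical rates, chosen only to keep the arithmetic easy to follow.
active = estimate_monthly_cost(100, 200, 0.5, 0.25, 0.125)
stopped = estimate_monthly_cost(0, 200, 0.5, 0.25, 0.125)
print(active)   # compute 75.0, storage 25.0, total 100.0
print(stopped)  # compute 0.0, storage 25.0, total 25.0
```

The second call shows why an oversized disk keeps costing money on an instance that never runs.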
7.2.1.2.2.1. Setup Instances#
7.2.1.2.2.1.1. Create an instance#
7.2.1.2.2.1.2. Instance shutdown#
Shutdown event:
Manually, by clicking shutdown
After the idle period elapses
There is no kernel activity for the specified time period
Running a cell or printing new output to a notebook counts as activity and resets the idle shutdown timer
Billing:
While your instance is shut down, there are no CPU or GPU usage charges except for scheduled executions that run during the shutdown
Disk storage charges still apply while your instance is shut down. For more information, see Pricing.
Automated shutdown: by default, the instance shuts down after being idle for a specific time period
Scheduled executions: scheduled executions still run while the instance is shut down
gcloud CLI config:
Create instance
gcloud workbench instances create INSTANCE_NAME --metadata=idle-timeout-seconds=86400
Update instance
gcloud workbench instances update INSTANCE_NAME --metadata=idle-timeout-seconds=86400
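The value 86400 in the commands above is 24 hours expressed in seconds. As a small sketch, the helper below converts hours to seconds and assembles the same command as an argument list; the function name is hypothetical, and it only uses the flags shown above (your setup may need additional flags that are not covered here).

```python
def idle_timeout_args(instance_name, idle_hours, action="update"):
    """Build the gcloud argument list for setting the idle shutdown timeout."""
    if action not in ("create", "update"):
        raise ValueError("action must be 'create' or 'update'")
    seconds = int(idle_hours * 3600)  # the metadata key expects seconds
    return [
        "gcloud", "workbench", "instances", action, instance_name,
        f"--metadata=idle-timeout-seconds={seconds}",
    ]


args = idle_timeout_args("INSTANCE_NAME", 24)
print(" ".join(args))
# To actually run it: subprocess.run(args, check=True)
```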
7.2.1.2.2.1.3. Change the machine type and configure GPUs#
7.2.1.2.2.1.4. Migrate your data to a new Vertex AI Workbench instance#
7.2.1.2.2.1.5. Remote SSH#
7.2.1.2.2.1.6. Limitation#
7.2.1.2.2.2. Schedule a notebook run#
Set scheduler
Next to your instance’s name, click Open JupyterLab.
In the folder File Browser, double-click the example notebook file to open it.
Click the Execute button.
In the Submit notebooks to Executor dialog, in the Type field, select Schedule-based recurring executions.
By default, the executor runs your notebook file every hour at the 00 minute of the hour.
In Advanced options, enter a name for your bucket in the Cloud Storage bucket field, and then click Create and select. The executor stores your notebook output in the Cloud Storage bucket.
Click Submit. Your notebook file runs automatically on the schedule that you set.
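To illustrate the default schedule (every hour at the 00 minute), the sketch below computes when the next run would start for a given time. The function name is hypothetical, and the assumption that a run already in progress at exactly :00 is not counted again is mine.

```python
from datetime import datetime, timedelta


def next_hourly_run(now: datetime) -> datetime:
    """Return the next top-of-the-hour time strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


print(next_hourly_run(datetime(2024, 1, 1, 10, 15)))   # 2024-01-01 11:00:00
print(next_hourly_run(datetime(2024, 1, 1, 23, 59)))   # 2024-01-02 00:00:00
```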
7.2.1.2.2.3. Connect to data#
7.2.1.2.2.3.1. BigQuery Table#
7.2.1.2.2.3.1.1. Browse BigQuery resources#
Click BigQuery in Notebooks. The BigQuery pane lists the available projects and datasets.
7.2.1.2.2.3.1.2. Query with BigQuery magic commands#
To use these magics, you must first register them. Run the %load_ext magic in a Jupyter notebook cell.
%load_ext google.cloud.bigquery
The %%bigquery magic runs a SQL query and returns the results as a pandas DataFrame
%%bigquery
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 10
Assign the query results to a variable
%%bigquery df
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 10
df
Explicitly specify a project
project_id = 'your-project-id'
%%bigquery --project $project_id
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 10
Run a parameterized query
params = {"limit": 10}
%%bigquery --params $params
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT @limit
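Outside a notebook cell, the same named parameters can be passed through the client library. The sketch below maps a plain dict such as the `params` above to typed parameter specs, then (in a second function that needs the `google-cloud-bigquery` package and credentials) turns them into `ScalarQueryParameter` objects. The helper names `to_param_specs` and `run_parameterized` are hypothetical, and the type mapping covers only a few scalar types.

```python
def to_param_specs(params):
    """Map {"limit": 10} to [("limit", "INT64", 10)] for scalar values."""
    specs = []
    for name, value in params.items():
        # bool must be checked before int because bool is a subclass of int
        if isinstance(value, bool):
            type_ = "BOOL"
        elif isinstance(value, int):
            type_ = "INT64"
        elif isinstance(value, float):
            type_ = "FLOAT64"
        elif isinstance(value, str):
            type_ = "STRING"
        else:
            raise TypeError(f"unsupported parameter type for {name!r}")
        specs.append((name, type_, value))
    return specs


def run_parameterized(client, sql, params):
    """Run `sql` with named parameters and return a pandas DataFrame."""
    # Imported lazily so the mapping helper above has no dependencies.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, type_, value)
            for name, type_, value in to_param_specs(params)
        ]
    )
    return client.query(sql, job_config=job_config).result().to_dataframe()


print(to_param_specs({"limit": 10}))  # [('limit', 'INT64', 10)]
```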
Get a summary of data
%bigquery_stats bigquery-public-data.google_trends.top_terms
After running for some time, an image appears with various statistics on each of the 7 variables in the top_terms table. The following image shows part of some example output:
7.2.1.2.2.3.1.3. Query with the BigQuery client library#
from typing import Iterator, Union

import pandas as pd
from google.cloud import bigquery


class BigqueryConnector:
    """Thin wrapper around bigquery.Client for common read and write tasks."""

    def __init__(self, project_id):
        self.project_id = project_id
        self.client = bigquery.Client(project=project_id)

    def read_query(
        self, query: str, chunk_size: int | None = None
    ) -> Union[Iterator[pd.DataFrame], pd.DataFrame]:
        """
        Executes a BigQuery query and returns an iterator of pandas DataFrames if chunk_size is provided,
        otherwise returns a single pandas DataFrame.
        """
        query_job = self.client.query(query)
        result = query_job.result(page_size=chunk_size)
        return (
            result.to_dataframe_iterable()
            if chunk_size
            else result.to_dataframe()
        )

    def read_table(self, table_id):
        """Read a whole table into a pandas DataFrame."""
        return self.read_query(f"SELECT * FROM `{table_id}`")

    def write_bq(self, dataframe, table_id, if_exists="append", schema=None):
        """Load a DataFrame into a table; schema is a list of (name, type, mode) tuples."""
        write_mode = (
            "WRITE_TRUNCATE" if if_exists == "replace" else "WRITE_APPEND"
        )
        schema = (
            [
                bigquery.SchemaField(
                    name,
                    type_.upper(),
                    "NULLABLE" if mode is None else mode.upper(),
                )
                for name, type_, mode in schema
            ]
            if schema is not None
            else []
        )
        job_config = bigquery.LoadJobConfig(
            schema=schema,
            write_disposition=write_mode,
        )
        job = self.client.load_table_from_dataframe(
            dataframe, table_id, job_config=job_config
        )
        job.result()  # wait for the load job to finish

    def update_bq(self, table_id, update_value, conditions=None):
        """Update rows matching all `conditions` with the values in `update_value`."""

        def _literal(value):
            # Wrap strings in single quotes; other values are used as-is.
            # Values are interpolated into the SQL text, so pass trusted input only.
            return f"'{value}'" if isinstance(value, str) else value

        # Build the clauses without modifying the caller's dictionaries
        set_statement = ", ".join(
            f"{k} = {_literal(v)}" for k, v in update_value.items()
        )
        where_clause = "".join(
            f" and {k} = {_literal(v)}" for k, v in (conditions or {}).items()
        )
        sql = f"""
            UPDATE `{table_id}`
            SET
                {set_statement}
            WHERE
                1 = 1 {where_clause}
        """
        job = self.client.query(sql)
        job.result()

    def create_table(
        self, table_id, fields, partition_by=None, cluster_by=None
    ):
        """Create a table from a list of {"name", "type", optional "mode"} dicts."""
        schema = [
            bigquery.SchemaField(
                i["name"],
                i["type"].upper(),
                mode=(i["mode"] if "mode" in i else "NULLABLE"),
            )
            for i in fields
        ]
        table = bigquery.Table(table_id, schema=schema)
        if partition_by:
            # Daily partitions on the given column
            partitioning = bigquery.TimePartitioning(
                type_=bigquery.TimePartitioningType.DAY,
                field=partition_by,
            )
            table.time_partitioning = partitioning
        if cluster_by:
            table.clustering_fields = cluster_by
        self.client.create_table(table, exists_ok=True)
        print(f"Created table '{table_id}' successfully.")
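The `update_bq` method above builds its UPDATE statement by placing values directly into the SQL text. One possible alternative, sketched below, keeps the values out of the SQL and returns them separately so they can be sent as named query parameters (for example as `bigquery.ScalarQueryParameter` objects). The function name `build_update` and the `set_`/`where_` parameter prefixes are hypothetical; table and column names are still interpolated, because identifiers cannot be passed as parameters.

```python
def build_update(table_id, update_value, conditions=None):
    """Return (sql, params) for an UPDATE with @named placeholders."""
    params = {}
    set_parts = []
    for column, value in update_value.items():
        key = f"set_{column}"
        params[key] = value
        set_parts.append(f"{column} = @{key}")

    where_parts = ["1 = 1"]  # same always-true base condition as update_bq
    for column, value in (conditions or {}).items():
        key = f"where_{column}"
        params[key] = value
        where_parts.append(f"{column} = @{key}")

    sql = (
        f"UPDATE `{table_id}` "
        f"SET {', '.join(set_parts)} "
        f"WHERE {' AND '.join(where_parts)}"
    )
    return sql, params


sql, params = build_update(
    "my-project.my_dataset.users",  # hypothetical table ID
    {"status": "active"},
    {"id": 7},
)
print(sql)
print(params)
```

Because the values travel separately from the statement, a string containing a quote character no longer changes the SQL that is executed.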
7.2.1.2.2.3.2. Cloud Storage buckets#
To mount and then access a Cloud Storage bucket, do the following:
In JupyterLab, make sure the folder File Browser tab is selected.
In the left sidebar, click the
Mount shared storage button. If you don’t see the button, drag the right side of the sidebar to expand the sidebar until you see the button.
In the Bucket name field, enter the Cloud Storage bucket name that you want to mount.
Click Mount.
Your Cloud Storage bucket appears as a folder in the File browser tab of the left sidebar. Double-click the folder to open it and browse the contents.
7.2.1.2.2.4. Github integration#
7.2.1.2.2.5. Maintain#
7.2.1.2.2.5.1. Add a new conda environment#
If you want to use pip:
conda install pip
pip install <PACKAGE>
pip install -r requirements.txt
7.2.1.2.2.5.2. Modify a conda kernel#
Vertex AI Workbench instances come with pre-installed frameworks such as PyTorch and TensorFlow. If you need a different version, you can modify the libraries by using pip in the relevant conda environment.
For example, if you want to upgrade PyTorch:
# Check the name of the conda environment for PyTorch
conda env list
# Activate the environment for PyTorch
conda activate pytorch
# Display the PyTorch version
python -c "import torch; print(torch.__version__)"
# Make sure to use pip from the conda environment for PyTorch
# This should be `/opt/conda/envs/pytorch/bin/pip`
which pip
# Upgrade PyTorch
pip install --upgrade torch
7.2.1.2.2.5.3. Delete a conda kernel#
Some conda packages add default kernels to your environment when the packages are installed. For example, when you install R, conda might also add a python3 kernel. This can cause a duplication of kernels in your environment. To avoid duplicated kernels, delete the default kernel before you create a new kernel with the same name.
rm -rf /opt/conda/envs/CONDA_ENVIRONMENT_NAME/share/jupyter/kernels/python3
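Before deleting anything, it can help to see which kernel names appear in more than one conda environment. The sketch below scans the `share/jupyter/kernels` folder of every environment under a root directory, following the path layout in the command above. The function names are hypothetical, and the demo runs against a temporary directory rather than a real `/opt/conda/envs`.

```python
import tempfile
from pathlib import Path


def find_kernels(envs_root):
    """Map each kernel name to the sorted list of environments that ship it."""
    root = Path(envs_root)
    kernels = {}
    for kernel_dir in root.glob("*/share/jupyter/kernels/*"):
        if kernel_dir.is_dir():
            env_name = kernel_dir.relative_to(root).parts[0]
            kernels.setdefault(kernel_dir.name, []).append(env_name)
    return {name: sorted(envs) for name, envs in kernels.items()}


def duplicated_kernels(envs_root):
    """Keep only kernel names found in more than one environment."""
    return {
        name: envs
        for name, envs in find_kernels(envs_root).items()
        if len(envs) > 1
    }


# Demo layout: an R environment that also received a python3 kernel.
with tempfile.TemporaryDirectory() as tmp:
    for env, kernel in [("r_env", "ir"), ("r_env", "python3"), ("py311", "python3")]:
        (Path(tmp) / env / "share" / "jupyter" / "kernels" / kernel).mkdir(parents=True)
    duplicates = duplicated_kernels(tmp)

print(duplicates)  # {'python3': ['py311', 'r_env']}
```

On an instance you would pass the real environments directory (for example `/opt/conda/envs`) instead of the temporary one.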
7.2.1.2.2.6. Monitor#
7.2.1.2.2.7. Control access#
7.2.1.2.2.8. Troubleshooting#
7.2.1.2.2.9. Usage Tips#
7.2.1.2.2.9.1. Idle Shutdown 😴#

Purpose: automatically turns off your notebook or virtual machine when you haven’t used it for a while
Save Money: When your notebook sits idle, it’s still using resources that you’re paying for. With auto-shutdown, you avoid those costs by having the system shut down on its own. This can really cut down on expenses (especially when using GPUs like A100).
Make the Most of Resources: Cloud providers have a limited number of resources to go around. If your notebook is just sitting there doing nothing, it’s using up space that could be used by others. Auto-shutdown helps free up those resources for everyone to use, making the cloud system work better for everyone.
Eco-Friendly: Fewer idle notebooks mean less energy is used. This is good for the environment because it reduces the energy needed to run data centers, which in turn lowers the carbon footprint.
7.2.1.2.2.9.3. Update the Python version#
Purpose: change to a different Python version
Steps in Terminal
Open Terminal
Create a Python environment called py311 using the conda create command.
conda create -n py311 python=3.11 --yes
Once created, activate it as follows:
conda activate py311
Install the IPython kernel (ipykernel), which lets users run Python code interactively and display the output within a notebook
conda install ipykernel
Register the environment as a Jupyter kernel
ipython kernel install --name "py311" --user
Steps as a bash script
Create a bash file: create_conda_env.sh
#!/bin/bash
# Check if the correct number of arguments are provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 environment_name python_version"
    exit 1
fi
ENV_NAME=$1
PYTHON_VERSION=$2
# Make `conda activate` available in this non-interactive shell
source "$(conda info --base)/etc/profile.d/conda.sh"
# Create a new conda environment with the provided name and Python version
conda create -n "$ENV_NAME" python="$PYTHON_VERSION" --yes
# Activate the new environment
conda activate "$ENV_NAME"
# Install ipykernel in the activated environment
conda install ipykernel --yes
# Register the environment as a Jupyter kernel
ipython kernel install --name "$ENV_NAME" --user
Execute the bash file
# give it execute permissions
chmod +x create_conda_env.sh
# run it in a terminal
./create_conda_env.sh py311 3.11
# If you work on a GPU instance, you can also install a specific CUDA toolkit version with conda
conda install cudatoolkit=CUDA_VERSION -y
7.2.1.2.2.10. Notebook example#
7.2.1.3. Model workflow development#
| BigQuery Type | JSON Type | Example value |
|---|---|---|
| String | String | "abc" |
| Integer | Integer | 1 |
| Float | Float | 1.2 |
| Numeric | Float | 4925.000000000 |
| Boolean | Boolean | true |
| Timestamp | String | "2019-01-01 23:59:59.999999+00:00" |
| Date | String | "2018-12-31" |
| Time | String | "23:59:59.999999" |
| DateTime | String | "2019-01-01T00:00:00" |
| Record | Object | {"A": 1, "B": 2} |
| Repeated Type | Array[Type] | [1, 2] |
| Nested Record | Object | {"A": {"a": 0}, "B": 1} |
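The mapping in the table can be sketched as a small converter that turns Python values into JSON-compatible ones in the formats shown. Which Python type stands for which BigQuery type (for example `Decimal` for Numeric, timezone-aware `datetime` for Timestamp, naive `datetime` for DateTime) is an assumption of this example, and the function name is hypothetical.

```python
import json
from datetime import date, datetime, time, timezone
from decimal import Decimal


def to_json_value(value):
    """Convert a Python value to the JSON representation from the table."""
    if isinstance(value, dict):  # Record / Nested Record -> Object
        return {key: to_json_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):  # Repeated Type -> Array
        return [to_json_value(item) for item in value]
    if isinstance(value, Decimal):  # Numeric -> Float
        return float(value)
    if isinstance(value, datetime):  # checked before date: datetime is a date
        if value.tzinfo is not None:
            return str(value)  # Timestamp: space separator plus UTC offset
        return value.isoformat()  # DateTime: "T" separator, no offset
    if isinstance(value, (date, time)):  # Date / Time -> String
        return value.isoformat()
    return value  # str, int, float, bool, None pass through unchanged


row = {
    "numeric": Decimal("4925.000000000"),
    "timestamp": datetime(2019, 1, 1, 23, 59, 59, 999999, tzinfo=timezone.utc),
    "date": date(2018, 12, 31),
    "time": time(23, 59, 59, 999999),
    "datetime": datetime(2019, 1, 1),
    "nested": {"A": {"a": 0}, "B": 1},
    "repeated": [1, 2],
}
converted = to_json_value(row)
print(json.dumps(converted, indent=2))
```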