
How to deploy a BentoML bundle to VertexAI

BentoML is a library that allows you to build an online serving API for your model. The purpose of this tutorial is to show you how to deploy a BentoML bundle to a VertexAI endpoint.

Along with this tutorial, an example is provided with some useful functions to perform this process in a VertexAI pipeline. Check it out on GitHub.


Should you use BentoML?

When not using the built-in algorithms, model deployment on VertexAI requires users to build their own container image and API server. BentoML is a library that allows you to turn your ML model into a production API endpoint with just a few lines of code. It handles the creation of the serving API and the Docker image. It can be an alternative to TensorFlow Serving or TorchServe.

Quote

Shipping ML models to production is broken. Data Scientists may not have all the expertise in building production services and the trained models they delivered are very hard to test and deploy. This often leads to a time consuming and error-prone workflow, where a pickled model or weights file is handed over to a software engineering team. BentoML is an end-to-end solution for model serving, making it possible for Data Science teams to ship their models as prediction services, in a way that is easy to test, easy to deploy, and easy to integrate with other DevOps tools.

Please read these resources to go further:

Key resources

This tutorial assumes you already know the basics of BentoML. Please read these resources first:

Steps

Here are the steps to deploy a BentoML bundle to VertexAI:

  1. Save the model to BentoML registry
  2. Create the API service
  3. Write the bentofile.yaml file
  4. Build the Docker image
  5. Testing the service locally
  6. Upload the model to Google Artifact Registry (GAR)
  7. Import image to VertexAI model registry
  8. Deploy model to VertexAI endpoint
  9. Test the endpoint

Steps 1 to 5 are pure BentoML development steps. Here is a high-level overview of what they do:

  1. Save the model to BentoML registry: In this step, you save the trained model from your ML framework (scikit-learn, PyTorch) to the BentoML model registry.
  2. Create the API service: Create a service.py file to wrap your model and lay out the serving logic.
  3. Write the bentofile.yaml file: Package your model and the BentoML Service into a Bento through a configuration YAML file. Each Bento corresponds to a directory that contains all the source code, dependencies, and model files required to serve the Bento, and an auto-generated Dockerfile for containerization.
  4. Build the Docker image: Build the Bento and containerize it into a Docker image.
  5. Testing the service locally: In this step, you test that the prediction service works locally.

If you want to understand what is done in these steps, read the BentoML quick start.

In steps 6 to 9, we will deploy the serving API to VertexAI:

  1. Upload the model to Google Artifact Registry (GAR): This will upload the Docker image of the serving API to Google Artifact Registry.
  2. Import image to VertexAI model registry: Import the Docker Image as a custom model in Vertex AI model registry.
  3. Deploy model to VertexAI endpoint: Deploy the model from the registry to an online-prediction endpoint on VertexAI.
  4. Test the endpoint: Send a request to the VertexAI endpoint to test it.

1. Save the model to BentoML registry

Here is an example using scikit-learn, but BentoML supports many frameworks.

bin/save_model.py
from sklearn import svm, datasets
import bentoml

MODEL_NAME = "iris_clf"

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC()
clf.fit(X, y)

# Save the model to BentoML format
saved_model = bentoml.sklearn.save_model(MODEL_NAME, clf)
print(saved_model.path)

Once the model is saved, you can access it through the CLI command: bentoml models get iris_clf:latest
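
You can also do the same check from Python (a small sketch using the BentoML model store API; the predict call is just a sanity check):

```python
import bentoml

# Equivalent of `bentoml models get iris_clf:latest`: fetch the entry from the local model store
bento_model = bentoml.models.get("iris_clf:latest")
print(bento_model.tag, bento_model.path)

# Reload the underlying scikit-learn estimator to sanity-check a prediction
clf = bentoml.sklearn.load_model("iris_clf:latest")
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))
```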

2. Create the API service

One specificity here is that VertexAI endpoints only accept JSON input and output, formatted in a specific way. Inputs must be formatted as a JSON object with an "instances" key containing a list of lists of values. Outputs must be formatted as a JSON object with a "predictions" key containing a list of values.

Example of input:

query.json
{
  "instances": [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2]
  ]
}

Example of response:

{
  "predictions": [
    "setosa",
    "setosa"
  ]
}

So we need to write a service that takes the input, extracts the list of lists of instances, calls the model, and formats the output into the VertexAI response format.

You can use pydantic to validate the input and output of your API. This will also enrich the API Swagger documentation that is automatically generated.

service.py
import bentoml
from bentoml.io import JSON
from pydantic import BaseModel
from typing import List

# for readability, these data models could also live in a separate file
class Query(BaseModel):
    instances: List[List[float]]

class Response(BaseModel):
    predictions: List[str]

# Load the BentoML bundle
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
input_schema = JSON(pydantic_model=Query)
output_schema = JSON(pydantic_model=Response)

svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=input_schema, output=output_schema)
def classify(query: Query) -> dict:
    # With `pydantic_model` set on the input spec, BentoML passes the validated Query object
    instances = query.instances
    predictions = iris_clf_runner.predict.run(instances)
    # Map the numeric class indices to the iris class names so the response matches the Response schema
    class_names = ["setosa", "versicolor", "virginica"]
    return {"predictions": [class_names[int(p)] for p in predictions]}

You can test the service locally using the following command: bentoml serve service.py:svc --reload

3. Write the bentofile.yaml file

bentofile.yaml
service: "service.py:svc"
labels:
  owner: bentoml-team
  project: gallery
include:
  - "*.py"
python:
  packages:
    - scikit-learn
    - pandas

4. Build the Docker image

Here are the steps to build the Docker image and run it locally:

VERSION="0.0.1"
GCP_PROJECT="my-gcp-project"
SERVICE="iris_classifier"
YAML_PATH="bentofile.yaml"
IMAGE_URI=eu.gcr.io/$GCP_PROJECT/$SERVICE:$VERSION

bentoml build -f $YAML_PATH ./src/ --version $VERSION
bentoml serve $SERVICE:latest --production
bentoml containerize $SERVICE:$VERSION -t $IMAGE_URI

However, the CLI is not convenient if you want to run this step from a Python script, and documentation on doing so is scarce. So I wrote a Python script for this step that can be used in a VertexAI component.

Under the hood, the bentoml containerize command uses Docker to build the image. If you want to run this step as a VertexAI component, you cannot rely on Docker because you are already inside a container. The workaround here is to use Cloud Build to build the image.

Check it out: ??? note "workflows/build_bento.py"

```python
from loguru import logger
import bentoml
from utils.bentoml import delete_bento_models_if_exists, save_model_to_bento, delete_bento_service_if_exists
from utils.gcp import build_docker_image_with_cloud_build, upload_file_to_gcs


PROJECT_ID = 'gcp_project_id'


def save_model_workflow(model_path: str, model_name: str) -> str:
    """This function is used to save the model to BentoML and push it to GCR

    It's an equivalent to these commands:
    ```bash
    bentoml build -f "$YAML_PATH" ./ --version $VERSION
    bentoml containerize $SERVICE:$VERSION -t "$IMAGE_URI"
    docker push "$IMAGE_URI"
    ```

    Args:
        model_path (str): the path to the model artifact (eg. pickle).
            Eg. "gs://bucket-name/model_dirpath/"
        model_name (str): The name of the model. Eg. "iris_classifier"

    Returns:
        str: the URI of the pushed image
            eg. "gcr.io/project-id/iris-classifier:latest"
    """
    bento_filepath = "path/to/bento/file/bento.yaml"
    service_name = f"{model_name}_svc"
    delete_bento_models_if_exists(model_name)
    save_model_to_bento(model_path, model_name)
    logger.info(f"Model saved: {bentoml.models.list()}")
    delete_bento_service_if_exists(service_name)

    logger.info(f"Building Bento service {service_name} from {bento_filepath}")
    bento_build = bentoml.bentos.build_bentofile(
        bento_filepath,
        build_ctx=".",
        version="latest",
    )
    logger.info(f"Bento Service saved: {bentoml.bentos.list()}")
    service_name_tagged = f"{bento_build.tag.name}:{bento_build.tag.version}"
    export_filename = f"{service_name_tagged.replace(':', '_')}.zip"
    # `local_export_path` is the local file path where the Bento service is exported as a
    # zip file. It is the output path where the exported Bento service is saved on the
    # local machine before being uploaded to Google Cloud Storage (GCS).
    local_export_path = bentoml.bentos.export_bento(
        tag=service_name_tagged, path=f"outputs/{export_filename}", output_format="zip"
    )
    logger.info(f"Bento exported to {local_export_path}")
    export_gcs_uri = f"{model_path}/{export_filename}"
    logger.info(f"Uploading Bento to GCS to {export_gcs_uri}")
    upload_file_to_gcs(target_path=export_gcs_uri, local_path=local_export_path)
    docker_image_uri = f"europe-docker.pkg.dev/{PROJECT_ID}/eu.gcr.io/{service_name_tagged}"
    # Build Dockerfile of the Bento with cloud build, as an alternative to bentoml.container.build()
    # which is not working with Vertex AI.
    # the image is also pushed to the container registry (GAR)
    build_docker_image_with_cloud_build(
        export_gcs_uri,
        docker_image_uri,
        project_id=PROJECT_ID,
        dockerfile_path="env/docker/Dockerfile",  # Path to the Dockerfile in the Bento archive
    )
    logger.success(f"Pushed docker image {docker_image_uri}")
    return docker_image_uri


if __name__ == "__main__":
    MODEL_PATH = 'path_to_model_artifact.pkl'
    MODEL_NAME = 'iris_classifier'

    docker_image_uri = save_model_workflow(
        model_path=MODEL_PATH,
        model_name=MODEL_NAME,
    )
```

5. Testing the service locally

In a terminal, launch the prediction service:

bentoml serve $SERVICE:$VERSION

Then, do a query:

import requests
import json

query = json.load(open("query.json"))

response = requests.post(
    "http://0.0.0.0:3000/classify",
    json=query,
)
print(response.text)

Also, try to run the container:

docker run -it --rm -p 3000:3000 eu.gcr.io/$GCP_PROJECT/$SERVICE:$VERSION serve --production

Then do the query again.
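
While the container is running, you can also check the health route that VertexAI will probe later (a quick sketch; /healthz is BentoML's default liveness route, and it is reused in step 7):

```python
import requests

# /healthz is declared as the container health route when importing the model in VertexAI (step 7)
response = requests.get("http://0.0.0.0:3000/healthz")
print(response.status_code)  # expect 200 when the server is healthy
```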

If it doesn't work at this step, it won't work on VertexAI either. So make sure you fix the errors before going further.

6. Upload the model to Google Artifact Registry (GAR)

docker push $IMAGE_URI

Or in Python using Cloud Build:

??? note "utils/gcp.py"

```python
from google.cloud import storage
from google.cloud.devtools import cloudbuild_v1
from loguru import logger


def build_docker_image_with_cloud_build(
    source_code_uri: str,
    docker_image_uri: str,
    project_id: str,
    dockerfile_path: str = "./Dockerfile",
):
    """This function build and push a docker image to GCR / Artifact registry using Cloud Build.
    It's the equivalent of the CLI command: gcloud builds submit --tag $DOCKER_IMAGE_URI --file $DOCKERFILE_PATH $SOURCE_CODE_URI

    Args:
        source_code_uri (str): The archive containing the source code to build the docker image.
            eg. gs://bucket-name/path/to/archive.tar.gz
        docker_image_uri (str): The URI of the docker image to build. (--tag in docker build)
            eg. europe-docker.pkg.dev/project-id/eu.gcr.io/image-name:tag
        project_id (str): The project id where the Cloud Build job will run.
        dockerfile_path (str): The path to the dockerfile to use to build the docker image. (--file in docker build)
            eg. "env/docker/Dockerfile"
    """
    logger.info(f"Building docker image {docker_image_uri} using cloud build")
    # parsing the source code uri to get the bucket name and blob name
    bucket_name, blob_name = (
        storage.Blob.from_string(source_code_uri).bucket.name,
        storage.Blob.from_string(source_code_uri).name,
    )
    client = cloudbuild_v1.CloudBuildClient()
    storage_source = cloudbuild_v1.StorageSource(bucket=str(bucket_name), object=str(blob_name))
    source = cloudbuild_v1.Source(storage_source=storage_source)
    build = cloudbuild_v1.Build(
        source=source,
        steps=[
            {
                "name": "gcr.io/cloud-builders/docker",
                "args": [
                    "build",
                    "-t",
                    docker_image_uri,
                    "-f",
                    dockerfile_path,  # position of the Dockerfile in the Bento directory
                    ".",
                ],
            }
        ],
        images=[docker_image_uri],
    )
    request = cloudbuild_v1.CreateBuildRequest(project_id=project_id, build=build)
    operation = client.create_build(request=request)
    response = operation.result()
    logger.info(f"Build response: {response}")


def upload_file_to_gcs(target_path: str, local_path: str) -> None:
    """Upload a local file to GCS. `target_path` is the destination URI, eg. "gs://bucket-name/path/to/file"."""
    storage_client = storage.Client()

    bucket_name, blob_name = (
        storage.Blob.from_string(target_path).bucket.name,
        storage.Blob.from_string(target_path).name,
    )

    bucket = storage_client.bucket(bucket_name)
    logger.debug(f"Saving File to {target_path}")
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_path)
```

7. Import image to VertexAI model registry

Now that your image is in Google Artifact Registry, you can import it to VertexAI model registry.

Pay attention to:

  • the predict route: it must match the route exposed by your service (/classify in this example).
  • the port: BentoML uses 3000 by default, so stick to that.

Here is a bash script to do this step:

PREDICT_ROUTE="/classify"
HEALTH_ROUTE="/healthz"
PORTS=3000

echo "Import as VertexAI model"
MODEL_ID="$(gcloud ai models list --region $LOCATION --filter="DISPLAY_NAME: ${MODEL_NAME}" --format="value(MODEL_ID)")"
if [ -z "${MODEL_ID:=}" ]; then
  echo "No existing model found for ${MODEL_NAME}. Importing model."
  gcloud ai models upload \
    --region=$LOCATION \
    --display-name=$MODEL_NAME \
    --container-image-uri=$IMAGE_URI \
    --container-ports=$PORTS \
    --container-health-route=$HEALTH_ROUTE \
    --container-predict-route=$PREDICT_ROUTE \
    --project=$GCP_PROJECT
else
  echo "Existing model found for ${MODEL_NAME} (${MODEL_ID}). Importing new version"
  gcloud ai models upload \
    --region=$LOCATION \
    --display-name=$MODEL_NAME \
    --container-image-uri=$IMAGE_URI \
    --project=$GCP_PROJECT \
    --container-ports=$PORTS \
    --container-health-route=$HEALTH_ROUTE \
    --container-predict-route=$PREDICT_ROUTE \
    --parent-model=projects/${GCP_PROJECT}/locations/${LOCATION}/models/${MODEL_ID}
fi

And the Python equivalent:

??? note "workflows/push_model.py"

```python
import os
from utils.vertexai import get_model_if_exists, upload_model_to_registry

IMAGE_TAG = os.getenv("IMAGE_TAG", "latest")
PREDICT_ROUTE = "/classify"
HEALTH_ROUTE = "/healthz"
PORTS = 3000


def push_model_workflow(
    model_name: str,
    serving_container_image_uri: str,
) -> str:
    """This workflow pushes model from a Google Artifact Registry to Vertex AI Model registry.
    If the model already exists, it will be updated.

    Args:
        model_name (str): Name of the display model in Google Registry.
        serving_container_image_uri (str): URI of the image in Google Artifact Registry.
            Eg. "eu.gcr.io/gcp-project-id/iris_classifier_svc:latest"
    """
    existing_model = get_model_if_exists(model_name)
    # `parent_model` must be None when no previous version exists in the registry
    parent_model = existing_model.resource_name if existing_model is not None else None
    model = upload_model_to_registry(
        model_name,
        serving_container_image_uri,
        parent_model=parent_model,
        serving_container_predict_route=PREDICT_ROUTE,
        serving_container_health_route=HEALTH_ROUTE,
        serving_container_ports=[PORTS],
        description="Iris classification model, deployed automatically with Vertex AI",
        labels={"image_tag": os.getenv("IMAGE_TAG", "latest")},
        # it is safer to evaluate the new version before setting it as the default
        is_default_version=False,
    )
    return model.display_name


if __name__ == "__main__":
    MODEL_NAME = "iris_classifier"
    PROJECT_ID = "gcp_project_id"
    SERVICE_NAME = f"{MODEL_NAME}_svc"
    SERVING_CONTAINER_IMAGE_URI = (
        f"europe-docker.pkg.dev/{PROJECT_ID}/eu.gcr.io/{SERVICE_NAME}:{IMAGE_TAG}"
    )
    push_model_workflow(MODEL_NAME, SERVING_CONTAINER_IMAGE_URI)
```
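
The helpers get_model_if_exists and upload_model_to_registry are not shown above; they are thin wrappers around the Vertex AI SDK. Here is a minimal sketch of what they could look like (assuming the google-cloud-aiplatform client; see the repository for the actual implementation):

```python
# utils/vertexai.py (sketch)
from typing import Optional

from google.cloud import aiplatform


def get_model_if_exists(model_name: str) -> Optional[aiplatform.Model]:
    """Return the first model in the Vertex AI registry with this display name, or None."""
    models = aiplatform.Model.list(filter=f'display_name="{model_name}"')
    return models[0] if models else None


def upload_model_to_registry(
    model_name: str, serving_container_image_uri: str, **kwargs
) -> aiplatform.Model:
    """Upload (or add a new version of) a custom-container model to the Vertex AI registry."""
    return aiplatform.Model.upload(
        display_name=model_name,
        serving_container_image_uri=serving_container_image_uri,
        # routes, ports, parent_model, labels, description, is_default_version, ...
        **kwargs,
    )
```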

8. Deploy model to VertexAI endpoint

The final step is to deploy the model to a VertexAI "online prediction" endpoint.

To do it manually, follow the steps below. To do it programmatically, check out the Python script in the repository; a short sketch is also given after the manual steps.

Manual steps:

  1. Go to VertexAI model registry
  2. Click on the model you want to deploy
  3. Click on the burger menu at the right on the latest version, then click on "Set as Default"
  4. Go to the VertexAI endpoint page and select the endpoint you want to deploy the model to
  5. Select the model you want to undeploy, and click on the burger menu at the right, then click on "Undeploy model from endpoint"
  6. Click on the burger menu at the right on the latest version, then click on "Deploy to Endpoint"
  7. "Add to existing endpoint" and select the endpoint
  8. Set traffic split to 100%, and define machine type
  9. Deploy
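
Programmatically, the same deployment can be sketched with the Vertex AI Python SDK (a sketch only: the endpoint is assumed to already exist, and the project, location and machine type are placeholder values):

```python
from google.cloud import aiplatform

# Placeholder project / location: use the same $GCP_PROJECT and $LOCATION as in the scripts above
aiplatform.init(project="gcp_project_id", location="europe-west1")

# Look up the model and the target endpoint by display name
model = aiplatform.Model.list(filter='display_name="iris_classifier"')[0]
endpoint = aiplatform.Endpoint.list(filter='display_name="iris_classifier_endpoint"')[0]

# Deploy the new model version and route 100% of the traffic to it
model.deploy(
    endpoint=endpoint,
    traffic_percentage=100,
    machine_type="n1-standard-2",  # example machine type
)
```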

It takes approximately 15 minutes, and you get an email when the deployment finishes. If the deployment failed, the email (subject: "Vertex AI was unable to deploy model") contains a link to the Stackdriver logs. To find the error in the logs, filter on severity="error".

You will have to fix the error, then go through the whole process again.

To know more, check out the official documentation.

9. Test the endpoint

# Get the endpoint ID from the endpoint name
ENDPOINT_NAME="iris_classifier_endpoint"
ENDPOINT_ID=$(gcloud ai endpoints list --filter DISPLAY_NAME=$ENDPOINT_NAME --region=$LOCATION --format="value(ENDPOINT_ID)")

# OR uncomment the following line to set the endpoint ID manually:
# ENDPOINT_ID="123456789"

INPUT_DATA_FILE="query.json"

echo "Sendind a request to the endpoint ${ENDPOINT_NAME} with ID: ${ENDPOINT_ID}"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:predict \
-d "@${INPUT_DATA_FILE}"