Host a Model and Create a SageMaker Model Monitor

This sample demonstrates a Kubeflow pipeline that

  • Hosts a machine learning model in Amazon SageMaker
  • Monitors a live endpoint for violations against constraints

Prerequisites

Follow the steps in the Sample AWS SageMaker Kubeflow Pipelines guide.

Install required packages

Run the following command to install the script dependencies:

pip install -r requirements.txt

Create an IAM Role

Follow the SageMaker execution role documentation to create an IAM role for SageMaker execution.
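
If you prefer to create the role from the command line, the following is a minimal sketch. The role name kfp-sagemaker-execution-role and the broad AmazonSageMakerFullAccess/AmazonS3FullAccess managed policies are assumptions for the sample; tighten the permissions to match your security requirements.

# Trust policy that lets the SageMaker service assume the role
cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role --role-name kfp-sagemaker-execution-role \
    --assume-role-policy-document file://trust-policy.json

# Broad managed policies for the sample; scope these down in production
aws iam attach-role-policy --role-name kfp-sagemaker-execution-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name kfp-sagemaker-execution-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# Export the role ARN for later use by the pipeline
export SAGEMAKER_EXECUTION_ROLE_ARN=$(aws iam get-role \
    --role-name kfp-sagemaker-execution-role --query 'Role.Arn' --output text)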

Create an S3 Bucket

To set up an endpoint and create a monitoring schedule, you need an S3 bucket to store the model and baseline data. Run the following commands to create an S3 bucket. Set SAGEMAKER_REGION to the region in which you want to create your SageMaker resources. For ease of use with the sample (it relies on the pipeline's default values), we suggest using us-east-1 as the region.

export SAGEMAKER_REGION=us-east-1
export S3_BUCKET_NAME="kfp-sm-data-bucket-${SAGEMAKER_REGION}-$RANDOM"

if [[ $SAGEMAKER_REGION == "us-east-1" ]]; then
    aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION}
else
    aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION} \
    --create-bucket-configuration LocationConstraint=${SAGEMAKER_REGION}
fi

echo ${S3_BUCKET_NAME}

Copy data to bucket

Fill the S3 bucket you just created with the sample data, which contains:

  • A pre-trained model
  • Baselining constraints and statistics generated by a ProcessingJob
  1. Clone this repository to use the pipelines and sample data.
    git clone https://github.com/kubeflow/pipelines.git
    cd pipelines/samples/contrib/aws-samples/hosting_model_monitor_pipeline
    
  2. Download the sample model from the SageMaker sample bucket:
    aws s3 cp s3://sagemaker-sample-files/models/xgb-churn/xgb-churn-prediction-model.tar.gz model-monitor
    
  3. Run the following command to upload the sample data to your S3 bucket:
    aws s3 cp model-monitor s3://$S3_BUCKET_NAME/model-monitor --recursive
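
To confirm that the upload succeeded, you can list the objects under the model-monitor prefix:

aws s3 ls s3://$S3_BUCKET_NAME/model-monitor/ --recursive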
    

After going through the above steps, make sure you have the following environment variables set:

  • S3_BUCKET_NAME: The name of the S3 bucket you created.
  • SAGEMAKER_EXECUTION_ROLE_ARN: The ARN of the IAM role you created.
  • SAGEMAKER_REGION: The region where you want to run the pipeline.
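
For example (the values below are placeholders; substitute your own bucket name, role ARN, and region):

export S3_BUCKET_NAME=<YOUR_BUCKET_NAME>
export SAGEMAKER_EXECUTION_ROLE_ARN=<YOUR_SAGEMAKER_EXECUTION_ROLE_ARN>
export SAGEMAKER_REGION=us-east-1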

Compile and run the pipelines

  1. To compile the pipeline, run python hosting_model_monitor_pipeline.py. This creates a .tar.gz file. After the compilation completes, you will see a message like this:
    =================Pipeline compiled=================
    Name prefix:  2023-05-11-10-45-32
    To delete the resources created by this pipeline, run the following commands:
        export NAME_PREFIX=2023-05-11-10-45-32
        export NAMESPACE=<xx> # Change it to your Kubeflow name space
        kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
        kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
        kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
        kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
        kubectl delete Model  $NAME_PREFIX-model -n $NAMESPACE
    
    You can use the commands in the message to delete the resources created by this pipeline after you have finished running it.
  2. In the Kubeflow Pipelines UI, upload the compiled pipeline specification (the .tar.gz file) and click Create run. If the UI is not already exposed in your cluster, see the port-forward sketch after this list.
  3. Once the pipeline completes, you can see the outputs under 'Output parameters' in the component's Input/Output section.
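
If your cluster does not expose the Kubeflow Pipelines UI through an ingress, one common way to reach it locally is to port-forward the UI service. The service name ml-pipeline-ui and the kubeflow namespace are the defaults for a standalone installation; adjust them if your deployment differs.

kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
# Then open http://localhost:8080 in your browser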

Delete resources created by pipeline

Export the following environment variables:

export NAMESPACE=<YOUR_KUBEFLOW_NAMESPACE>
export NAME_PREFIX=<NAME_PREFIX>

If you are using a standalone Kubeflow Pipelines installation, the namespace is kubeflow. If you are using a full Kubeflow installation, you can find your namespace in the top bar of the Kubeflow central dashboard.

You can find the NAME_PREFIX in the component's sagemaker_resource_name output parameter, or in the command-line output printed when you compiled the pipeline.
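
If you are unsure of the prefix, you can also list the SageMaker custom resources in your namespace and read the prefix from their names. This assumes the SageMaker custom resource definitions used by the components (the same kinds referenced in the delete commands below) are installed in your cluster:

kubectl get MonitoringSchedule,DataQualityJobDefinition -n $NAMESPACE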

To delete the custom resources created by this pipeline, run the following commands:

kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
kubectl delete Model  $NAME_PREFIX-model -n $NAMESPACE

To delete the S3 bucket, run the following command:

aws s3api delete-bucket --bucket $S3_BUCKET_NAME --region $SAGEMAKER_REGION
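
Note that delete-bucket fails if the bucket still contains objects. If the sample data is still present, remove it first and then re-run the delete command:

# Permanently removes all objects in the bucket
aws s3 rm s3://$S3_BUCKET_NAME --recursive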

Reference

Sample Notebook - Introduction to Amazon SageMaker Model Monitor