# Host a Model and Create a SageMaker Model Monitor

This sample demonstrates a Kubeflow pipeline that:

- Hosts a machine learning model in Amazon SageMaker
- Monitors a live endpoint for violations against constraints

## Prerequisites

Follow the steps in [Sample AWS SageMaker Kubeflow Pipelines](../README.md#inputs-to-the-pipeline).

### Install required packages

Run the following command to install the script dependencies:

```
pip install -r requirements.txt
```

### Create an IAM Role

Follow [SageMaker execution role](../README.md#inputs-to-the-pipeline) and create an IAM role for SageMaker execution.

### Create an S3 Bucket

To set up an endpoint and create a monitoring schedule, we need an S3 bucket to store the model and baseline data. Run the following commands to create an S3 bucket. Set `SAGEMAKER_REGION` to the region where you want to create your SageMaker resources. For ease of use in the samples (using the default values of the pipeline), we suggest using `us-east-1` as the region.

```
export SAGEMAKER_REGION=us-east-1
export S3_BUCKET_NAME="kfp-sm-data-bucket-${SAGEMAKER_REGION}-$RANDOM"

if [[ $SAGEMAKER_REGION == "us-east-1" ]]; then
  aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION}
else
  aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION} \
    --create-bucket-configuration LocationConstraint=${SAGEMAKER_REGION}
fi

echo ${S3_BUCKET_NAME}
```

### Copy data to bucket

Fill the S3 bucket you just created with the [sample data](./model-monitor/), which contains:

- A pre-trained model
- Baselining constraints and statistics generated by a ProcessingJob

1. Clone this repository to use the pipelines and sample data:

   ```
   git clone https://github.com/kubeflow/pipelines.git
   cd samples/contrib/aws-samples/hosting_model_monitor_pipeline
   ```

1. Download the sample model from the SageMaker sample bucket:

   ```
   aws s3 cp s3://sagemaker-sample-files/models/xgb-churn/xgb-churn-prediction-model.tar.gz model-monitor
   ```

1. Upload the sample data to your S3 bucket:

   ```
   aws s3 cp model-monitor s3://$S3_BUCKET_NAME/model-monitor --recursive
   ```

After going through the above steps, make sure you have the following environment variables set:

- `S3_BUCKET_NAME`: The name of the S3 bucket you created.
- `SAGEMAKER_EXECUTION_ROLE_ARN`: The ARN of the IAM role you created.
- `SAGEMAKER_REGION`: The region where you want to run the pipeline.

## Compile and run the pipelines

1. To compile the pipeline, run `python hosting_model_monitor_pipeline.py`. This will create a `tar.gz` file. After the compilation completes, you will see a message like this:

   ```
   =================Pipeline compiled=================

   Name prefix: 2023-05-11-10-45-32
   To delete the resources created by this pipeline, run the following commands:

   export NAME_PREFIX=2023-05-11-10-45-32
   export NAMESPACE= # Change it to your Kubeflow namespace
   kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
   kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
   kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
   kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
   kubectl delete Model $NAME_PREFIX-model -n $NAMESPACE
   ```

   You can use the commands in the message to delete the resources created by this pipeline after you have finished running it.

1. In the Kubeflow Pipelines UI, upload the compiled pipeline specification (the *.tar.gz* file) and click on create run.
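   Alternatively, you can upload and start the run programmatically. The sketch below is a minimal example, assuming the KFP SDK is installed, the Kubeflow Pipelines API is reachable on `localhost:8080` (for example via `kubectl port-forward`), and the compiled file is named `hosting_model_monitor_pipeline.py.tar.gz`; adjust the host, file name, and arguments to match your setup.

   ```python
   import kfp

   # Assumption: the KFP API is reachable at this address, e.g. after
   # `kubectl port-forward svc/ml-pipeline-ui 8080:80 -n kubeflow`.
   client = kfp.Client(host="http://localhost:8080")

   # Upload the compiled specification and start a run. An empty
   # `arguments` dict uses the pipeline's default parameter values.
   run = client.create_run_from_pipeline_package(
       "hosting_model_monitor_pipeline.py.tar.gz",  # assumed output file name
       arguments={},
       run_name="hosting-model-monitor-sample",
   )
   print(run.run_id)
   ```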
1. Once the pipeline completes, you can see the outputs under **Output parameters** in the component's **Input/Output** section.

## Delete resources created by pipeline

Export the following environment variables:

```
export NAMESPACE=
export NAME_PREFIX=
```

If you are using the standalone installation, the namespace is `kubeflow`. If you are using the full Kubeflow installation, you can find your namespace in the top bar of the Kubeflow central dashboard. You can find the `NAME_PREFIX` in the component's output parameter `sagemaker_resource_name`, or in the command-line output printed when you compiled the pipeline.

To delete the custom resources created by this pipeline, run the following commands:

```
kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
kubectl delete Model $NAME_PREFIX-model -n $NAMESPACE
```

To delete the S3 bucket and all objects in it, run the following command:

```
aws s3 rb s3://${S3_BUCKET_NAME} --force --region ${SAGEMAKER_REGION}
```

## Reference

[Sample Notebook - Introduction to Amazon SageMaker Model Monitor](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.html)
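Like the referenced notebook, you can send traffic to the hosted endpoint so the monitoring schedule has requests to analyze. Below is a minimal sketch, assuming the default pipeline values (`us-east-1`, an endpoint named `<NAME_PREFIX>-endpoint`) and that, like the sample churn model, your model accepts CSV input; the payload is a placeholder, so substitute a real row from the dataset your model was trained on.

```python
import boto3

# Assumptions: region and endpoint name match the defaults this sample
# creates; replace the prefix with the one printed at compile time.
region = "us-east-1"
endpoint_name = "2023-05-11-10-45-32-endpoint"  # <NAME_PREFIX>-endpoint

runtime = boto3.client("sagemaker-runtime", region_name=region)

# Placeholder CSV payload: the xgb-churn model expects one CSV row of
# numeric features. Substitute a real row from the churn test data.
payload = "0,1,2,3"

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))
```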