# Host a Model and Create a SageMaker Model Monitor
This sample demonstrates a Kubeflow pipeline that:

- Hosts a machine learning model in Amazon SageMaker
- Monitors a live endpoint for violations against constraints
## Prerequisites

Follow the steps in Sample AWS SageMaker Kubeflow Pipelines.
### Install required packages

Run the following command to install the script dependencies:

```bash
pip install -r requirements.txt
```
### Create an IAM Role

Follow the SageMaker execution role documentation and create an IAM role for SageMaker execution.
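Once the role exists, export its ARN; the steps below assume it is available in `SAGEMAKER_EXECUTION_ROLE_ARN`. The value shown here is a placeholder, so substitute the ARN returned when you created the role:

```bash
# Placeholder value -- replace with the ARN of the execution role you just created
export SAGEMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<YOUR_SAGEMAKER_EXECUTION_ROLE>
```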
### Create an S3 Bucket

To set up an endpoint and create a monitoring schedule, you need an S3 bucket to store the model and baseline data. Run the following commands to create one. Set `SAGEMAKER_REGION` to the region where you want to create your SageMaker resources. For ease of use with the samples (they rely on the pipeline's default values), we suggest using `us-east-1`.
```bash
export SAGEMAKER_REGION=us-east-1
export S3_BUCKET_NAME="kfp-sm-data-bucket-${SAGEMAKER_REGION}-$RANDOM"

if [[ $SAGEMAKER_REGION == "us-east-1" ]]; then
  aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION}
else
  aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION} \
    --create-bucket-configuration LocationConstraint=${SAGEMAKER_REGION}
fi

echo ${S3_BUCKET_NAME}
```
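If you want to confirm the bucket was created, `head-bucket` exits successfully with no output when the bucket exists and is accessible:

```bash
# Exits with code 0 and prints nothing if the bucket exists and you can access it
aws s3api head-bucket --bucket ${S3_BUCKET_NAME}
```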
### Copy data to bucket

Fill the S3 bucket you just created with sample data, which contains:

- A pre-trained model
- Baselining constraints and statistics generated by a ProcessingJob
- Clone this repository to use the pipelines and sample data:

  ```bash
  git clone https://github.com/kubeflow/pipelines.git
  cd pipelines/samples/contrib/aws-samples/hosting_model_monitor_pipeline
  ```

- Download the sample model from the SageMaker sample bucket:

  ```bash
  aws s3 cp s3://sagemaker-sample-files/models/xgb-churn/xgb-churn-prediction-model.tar.gz model-monitor
  ```

- Run the following command to upload the sample data to your S3 bucket (a quick verification follows this list):

  ```bash
  aws s3 cp model-monitor s3://$S3_BUCKET_NAME/model-monitor --recursive
  ```
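As a quick check, list the uploaded objects; you should see the model archive alongside the baselining constraints and statistics:

```bash
# List everything that was uploaded under the model-monitor prefix
aws s3 ls s3://$S3_BUCKET_NAME/model-monitor/ --recursive
```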
After going through the above steps, make sure you have the following environment variables set:

- `S3_BUCKET_NAME`: The name of the S3 bucket you created.
- `SAGEMAKER_EXECUTION_ROLE_ARN`: The ARN of the IAM role you created.
- `SAGEMAKER_REGION`: The region where you want to run the pipeline.
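A quick way to verify is to print them; all three should show non-empty values:

```bash
echo "S3_BUCKET_NAME=$S3_BUCKET_NAME"
echo "SAGEMAKER_EXECUTION_ROLE_ARN=$SAGEMAKER_EXECUTION_ROLE_ARN"
echo "SAGEMAKER_REGION=$SAGEMAKER_REGION"
```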
## Compile and run the pipelines
- To compile the pipeline, run:

  ```bash
  python hosting_model_monitor_pipeline.py
  ```

  This will create a `.tar.gz` file. After the compilation completes, you will see a message like the one below. You can use the commands in the message to delete the resources created by this pipeline after you have finished running it.

  ```
  =================Pipeline compiled=================
  Name prefix: 2023-05-11-10-45-32
  To delete the resources created by this pipeline, run the following commands:
  export NAME_PREFIX=2023-05-11-10-45-32
  export NAMESPACE=<xx> # Change it to your Kubeflow name space
  kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
  kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
  kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
  kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
  kubectl delete Model $NAME_PREFIX-model -n $NAMESPACE
  ```

- In the Kubeflow Pipelines UI, upload the compiled pipeline specification (the `.tar.gz` file) and click on create run.
- Once the pipeline completes, you can see the outputs under 'Output parameters' in the component's Input/Output section.
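If you want to confirm that the endpoint actually came up, you can inspect the custom resource the pipeline created, or ask SageMaker directly. The commands below use the `NAME_PREFIX` and `NAMESPACE` values from the compile message above, and assume the SageMaker endpoint name matches the `$NAME_PREFIX-endpoint` resource name used elsewhere in this sample:

```bash
# Check the Endpoint custom resource created by the pipeline
kubectl get Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE

# Query SageMaker directly; the status should eventually be "InService"
aws sagemaker describe-endpoint \
  --endpoint-name $NAME_PREFIX-endpoint \
  --region $SAGEMAKER_REGION \
  --query EndpointStatus
```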
## Delete resources created by pipeline
Export the following environment variables:

```bash
export NAMESPACE=<YOUR_KUBEFLOW_NAMESPACE>
export NAME_PREFIX=<NAME_PREFIX>
```
If you are using a standalone Kubeflow Pipelines installation, the namespace is `kubeflow`. If you are using a full Kubeflow installation, you can find your namespace in the top bar of the Kubeflow central dashboard.

You can find the `NAME_PREFIX` in the component's output parameter `sagemaker_resource_name`, or in the command-line output printed when you compiled the pipeline.
To delete the custom resources created by this pipeline, run the following commands:
```bash
kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
kubectl delete Model $NAME_PREFIX-model -n $NAMESPACE
```
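To confirm the deletions went through, you can list the resources again; once deletion completes, none of the `$NAME_PREFIX-*` resources should appear:

```bash
# After deletion, the $NAME_PREFIX-* resources should no longer be listed
kubectl get MonitoringSchedule,DataQualityJobDefinition,Endpoint,EndpointConfig,Model -n $NAMESPACE
```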
To delete the S3 bucket, empty it first and then remove it:

```bash
aws s3 rm s3://$S3_BUCKET_NAME --recursive
aws s3api delete-bucket --bucket $S3_BUCKET_NAME --region $SAGEMAKER_REGION
```
## Reference
Sample Notebook - Introduction to Amazon SageMaker Model Monitor