# Host a Model and Create a SageMaker Model Monitor

This sample demonstrates a Kubeflow pipeline that:

- Hosts a machine learning model in Amazon SageMaker
- Monitors a live endpoint for violations against constraints

## Prerequisites

Follow the steps in [Sample AWS SageMaker Kubeflow Pipelines](../README.md#inputs-to-the-pipeline).

### Install required packages

Run the following command to install the script dependencies:

```
pip install -r requirements.txt
```

### Create an IAM Role

Follow [SageMaker execution role](../README.md#inputs-to-the-pipeline) and create an IAM role for SageMaker execution.

### Create an S3 Bucket

To set up an endpoint and create a monitoring schedule, we need an S3 bucket to store the model and baseline data. Run the following commands to create an S3 bucket. Set `SAGEMAKER_REGION` to the region where you want to create your SageMaker resources. For ease of use in the samples (using the default values of the pipeline), we suggest using `us-east-1` as the region.

```
export SAGEMAKER_REGION=us-east-1
export S3_BUCKET_NAME="kfp-sm-data-bucket-${SAGEMAKER_REGION}-$RANDOM"

if [[ $SAGEMAKER_REGION == "us-east-1" ]]; then
  aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION}
else
  aws s3api create-bucket --bucket ${S3_BUCKET_NAME} --region ${SAGEMAKER_REGION} \
    --create-bucket-configuration LocationConstraint=${SAGEMAKER_REGION}
fi

echo ${S3_BUCKET_NAME}
```

### Copy data to bucket

Fill the S3 bucket you just created with the [sample data](./model-monitor/), which contains:

- A pre-trained model
- Baselining constraints and statistics generated by a ProcessingJob

1. Clone this repository to use the pipelines and sample data:

   ```
   git clone https://github.com/kubeflow/pipelines.git
   cd samples/contrib/aws-samples/hosting_model_monitor_pipeline
   ```

1. Download the sample model from the SageMaker sample bucket:

   ```
   aws s3 cp s3://sagemaker-sample-files/models/xgb-churn/xgb-churn-prediction-model.tar.gz model-monitor
   ```

1. Upload the sample data to your S3 bucket:

   ```
   aws s3 cp model-monitor s3://$S3_BUCKET_NAME/model-monitor --recursive
   ```

After going through the above steps, make sure you have the following environment variables set:

- `S3_BUCKET_NAME`: The name of the S3 bucket you created.
- `SAGEMAKER_EXECUTION_ROLE_ARN`: The ARN of the IAM role you created.
- `SAGEMAKER_REGION`: The region where you want to run the pipeline.

## Compile and run the pipelines

1. To compile the pipeline, run `python hosting_model_monitor_pipeline.py`. This will create a `tar.gz` file. After the compilation completes, you will see a message like this:

   ```
   =================Pipeline compiled=================

   Name prefix: 2023-05-11-10-45-32
   To delete the resources created by this pipeline, run the following commands:

   export NAME_PREFIX=2023-05-11-10-45-32
   export NAMESPACE= # Change it to your Kubeflow namespace
   kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
   kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
   kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
   kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
   kubectl delete Model $NAME_PREFIX-model -n $NAMESPACE
   ```

   You can use the commands in the message to delete the resources created by this pipeline after you have finished running it.

1. In the Kubeflow Pipelines UI, upload the compiled pipeline specification (the *.tar.gz* file) and click on create run.
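   Alternatively, you can upload and start the run programmatically. The sketch below is a minimal example, assuming the KFP SDK is installed, the Kubeflow Pipelines API is reachable on `localhost:8080` (for example via `kubectl port-forward`), and the compiled file is named `hosting_model_monitor_pipeline.py.tar.gz`; adjust the host, file name, and arguments to match your setup.

   ```python
   import kfp

   # Assumption: the KFP API is reachable at this address, e.g. after
   # `kubectl port-forward svc/ml-pipeline-ui 8080:80 -n kubeflow`.
   client = kfp.Client(host="http://localhost:8080")

   # Upload the compiled specification and start a run. An empty
   # `arguments` dict uses the pipeline's default parameter values.
   run = client.create_run_from_pipeline_package(
       "hosting_model_monitor_pipeline.py.tar.gz",  # assumed output file name
       arguments={},
       run_name="hosting-model-monitor-sample",
   )
   print(run.run_id)
   ```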
1. Once the pipeline completes, you can see the outputs under **Output parameters** in the component's **Input/Output** section.

## Delete resources created by pipeline

Export the following environment variables:

```
export NAMESPACE=
export NAME_PREFIX=
```

If you are using the standalone installation, the namespace is `kubeflow`. If you are using the full Kubeflow installation, you can find your namespace in the top bar of the Kubeflow central dashboard. You can find the `NAME_PREFIX` in the component's output parameter `sagemaker_resource_name`, or in the command-line output printed when you compiled the pipeline.

To delete the custom resources created by this pipeline, run the following commands:

```
kubectl delete MonitoringSchedule $NAME_PREFIX-monitoring-schedule -n $NAMESPACE
kubectl delete DataQualityJobDefinition $NAME_PREFIX-data-qual-job-defi -n $NAMESPACE
kubectl delete Endpoint $NAME_PREFIX-endpoint -n $NAMESPACE
kubectl delete EndpointConfig $NAME_PREFIX-endpointcfg -n $NAMESPACE
kubectl delete Model $NAME_PREFIX-model -n $NAMESPACE
```

To delete the S3 bucket and all objects in it, run the following command:

```
aws s3 rb s3://${S3_BUCKET_NAME} --force --region ${SAGEMAKER_REGION}
```

## Reference

[Sample Notebook - Introduction to Amazon SageMaker Model Monitor](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.html)
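Like the referenced notebook, you can send traffic to the hosted endpoint so the monitoring schedule has requests to analyze. Below is a minimal sketch, assuming the default pipeline values (`us-east-1`, an endpoint named `<NAME_PREFIX>-endpoint`) and that, like the sample churn model, your model accepts CSV input; the payload is a placeholder, so substitute a real row from the dataset your model was trained on.

```python
import boto3

# Assumptions: region and endpoint name match the defaults this sample
# creates; replace the prefix with the one printed at compile time.
region = "us-east-1"
endpoint_name = "2023-05-11-10-45-32-endpoint"  # <NAME_PREFIX>-endpoint

runtime = boto3.client("sagemaker-runtime", region_name=region)

# Placeholder CSV payload: the xgb-churn model expects one CSV row of
# numeric features. Substitute a real row from the churn test data.
payload = "0,1,2,3"

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))
```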