# Ames housing value prediction using XGBoost on Kubeflow

In this example we will demonstrate how to use Kubeflow with XGBoost using the [Kaggle Ames Housing Prices prediction](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/). We will do a detailed walk-through of how to implement, train and serve the model. You will be able to run the exact same workload on-prem and/or on any cloud provider. We will be using [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) to show how the end-to-end workflow runs on [Google Cloud Platform](https://cloud.google.com/).

# Pre-requisites

To run this setup on Google Cloud Platform, make sure you have enabled [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/). You will also need to install [Docker](https://docs.docker.com/install/) and [gcloud](https://cloud.google.com/sdk/downloads). Note that this setup can run on-prem and on any cloud provider, but here we will demonstrate the GCP option. Finally, follow the [instructions](https://www.kubeflow.org/docs/started/getting-started-gke/) to create a GKE cluster.

# Steps

 * [Kubeflow Setup](#kubeflow-setup)
 * [Data Preparation](#data-preparation)
 * [Dockerfile](#dockerfile)
 * [Model Training on GKE](#model-training-on-gke)
 * [Model Export](#model-export)
 * [Model Serving Locally](#model-serving-locally)
 * [Deploying Model to Kubernetes Cluster](#model-serving-on-gke)

## Kubeflow Setup

In this part you will set up Kubeflow on an existing Kubernetes cluster. Check out the Kubeflow [getting started guide](https://www.kubeflow.org/docs/started/getting-started/).

## Data Preparation

You can download the dataset from the [Kaggle competition](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data).
For convenience, we have also uploaded the dataset to GCS:

```
gs://kubeflow-examples-data/ames_dataset/
```

## Dockerfile

We have attached a Dockerfile with this repo which you can use to create a docker image. We have also uploaded the image to gcr.io, so you can download it directly instead of building it.

```
IMAGE_NAME=ames-housing
VERSION=latest
```

Use the `gcloud` command to get the GCP project

```
PROJECT_ID=`gcloud config get-value project`
```

Let's create a docker image from our Dockerfile

```
docker build -t gcr.io/$PROJECT_ID/${IMAGE_NAME}:${VERSION} .
```

Once the above command is successful you should be able to see the docker image on your local machine by running `docker images`. Next we will upload the image to [Google Container Registry](https://cloud.google.com/container-registry/)

```
gcloud auth configure-docker
docker push gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${VERSION}
```

A public copy is available at `gcr.io/kubeflow-examples/ames-housing:latest`.

## Model training on GKE

In this section we will run the above docker container on [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/). There are three steps to perform the training

 * Create a GKE cluster

 * Create a Persistent Volume
   * Follow the instructions [here](https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/). You will need to run the following `kubectl create` commands in order to get the `claim` attached to the `pod`.

   ```
   kubectl create -f py-volume.yaml
   kubectl create -f py-claim.yaml
   ```

 * Run docker container on GKE
   * Use the `kubectl` command to run the image on GKE

   ```
   kubectl create -f py-job.yaml
   ```

Once the above command finishes you will have an XGBoost model available on the Persistent Volume at `/mnt/xgboost/housing.dat`.

## Model Export

The model is exported to the location `/tmp/ames/housing.dat`. We will use [Seldon Core](https://github.com/SeldonIO/seldon-core/) to serve the model asset.
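
Seldon Core's Python wrapper serves a class that exposes a `predict(X, feature_names)` method. The following is a minimal sketch of that shape, not the repository's actual `HousingServe.py`. The class name `HousingServeSketch`, the injectable `model` argument, the `ConstantModel` stand-in, and the assumption that `housing.dat` is a joblib dump are all illustrative.

```python
class HousingServeSketch(object):
    """Hypothetical model class in the shape Seldon Core's Python wrapper expects."""

    def __init__(self, model=None, model_file="housing.dat"):
        # `model` lets a caller inject any object that has a .predict() method.
        # Otherwise the file is ASSUMED to be a joblib dump of the trained model.
        if model is None:
            import joblib  # imported lazily so the sketch runs without joblib
            model = joblib.load(model_file)
        self.model = model

    def predict(self, X, feature_names):
        # X holds one row of feature values per prediction request.
        # feature_names is accepted for interface compatibility and unused here.
        return self.model.predict(X)


class ConstantModel(object):
    """Stand-in model used only to exercise the sketch without a trained file."""

    def predict(self, X):
        return [0.0 for _ in X]


server = HousingServeSketch(model=ConstantModel())
print(server.predict([[1, 2, 3]], ["a", "b", "c"]))
```

Injecting the model keeps the class testable without the trained artifact, while the default arguments still allow it to be constructed with no parameters.
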
In order to make the model servable we have created `xgboost/seldon_serve` with the following assets

 * `HousingServe.py`
 * `housing.dat`
 * `requirements.txt`

## Model Serving Locally

We are going to use [seldon-core](https://github.com/SeldonIO/seldon-core/) to serve the model. [HousingServe.py](seldon_serve/HousingServe.py) contains the code to serve the model. You can find seldon core model wrapping details [here](https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python.md). The seldon-core microservice image can be built with the following command.

```
cd seldon_serve && s2i build . seldonio/seldon-core-s2i-python2:0.4 gcr.io/${PROJECT_ID}/housingserve:latest --loglevel=3
```

Let's run the docker image locally.

```
docker run -p 5000:5000 gcr.io/${PROJECT_ID}/housingserve:latest
```

Now you are ready to send requests on `localhost:5000`

```
curl -H "Content-Type: application/x-www-form-urlencoded" -d 'json={"data":{"tensor":{"shape":[1,37],"values":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]}}}' http://localhost:5000/predict
```

The response looks like this.

```
{
  "data": {
    "names": [
      "t:0",
      "t:1"
    ],
    "tensor": {
      "shape": [
        1,
        2
      ],
      "values": [
        97522.359375,
        97522.359375
      ]
    }
  }
}
```

## Model serving on GKE

One of the amazing features of Kubernetes is that you can run it anywhere, i.e., locally, on-prem and in the cloud. We will show you how to run your code on Google Kubernetes Engine. First off, start a GKE cluster.

Deploy Seldon core to your GKE cluster by following the instructions in the Deploy Seldon Core section [here](https://github.com/kubeflow/examples/blob/fb2fb26f710f7c03996f08d81607f5ebf7d5af09/github_issue_summarization/serving_the_model.md#deploy-seldon-core). Once everything is successful you can verify it using `kubectl get pods -n ${NAMESPACE}`.

```
NAME                                      READY     STATUS    RESTARTS   AGE
ambassador-849fb9c8c5-5kx6l               2/2       Running   0          16m
ambassador-849fb9c8c5-pww4j               2/2       Running   0          16m
ambassador-849fb9c8c5-zn6gl               2/2       Running   0          16m
redis-75c969d887-fjqt8                    1/1       Running   0          30s
seldon-cluster-manager-6c78b7d6c9-6qhtg   1/1       Running   0          30s
spartakus-volunteer-66cc8ccd5b-9f8tw      1/1       Running   0          16m
tf-hub-0                                  1/1       Running   0          16m
tf-job-dashboard-7b57c549c8-bfpp8         1/1       Running   0          16m
tf-job-operator-594d8c7ddd-lqn8r          1/1       Running   0          16m
```

Second, we need to upload our previously built docker image to `gcr.io`. A public image is available at `gcr.io/kubeflow-examples/housingserve:latest`

```
gcloud auth configure-docker
docker push gcr.io/${PROJECT_ID}/housingserve:latest
```

Finally, we can deploy the XGBoost model

```
ks generate seldon-serve-simple-v1alpha2 xgboost-ames \
  --name=xgboost-ames \
  --image=gcr.io/${PROJECT_ID}/housingserve:latest \
  --namespace=${NAMESPACE} \
  --replicas=1

ks apply ${KF_ENV} -c xgboost-ames
```

## Sample request and response

Seldon Core uses ambassador to route its requests. To send requests to the model, you can port-forward the ambassador container locally:

```
kubectl port-forward $(kubectl get pods -n ${NAMESPACE} -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} 8080:80
```

Now you are ready to send requests on `localhost:8080`

```
curl -H "Content-Type:application/json" \
  -d '{"data":{"tensor":{"shape":[1,37],"values":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]}}}' \
  http://localhost:8080/seldon/xgboost-ames/api/v0.1/predictions
```

The response looks like this.

```
{
  "meta": {
    "puid": "8buc4oo78m67716m2vevvgtpap",
    "tags": {
    },
    "routing": {
    }
  },
  "data": {
    "names": ["t:0", "t:1"],
    "tensor": {
      "shape": [1, 2],
      "values": [97522.359375, 97522.359375]
    }
  }
}
```
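
The same request can be issued from Python instead of `curl`. The following is a minimal sketch using only the standard library. The helper names `build_payload`, `send_request` and `extract_predictions` are hypothetical, and the URL passed to `send_request` is assumed to be the port-forwarded ambassador endpoint from the `curl` example. The parsing step is exercised here against the sample response from this section, so no cluster is needed to run it.

```python
import json
import urllib.request


def build_payload(features):
    """Wrap one row of feature values in the tensor request format used above."""
    values = list(features)
    return {"data": {"tensor": {"shape": [1, len(values)], "values": values}}}


def send_request(url, payload):
    """POST the payload as JSON and return the raw response text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def extract_predictions(response_text):
    """Return the flat list of predicted values from a tensor response."""
    body = json.loads(response_text)
    return body["data"]["tensor"]["values"]


# Same 37 placeholder feature values as the curl example.
payload = build_payload(range(1, 38))

# Parse the sample response shown in this section (no network call is made).
sample_response = (
    '{"meta": {"puid": "8buc4oo78m67716m2vevvgtpap", "tags": {}, "routing": {}},'
    ' "data": {"names": ["t:0", "t:1"],'
    ' "tensor": {"shape": [1, 2], "values": [97522.359375, 97522.359375]}}}'
)
predictions = extract_predictions(sample_response)
print(predictions)
```

With the port-forward running, `send_request("http://localhost:8080/seldon/xgboost-ames/api/v0.1/predictions", payload)` would return the response text to pass to `extract_predictions`.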