# Ames housing value prediction using XGBoost on Kubeflow
In this example we will demonstrate how to use Kubeflow with XGBoost on the [Kaggle Ames Housing Prices prediction](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/) competition. We will do a detailed walk-through of how to implement, train and serve the model. You will be able to run the exact same workload on-prem and/or on any cloud provider. We will be using [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) to show how the end-to-end workflow runs on [Google Cloud Platform](https://cloud.google.com/).
# Prerequisites

As part of running this setup on Google Cloud Platform, make sure you have enabled [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/). In addition, you will need to install [Docker](https://docs.docker.com/install/) and [gcloud](https://cloud.google.com/sdk/downloads). Note that this setup can run on-prem and on any cloud provider, but here we will demonstrate the GCP option. Finally, follow the [instructions](https://www.kubeflow.org/docs/started/getting-started-gke/) to create a GKE cluster.

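If you only want a plain GKE cluster to experiment with, rather than the full deployment from the linked instructions, a minimal sketch looks like the following; the cluster name, zone, node count and machine type are placeholders you should adjust.

```
# Illustrative only: create a small GKE cluster (name, zone and sizing are placeholders).
gcloud container clusters create ames-housing-demo \
    --zone us-central1-a \
    --num-nodes 3 \
    --machine-type n1-standard-4

# Point kubectl at the new cluster.
gcloud container clusters get-credentials ames-housing-demo --zone us-central1-a
```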
# Steps

* [Kubeflow Setup](#kubeflow-setup)
* [Data Preparation](#data-preparation)
* [Dockerfile](#dockerfile)
* [Model Training on GKE](#model-training-on-gke)
* [Model Export](#model-export)
* [Model Serving Locally](#model-serving-locally)
* [Model Serving on GKE](#model-serving-on-gke)
* [Sample Request and Response](#sample-request-and-response)
## Kubeflow Setup

In this part you will set up Kubeflow on an existing Kubernetes cluster. Check out the Kubeflow [getting started guide](https://www.kubeflow.org/docs/started/getting-started/).

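Once you have followed the guide, a quick sanity check is to list the pods in the namespace Kubeflow was deployed into; we assume here that the namespace is stored in `${NAMESPACE}`, the same variable used later in this walkthrough.

```
# Verify the Kubeflow components are running (namespace is an assumption; adjust to your deployment).
kubectl get pods -n ${NAMESPACE}
```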
## Data Preparation

You can download the dataset from the [Kaggle competition](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data). For convenience, we have also uploaded the dataset to GCS:

```
gs://kubeflow-examples-data/ames_dataset/
```
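If you want a local copy of the data, for example to explore it before training, you can pull it down with `gsutil` (installed as part of the gcloud SDK); the destination directory below is just an example.

```
# Copy the Ames dataset from GCS to the current directory (creates ./ames_dataset/).
gsutil cp -r gs://kubeflow-examples-data/ames_dataset .
```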
## Dockerfile

We have attached a Dockerfile to this repo which you can use to build a Docker image. We have also uploaded a pre-built image to gcr.io which you can pull directly.

```
IMAGE_NAME=ames-housing
VERSION=latest
```

Use the `gcloud` command to get the GCP project ID:

```
PROJECT_ID=`gcloud config get-value project`
```

Let's build a Docker image from our Dockerfile:

```
docker build -t gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${VERSION} .
```

Once the above command succeeds, you should be able to see the image on your local machine by running `docker images`. Next we will upload the image to [Google Container Registry](https://cloud.google.com/container-registry/):

```
gcloud auth configure-docker
docker push gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${VERSION}
```

A public copy is available at `gcr.io/kubeflow-examples/ames-housing:latest`.

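If you prefer not to build the image yourself, you can pull that public copy and, optionally, retag and push it into your own project; this is only a convenience sketch reusing the variables defined above.

```
# Pull the public pre-built image.
docker pull gcr.io/kubeflow-examples/ames-housing:latest

# Optional: retag it into your own project so the later steps can reference it there.
docker tag gcr.io/kubeflow-examples/ames-housing:latest gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${VERSION}
docker push gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${VERSION}
```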
## Model training on GKE

In this section we will run the above Docker container on [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/). Training involves the following steps:

* Create a GKE cluster
* Create a Persistent Volume
  * Follow the instructions [here](https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/). You will need to run the following `kubectl create` commands in order to get the `claim` attached to the `pod`.

    ```
    kubectl create -f py-volume.yaml
    kubectl create -f py-claim.yaml
    ```

* Run the Docker container on GKE
  * Use the `kubectl` command to run the image on GKE:

    ```
    kubectl create -f py-job.yaml
    ```

Once the above command finishes you will have an XGBoost model available on the Persistent Volume at `/mnt/xgboost/housing.dat`.

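You can check on the claim and the training Job while it runs with standard `kubectl` commands; the Job name below is a placeholder, so substitute the name defined in your `py-job.yaml`.

```
# Confirm the PersistentVolumeClaim is bound.
kubectl get pvc

# Check Job status; Kubernetes labels the Job's pods with job-name=<job-name> automatically.
kubectl get jobs
kubectl get pods -l job-name=<job-name>

# Stream the training logs (replace <job-name> with the actual Job name from py-job.yaml).
kubectl logs -l job-name=<job-name> --follow
```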
## Model Export

The model is exported to the location `/tmp/ames/housing.dat`. We will use [Seldon Core](https://github.com/SeldonIO/seldon-core/) to serve the model asset. In order to make the model servable we have created `xgboost/seldon_serve` with the following assets:

* `HousingServe.py`
* `housing.dat`
* `requirements.txt`

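The `housing.dat` shipped in `seldon_serve` lets you serve the model without retraining. If you would rather serve the model you just trained on GKE, one option is to copy it out of a pod that mounts the Persistent Volume; the pod name below is a placeholder.

```
# Copy the trained model off the Persistent Volume into the serving directory.
# Replace <pod-mounting-pv> with a pod that mounts the volume (for example the training pod).
kubectl cp <pod-mounting-pv>:/mnt/xgboost/housing.dat seldon_serve/housing.dat
```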
## Model Serving Locally

We are going to use [seldon-core](https://github.com/SeldonIO/seldon-core/) to serve the model. [HousingServe.py](seldon_serve/HousingServe.py) contains the code to serve the model. You can find Seldon Core model wrapping details [here](https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python.md). The seldon-core microservice image can be built with the following command:

```
cd seldon_serve && s2i build . seldonio/seldon-core-s2i-python2:0.4 gcr.io/${PROJECT_ID}/housingserve:latest --loglevel=3
```

Let's run the Docker image locally:

```
docker run -p 5000:5000 gcr.io/${PROJECT_ID}/housingserve:latest
```

Now you are ready to send requests on `localhost:5000`:

```
curl -H "Content-Type: application/x-www-form-urlencoded" -d 'json={"data":{"tensor":{"shape":[1,37],"values":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]}}}' http://localhost:5000/predict
```

The response looks like this:

```
{
  "data": {
    "names": [
      "t:0",
      "t:1"
    ],
    "tensor": {
      "shape": [
        1,
        2
      ],
      "values": [
        97522.359375,
        97522.359375
      ]
    }
  }
}
```
## Model serving on GKE

One of the amazing features of Kubernetes is that you can run it anywhere: locally, on-prem, or in the cloud. We will show you how to run your code on Google Kubernetes Engine. First off, start a GKE cluster.

Deploy Seldon Core to your GKE cluster by following the instructions in the Deploy Seldon Core section [here](https://github.com/kubeflow/examples/blob/fb2fb26f710f7c03996f08d81607f5ebf7d5af09/github_issue_summarization/serving_the_model.md#deploy-seldon-core). Once everything is successful you can verify it using `kubectl get pods -n ${NAMESPACE}`:

```
NAME                                      READY     STATUS    RESTARTS   AGE
ambassador-849fb9c8c5-5kx6l               2/2       Running   0          16m
ambassador-849fb9c8c5-pww4j               2/2       Running   0          16m
ambassador-849fb9c8c5-zn6gl               2/2       Running   0          16m
redis-75c969d887-fjqt8                    1/1       Running   0          30s
seldon-cluster-manager-6c78b7d6c9-6qhtg   1/1       Running   0          30s
spartakus-volunteer-66cc8ccd5b-9f8tw      1/1       Running   0          16m
tf-hub-0                                  1/1       Running   0          16m
tf-job-dashboard-7b57c549c8-bfpp8         1/1       Running   0          16m
tf-job-operator-594d8c7ddd-lqn8r          1/1       Running   0          16m
```

Second, we need to upload our previously built Docker image to `gcr.io`. A public image is available at `gcr.io/kubeflow-examples/housingserve:latest`.

```
gcloud auth configure-docker
docker push gcr.io/${PROJECT_ID}/housingserve:latest
```

Finally, we can deploy the XGBoost model:

```
ks generate seldon-serve-simple-v1alpha2 xgboost-ames \
  --name=xgboost-ames \
  --image=gcr.io/${PROJECT_ID}/housingserve:latest \
  --namespace=${NAMESPACE} \
  --replicas=1

ks apply ${KF_ENV} -c xgboost-ames
```
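Before sending requests, you can confirm that the serving pods have come up; the exact resource names depend on how the ksonnet prototype names things, so treat this as a sketch.

```
# The prototype creates a SeldonDeployment custom resource; list it and the pods in the namespace.
kubectl get seldondeployments -n ${NAMESPACE}
kubectl get pods -n ${NAMESPACE}
```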
## Sample request and response

Seldon Core uses Ambassador to route its requests. To send requests to the model, you can port-forward the Ambassador container locally:

```
kubectl port-forward $(kubectl get pods -n ${NAMESPACE} -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} 8080:80
```

Now you are ready to send requests on `localhost:8080`:

```
curl -H "Content-Type: application/json" \
     -d '{"data":{"tensor":{"shape":[1,37],"values":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]}}}' \
     http://localhost:8080/seldon/xgboost-ames/api/v0.1/predictions
```

The response looks like this:

```
{
  "meta": {
    "puid": "8buc4oo78m67716m2vevvgtpap",
    "tags": {
    },
    "routing": {
    }
  },
  "data": {
    "names": ["t:0", "t:1"],
    "tensor": {
      "shape": [1, 2],
      "values": [97522.359375, 97522.359375]
    }
  }
}
```