Ames housing value prediction using XGBoost on Kubeflow
In this example we demonstrate how to use Kubeflow with XGBoost using the Kaggle Ames Housing Prices prediction. We will do a detailed walk-through of how to implement, train, and serve the model. You will be able to run the exact same workload on-premises or on any cloud provider. We will be using Google Kubernetes Engine to show how the end-to-end workflow runs on Google Cloud Platform.
Prerequisites
As a part of running this setup on Google Cloud Platform, make sure you have enabled the Google Kubernetes Engine API. In addition, you will need to install Docker and gcloud. Note that this setup can run on-premises and on any cloud provider, but here we will demonstrate the GCP option. Finally, follow the instructions to create a GKE cluster.
Steps
- Kubeflow Setup
- Data Preparation
- Dockerfile
- Model Training on GKE
- Model Export
- Model Serving Locally
- Deploying Model to Kubernetes Cluster
Kubeflow Setup
In this part you will set up Kubeflow on an existing Kubernetes cluster. Check out the Kubeflow getting started guide.
Data Preparation
You can download the dataset from the Kaggle competition. For convenience, we have also uploaded the dataset to GCS:
gs://kubeflow-examples-data/ames_dataset/
Dockerfile
This repo includes a Dockerfile which you can use to build a docker image. We have also uploaded the image to gcr.io, from which you can pull it directly.
IMAGE_NAME=ames-housing
VERSION=latest
Use the gcloud command to get the GCP project ID:
PROJECT_ID=`gcloud config get-value project`
Let's create a docker image from our Dockerfile
docker build -t gcr.io/$PROJECT_ID/${IMAGE_NAME}:${VERSION} .
Once the above command succeeds you should be able to see the docker image on your local machine by running docker images. Next we will upload the image to Google Container Registry:
gcloud auth configure-docker
docker push gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${VERSION}
A public copy is available at gcr.io/kubeflow-examples/ames-housing:latest.
Model Training on GKE
In this section we will run the above docker container on Google Kubernetes Engine. There are a few steps to perform the training:
- Create a GKE cluster
- Create a Persistent Volume
  Follow the instructions here. You will need to run the following kubectl create commands in order to get the claim attached to the pod.
  kubectl create -f py-volume.yaml
  kubectl create -f py-claim.yaml
- Run the docker container on GKE
  Use the kubectl command to run the image on GKE:
  kubectl create -f py-job.yaml
Once the above command finishes you will have an XGBoost model available on the Persistent Volume at /mnt/xgboost/housing.dat.
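For orientation, a PersistentVolumeClaim such as py-claim.yaml typically looks like the sketch below. The name and storage size here are illustrative assumptions; the files checked into this directory are authoritative.

```yaml
# Illustrative sketch only -- see py-claim.yaml in this directory for
# the actual claim used by the training job.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ames-housing-claim        # assumed name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi               # assumed size
```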
Model Export
The model is exported to the location /tmp/ames/housing.dat. We will use Seldon Core to serve the model asset. In order to make the model servable, we have created xgboost/seldon_serve with the following assets:
HousingServe.py
housing.dat
requirements.txt
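For reference, a Seldon Core Python wrapper is simply a class exposing a predict method. The sketch below is hypothetical: the class name mirrors the repo's HousingServe.py, but the model-loading details are assumptions, and the checked-in file is the source of truth.

```python
# Hypothetical sketch of a Seldon Core model wrapper; the real
# HousingServe.py in seldon_serve/ may load the model differently.
import pickle


class HousingServe:
    def __init__(self, model_file="housing.dat", model=None):
        # Allow injecting a pre-built model object (handy for testing);
        # otherwise unpickle the model exported by the training job.
        if model is not None:
            self.model = model
        else:
            with open(model_file, "rb") as f:
                self.model = pickle.load(f)

    def predict(self, X, feature_names=None):
        # X is a 2-D array of shape (n_samples, 37), matching the
        # features used during training.
        return self.model.predict(X)
```

Seldon's s2i builder wraps a class like this into the REST microservice built in the next step.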
Model Serving Locally
We are going to use seldon-core to serve the model. HousingServe.py contains the code to serve the model. You can find Seldon Core model wrapping details here. The seldon-core microservice image can be built with the following command.
cd seldon_serve && s2i build . seldonio/seldon-core-s2i-python2:0.4 gcr.io/${PROJECT_ID}/housingserve:latest --loglevel=3
Let's run the docker image locally.
docker run -p 5000:5000 gcr.io/${PROJECT_ID}/housingserve:latest
Now you are ready to send requests to localhost:5000
curl -H "Content-Type: application/x-www-form-urlencoded" -d 'json={"data":{"tensor":{"shape":[1,37],"values":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]}}}' http://localhost:5000/predict
The response looks like this.
{
"data": {
"names": [
"t:0",
"t:1"
],
"tensor": {
"shape": [
1,
2
],
"values": [
97522.359375,
97522.359375
]
}
}
}
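The same request can also be sent from Python using only the standard library. This is a sketch, not part of the repo; the payload shape and endpoint mirror the curl call above.

```python
import json
import urllib.parse
import urllib.request


def build_payload(values):
    # Seldon's form-encoded API expects a single 'json' field whose
    # value is the JSON-encoded request document.
    doc = {"data": {"tensor": {"shape": [1, len(values)], "values": values}}}
    return urllib.parse.urlencode({"json": json.dumps(doc)}).encode()


def predict(url, values):
    req = urllib.request.Request(
        url,
        data=build_payload(values),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires the server from the previous step to be running):
# predict("http://localhost:5000/predict", list(range(1, 38)))
```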
Model Serving on GKE
One of the great features of Kubernetes is that you can run it anywhere, i.e., locally, on-premises, or in the cloud. We will show you how to run your code on Google Kubernetes Engine. First off, start a GKE cluster.
Deploy Seldon Core to your GKE cluster by following the instructions in the Deploy Seldon Core section here. Once everything is successful you can verify it using kubectl get pods -n ${NAMESPACE}.
NAME READY STATUS RESTARTS AGE
ambassador-849fb9c8c5-5kx6l 2/2 Running 0 16m
ambassador-849fb9c8c5-pww4j 2/2 Running 0 16m
ambassador-849fb9c8c5-zn6gl 2/2 Running 0 16m
redis-75c969d887-fjqt8 1/1 Running 0 30s
seldon-cluster-manager-6c78b7d6c9-6qhtg 1/1 Running 0 30s
spartakus-volunteer-66cc8ccd5b-9f8tw 1/1 Running 0 16m
tf-hub-0 1/1 Running 0 16m
tf-job-dashboard-7b57c549c8-bfpp8 1/1 Running 0 16m
tf-job-operator-594d8c7ddd-lqn8r 1/1 Running 0 16m
Second, we need to upload our previously built docker image to gcr.io. A public image is available at gcr.io/kubeflow-examples/housingserve:latest.
gcloud auth configure-docker
docker push gcr.io/${PROJECT_ID}/housingserve:latest
Finally, we can deploy the XGBoost model
ks generate seldon-serve-simple-v1alpha2 xgboost-ames \
--name=xgboost-ames \
--image=gcr.io/${PROJECT_ID}/housingserve:latest \
--namespace=${NAMESPACE} \
--replicas=1
ks apply ${KF_ENV} -c xgboost-ames
Sample request and response
Seldon Core uses ambassador to route its requests. To send requests to the model, you can port-forward the ambassador container locally:
kubectl port-forward $(kubectl get pods -n ${NAMESPACE} -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} 8080:80
Now you are ready to send requests to localhost:8080
curl -H "Content-Type:application/json" \
-d '{"data":{"tensor":{"shape":[1,37],"values":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]}}}' \
http://localhost:8080/seldon/xgboost-ames/api/v0.1/predictions
{
"meta": {
"puid": "8buc4oo78m67716m2vevvgtpap",
"tags": {
},
"routing": {
}
},
"data": {
"names": ["t:0", "t:1"],
"tensor": {
"shape": [1, 2],
"values": [97522.359375, 97522.359375]
}
  }
}
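A response like the one above can be unpacked in Python to recover the predicted sale price. This is a standard-library sketch; the field names follow the Seldon response shown above.

```python
import json


def extract_predictions(response_text):
    # Seldon returns predictions under data.tensor.values; for this
    # regression model the values carry the predicted price.
    doc = json.loads(response_text)
    return doc["data"]["tensor"]["values"]


sample = '''{
  "meta": {"puid": "8buc4oo78m67716m2vevvgtpap", "tags": {}, "routing": {}},
  "data": {"names": ["t:0", "t:1"],
           "tensor": {"shape": [1, 2], "values": [97522.359375, 97522.359375]}}
}'''

print(extract_predictions(sample)[0])  # -> 97522.359375
```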