Fix a bunch of issues with the xgboost_synthetic example (#621)

* Need to add kfmd to requirements.txt because the training code now uses
  kfmd to log data.

* The Dockerfile didn't build with kaniko; it looks like a permission problem
  when installing Python files into the conda directory. The problem appears
  to be fixed by not switching to user root.

* Update the base Docker image to TensorFlow 1.13.

* Remove some references to namespace from the notebook, because the fairing
  code should now detect the namespace automatically and the notebook will no
  longer be running in namespace kubeflow.

* When training runs in a K8s job, the code now tries to contact the
  metadata server, but this can fail if the Istio sidecar hasn't started yet.
  So we wait for Istio to start by retrying the connection to the
  metadata server for up to 3 minutes.

* Add much more explanatory text to the notebook describing what is happening.

* Related to #619
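The commit relies on fairing to detect the namespace but does not show how. A common in-cluster technique, assumed here and not necessarily what fairing does, is to read the namespace file that Kubernetes mounts into pods that have a service-account token. A minimal Python sketch; `detect_namespace` and its `"default"` fallback are hypothetical:

```python
from pathlib import Path

# Kubernetes mounts the pod's namespace here when a service-account token is mounted.
SA_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def detect_namespace(path: str = SA_NAMESPACE_FILE, default: str = "default") -> str:
    """Return the namespace of the current pod, or `default` outside a cluster.

    Hypothetical helper for illustration; not fairing's actual API.
    """
    try:
        namespace = Path(path).read_text().strip()
    except OSError:
        # File missing (e.g. running outside Kubernetes) or unreadable.
        return default
    # An empty file is treated the same as a missing one.
    return namespace or default
```

Outside a cluster the file does not exist and the fallback is returned, so a caller running locally would still need to supply a namespace explicitly.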
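The wait-for-Istio logic itself is not part of this page. A minimal Python sketch of the idea, assuming a plain "poll until the server answers or a deadline passes" loop: `http_probe` and `wait_for_server` are hypothetical names, and the 5-second polling interval is an assumption; only the 3-minute limit comes from the commit message.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything at `url` answers an HTTP request (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # An HTTP error status (e.g. 404) still means the network path is up.
        err.close()
        return True
    except OSError:
        # Connection refused/reset, DNS failure, or timeout: not reachable yet.
        return False


def wait_for_server(
    probe: Callable[[], bool],
    max_wait: float = 180.0,  # up to 3 minutes, as in the commit message
    interval: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` until it returns True or `max_wait` seconds elapse.

    Returns True once the probe succeeds, False if the deadline passes first.
    """
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

A training entrypoint could, for example, call `wait_for_server(lambda: http_probe(METADATA_URL))` before logging with kfmd and fail fast if it returns False; `METADATA_URL` is a placeholder for wherever the metadata server is exposed in a given deployment. Injecting `sleep` and `clock` lets the loop be tested without real waiting; the `retrying` package already listed in requirements.txt offers a decorator-based alternative.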
Jeremy Lewi 2019-08-19 16:05:32 -07:00 committed by Kubernetes Prow Robot
parent 2acf34f916
commit 5b3016fae9
5 changed files with 1293 additions and 545 deletions

@ -4,4 +4,6 @@
**/__pycache__
*.zip
mlpipeline-metrics.json
mlpipeline-ui-metadata.json
mlpipeline-ui-metadata.json
build-train-deploy.py
**/.dat

@ -3,21 +3,7 @@
# This docker image is based on existing notebook image
# It also includes the dependencies required for training and deploying
# this way we can use it as the base image
FROM gcr.io/kubeflow-images-public/tensorflow-1.12.0-notebook-cpu:v0.5.0
USER root
FROM gcr.io/kubeflow-images-public/tensorflow-1.13.1-notebook-cpu:v0.5.0
COPY requirements.txt .
RUN pip3 --no-cache-dir install -r requirements.txt
RUN apt-get update -y
RUN apt-get install -y emacs
RUN pip3 install https://storage.googleapis.com/ml-pipeline/release/0.1.20/kfp.tar.gz
# Checkout kubeflow/testing because we use some of its utilities
RUN mkdir -p /src/kubeflow && \
cd /src/kubeflow && \
git clone https://github.com/kubeflow/testing.git testing
USER jovyan

File diff suppressed because it is too large.

Binary file not shown (image, 112 KiB).

@ -3,6 +3,7 @@ fire
gitpython
google-cloud-storage
joblib
kfmd
numpy
pandas
retrying