14 Commits

Author SHA1 Message Date
Zhenghui Wang 2d9a1db3aa
update xgboost_synthetic with metadata sdk 0.3.1 (#758) 2020-02-26 16:31:19 -08:00
Jeremy Lewi cc93a80420
Create a notebook for mnist E2E on GCP (#723)
* A notebook to run the mnist E2E example on GCP.

This fixes a number of issues with the example
* Use ISTIO instead of Ambassador to add reverse proxy routes
* The training job needs to be updated to run in a profile created namespace in order to have the required service accounts
     * See kubeflow/examples#713
     * Running inside a notebook running on Kubeflow should ensure user
       is running inside an appropriately setup namespace
* With ISTIO the default RBAC rules prevent the web UI from sending requests to the model server
     * A short term fix was to not include the ISTIO side car
     * In the future we can add an appropriate ISTIO rbac policy

* Using a notebook allows us to eliminate the use of kustomize
  * This resolves kubeflow/examples#713 which required people to use
    an old version of kustomize

  * Rather than using kustomize we can use python f style strings to
    write the YAML specs and then easily substitute in user specific values

  * This should be more informative; it avoids introducing kustomize and
    users can see the resource specs.

* I've opted to make the notebook GCP specific. I think it's less confusing
  to users to have separate notebooks focused on specific platforms rather
  than having one notebook with a lot of caveats about what to do under
  different conditions

* I've deleted the kustomize overlays for GCS since we don't want users to
  use them anymore

* I used fairing and kaniko to eliminate the use of docker to build the images
  so that everything can run from a notebook running inside the cluster.

* k8s_utils.py has some reusable functions to hide some details from users
  (e.g. low level calls to K8s APIs.)

* Change the mnist test to just run the notebook
  * Copy the notebook test infra for xgboost_synthetic to py/kubeflow/examples/notebook_test to make it more reusable

* Fix lint.

* Update for lint.

* A notebook to run the mnist E2E example.

Related to: kubeflow/website#1553

* 1. Use fairing to build the model. 2. Construct the YAML spec directly in the notebook. 3. Use the TFJob python SDK.

* Fix the ISTIO rule.

* Fix UI and serving; need to update TF serving to match version trained on.

* Get the IAP endpoint.

* Start writing some helper python functions for K8s.

* Commit before switching from replace to delete.

* Create a library to bulk create objects.

* Cleanup.

* Add back k8s_util.py

* Delete train.yaml; this shouldn't have been added.

* update the notebook image.

* Refactor code into k8s_util; print out links.

* Clean up the notebook. Should be working E2E.

* Added section to get logs from stackdriver.

* Add comment about profile.

* Latest.

* Override mnist_gcp.ipynb with mnist.ipynb

I accidentally put my latest changes in mnist.ipynb even though that file
was deleted.

* More fixes.

* Resolve some conflicts from the rebase; override with changes on remote branch.
2020-02-16 19:15:28 -08:00
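The f-string approach mentioned in cc93a80420 (writing the YAML specs in Python and substituting user-specific values, instead of using kustomize) could look roughly like the sketch below. This is only an illustration: the function name `render_deployment`, the resource layout, and the example name, namespace, and image are hypothetical and are not taken from the notebook.

```python
def render_deployment(name: str, namespace: str, image: str) -> str:
    """Return a Deployment spec as YAML text with user values substituted.

    Hypothetical helper: the fields below are a generic Deployment,
    not the spec used by the actual mnist notebook.
    """
    return f"""apiVersion: apps/v1
kind: Deployment
metadata:
  name: {name}
  namespace: {namespace}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {name}
  template:
    metadata:
      labels:
        app: {name}
    spec:
      containers:
      - name: model
        image: {image}
"""


# Example values are placeholders chosen for illustration only.
spec = render_deployment(
    name="mnist-model",
    namespace="kubeflow-user",
    image="gcr.io/example-project/mnist:latest",
)
print(spec)
```

Because the spec is plain text in the notebook, users can read the full resource definition before it is applied, which is the benefit the commit message describes.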
Jeremy Lewi 712c29a18e Fix issues with the xgboost_synthetic example and deploying the model. (#682)
* Fix issues with the xgboost_synthetic example and deploying the model.

* install newer version of fairing
* modify preprocessor to use custom dockerfile
* use newer 0.7 base image.
* Fix endpoint.

Related to:

kubeflow/examples#673 model doesn't deploy; it's crash looping
Related to kubeflow/examples#655 update example to work with 0.7

* Add some comments to the notebook.
2019-11-25 14:55:10 -08:00
Jeremy Lewi 7e28cd6b23 Update xgboost_synthetic test infra; preliminary updates to work with 0.7.0 (#666)
* Update xgboost_synthetic test infra to use pytest and pyfunc.

* Related to #655 update xgboost_synthetic to use workload identity

* Related to #665 no signal about xgboost_synthetic

* We need to update the xgboost_synthetic example to work with 0.7.0;
  e.g. workload identity

* This PR focuses on updating the test infra and some preliminary
  updates to the notebook

* More fixes to the test and the notebook are probably needed in order
  to get it to actually pass

* Update job spec for 0.7; remove the secret and set the default service
  account.

  * This is to make it work with workload identity

* Instead of using kustomize to define the job to run the notebook we can just modify the YAML spec using python.
* Use the python API for K8s to create the job rather than shelling out.

* Notebook should do a 0.7 compatible check for credentials

  * We don't want to assume GOOGLE_APPLICATION_CREDENTIALS is set
    because we will be using workload identity.

* Take in repos as an argument akin to what checkout_repos.sh requires

* Convert xgboost_test.py to a pytest.

  * This allows us to mark it as expected to fail so we can start to get
    signal without blocking

  * We also need to emit junit files to show up in test grid.

* Convert the jsonnet workflow for the E2E test to a python function to
  define the workflow.

  * Remove the old jsonnet workflow.

* Address comments.

* Fix issues with the notebook
* Install pip packages in user space
  * 0.7.0 images are based on TF images and they have different permissions
* Install a newer version of fairing sdk that works with workload identity

* Split pip installing dependencies out of util.py and into notebook_setup.py

  * That's because util.py could depend on the packages being installed by
    notebook_setup.py

* After pip installing the modules into user space, we need to add the local
  path for pip packages to the Python path; otherwise we get import not found
  errors.
2019-10-24 19:53:38 -07:00
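The last point in 7e28cd6b23 (making user-space pip packages importable after installing them) can be sketched as follows. This is a minimal illustration using the standard `site` module, not the code in notebook_setup.py; the function name `add_user_site_to_path` is hypothetical.

```python
import importlib
import site
import sys


def add_user_site_to_path() -> str:
    """Append the per-user site-packages directory to sys.path if missing.

    Packages installed with `pip install --user` land in this directory;
    an interpreter that was already running may not have it on sys.path.
    """
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.append(user_site)
    # Make the import system notice modules installed after startup.
    importlib.invalidate_caches()
    return user_site


user_site = add_user_site_to_path()
print(f"user site-packages: {user_site}")
```

Calling this once after the pip install step, and before importing the newly installed modules, is assumed to be enough in a notebook kernel.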
Jin Chi He cfe166f73f update to kubeflow-metadata in examples (#646) 2019-09-26 16:13:34 -07:00
Jin Chi He 628babc66a update kubeflow-fairing commit sha to use job in clusterBuild (#643) 2019-09-20 08:07:00 -07:00
Jin Chi He 78a79e72dc update example to kubeflow-fairing (#637) 2019-09-17 06:36:24 -07:00
Jin Chi He 4f8cf87d4f add testing for xgboost_synthetic (#633) 2019-09-16 15:28:24 -07:00
Jeremy Lewi 5b3016fae9 Fix a bunch of issues with the xgboost_synthetic example (#621)
* Need to add kfmd to requirements.txt because the training code now uses
  kfmd to log data.

* The Dockerfile didn't build with kaniko; it looks like a permission problem
  trying to install python files into the conda directory. The problem appears
  to be fixed by not switching to user root.

* Update the base docker image to 1.13.

* Remove some references in the notebook to namespace because the fairing
  code should now detect namespace automatically and the notebook will no longer
  be running in namespace kubeflow

* When running training in a K8s job; the code will now try to contact the
  metadata server but this can fail if the ISTIO side car hasn't started yet.
  So we need to wait for ISTIO to start; we do this by trying to contact
  the metadata server for up to 3 minutes.

* Add a lot more explanation in the notebook to explain what is happening.

* Related to #619
2019-08-19 16:05:32 -07:00
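The wait described in 5b3016fae9 (retrying the metadata server for up to 3 minutes until the ISTIO sidecar is up) might be sketched like this. The helper `wait_until_ready` and the probe callable are hypothetical; the real training code may check readiness differently. A fake probe stands in for the network call so the sketch runs anywhere.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 180.0,  # 3 minutes, as in the commit message
    interval_s: float = 5.0,   # assumed polling interval
) -> bool:
    """Call probe() until it returns True or timeout_s elapses.

    Connection errors are treated as "not ready yet", since requests
    can fail before the sidecar proxy has started.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the sidecar starts
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Fake probe for illustration: fails twice, then succeeds.
attempts = {"count": 0}


def fake_probe() -> bool:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionRefusedError("sidecar not ready")
    return True


ready = wait_until_ready(fake_probe, timeout_s=180.0, interval_s=0.0)
print(ready, attempts["count"])
```

In real use, the probe would be a function that attempts a request to the metadata server and returns True on success.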
Zhenghui Wang 22de8cf7c1 Add metadata logging to xgboost-synthetic example (#610)
* meta logging

* lint

* pip install fairing

* update predict() function
2019-08-05 20:45:54 -07:00
Chun-Hsiang Wang 6e5ba488e2 Update readme for xgboost-synthetic and remove outdated yaml file (#605)
* Update readme for xgboost-synthetic and remove outdated yaml file.

* Update the class name to be more general.

* Update readme.

* Set google_application_credentials in the notebook.

* Install fairing from master branch.

* Do not set credentials again.

* Update readme.
2019-07-22 18:20:54 -07:00
Chun-Hsiang Wang fb6cd69def Install pip dependencies and build base image with kaniko (#603)
* Install required pip packages not included in the base package.

* Use Kaniko builder to build the base image first.

* Directly install packages from requirements.txt to be more flexible.
2019-07-18 22:35:12 -07:00
Chun-Hsiang Wang cda6efed27 Include newly trained model in the newly built docker image (#601) (#602) 2019-07-17 19:50:11 -07:00
Chun-Hsiang Wang ac9f2f1238 Add kubecon demo to xgboost_ames_housing directory (#589)
* Add xgboost-ames-housing demo from Kubecon EU 2019.

* fix links in the .ipynb in the xgboost-ames-housing demo

* update to the xgboost demo example from kubecon
- move example to its own directory
- remove unnecessary files
- modify util and update notebook

* change the names related to kubecon and update readme

* use fairing instead of own fairing_util in the notebook

* remove fairing_util and move the remaining to util instead

* update synthetic data example as comments
- generalize yaml
- remove updating github procedures
- update readme
- rename files

* fix pylint.

* fix pylint.
2019-07-16 10:33:25 -07:00