* A notebook to run the mnist E2E example on GCP.
This fixes a number of issues with the example
* Use ISTIO instead of Ambassador to add reverse proxy routes
* The training job needs to be updated to run in a profile created namespace in order to have the required service accounts
* See kubeflow/examples#713
* Running inside a notebook running on Kubeflow should ensure the user
is running inside an appropriately set up namespace
* With ISTIO the default RBAC rules prevent the web UI from sending requests to the model server
* A short term fix was to not include the ISTIO side car
* In the future we can add an appropriate ISTIO rbac policy
* Using a notebook allows us to eliminate the use of kustomize
* This resolves kubeflow/examples#713 which required people to use
an old version of kustomize
* Rather than using kustomize we can use python f style strings to
write the YAML specs and then easily substitute in user specific values
* This should be more informative; it avoids introducing kustomize and
users can see the resource specs.
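
A minimal sketch of the f-string approach described above. The function name, the `apiVersion`, and all values (job name, namespace, image) are illustrative assumptions, not the spec used in the notebook:

```python
import textwrap


def render_tfjob_spec(name: str, namespace: str, image: str, workers: int = 1) -> str:
    """Return a TFJob-style YAML manifest with user-specific values filled in.

    The manifest layout is a hypothetical example; adjust it to the
    actual resource being created.
    """
    return textwrap.dedent(f"""\
        apiVersion: kubeflow.org/v1
        kind: TFJob
        metadata:
          name: {name}
          namespace: {namespace}
        spec:
          tfReplicaSpecs:
            Worker:
              replicas: {workers}
              template:
                spec:
                  containers:
                  - name: tensorflow
                    image: {image}
        """)


# Hypothetical user-specific values.
spec = render_tfjob_spec("mnist-train", "my-profile", "gcr.io/my-project/mnist:latest")
print(spec)
```

Because the spec is an ordinary string in the notebook, users can read the full resource definition before it is applied.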
* I've opted to make the notebook GCP specific. I think it's less confusing
to users to have separate notebooks focused on specific platforms rather
than having one notebook with a lot of caveats about what to do under
different conditions
* I've deleted the kustomize overlays for GCS since we don't want users to
use them anymore
* I used fairing and kaniko to eliminate the use of docker to build the images
so that everything can run from a notebook running inside the cluster.
* k8s_util.py has some reusable functions to hide some details from users
(e.g. low level calls to K8s APIs.)
* Change the mnist test to just run the notebook
* Copy the notebook test infra for xgboost_synthetic to py/kubeflow/examples/notebook_test to make it more reusable
* Fix lint.
* Update for lint.
* A notebook to run the mnist E2E example.
Related to: kubeflow/website#1553
* 1. Use fairing to build the model. 2. Construct the YAML spec directly in the notebook. 3. Use the TFJob python SDK.
* Fix the ISTIO rule.
* Fix UI and serving; need to update TF serving to match version trained on.
* Get the IAP endpoint.
* Start writing some helper python functions for K8s.
* Commit before switching from replace to delete.
* Create a library to bulk create objects.
* Cleanup.
* Add back k8s_util.py
* Delete train.yaml; this shouldn't have been added.
* update the notebook image.
* Refactor code into k8s_util; print out links.
* Clean up the notebook. Should be working E2E.
* Added section to get logs from stackdriver.
* Add comment about profile.
* Latest.
* Override mnist_gcp.ipynb with mnist.ipynb
I accidentally put my latest changes in mnist.ipynb even though that file
was deleted.
* More fixes.
* Resolve some conflicts from the rebase; override with changes on remote branch.
* Fix issues with the xgboost_synthetic example and deploying the model.
* install newer version of fairing
* modify preprocessor to use custom dockerfile
* use newer 0.7 base image.
* Fix endpoint.
Related to:
kubeflow/examples#673 model doesn't deploy; it's crash looping
Related to kubeflow/examples#655 update example to work with 0.7
* Add some comments to the notebook.
* Update xgboost_synthetic test infra to use pytest and pyfunc.
* Related to #655 update xgboost_synthetic to use workload identity
* Related to #665 no signal about xgboost_synthetic
* We need to update the xgboost_synthetic example to work with 0.7.0;
e.g. workload identity
* This PR focuses on updating the test infra and some preliminary
updates to the notebook
* More fixes to the test and the notebook are probably needed in order
to get it to actually pass
* Update job spec for 0.7; remove the secret and set the default service
account.
* This is to make it work with workload identity
* Instead of using kustomize to define the job to run the notebook we can just modify the YAML spec using python.
* Use the python API for K8s to create the job rather than shelling out.
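
A rough sketch of modifying a job spec in Python instead of kustomize, as described above. The field layout follows the standard Kubernetes `Job` schema; the helper name, the example manifest, and the service account name are hypothetical:

```python
import copy


def patch_job_spec(job: dict, name: str, image: str, service_account: str) -> dict:
    """Return a copy of a Job manifest patched for workload identity.

    Sets the job name, image, and service account, and drops
    secret-backed volumes and the credentials env var.
    """
    patched = copy.deepcopy(job)
    patched["metadata"]["name"] = name
    pod_spec = patched["spec"]["template"]["spec"]
    pod_spec["serviceAccountName"] = service_account

    # Remember which volumes were backed by a secret so their mounts can go too.
    secret_volumes = {v["name"] for v in pod_spec.get("volumes", []) if "secret" in v}
    pod_spec["volumes"] = [v for v in pod_spec.get("volumes", []) if "secret" not in v]

    for container in pod_spec["containers"]:
        container["image"] = image
        container["volumeMounts"] = [
            m for m in container.get("volumeMounts", []) if m["name"] not in secret_volumes
        ]
        container["env"] = [
            e for e in container.get("env", []) if e["name"] != "GOOGLE_APPLICATION_CREDENTIALS"
        ]
    return patched


# Hypothetical base manifest that still mounts a key file from a secret.
base_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "placeholder"},
    "spec": {"template": {"spec": {
        "restartPolicy": "Never",
        "volumes": [{"name": "sa-key", "secret": {"secretName": "user-gcp-sa"}}],
        "containers": [{
            "name": "notebook-test",
            "image": "placeholder",
            "env": [{"name": "GOOGLE_APPLICATION_CREDENTIALS", "value": "/secret/key.json"}],
            "volumeMounts": [{"name": "sa-key", "mountPath": "/secret"}],
        }],
    }}},
}

job = patch_job_spec(base_job, "xgboost-test", "gcr.io/my-project/test:latest", "my-service-account")
```

The resulting dict could then be submitted with the Kubernetes Python client, for example via `kubernetes.client.BatchV1Api().create_namespaced_job(namespace=..., body=job)`, rather than shelling out to `kubectl`.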
* Notebook should do a 0.7 compatible check for credentials
* We don't want to assume GOOGLE_APPLICATION_CREDENTIALS is set
because we will be using workload identity.
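
One way such a check might look; this is a sketch, and the function name and return values are assumptions rather than the notebook's actual code:

```python
import os


def credential_mode(environ=os.environ) -> str:
    """Report which credential source the notebook appears to be using.

    If GOOGLE_APPLICATION_CREDENTIALS is unset, assume ambient credentials
    (e.g. workload identity) instead of treating it as an error.
    """
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return "workload-identity"
    if not os.path.isfile(key_path):
        raise FileNotFoundError(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {key_path}"
        )
    return "key-file"
```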
* Take in repos as an argument akin to what checkout_repos.sh requires
* Convert xgboost_test.py to a pytest.
* This allows us to mark it as expected to fail so we can start to get
signal without blocking
* We also need to emit junit files to show up in test grid.
* Convert the jsonnet workflow for the E2E test to a python function to
define the workflow.
* Remove the old jsonnet workflow.
* Address comments.
* Fix issues with the notebook
* Install pip packages in user space
* 0.7.0 images are based on TF images and they have different permissions
* Install a newer version of fairing sdk that works with workload identity
* Split pip installing dependencies out of util.py and into notebook_setup.py
* That's because util.py could depend on the packages being installed by
notebook_setup.py
* After pip installing the modules into user space, we need to add the local
path for pip packages to the Python path; otherwise we get import not found
errors.
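
A sketch of the path fix described above, using only the standard library. The helper name is hypothetical; `site.getusersitepackages()` returns the directory that `pip install --user` installs into:

```python
import importlib
import site
import sys


def add_user_site_to_path() -> str:
    """Make packages installed with `pip install --user` importable.

    A kernel that started before the install may not have the user
    site-packages directory on sys.path yet.
    """
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.append(user_site)
    # Clear import system caches so newly installed modules are found.
    importlib.invalidate_caches()
    return user_site


user_site = add_user_site_to_path()
```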
* Need to add kfmd to requirements.txt because the training code now uses
kfmd to log data.
* The Dockerfile didn't build with kaniko; it looks like a permission problem
trying to install python files into the conda directory. The problem appears
to be fixed by not switching to user root.
* Update the base docker image to 1.13.
* Remove some references in the notebook to namespace because the fairing
code should now detect namespace automatically and the notebook will no longer
be running in namespace kubeflow
* When running training in a K8s job; the code will now try to contact the
metadata server but this can fail if the ISTIO side car hasn't started yet.
So we need to wait for ISTIO to start; we do this by trying to contact
the metadata server for up to 3 minutes.
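
The wait described above can be sketched as a simple polling loop with a deadline. The function names are hypothetical and the URL passed to the probe is a placeholder; the actual endpoint depends on which server the training code contacts:

```python
import time
import urllib.request


def wait_until_ready(probe, timeout_s=180.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call probe() until it returns True or timeout_s elapses.

    Returns True if the probe succeeded, False if the deadline passed.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET to url succeeds with HTTP 200.

    Connection errors (e.g. the sidecar proxy is not up yet) count as not ready.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

For example, `wait_until_ready(lambda: http_ok("http://<metadata-server>/"), timeout_s=180)` would retry for up to 3 minutes before giving up.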
* Add a lot more explanation in the notebook to explain what is happening.
* Related to #619
* Update readme for xgboost-synthetic and remove outdated yaml file.
* Update the class name to be more general.
* Update readme.
* Set google_application_credentials in the notebook.
* Install fairing from master branch.
* Do not set credentials again.
* Update readme.
* Install required pip packages not included in the base package.
* Use Kaniko builder to build the base image first.
* Directly install packages from requirements.txt to be more flexible.
* Add xgboost-ames-housing demo from Kubecon EU 2019.
* fix links in the .ipynb in the xgboost-ames-housing demo
* update to the xgboost demo example from kubecon
- move example to its own directory
- remove unnecessary files
- modify util and update notebook
* change the names related to kubecon and update readme
* use fairing instead of own fairing_util in the notebook
* remove fairing_util and move the remaining to util instead
* update synthetic data example per review comments
- generalize yaml
- remove updating github procedures
- update readme
- rename files
* fix pylint.
* fix pylint.