* A notebook to run the mnist E2E example on GCP.
This fixes a number of issues with the example
* Use Istio instead of Ambassador to add reverse proxy routes
* The training job needs to be updated to run in a profile created namespace in order to have the required service accounts
* See kubeflow/examples#713
* Running from a notebook on Kubeflow should ensure the user
is working in an appropriately set up namespace
* With Istio, the default RBAC rules prevent the web UI from sending requests to the model server
* A short-term fix was to not include the Istio sidecar
* In the future we can add an appropriate Istio RBAC policy
* Using a notebook allows us to eliminate the use of kustomize
* This resolves kubeflow/examples#713, which required people to use
an old version of kustomize
* Rather than using kustomize we can use Python f-strings to
write the YAML specs and then easily substitute in user-specific
values (see the sketch below)
* This should be more informative; it avoids introducing kustomize and
users can see the resource specs.
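For example, a minimal sketch of the f-string approach; the namespace, image, and bucket values are hypothetical:

```python
# Illustrative sketch: substitute user-specific values into a YAML spec
# with a Python f-string instead of a kustomize overlay. All values here
# (namespace, image, bucket) are placeholders.
namespace = "my-profile"                      # profile-created namespace
image = "gcr.io/my-project/mnist:latest"      # image built by fairing
model_dir = "gs://my-bucket/mnist/model"

train_spec = f"""apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train
  namespace: {namespace}
spec:
  tfReplicaSpecs:
    Chief:
      replicas: 1
      template:
        spec:
          containers:
          - name: tensorflow
            image: {image}
            command: ["python", "/opt/model.py", "--model_dir={model_dir}"]
"""
```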
* I've opted to make the notebook GCP-specific. I think it's less confusing
to users to have separate notebooks focused on specific platforms rather
than having one notebook with a lot of caveats about what to do under
different conditions
* I've deleted the kustomize overlays for GCS since we don't want users to
use them anymore
* I used fairing and kaniko to eliminate the use of Docker to build the images,
so that everything can run from a notebook running inside the cluster
(see the sketch below).
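A minimal sketch of the in-cluster build, assuming the kubeflow-fairing SDK; the registry, base image, and input files are placeholders:

```python
# Sketch: build the image in-cluster with kaniko via Kubeflow Fairing,
# so no local Docker daemon is needed. Registry and base image are
# assumptions, not values from the example.
from kubeflow.fairing.builders.cluster import cluster, gcs_context
from kubeflow.fairing.preprocessors import base as base_preprocessor

preprocessor = base_preprocessor.BasePreProcessor(input_files=["model.py"])

builder = cluster.ClusterBuilder(
    registry="gcr.io/my-project",               # hypothetical registry
    base_image="tensorflow/tensorflow:1.15.2-py3",
    preprocessor=preprocessor,
    context_source=gcs_context.GCSContextSource())  # kaniko context in GCS
builder.build()
print("Pushed image:", builder.image_tag)
```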
* k8s_utils.py has some reusable functions that hide low-level details
from users (e.g. direct calls to the K8s APIs).
* Change the mnist test to just run the notebook
* Copy the notebook test infra for xgboost_synthetic to py/kubeflow/examples/notebook_test to make it more reusable
* Fix lint.
* Update for lint.
* A notebook to run the mnist E2E example.
Related to: kubeflow/website#1553
* 1. Use fairing to build the model. 2. Construct the YAML spec directly in the notebook. 3. Use the TFJob Python SDK (sketched below).
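For step 3, a minimal sketch assuming the kubeflow-tfjob SDK; `train_spec` and `namespace` refer to the f-string sketch above, and the job name is illustrative:

```python
# Sketch: submit and wait on the TFJob with the TFJob Python SDK.
import yaml
from kubeflow.tfjob.api import tf_job_client as tf_job_client_module

tf_job_client = tf_job_client_module.TFJobClient()
tf_job_client.create(yaml.safe_load(train_spec), namespace=namespace)

# Block until the job reaches a terminal state.
tf_job_client.wait_for_condition(
    "mnist-train",                              # hypothetical job name
    expected_condition=["Succeeded", "Failed"],
    namespace=namespace)
```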
* Fix the ISTIO rule.
* Fix UI and serving; need to update TF serving to match version trained on.
* Get the IAP endpoint.
* Start writing some helper python functions for K8s.
* Commit before switching from replace to delete.
* Create a library to bulk create objects.
* Cleanup.
* Add back k8s_util.py
* Delete train.yaml; this shouldn't have been added.
* Update the notebook image.
* Refactor code into k8s_util; print out links.
* Clean up the notebook. Should be working E2E.
* Added a section to get logs from Stackdriver (see the sketch below).
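A minimal sketch of what that section does, assuming the google-cloud-logging client; the project and label filters are hypothetical:

```python
# Sketch: read the training pod's container logs from Stackdriver.
from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project="my-project")   # hypothetical project
log_filter = (
    'resource.type="k8s_container" '
    'resource.labels.namespace_name="my-profile" '
    'resource.labels.pod_name:"mnist-train"')
for entry in client.list_entries(filter_=log_filter, page_size=20):
    print(entry.timestamp, entry.payload)
```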
* Add comment about profile.
* Latest.
* Override mnist_gcp.ipynb with mnist.ipynb
I accidentally put my latest changes in mnist.ipynb even though that file
was deleted.
* More fixes.
* Resolve some conflicts from the rebase; override with changes on remote branch.
* Update xgboost_synthetic test infra to use pytest and pyfunc.
* Related to #655: update xgboost_synthetic to use workload identity
* Related to #665: no signal about xgboost_synthetic
* We need to update the xgboost_synthetic example to work with 0.7.0;
e.g. workload identity
* This PR focuses on updating the test infra and some preliminary
updates to the notebook
* More fixes to the test and the notebook are probably needed in order
to get it to actually pass
* Update job spec for 0.7; remove the secret and set the default service
account.
* This is to make it work with workload identity
* Instead of using kustomize to define the job to run the notebook, we can just modify the YAML spec using Python.
* Use the Python K8s API to create the job rather than shelling out (see the sketch below).
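A minimal sketch covering both bullets, assuming a base `job.yaml` and the `default-editor` service account that profiles typically bind for workload identity:

```python
# Sketch: load the job spec, adapt it for workload identity, and create it
# with the K8s Python client instead of kubectl. The file name, namespace,
# and service account are assumptions.
import yaml
from kubernetes import client, config

with open("job.yaml") as f:
    job = yaml.safe_load(f)

pod_spec = job["spec"]["template"]["spec"]
# Workload identity: use the namespace's service account rather than a
# mounted GCP credentials secret.
pod_spec["serviceAccountName"] = "default-editor"
pod_spec.pop("volumes", None)
for c in pod_spec["containers"]:
    c["env"] = [e for e in c.get("env", [])
                if e.get("name") != "GOOGLE_APPLICATION_CREDENTIALS"]

config.load_incluster_config()   # running inside the cluster
client.BatchV1Api().create_namespaced_job("my-profile", job)
```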
* The notebook should do a 0.7-compatible check for credentials (sketched below)
* We don't want to assume GOOGLE_APPLICATION_CREDENTIALS is set
because we will be using workload identity.
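A minimal sketch of such a check, assuming the google-auth package:

```python
# Sketch: 0.7-compatible credentials check. With workload identity there is
# no GOOGLE_APPLICATION_CREDENTIALS key file; application default
# credentials come from the metadata server instead.
import os
import google.auth

if os.getenv("GOOGLE_APPLICATION_CREDENTIALS"):
    print("Using key file:", os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
else:
    credentials, project = google.auth.default()
    print("Using application default credentials for project:", project)
```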
* Take in repos as an argument akin to what checkout_repos.sh requires
* Convert xgboost_test.py to a pytest.
* This allows us to mark it as expected to fail so we can start to get
signal without blocking
* We also need to emit junit files so results show up in testgrid (see the sketch below).
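A minimal sketch of the test shape; the notebook-running helper is a stub:

```python
# Sketch: mark the test as expected-to-fail so it produces signal without
# blocking presubmits.
import pytest

def run_notebook():
    # Placeholder for the logic that executes the notebook and raises on
    # any cell failure.
    raise NotImplementedError

@pytest.mark.xfail(reason="xgboost_synthetic is not passing yet; see #665")
def test_xgboost_synthetic():
    run_notebook()
```

Running it with `pytest xgboost_test.py --junitxml=/tmp/junit.xml` emits the junit file that testgrid consumes.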
* Convert the jsonnet workflow for the E2E test to a Python function that
defines the workflow (sketched below).
* Remove the old jsonnet workflow.
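A minimal sketch of the idea; the image, command, and names are illustrative rather than the actual workflow definition:

```python
# Sketch: define the Argo workflow as a plain Python function returning a
# dict, replacing the jsonnet definition. Everything here is illustrative.
def create_workflow(name, namespace):
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entrypoint": "e2e",
            "templates": [{
                "name": "e2e",
                "container": {
                    "image": "gcr.io/my-project/test-worker:latest",
                    "command": ["pytest", "xgboost_test.py",
                                "--junitxml=/tmp/junit.xml"],
                },
            }],
        },
    }
```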
* Address comments.
* Fix issues with the notebook
* Install pip packages in user space
* 0.7.0 images are based on TF images and they have different permissions
* Install a newer version of fairing sdk that works with workload identity
* Split pip installing dependencies out of util.py and into notebook_setup.py
* That's because util.py could depend on the packages being installed by
notebook_setup.py
* After pip installing the modules into user space, we need to add the local
pip package path to the Python path, otherwise we get import-not-found
errors (see the sketch below).
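A minimal sketch of the path fix:

```python
# Sketch: after `pip install --user ...`, make sure the user site-packages
# directory is importable in the current kernel.
import site
import sys

user_site = site.getusersitepackages()
if user_site not in sys.path:
    sys.path.insert(0, user_site)
```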
* Remove modules from .pylintrc
* Add lint inline exceptions
* Add inline lint exceptions, as the specific exception is not available for Pylint 1.8
* Fix string formatting logging message and remove unnecessary Pylint exception
* Update app.yaml with correct environment details
* Add object detection gRPC client
Fixes: #377
* Fix kubeflow-examples-presubmit error
object_detection_grpc_client.py depends on other files in
https://github.com/tensorflow/models.git that need to be generated
manually, so pylint fails on them.
Since mnist_DDP.py has a similar dependency, just follow the
mnist_DDP.py approach and skip lint checking for this file.
* Default to model trained with CPUs
TODO: Enable A/B testing with Seldon to load GPU and CPU models
* Check out the 1.0rc1 release, as the latest PyTorch master seems to have broken MPI backend detection
* Track changes in pytorch_mnist/training/ddp/mnist folder to trigger test jobs
* Repoint to pull images from gcr.io/kubeflow-ci built during pre-submit
* Fix image webui name
* Fix logging
* Add GCFS to CPU train
* Fix logging
* Add GCFS to CPU train
* Default to model trained with GPUs
TODO: Enable A/B testing with Seldon to load GPU and CPU models
* Fix Predict() method as Seldon expects 3 arguments (see the sketch below)
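A minimal sketch of the expected method shape; the inference body is a stub:

```python
# Sketch: Seldon's Python wrapper invokes predict(self, X, features_names),
# i.e. three arguments including self. Inference here is a placeholder.
import numpy as np

class MnistModel:
    def predict(self, X, features_names):
        X = np.asarray(X)
        # Replace with real inference over the trained PyTorch model.
        return np.zeros((X.shape[0], 10))   # placeholder class scores
```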
* Fix x reference
* Add Pytorch MNIST example
* Fix link to PyTorch MNIST example
* Fix indentation in README
* Fix lint errors
* Fix lint errors
Add prediction proto files
* Add build_image.sh script to build image and push to gcr.io
* Add pytorch-mnist-webui-release release through automatic ksonnet package
* Fix lint errors
* Add pytorch-mnist-webui-release release through automatic ksonnet package
* Add PB2 autogenerated files to ignore with Pylint
* Fix lint errors
* Add official Pytorch DDP examples to ignore with Pylint
* Fix lint errors
* Update component to web-ui release
* Update mount point to kubeflow-gcfs as the example is GCP specific
* 01_setup_a_kubeflow_cluster document complete
* Test release job while PR is WIP
* Reduce workflow name to avoid Argo error:
"must be no more than 63 characters"
* Fix extra_repos to pull worker image
* Fix testing_image using kubeflow-ci rather than kubeflow-releasing
* Fix extra_repo, only needs kubeflow/testing
* Set build_image.sh executable
* Update build_image.sh from CentralDashboard component
* Remove old reference to centraldashboard in echo message
* Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i:
https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md
* Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i:
https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md
* Add releases for the training and serving images
* Add releases for the training and serving images
* Fix testing_image using kubeflow-ci rather than kubeflow-releasing
* Fix path to Seldon-wrapper build_image.sh
* Fix image name in ksonnet parameter
* Add 02 distributed training documentation
* Add 03 serving the model documentation
Update shared persistent reference in 02 distributed training documentation
* Add 05 teardown documentation
* Add section to test the model is deployed correctly in 03 serving the model
* Add 04 querying the model documentation
* Fix ks-app to ks_app
* Set prow jobs back to postsubmit
* Set prow jobs to trigger presubmit to kubeflow-ci and postsubmit to
kubeflow-images-public
* Change to kubeflow-ci project
* Increase timeout limit during image build to compile Pytorch
* Increase timeout limit during image build to compile Pytorch
* Change build machine type to compile Pytorch for training image
* Change build machine type to compile Pytorch for training image
* Add OWNERS file to Pytorch example
* Fix typo in documentation
* Remove checking docker daemon as we are using gcloud build instead
* Use logging module rather than print()
* Remove empty file, replace with .gitignore to keep tmp folder
* Add ksonnet application to deploy model server and web-ui
Delete model server JSON manifest
* Refactor ks-app to ks_app
* Parametrise serving_model ksonnet component
Default web-ui to use the Ambassador route to Seldon
Remove form section in web-ui
* Remove default environment from ksonnet application
* Update documentation to use ksonnet application
* Fix component name in documentation
* Consolidate Pytorch train module and build_image.sh script
* Consolidate Pytorch train module
* Consolidate Pytorch train module
* Consolidate Pytorch train module and build_image.sh script
* Revert build_image.sh scripts
* Remove duplicates
* Consolidate train Dockerfiles and build_image.sh script using docker build rather than gcloud
* Fix docker build command
* Fix docker build command
* Fix image name for cpu and gpu train
* Consolidate Pytorch train module
* Consolidate train Dockerfiles and build_image.sh script using docker build rather than gcloud
* Add .pylintrc
* Resolve lint complaints in agents/trainer/task.py
* Resolve lint complaints with flask app.py
* Resolve linting issues
Remove duplicate seq2seq_utils.py from workflow/workspace/src
* Use Python 3.5.2 with pylint to match Prow
Put the pybullet import back into agents/trainer/task.py with a pylint ignore statement
Use main(_) to ensure it works with tf.app.run (see the sketch below)
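A minimal sketch of the last two points, assuming TF 1.x:

```python
# Sketch: tf.app.run() parses flags and calls main(argv), so main(_) accepts
# and ignores the argument; the unused pybullet import gets an inline
# pylint exception.
import pybullet  # pylint: disable=unused-import
import tensorflow as tf

def main(_):
    tf.logging.info("starting training")

if __name__ == "__main__":
    tf.app.run()
```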