mirror of https://github.com/kubeflow/examples.git
4 Commits
7a6dc7b911: [pytorch_mnist] Automate image build (#490)

* Add build and test presubmit jobs for Pytorch nmist example Keep postsubmit jobs as original release job to push images to examples registry
* Refactor all jobs like mnist and GIS, will drop using release jobs
* Implement test scripts and Ksonnet artifacts from mnist example to enable E2E tests
* Remove release components as they are no longer used
* Refactor YAML manifests as Ksonnet components
* Update documentation to submit training jobs from Ksonnet
* Updated to point to correct component and refactor to PytorchJob
* Add seldon image build Add train CPU and GPU in jsonnet to build workflow Add Dockerfile.ksonnet and entrypoint
* Commented out calls to tf-util until https://github.com/kubeflow/pytorch-operator/issues/108 is implemented
* Refactor to PytorchJob
* Add seldon image build Add train CPU and GPU in jsonnet to build workflow Add Dockerfile.ksonnet and entrypoint
* Refactor to PytorchJob
* Rename workflow to avoid dns issue with "_"
* Add TODO note to convert to GRPC
* Rename workflow to avoid dns issue with "_"
* Rename workflow to avoid dns issue with "_"
* Fix path to build Seldon image in Makefile
* Fix tabs in Makefile
* Fix tabs in Makefile
* Fix rule in Makefile
* Add sleep in Makefile to wait for docker ps
* Change node worker image to have docker
* Remove seldon image step from Makefile Add steps to wrap model with Seldon Add boolean flag to build Seldon steps
* Add step id build- in jsonnet
* Skip pull step for Seldon
* Fix wait for in Seldon build
* Fix lint errors
* Set useimagecache to false first time the pipeline is executed to avoid error
* Set contextDir as absolute path for Seldon step
* Remove unnecessary argument and Dockerfile in Seldon step
* Add absolute path for build in Seldon steps
* Include absolute path inside jsonnet hardcoded to GCB /workspace/ Remove setting rootDir from Makefile
* Update images with new naming from E2E tests
* Change test-worker image version
* Update images with new naming from E2E tests
* Set useimagecache to true now that we have first images built
* Fix cachelist in Seldon build
* Fix cachelist in Seldon build
* Leverage tf-operator test framework for test_runner As per https://github.com/kubeflow/pytorch-operator/issues/108
* Consolidate testing imports Rename testing package as https://github.com/kubeflow/tf-operator/pull/945 Added correct path to import test framework from tf-operator
* Add test framework in PYTHONPATH in build_template
* Remove old release jobs to build images
* Update stepimage to same as GIS example
* Bump up supported Pytorch operator versions from v1alpha2/v1beta1 to v1beta1/v1beta2 to support Kubeflow 0.5 - Refactor training manifests from v1alpha2 to v1beta2 - Update documents
* Update KF cluster version to latest to run tests
* Update KF cluster zone
* Add pylint exception while importing test_runner class from tf-operator
* Pass dummy tests to train, deploy and predict Remove no longer used test_data and conftest
* Pass dummy tests to train, deploy and predict Remove no longer used test_data and conftest

f9a707ee85: [pytorch_mnist] Point images back to gcr.io/kubeflow-examples (#360)

* Point images back to gcr.io/kubeflow-images-public
* Point images back to gcr.io/kubeflow-examples
* Point images back to gcr.io/kubeflow-examples

a630fcea34: [mnist_pytorch] fix train image (#342)

* Default to model trained with CPUs TODO: Enable A/B testing with Seldon to load GPU and CPU models
* Checkout 1.0rc1 release as latest Pytorch master seems to have MPI backend detection broken
* Track changes in pytorch_mnist/training/ddp/mnist folder to trigger test jobs
* Repoint to pull images from gcr.io/kubeflow-ci built during pre-submit
* Fix image webui name
* Fix logging
* Add GCFS to CPU train
* Fix logging
* Add GCFS to CPU train
* Default to model trained with GPUs TODO: Enable A/B testing with Seldon to load GPU and CPU models
* Fix Predict() method as Seldon expects 3 arguments
* Fix x reference

a402db1ccc: E2E Pytorch mnist example (#274)

* Add Pytorch MNIST example
* Fix link to Pytorch NMIST example
* Fix indentation in README
* Fix lint errors
* Fix lint errors Add prediction proto files
* Add build_image.sh script to build image and push to gcr.io
* Add pytorch-mnist-webui-release release through automatic ksonnet package
* Fix lint errors
* Add pytorch-mnist-webui-release release through automatic ksonnet package
* Add PB2 autogenerated files to ignore with Pylint
* Fix lint errors
* Add official Pytorch DDP examples to ignore with Pylint
* Fix lint errors
* Update component to web-ui release
* Update mount point to kubeflow-gcfs as the example is GCP specific
* 01_setup_a_kubeflow_cluster document complete
* Test release job while PR is WIP
* Reduce workflow name to avoid Argo error: "must be no more than 63 characters"
* Fix extra_repos to pull worker image
* Fix testing_image using kubeflow-ci rather than kubeflow-releasing
* Fix extra_repo, only needs kubeflow/testing
* Set build_image.sh executable
* Update build_image.sh from CentralDashboard component
* Remove old reference to centraldashboard in echo message
* Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i: https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md
* Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i: https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md
* Add releases for the training and serving images
* Add releases for the training and serving images
* Fix testing_image using kubeflow-ci rather than kubeflow-releasing
* Fix path to Seldon-wrapper build_image.sh
* Fix image name in ksonnet parameter
* Add 02 distributed training documentation
* Add 03 serving the model documentation Update shared persistent reference in 02 distributed training documentation
* Add 05 teardown documentation
* Add section to test the model is deployed correctly in 03 serving the model
* Add 04 querying the model documentation
* Fix ks-app to ks_app
* Set prow jobs back to postsubmit
* Set prow jobs to trigger presubmit to kubeflow-ci and postsubmit to kubeflow-images-public
* Change to kubeflow-ci project
* Increase timeout limit during image build to compile Pytorch
* Increase timeout limit during image build to compile Pytorch
* Change build machine type to compile Pytorch for training image
* Change build machine type to compile Pytorch for training image
* Add OWNERS file to Pytorch example
* Fix typo in documentation
* Remove checking docker daemon as we are using gcloud build instead
* Use logging module rather print()
* Remove empty file, replace with .gitignore to keep tmp folder
* Add ksonnet application to deploy model server and web-ui Delete model server JSON manifest
* Refactor ks-app to ks_app
* Parametrise serving_model ksonnet component Default web-ui to use ambassador route to seldon Remove form section in web-ui
* Remove default environment from ksonnet application
* Update documentation to use ksonnet application
* Fix component name in documentation
* Consolidate Pytorch train module and build_image.sh script
* Consolidate Pytorch train module
* Consolidate Pytorch train module
* Consolidate Pytorch train module and build_image.sh script
* Revert back build_image.sh scripts
* Remove duplicates
* Consolidate train Dockerflies and build_image.sh script using docker build rather than gcloud
* Fix docker build command
* Fix docker build command
* Fix image name for cpu and gpu train
* Consolidate Pytorch train module
* Consolidate train Dockerflies and build_image.sh script using docker build rather than gcloud