examples

Commit Graph

Author	SHA1	Message	Date
David Sabater Dinter	a9c6e69f0e	Lint fixes mnist (#581) * Remove modules from .pylintrc * Add lint inline exceptions * Add lint inline exceptions as all as the specific exception is not available for Pylint 1.8 * Fix string formatting logging message and remove unnecessary Pylint exception * Update app.yaml with correct environment details	2019-07-24 19:23:52 -07:00
David Sabater Dinter	7a6dc7b911	[pytorch_mnist] Automate image build (#490) * Add build and test presubmit jobs for Pytorch nmist example Keep postsubmit jobs as original release job to push images to examples registry * Refactor all jobs like mnist and GIS, will drop using release jobs * Implement test scripts and Ksonnet artifacts from mnist example to enable E2E tests * Remove release components as they are no longer used * Refactor YAML manifests as Ksonnet components * Update documentation to submit training jobs from Ksonnet * Updated to point to correct component and refactor to PytorchJob * Add seldon image build Add train CPU and GPU in jsonnet to build workflow Add Dockerfile.ksonnet and entrypoint * Commented out calls to tf-util until https://github.com/kubeflow/pytorch-operator/issues/108 is implemented * Refactor to PytorchJob * Add seldon image build Add train CPU and GPU in jsonnet to build workflow Add Dockerfile.ksonnet and entrypoint * Refactor to PytorchJob * Rename workflow to avoid dns issue with "_" * Add TODO note to convert to GRPC * Rename workflow to avoid dns issue with "_" * Rename workflow to avoid dns issue with "_" * Fix path to build Seldon image in Makefile * Fix tabs in Makefile * Fix tabs in Makefile * Fix rule in Makefile * Add sleep in Makefile to wait for docker ps * Change node worker image to have docker * Remove seldon image step from Makefile Add steps to wrap model with Seldon Add boolean flag to build Seldon steps * Add step id build- in jsonnet * Skip pull step for Seldon * Fix wait for in Seldon build * Fix lint errors * Set useimagecache to false first time the pipeline is executed to avoid error * Set contextDir as absolute path for Seldon step * Remove unnecessary argument and Dockerfile in Seldon step * Add absolute path for build in Seldon steps * Include absolute path inside jsonnet hardcoded to GCB /workspace/ Remove setting rootDir from Makefile * Update images with new naming from E2E tests * Change test-worker image version * Update images with new naming from E2E tests * Set useimagecache to true now that we have first images built * Fix cachelist in Seldon build * Fix cachelist in Seldon build * Leverage tf-operator test framework for test_runner As per https://github.com/kubeflow/pytorch-operator/issues/108 * Consolidate testing imports Rename testing package as https://github.com/kubeflow/tf-operator/pull/945 Added correct path to import test framework from tf-operator * Add test framework in PYTHONPATH in build_template * Remove old release jobs to build images * Update stepimage to same as GIS example * Bump up supported Pytorch operator versions from v1alpha2/v1beta1 to v1beta1/v1beta2 to support Kubeflow 0.5 - Refactor training manifests from v1alpha2 to v1beta2 - Update documents * Update KF cluster version to latest to run tests * Update KF cluster zone * Add pylint exception while importing test_runner class from tf-operator * Pass dummy tests to train, deploy and predict Remove no longer used test_data and conftest * Pass dummy tests to train, deploy and predict Remove no longer used test_data and conftest	2019-06-14 16:20:09 -07:00
Craig Sterrett	838ad79898	Fixed typo in README and one bad link Two small fixes I ran into when trying the example. One is a type, it says it's displaying an 8 when it is a 7. The other was a bad link.	2019-02-15 11:14:23 -08:00
David Sabater Dinter	152c38b386	[mnist_pytorch] Optimise build and switch backend from MPI to GLOO (#480) * Refactor Python module: - Replace MPI by GLOO as backend to avoid having to recompily Pytorch - Replace DistributedDataParallel() class with official version when using GPUs - Remove unnecessary method to disable logs in workers - Refactor run() * Simplify Dockerfile by using Pytorch 0.4 official image with Cuda and remove mpirun call	2019-01-16 11:38:52 -08:00
Hung-Ting Wen	c83ed09a77	revert back removed v1alpha2 yaml manifests (#475) * revert back removed v1alpha2 yaml manifests * Add documentation * Fix format	2019-01-14 17:08:29 -08:00
Hung-Ting Wen	4dda73afbf	Update pytorch_mnist example to use v1beta1 (#445) * Add job_mnist_DDP_CPU for v1beta1 * Add job_mnist_DDP_GPU for v1beta1 * Update 02_distributed_training.md to use v1beta1 * Remove pytorch v1alpha2 config * Add missing CPU training config	2019-01-09 05:27:35 -08:00
David Sabater Dinter	38daafa0c3	[mnist_pytorch] Update documentation (#463) * Fix link to next section, training the model * Added links to next and previous sections in training the model README * Fix link to previous section, training the model * Remove TODO list	2019-01-08 15:32:51 -08:00
David Sabater Dinter	a1f0d6dfec	Fixed some outdated comments to trigger pushing web-ui and model serve images to gcr.io/kubeflow-examples (#444)	2018-12-26 15:05:42 -08:00
David Sabater Dinter	f9a707ee85	[pytorch_mnist] Point images back to gcr.io/kubeflow-examples (#360) * Point images back to gcr.io/kubeflow-images-public * Point images back to gcr.io/kubeflow-examples * Point images back to gcr.io/kubeflow-examples	2018-11-28 22:48:16 -08:00
David Sabater Dinter	a630fcea34	[mnist_pytorch] fix train image (#342) * Default to model trained with CPUs TODO: Enable A/B testing with Seldon to load GPU and CPU models * Checkout 1.0rc1 release as latest Pytorch master seems to have MPI backend detection broken * Track changes in pytorch_mnist/training/ddp/mnist folder to trigger test jobs * Repoint to pull images from gcr.io/kubeflow-ci built during pre-submit * Fix image webui name * Fix logging * Add GCFS to CPU train * Fix logging * Add GCFS to CPU train * Default to model trained with GPUs TODO: Enable A/B testing with Seldon to load GPU and CPU models * Fix Predict() method as Seldon expects 3 arguments * Fix x reference	2018-11-24 13:22:28 -08:00
David Sabater Dinter	a402db1ccc	E2E Pytorch mnist example (#274) * Add Pytorch MNIST example * Fix link to Pytorch NMIST example * Fix indentation in README * Fix lint errors * Fix lint errors Add prediction proto files * Add build_image.sh script to build image and push to gcr.io * Add pytorch-mnist-webui-release release through automatic ksonnet package * Fix lint errors * Add pytorch-mnist-webui-release release through automatic ksonnet package * Add PB2 autogenerated files to ignore with Pylint * Fix lint errors * Add official Pytorch DDP examples to ignore with Pylint * Fix lint errors * Update component to web-ui release * Update mount point to kubeflow-gcfs as the example is GCP specific * 01_setup_a_kubeflow_cluster document complete * Test release job while PR is WIP * Reduce workflow name to avoid Argo error: "must be no more than 63 characters" * Fix extra_repos to pull worker image * Fix testing_image using kubeflow-ci rather than kubeflow-releasing * Fix extra_repo, only needs kubeflow/testing * Set build_image.sh executable * Update build_image.sh from CentralDashboard component * Remove old reference to centraldashboard in echo message * Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i: https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md * Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i: https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md * Add releases for the training and serving images * Add releases for the training and serving images * Fix testing_image using kubeflow-ci rather than kubeflow-releasing * Fix path to Seldon-wrapper build_image.sh * Fix image name in ksonnet parameter * Add 02 distributed training documentation * Add 03 serving the model documentation Update shared persistent reference in 02 distributed training documentation * Add 05 teardown documentation * Add section to test the model is deployed correctly in 03 serving the model * Add 04 querying the model documentation * Fix ks-app to ks_app * Set prow jobs back to postsubmit * Set prow jobs to trigger presubmit to kubeflow-ci and postsubmit to kubeflow-images-public * Change to kubeflow-ci project * Increase timeout limit during image build to compile Pytorch * Increase timeout limit during image build to compile Pytorch * Change build machine type to compile Pytorch for training image * Change build machine type to compile Pytorch for training image * Add OWNERS file to Pytorch example * Fix typo in documentation * Remove checking docker daemon as we are using gcloud build instead * Use logging module rather print() * Remove empty file, replace with .gitignore to keep tmp folder * Add ksonnet application to deploy model server and web-ui Delete model server JSON manifest * Refactor ks-app to ks_app * Parametrise serving_model ksonnet component Default web-ui to use ambassador route to seldon Remove form section in web-ui * Remove default environment from ksonnet application * Update documentation to use ksonnet application * Fix component name in documentation * Consolidate Pytorch train module and build_image.sh script * Consolidate Pytorch train module * Consolidate Pytorch train module * Consolidate Pytorch train module and build_image.sh script * Revert back build_image.sh scripts * Remove duplicates * Consolidate train Dockerflies and build_image.sh script using docker build rather than gcloud * Fix docker build command * Fix docker build command * Fix image name for cpu and gpu train * Consolidate Pytorch train module * Consolidate train Dockerflies and build_image.sh script using docker build rather than gcloud	2018-11-18 14:24:43 -08:00

11 Commits