Commit Graph

5 Commits

Author SHA1 Message Date
David Sabater Dinter a9c6e69f0e Lint fixes mnist (#581)
* Remove modules from .pylintrc

* Add lint inline exceptions

* Add lint inline exceptions as "all", as the specific exception is not available in Pylint 1.8

* Fix string formatting logging message and remove unnecessary Pylint exception

* Update app.yaml with correct environment details
2019-07-24 19:23:52 -07:00
David Sabater Dinter 7a6dc7b911 [pytorch_mnist] Automate image build (#490)
* Add build and test presubmit jobs for Pytorch mnist example
Keep postsubmit jobs as original release job to push images to examples registry

* Refactor all jobs like mnist and GIS, will drop using release jobs

* Implement test scripts and Ksonnet artifacts from mnist example to enable E2E tests

* Remove release components as they are no longer used

* Refactor YAML manifests as Ksonnet components

* Update documentation to submit training jobs from Ksonnet

* Updated to point to correct component and refactor to PytorchJob

* Add seldon image build
Add train CPU and GPU in jsonnet to build workflow
Add Dockerfile.ksonnet and entrypoint

* Commented out calls to tf-util until
https://github.com/kubeflow/pytorch-operator/issues/108 is implemented

* Refactor to PytorchJob

* Add seldon image build
Add train CPU and GPU in jsonnet to build workflow
Add Dockerfile.ksonnet and entrypoint

* Refactor to PytorchJob

* Rename workflow to avoid dns issue with "_"

* Add TODO note to convert to gRPC

* Rename workflow to avoid dns issue with "_"

* Rename workflow to avoid dns issue with "_"

* Fix path to build Seldon image in Makefile

* Fix tabs in Makefile

* Fix tabs in Makefile

* Fix rule in Makefile

* Add sleep in Makefile to wait for docker ps

* Change node worker image to have docker

* Remove seldon image step from Makefile
Add steps to wrap model with Seldon
Add boolean flag to build Seldon steps

* Add step id build- in jsonnet

* Skip pull step for Seldon

* Fix wait for in Seldon build

* Fix lint errors

* Set useimagecache to false the first time the pipeline is executed to avoid an error

* Set contextDir as absolute path for Seldon step

* Remove unnecessary argument and Dockerfile in Seldon step

* Add absolute path for build in Seldon steps

* Include absolute path inside jsonnet hardcoded to GCB /workspace/
Remove setting rootDir from Makefile

* Update images with new naming from E2E tests

* Change test-worker image version

* Update images with new naming from E2E tests

* Set useimagecache to true now that we have first images built

* Fix cachelist in Seldon build

* Fix cachelist in Seldon build

* Leverage tf-operator test framework for test_runner
As per https://github.com/kubeflow/pytorch-operator/issues/108

* Consolidate testing imports
Rename testing package as https://github.com/kubeflow/tf-operator/pull/945
Added correct path to import test framework from tf-operator

* Add test framework in PYTHONPATH in build_template

* Remove old release jobs to build images

* Update stepimage to same as GIS example

* Bump up supported Pytorch operator versions from v1alpha2/v1beta1 to v1beta1/v1beta2 to support Kubeflow 0.5
- Refactor training manifests from v1alpha2 to v1beta2
- Update documents

* Update KF cluster version to latest to run tests

* Update KF cluster zone

* Add pylint exception while importing test_runner class from tf-operator

* Pass dummy tests to train, deploy and predict
Remove no longer used test_data and conftest

* Pass dummy tests to train, deploy and predict
Remove no longer used test_data and conftest
2019-06-14 16:20:09 -07:00
David Sabater Dinter f9a707ee85 [pytorch_mnist] Point images back to gcr.io/kubeflow-examples (#360)
* Point images back to gcr.io/kubeflow-images-public

* Point images back to gcr.io/kubeflow-examples

* Point images back to gcr.io/kubeflow-examples
2018-11-28 22:48:16 -08:00
David Sabater Dinter a630fcea34 [mnist_pytorch] fix train image (#342)
* Default to model trained with CPUs
TODO: Enable A/B testing with Seldon to load GPU and CPU models

* Check out the 1.0rc1 release, as MPI backend detection seems to be broken in the latest Pytorch master

* Track changes in pytorch_mnist/training/ddp/mnist folder to trigger test jobs

* Repoint to pull images from gcr.io/kubeflow-ci built during pre-submit

* Fix image webui name

* Fix logging

* Add GCFS to CPU train

* Fix logging

* Add GCFS to CPU train

* Default to model trained with GPUs
TODO: Enable A/B testing with Seldon to load GPU and CPU models

* Fix Predict() method as Seldon expects 3 arguments

* Fix x reference
2018-11-24 13:22:28 -08:00
David Sabater Dinter a402db1ccc E2E Pytorch mnist example (#274)
* Add Pytorch MNIST example

* Fix link to Pytorch MNIST example

* Fix indentation in README

* Fix lint errors

* Fix lint errors
Add prediction proto files

* Add build_image.sh script to build image and push to gcr.io

* Add pytorch-mnist-webui-release release through automatic ksonnet package

* Fix lint errors

* Add pytorch-mnist-webui-release release through automatic ksonnet package

* Add PB2 autogenerated files to ignore with Pylint

* Fix lint errors

* Add official Pytorch DDP examples to ignore with Pylint

* Fix lint errors

* Update component to web-ui release

* Update mount point to kubeflow-gcfs as the example is GCP specific

* 01_setup_a_kubeflow_cluster document complete

* Test release job while PR is WIP

* Reduce workflow name to avoid Argo error:
"must be no more than 63 characters"

* Fix extra_repos to pull worker image

* Fix testing_image using kubeflow-ci rather than kubeflow-releasing

* Fix extra_repo, only needs kubeflow/testing

* Set build_image.sh executable

* Update build_image.sh from CentralDashboard component

* Remove old reference to centraldashboard in echo message

* Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i:
https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md

* Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i:
https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md

* Add releases for the training and serving images

* Add releases for the training and serving images

* Fix testing_image using kubeflow-ci rather than kubeflow-releasing

* Fix path to Seldon-wrapper build_image.sh

* Fix image name in ksonnet parameter

* Add 02 distributed training documentation

* Add 03 serving the model documentation
Update shared persistent reference in 02 distributed training documentation

* Add 05 teardown documentation

* Add section to test the model is deployed correctly in 03 serving the model

* Add 04 querying the model documentation

* Fix ks-app to ks_app

* Set prow jobs back to postsubmit

* Set prow jobs to trigger presubmit to kubeflow-ci and postsubmit to
kubeflow-images-public

* Change to kubeflow-ci project

* Increase timeout limit during image build to compile Pytorch

* Increase timeout limit during image build to compile Pytorch

* Change build machine type to compile Pytorch for training image

* Change build machine type to compile Pytorch for training image

* Add OWNERS file to Pytorch example

* Fix typo in documentation

* Remove checking docker daemon as we are using gcloud build instead

* Use logging module rather than print()

* Remove empty file, replace with .gitignore to keep tmp folder

* Add ksonnet application to deploy model server and web-ui
Delete model server JSON manifest

* Refactor ks-app to ks_app

* Parametrise serving_model ksonnet component
Default web-ui to use ambassador route to seldon
Remove form section in web-ui

* Remove default environment from ksonnet application

* Update documentation to use ksonnet application

* Fix component name in documentation

* Consolidate Pytorch train module and build_image.sh script

* Consolidate Pytorch train module

* Consolidate Pytorch train module

* Consolidate Pytorch train module and build_image.sh script

* Revert back build_image.sh scripts

* Remove duplicates

* Consolidate train Dockerfiles and build_image.sh script using docker build rather than gcloud

* Fix docker build command

* Fix docker build command

* Fix image name for cpu and gpu train

* Consolidate Pytorch train module

* Consolidate train Dockerfiles and build_image.sh script using docker build rather than gcloud
2018-11-18 14:24:43 -08:00