examples

Commit Graph

Author	SHA1	Message	Date
Inki Hwang	8e30631c54	example mnist upgrade to v1alpha2 (#246) * example mnist upgrade to v1alpha2 * Remove cleanPodPolicy * Fix kubeflow branch to v0.2.4	2018-09-09 13:01:21 -07:00
Maerville	ce2f1db11e	Fixed distributed training for LINEAR model (#130) * Fixed distributed training for LINEAR model * Make line shorter & remove pylint disable unused argument	2018-06-13 11:57:28 -07:00
Julien Stroheker	2d335b1302	minst example - Add Azure instructions / update Argo package (#114) * Adding Azure instructions * add auth	2018-05-14 09:33:25 -07:00
Elson Rodriguez	7434bb55ba	Updating mnist example to fix minio compatibility (#108) * Updating mnist example to fix minio compatibility * Changing default sa user for ksonnet entrypoint * Updating mnist example based on pr feedback.	2018-04-30 16:14:18 -07:00
Elson Rodriguez	ed60dc5972	Removing uneeded requirements, this was causing pip errors. (#83)	2018-04-16 08:58:59 -07:00
Elson Rodriguez	1be7ccb142	Fixes #2 : End to end model training/serving example using S3, Argo, and Kubeflow (#42 ) * Add awscli tools container. * Add initial readme. * Add argo skeleton. * Run a an argo job. * Artifact support and argo test * Use built container (#3) * Fix artifacts and secrets * Add work in progress tfflow (#14) * Add kvc deployment to workflow. * Switch aws repo. * wip. * Add working tfflow job. * Add sidecar that waits for MASTER completion * Pass in job-name * Add volumemanager info step * Add input parameters to step * Adds nodeaffinity and hostpath * Add fixes for workflow (#17) - Use correct images for worker and ps - Use correct aws keys - Change volumemanager to mnist - Comment unused steps - Fix volume mount to correct containers * Fix hostpath for tfjob * Download all mnist files * added GCS stored artifacts comptability to Argo * Add initial inference workflow. (#30) * Initial serving step (#31) * Adds fixes to initial serving step * Ready for rough demo: Workflow in working state * Move conflicting readme. * Initial commit, everything boots without crashing. * Working, with some python errors. * Adding explicit flags * Working with ins-outs * Letting training job exit on success * Adding documentation skeletion * trying to properly save model * Almost working * Working * Adding export script, refactored to allow model more reusability * Starting documentation * little further on docs * More doc updates, fixing sleep logic * adding urls for mnist data * Removing download logic, it's to tied in with build-in tf examples. * Added argo workflow instructions, minor cleanups. * Adding mnist client. * Fixing typos * Adding instructions for installing components. * Added ksonnet container * Adding new entrypoint. * Added helm install instructions for kvc * doing things with variables * Typos. * Added better namespace support * S3 refactor. * Added missing region variables. * Adding tensorboard support. * Addding Container for Tensorboard. 
* Added temporary flag, added install instructions for CLI. * Removing invalid ksonnet environment. * Updating readme * Cleanup currently unused pieces * Add missint cluster-role * Minor cleanup. * Adding more parameters. * added changes to allow model to train on multiple workers and fixed some doc typos * Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step. * Adding seperate deployer workflow. * Split serving working. * Adding split workflow. * More parameters. * updates as to elson comments * Revert "added changes to allow model to train on multiple workers and fixed s…" * Initial working pure-s3 workflow. * Removed wait sidecars. * Remove unused flag. * Added part two, minor doc fixes * Inverted links... * Adding diff. * Fix url syntax * Documentation updates. * Added AWS Cli * Parameterized export. * Fixing image in s3 version. * Fixed documentation issues. * KVC snippet changes, need to find last working helm chart. * Temporarily pinning kvc version. * working master model and some doc typos fixes (#13) * added changes to allow model to train on multiple workers and fixed some doc typos * Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step. * Adding seperate deployer workflow. * Split serving working. * Adding split workflow. * More parameters. * updates as to elson comments * working master model and some doc typos * fixes as to Elson * Removign whitespace differences * updating diff * Changing parameters. * Undoing whitespace. * Changing termination policy on s3 version due to unknown issue. * Updating mnist diff. * Changing train steps. * Syncing Demo changes. * Update README.md * Going S3-native for initial example. Getting rid of Master. * Minor documentation tweaks, adding params, swapping aws cli for minio. * Updating KVC version. * Switching ksonnet repo, removing model name from client. * Updating git url. 
* Adding certificate hack to avoid RBAC errors. * Pinning KVC to commit while working on PR. * Updating version. * Updates README with additional details (#14) * Updates README with additional details * Adding clarity to kubectl config commands * Fixed comma placement * Refactoring notes for github and kubernetes credentials. * Forgot to add an overview of the argo template. * Updating example based on feedback. - Removed superflous images - Clarified use of KVC - Added unaltered model - Variable cleanup * Refactored grpc image into generic base image. * minor cleanup of resubmitting section. * Switching Argo deployment to ksonnet, conslidating install instructions. * Removing old cruft, clarifying cluster requirements. * [WIP] Switching out model (#15) * Switching to new mnist example. * Parameterized model, testing export. * Got CNN model exporting. * Attempting to do distributed training with Estimator, removed seperate export. * Adding master back, otherwise Estimator complains about not having a chief. * Switching to tf.estimator.train_and_evaluate. * Minor path/var name refactor. * Adding test data and new client. * Fixed documentation to reflect new client. * Getting rid of tf job shim. * Removing KVC from example, renaming directory * Modifying parent README * Removed reference to export. * Adding reference to export. * Removing unused Dockerfile. * Removing uneeded files, simplifying how to get status, refactor model serving workflow step. * Renaming directory * Minor doc improvements, removed extra clis. * Making SSL configurable for clusters without secured s3 endpoints. * Added a tf-user account for workflow. Fixed serving bug. * Updating gke version. * Re-ran through instructions, fixed errata. * Fixing lint issues * Pylint errors * Pylint errors * Adding parenthesis back. * pylint Hacks * Disabling argument filter, model bombs without empty arg. * Removing unneeded lambdas	2018-04-06 14:34:09 -07:00

6 Commits