* Add awscli tools container.
* Add initial readme.
* Add argo skeleton.
* Run an argo job.
* Artifact support and argo test
* Use built container (#3)
* Fix artifacts and secrets
* Add work in progress tfflow (#14)
* Add kvc deployment to workflow.
* Switch aws repo.
* wip.
* Add working tfflow job.
* Add sidecar that waits for MASTER completion
* Pass in job-name
* Add volumemanager info step
* Add input parameters to step
* Adds nodeaffinity and hostpath
* Add fixes for workflow (#17)
- Use correct images for worker and ps
- Use correct aws keys
- Change volumemanager to mnist
- Comment unused steps
- Fix volume mount to correct containers
* Fix hostpath for tfjob
* Download all mnist files
* added GCS stored artifacts compatibility to Argo
* Add initial inference workflow. (#30)
* Initial serving step (#31)
* Adds fixes to initial serving step
* Ready for rough demo: Workflow in working state
* Move conflicting readme.
* Initial commit, everything boots without crashing.
* Working, with some python errors.
* Adding explicit flags
* Working with ins-outs
* Letting training job exit on success
* Adding documentation skeleton
* trying to properly save model
* Almost working
* Working
* Adding export script, refactored to allow model more reusability
* Starting documentation
* little further on docs
* More doc updates, fixing sleep logic
* adding urls for mnist data
* Removing download logic, it's too tied in with built-in tf examples.
* Added argo workflow instructions, minor cleanups.
* Adding mnist client.
* Fixing typos
* Adding instructions for installing components.
* Added ksonnet container
* Adding new entrypoint.
* Added helm install instructions for kvc
* doing things with variables
* Typos.
* Added better namespace support
* S3 refactor.
* Added missing region variables.
* Adding tensorboard support.
* Adding Container for Tensorboard.
* Added temporary flag, added install instructions for CLI.
* Removing invalid ksonnet environment.
* Updating readme
* Cleanup currently unused pieces
* Add missing cluster-role
* Minor cleanup.
* Adding more parameters.
* added changes to allow model to train on multiple workers and fixed some doc typos
* Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step.
* Adding separate deployer workflow.
* Split serving working.
* Adding split workflow.
* More parameters.
* updates per Elson's comments
* Revert "added changes to allow model to train on multiple workers and fixed s…"
* Initial working pure-s3 workflow.
* Removed wait sidecars.
* Remove unused flag.
* Added part two, minor doc fixes
* Inverted links...
* Adding diff.
* Fix url syntax
* Documentation updates.
* Added AWS Cli
* Parameterized export.
* Fixing image in s3 version.
* Fixed documentation issues.
* KVC snippet changes, need to find last working helm chart.
* Temporarily pinning kvc version.
* working master model and some doc typos fixes (#13)
* added changes to allow model to train on multiple workers and fixed some doc typos
* Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step.
* Adding separate deployer workflow.
* Split serving working.
* Adding split workflow.
* More parameters.
* updates per Elson's comments
* working master model and some doc typos
* fixes per Elson's comments
* Removing whitespace differences
* updating diff
* Changing parameters.
* Undoing whitespace.
* Changing termination policy on s3 version due to unknown issue.
* Updating mnist diff.
* Changing train steps.
* Syncing Demo changes.
* Update README.md
* Going S3-native for initial example. Getting rid of Master.
* Minor documentation tweaks, adding params, swapping aws cli for minio.
* Updating KVC version.
* Switching ksonnet repo, removing model name from client.
* Updating git url.
* Adding certificate hack to avoid RBAC errors.
* Pinning KVC to commit while working on PR.
* Updating version.
* Updates README with additional details (#14)
* Updates README with additional details
* Adding clarity to kubectl config commands
* Fixed comma placement
* Refactoring notes for github and kubernetes credentials.
* Forgot to add an overview of the argo template.
* Updating example based on feedback.
- Removed superfluous images
- Clarified use of KVC
- Added unaltered model
- Variable cleanup
* Refactored grpc image into generic base image.
* minor cleanup of resubmitting section.
* Switching Argo deployment to ksonnet, consolidating install instructions.
* Removing old cruft, clarifying cluster requirements.
* [WIP] Switching out model (#15)
* Switching to new mnist example.
* Parameterized model, testing export.
* Got CNN model exporting.
* Attempting to do distributed training with Estimator, removed separate export.
* Adding master back, otherwise Estimator complains about not having a chief.
* Switching to tf.estimator.train_and_evaluate.
* Minor path/var name refactor.
* Adding test data and new client.
* Fixed documentation to reflect new client.
* Getting rid of tf job shim.
* Removing KVC from example, renaming directory
* Modifying parent README
* Removed reference to export.
* Adding reference to export.
* Removing unused Dockerfile.
* Removing unneeded files, simplifying how to get status, refactor model serving workflow step.
* Renaming directory
* Minor doc improvements, removed extra clis.
* Making SSL configurable for clusters without secured s3 endpoints.
* Added a tf-user account for workflow. Fixed serving bug.
* Updating gke version.
* Re-ran through instructions, fixed errata.
* Fixing lint issues
* Pylint errors
* Pylint errors
* Adding parentheses back.
* pylint Hacks
* Disabling argument filter, model bombs without empty arg.
* Removing unneeded lambdas