82 Commits

Author SHA1 Message Date
Lun-Kai Hsu 12b00f2921 send request to service directly (#85) 2018-04-18 14:09:00 -07:00
Elson Rodriguez ed60dc5972 Removing unneeded requirements, this was causing pip errors. (#83) 2018-04-16 08:58:59 -07:00
Ankush Agarwal 42926a8e98 Fix IssueSummarization.py typo (#80)
/cc @jlewi
Fixes #78
2018-04-09 12:40:09 -07:00
Ankush Agarwal d01d8435bf Use ambassador to talk to the frontend ui (#71)
* Create a ksonnet app component to deploy to k8s
2018-04-06 21:50:08 -07:00
Ankush Agarwal 9f6ccde03f Polish the github issue summarization UI (#69)
* Polish the github issue summarization UI

* Add kubeflow footer
2018-04-06 21:45:08 -07:00
Ankush Agarwal e3b826a5af Rename issue_summarization.py to IssueSummarization.py (#68)
* Rename issue_summarization.py to IssueSummarization.py

* The module name is supposed to be the same as the class name
* Fix the predict method signature

* Fix lint
2018-04-06 21:40:08 -07:00
Christopher Beitel a4576e48f1 Restore runnability of example; vendor agnostic storage (#72)
* Updates to the demo docs (notebook and readme) which were out-dated in multiple places

* Removed unused tools/ dir
* Update main readme to reference the example

* Inclusion of kubeflow vendor/ tf-job code

* Illustrates how logging and rendering to an attached volume can simplify the process of viewing logs with TensorHub and exploring
render outputs.

* Storage in user-space allows it to be sym-linked into directory tree
watched by TensorHub extension (which is running tensorboard --logdir=/home/jovyan)

* I anticipate this current approach to controlling volume mounts for NFS through
ksonnet to be replaced by doing so with python as I demonstrated in
the enhance example so I wouldn't lose sleep over the ksonnet
prototypes in this commit.
2018-04-06 21:37:08 -07:00
Elson Rodriguez 1be7ccb142 Fixes #2: End to end model training/serving example using S3, Argo, and Kubeflow (#42)
* Add awscli tools container.

* Add initial readme.

* Add argo skeleton.

* Run an argo job.

* Artifact support and argo test

* Use built container (#3)

* Fix artifacts and secrets

* Add work in progress tfflow (#14)

* Add kvc deployment to workflow.

* Switch aws repo.

* wip.

* Add working tfflow job.

* Add sidecar that waits for MASTER completion

* Pass in job-name

* Add volumemanager info step

* Add input parameters to step

* Adds nodeaffinity and hostpath

* Add fixes for workflow (#17)

- Use correct images for worker and ps
- Use correct aws keys
- Change volumemanager to mnist
- Comment unused steps
- Fix volume mount to correct containers

* Fix hostpath for tfjob

* Download all mnist files

* added GCS stored artifacts compatibility to Argo

* Add initial inference workflow. (#30)

* Initial serving step (#31)

* Adds fixes to initial serving step

* Ready for rough demo: Workflow in working state

* Move conflicting readme.

* Initial commit, everything boots without crashing.

* Working, with some python errors.

* Adding explicit flags

* Working with ins-outs

* Letting training job exit on success

* Adding documentation skeleton

* trying to properly save model

* Almost working

* Working

* Adding export script, refactored to allow model more reusability

* Starting documentation

* little further on docs

* More doc updates, fixing sleep logic

* adding urls for mnist data

* Removing download logic, it's too tied in with built-in tf examples.

* Added argo workflow instructions, minor cleanups.

* Adding mnist client.

* Fixing typos

* Adding instructions for installing components.

* Added ksonnet container

* Adding new entrypoint.

* Added helm install instructions for kvc

* doing things with variables

* Typos.

* Added better namespace support

* S3 refactor.

* Added missing region variables.

* Adding tensorboard support.

* Adding Container for Tensorboard.

* Added temporary flag, added install instructions for CLI.

* Removing invalid ksonnet environment.

* Updating readme

* Cleanup currently unused pieces

* Add missing cluster-role

* Minor cleanup.

* Adding more parameters.

* added changes to allow model to train on multiple workers and fixed some doc typos

* Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step.

* Adding separate deployer workflow.

* Split serving working.

* Adding split workflow.

* More parameters.

* updates as to elson comments

* Revert "added changes to allow model to train on multiple workers and fixed s…"

* Initial working pure-s3 workflow.

* Removed wait sidecars.

* Remove unused flag.

* Added part two, minor doc fixes

* Inverted links...

* Adding diff.

* Fix url syntax

* Documentation updates.

* Added AWS Cli

* Parameterized export.

* Fixing image in s3 version.

* Fixed documentation issues.

* KVC snippet changes, need to find last working helm chart.

* Temporarily pinning kvc version.

* working master model and some doc typos fixes (#13)

* added changes to allow model to train on multiple workers and fixed some doc typos

* Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step.

* Adding separate deployer workflow.

* Split serving working.

* Adding split workflow.

* More parameters.

* updates as to elson comments

* working master model and some doc typos

* fixes as to Elson

* Removing whitespace differences

* updating diff

* Changing parameters.

* Undoing whitespace.

* Changing termination policy on s3 version due to unknown issue.

* Updating mnist diff.

* Changing train steps.

* Syncing Demo changes.

* Update README.md

* Going S3-native for initial example. Getting rid of Master.

* Minor documentation tweaks, adding params, swapping aws cli for minio.

* Updating KVC version.

* Switching ksonnet repo, removing model name from client.

* Updating git url.

* Adding certificate hack to avoid RBAC errors.

* Pinning KVC to commit while working on PR.

* Updating version.

* Updates README with additional details (#14)

* Updates README with additional details

* Adding clarity to kubectl config commands

* Fixed comma placement

* Refactoring notes for github and kubernetes credentials.

* Forgot to add an overview of the argo template.

* Updating example based on feedback.

- Removed superfluous images
- Clarified use of KVC
- Added unaltered model
- Variable cleanup

* Refactored grpc image into generic base image.

* minor cleanup of resubmitting section.

* Switching Argo deployment to ksonnet, consolidating install instructions.

* Removing old cruft, clarifying cluster requirements.

* [WIP] Switching out model (#15)

* Switching to new mnist example.

* Parameterized model, testing export.

* Got CNN model exporting.

* Attempting to do distributed training with Estimator, removed separate export.

* Adding master back, otherwise Estimator complains about not having a chief.

* Switching to tf.estimator.train_and_evaluate.

* Minor path/var name refactor.

* Adding test data and new client.

* Fixed documentation to reflect new client.

* Getting rid of tf job shim.

* Removing KVC from example, renaming directory

* Modifying parent README

* Removed reference to export.

* Adding reference to export.

* Removing unused Dockerfile.

* Removing unneeded files, simplifying how to get status, refactor model serving workflow step.

* Renaming directory

* Minor doc improvements, removed extra clis.

* Making SSL configurable for clusters without secured s3 endpoints.

* Added a tf-user account for workflow. Fixed serving bug.

* Updating gke version.

* Re-ran through instructions, fixed errata.

* Fixing lint issues

* Pylint errors

* Pylint errors

* Adding parenthesis back.

* pylint Hacks

* Disabling argument filter, model bombs without empty arg.

* Removing unneeded lambdas
2018-04-06 14:34:09 -07:00
Michelle Casbon 063c9a55c8 Add namespace to ksonnet apply command (#57)
* Add namespace to ksonnet apply command

* Resolve lint issues in flask_web/app.py
2018-04-02 09:41:02 -07:00
Ankush Agarwal 1c72cf942f Move from mlkube-testing to kubeflow-ci for test-infra (#65)
Fixes https://github.com/kubeflow/examples/issues/63
2018-03-29 15:25:03 -07:00
Michelle Casbon 97442684ea Add cwbeitel reviewer & texasmichelle approver (#64) 2018-03-29 15:21:03 -07:00
Ankush Agarwal b24152cf06 Github Issue Summarization - Train using TFJob (#55)
* Github Issue Summarization - Train using TFJob

* Create a Dockerfile to build the image for tf-job
* Create a manifest to deploy the tf-job
* Create instructions on how to do all of this

Fixes https://github.com/kubeflow/examples/issues/43

* Address comments

* Add gcloud commands
* Add ks app
* Update Dockerfile base image
* Python train.py fixes

* Remove tfjob.yaml as it is replaced by ksonnet app

* Remove plot_model_history as it is not required for tfjob training

* Don't change WORKDIR

* Address reviewer comments

* Fix links

* Fix lint issues using yapf

* Sort imports
2018-03-29 13:37:04 -07:00
Michelle Casbon 41372c9314 Add .pylintrc (#61)
* Add .pylintrc

* Resolve lint complaints in agents/trainer/task.py

* Resolve lint complaints with flask app.py

* Resolve linting issues

Remove duplicate seq2seq_utils.py from workflow/workspace/src

* Use python 3.5.2 with pylint to match prow

Put pybullet import back into agents/trainer/task.py with a pylint ignore statement
Use main(_) to ensure it works with tf.app.run
2018-03-29 08:25:02 -07:00
Michelle Casbon 1d6946ead8 [GitHub Issue Summarization] (very) simple front-end web app (#53)
* Add barebones frontend

Add instructions for querying the trained model via a simple frontend
deployed locally.

* Add instructions for running the ui in-cluster

TODO: Resolve ksonnet namespace collisions for deployed-service
prototype

* Remove reference to running trained model locally
2018-03-21 15:22:04 -07:00
Hamel Husain 611e98ef1e Update Training.ipynb (#52)
Added Model Evaluation.  Deleted Table of Contents because you need Jupyter Extension to update that, so not worth it.
2018-03-19 16:08:01 -07:00
Hamel Husain 2ec3b03ed4 Update seq2seq_utils.py (#51)
Found a mistake with calculation of BLEU Score.
2018-03-18 12:25:58 -07:00
Ankush Agarwal 45255b52e3 Add instructions to deploy the seldon core model (#46)
Update the issue summarization end to end tutorial
to deploy the seldon core model to the k8s cluster

Update the sample request and response

Related to https://github.com/kubeflow/examples/issues/11
2018-03-15 14:31:23 -07:00
Ankush Agarwal 96c11b03cc ks upgrade test/workflows and agents/app (#49) 2018-03-15 14:24:24 -07:00
Michelle Casbon c50cda05ee Add file copy instructions after training (#47)
* Add file copy instructions after training

Fix broken link in cluster setup
Fix broken env variable in Training notebook
Change notebook name from Tutorial to Training

* Fix app selector value
2018-03-14 19:14:21 -07:00
Michelle Casbon 8ec9bac09e Add detail to cluster setup instructions (#44)
* Fix folder link

* Add detail to cluster setup instructions

Add a link to the image for this example.
In Tutorial.ipynb, move mounted directory into a variable to help avoid collisions on shared clusters.
2018-03-11 22:29:11 -07:00
Jeremy Lewi 0837557219
Merge pull request #30 from cwbeitel/agents
Reinforcement learning example with TensorFlow Agents
2018-03-09 14:59:59 -08:00
Ankush Agarwal d1a2adfb01 Move from a custom tornado server to a seldon-core server for serving the model (#36)
* Create an end-to-end kubeflow example using seq2seq model (4/n)

* Move from a custom tornado server to a seldon-core model

Related to #11

* Update to use gcr.io registry for serving image
2018-03-09 14:36:12 -08:00
Pascal Vicaire db358557dd Example workflow for Github issue summarization. (#35)
* Example workflow for Github issue summarization.

* Fixing quotes in README.md

* Fixing typo in README.md
2018-03-08 16:03:10 -08:00
Jeremy Lewi 0020daee7f
Merge pull request #31 from ankushagarwal/issue_summarization_serving
Create a simple tornado server to serve the model

TODO: Create a docker image for the server and deploy on kubeflow

Related to #11
2018-03-08 06:17:55 -08:00
cwbeitel 64de15f447 Remove unused Dockerfile 2018-03-08 04:33:51 +00:00
cwbeitel f6bb597dba Remove unnecessary demo container
- Previously instructed users to build demo container via doc/Dockerfile.
- Since rendering isn't working in the notebook and the rest of the installed dependencies are now available in the base tensorflow-notebook container building a container isn't necessary.
- Users are now instructed to run the base tensorflow-notebook-cpu container and clone the example code with git.
- The git clone command refers to https://github.com/kubeflow/examples instead of the URL of this fork so the docs will be incorrect in that regard until this is merged into master. Optionally until then we can add an instruction to switch branch.
2018-03-08 04:16:49 +00:00
cwbeitel fbf43567d8 Remove demo container refs, assume users build 2018-03-08 04:00:31 +00:00
Ankush Agarwal ae774e9658
Link to issue 2018-03-07 15:34:30 -08:00
Ankush Agarwal 910a15d258
Add comment 2018-03-07 15:32:50 -08:00
Ankush Agarwal 5f741ed851
README update 2018-03-07 09:32:59 -08:00
Ankush Agarwal ae6828cf3f
Create an end-to-end kubeflow example using seq2seq model (3/n)
* Create a simple tornado server to serve the model
* TODO: Create a docker image for the server and deploy on kubeflow

Related to https://github.com/kubeflow/examples/issues/11
2018-03-07 09:27:38 -08:00
cwbeitel 4e1e42a088 reinforcement learning example with TensorFlow Agents
see also https://github.com/cwbeitel/examples-old for previous PR
2018-03-05 00:36:49 +00:00
Michelle Casbon 8be733c24f
Merge pull request #6 from puneith/contributing
Contributing
2018-03-02 09:06:12 -08:00
Michelle Casbon c795b7d091 Add issue link for GH issue summarization 2018-03-02 09:03:24 -08:00
Michelle Casbon 0ad22a29e1 Merge remote-tracking branch 'upstream/master' into contributing 2018-03-02 11:50:58 -05:00
Michelle Casbon a855d666d8 Skeleton testing framework (#18)
* First stab at adding tests to this repo

* Add prow_config.yaml & remove test-infra dir

* Add .gitignore

* Add components.workflows.prow to params.libsonnet

Change ksonnet app name

* Add package names & EXTRA_REPOS, remove steps

* Put steps back

* Remove build step

* Remove cluster setup & teardown
2018-03-01 21:30:50 -08:00
Michelle Casbon 30cf675460
Merge pull request #22 from ankushagarwal/gitignore
Add a .gitignore
2018-03-01 14:12:47 -08:00
Michelle Casbon 4a9ebf931c
Merge pull request #19 from texasmichelle/third-party
Move example to third_party folder
2018-03-01 12:08:05 -08:00
Michelle Casbon 8c8ce2cc06 Move new file into renamed dir 2018-03-01 15:06:54 -05:00
Michelle Casbon adad73bad0 Merge remote-tracking branch 'upstream/master' into third-party 2018-03-01 15:05:54 -05:00
Jeremy Lewi dfce49ba35
Merge pull request #23 from ankushagarwal/issue-summarization-tfjob
Create a Dockerfile which builds the image used for this tutorial
Update Tutorial.ipynb
Add a section on downloading the training data
Remove the Feature Extraction section
Fix typo in setup_kubeflow_cluster.md
Add training_the_model.md

Related to #14
2018-03-01 11:53:14 -08:00
Jeremy Lewi 92c5f13a2c
Merge pull request #24 from jlewi/owners
Create an initial OWNERS file.
2018-03-01 11:48:27 -08:00
Ankush Agarwal ab2803254d Update gitignore 2018-03-01 11:17:28 -08:00
Ankush Agarwal 2d225ee123 Remove Dockerfile 2018-03-01 10:48:44 -08:00
Michelle Casbon bd4ac1b1c2 Move new files into renamed directory 2018-03-01 13:44:07 -05:00
Michelle Casbon 8e3ddb2eec Merge remote-tracking branch 'upstream/master' into third-party 2018-03-01 13:41:18 -05:00
Jeremy Lewi ab2940a7f7 Create an initial OWNERS file. 2018-02-28 19:31:58 -08:00
Ankush Agarwal fbf27c4ae0 Add .ipynb_checkpoints 2018-02-28 18:14:56 -08:00
Ankush Agarwal 84292d7b45 Update training_the_model.md 2018-02-28 18:09:37 -08:00
Ankush Agarwal 37782ea7d5 Fix typo 2018-02-28 14:36:34 -08:00