examples

Commit Graph

Author	SHA1	Message	Date
Jeremy Lewi	eaf0298590	Create a deployment to run the HP/Katib controller for the GitHub issue example. (#161 ) * Some of the code is copied over from https://github.com/kubeflow/katib/tree/master/examples/GKEDemo * I think it makes sense to centralize all the code in a single place. * Update the controller program (git-issue-summarize-demo.go) so that we can specify the Docker image containing the training code. * Create a ksonnet deployment for running the controller on the cluster. * The HP tuning job isn't functional; here's an incomplete list of issues: * The training jobs launched fail because they don't have GCP credentials so they can't download the data. * We don't actually extract and report metrics back to Katib. Related to: kubeflow/katib#116	2018-07-11 08:46:25 -07:00
Michelle Casbon	836ad70421	Fix model file upload (#160 ) * Add component parameters Add model_url & port arguments to flask app Add service_type, image, and model_url parameters to ui component Fix problem argument in tensor2tensor component * Fix broken UI component Fix broken UI component structure by adding all, service, & deployment parts Add parameter defaults for tfjob to resolve failures deploying other components * Add missing imports in flask app Fix syntax error in argument parsing Remove underscores from parameter names to workaround ksonnet bug #554: https://github.com/ksonnet/ksonnet/issues/554 * Fix syntax errors in t2t instructions Add CPU image build arg to docker build command for t2t-training Fix link to ksonnet app dir Correct param names for tensor2tensor component Add missing params for tensor2tensor component Fix apply command syntax Swap out log view pod for t2t-master instead of tf-operator Fix link to training with tfjob * Fix model file upload Update default params for tfjob-v1alpha2 Fix build directory path in Makefile * Resolve lint issues Lines too long * Add specific image tag to tfjob-v1alpha2 default * Fix defaults for training output files Update image tag Add UI image tag * Revert service account secret details Update associated readme	2018-06-29 18:41:20 -07:00
Jeremy Lewi	98ed4b4a69	Fix v1alpha2 version of the T2T training job. (#158 ) * Update the Docker image for T2T to use a newer version of T2T library * Add parameters to set the GCP secret; we need GCP credentials to read from GCS even if reading a public bucket. We default to the parameters that are created automatically in the case of a GKE deployment. * Create a v1alpha2 template for the job that uses PVC.	2018-06-29 12:26:18 -07:00
Jeremy Lewi	93db7e369e	Update the GH summarization example to Kubeflow 0.2 and TFJob v1alpha2. (#157 ) * Update the GH summarization example to Kubeflow 0.2 and TFJob v1alpha2. * Upgrade the ksonnet app to Kubeflow 0.2 rc.1 * Add the examples package. * Add a .gitignore file and ignore all environments so that we won't pick up people's testing environments. * Add tfjob-v1alpha2 component; this trains the model using Keras using TFJob v1alpha2. * Update the parameters so that we use the GCP secrets created as part of the Kubeflow deployment. * Remove jlewi environment. * Verified that training ran successfully and outputted a model to GCS * There was an error about some missing arguments to a logging statement but this can be ignored although it would be good to fix. * Started working on T2T v1alpha2. Seems to be messing up the app. * Update the v1alpha2 template for the tensor2tensor job but it looks like there is an error 2018-06-29 17:45:23,369] Found unknown flag: --problem=github_issue_summarization_problem Traceback (most recent call last): File "/home/jovyan/.conda/bin/t2t-trainer", line 32, in <module> tf.app.run() File "/home/jovyan/.conda/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run _sys.exit(main(argv)) File "/home/jovyan/.conda/bin/t2t-trainer", line 28, in main t2t_trainer.main(argv) File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 334, in main exp_fn = create_experiment_fn() File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 158, in create_experiment_fn problem_name=get_problem_name(), File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 115, in get_problem_name problems = FLAGS.problems.split("-") AttributeError: 'NoneType' object has no attribute 'split'	2018-06-29 11:31:23 -07:00
Michelle Casbon	11b75edfd9	Add component parameters (#155 ) * Add component parameters Add model_url & port arguments to flask app Add service_type, image, and model_url parameters to ui component Fix problem argument in tensor2tensor component * Fix broken UI component Fix broken UI component structure by adding all, service, & deployment parts Add parameter defaults for tfjob to resolve failures deploying other components * Add missing imports in flask app Fix syntax error in argument parsing Remove underscores from parameter names to workaround ksonnet bug #554: https://github.com/ksonnet/ksonnet/issues/554 * Fix syntax errors in t2t instructions Add CPU image build arg to docker build command for t2t-training Fix link to ksonnet app dir Correct param names for tensor2tensor component Add missing params for tensor2tensor component Fix apply command syntax Swap out log view pod for t2t-master instead of tf-operator Fix link to training with tfjob	2018-06-28 13:52:21 -07:00
Puneith Kaul	174d6602ac	Update README.md (#116 )	2018-05-20 17:43:48 -07:00
Jeremy Lewi	002119010f	Fix data-downloader; parameters are in the wrong order. (#115 ) * URL should be the first argument; data dir should be the second.	2018-05-17 11:16:51 -07:00
Carol Willing	0b303e70f1	Edit navigation and markdown for github example (#93 ) * edit TF example readme * prefix tutorial steps with a number for nicer display in repo * fix typo * edit steps 4 and 5 * edit docs * add navigation and formatting edits to example	2018-05-09 12:12:54 -07:00
Jeremy Lewi	79aa2074cd	Improvements to the tensor2tensor trainer for the GitHub summarization example. (#109 ) * Improvements to the tensor2tensor trainer for the GitHub summarization example. * Simplify the launcher; we can just pass through most command line arguments and not use environment variables and command line arguments. * This makes it easier to control the job just by setting the parameters in the template rather than having to rebuild the images. * Add a Makefile to build the image. * Replace the tensor2tensor jsonnet with a newer version of the jsonnet used with T2T. * Address reviewer comments. * Install pip packages as user Jovyan * Rely on implicit string conversion with concatenation in template file.	2018-04-29 20:39:16 -07:00
Jeremy Lewi	afdd4c544e	Add a component to run TensorBoard. (#110 ) * Add a component to run TensorBoard. * Autoformat jsonnet file. * Set a default of "" for logDir; there's not a really good default location because it will depend on where the data is stored.	2018-04-29 20:34:16 -07:00
Jeremy Lewi	e12231bae3	Make it easier to demo serving and run in Katacoda (#107 ) * Make it easier to demo serving and run in Katacoda * Allow the model path to be specified via environment variables so that we could potentially load the model from PVC. * Continue to bake the model into the image so that we don't need to train in order to serve. * Parameterize download_data.sh so we could potentially fetch different sources. * Update the Makefile so that we can build and set the image for the serving component. * Fix lint. * Update the serving docs.	2018-04-28 08:11:18 -07:00
Ankush Agarwal	26d68ead6c	Replace kubeflow-images-staging with kubeflow-images-public (#99 ) Fixes https://github.com/kubeflow/kubeflow/issues/534	2018-04-27 11:46:20 -07:00
Jeremy Lewi	4b33d44af6	Support training using a PVC for the data. (#98 ) * Support training using a PVC for the data. * This will make it easier to run the example on Katacoda and non-GCP platforms. * Modify train.py so we can use a GCS location or local file paths. * Update the Dockerfile. The Jupyter Docker images had a bunch of dependencies removed and the latest images don't have the dependencies needed to run the examples. * Create a tfjob-pvc component that trains reading/writing using PVC and not GCP. * Address reviewer comments * Ignore changes to the ksonnet parameters when determining whether to include dirty and sha of the diff in the image. This way we can update the ksonnet app with the newly built image without it leading to subsequent images being marked dirty. * Fix lint issues. * Fix lint import issue.	2018-04-27 04:08:19 -07:00
Jeremy Lewi	34d6f8809d	Add a job to download the data to PVC. (#97 ) * This is the first step to doing training and serving using a PV as opposed to GCS. * This will make the sample easier to run anywhere and in particular on Katacoda. * This currently would work as follows: the user creates a PVC (ks apply ${ENV} -c data-pvc), then runs a K8s job to download the data to the PVC (ks apply ${ENV} -c data-downloader). In subsequent PRs we will update the train and serve steps to load the model from the PVC as opposed to GCS. Related to #91	2018-04-25 10:36:02 -07:00
Michelle Casbon	1a4f4dc1ea	Remove vendor from .gitignore (#94 ) * Remove vendor from .gitignore * Tell pylint to ignore generated file	2018-04-24 15:25:01 -07:00
Michelle Casbon	fb2fb26f71	Add demo scripts & improvements to instructions (#84 ) * Add setup scripts & github token param * Clarify instructions Add pointers to resolution for common friction points of new cluster setup: GitHub rate limiting and RBAC permissions Setup persistent disk before Jupyterhub so that it is only setup once Clarify instructions about copying trained model files locally Add version number to frontend image build Add github_token ks parameter for frontend * Change port to 8080 Fix indentation of bullet points * Fix var name & link spacing * Update description of serving script * Use a single ksonnet environment Move ksonnet app out of notebooks subdirectory Rename ksonnet app to ks-kubeflow Update instructions & scripts Remove instructions to delete ksonnet app directory * Remove github access token	2018-04-23 16:23:59 -07:00
Ankush Agarwal	6cf382f597	Distributed training using tensor2tensor (#86 ) * Distributed training using tensor2tensor * Use a transformer model to train the github issue summarization problem * Dockerfile for building training image * ksonnet component for deploying tfjob Fixes https://github.com/kubeflow/examples/issues/43 * Fix lint issues	2018-04-19 17:43:59 -07:00
Lun-Kai Hsu	12b00f2921	send request to service directly (#85 )	2018-04-18 14:09:00 -07:00
Ankush Agarwal	42926a8e98	Fix IssueSummarization.py typo (#80 ) /cc @jlewi Fixes #78	2018-04-09 12:40:09 -07:00
Ankush Agarwal	d01d8435bf	Use ambassador to talk to the frontend ui (#71 ) * Create a ksonnet app component to deploy to k8s	2018-04-06 21:50:08 -07:00
Ankush Agarwal	9f6ccde03f	Polish the github issue summarization UI (#69 ) * Polish the github issue summarization UI * Add kubeflow footer	2018-04-06 21:45:08 -07:00
Ankush Agarwal	e3b826a5af	Rename issue_summarization.py to IssueSummarization.py (#68 ) * Rename issue_summarization.py to IssueSummarization.py * The module name is supposed to be the same as the class name * Fix the predict method signature * Fix lint	2018-04-06 21:40:08 -07:00
Michelle Casbon	063c9a55c8	Add namespace to ksonnet apply command (#57 ) * Add namespace to ksonnet apply command * Resolve lint issues in flask_web/app.py	2018-04-02 09:41:02 -07:00
Ankush Agarwal	b24152cf06	Github Issue Summarization - Train using TFJob (#55 ) * Github Issue Summarization - Train using TFJob * Create a Dockerfile to build the image for tf-job * Create a manifest to deploy the tf-job * Create instructions on how to do all of this Fixes https://github.com/kubeflow/examples/issues/43 * Address comments * Add gcloud commands * Add ks app * Update Dockerfile base image * Python train.py fixes * Remove tfjob.yaml as it is replaced by ksonnet app * Remove plot_model_history as it is not required for tfjob training * Don't change WORKDIR * Address reviewer comments * Fix links * Fix lint issues using yapf * Sort imports	2018-03-29 13:37:04 -07:00
Michelle Casbon	41372c9314	Add .pylintrc (#61 ) * Add .pylintrc * Resolve lint complaints in agents/trainer/task.py * Resolve lint complaints with flask app.py * Resolve linting issues Remove duplicate seq2seq_utils.py from workflow/workspace/src * Use python 3.5.2 with pylint to match prow Put pybullet import back into agents/trainer/task.py with a pylint ignore statement Use main(_) to ensure it works with tf.app.run	2018-03-29 08:25:02 -07:00
Michelle Casbon	1d6946ead8	[GitHub Issue Summarization] (very) simple front-end web app (#53 ) * Add barebones frontend Add instructions for querying the trained model via a simple frontend deployed locally. * Add instructions for running the ui in-cluster TODO: Resolve ksonnet namespace collisions for deployed-service prototype * Remove reference to running trained model locally	2018-03-21 15:22:04 -07:00
Hamel Husain	611e98ef1e	Update Training.ipynb (#52 ) Added Model Evaluation. Deleted Table of Contents because you need Jupyter Extension to update that, so not worth it.	2018-03-19 16:08:01 -07:00
Hamel Husain	2ec3b03ed4	Update seq2seq_utils.py (#51 ) Found a mistake with calculation of BLEU Score.	2018-03-18 12:25:58 -07:00
Ankush Agarwal	45255b52e3	Add instructions to deploy the seldon core model (#46 ) Update the issue summarization end to end tutorial to deploy the seldon core model to the k8s cluster Update the sample request and response Related to https://github.com/kubeflow/examples/issues/11	2018-03-15 14:31:23 -07:00
Michelle Casbon	c50cda05ee	Add file copy instructions after training (#47 ) * Add file copy instructions after training Fix broken link in cluster setup Fix broken env variable in Training notebook Change notebook name from Tutorial to Training * Fix app selector value	2018-03-14 19:14:21 -07:00
Michelle Casbon	8ec9bac09e	Add detail to cluster setup instructions (#44 ) * Fix folder link * Add detail to cluster setup instructions Add a link to the image for this example. In Tutorial.ipynb, move mounted directory into a variable to help avoid collisions on shared clusters.	2018-03-11 22:29:11 -07:00
Ankush Agarwal	d1a2adfb01	Move from a custom tornado server to a seldon-core server for serving the model (#36 ) * Create an end-to-end kubeflow example using seq2seq model (4/n) * Move from a custom tornado server to a seldon-core model Related to #11 * Update to use gcr.io registry for serving image	2018-03-09 14:36:12 -08:00
Pascal Vicaire	db358557dd	Example workflow for Github issue summarization. (#35 ) * Example workflow for Github issue summarization. * Fixing quotes in README.md * Fixing typo in README.md	2018-03-08 16:03:10 -08:00
Ankush Agarwal	ae774e9658	Link to issue	2018-03-07 15:34:30 -08:00
Ankush Agarwal	910a15d258	Add comment	2018-03-07 15:32:50 -08:00
Ankush Agarwal	5f741ed851	README update	2018-03-07 09:32:59 -08:00
Ankush Agarwal	ae6828cf3f	Create an end-to-end kubeflow example using seq2seq model (3/n) * Create a simple tornado server to serve the model * TODO: Create a docker image for the server and deploy on kubeflow Related to https://github.com/kubeflow/examples/issues/11	2018-03-07 09:27:38 -08:00
Michelle Casbon	8c8ce2cc06	Move new file into renamed dir	2018-03-01 15:06:54 -05:00
Michelle Casbon	adad73bad0	Merge remote-tracking branch 'upstream/master' into third-party	2018-03-01 15:05:54 -05:00
Michelle Casbon	bd4ac1b1c2	Move new files into renamed directory	2018-03-01 13:44:07 -05:00
Michelle Casbon	8e3ddb2eec	Merge remote-tracking branch 'upstream/master' into third-party	2018-03-01 13:41:18 -05:00
Michelle Casbon	76862c5141	Remove third_party folder & MIT license file	2018-02-27 13:17:42 -05:00

92 Commits