examples

Commit Graph

Author	SHA1	Message	Date
Jeremy Lewi	eaf0298590	Create a deployment to run the HP/Katib controller for the GitHub issue example. (#161 ) * Some of the code is copied over from https://github.com/kubeflow/katib/tree/master/examples/GKEDemo * I think it makes sense to centralize all the code in a single place. * Update the controller program (git-issue-summarize-demo.go) so that can specify the Docker image containing the training code. * Create a ksonnet deployment for running the controller on the cluster. * The HP tuning job isn't functional here's an incomplete list of issues * The training jobs launched fail because they don't have GCP credentials so they can't download the data. * We don't actually extract and report metrics back to Katib. Related to: kubeflow/katib#116	2018-07-11 08:46:25 -07:00
Sanyam Kapoor	d692db36e8	Search UI Components (#168 ) * Initialize search UI. Needs connection to search service * Fix page title * Add component for code search results, dummy values for now * Fix title and manifest * Add mock loading UI. Need to fill in real API results * Wrap application into Dockerfile	2018-07-10 20:08:25 -07:00
Sanyam Kapoor	c5f13464b4	Add negative sampling to Transformer network (#167 ) * Add negative sampling to Transformer network * Add generate data flag, can skip t2t-datagen step	2018-07-04 20:14:22 -07:00
Daniel Castellanos	b6a3c4c0ea	Added tutorial for object detection distributed training (#74 ) * Added tutorial for object detection distributed training Added steps on how to leverage kubeflow tooling to submit a distributed object detection training job in a small kubernetes cluster (minikube, 2-4 node cluster) * Added Jobs to prepare the training data and model * Updated instructions * fixed typos and added export tf graph job * Fixed paths in jobs and instructions * Enhanced instructions and re-arranged folder structure * Updated links to kubeflow user guide documentation	2018-07-03 14:10:20 -07:00
Sanyam Kapoor	5a9748bf8f	Add similarity transformer body (#159 ) * Add similarity transformer body * Update pipeline to Write a single CSV file * Fix lint errors * Use CSV writer to handle formatting rows * Use direct transformer encoding methods with variable scopes * Complete end-to-end training with new model and problem * Read from mutliple csv files	2018-07-03 11:14:19 -07:00
Michelle Casbon	836ad70421	Fix model file upload (#160 ) * Add component parameters Add model_url & port arguments to flask app Add service_type, image, and model_url parameters to ui component Fix problem argument in tensor2tensor component * Fix broken UI component Fix broken UI component structure by adding all, service, & deployment parts Add parameter defaults for tfjob to resolve failures deploying other components * Add missing imports in flask app Fix syntax error in argument parsing Remove underscores from parameter names to workaround ksonnet bug #554: https://github.com/ksonnet/ksonnet/issues/554 * Fix syntax errors in t2t instructions Add CPU image build arg to docker build command for t2t-training Fix link to ksonnet app dir Correct param names for tensor2tensor component Add missing params for tensor2tensor component Fix apply command syntax Swap out log view pod for t2t-master instead of tf-operator Fix link to training with tfjob * Fix model file upload Update default params for tfjob-v1alpha2 Fix build directory path in Makefile * Resolve lint issues Lines too long * Add specific image tag to tfjob-v1alpha2 default * Fix defaults for training output files Update image tag Add UI image tag * Revert service account secret details Update associated readme	2018-06-29 18:41:20 -07:00
Jeremy Lewi	98ed4b4a69	Fix v1alpha2 version of the T2T training job. (#158 ) * Update the Docker image for T2T to use a newer version of T2T library * Add parameters to set the GCP secret; we need GCP credentials to read from GCS even if reading a public bucket. We default to the parameters that are created automatically in the case of a GKE deployment. * Create a v1alpha2 template for the job that uses PVC.	2018-06-29 12:26:18 -07:00
Jeremy Lewi	93db7e369e	Update the GH summarization example to Kubeflow 0.2 and TFJob v1alpha2. (#157 ) * Update the GH summarization example to Kubeflow 0.2 and TFJob v1alpha2. * Upgrade the ksonnet app to Kubeflow 0.2 rc.1 * Add the examples package. * Add a .gitignore file and ignore all environments so that we won't pick up people's testing environments. * Add tfjob-v1alpha2 component; this trains the model using Keras using TFJob v1alpha2. * Update the parameters so that we use the GCP secrets created as part of the Kubeflow deployment. * Remove jlewi environment. * Verified that training ran successfully and outputted a model to GCS * There was an error about some missing arguments to a logging statement but this can be ignored although it would be good to fix. * Started working on T2T v1alpha2. Seems to be messing up the app. * Update the v1alpha2 template for the tensor2tensor job but it looks like there is an error 2018-06-29 17:45:23,369] Found unknown flag: --problem=github_issue_summarization_problem Traceback (most recent call last): File "/home/jovyan/.conda/bin/t2t-trainer", line 32, in <module> tf.app.run() File "/home/jovyan/.conda/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run _sys.exit(main(argv)) File "/home/jovyan/.conda/bin/t2t-trainer", line 28, in main t2t_trainer.main(argv) File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 334, in main exp_fn = create_experiment_fn() File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 158, in create_experiment_fn problem_name=get_problem_name(), File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 115, in get_problem_name problems = FLAGS.problems.split("-") AttributeError: 'NoneType' object has no attribute 'split'	2018-06-29 11:31:23 -07:00
Sanyam Kapoor	c1b2802313	Add new TF-Serving component with sample task (#152 ) * Add new TF-Serving component with sample task * Unify nmslib and t2t packages, need to be cohesive * [WIP] update references to the package * Replace old T2T problem * Add representative code for encoding/decoding from tf serving service * Add rest API port to TF serving (replaces custom http proxy) * Fix linting * Add NMSLib creator and server components * Add docs to CLI module	2018-06-28 20:37:21 -07:00
Michelle Casbon	11b75edfd9	Add component parameters (#155 ) * Add component parameters Add model_url & port arguments to flask app Add service_type, image, and model_url parameters to ui component Fix problem argument in tensor2tensor component * Fix broken UI component Fix broken UI component structure by adding all, service, & deployment parts Add parameter defaults for tfjob to resolve failures deploying other components * Add missing imports in flask app Fix syntax error in argument parsing Remove underscores from parameter names to workaround ksonnet bug #554: https://github.com/ksonnet/ksonnet/issues/554 * Fix syntax errors in t2t instructions Add CPU image build arg to docker build command for t2t-training Fix link to ksonnet app dir Correct param names for tensor2tensor component Add missing params for tensor2tensor component Fix apply command syntax Swap out log view pod for t2t-master instead of tf-operator Fix link to training with tfjob	2018-06-28 13:52:21 -07:00
Sanyam Kapoor	f20161167e	Add a new similarity transformer model, register new problem (#146 ) * Add a new similarity transformer model, register new problem * Remove useless constructor	2018-06-27 11:00:18 -07:00
Sanyam Kapoor	656e1e3e7c	Extension of T2T Ksonnet component (#149 ) * Add jobs derived from t2t component, GCP credentials assumed * Add script to create IAM role bindings for Docker container to use * Fix names to hyphens * Add t2t-exporter wrapper * Fix typos * A temporary workaround for tensorflow/tensor2tensor#879 * Complete working pipeline of datagen, trainer and exporter * Add docstring to create_secrets.sh	2018-06-25 15:09:22 -07:00
Sanyam Kapoor	21506ffc51	Python package for indexing and serving the index (#150 ) * Add a utility python package for indexing and serving the index * Add CLI arguments, conditional GCS download * Complete skeleton CLIs for serving and index creation * Fix lint issues	2018-06-20 15:34:05 -07:00
Sanyam Kapoor	4bd30a1e68	Language task on kubeflow (#143 ) * [WIP] initialize ksonnet app * Push images to GCR * Upgrade Docker container to run T2T entrypoint with appropriate env vars * Add a tf-job based t2t-job * Fix GPU parameters	2018-06-15 18:16:34 -07:00
Sanyam Kapoor	242c2e6d20	Add custom metrics, write raw tokens to GCS (#141 ) * Add custom metrics, write raw tokens to GCS * Change number of output file shards to 1	2018-06-13 12:03:27 -07:00
Maerville	ce2f1db11e	Fixed distributed training for LINEAR model (#130 ) * Fixed distributed training for LINEAR model * Make line shorter & remove pylint disable unused argument	2018-06-13 11:57:28 -07:00
Sanyam Kapoor	3bff3339f7	Isolate t2t execution into docker (#131 ) * Isolate t2t execution into a docker * Add image build script, update run interface * Fix grammar typo	2018-06-12 12:53:29 -07:00
Sanyam Kapoor	d3c781772c	Language modeling using Transformer Networks (#129 ) * Add Github language modeling problem * Rename folders, update README with datagen and train scripts * Fix linting	2018-06-07 06:31:22 -07:00
Sanyam Kapoor	f4c8b7f80d	Add error handling to Dataflow (#128 ) * Add error handling to dataflow * Fix lint issues * Update pipeline with error handling on tokenization and info splitting	2018-06-06 21:46:24 -07:00
Christopher Beitel	05f90b8ebd	Tune application-focused description (#124 )	2018-06-06 19:56:24 -07:00
Sanyam Kapoor	6220907044	New tensor2tensor problem datagen for function summarization (#127 ) * New tensor2tensor problem for function summarization * Consolidate README with improved docs * Remove old readme * Add T2T Trainer using Transformer Networks * Fix missing requirement for t2t-trainer	2018-06-06 00:38:58 -07:00
Sanyam Kapoor	17dd02b803	Add num workers options to Dataflow (#125 )	2018-06-05 17:05:56 -07:00
Michelle Casbon	bb7451c0ba	Proposed repo strategy (#117 ) * Proposed repo strategy Define and describe example types (end-to-end, component-based, third-party hosted) Define requirements for housing examples Update list of ideas for additional examples * Add get involved * Move descriptions into CONTRIBUTING Add application-specific category Add clarifying details	2018-06-05 09:57:57 -07:00
Sanyam Kapoor	e26a290f0f	Fix utf-8 encoding issues (#122 )	2018-06-01 10:35:56 -07:00
Sanyam Kapoor	26ff66d747	Semantic Code Search Example Data Ingestion (#120 ) * Code Search Preprocessing Pipeline * Add missing pipeline execution to git tree * Move the preprocessing step into its own package * Add docstrings * Fix pylint errors	2018-05-31 15:28:56 -07:00
Puneith Kaul	174d6602ac	Update README.md (#116 )	2018-05-20 17:43:48 -07:00
Jeremy Lewi	002119010f	Fix data-downloader; parameters are in the wrong order. (#115 ) * URL should be the first argument; data dir should be the second.	2018-05-17 11:16:51 -07:00
Julien Stroheker	2d335b1302	minst example - Add Azure instructions / update Argo package (#114 ) * Adding Azure instructions * add auth	2018-05-14 09:33:25 -07:00
Carol Willing	0b303e70f1	Edit navigation and markdown for github example (#93 ) * edit TF example readme * prefix tutorial steps with a number for nicer display in repo * fix typo * edit steps 4 and 5 * edit docs * add navigation and formatting edits to example	2018-05-09 12:12:54 -07:00
Elson Rodriguez	7434bb55ba	Updating mnist example to fix minio compatibility (#108 ) * Updating mnist example to fix minio compatibility * Changing default sa user for ksonnet entrypoint * Updating mnist example based on pr feedback.	2018-04-30 16:14:18 -07:00
Jeremy Lewi	79aa2074cd	Improvements to the tensor2tensor trainer for the GitHub summarization example. (#109 ) * Improvements to the tensor2tensor traininer for the GitHub summarization example. * Simplify the launcher; we can just pass through most command line arguments and not use environment variables and command line arguments. * This makes it easier to control the job just by setting the parameters in the template rather than having to rebuild the images. * Add a Makefile to build the image. * Replace the tensor2tensor jsonnet with a newer version of the jsonnet used with T2T. * Address reviewer comments. * Install pip packages as user Jovyan * Rely on implicit string conversion with concatenation in template file.	2018-04-29 20:39:16 -07:00
Jeremy Lewi	afdd4c544e	Add a component to run TensorBoard. (#110 ) * Add a component to run TensorBoard. * Autoformate jsonnet file. * * Set a default of "" for logDir; there's not a really good default location because it will depend on where the data is stored.	2018-04-29 20:34:16 -07:00
Jeremy Lewi	e12231bae3	Make it easier to demo serving and run in Katacoda (#107 ) * Make it easier to demo serving and run in Katacoda * Allow the model path to be specified via environment variables so that we could potentially load the model from PVC. * Continue to bake the model into the image so that we don't need to train in order to serve. * Parameterize download_data.sh so we could potentially fetch different sources. * Update the Makefile so that we can build and set the image for the serving component. * Fix lint. * Update the serving docs.	2018-04-28 08:11:18 -07:00
Ankush Agarwal	26d68ead6c	Replace kubeflow-images-staging with kubeflow-images-public (#99 ) Fixes https://github.com/kubeflow/kubeflow/issues/534	2018-04-27 11:46:20 -07:00
Jeremy Lewi	4b33d44af6	Support training using a PVC for the data. (#98 ) * Support training using a PVC for the data. * This will make it easier to run the example on Katacoda and non-GCP platforms. * Modify train.py so we can use a GCS location or local file paths. * Update the Dockerfile. The jupyter Docker images and had a bunch of dependencies removed and the latest images don't have the dependencies needed to run the examples. * Creat a tfjob-pvc component that trains reading/writing using PVC and not GCP. * * Address reviewer comments * Ignore changes to the ksonnet parameters when determining whether to include dirty and sha of the diff in the image. This way we can update the ksonnet app with the newly built image without it leading to subsequent images being marked dirty. * Fix lint issues. * Fix lint import issue.	2018-04-27 04:08:19 -07:00
Jeremy Lewi	34d6f8809d	Add a job to download the data to PVC. (#97 ) * This is the first step to doing training and serving using a PV as opposed to GCS. * This will make the sample easier to run anyhere and in particular on Katacoda. * This currently would work as follows User creates a PVC ks apply ${ENV} -c data-pvc User runs a K8s job to download the data to PVC ks apply ${ENV} -c data-downloader In subsequent PRs we will update the train and serve steps to load the model from the PVC as opposed to GCS. Related to #91	2018-04-25 10:36:02 -07:00
Michelle Casbon	1a4f4dc1ea	Remove vendor from .gitignore (#94 ) * Remove vendor from .gitignore * Tell pylint to ignore generated file	2018-04-24 15:25:01 -07:00
Ankush Agarwal	a5d808cc88	Fix failing test due to https://github.com/kubeflow/testing/pull/111 (#95 )	2018-04-24 12:11:00 -07:00
Michelle Casbon	fb2fb26f71	Add demo scripts & improvements to instructions (#84 ) * Add setup scripts & github token param * Clarify instructions Add pointers to resolution for common friction points of new cluster setup: GitHub rate limiting and RBAC permissions Setup persistent disk before Jupyterhub so that it is only setup once Clarify instructions about copying trained model files locally Add version number to frontend image build Add github_token ks parameter for frontend * Change port to 8080 Fix indentation of bullet points * Fix var name & link spacing * Update description of serving script * Use a single ksonnet environment Move ksonnet app out of notebooks subdirectory Rename ksonnet app to ks-kubeflow Update instructions & scripts Remove instructions to delete ksonnet app directory * Remove github access token	2018-04-23 16:23:59 -07:00
Ankush Agarwal	6cf382f597	Distributed training using tensor2tensor (#86 ) * Distributed training using tensor2tensor * Use a transformer model to train the github issue summarization problem * Dockerfile for building training image * ksonnet component for deploying tfjob Fixes https://github.com/kubeflow/examples/issues/43 * Fix lint issues	2018-04-19 17:43:59 -07:00
Lun-Kai Hsu	12b00f2921	send request to service directly (#85 )	2018-04-18 14:09:00 -07:00
Elson Rodriguez	ed60dc5972	Removing uneeded requirements, this was causing pip errors. (#83 )	2018-04-16 08:58:59 -07:00
Ankush Agarwal	42926a8e98	Fix IssueSummarization.py typo (#80 ) /cc @jlewi Fixes #78	2018-04-09 12:40:09 -07:00
Ankush Agarwal	d01d8435bf	Use ambassador to talk to the frontend ui (#71 ) * Create a ksonnet app component to deploy to k8s	2018-04-06 21:50:08 -07:00
Ankush Agarwal	9f6ccde03f	Polish the github issue summarization UI (#69 ) * Polish the github issue summarization UI * Add kubeflow footer	2018-04-06 21:45:08 -07:00
Ankush Agarwal	e3b826a5af	Rename issue_summarization.py to IssueSummarization.py (#68 ) * Rename issue_summarization.py to IssueSummarization.py * The module name is supposed to be the same as the class name * Fix the predict method signature * Fix lint	2018-04-06 21:40:08 -07:00
Christopher Beitel	a4576e48f1	Restore runnability of example; vendor agnostic storage (#72 ) * Updates to the demo docs (notebook and readme) which were out-dated in multiple places * Removed unused tools/ dir * Update main readme to reference the example * Inclusion of kubeflow vendor/ tf-job code * Illustrates how logging and rendering to an attached volume can simplify the process of viewing logs with TensorHub and exploring render outputs. * Storage in user-space allows it to be sym-linked into directory tree watched by TensorHub extension (which is running tensorboard --logdir=/home/jovyan) * I anticipate this current approach to controlling volume mounts for NFS through ksonnet to be replaced by doing so with python as I demonstrated in the enhance example so I wouldn't lose sleep over the ksonnet prototypes in this commit.	2018-04-06 21:37:08 -07:00
Elson Rodriguez	1be7ccb142	Fixes #2 : End to end model training/serving example using S3, Argo, and Kubeflow (#42 ) * Add awscli tools container. * Add initial readme. * Add argo skeleton. * Run a an argo job. * Artifact support and argo test * Use built container (#3) * Fix artifacts and secrets * Add work in progress tfflow (#14) * Add kvc deployment to workflow. * Switch aws repo. * wip. * Add working tfflow job. * Add sidecar that waits for MASTER completion * Pass in job-name * Add volumemanager info step * Add input parameters to step * Adds nodeaffinity and hostpath * Add fixes for workflow (#17) - Use correct images for worker and ps - Use correct aws keys - Change volumemanager to mnist - Comment unused steps - Fix volume mount to correct containers * Fix hostpath for tfjob * Download all mnist files * added GCS stored artifacts comptability to Argo * Add initial inference workflow. (#30) * Initial serving step (#31) * Adds fixes to initial serving step * Ready for rough demo: Workflow in working state * Move conflicting readme. * Initial commit, everything boots without crashing. * Working, with some python errors. * Adding explicit flags * Working with ins-outs * Letting training job exit on success * Adding documentation skeletion * trying to properly save model * Almost working * Working * Adding export script, refactored to allow model more reusability * Starting documentation * little further on docs * More doc updates, fixing sleep logic * adding urls for mnist data * Removing download logic, it's to tied in with build-in tf examples. * Added argo workflow instructions, minor cleanups. * Adding mnist client. * Fixing typos * Adding instructions for installing components. * Added ksonnet container * Adding new entrypoint. * Added helm install instructions for kvc * doing things with variables * Typos. * Added better namespace support * S3 refactor. * Added missing region variables. * Adding tensorboard support. * Addding Container for Tensorboard. 
* Added temporary flag, added install instructions for CLI. * Removing invalid ksonnet environment. * Updating readme * Cleanup currently unused pieces * Add missint cluster-role * Minor cleanup. * Adding more parameters. * added changes to allow model to train on multiple workers and fixed some doc typos * Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step. * Adding seperate deployer workflow. * Split serving working. * Adding split workflow. * More parameters. * updates as to elson comments * Revert "added changes to allow model to train on multiple workers and fixed s…" * Initial working pure-s3 workflow. * Removed wait sidecars. * Remove unused flag. * Added part two, minor doc fixes * Inverted links... * Adding diff. * Fix url syntax * Documentation updates. * Added AWS Cli * Parameterized export. * Fixing image in s3 version. * Fixed documentation issues. * KVC snippet changes, need to find last working helm chart. * Temporarily pinning kvc version. * working master model and some doc typos fixes (#13) * added changes to allow model to train on multiple workers and fixed some doc typos * Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step. * Adding seperate deployer workflow. * Split serving working. * Adding split workflow. * More parameters. * updates as to elson comments * working master model and some doc typos * fixes as to Elson * Removign whitespace differences * updating diff * Changing parameters. * Undoing whitespace. * Changing termination policy on s3 version due to unknown issue. * Updating mnist diff. * Changing train steps. * Syncing Demo changes. * Update README.md * Going S3-native for initial example. Getting rid of Master. * Minor documentation tweaks, adding params, swapping aws cli for minio. * Updating KVC version. * Switching ksonnet repo, removing model name from client. * Updating git url. 
* Adding certificate hack to avoid RBAC errors. * Pinning KVC to commit while working on PR. * Updating version. * Updates README with additional details (#14) * Updates README with additional details * Adding clarity to kubectl config commands * Fixed comma placement * Refactoring notes for github and kubernetes credentials. * Forgot to add an overview of the argo template. * Updating example based on feedback. - Removed superflous images - Clarified use of KVC - Added unaltered model - Variable cleanup * Refactored grpc image into generic base image. * minor cleanup of resubmitting section. * Switching Argo deployment to ksonnet, conslidating install instructions. * Removing old cruft, clarifying cluster requirements. * [WIP] Switching out model (#15) * Switching to new mnist example. * Parameterized model, testing export. * Got CNN model exporting. * Attempting to do distributed training with Estimator, removed seperate export. * Adding master back, otherwise Estimator complains about not having a chief. * Switching to tf.estimator.train_and_evaluate. * Minor path/var name refactor. * Adding test data and new client. * Fixed documentation to reflect new client. * Getting rid of tf job shim. * Removing KVC from example, renaming directory * Modifying parent README * Removed reference to export. * Adding reference to export. * Removing unused Dockerfile. * Removing uneeded files, simplifying how to get status, refactor model serving workflow step. * Renaming directory * Minor doc improvements, removed extra clis. * Making SSL configurable for clusters without secured s3 endpoints. * Added a tf-user account for workflow. Fixed serving bug. * Updating gke version. * Re-ran through instructions, fixed errata. * Fixing lint issues * Pylint errors * Pylint errors * Adding parenthesis back. * pylint Hacks * Disabling argument filter, model bombs without empty arg. * Removing unneeded lambdas	2018-04-06 14:34:09 -07:00
Michelle Casbon	063c9a55c8	Add namespace to ksonnet apply command (#57 ) * Add namespace to ksonnet apply command * Resolve lint issues in flask_web/app.py	2018-04-02 09:41:02 -07:00
Ankush Agarwal	1c72cf942f	Move from mlkube-testing to kubeflow-ci for test-infra (#65 ) Fixes https://github.com/kubeflow/examples/issues/63	2018-03-29 15:25:03 -07:00


222 Commits
