examples

Commit Graph

Author	SHA1	Message	Date
Jeremy Lewi	2487194fbd	Modify K8s models to export the models; tensorboard manifests (#320) * Modify K8s models to export the models; tensorboard manifests * Use a K8s job not a TFJob to export the model. * Start an experiments.libsonnet file to define groups of parameters for different experiments that should be reused * Need to install tensorflow_hub in the Docker image because it is required by t2t exporter. * Address review comments.	2018-11-11 19:09:42 -08:00
Yang Pan	c6ff5dbef8	Change dataflow default workdir to /src (#330) Otherwise, when I want to execute the Dataflow code ``` python2 -m code_search.dataflow.cli.create_function_embeddings \ ``` it complains that there is no setup.py. I could work around this by using the workingdir container API, but setting it as the default is more convenient.	2018-11-11 15:37:59 -08:00
Jeremy Lewi	65e89a599b	code search example make distributed training work; Create some components to train models (#317) * Make distributed training work; Create some components to train models * Check in a ksonnet component to train a model using the tinyparam hyperparameter set. * We want to check in the ksonnet component to facilitate reproducibility. We need a better way to separate the particular experiments used for the CS search demo effort from the jobs we want customers to try. Related to #239 train a high quality model. * Check in the cs_demo ks environment; this was being ignored as a result of .gitignore Make distributed training work #208 * We got distributed synchronous training to work with Tensor2Tensor 1.10 * This required creating a simple python script to start the TF standard server and run it as a sidecar of the chief pod and as the main container for the workers/ps. * Rename the model to kf_similarity_transformer to be consistent with other code. * We don't want to use the default name because we don't want to inadvertently use the SimilarityTransformer model defined in the Tensor2Tensor project. * Replace build.sh with a Makefile. Makes it easier to add variant commands * Use the GitHash not a random id as the tag. * Add a label to the docker image to indicate the git version. * Put the Makefile at the top of the code_search tree; makes it easier to pull all the different sources for the Docker images. * Add an option to build the Docker images with GCB; this is more efficient when you are on a poor network connection because you don't have to download images locally. * Use jsonnet to define and parameterize the GCB workflow. * Build separate docker images for running Dataflow and for running the trainer. This helps avoid versioning conflicts caused by different versions of protobuf pulled in by the TF version used as the base image vs. the version used with apache beam. Fix #310 - Training fails with GPUs. * Changes to support distributed training. * Simplify t2t-entrypoint.sh so that all we do is parse TF_CONFIG and pass requisite config information as command line arguments; everything else can be set in the K8s spec. * Upgrade to T2T 1.10. * Add ksonnet prototypes for tensorboard.	2018-11-08 16:13:01 -08:00
Jeremy Lewi	1043bc0c26	A bunch of changes to support distributed training using tf.estimator (#265) * Unify the code for training with Keras and TF.Estimator Create a single train.py and trainer.py which uses Keras inside TensorFlow Provide options to either train with Keras or TF.TensorFlow The code to train with TF.estimator doesn't work See #196 The original PR (#203) worked around a blocking issue with Keras and TF.Estimator by commenting out certain layers in the model architecture leading to a model that wouldn't generate meaningful predictions We weren't able to get TF.Estimator working but this PR should make it easier to troubleshoot further We've unified the existing code so that we don't duplicate the code just to train with TF.estimator We've added unittests that can be used to verify training with TF.estimator works. This test can also be used to reproduce the current errors with TF.estimator. Add a Makefile to build the Docker image Add an NFS PVC to our Kubeflow demo deployment. Create a tfjob-estimator component in our ksonnet component. changes to distributed/train.py as part of merging with notebooks/train.py * Add command line arguments to specify paths rather than hard coding them. * Remove the code at the start of train.py to wait until the input data becomes available. * I think the original intent was to allow the TFJob to be started simultaneously with the preprocessing job and just block until the data is available * That should be unnecessary since we can just run the preprocessing job as a separate job. Fix notebooks/train.py (#186) The code wasn't actually calling Model Fit Add a unittest to verify we can invoke fit and evaluate without throwing exceptions. * Address comments.	2018-11-07 16:23:59 -08:00
Jeremy Lewi	d01b76b6f9	Update ksonnet for datagen (#309) * Update the datagen component. * We should use a K8s job rather than a TFJob. We can also simplify the ksonnet by just putting the spec into the jsonnet file rather than trying to share various bits of the spec with the TFJob for training. Related to kubeflow/examples#308 use globals to allow parameters to be shared across components (e.g. working directory.) * Update the README with information about data. * Fix table markdown.	2018-11-07 14:28:16 -08:00
Yang Pan	11879e2ff1	wait on create function embedding (#311)	2018-11-06 14:37:11 -08:00
Jeremy Lewi	df278567f0	Fix performance of dataflow preprocessing job. (#302) * Fix performance of dataflow preprocessing job. * Fix #300; Dataflow job for preprocessing is really slow. * The problem is we are loading the spacy tokenization model on every invocation of the tokenization function and this is really expensive. * We should be doing this once per module import. * After fixing this issue, the job completed in approximately 20 minutes using 5 workers. * We can process all 1.3 million records in ~ 20 minutes (elapsed time) using 5 32-CPU workers and about 1 hour of CPU time altogether. * Add options to the Dataflow job to read from files as opposed to BigQuery and to skip BigQuery writes. This is useful for testing. * Add a "unittest" that verifies the Dataflow preprocessing job can run successfully using the DirectRunner. * Update the Docker image and a ksonnet component for a K8s job that can be used to submit the Dataflow job. * Fix #299; Add logging to the Dataflow preprocessing job to indicate that a Dataflow job was submitted. * Add an option to the preprocessing Dataflow job to read an entire BigQuery table as the input rather than running a query to get the input. This is useful in the case where the user wants to run a different query to select the repo paths and contents to process and write them to some table to be processed by the Dataflow job. * Fix lint. * More lint fixes.	2018-11-06 14:14:28 -08:00
Yang Pan	aa0061dae2	update instruction with proper namespace (#307)	2018-11-05 20:47:46 -08:00
Yang Pan	1f82dc41cd	[code search] add flag to wait till code search job finish (#306) * add flag to wait till job finish * wait till -> wait until	2018-11-05 19:04:20 -08:00
Jeremy Lewi	f87dfd8e53	Create a demo cluster for the code search example. (#298)	2018-11-05 06:07:52 -08:00
Jeremy Lewi	acd8007717	Use conditionals and add test for code search (#291) * Fix model export, loss function, and add some manual tests. Fix Model export to support computing code embeddings: Fix #260 * The previous exported model was always using the embeddings trained for the search query. * But we need to be able to compute embedding vectors for both the query and code. * To support this we add a new input feature "embed_code" and conditional ops. The exported model uses the value of the embed_code feature to determine whether to treat the inputs as a query string or code and computes the embeddings appropriately. * Originally based on #233 by @activatedgeek Loss function improvements * See #259 for a long discussion about different loss functions. * @activatedgeek was experimenting with different loss functions in #233 and this pulls in some of those changes. Add manual tests * Related to #258 * We add a smoke test for T2T steps so we can catch bugs in the code. * We also add a smoke test for serving the model with TFServing. * We add a sanity check to ensure we get different values for the same input based on which embeddings we are computing. Change Problem/Model name * Register the problem github_function_docstring with a different name to distinguish it from the version inside the Tensor2Tensor library. * Skip the test when running under prow because it's a manual test. * Fix some lint errors. * Fix lint and skip tests. * Fix lint. * Fix lint * Revert loss function changes; we can do that in a follow-on PR. * Run generate_data as part of the test rather than reusing a cached vocab and processed input file. * Modify SimilarityTransformer so we can overwrite the number of shards used easily to facilitate testing. * Comment out py-test for now.	2018-11-02 09:52:11 -07:00
Jeremy Lewi	07483c2dff	Remove inactive reviewers/approvers. (#296) https://devstats.kubeflow.org/d/46/user-reviews-repository-groups?orgId=1&var-period=d7&var-repo_name=All&var-repo=all&var-reviewers=DjangoPeng&var-reviewers=nkashy1 This will help blunderbuss assign better reviewers.	2018-11-02 08:34:20 -07:00
Karthik Ramasamy	847ecb414e	Delete readme (#294)	2018-11-01 19:41:55 -07:00
Karthik Ramasamy	04f4c0767d	Create OWNERS file (#289) Adding my username to the owners file	2018-10-31 12:38:57 -07:00
Yu-Han Liu	266316bfd5	add pipelines/components (#285)	2018-10-30 13:27:02 -07:00
Michelle Casbon	dde7d3ee8e	Upgrade demo to KF v0.3.1 (#278) * Upgrade demo to KF v0.3.1 Update env variable names and values in base file Cleanup ambassador metadata for UI component Add kfctl installation instructions Tighten minikube setup instructions and update k8s version Move environment variable setup to very beginning Replace cluster creation commands with links to the appropriate section in demo_setup/README.md Replace deploy.sh with kfctl Replace kubeflow-core component with individual components Remove connection to UI pod directly & connect via ambassador instead Add cleanup commands * Clarify wording * Update parameter file Resolve python error with file open Consolidate kubeflow install command	2018-10-26 12:58:00 -07:00
Konstantinos Samaras-Tsakiris	5c38c96fae	Fix #272 (#273) * Fix #272 Fix #272 where the `create-pet-record-job` pod produces this error: `models/research/object_detection/data/pet_label_map.pbtxt; No such file or directory` * Update create-pet-record-job.jsonnet	2018-10-22 14:57:24 -07:00
Konstantinos Samaras-Tsakiris	6edf7915f5	Fix #275 (#276) Fix #275 by changing the default mount path for the training data.	2018-10-22 12:14:13 -07:00
Konstantinos Samaras-Tsakiris	b0f9b4cfd0	Fix bash (#271) Remove spaces around a bash variable declaration.	2018-10-22 12:02:04 -07:00
Svendegroote91	bc0380dda6	minor fixes for instructions (#267)	2018-10-15 10:02:17 -07:00
Jeremy Lewi	90044d24c4	Remove v1alpha1 TFJobs from the GH issue summarization example. (#264) * We should be using v1alpha2 exclusively now.	2018-10-15 09:52:01 -07:00
Jeremy Lewi	4ea761630d	Fix gh-demo.kubeflow.org and make it easy to setup. (#261) * Fix gh-demo.kubeflow.org and make it easy to setup. * Our public demo of the GitHub issue summarization example (gh-demo.kubeflow.org) is down. It was running in one of our dev clusters and with the churn in dev clusters it ended up getting deleted. * To make it more stable, let's move it to project kubecon-gh-demo-1 and create a separate cluster for running it. This cluster can also serve as a readily available Kubeflow cluster setup for giving demos. * Create the directory demo within the github_issue_summarization example to contain all the required files. * Add a makefile to make building the image work. * The ksonnet app for the public demo was previously stored here https://github.com/kubeflow/testing/tree/master/deployment/ks-app * Fix the uiservice account. * Address comments.	2018-10-15 08:36:11 -07:00
Svendegroote91	d3e1731d7f	add financial time series example (#252) * add financial time series example * fix ReadMe comments * fix PyLint remarks * clean up based on PR remarks * Completing docstrings and fixing PR remarks	2018-10-12 08:04:07 -07:00
Jeremy Lewi	adf614fc5f	Add tensorboard and check in vendor for the code search example. (#255) * Add tensorboard and check in vendor for the code search example. * Remove the default env; when I ran ks show I got errors but removing it and adding a fresh env worked. It also won't point to the correct cluster for users.	2018-10-04 10:18:58 -07:00
Ankush Agarwal	2064b43def	Ankush Signing Out (#253)	2018-09-28 16:17:20 -07:00
Michelle Casbon	5c2d8aefc2	Remove reviewers who are already approvers (#247) * Remove reviewers who are already approvers Remove ScorpioCPH and zjj2wry due to inactivity (no PRs or comments on PRs). * Add zjj2wry back on request	2018-09-24 17:25:32 -07:00
Akado2009	5329bfa59b	docs updated (#240)	2018-09-24 15:07:27 -07:00
Michelle Casbon	42592fed4a	Update demo script & add notebook (#248) * Update demo script Update demo script to include deploy script and notebook created by @drscott173 Simplify by removing unnecessary commands Use default namespace instead of kubeflow * Add yelp notebook readme * Add cluster creation commands Add instructions for highlighting changes resulting from each command	2018-09-11 11:17:02 -07:00
Inki Hwang	8e30631c54	example mnist upgrade to v1alpha2 (#246) * example mnist upgrade to v1alpha2 * Remove cleanPodPolicy * Fix kubeflow branch to v0.2.4	2018-09-09 13:01:21 -07:00
Michelle Casbon	d878462bc5	Upgrade demo to use latest versions of kubeflow, tfjob, ksonnet, & gke (#242) * Upgrade ks dir to 0.12.0 * Upgrade kubeflow to v0.2.0-rc.1 Use https://github.com/kubeflow/kubeflow/blob/master/scripts/upgrade_ks_app.py to upgrade ks registry Add t2tcpu-v1alpha2 component * Rename t2tcpu-v1alpha2 -> t2tcpu Rename t2tcpu -> t2tcpu-v1alpha1 and t2tcpu-v1alpha2 -> t2tcpu Update demo_setup/README.md to reflect ks v0.12.0 Update REPO_PATH in demo_setup/kubeflow-demo-base.env Update initialClusterVersion in k8s cluster creation script to 1.10.6-gke.2 Remove quotation marks from serving.deployHttpProxy so that it is parsed as a boolean instead of string * Rename t2tgpu & t2ttpu Rename t2tgpu -> t2tgpu-v1alpha1 and add t2tgpu-v1alpha2 as t2tgpu Rename t2ttpu -> t2ttpu-v1alpha1 and add t2ttpu-v1alpha2 as t2ttpu Resolve jsonnet parsing issues * Upgrade kubeflow to v0.2.4 Add gke environment * Add instructions for creating TPU clusters * Replace hard-coded value with env var * Update kf version to v0.2.4 in env var file * Add non-gke requirements to t2tcpu component Sync t2tgpu with t2tcpu Remove non-gke statements from t2ttpu component Add k8s v1.10.6 to minikube start command * Fix bug with non-gke environment setup in t2t Add service account setup and k8s secret creation instructions for serving & UI * Single cluster with GPU & TPU Add creation script for single cluster with access to CPU, GPU, & TPU Update GPU driver installation to k8s-1.10 * Remove v1alpha1 components * Update parameter values for t2t components Increase disk size for minikube cluster creation since 0.2.4 is larger Update gke cluster creation command * Update TPU annotation to TF 1.9 * Update kf version to v0.2.5 Update tfJobImage version to v20180809-d2509aa	2018-09-05 05:46:33 -07:00
Katsunori Kanda	1b7df0c141	Fixed broken link in github issue summarization example (#235)	2018-08-26 18:01:31 -07:00
Michał Jastrzębski	35786ed9cb	Add estimator example for github issues (#203) * Add estimator example for github issues This is code input for doc about writing Keras for tfjob. There are a few TODOs: 1. bug in dataset injection, can't raise number of steps 2. instead of adding hostpath for data, we should have quick job + pvc for this * pylint * wip * confirmed working on minikube * pylint * remove t2t, add documentation * add note about storageclass * fix link * remove code redundancy * address review * small language fix	2018-08-24 18:10:27 -07:00
Puneith Kaul	1d5ddf560b	Merge pull request #236 from kubeflow/xgboost_readme Update README.md	2018-08-24 15:35:07 -07:00
Puneith Kaul	ab61a75373	Update README.md	2018-08-24 15:34:48 -07:00
Puneith Kaul	7b7d671b87	Update README.md	2018-08-24 07:49:18 -07:00
Puneith Kaul	e7996c33a2	Update README.md	2018-08-24 07:48:18 -07:00
Puneith Kaul	bd07a2f84e	new PR for XGBoost due to problems with history rewrite (#232) * new PR for XGBoost due to problems with history rewrite * Update housing.py * Update HousingServe.py * Update housing.py * added bitly * removed test function * reorder imports * fix spaces * fix spaces * fixed lint errors * renamed to xgboost_ames_housing	2018-08-22 06:01:36 -07:00
Daniel Castellanos	e6b6730650	Updated object detection training example (#228) * Updated Dockerfile.traning to use latest tensorflow and tensorflow object detection api. * Updated tf-training-job component and added a chief replica spec * Corrected some typos and updated some instructions	2018-08-20 19:32:12 -07:00
Sanyam Kapoor	f9873e6ac4	Upgrade notebook commands and other relevant changes (#229) * Replace double quotes for field values (ks convention) * Recreate the ksonnet application from scratch * Fix pip commands to find requirements and redo installation, fix ks param set * Use sed replace instead of ks param set. * Add cells to first show JobSpec and then apply * Upgrade T2T, fix conflicting problem types * Update docker images * Reduce to 200k samples for vocab * Use Jupyter notebook service account * Add illustrative gsutil commands to show output files, specify index files glob explicitly * List files after index creation step * Use the model in current repository and not upstream t2t * Update Docker images * Expose TF Serving Rest API at 9001 * Spawn terminal from the notebooks ui, no need to go to lab	2018-08-20 16:35:07 -07:00
Michelle Casbon	0843cdad66	Add Yelp restaurant review demo files (#220) * Add Yelp restaurant review demo files * Add video links * Resolve lint issues	2018-08-15 22:49:00 -07:00
Sanyam Kapoor	4e015e76a3	Cherry pick changes to PredictionDoFn (#226) * Cherry pick changes to PredictionDoFn * Disable lint checks for cherry picked file * Update TODO and notebook install instructions * Restore CUSTOM_COMMANDS todo	2018-08-15 06:21:00 -07:00
Sanyam Kapoor	18829159b0	Add a new github function docstring extended problem (#225) * Add a new github function docstring extended problem * Fix lint errors * Update images	2018-08-14 15:41:47 -07:00
Sanyam Kapoor	8fce4a7799	Allow ks param set for Code Search Ksonnet Application (#224) * Allow ks param set for t2t-code-search * Update notebook with working directory param set * Abstract out common variables for easy ks param set	2018-08-14 15:29:04 -07:00
Lun-Kai Hsu	f3806d0bac	Small fix to TF serving gpu (#221) * Small fix to TF serving gpu * fix * fix * fix	2018-08-14 14:27:35 -07:00
Sanyam Kapoor	a687c51036	Add a Jupyter notebook to be used for Kubeflow codelabs (#217) * Add a Jupyter notebook to be used for Kubeflow codelabs * Add help command for create_function_embeddings module * Update README to point to Jupyter Notebook * Add prerequisites to readme * Update README and getting started with notebook guide * [wip] * Update notebook with BigQuery previews * Update notebook to automatically select the latest MODEL_VERSION	2018-08-13 21:43:26 -07:00
Ankush Agarwal	a80c15b50e	Merge pull request #213 from activatedgeek/search-server-kubeflow Update Search Index server spec	2018-08-09 14:57:49 -07:00
Sanyam Kapoor	6e9150bad6	Parametrize volumes and ports for nmslib containers	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	133e054033	Refactor job and deployment specs into different functions	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	e34f9aca75	Build just one image with the correct tag instead of double the number	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	c86f306d79	Use kind Job instead of Pod	2018-08-09 10:53:23 -07:00
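Commit 65e89a599b reduced t2t-entrypoint.sh to parsing TF_CONFIG and passing the needed values on as command-line arguments. The sketch below illustrates that idea in Python, assuming the standard TF_CONFIG layout (a `cluster` map from job names to `host:port` lists, plus a `task` with `type` and `index`). The function `tf_config_to_args` and every output flag name are hypothetical; they are not the flags the actual script or Tensor2Tensor uses.

```python
import json
import os


def tf_config_to_args(tf_config_json):
    """Turn a TF_CONFIG JSON string into hypothetical command-line flags.

    The flag names (--job_type, --task_index, --master, --ps_hosts,
    --worker_hosts) are illustrative assumptions, not the real flags
    used by t2t-entrypoint.sh or Tensor2Tensor.
    """
    config = json.loads(tf_config_json)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    job_type = task.get("type", "master")
    index = int(task.get("index", 0))

    args = ["--job_type=" + job_type, "--task_index=" + str(index)]
    # Use the first address of the chief (or legacy master) job, if any.
    for chief_name in ("chief", "master"):
        if cluster.get(chief_name):
            args.append("--master=grpc://" + cluster[chief_name][0])
            break
    # Comma-join the parameter-server and worker addresses.
    if cluster.get("ps"):
        args.append("--ps_hosts=" + ",".join(cluster["ps"]))
    if cluster.get("worker"):
        args.append("--worker_hosts=" + ",".join(cluster["worker"]))
    return args


if __name__ == "__main__":
    # Hypothetical cluster; a real pod would read TF_CONFIG set by the TFJob operator.
    example = json.dumps({
        "cluster": {
            "chief": ["trainer-chief-0:2222"],
            "worker": ["trainer-worker-0:2222", "trainer-worker-1:2222"],
            "ps": ["trainer-ps-0:2222"],
        },
        "task": {"type": "worker", "index": 1},
    })
    print(tf_config_to_args(os.environ.get("TF_CONFIG", example)))
```

Keeping the entrypoint to pure translation like this leaves every other setting in the K8s spec, which is the split the commit message describes.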


343 Commits (All Branches)