examples

Commit Graph

Author	SHA1	Message	Date
Amy	443f4bd2a3	deprecating gis e2e example until it is fixed. (#736 )	2020-02-18 20:58:25 -08:00
dependabot[bot]	c20eafc4fc	Bump nltk from 3.2.5 to 3.4.5 in /github_issue_summarization (#698 ) Bumps [nltk](https://github.com/nltk/nltk) from 3.2.5 to 3.4.5. - [Release notes](https://github.com/nltk/nltk/releases) - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.2.5...3.4.5) Signed-off-by: dependabot[bot] <support@github.com>	2019-12-17 15:54:03 -08:00
dependabot[bot]	f6a7adb2fc	Bump tensorflow-gpu from 1.3.0 to 1.15.0 in /github_issue_summarization (#697 ) Bumps [tensorflow-gpu](https://github.com/tensorflow/tensorflow) from 1.3.0 to 1.15.0. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v1.3.0...v1.15.0) Signed-off-by: dependabot[bot] <support@github.com>	2019-12-16 12:59:39 -08:00
Amy	91374e6d27	notebook cleanup (#679 )	2019-11-11 16:42:06 -08:00
Amy	67041ec4d5	updates to reflect changed node credentials, minor cleanup, update component URLs, (#675 ) update serving-only pipeline as well	2019-11-05 18:29:01 -08:00
Amy	452aa428b6	Updates to the pipelines GH summarization lab to demonstrate component input/output (#669 ) * copy and training step params, remove unused args, use google-samples images * update notebook to reflect new pipeline * type definition change * fix typo, use kfp.dsl.RUN_ID_PLACEHOLDER * change 'serve' setp to use gcp secret- req'd for 0.7	2019-10-27 04:55:14 -07:00
Amy	ad55b0a246	change gpu limit (#651 )	2019-10-03 09:12:09 -07:00
Jin Chi He	cfe166f73f	update to kubeflow-metadata in examples (#646 )	2019-09-26 16:13:34 -07:00
Amy	c20ebb5c0f	fix component URLs in pipeline now that primary PR is in (#642 )	2019-09-19 14:46:58 -07:00
Amy	b5349df27d	Update to KFP pipelines codelab code (GH summarization) (#638 ) * checkpointing * checkpointing * refactored pipeline that uses pre-emptible VMs * checkpointing. istio routing for the webapp. * checkpointing * - temp testing components - initial v of metadata logging 'component' - new dirs; file rename * public md log image; add md server connect retry * update pipeline to include md logging steps * - file rename, notebook updates - update compiled pipeline; fix component name typo - change DAG to allow md logging concurrently; update pre-emptible VMS PL * pylint cleanup, readme/tutorial update/deprecation, minor tweaks * file cleanup * update the tfjob api version for an (unrelated) test to address presubmit issues * try annotating test_train in github_issue_summarization/testing/tfjob_test.py with @unittest.expectedFailure * try commenting out a (likely) problematic unittest unrelated to the code changes in this PR * try adding @test_util.expectedFailure annotation instead of commenting out test * update the codelab shortlink; revert to commenting out a problematic unit test	2019-09-19 08:47:00 -07:00
Nick Harvey	9675480997	minor update to the pachyderm seldon example (#562 ) * minor update to the pachyderm seldon exaple * Another minor update to the pipeline	2019-07-04 13:24:19 -07:00
Amy	767ecd240d	Use the client libs to do a GCS copy instead of gsutil (#558 ) * use gcs client libs to copy checkpoint dir * more minor cleanup, use tagged image, use newer pipeline param spec. syntax. pylint cleanup. added set_memory_limit() to notebook pipeline training steps. modified the pipelines definitions to use the user-defined params as defaults. * put a retry loop around the copy_blob	2019-05-17 14:00:11 -07:00
Michelle Casbon	1ebcd33881	Add memory limit (#552 ) This aids GKE autoprovisioning in creation of appropriate node sizes	2019-05-14 17:54:18 -07:00
Amy	ed38a1a35c	re-add file that I'd missed in porting over; minor comment phrasing change in notebook (#556 )	2019-05-13 12:28:50 -07:00
Amy	b23adc1f0b	import of Pipelines Github issue summarization examples & tutorial (#507 ) * initial import of Pipelines Github issue summarization examples & lab * more linting/cleanup, fix tf version to 1.12 * bit more linting; pin some lib versions * last? lint fixes * another attempt to fix linting issues * ughh * changed test cluster config info * update ktext package in a test docker image * hmm, retrying fix for the ktext package update	2019-04-18 17:57:54 -07:00
Nick Harvey	52795bcaf5	Adding Pachyderm Example (squashed) (#522 ) * Adding Pachyderm Example (squashed) * Add Dan Sanche to OWNERS (#520) Fixed tf_operator import for github_issue_summarization example (#527) * fixed tf_operator import * updated tf-operator import path * small change * updated PYTHONPATH * fixed syntax error * formating issue Mnist pipelines (#524) * added mnist pipelines sample * fixed lint issues	2019-03-20 08:41:00 -07:00
zabbasi	7924e0fe21	Fixed tf_operator import for github_issue_summarization example (#527 ) * fixed tf_operator import * updated tf-operator import path * small change * updated PYTHONPATH * fixed syntax error * formating issue	2019-03-14 18:36:58 -07:00
Michelle Casbon	692c78550e	Merge pull request #399 from govindKAG/patch-1 fixed "setting persistent disk" link	2019-02-17 09:04:42 -08:00
Zhenghui Wang	74378a2990	Add end2end test for Xgboost housing example (#493 ) * Add e2e test for xgboost housing example * fix typo add ks apply add [ modify example to trigger tests add prediction test add xgboost ks param rename the job name without _ use - instead of _ libson params rm redudent component rename component in prow config add ames-hoursing-env use - for all names use _ for params names use xgboost_ames_accross rename component name shorten the name change deploy-test command change to xgboost- namespace init ks app fix type add confest.py change path change deploy command change dep change the query URL for seldon add ks_app with seldon lib update ks_app use ks init only rerun change to kf-v0-4-n00 cluster add ks_app use ks-13 remove --namespace use kubeflow as namespace delete seldon deployment simplify ks_app retry on 503 fix typo query 1285 move deletion after prediction wait 10s always retry till 10 mins move check to retry fix pylint move clean-up to the delete template * set up xgboost component * check in ks component& run it directly * change comments * add comment on why use 'ks delete' * add two modules to pylint whitelist * ignore tf_operator/py * disable pylint per line * reorder import	2019-02-12 06:37:05 -08:00
govind cs	225a7e9f90	Merge branch 'master' into patch-1	2019-01-21 09:49:12 +05:30
govind cs	bf5e18a34e	Update 01_setup_a_kubeflow_cluster.md	2019-01-21 09:46:04 +05:30
cliveseldon	8d728f0b06	GitHub Summarization Seldon Update (#472 ) * Update model inference wrapping to use S2I and update docs * Add s2i reference in docs * Fix typo highlighted in review * Add pyLint annotation to allow protected-access on keras make predict function method	2019-01-17 16:07:34 -08:00
Jeremy Lewi	1cc4550b7d	GIS E2E test verify the TFJob runs successfully (#456 ) * Create a test for submitting the TFJob for the GitHub issue summarization example. * This test needs to be run manually right now. In a follow on PR we will integrate it into CI. * We use the image built from Dockerfile.estimator because that is the image we are running train_test.py in. * Note: The current version of the code now requires Python3 (I think this is due to an earlier PR which refactored the code into a shared implementation for using TF estimator and not TF estimator). * Create a TFJob component for TFJob v1beta1; this is the version in KF 0.4. TFJob component * Upgrade to v1beta to work with 0.4 * Update command line arguments to match the versions in the current code * input & output are now single parameters rather then separate parameters for bucket and name * change default input to a CSV file because the current version of the code doesn't handle unzipping it. * Use ks_util from kubeflow/testing * Address comments.	2019-01-08 15:06:49 -08:00
Jeremy Lewi	959d072e68	Setup continuous building of Docker images for GH Issue Summarization Example (#449 ) * Setup continuous building of Docker images and testing for GH Issue Summarization Example. * This is the first step in setting up a continuously running CI test. * Add support for building the Docker images using GCB; we will use GCB to trigger the builds from our CI system. * Make the Makefile top level (at root of GIS example) so that we can easily access all the different resources. * Add a .gitignore file to avoid checking in the build directory used by the Makefile. * Define an Argo workflow to use as the E2E test. Related to #92: E2E test & CI for github issue summarization * Trigger the test on pre & post submit * Dockerfile.estimator don't install the data_download.sh script * It doesn't look like we are currently using data_download.sh in the DockerImage * It looks like it only gets used vias the ksonnet job which mounts the script via a config map * Copying data_download.sh to the Docker image is currently weird given the organization of the Dockerfile and context. * Copy the test_data to the Docker images so that we can run the test inside the images. * Invoke the python unittest for training from our CI system. * In a follow on PR we will update the test to emit a JUnit XML file to report results to prow. * Fix image build.	2019-01-04 17:02:24 -08:00
Michelle Casbon	70a22d6d7b	[GH Issue Summarization] Upgrade to kf v0.4.0-rc.2 (#450 ) * Update tfjob components to v1beta1 Remove old version of tensor2tensor component * Combine UI into a single jsonnet file * Upgrade GH issue summarization to kf v0.4.0-rc.2 Use latest ksonnet v0.13.1 Use latest seldon v1alpha2 Remove ksonnet app with full kubeflow platform & replace with components specific to this example. Remove outdated scripts Add cluster creation links to Click-to-deploy & kfctl Add warning not to use the Training with an Estimator guide Replace commandline with bash for better syntax highlighting Replace messy port-forwarding commands with svc/ambassador Add modelUrl param to ui component Modify teardown instructions to remove the deployment Fix grammatical mistakes * Rearrange tfjob instructions	2018-12-30 20:05:29 -08:00
Jeremy Lewi	7990408207	Delete obsolete HP tuning code. (#451 ) * Katib no longer uses custom go programs. Instead it uses the new StudyJobController custom resource. * This code is no longer needed so delete it.	2018-12-29 19:00:14 -08:00
Karthic Rao	b69cf36a39	Fixing broken links (#403 ) - Fix broken links for the install instructions. - Minor modifications to the instructions. - Minior formatting fixes.	2018-12-05 18:42:11 -08:00
govind cs	60ba49c68d	fixed "setting persistent disk" link Fixed the linked to advanced customization link on kubeflow which currently redirects to a non-existent page.	2018-12-04 16:02:53 +05:30
Jeremy Lewi	1043bc0c26	A bunch of changes to support distributed training using tf.estimator (#265 ) * Unify the code for training with Keras and TF.Estimator Create a single train.py and trainer.py which uses Keras inside TensorFlow Provide options to either train with Keras or TF.TensorFlow The code to train with TF.estimator doesn't worki See #196 The original PR (#203) worked around a blocking issue with Keras and TF.Estimator by commenting certain layers in the model architecture leading to a model that wouldn't generate meaningful predictions We weren't able to get TF.Estimator working but this PR should make it easier to troubleshoot further We've unified the existing code so that we don't duplicate the code just to train with TF.estimator We've added unitttests that can be used to verify training with TF.estimator works. This test can also be used to reproduce the current errors with TF.estimator. Add a Makefile to build the Docker image Add a NFS PVC to our Kubeflow demo deployment. Create a tfjob-estimator component in our ksonnet component. changes to distributed/train.py as part of merging with notebooks/train.py * Add command line arguments to specify paths rather than hard coding them. * Remove the code at the start of train.py to wait until the input data becomes available. * I think the original intent was to allow the TFJob to be started simultaneously with the preprocessing job and just block until the data is available * That should be unnecessary since we can just run the preprocessing job as a separate job. Fix notebooks/train.py (#186) The code wasn't actually calling Model Fit Add a unittest to verify we can invoke fit and evaluate without throwing exceptions. * Address comments.	2018-11-07 16:23:59 -08:00
Jeremy Lewi	90044d24c4	Remove v1alpah1 TFJobs from the GH issue summarization example. (#264 ) * We should be using v1alpha2 exclusively now.	2018-10-15 09:52:01 -07:00
Jeremy Lewi	4ea761630d	Fix gh-demo.kubeflow.org and make it easy to setup. (#261 ) * Fix gh-demo.kubeflow.org and make it easy to setup. * Our public demo of the GitHub issue summarization example (gh-demo.kubeflow.org) is down. It was running in one of our dev clusters and with the the churn in dev clusters it ended up getting deleted. * To make it more stable lets move it to project kubecon-gh-demo-1 and create a separate cluster for running it. This cluster can also serve as a readily available Kubeflow cluster setup for giving demos. * Create the directory demo within the github_issue_summarization example to contain all the required files. * Add a makefile to make building the image work. * The ksonnet app for the public demo was previously stored here https://github.com/kubeflow/testing/tree/master/deployment/ks-app * Fix the uiservice account. * Address comments.	2018-10-15 08:36:11 -07:00
Akado2009	5329bfa59b	docs updated (#240 )	2018-09-24 15:07:27 -07:00
Katsunori Kanda	1b7df0c141	Fixed broken link in github issue summarization example (#235 )	2018-08-26 18:01:31 -07:00
Michał Jastrzębski	35786ed9cb	Add estimator example for github issues (#203 ) * Add estimator example for github issues This is code input for doc about writing Keras for tfjob. There are few todos: 1. bug in dataset injection, can't raise number of steps 2. intead of adding hostpath for data, we should have quick job + pvc for this * pyling * wip * confirmed working on minikube * pylint * remove t2t, add documentation * add note about storageclass * fix link * remove code redundancy * adress review * small language fix	2018-08-24 18:10:27 -07:00
Pete MacKinnon	d2c5e949e5	Update PVC to /home/jovyan (#119 )	2018-07-13 14:39:26 -07:00
Jeremy Lewi	eaf0298590	Create a deployment to run the HP/Katib controller for the GitHub issue example. (#161 ) * Some of the code is copied over from https://github.com/kubeflow/katib/tree/master/examples/GKEDemo * I think it makes sense to centralize all the code in a single place. * Update the controller program (git-issue-summarize-demo.go) so that can specify the Docker image containing the training code. * Create a ksonnet deployment for running the controller on the cluster. * The HP tuning job isn't functional here's an incomplete list of issues * The training jobs launched fail because they don't have GCP credentials so they can't download the data. * We don't actually extract and report metrics back to Katib. Related to: kubeflow/katib#116	2018-07-11 08:46:25 -07:00
Michelle Casbon	836ad70421	Fix model file upload (#160 ) * Add component parameters Add model_url & port arguments to flask app Add service_type, image, and model_url parameters to ui component Fix problem argument in tensor2tensor component * Fix broken UI component Fix broken UI component structure by adding all, service, & deployment parts Add parameter defaults for tfjob to resolve failures deploying other components * Add missing imports in flask app Fix syntax error in argument parsing Remove underscores from parameter names to workaround ksonnet bug #554: https://github.com/ksonnet/ksonnet/issues/554 * Fix syntax errors in t2t instructions Add CPU image build arg to docker build command for t2t-training Fix link to ksonnet app dir Correct param names for tensor2tensor component Add missing params for tensor2tensor component Fix apply command syntax Swap out log view pod for t2t-master instead of tf-operator Fix link to training with tfjob * Fix model file upload Update default params for tfjob-v1alpha2 Fix build directory path in Makefile * Resolve lint issues Lines too long * Add specific image tag to tfjob-v1alpha2 default * Fix defaults for training output files Update image tag Add UI image tag * Revert service account secret details Update associated readme	2018-06-29 18:41:20 -07:00
Jeremy Lewi	98ed4b4a69	Fix v1alpha2 version of the T2T training job. (#158 ) * Update the Docker image for T2T to use a newer version of T2T library * Add parameters to set the GCP secret; we need GCP credentials to read from GCS even if reading a public bucket. We default to the parameters that are created automatically in the case of a GKE deployment. * Create a v1alpha2 template for the job that uses PVC.	2018-06-29 12:26:18 -07:00
Jeremy Lewi	93db7e369e	Update the GH summarization example to Kubeflow 0.2 and TFJob v1alpha2. (#157 ) * Update the GH summarization example to Kubeflow 0.2 and TFJob v1alpha2. * Upgrade the ksonnet app to Kubeflow 0.2 rc.1 * Add the examples package. * Add a .gitignore file and ignore all environments so that we won't pick up people's testing environments. * Add tfjob-v1alpha2 component; this trains the model using Keras using TFJob v1alpha2. * Update the parameters so that we use the GCP secrets created as part of the Kubeflow deployment. * Remove jlewi environment. * Verified that training ran successfully and outputted a model to GCS * There was an error about some missing arguments to a logging statement but this can be ignored although it would be good to fix. * Started working on T2T v1alpha2. Seems to be messing up the app. * Update the v1alpha2 template for the tensor2tensor job but it looks like there is an error 2018-06-29 17:45:23,369] Found unknown flag: --problem=github_issue_summarization_problem Traceback (most recent call last): File "/home/jovyan/.conda/bin/t2t-trainer", line 32, in <module> tf.app.run() File "/home/jovyan/.conda/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run _sys.exit(main(argv)) File "/home/jovyan/.conda/bin/t2t-trainer", line 28, in main t2t_trainer.main(argv) File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 334, in main exp_fn = create_experiment_fn() File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 158, in create_experiment_fn problem_name=get_problem_name(), File "/home/jovyan/.conda/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py", line 115, in get_problem_name problems = FLAGS.problems.split("-") AttributeError: 'NoneType' object has no attribute 'split'	2018-06-29 11:31:23 -07:00
Michelle Casbon	11b75edfd9	Add component parameters (#155 ) * Add component parameters Add model_url & port arguments to flask app Add service_type, image, and model_url parameters to ui component Fix problem argument in tensor2tensor component * Fix broken UI component Fix broken UI component structure by adding all, service, & deployment parts Add parameter defaults for tfjob to resolve failures deploying other components * Add missing imports in flask app Fix syntax error in argument parsing Remove underscores from parameter names to workaround ksonnet bug #554: https://github.com/ksonnet/ksonnet/issues/554 * Fix syntax errors in t2t instructions Add CPU image build arg to docker build command for t2t-training Fix link to ksonnet app dir Correct param names for tensor2tensor component Add missing params for tensor2tensor component Fix apply command syntax Swap out log view pod for t2t-master instead of tf-operator Fix link to training with tfjob	2018-06-28 13:52:21 -07:00
Puneith Kaul	174d6602ac	Update README.md (#116 )	2018-05-20 17:43:48 -07:00
Jeremy Lewi	002119010f	Fix data-downloader; parameters are in the wrong order. (#115 ) * URL should be the first argument; data dir should be the second.	2018-05-17 11:16:51 -07:00
Carol Willing	0b303e70f1	Edit navigation and markdown for github example (#93 ) * edit TF example readme * prefix tutorial steps with a number for nicer display in repo * fix typo * edit steps 4 and 5 * edit docs * add navigation and formatting edits to example	2018-05-09 12:12:54 -07:00
Jeremy Lewi	79aa2074cd	Improvements to the tensor2tensor trainer for the GitHub summarization example. (#109 ) * Improvements to the tensor2tensor traininer for the GitHub summarization example. * Simplify the launcher; we can just pass through most command line arguments and not use environment variables and command line arguments. * This makes it easier to control the job just by setting the parameters in the template rather than having to rebuild the images. * Add a Makefile to build the image. * Replace the tensor2tensor jsonnet with a newer version of the jsonnet used with T2T. * Address reviewer comments. * Install pip packages as user Jovyan * Rely on implicit string conversion with concatenation in template file.	2018-04-29 20:39:16 -07:00
Jeremy Lewi	afdd4c544e	Add a component to run TensorBoard. (#110 ) * Add a component to run TensorBoard. * Autoformate jsonnet file. * * Set a default of "" for logDir; there's not a really good default location because it will depend on where the data is stored.	2018-04-29 20:34:16 -07:00
Jeremy Lewi	e12231bae3	Make it easier to demo serving and run in Katacoda (#107 ) * Make it easier to demo serving and run in Katacoda * Allow the model path to be specified via environment variables so that we could potentially load the model from PVC. * Continue to bake the model into the image so that we don't need to train in order to serve. * Parameterize download_data.sh so we could potentially fetch different sources. * Update the Makefile so that we can build and set the image for the serving component. * Fix lint. * Update the serving docs.	2018-04-28 08:11:18 -07:00
Ankush Agarwal	26d68ead6c	Replace kubeflow-images-staging with kubeflow-images-public (#99 ) Fixes https://github.com/kubeflow/kubeflow/issues/534	2018-04-27 11:46:20 -07:00
Jeremy Lewi	4b33d44af6	Support training using a PVC for the data. (#98 ) * Support training using a PVC for the data. * This will make it easier to run the example on Katacoda and non-GCP platforms. * Modify train.py so we can use a GCS location or local file paths. * Update the Dockerfile. The jupyter Docker images and had a bunch of dependencies removed and the latest images don't have the dependencies needed to run the examples. * Creat a tfjob-pvc component that trains reading/writing using PVC and not GCP. * * Address reviewer comments * Ignore changes to the ksonnet parameters when determining whether to include dirty and sha of the diff in the image. This way we can update the ksonnet app with the newly built image without it leading to subsequent images being marked dirty. * Fix lint issues. * Fix lint import issue.	2018-04-27 04:08:19 -07:00
Jeremy Lewi	34d6f8809d	Add a job to download the data to PVC. (#97 ) * This is the first step to doing training and serving using a PV as opposed to GCS. * This will make the sample easier to run anyhere and in particular on Katacoda. * This currently would work as follows User creates a PVC ks apply ${ENV} -c data-pvc User runs a K8s job to download the data to PVC ks apply ${ENV} -c data-downloader In subsequent PRs we will update the train and serve steps to load the model from the PVC as opposed to GCS. Related to #91	2018-04-25 10:36:02 -07:00
Michelle Casbon	1a4f4dc1ea	Remove vendor from .gitignore (#94 ) * Remove vendor from .gitignore * Tell pylint to ignore generated file	2018-04-24 15:25:01 -07:00

1 2

77 Commits