examples

Commit Graph

Author	SHA1	Message	Date
Hougang Liu	1ed74b274c	create pv for pets-pv (#439 ) * create pv for pets-pv For a lot of user k8s clusters, dynamic volume provisioning isn't enabled. So the newcomer may be blocked since pets-pv will keep Pending. We can guide them to create a nfs PV as an option. * tell user how to check if a default storage class is defined * add link about how to create PV	2018-12-21 06:05:11 -08:00
Jeremy Lewi	2e6e891a5b	Update the ArgoCD app to use the kubeflow/examples repo (#440 ) * We were using jlewi's fork because PRs hadn't been committed but all the relevant PRs have been merged and master is the source of truth.	2018-12-19 21:26:49 -08:00
Jeremy Lewi	ba9af34805	Create a script to count lines of code. (#379 ) * Create a script to count lines of code. * This is used in the presentation to get an estimate of where the human effort is involved. * Fix lint issues.	2018-12-19 09:42:25 -08:00
Guang Ya Liu	345e69ab4c	Removed empty application centric section. (#375 )	2018-12-14 18:36:18 -08:00
Svendegroote91	a2e8a08e11	remove obsolete PS in GPU jsonnet (#407 )	2018-12-12 18:08:15 -08:00
Jeremy Lewi	9f061a0554	Update the central dashboard UI image to one that includes pipelines. (#430 )	2018-12-12 09:34:21 -08:00
Jeremy Lewi	1b643c2b81	Fix the web app. (#432 ) * We need to set the parameters for the model and index. * It looks like when we split up the web app into its own ksonnet app we forgot to set the parameters. * SInce the web app is being deployed in a separate namespace we need to copy the GCP credential to that namespace. Add instructions to the demo README.md on how to do that. * It looks like the pods were never getting started because the secret couldn't be mounted.	2018-12-12 09:24:40 -08:00
IronPan	4a7e2c868c	fix bq table dupliation (#418 ) * fix bq table dupliation * fix bq table dupliation * update * update image * use index for placeholder	2018-12-10 18:50:28 -08:00
David Sabater Dinter	d408ae09f0	Point images back to gcr.io/kubeflow-examples (#421 )	2018-12-09 16:02:24 -08:00
Jeremy Lewi	b26f7e9a48	Add pods/logs permission to the jupyter notebook role. (#419 ) * This is needed so that fairing can tail the logs.	2018-12-09 15:53:46 -08:00
Azmi Kamis	6cdc461b50	fix directories in Dockerfile (#416 )	2018-12-08 14:53:10 -08:00
Azmi Kamis	c234f18b0b	Fix curl command when sending request to seldon-served model on GKE (#415 ) * fix curl command when sending request to seldon-served model on GKE * modified response	2018-12-08 14:44:06 -08:00
Hougang Liu	fc5a85b948	reconcile tensorflow serving version (#409 ) Since default OBJ_DETECTION_IMAGE tensorflow version is 1.10.0, we pin consistent version 1.10.0 of TF across the example. Fixes: #408	2018-12-08 14:32:46 -08:00
Jeremy Lewi	67d42c4661	Expose ArgoCD UI behind Ambassador. (#413 ) * We need to disable TLS (its handled by ingress) because that leads to endless redirects. * ArgoCD is running in namespace argo-cd but Ambassador is running in a different namespace and currently only configured with RBAC to monitor a single namespace. * So we add a service in namespace kubeflow just to define the Ambassador mapping.	2018-12-08 12:49:34 -08:00
IronPan	4c970876dc	add notebook for code search pipeline (#410 )	2018-12-07 10:29:02 -08:00
Sam Shi	b2e6aa231c	Save the batch-predict package in the image; Create a separate Dockfile for GPU (#383 ) * Save the batch-predict package in the image; create a separate Dockerfile for gpu * remove commented code	2018-12-07 10:28:57 -08:00
IronPan	0d2f5b6342	Clean up code search pipeline (#406 ) * update pipeline to use out of box gcp credential support * Update index_update_pipeline.py	2018-12-07 10:13:12 -08:00
Hougang Liu	9994b57497	add object detection grpc client (#378 ) * add object detection grpc client Fixes: #377 * fix kubeflow-examples-presubmit error object_detection_grpc_client.py depends on other files in https://github.com/tensorflow/models.git, pylint will fail for those files need to be compiled manually. Since mnist_DDP.py has similar dependency, here just follow mnist_DDP.py and ignore checking this file.	2018-12-06 18:51:24 -08:00
Karthic Rao	b69cf36a39	Fixing broken links (#403 ) - Fix broken links for the install instructions. - Minor modifications to the instructions. - Minior formatting fixes.	2018-12-05 18:42:11 -08:00
IronPan	206ad8fda4	Add preprocess github data step to code search pipeline (#396 ) * refactor ks * remove unecessary params * update ks * address comments * add preprocess step * update images * update preprocess code * reformat * minor fix * reuse function embedding pipeline to preprocess * add preprocess * update pipeline * propagate failed token table * format code * copy vocabulary * address comments * address comments * update * fix * fix format * Update arguments.py	2018-12-05 18:06:06 -08:00
Michelle Casbon	5e395c1a88	Add components (#402 ) Replace files that were mistakenly removed in #376	2018-12-05 15:06:42 -08:00
govind cs	60ba49c68d	fixed "setting persistent disk" link Fixed the linked to advanced customization link on kubeflow which currently redirects to a non-existent page.	2018-12-04 16:02:53 +05:30
Michelle Casbon	fa1311833c	Update instructions and setup for yelp demo (#376 ) * Update instructions and setup for yelp demo Update kubeflow version to v0.3.4-rc.1 Add pipelines version v0.1.3-rc.2 Add simple pipelines example using GPUs Conform cluster name, secrets, and ks app directory name to click-to-deploy standard Update ks_app directory to v0.3.4-rc.1 Pin bokeh package to v0.13.0 in yelp notebook Fix bug in secret creation * Port-forward to svcs instead of pods Add clarification for using kfctl & updating component params	2018-12-03 22:39:51 -08:00
IronPan	cea0ffde0d	Update the ks parameter (#394 ) * refactor ks * remove unecessary params * update ks * address comments	2018-12-02 22:14:11 -08:00
Jeremy Lewi	78fdc74b56	Dataflow job should support writing embeddings to a different location (Fix #366 ). (#388 ) * Datflow job should support writing embeddings to a different location (Fix #366). * Dataflow job to compute code embeddings needs to have parameters controlling the location of the outputs independent of the inputs. Prior to this fix the same table in the dataset was always written and the files were always created in the data dir. * This made it very difficult to rerun the embeddings job on the latest GitHub data (e.g to regularly update the code embeddings) without overwritting the current embeddings. * Refactor how we create BQ sinks and sources in this pipeline * Rather than create a wrapper class that bundles together a sink and schema we should have a separate helper class for creating BQ schemas and then use WriteToBigQuery directly. * Similarly for ReadTransforms we don't need a wrapper class that bundles a query and source. We can just create a class/constant to represent queries and pass them directly to the appropriate source. * Change BQ write disposition to if empty so we don't overwrite existing data. * Fix #390 worker setup fails because requirements.dataflow.txt not found * Dataflow always uses the local file requirements.txt regardless of the local file used as the source. * When job is submitted it will also try to build a sdist package on the client which invokes setup.py * So we in setup.py we always refer to requirements.txt * If trying to install the package in other contexts, requirements.dataflow.txt should be renamed to requirements.txt * We do this in the Dockerfile. * Refactor the CreateFunctionEmbeddings code so that writing to BQ is not part of the compute function embeddings code; (will make it easier to test.) * * Fix typo in jsonnet with output dir; missing an "=".	2018-12-02 09:51:27 -08:00
IronPan	e8cf9c58ce	add pipeline step to push to git (#387 ) * add push to git * small fixes * work around .after() * format	2018-12-02 09:37:21 -08:00
IronPan	494fc05f16	Add IronPan to code_search owner (#386 )	2018-11-30 17:37:57 -08:00
IronPan	b807843031	add pipeline environment to code search web app (#372 ) * add pipeline * Update app.yaml	2018-11-30 07:51:00 -08:00
IronPan	3799bac22c	Update the update_index.sh (#373 ) * add search index creator container * add pipeline * update op name * update readme * update scripts * typo fix * Update Makefile * Update Makefile * address comments * fix ks * update pipeline * restructure the images * remove echo * update image * add code embedding launcher * small fixes * format * format * address comments * add flag * Update arguments.py * update parameter * revert to use --wait_until_finished. --wait_until_finish never works * update image * update git script * update script * update readme	2018-11-29 00:53:09 -08:00
Hougang Liu	6855802aa1	tf-training-job doesn't complete (#367 ) In tensorflow/models/research/object_detection/, only tensorflow/models/research/object_detection/legacy/train.py supports kubeflow sor far (construct cluster by reading TF_CONFIG environment var). Fixes: #277	2018-11-28 22:48:21 -08:00
David Sabater Dinter	f9a707ee85	[pytorch_mnist] Point images back to gcr.io/kubeflow-examples (#360 ) * Point images back to gcr.io/kubeflow-images-public * Point images back to gcr.io/kubeflow-examples * Point images back to gcr.io/kubeflow-examples	2018-11-28 22:48:16 -08:00
Guang Ya Liu	db8f4f4b37	Highlight the kubectl command. (#369 )	2018-11-28 22:41:40 -08:00
IronPan	7ffc50e0ee	Add dataflow launcher script (#364 ) * add search index creator container * add pipeline * update op name * update readme * update scripts * typo fix * Update Makefile * Update Makefile * address comments * fix ks * update pipeline * restructure the images * remove echo * update image * add code embedding launcher * small fixes * format * format * address comments * add flag * Update arguments.py * update parameter * revert to use --wait_until_finished. --wait_until_finish never works * update image	2018-11-27 19:23:54 -08:00
IronPan	760ba7b9e8	Cleanup build directory before code search GCB build (#370 ) The build directory cached the staled deleted files and without cleaning up the folder, those staled files are carried over to the new image.	2018-11-27 12:54:57 -08:00
IronPan	c0345dec90	Update setup.py to point to the new requirement file (#371 )	2018-11-27 12:45:07 -08:00
Michelle Casbon	6fcb28bc26	Use latest kubeflow release branch v0.3.4-rc.1 (#365 ) Remove separate pipelines installation Update kfp version to 0.1.3-rc.2 Clarify difference in installation paths (click-to-deploy vs CLI) Use set_gpu_limit() and remove generated yaml with resource limits	2018-11-27 09:27:34 -08:00
IronPan	31390d39a0	Add update search index pipeline (#361 ) * add search index creator container * add pipeline * update op name * update readme * update scripts * typo fix * Update Makefile * Update Makefile * address comments * fix ks * update pipeline * restructure the images * remove echo * update image * format * format * address comments	2018-11-27 04:43:55 -08:00
Hougang Liu	15007fdeea	Add ks env configuration guideline and directory(#346 ) (#347 )	2018-11-26 22:05:36 -08:00
Jeremy Lewi	e1e1422da4	Setup ArgoCD to synchornize the code search web app with the demo cluster. (#359 ) * Follow argocd instructions https://github.com/argoproj/argo-cd/blob/master/docs/getting_started.md to install ArgoCD on the cluster * Down the argocd manifest and update the namespace to argocd. * Check it in so ArgoCD can be deployed declaratively. * Update README.md with the instructions for deploying ArgoCD. Move the web app components into their own ksonnet app. * We do this because we want to be able to sync the web app components using Argo CD * ArgoCD doesn't allow us to apply autosync with granularity less than the app. We don't want to sync any of the components except the servers. * Rename the t2t-code-search-serving component to query-embed-server because this is more descriptive. * Check in a YAML spec defining the ksonnet application for the web UI. Update the instructions in nodebook code-search.ipynb * Provided updated instructions for deploying the web app due the fact that the web app is now a separate component. * Improve code-search.ipynb * Use gcloud to get sensible defaults for parameters like the project. * Provide more information about what the variables mean.	2018-11-26 18:19:19 -08:00
IronPan	7924fa7fd0	parameterize search index job name (#358 ) * parameterize search index job name * change namespace * Update search-index-creator.jsonnet	2018-11-26 12:03:30 -08:00
Jeremy Lewi	5d6a4e9d71	Create a script to update the index and lookup file used to serve predictions. (#352 ) * This script will be the last step in a pipeline to continuously update the index for serving. * The script updates the parameters of the search index server to point to the supplied index files. It then commits them and creates a PR to push those commits. * Restructure the parameters for the search index server so that we can use ks param set to override the indexFile and lookupFile. * We do this because we want to be able to push a new index by doing ks param set in a continuously running pipeline * Remove default parameters from search-index-server * Create a dockerfile suitable for running this script.	2018-11-26 06:35:27 -08:00
IronPan	4f95e85e63	add pipeline component (#356 ) * add pipeline component * update pipeline component	2018-11-26 06:21:07 -08:00
Sarah Maddox	62c2e4c249	Updated example and demo READMEs (#344 ) * Explained purpose of demos vs examples and added pipelines demo to README. * Fixed some rendering in list items.	2018-11-24 17:27:52 -08:00
Jeremy Lewi	a32227f371	Fix the ksonnet by defining globals. (#354 ) * The latest changes to the ksonnet components require certain values to be defined as defaults. * This is part of the move away from using a fake component to define parameters that should be reused across different modules. see #308 * Verify we can run ks show on a new environment and can evaluate the ksonnet. Fix #353	2018-11-24 14:36:43 -08:00
Jeremy Lewi	de17011066	Upgrade and fix the serving components. (#348 ) * Upgrade and fix the serving components. * Install a new version of the TFServing package so we can use the new template. * Fix the UI image. Use the same requirements file as for Dataflow so we are consistent w.r.t the version of TF and Tensor2Tesnro. * remove nms.libsonnet; move all the manifests into the actual component files rather than using a shared library. * Fix the name of the TFServing service and deployment; need to use the same name as used by the front end server. * Change the port of TFServing; we are now using the built in http server in TFServing which uses port 8500 as opposed to our custom http proxy. * We encountered an error importning nmslib; moving it to the top of the file appears to fix this. * Fix lint.	2018-11-24 13:22:34 -08:00
David Sabater Dinter	a630fcea34	[mnist_pytorch] fix train image (#342 ) * Default to model trained with CPUs TODO: Enable A/B testing with Seldon to load GPU and CPU models * Checkout 1.0rc1 release as latest Pytorch master seems to have MPI backend detection broken * Track changes in pytorch_mnist/training/ddp/mnist folder to trigger test jobs * Repoint to pull images from gcr.io/kubeflow-ci built during pre-submit * Fix image webui name * Fix logging * Add GCFS to CPU train * Fix logging * Add GCFS to CPU train * Default to model trained with GPUs TODO: Enable A/B testing with Seldon to load GPU and CPU models * Fix Predict() method as Seldon expects 3 arguments * Fix x reference	2018-11-24 13:22:28 -08:00
Jeremy Lewi	d2b68f15d7	Fix the K8s job to create the nmslib index. (#338 ) * Install nmslib in the Dataflow container so its suitable for running the index creation job. * Use command not args in the job specs. * Dockerfile.dataflow should install nmslib so that we can use that Docker image to create the index. * build.jsonnet should tag images as latest. We will use this to use the latest images as a layer cache to speed up builds. * Set logging level to info for start_search_server.py and create_search_index.py * Create search index pod keeps was getting evicted because node runs out of memory * Add a new node pool consisting of n1-standard-32 nodes to the demo cluster. These have 120 GB of RAM compared to 30GB in our default pool of n1-standard-8 * Set requests and limits on the creator search index pod. * Move all the config for the search-index-creator job into the search-index-creator.jsonnet file. We need to customize the memory resources so there's not much value to try to sharing config with other components.	2018-11-20 12:53:09 -08:00
David Sabater Dinter	a402db1ccc	E2E Pytorch mnist example (#274 ) * Add Pytorch MNIST example * Fix link to Pytorch NMIST example * Fix indentation in README * Fix lint errors * Fix lint errors Add prediction proto files * Add build_image.sh script to build image and push to gcr.io * Add pytorch-mnist-webui-release release through automatic ksonnet package * Fix lint errors * Add pytorch-mnist-webui-release release through automatic ksonnet package * Add PB2 autogenerated files to ignore with Pylint * Fix lint errors * Add official Pytorch DDP examples to ignore with Pylint * Fix lint errors * Update component to web-ui release * Update mount point to kubeflow-gcfs as the example is GCP specific * 01_setup_a_kubeflow_cluster document complete * Test release job while PR is WIP * Reduce workflow name to avoid Argo error: "must be no more than 63 characters" * Fix extra_repos to pull worker image * Fix testing_image using kubeflow-ci rather than kubeflow-releasing * Fix extra_repo, only needs kubeflow/testing * Set build_image.sh executable * Update build_image.sh from CentralDashboard component * Remove old reference to centraldashboard in echo message * Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i: https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md * Build Pytorch serving image using Python Docker Seldon wrapper rather than s2i: https://github.com/SeldonIO/seldon-core/blob/master/docs/wrappers/python-docker.md * Add releases for the training and serving images * Add releases for the training and serving images * Fix testing_image using kubeflow-ci rather than kubeflow-releasing * Fix path to Seldon-wrapper build_image.sh * Fix image name in ksonnet parameter * Add 02 distributed training documentation * Add 03 serving the model documentation Update shared persistent reference in 02 distributed training documentation * Add 05 teardown documentation * Add section to test the model is deployed correctly in 03 serving the model * Add 04 querying the model documentation * Fix ks-app to ks_app * Set prow jobs back to postsubmit * Set prow jobs to trigger presubmit to kubeflow-ci and postsubmit to kubeflow-images-public * Change to kubeflow-ci project * Increase timeout limit during image build to compile Pytorch * Increase timeout limit during image build to compile Pytorch * Change build machine type to compile Pytorch for training image * Change build machine type to compile Pytorch for training image * Add OWNERS file to Pytorch example * Fix typo in documentation * Remove checking docker daemon as we are using gcloud build instead * Use logging module rather print() * Remove empty file, replace with .gitignore to keep tmp folder * Add ksonnet application to deploy model server and web-ui Delete model server JSON manifest * Refactor ks-app to ks_app * Parametrise serving_model ksonnet component Default web-ui to use ambassador route to seldon Remove form section in web-ui * Remove default environment from ksonnet application * Update documentation to use ksonnet application * Fix component name in documentation * Consolidate Pytorch train module and build_image.sh script * Consolidate Pytorch train module * Consolidate Pytorch train module * Consolidate Pytorch train module and build_image.sh script * Revert back build_image.sh scripts * Remove duplicates * Consolidate train Dockerflies and build_image.sh script using docker build rather than gcloud * Fix docker build command * Fix docker build command * Fix image name for cpu and gpu train * Consolidate Pytorch train module * Consolidate train Dockerflies and build_image.sh script using docker build rather than gcloud	2018-11-18 14:24:43 -08:00
Michelle Casbon	4bbc0c8fd8	Simple pipeline demo (#322 ) * Add simple pipeline demo * Add hyperparameter tuning & GPU autoprovisioning Use pipelines v0.1.2 * Resolve lint issues * Disable lint warning Correct SDK syntax that labels the name of the pipeline step * Add postprocessing step Basically empty step just to show more than one step * Add clarity to instructions * Update pipelines install to release v0.1.2 * Add repo cloning with release versions Remove katib patch Use kubeflow v0.3.3 Add PROJECT to env var override file Further clarification of instructions	2018-11-16 11:16:12 -08:00
Yang Pan	60a7413cc5	Remove ksonnet registry from dockerignore file (#333 ) In order to build a pipeline that can runs ksonnet command, the ksonnet registry need to be containerized. Remove it from dockerignore to unblock the work.	2018-11-14 13:45:15 -08:00


546 Commits
