A repository to host extended examples and tutorials

Go to file

Jeremy Lewi 78fdc74b56 Dataflow job should support writing embeddings to a different location (Fix #366 ). (#388 ) * Datflow job should support writing embeddings to a different location (Fix #366). * Dataflow job to compute code embeddings needs to have parameters controlling the location of the outputs independent of the inputs. Prior to this fix the same table in the dataset was always written and the files were always created in the data dir. * This made it very difficult to rerun the embeddings job on the latest GitHub data (e.g to regularly update the code embeddings) without overwritting the current embeddings. * Refactor how we create BQ sinks and sources in this pipeline * Rather than create a wrapper class that bundles together a sink and schema we should have a separate helper class for creating BQ schemas and then use WriteToBigQuery directly. * Similarly for ReadTransforms we don't need a wrapper class that bundles a query and source. We can just create a class/constant to represent queries and pass them directly to the appropriate source. * Change BQ write disposition to if empty so we don't overwrite existing data. * Fix #390 worker setup fails because requirements.dataflow.txt not found * Dataflow always uses the local file requirements.txt regardless of the local file used as the source. * When job is submitted it will also try to build a sdist package on the client which invokes setup.py * So we in setup.py we always refer to requirements.txt * If trying to install the package in other contexts, requirements.dataflow.txt should be renamed to requirements.txt * We do this in the Dockerfile. * Refactor the CreateFunctionEmbeddings code so that writing to BQ is not part of the compute function embeddings code; (will make it easier to test.) * * Fix typo in jsonnet with output dir; missing an "=".		2018-12-02 09:51:27 -08:00
agents	Update PVC to /home/jovyan (#119 )	2018-07-13 14:39:26 -07:00
code_search	Dataflow job should support writing embeddings to a different location (Fix #366 ). (#388 )	2018-12-02 09:51:27 -08:00
codelab-image	Update Ksonnet version, Add Python2 pip (#216 )	2018-08-07 22:58:20 -07:00
demos	Use latest kubeflow release branch v0.3.4-rc.1 (#365 )	2018-11-27 09:27:34 -08:00
financial_time_series	minor fixes for instructions (#267 )	2018-10-15 10:02:17 -07:00
github_issue_summarization	A bunch of changes to support distributed training using tf.estimator (#265 )	2018-11-07 16:23:59 -08:00
mnist	example mnist upgrade to v1alpha2 (#246 )	2018-09-09 13:01:21 -07:00
object_detection	tf-training-job doesn't complete (#367 )	2018-11-28 22:48:21 -08:00
pipelines	Delete readme (#294 )	2018-11-01 19:41:55 -07:00
pytorch_mnist	[pytorch_mnist] Point images back to gcr.io/kubeflow-examples (#360 )	2018-11-28 22:48:16 -08:00
test/workflows	E2E Pytorch mnist example (#274 )	2018-11-18 14:24:43 -08:00
xgboost_ames_housing	new PR for XGBoost due to problems with history rewrite (#232 )	2018-08-22 06:01:36 -07:00
.gitignore	Add estimator example for github issues (#203 )	2018-08-24 18:10:27 -07:00
.pylintrc	[mnist_pytorch] fix train image (#342 )	2018-11-24 13:22:28 -08:00
CONTRIBUTING.md	Added tutorial for object detection distributed training (#74 )	2018-07-03 14:10:20 -07:00
LICENSE	Initial commit	2018-02-01 13:13:10 -08:00
OWNERS	Remove inactive reviewers/approvers. (#296 )	2018-11-02 08:34:20 -07:00
README.md	Updated example and demo READMEs (#344 )	2018-11-24 17:27:52 -08:00
prow_config.yaml	[mnist_pytorch] fix train image (#342 )	2018-11-24 13:22:28 -08:00

README.md

kubeflow-examples

A repository to share extended Kubeflow examples and tutorials to demonstrate machine learning concepts, data science workflows, and Kubeflow deployments. The examples illustrate the happy path, acting as a starting point for new users and a reference guide for experienced users.

This repository is home to the following types of examples and demos:

End-to-end
Component-focused
Application-specific
Demos