A repository to host extended examples and tutorials
Jeremy Lewi 78fdc74b56 Dataflow job should support writing embeddings to a different location (Fix #366). (#388)
* Dataflow job should support writing embeddings to a different location (Fix #366).

* The Dataflow job that computes code embeddings needs parameters controlling
  the location of the outputs independently of the inputs. Prior to this fix,
  the same table in the dataset was always written and the files were always
  created in the data dir.

* This made it very difficult to rerun the embeddings job on the latest GitHub
  data (e.g. to regularly update the code embeddings) without overwriting
  the current embeddings.

* Refactor how we create BQ sinks and sources in this pipeline:

  * Rather than creating a wrapper class that bundles together a sink and a
    schema, we should have a separate helper class for creating BQ schemas and
    then use WriteToBigQuery directly.

  * Similarly, for ReadTransforms we don't need a wrapper class that bundles
    a query and a source. We can just create a class/constant to represent
    queries and pass them directly to the appropriate source.

* Change the BQ write disposition to WRITE_EMPTY so we don't overwrite existing data.

* Fix #390: worker setup fails because requirements.dataflow.txt is not found.

  * Dataflow always uses the local file requirements.txt, regardless of the
    local file used as the source.

  * When the job is submitted, it will also try to build an sdist package on
    the client, which invokes setup.py.

  * So in setup.py we always refer to requirements.txt.

  * When installing the package in other contexts,
    requirements.dataflow.txt should be renamed to requirements.txt;
    we do this in the Dockerfile.

* Refactor the CreateFunctionEmbeddings code so that writing to BQ
  is not part of the code that computes the function embeddings;
  this will make it easier to test.

* Fix a typo in the jsonnet with the output dir; it was missing an "=".
2018-12-02 09:51:27 -08:00
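The sink refactor described above might look roughly like the following pure-Python sketch. The helper name and field names are illustrative, not the pipeline's actual code; only WriteToBigQuery and WRITE_EMPTY are real Apache Beam identifiers.

```python
# Hedged sketch of the refactor: instead of a wrapper class bundling a
# sink and its schema, a small helper builds the BigQuery schema string,
# which is then passed directly to beam.io.WriteToBigQuery.
def build_bq_schema(fields):
    """Render [(name, type), ...] as a BigQuery schema string."""
    return ",".join("%s:%s" % (name, bq_type) for name, bq_type in fields)

# Illustrative columns only; the real table schema lives in the pipeline.
FUNCTION_EMBEDDINGS_FIELDS = [
    ("nwo", "STRING"),
    ("path", "STRING"),
    ("function_embedding", "STRING"),
]

# Usage inside the pipeline (WRITE_EMPTY avoids clobbering existing data):
#   beam.io.WriteToBigQuery(
#       table=known_args.target_table,
#       schema=build_bq_schema(FUNCTION_EMBEDDINGS_FIELDS),
#       write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY)
```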


kubeflow-examples

A repository to share extended Kubeflow examples and tutorials to demonstrate machine learning concepts, data science workflows, and Kubeflow deployments. The examples illustrate the happy path, acting as a starting point for new users and a reference guide for experienced users.

This repository is home to the following types of examples and demos:

End-to-end

GitHub issue summarization

Author: Hamel Husain

This example covers the following concepts:

  1. Natural Language Processing (NLP) with Keras and TensorFlow
  2. Connecting to JupyterHub
  3. Shared persistent storage
  4. Training a TensorFlow model
    1. CPU
    2. GPU
  5. Serving with Seldon Core
  6. Flask front-end
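To give a flavor of concept 1, here is a hedged, pure-Python sketch of the sequence-padding step commonly used when preparing text for a Keras model (it mirrors the default pre-padding/pre-truncation behavior of Keras's `pad_sequences`; the function itself is illustrative, not this example's code):

```python
# Pad or truncate token-id sequences to a fixed length so they can be
# batched into a tensor. Zeros are prepended; overlong sequences keep
# their last `maxlen` tokens (matching Keras defaults).
def pad_sequences(seqs, maxlen, pad_value=0):
    padded = []
    for seq in seqs:
        seq = list(seq)[-maxlen:]  # keep the last maxlen tokens
        padded.append([pad_value] * (maxlen - len(seq)) + seq)
    return padded
```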

PyTorch MNIST

Author: David Sabater

This example covers the following concepts:

  1. Distributed Data Parallel (DDP) training with PyTorch on CPU and GPU
  2. Shared persistent storage
  3. Training a PyTorch model
    1. CPU
    2. GPU
  4. Serving with Seldon Core
  5. Flask front-end

MNIST

Author: Elson Rodriguez

This example covers the following concepts:

  1. Image recognition of handwritten digits
  2. S3 storage
  3. Training automation with Argo
  4. Monitoring with Argo UI and TensorBoard
  5. Serving with TensorFlow

Distributed Object Detection

Author: Daniel Castellanos

This example covers the following concepts:

  1. Gathering and preparing the data for model training using K8s jobs
  2. Using Kubeflow tf-job and tf-operator to launch a distributed object detection training job
  3. Serving the model through Kubeflow's tf-serving
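For concept 3, a trained model exposed through TF-serving's REST interface accepts a JSON predict request. The following is a minimal sketch of building that request body (the model name, port, and instance contents are illustrative; `instances` and `signature_name` are the real TensorFlow Serving REST fields):

```python
import json

# Build the JSON body for a TensorFlow Serving REST predict call.
def predict_body(instances, signature_name="serving_default"):
    return json.dumps({
        "signature_name": signature_name,
        "instances": instances,
    })

# The body would be POSTed to an endpoint such as:
#   http://<serving-host>:8501/v1/models/<model-name>:predict
```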

Financial Time Series

Author: Sven Degroote

This example covers the following concepts:

  1. Deploying Kubeflow to a GKE cluster
  2. Exploration via JupyterHub (prospect data, preprocess data, develop ML model)
  3. Training several TensorFlow models at scale with TF-jobs
  4. Deploying and serving with TF-serving
  5. Iterating on training and serving
  6. Training on GPU
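As a rough illustration of concept 3, a TF-job is described by a TFJob custom resource. Below is a hedged sketch of a minimal v1alpha2 TFJob manifest expressed as a Python dict (the job name, image, and replica counts are made up; the `kubeflow.org/v1alpha2` API group, `TFJob` kind, and `tfReplicaSpecs` field are the real resource shape):

```python
# Minimal TFJob (v1alpha2) manifest builder; values are illustrative.
def tf_job_manifest(name, image, workers=2):
    def replica(count):
        # One replica spec per role: a pod template running the image.
        return {
            "replicas": count,
            "template": {
                "spec": {
                    "containers": [{"name": "tensorflow", "image": image}],
                    "restartPolicy": "OnFailure",
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1alpha2",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Chief": replica(1),
                "Worker": replica(workers),
            }
        },
    }
```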

Component-focused

XGBoost - Ames housing price prediction

Author: Puneith Kaul

This example covers the following concepts:

  1. Training an XGBoost model
  2. Shared persistent storage
  3. GCS and GKE
  4. Serving with Seldon Core

Application-specific

Demos

Demos are for showing Kubeflow or one of its components publicly, with the intent of highlighting product vision, not necessarily teaching. In contrast, the goal of the examples is to provide a self-guided walkthrough of Kubeflow or one of its components, for the purpose of teaching you how to install and use the product.

In an example, all commands should be embedded in the process and explained. In a demo, most details should be done behind the scenes, to optimize for on-stage rhythm and limited timing.

You can find the demos in the /demos directory.

Third-party hosted

Source | Example | Description

Get Involved

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

The Kubeflow community is guided by our Code of Conduct, which we encourage everybody to read before participating.