A repository to host extended examples and tutorials
Jeremy Lewi acd8007717 Use conditionals and add test for code search (#291)
* Fix model export, loss function, and add some manual tests.

Fix Model export to support computing code embeddings: Fix #260

* The previous exported model was always using the embeddings trained for
  the search query.

* But we need to be able to compute embedding vectors for both the query
  and code.

* To support this we add a new input feature "embed_code" and conditional
  ops. The exported model uses the value of the embed_code feature to determine
  whether to treat the inputs as a query string or code and computes
  the embeddings appropriately.

* Originally based on #233 by @activatedgeek

Loss function improvements

* See #259 for a long discussion about different loss functions.

* @activatedgeek was experimenting with different loss functions in #233
  and this pulls in some of those changes.

Add manual tests

* Related to #258

* We add a smoke test for T2T steps so we can catch bugs in the code.
* We also add a smoke test for serving the model with TFServing.
* We add a sanity check to ensure we get different values for the same
  input based on which embeddings we are computing.
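
The `embed_code` switch and the sanity check described above can be sketched in plain Python. This is only an illustration: the real export uses conditional ops inside the TensorFlow graph, and the two encoder functions and `compute_embedding` below are hypothetical stand-ins, not names from the code_search example.

```python
# Hypothetical sketch of the embed_code dispatch; the encoders are toy
# stand-ins for the trained query and code embedding networks.

def embed_query(tokens):
    """Toy stand-in for the query-string encoder."""
    return [float(len(t)) for t in tokens]


def embed_code_tokens(tokens):
    """Toy stand-in for the code encoder."""
    return [float(sum(map(ord, t)) % 7) for t in tokens]


def compute_embedding(tokens, embed_code):
    """Choose the encoder based on the embed_code input feature."""
    if embed_code:
        return embed_code_tokens(tokens)
    return embed_query(tokens)


tokens = ["def", "add", "a", "b"]
query_vec = compute_embedding(tokens, embed_code=False)
code_vec = compute_embedding(tokens, embed_code=True)

# Sanity check: the same input yields different embeddings depending on
# which branch is selected.
assert query_vec != code_vec
```

The point of the check is that a model which ignores the flag would return identical vectors for both calls, which is the bug the original export had.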

Change Problem/Model name

* Register the problem github_function_docstring with a different name
  to distinguish it from the version inside the Tensor2Tensor library.

* Skip the test when running under prow because it's a manual test.
* Fix some lint errors.

* Fix lint and skip tests.

* Fix lint.

* Fix lint.
* Revert loss function changes; we can do that in a follow-on PR.

* Run generate_data as part of the test rather than reusing a cached
  vocab and processed input file.

* Modify SimilarityTransformer so we can overwrite the number of shards
  used easily to facilitate testing.

* Comment out py-test for now.
2018-11-02 09:52:11 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| agents | Update PVC to /home/jovyan (#119) | 2018-07-13 14:39:26 -07:00 |
| code_search | Use conditionals and add test for code search (#291) | 2018-11-02 09:52:11 -07:00 |
| codelab-image | Update Ksonnet version, Add Python2 pip (#216) | 2018-08-07 22:58:20 -07:00 |
| demos | Upgrade demo to KF v0.3.1 (#278) | 2018-10-26 12:58:00 -07:00 |
| financial_time_series | minor fixes for instructions (#267) | 2018-10-15 10:02:17 -07:00 |
| github_issue_summarization | Remove v1alpah1 TFJobs from the GH issue summarization example. (#264) | 2018-10-15 09:52:01 -07:00 |
| mnist | example mnist upgrade to v1alpha2 (#246) | 2018-09-09 13:01:21 -07:00 |
| object_detection | Fix #272 (#273) | 2018-10-22 14:57:24 -07:00 |
| pipelines | Delete readme (#294) | 2018-11-01 19:41:55 -07:00 |
| test/workflows | Use conditionals and add test for code search (#291) | 2018-11-02 09:52:11 -07:00 |
| xgboost_ames_housing | new PR for XGBoost due to problems with history rewrite (#232) | 2018-08-22 06:01:36 -07:00 |
| .gitignore | Add estimator example for github issues (#203) | 2018-08-24 18:10:27 -07:00 |
| .pylintrc | Add namespace to ksonnet apply command (#57) | 2018-04-02 09:41:02 -07:00 |
| CONTRIBUTING.md | Added tutorial for object detection distributed training (#74) | 2018-07-03 14:10:20 -07:00 |
| LICENSE | Initial commit | 2018-02-01 13:13:10 -08:00 |
| OWNERS | Remove inactive reviewers/approvers. (#296) | 2018-11-02 08:34:20 -07:00 |
| README.md | add financial time series example (#252) | 2018-10-12 08:04:07 -07:00 |
| prow_config.yaml | Skeleton testing framework (#18) | 2018-03-01 21:30:50 -08:00 |

README.md

kubeflow-examples

A repository to share extended Kubeflow examples and tutorials to demonstrate machine learning concepts, data science workflows, and Kubeflow deployments. They illustrate the happy path, acting as a starting point for new users and a reference guide for experienced users.

This repository is home to three types of examples:

  1. End-to-end
  2. Component-focused
  3. Application-specific

End-to-end

GitHub issue summarization

Author: Hamel Husain

This example covers the following concepts:

  1. Natural Language Processing (NLP) with Keras and TensorFlow
  2. Connecting to JupyterHub
  3. Shared persistent storage
  4. Training a TensorFlow model
  5. CPU
  6. GPU
  7. Serving with Seldon Core
  8. Flask front-end

MNIST

Author: Elson Rodriguez

This example covers the following concepts:

  1. Image recognition of handwritten digits
  2. S3 storage
  3. Training automation with Argo
  4. Monitoring with Argo UI and TensorBoard
  5. Serving with TensorFlow

Distributed Object Detection

Author: Daniel Castellanos

This example covers the following concepts:

  1. Gathering and preparing the data for model training using K8s jobs
  2. Using Kubeflow tf-job and tf-operator to launch a distributed object detection training job
  3. Serving the model through Kubeflow's tf-serving

Financial Time Series

Author: Sven Degroote

This example covers the following concepts:

  1. Deploying Kubeflow to a GKE cluster
  2. Exploration via JupyterHub (inspect data, preprocess data, develop ML model)
  3. Training several TensorFlow models at scale with TF-jobs
  4. Deploying and serving with TF-serving
  5. Iterating on training and serving
  6. Training on GPU

Component-focused

XGBoost - Ames housing price prediction

Author: Puneith Kaul

This example covers the following concepts:

  1. Training an XGBoost model
  2. Shared persistent storage
  3. GCS and GKE
  4. Serving with Seldon Core

Application-specific

Third-party hosted

Source Example Description

Get Involved

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

The Kubeflow community is guided by our Code of Conduct, which we encourage everybody to read before participating.