kubeflow-examples
A repository to share extended Kubeflow examples and tutorials to demonstrate machine learning concepts, data science workflows, and Kubeflow deployments. They illustrate the happy path, acting as a starting point for new users and a reference guide for experienced users.
This repository is home to three types of examples:
End-to-end
GitHub issue summarization
Author: Hamel Husain
This example covers the following concepts:
- Natural Language Processing (NLP) with Keras and TensorFlow
- Connecting to JupyterHub
- Shared persistent storage
- Training a TensorFlow model
  - CPU
  - GPU
- Serving with Seldon Core
- Flask front-end
MNIST
Author: Elson Rodriguez
This example covers the following concepts:
- Image recognition of handwritten digits
- S3 storage
- Training automation with Argo
- Monitoring with Argo UI and TensorBoard
- Serving with TensorFlow
Distributed Object Detection
Author: Daniel Castellanos
This example covers the following concepts:
- Gathering and preparing the data for model training using K8s jobs
- Using Kubeflow tf-job and tf-operator to launch a distributed object detection training job
- Serving the model through Kubeflow's tf-serving
Financial Time Series
Author: Sven Degroote
This example covers the following concepts:
- Deploying Kubeflow to a GKE cluster
- Exploration via JupyterHub (prospect data, preprocess data, develop ML model)
- Training several TensorFlow models at scale with TF-jobs
- Deploy and serve with TF-serving
- Iterate training and serving
- Training on GPU
Component-focused
XGBoost - Ames housing price prediction
Author: Puneith Kaul
This example covers the following concepts:
- Training an XGBoost model
- Shared persistent storage
- GCS and GKE
- Serving with Seldon Core
Application-specific
Third-party hosted
Source | Example | Description |
---|---|---|
Get Involved
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
The Kubeflow community is guided by our Code of Conduct, which we encourage everybody to read before participating.