# Code Search on Kubeflow
This demo implements End-to-End Code Search on Kubeflow.
## Prerequisites
**NOTE**: If using the JupyterHub Spawner on a Kubeflow cluster, use the Docker image
`gcr.io/kubeflow-images-public/kubeflow-codelab-notebook`,
which has all the prerequisites baked in.
- **Kubeflow Latest**: This notebook assumes a Kubeflow cluster is already deployed. See Getting Started with Kubeflow.
- **Python 2.7** (bundled with `pip`): For this demo, we will use Python 2.7. This restriction is due to Apache Beam, which does not support Python 3 yet (see BEAM-1251).
- **Google Cloud SDK**: This example will use tools from the Google Cloud SDK. The SDK must be authenticated and authorized. See Authentication Overview.
- **Ksonnet 0.12**: We use Ksonnet to write Kubernetes jobs in a declarative manner to be run on top of Kubeflow.
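As a quick sanity check before starting, you can verify from a terminal that the tools listed above are installed. This is an illustrative sketch, not part of the demo itself:

```shell
# Illustrative sanity check for the prerequisites listed above;
# it only reports whether each tool is on the PATH.
for tool in python pip gcloud ks; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "MISSING: $tool"
  fi
done
```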
## Getting Started
To get started, follow the instructions below.
**NOTE**: We will assume that the Kubeflow cluster is available at `kubeflow.example.com`. Make sure
you replace this with the true FQDN of your Kubeflow cluster in any subsequent instructions.
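One illustrative way to avoid typos when substituting the FQDN is to store it in a shell variable and build URLs from it; the variable name here is our own and is not used elsewhere in this demo:

```shell
# Illustrative only: keep your cluster's real FQDN in one place.
KUBEFLOW_FQDN="kubeflow.example.com"  # replace with your cluster's FQDN
echo "JupyterHub spawner: https://${KUBEFLOW_FQDN}/hub"
```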
1. Spawn a new JupyterLab instance inside the Kubeflow cluster by pointing your browser to https://kubeflow.example.com/hub and clicking "Start My Server".

2. In the Image text field, enter
   `gcr.io/kubeflow-images-public/kubeflow-codelab-notebook:v20180808-v0.2-22-gcfdcb12`.
   This image contains all the prerequisites needed for the demo.

3. Once spawned, you should be redirected to the Jupyter Notebooks UI.

4. Spawn a new Terminal and run

   ```shell
   git clone --branch=master --depth=1 https://github.com/kubeflow/examples
   ```

   This will create an `examples` folder. It is safe to close the terminal now.

5. Navigate back to the Jupyter Notebooks UI and navigate to
   `examples/code_search`. Open the Jupyter notebook `code-search.ipynb`
   and follow along.
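Since the demo drives its Kubernetes jobs through Ksonnet, the general workflow inside the notebook's terminal looks roughly like the sketch below. The application and component names are hypothetical placeholders; the real ones come from the notebook itself:

```shell
# Hypothetical Ksonnet workflow sketch; "ks-app" and "my-component"
# are placeholders, not names taken from this repository.
if command -v ks >/dev/null 2>&1; then
  cd ks-app                          # the Ksonnet application directory
  ks env add kubeflow                # register the target cluster environment
  ks apply kubeflow -c my-component  # deploy one component to the cluster
else
  echo "ks not found: install Ksonnet 0.12 first"
fi
```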
## Acknowledgements
This project derives from hamelsmu/code_search.