examples/code_search
Jeremy Lewi acd8007717 Use conditionals and add test for code search (#291)
* Fix model export, loss function, and add some manual tests.

Fix Model export to support computing code embeddings: Fix #260

* The previous exported model was always using the embeddings trained for
  the search query.

* But we need to be able to compute embedding vectors for both the query
  and code.

* To support this we add a new input feature "embed_code" and conditional
  ops. The exported model uses the value of the embed_code feature to determine
  whether to treat the inputs as a query string or code and computes
  the embeddings appropriately.

* Originally based on #233 by @activatedgeek

Loss function improvements

* See #259 for a long discussion about different loss functions.

* @activatedgeek was experimenting with different loss functions in #233
  and this pulls in some of those changes.

Add manual tests

* Related to #258

* We add a smoke test for T2T steps so we can catch bugs in the code.
* We also add a smoke test for serving the model with TFServing.
* We add a sanity check to ensure we get different values for the same
  input based on which embeddings we are computing.

Change Problem/Model name

* Register the problem github_function_docstring with a different name
  to distinguish it from the version inside the Tensor2Tensor library.

* * Skip the test when running under prow because it's a manual test.
* Fix some lint errors.

* * Fix lint and skip tests.

* Fix lint.

* * Fix lint
* Revert loss function changes; we can do that in a follow-on PR.

* * Run generate_data as part of the test rather than reusing a cached
  vocab and processed input file.

* Modify SimilarityTransformer so we can overwrite the number of shards
  used easily to facilitate testing.

* Comment out py-test for now.
2018-11-02 09:52:11 -07:00
docker Use conditionals and add test for code search (#291) 2018-11-02 09:52:11 -07:00
kubeflow Add tensorboard and check in vendor for the code search example. (#255) 2018-10-04 10:18:58 -07:00
src Use conditionals and add test for code search (#291) 2018-11-02 09:52:11 -07:00
.dockerignore Integrate batch prediction (#184) 2018-07-23 16:26:23 -07:00
.gitignore Extension of T2T Ksonnet component (#149) 2018-06-25 15:09:22 -07:00
README.md Upgrade notebook commands and other relevant changes (#229) 2018-08-20 16:35:07 -07:00
code-search.ipynb Upgrade notebook commands and other relevant changes (#229) 2018-08-20 16:35:07 -07:00
developer_guide.md Use conditionals and add test for code search (#291) 2018-11-02 09:52:11 -07:00

README.md

Code Search on Kubeflow

This demo implements End-to-End Code Search on Kubeflow.

Prerequisites

NOTE: If using the JupyterHub Spawner on a Kubeflow cluster, use the Docker image gcr.io/kubeflow-images-public/kubeflow-codelab-notebook, which has all the prerequisites baked in.

  • Kubeflow Latest: This notebook assumes a Kubeflow cluster is already deployed. See Getting Started with Kubeflow.

  • Python 2.7 (bundled with pip): For this demo, we will use Python 2.7. This restriction is due to Apache Beam, which does not support Python 3 yet (see BEAM-1251).

  • Google Cloud SDK: This example will use tools from the Google Cloud SDK. The SDK must be authenticated and authorized. See Authentication Overview.

  • Ksonnet 0.12: We use Ksonnet to write Kubernetes jobs in a declarative manner to be run on top of Kubeflow.
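
A quick way to confirm that the command-line prerequisites above are installed is sketched below. It only checks that each command is on the PATH; it does not verify versions (Python 2.7, Ksonnet 0.12) or that the Google Cloud SDK is authenticated. The function name `missing_tools` is hypothetical, and the binary names `python`, `pip`, `gcloud`, and `ks` are the usual ones but are assumptions about your environment.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed binary names for the prerequisites; "ks" is the Ksonnet CLI.
missing=$(missing_tools python pip gcloud ks)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on a single line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisite commands found on PATH."
fi
```

If you are using the kubeflow-codelab-notebook image mentioned in the note above, this check should not be necessary.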

Getting Started

To get started, follow the instructions below.

NOTE: We will assume that the Kubeflow cluster is available at kubeflow.example.com. Make sure you replace this with the true FQDN of your Kubeflow cluster in any subsequent instructions.

  • Spawn a new JupyterLab instance inside the Kubeflow cluster by pointing your browser to https://kubeflow.example.com/hub and clicking "Start My Server".

  • In the Image text field, enter gcr.io/kubeflow-images-public/kubeflow-codelab-notebook:v20180808-v0.2-22-gcfdcb12. This image contains all the prerequisites needed for the demo.

  • Once spawned, you should be redirected to the Jupyter Notebooks UI.

  • Spawn a new Terminal and run:

    $ git clone --branch=master --depth=1 https://github.com/kubeflow/examples
    

    This will create an examples folder. It is safe to close the terminal now.

  • Return to the Jupyter Notebooks UI and navigate to examples/code_search. Open the Jupyter notebook code-search.ipynb and follow along.

Acknowledgements

This project derives from hamelsmu/code_search.