examples/code_search
Jeremy Lewi 26c400a4cd Create a component to submit the Dataflow job to compute embeddings for code search (#324)
* Create a component to submit the Dataflow job to compute embeddings for code search.

* Update Beam to 2.8.0
* Remove nmslib from the Apache Beam requirements.txt; it is not needed and appears
  to have problems installing on the Dataflow workers.

* Spacy download was failing on Dataflow workers; reinstalling the spacy
  package as a pip package appears to fix this.

* Fix some bugs in the workflow for building the Docker images.

* Split requirements.txt into separate requirements for the Dataflow
  workers and the UI.

* We don't want to install unnecessary dependencies in the Dataflow workers.
  Some unnecessary dependencies, e.g. nmslib, were also having problems
  being installed in the workers.
2018-11-14 13:45:09 -08:00
demo code search example make distributed training work; Create some components to train models (#317) 2018-11-08 16:13:01 -08:00
docker Create a component to submit the Dataflow job to compute embeddings for code search (#324) 2018-11-14 13:45:09 -08:00
kubeflow Create a component to submit the Dataflow job to compute embeddings for code search (#324) 2018-11-14 13:45:09 -08:00
src Create a component to submit the Dataflow job to compute embeddings for code search (#324) 2018-11-14 13:45:09 -08:00
.dockerignore Integrate batch prediction (#184) 2018-07-23 16:26:23 -07:00
.gitignore code search example make distributed training work; Create some components to train models (#317) 2018-11-08 16:13:01 -08:00
Makefile Create a component to submit the Dataflow job to compute embeddings for code search (#324) 2018-11-14 13:45:09 -08:00
README.md Upgrade notebook commands and other relevant changes (#229) 2018-08-20 16:35:07 -07:00
code-search.ipynb update instruction with proper namespace (#307) 2018-11-05 20:47:46 -08:00
developer_guide.md Use conditionals and add test for code search (#291) 2018-11-02 09:52:11 -07:00

README.md

Code Search on Kubeflow

This demo implements End-to-End Code Search on Kubeflow.

Prerequisites

NOTE: If using the JupyterHub Spawner on a Kubeflow cluster, use the Docker image gcr.io/kubeflow-images-public/kubeflow-codelab-notebook, which has all the prerequisites baked in.

  • Kubeflow Latest: This notebook assumes a Kubeflow cluster is already deployed. See Getting Started with Kubeflow.

  • Python 2.7 (bundled with pip): For this demo, we will use Python 2.7. This restriction is due to Apache Beam, which does not support Python 3 yet (see BEAM-1251).

  • Google Cloud SDK: This example uses tools from the Google Cloud SDK. The SDK must be authenticated and authorized. See Authentication Overview.

  • Ksonnet 0.12: We use Ksonnet to write Kubernetes jobs declaratively so they can run on top of Kubeflow.
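If you are not using the prebuilt notebook image, a quick check like the one below can show whether the command-line tools from the list above are on your PATH. This is a minimal sketch, not part of the demo: the helper name check_tool is hypothetical, and the binary names (python, pip, gcloud, ks) are assumed to match a standard installation of each prerequisite.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Binary names are assumptions based on standard installs of the
# prerequisites (Python 2.7 with pip, Google Cloud SDK, Ksonnet).
for tool in python pip gcloud ks; do
  check_tool "$tool"
done
```

This only confirms that the tools are present. Versions still need to be checked separately, for example with `python --version` and `ks version`, and `gcloud auth list` shows which account the SDK is authenticated as.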

Getting Started

To get started, follow the instructions below.

NOTE: We will assume that the Kubeflow cluster is available at kubeflow.example.com. Make sure you replace this with the true FQDN of your Kubeflow cluster in any subsequent instructions.

  • Spawn a new JupyterLab instance inside the Kubeflow cluster by pointing your browser to https://kubeflow.example.com/hub and clicking "Start My Server".

  • In the Image text field, enter gcr.io/kubeflow-images-public/kubeflow-codelab-notebook:v20180808-v0.2-22-gcfdcb12. This image contains all the prerequisites needed for the demo.

  • Once spawned, you should be redirected to the Jupyter Notebooks UI.

  • Open a new Terminal and run:

    $ git clone --branch=master --depth=1 https://github.com/kubeflow/examples
    

    This will create an examples folder. It is safe to close the terminal now.

  • Return to the Jupyter Notebooks UI and navigate to examples/code_search. Open the notebook code-search.ipynb and follow along.
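Before closing the terminal, it can be worth confirming that the clone produced the notebook the last step refers to. The sketch below assumes it is run from the directory where git clone was executed; the helper name has_notebook is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given checkout of kubeflow/examples
# contains the code search notebook.
has_notebook() {
  [ -f "$1/code_search/code-search.ipynb" ]
}

# "examples" is the folder created by the git clone step; the relative
# path assumes the current directory is where the clone was run.
if has_notebook "examples"; then
  echo "notebook found"
else
  echo "notebook not found; check that the clone succeeded"
fi
```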

Acknowledgements

This project derives from hamelsmu/code_search.