Code Search on Kubeflow
This demo implements End-to-End Code Search on Kubeflow.
Prerequisites
NOTE: If using the JupyterHub Spawner on a Kubeflow cluster, use the Docker image
gcr.io/kubeflow-images-public/kubeflow-codelab-notebook, which has all the prerequisites baked in.
- Kubeflow Latest: This notebook assumes a Kubeflow cluster is already deployed. See Getting Started with Kubeflow.
- Python 2.7 (bundled with pip): For this demo, we will use Python 2.7. This restriction is due to Apache Beam, which does not support Python 3 yet (see BEAM-1251).
- Google Cloud SDK: This example will use tools from the Google Cloud SDK. The SDK must be authenticated and authorized. See Authentication Overview.
- Ksonnet 0.12: We use Ksonnet to write Kubernetes jobs in a declarative manner to be run on top of Kubeflow.
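The command-line prerequisites above can be checked from a terminal before starting. The following is a minimal sketch, assuming the tools are installed under their usual executable names (`python`, `gcloud`, and `ks` for Ksonnet). The `check_tool` helper is hypothetical and not part of this demo. It only checks that each tool is on the PATH, not its version or whether the SDK is authenticated.

```shell
#!/bin/sh
# Hypothetical helper (not part of this demo): report whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Assumed executable names for the prerequisites listed above.
for tool in python gcloud ks; do
  check_tool "$tool" || true
done
```

Any tool reported as missing needs to be installed (or added to the PATH) before continuing. This check is unnecessary when using the prebuilt notebook image mentioned in the note above.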
Getting Started
To get started, follow the instructions below.
NOTE: We will assume that the Kubeflow cluster is available at kubeflow.example.com. Make sure
you replace this with the true FQDN of your Kubeflow cluster in any subsequent instructions.
- Spawn a new JupyterLab instance inside the Kubeflow cluster by pointing your browser to https://kubeflow.example.com/hub and clicking "Start My Server".
- In the Image text field, enter gcr.io/kubeflow-images-public/kubeflow-codelab-notebook:v20180808-v0.2-22-gcfdcb12. This image contains all the prerequisites needed for the demo.
- Once spawned, you should be redirected to the Jupyter Notebooks UI.
- Spawn a new Terminal and run
  $ git clone --branch=master --depth=1 https://github.com/kubeflow/examples
  This will create an examples folder. It is safe to close the terminal now.
- Navigate back to the Jupyter Notebooks UI and navigate to examples/code_search. Open the Jupyter notebook code-search.ipynb and follow along.
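The host name substitution and the terminal step above can be sketched as a short shell snippet. This is only an illustration: `KUBEFLOW_HOST`, `HUB_URL`, and `clone_examples` are hypothetical names introduced here, and `kubeflow.example.com` is the placeholder host from the note above.

```shell
#!/bin/sh
# Placeholder host from the note above; replace with your cluster's FQDN.
KUBEFLOW_HOST="kubeflow.example.com"

# JupyterHub URL to open in the browser (first step).
HUB_URL="https://${KUBEFLOW_HOST}/hub"
echo "Open ${HUB_URL} and click \"Start My Server\""

# Hypothetical wrapper around the clone command from the terminal step.
# Run it inside the notebook server's terminal, not on your workstation.
clone_examples() {
  git clone --branch=master --depth=1 https://github.com/kubeflow/examples
}

# Usage, from the Jupyter terminal:
#   clone_examples
#   cd examples/code_search
```

Keeping the host name in one variable means the placeholder only has to be replaced once. The clone is wrapped in a function so that nothing is downloaded until it is called explicitly.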
Acknowledgements
This project derives from hamelsmu/code_search.