mirror of https://github.com/kubeflow/examples.git
Overview
This directory shows how to build a scheduled pipeline that periodically updates the search index and redeploys the search UI with the new index. It uses GitHub to store the search UI's Kubernetes spec and hooks up Argo CD to automatically update the search UI.
At a high level, the pipeline automates the process to:
- Compute the function embeddings
- Create a new search index file
- Update the GitHub manifest to point to the new search index file
Argo CD then triggers a new service deployment with the new manifest.
Prerequisites
- A cluster with Kubeflow deployed, including Kubeflow Pipelines
- A pre-trained code search model.
Instructions
- Upload the ks-web-app/ directory to a GitHub repository, and set up Argo CD following the Set up Automated sync instructions if you want the search UI updated in real time. Otherwise, Argo CD pulls the latest config every 3 minutes by default.
- Create a GitHub token following the instructions and store it in the cluster as a secret. This allows the pipeline to update GitHub. The secret is created in the kubeflow namespace, assuming that is the namespace where Kubeflow is deployed:
kubectl create secret generic github-access-token --from-literal=token=[your_github_token] -n kubeflow
- To run the pipeline, follow the Kubeflow Pipelines instructions: compile index_update_pipeline.py and upload it on the pipelines page.
Provide the parameters, e.g.
PROJECT='code-search-demo'
CLUSTER_NAME='cs-demo-1103'
WORKING_DIR='gs://code-search-demo/pipeline'
SAVED_MODEL_DIR='gs://code-search-demo/models/20181107-dist-sync-gpu/export/1541712907/'
DATA_DIR='gs://code-search-demo/20181104/data'
TODO(IronPan): more details on how to run pipeline