mirror of https://github.com/kubeflow/examples.git
Overview
This directory shows how to build a scheduled pipeline that periodically updates the search index and redeploys the search UI with the new index. It uses GitHub to store the search UI's Kubernetes spec and hooks up Argo CD to automatically update the search UI.
At a high level, the pipeline automates the process to:
- Compute the function embeddings
- Create a new search index file
- Update the GitHub manifest to point to the new search index file
Argo CD then triggers a new service deployment with the new manifest.
Prerequisites
- A cluster with Kubeflow deployed, including Kubeflow Pipelines
- A pre-trained code search model.
Instructions
- Upload the ks-web-app/ directory to a GitHub repository, and set up Argo CD following the Set up Automated sync instructions if you want the search UI updated in real time. Otherwise, Argo CD pulls the latest config every 3 minutes by default.
- Create a GitHub token following the instructions and store it in the cluster as a secret. This allows the pipeline to update GitHub. The secret is created in the kubeflow namespace, assuming that is the namespace where Kubeflow is deployed:
kubectl create secret generic github-access-token --from-literal=token=[your_github_token] -n kubeflow
- To run the pipeline, follow the Kubeflow Pipelines instructions: compile index_update_pipeline.py and upload it on the pipelines page.
Provide the parameters, e.g.
PROJECT='code-search-demo'
CLUSTER_NAME='cs-demo-1103'
WORKING_DIR='gs://code-search-demo/pipeline'
SAVED_MODEL_DIR='gs://code-search-demo/models/20181107-dist-sync-gpu/export/1541712907/'
DATA_DIR='gs://code-search-demo/20181104/data'
TODO(IronPan): more details on how to run pipeline