# Demo
This directory contains assets for setting up a demo of the code search example. It is primarily intended for use by Kubeflow contributors working on the shared demo.
Users looking to run the example should follow the README.md in the parent directory.
## GCP Resources

We are using the following project:

- org: kubeflow.org
- project: code-search-demo
- Google group administering access: code-search-team@kubeflow.org
## Results

### 2018-11-05

jlewi@ ran experiments that produced the following results.
| What | Location | Description |
|---|---|---|
| Preprocessed data | gs://code-search-demo/20181104/data/func-doc-pairs-00???-of-00100.csv | This is the output of the Dataflow preprocessing job |
| Training data | gs://code-search-demo/20181104/data/kf_github_function_docstring-train-00???-of-00100 | TFRecord files produced by running T2T datagen |
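The `00???-of-00100` suffix in the paths above follows the standard TensorFlow convention for 100-way sharded output. As a minimal sketch (the helper name here is hypothetical, not part of the example code), the individual shard names can be enumerated like this:

```python
def shard_names(prefix, num_shards, suffix=""):
    """Generate file names for an N-way sharded dataset.

    Files follow the TensorFlow convention <prefix>-SSSSS-of-NNNNN<suffix>,
    with both the shard index and shard count zero-padded to 5 digits.
    """
    return [
        "%s-%05d-of-%05d%s" % (prefix, i, num_shards, suffix)
        for i in range(num_shards)
    ]

# First shard of the preprocessed CSV data:
print(shard_names("func-doc-pairs", 100, ".csv")[0])
# func-doc-pairs-00000-of-00100.csv
```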
#### Models
| hparams | Location |
|---|---|
| transformer_tiny | gs://code-search-demo/models/20181105-tinyparams/ |
| transformer_base_single_gpu | gs://code-search-demo/models/20181105-single-gpu |
| transformer_base | gs://code-search-demo/models/20181107-dist-sync-gpu |
#### Performance
| hparams | Resources | Steps/sec |
|---|---|---|
| transformer_tiny | 1 CPU worker | ~1.8 global steps/sec |
| transformer_base_single_gpu | 1 GPU worker (K80) | ~3.22611 global steps/sec |
| transformer_base | 1 chief with K80, 8 workers with 1 K80, sync training | ~0.0588723 global steps/sec |
| transformer_base | 1 chief (no GPU), 8 workers (no GPU), sync training | ~0.707014 global steps/sec |
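To put the table in perspective, the ratios between the reported throughputs can be computed directly (this is only arithmetic on the numbers above, not a new measurement):

```python
# Per-step throughput (global steps/sec) copied from the table above.
single_gpu = 3.22611       # transformer_base_single_gpu, 1 K80
dist_sync_gpu = 0.0588723  # transformer_base, 8 workers + chief, K80s, sync
dist_sync_cpu = 0.707014   # transformer_base, 8 workers + chief, no GPUs, sync

# A single K80 worker takes ~55x more steps per second than the
# synchronous multi-GPU setup, and even the all-CPU sync run is ~12x faster.
print("single GPU vs sync GPU: %.1fx" % (single_gpu / dist_sync_gpu))
print("sync CPU vs sync GPU:   %.1fx" % (dist_sync_cpu / dist_sync_gpu))
```

This suggests the synchronous GPU configuration was heavily bottlenecked by something other than raw compute (e.g. gradient synchronization), which is worth keeping in mind when choosing a training setup.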