
# Demo

This directory contains assets for setting up a demo of the code search example. It is primarily intended for use by Kubeflow contributors working on the shared demo.

Users looking to run the example should follow the README.md in the parent directory.

## GCP Resources

We are using a shared GCP project for the demo; the artifacts referenced below are stored in the gs://code-search-demo GCS bucket.
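To sanity-check access to the shared artifacts before running anything, listing the demo bucket is a quick test. A minimal sketch, assuming the `google-cloud-storage` Python client and application-default credentials with read access to the gs://code-search-demo bucket referenced in the Results section:

```python
from google.cloud import storage  # pip install google-cloud-storage

# List a few objects under the 20181104 data prefix of the shared demo
# bucket. Assumes ambient GCP credentials with read access (see above).
client = storage.Client()
bucket = client.bucket("code-search-demo")
for blob in bucket.list_blobs(prefix="20181104/data/", max_results=10):
    print(blob.name)
```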

## Results

### 2018-11-05

jlewi@ ran experiments that produced the following results:

| What | Location | Description |
|------|----------|-------------|
| Preprocessed data | gs://code-search-demo/20181104/data/func-doc-pairs-00???-of-00100.csv | Output of the Dataflow preprocessing job |
| Training data | gs://code-search-demo/20181104/data/kf_github_function_docstring-train-00???-of-00100 | TFRecord files produced by running T2T datagen |
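The training shards are plain TFRecord files, so they can be inspected directly with TensorFlow. A minimal sketch, assuming TF 2.x with GCS support and that the records are serialized `tf.train.Example` protos (the format T2T datagen emits):

```python
import tensorflow as tf

# Glob the sharded T2T training files and peek at the first record.
pattern = ("gs://code-search-demo/20181104/data/"
           "kf_github_function_docstring-train-00???-of-00100")
files = tf.io.gfile.glob(pattern)
print(f"found {len(files)} shards")

# Decode one serialized tf.train.Example and show its feature keys.
for record in tf.data.TFRecordDataset(files).take(1):
    example = tf.train.Example.FromString(record.numpy())
    print(list(example.features.feature.keys()))
```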

#### Models

| hparams | Location |
|---------|----------|
| transformer_tiny | gs://code-search-demo/models/20181105-tinyparams/ |
| transformer_base_single_gpu | gs://code-search-demo/models/20181105-single-gpu |
| transformer_base | gs://code-search-demo/models/20181107-dist-sync-gpu |
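Each directory above holds standard TensorFlow checkpoints written during T2T training, so the stock checkpoint utilities apply. A minimal sketch, assuming the directory contains a `checkpoint` index file alongside the `model.ckpt-*` shards:

```python
import tensorflow as tf

# Resolve the newest checkpoint in one of the model directories above;
# returns None if the directory has no checkpoint index file.
ckpt = tf.train.latest_checkpoint("gs://code-search-demo/models/20181107-dist-sync-gpu")
print(ckpt)  # e.g. gs://code-search-demo/models/.../model.ckpt-<step>
```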

#### Performance

| hparams | Resources | Global steps/sec |
|---------|-----------|------------------|
| transformer_tiny | 1 CPU worker | ~1.8 |
| transformer_base_single_gpu | 1 GPU worker (K80) | ~3.22611 |
| transformer_base | 1 chief with a K80, 8 workers with 1 K80 each, sync training | ~0.0588723 |
| transformer_base | 1 chief (no GPU), 8 workers (no GPU), sync training | ~0.707014 |
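To make the rates easier to compare, normalizing each row against the single-GPU run gives a rough relative throughput. A quick sketch over the numbers in the table above (row labels are informal shorthand):

```python
# Global steps/sec copied from the performance table above.
rates = {
    "tiny, 1 CPU worker": 1.8,
    "base, 1 K80": 3.22611,
    "base, sync, 8 K80 workers": 0.0588723,
    "base, sync, 8 CPU workers": 0.707014,
}

baseline = rates["base, 1 K80"]
for label, rate in rates.items():
    print(f"{label}: {rate:.4f} steps/sec ({rate / baseline:.3f}x single GPU)")
```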