
# Demo

This directory contains assets for setting up a demo of the code search example. It is primarily intended for use by Kubeflow contributors working on the shared demo.

Users looking to run the example should follow the README.md in the parent directory.

## GCP Resources

We are using a shared GCP project for the demo; the artifacts referenced below are stored in the gs://code-search-demo GCS bucket.
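To sanity-check access to the shared artifacts before running anything, listing the demo bucket is a quick test. A minimal sketch, assuming the `google-cloud-storage` Python client and application-default credentials with read access to the gs://code-search-demo bucket referenced in the Results section:

```python
from google.cloud import storage  # pip install google-cloud-storage

# List a few objects under the 20181104 data prefix of the shared demo
# bucket. Assumes ambient GCP credentials with read access (see above).
client = storage.Client()
bucket = client.bucket("code-search-demo")
for blob in bucket.list_blobs(prefix="20181104/data/", max_results=10):
    print(blob.name)
```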

## Results

### 2018-11-05

jlewi@ ran experiments that produced the following results:

| What | Location | Description |
|------|----------|-------------|
| Preprocessed data | gs://code-search-demo/20181104/data/func-doc-pairs-00???-of-00100.csv | Output of the Dataflow preprocessing job |
| Training data | gs://code-search-demo/20181104/data/kf_github_function_docstring-train-00???-of-00100 | TFRecord files produced by running T2T datagen |
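The training shards are plain TFRecord files, so they can be inspected directly with TensorFlow. A minimal sketch, assuming TF 2.x with GCS support and that the records are serialized `tf.train.Example` protos (the format T2T datagen emits):

```python
import tensorflow as tf

# Glob the sharded T2T training files and peek at the first record.
pattern = ("gs://code-search-demo/20181104/data/"
           "kf_github_function_docstring-train-00???-of-00100")
files = tf.io.gfile.glob(pattern)
print(f"found {len(files)} shards")

# Decode one serialized tf.train.Example and show its feature keys.
for record in tf.data.TFRecordDataset(files).take(1):
    example = tf.train.Example.FromString(record.numpy())
    print(list(example.features.feature.keys()))
```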

#### Models

| hparams | Location |
|---------|----------|
| transformer_tiny | gs://code-search-demo/models/20181105-tinyparams/ |
| transformer_base_single_gpu | gs://code-search-demo/models/20181105-single-gpu |
| transformer_base | gs://code-search-demo/models/20181107-dist-sync-gpu |
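Each directory above holds standard TensorFlow checkpoints written during T2T training, so the stock checkpoint utilities apply. A minimal sketch, assuming the directory contains a `checkpoint` index file alongside the `model.ckpt-*` shards:

```python
import tensorflow as tf

# Resolve the newest checkpoint in one of the model directories above;
# returns None if the directory has no checkpoint index file.
ckpt = tf.train.latest_checkpoint("gs://code-search-demo/models/20181107-dist-sync-gpu")
print(ckpt)  # e.g. gs://code-search-demo/models/.../model.ckpt-<step>
```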

#### Performance

| hparams | Resources | Global steps/sec |
|---------|-----------|------------------|
| transformer_tiny | 1 CPU worker | ~1.8 |
| transformer_base_single_gpu | 1 GPU worker (K80) | ~3.22611 |
| transformer_base | 1 chief with a K80, 8 workers with 1 K80 each, sync training | ~0.0588723 |
| transformer_base | 1 chief (no GPU), 8 workers (no GPU), sync training | ~0.707014 |
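To make the rates easier to compare, normalizing each row against the single-GPU run gives a rough relative throughput. A quick sketch over the numbers in the table above (row labels are informal shorthand):

```python
# Global steps/sec copied from the performance table above.
rates = {
    "tiny, 1 CPU worker": 1.8,
    "base, 1 K80": 3.22611,
    "base, sync, 8 K80 workers": 0.0588723,
    "base, sync, 8 CPU workers": 0.707014,
}

baseline = rates["base, 1 K80"]
for label, rate in rates.items():
    print(f"{label}: {rate:.4f} steps/sec ({rate / baseline:.3f}x single GPU)")
```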