* Install nmslib in the Dataflow container so it's suitable for running
the index creation job.
* Use command, not args, in the job specs.
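A minimal sketch of the distinction this relies on, using the Kubernetes
Python client rather than the project's ksonnet specs (the container name,
image, and module path below are assumptions): command replaces the image's
ENTRYPOINT, while args only replaces CMD and is still handed to whatever
entrypoint the image defines.
```
from kubernetes import client

# Because `command` overrides ENTRYPOINT, the job runs exactly this command
# regardless of how the Docker image was built; `args` alone would still be
# passed to the image's own entrypoint.
container = client.V1Container(
    name="search-index-creator",                     # assumed name
    image="gcr.io/my-project/code-search-dataflow",  # assumed image
    command=["python2", "-m",
             "code_search.dataflow.cli.create_search_index"],  # assumed module path
)
```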
* Dockerfile.dataflow should install nmslib so that we can use that Docker
image to create the index.
* build.jsonnet should tag images as latest; we will use the latest images
as a layer cache to speed up builds.
* Set logging level to info for start_search_server.py and
create_search_index.py
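For reference, this amounts to configuring the root logger at the top of each
entry point; a minimal sketch (the format string is an arbitrary choice, not
necessarily what the scripts use):
```
import logging

# Configure the root logger before any log calls so INFO-level messages from
# the search server and index-creation code are actually emitted.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
```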
* The create-search-index pod kept getting evicted because the node was
running out of memory.
* Add a new node pool consisting of n1-standard-32 nodes to the demo cluster.
These have 120 GB of RAM compared to 30 GB in our default pool of
n1-standard-8 nodes.
* Set resource requests and limits on the search-index-creator pod.
* Move all the config for the search-index-creator job into the
search-index-creator.jsonnet file. We need to customize the memory resources,
so there's not much value in trying to share config with other components.
* Create a component to submit the Dataflow job to compute embeddings for code search.
* Update Beam to 2.8.0
* Remove nmslib from the Apache Beam requirements.txt; it's not needed and
appears to have problems installing on the Dataflow workers.
* Spacy download was failing on Dataflow workers; reinstalling the spacy
package as a pip package appears to fix this.
* Fix some bugs in the workflow for building the Docker images.
* Split requirements.txt into separate requirements for the Dataflow
workers and the UI.
* We don't want to install unnecessary dependencies in the Dataflow workers.
Some unnecessary dependencies, e.g. nmslib, were also having problems
installing on the workers.
Otherwise, when I want to execute Dataflow code, e.g.
```
python2 -m code_search.dataflow.cli.create_function_embeddings \
```
it complains that there is no setup.py.
I could work around this by using the workingDir container API, but setting it
to a sensible default is more convenient.
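The complaint comes from Beam needing a setup.py to package the code_search
module for the Dataflow workers. As a hedged alternative sketch (the path
below is an assumption), the pipeline options can also point at setup.py
explicitly so the job does not depend on the container's working directory:
```
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions()
# Tell Beam where setup.py lives so the code_search package is shipped to the
# Dataflow workers even when the job isn't launched from that directory.
options.view_as(SetupOptions).setup_file = "/src/code_search/setup.py"  # assumed path
```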
* Make distributed training work; create some components to train models.
* Check in a ksonnet component to train a model using the tinyparam
hyperparameter set.
* We want to check in the ksonnet component to facilitate reproducibility.
We need a better way to separate the particular experiments used for
the code search demo effort from the jobs we want customers to try.
Related to #239 (train a high-quality model).
* Check in the cs_demo ks environment; this was previously being ignored
because of .gitignore.
Make distributed training work (#208):
* We got distributed synchronous training to work with Tensor2Tensor 1.10.
* This required creating a simple python script to start the TF standard
server and run it as a sidecar of the chief pod and as the main container
for the workers/ps.
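A minimal sketch of such a script, assuming the TF 1.x API that T2T 1.10
targets (the actual script in this change may differ):
```
import json
import os

import tensorflow as tf


def main():
  # TF_CONFIG is set by the TFJob controller; it describes the cluster and
  # this process's role (chief/worker/ps) and task index.
  tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
  cluster = tf.train.ClusterSpec(tf_config.get("cluster", {}))
  task = tf_config.get("task", {})
  server = tf.train.Server(
      cluster, job_name=task.get("type"), task_index=task.get("index", 0))
  # Serve until the pod is torn down; the trainer processes connect to these
  # servers to run distributed synchronous training.
  server.join()


if __name__ == "__main__":
  main()
```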
* Rename the model to kf_similarity_transformer to be consistent with other
code.
* We don't want to use the default name because we don't want to inadvertently
use the SimilarityTransformer model defined in the Tensor2Tensor project.
* Replace build.sh with a Makefile; this makes it easier to add variant commands.
* Use the Git hash, not a random id, as the tag.
* Add a label to the docker image to indicate the git version.
* Put the Makefile at the top of the code_search tree; this makes it easier
to pull in all the different sources for the Docker images.
* Add an option to build the Docker images with GCB; this is more efficient
when you are on a poor network connection because you don't have to download
images locally.
* Use jsonnet to define and parameterize the GCB workflow.
* Build separate Docker images for running Dataflow and for running the trainer.
This helps avoid versioning conflicts caused by different versions of protobuf
pulled in by the TF version used as the base image vs. the version used
with Apache Beam.
Fix #310 - Training fails with GPUs.
* Changes to support distributed training.
* Simplify t2t-entrypoint.sh so that all we do is parse TF_CONFIG
and pass requisite config information as command line arguments;
everything else can be set in the K8s spec.
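In other words, the entrypoint only needs to translate TF_CONFIG into flags; a
hedged sketch of that translation (the t2t-trainer flag names here are
assumptions about the T2T 1.10 CLI, and the "master" job name depends on how
the TFJob controller labels the chief):
```
import json
import os

# Derive the distributed-training flags from TF_CONFIG and print them so the
# shell entrypoint can append them to the command defined in the K8s spec.
tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
cluster = tf_config.get("cluster", {})
task = tf_config.get("task", {})

extra_args = [
    "--master=grpc://" + cluster.get("master", [""])[0],  # assumed flag
    "--ps_replicas=%d" % len(cluster.get("ps", [])),       # assumed flag
    "--worker_id=%d" % task.get("index", 0),               # assumed flag
]
print(" ".join(extra_args))
```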
* Upgrade to T2T 1.10.
* Add ksonnet prototypes for TensorBoard.