Commit Graph

9 Commits

Author SHA1 Message Date
Jeremy Lewi e15bfffca4 An Argo workflow to use as the E2E test for code_search example. (#446)
* An Argo workflow to use as the E2E test for code_search example.

* The workflow builds the Docker images and then runs the python test
  to train and export a model

* Move common utilities into util.libsonnet.

* Add the workflow to the set of triggered workflows.

* Update the test environment used by the test ksonnet app; we've since
  changed the location of the app.

Related to #295

* Refactor the jsonnet file defining the GCB build workflow

  * Use an external variable to conditionally pull and use a previous
    Docker image as a cache (see the sketch after this commit)

  * Reduce code duplication by building a shared template for all the different
    workflows.

* BUILD_ID needs to be defined in the default parameters; otherwise we get
  an error when adding a new environment.

* Define suitable defaults.
2018-12-28 16:12:32 -08:00
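
A minimal sketch of the conditional-cache logic described above, rendered in Python rather than the repo's actual jsonnet (project and image names are hypothetical):

```python
import json

def gcb_workflow(image, use_image_cache):
    """Build a Google Cloud Build config dict; mirrors the jsonnet template."""
    steps = []
    if use_image_cache:
        steps.append({
            "id": "pull-cache",
            "name": "gcr.io/cloud-builders/docker",
            "entrypoint": "bash",
            # "|| true" so a missing cache image doesn't fail the build.
            "args": ["-c", "docker pull %s:latest || true" % image],
        })
    build_args = ["build", "-t", "%s:$BUILD_ID" % image]
    if use_image_cache:
        build_args += ["--cache-from", "%s:latest" % image]
    build_args.append(".")
    steps.append({
        "id": "build",
        "name": "gcr.io/cloud-builders/docker",
        "args": build_args,
        # "-" means start immediately rather than waiting on earlier steps.
        "waitFor": ["pull-cache"] if use_image_cache else ["-"],
    })
    return {"steps": steps, "images": ["%s:$BUILD_ID" % image]}

print(json.dumps(gcb_workflow("gcr.io/some-project/code-search", True), indent=2))
```

The flag plays the role of the external variable: when set, the workflow pulls the previous :latest image and passes it to docker build via --cache-from; otherwise the pull step is skipped entirely.
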
IronPan 760ba7b9e8 Cleanup build directory before code search GCB build (#370)
The build directory caches stale, deleted files; without cleaning up the folder, those stale files are carried over into the new image.
2018-11-27 12:54:57 -08:00
IronPan 31390d39a0 Add update search index pipeline (#361)
* add search index creator container

* add pipeline

* update op name

* update readme

* update scripts

* typo fix

* Update Makefile

* Update Makefile

* address comments

* fix ks

* update pipeline

* restructure the images

* remove echo

* update image

* format

* format

* address comments
2018-11-27 04:43:55 -08:00
Jeremy Lewi 5d6a4e9d71 Create a script to update the index and lookup file used to serve predictions. (#352)
* This script will be the last step in a pipeline to continuously update
  the index for serving.

* The script updates the parameters of the search index server to point
  to the supplied index files. It then commits those changes and opens a PR
  with them (see the sketch after this commit).

* Restructure the parameters for the search index server so that we can use
  ks param set to override the indexFile and lookupFile.

* We do this because we want to be able to push a new index by doing
  ks param set in a continuously running pipeline.

* Remove default parameters from search-index-server.

* Create a Dockerfile suitable for running this script.
2018-11-26 06:35:27 -08:00
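
A minimal sketch of the update flow described above, with hypothetical component, parameter, and branch names (the real script's flags and PR mechanics may differ):

```python
import subprocess

def update_index(app_dir, index_file, lookup_file, branch):
    """Point the serving params at new index files and push a branch for a PR."""
    def run(*cmd):
        subprocess.check_call(list(cmd), cwd=app_dir)

    # Override the ksonnet parameters of the serving component.
    run("ks", "param", "set", "search-index-server", "indexFile", index_file)
    run("ks", "param", "set", "search-index-server", "lookupFile", lookup_file)
    # Commit the change on a branch; opening the PR itself is a separate step.
    run("git", "checkout", "-b", branch)
    run("git", "commit", "-a", "-m", "Update the search index to " + index_file)
    run("git", "push", "origin", branch)
```
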
Jeremy Lewi de17011066 Upgrade and fix the serving components. (#348)
* Upgrade and fix the serving components.

* Install a new version of the TFServing package so we can use the new template.

* Fix the UI image. Use the same requirements file as for Dataflow so we are
  consistent w.r.t. the versions of TF and Tensor2Tensor.

* Remove nms.libsonnet; move all the manifests into the actual component
  files rather than using a shared library.

* Fix the name of the TFServing service and deployment; need to use the same
  name as used by the front end server.

* Change the port of TFServing; we are now using the built-in HTTP server
  in TFServing, which uses port 8500, as opposed to our custom HTTP proxy
  (see the predict-call sketch after this commit).

* We encountered an error importing nmslib; moving the import to the top of
  the file appears to fix this.

* Fix lint.
2018-11-24 13:22:34 -08:00
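
With the custom proxy gone, clients call TFServing's built-in HTTP server directly. A minimal sketch of a predict call (host and model name are hypothetical; port 8500 is the one the commit describes):

```python
import json
import requests

def predict(instances, host="search-index-server", port=8500,
            model="code-search"):
    """Call TFServing's REST predict endpoint and return the predictions."""
    url = "http://%s:%d/v1/models/%s:predict" % (host, port, model)
    resp = requests.post(url, data=json.dumps({"instances": instances}))
    resp.raise_for_status()
    return resp.json()["predictions"]
```
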
Jeremy Lewi d2b68f15d7 Fix the K8s job to create the nmslib index. (#338)
* Install nmslib in the Dataflow container so it's suitable for running
  the index creation job.

* Use command, not args, in the job specs.

* Dockerfile.dataflow should install nmslib so that we can use that Docker
  image to create the index.

* build.jsonnet should tag images as latest. We will use the latest images
  as a layer cache to speed up builds.

* Set logging level to info for start_search_server.py and
  create_search_index.py

* The create-search-index pod kept getting evicted because the node was
  running out of memory (see the index-build sketch after this commit).

* Add a new node pool consisting of n1-standard-32 nodes to the demo cluster.
  These have 120 GB of RAM, compared to 30 GB in our default pool of
  n1-standard-8 nodes.

* Set requests and limits on the search-index-creator pod.

* Move all the config for the search-index-creator job into the
  search-index-creator.jsonnet file. We need to customize the memory resources,
  so there's not much value in trying to share config with other components.
2018-11-20 12:53:09 -08:00
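
For context on the memory pressure, a minimal sketch of the index build the job runs (file names are hypothetical; assumes the embeddings are stored as a NumPy array). The whole embedding matrix plus the HNSW graph live in RAM, which is why the pod needs generous requests/limits:

```python
import nmslib
import numpy as np

def create_search_index(embeddings_file, index_file):
    """Build an HNSW index over code embeddings and write it to disk."""
    data = np.load(embeddings_file)  # shape: (num_functions, embedding_dim)
    index = nmslib.init(method="hnsw", space="cosinesimil")
    index.addDataPointBatch(data)
    index.createIndex({"post": 2}, print_progress=True)
    index.saveIndex(index_file)
```
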
Jeremy Lewi 26c400a4cd Create a component to submit the Dataflow job to compute embeddings for code search (#324)
* Create a component to submit the Dataflow job to compute embeddings for code search.

* Update Beam to 2.8.0
* Remove nmslib from the Apache Beam requirements.txt; it's not needed and
  appears to have problems installing on the Dataflow workers.

* Spacy download was failing on Dataflow workers; reinstalling the spacy
  package as a pip package appears to fix this (see the setup.py sketch
  after this commit).

* Fix some bugs in the workflow for building the Docker images.

* Split requirements.txt into separate requirements for the Dataflow
  workers and the UI.

* We don't want to install unnecessary dependencies in the Dataflow workers.
  Some unnecessary dependencies, e.g. nmslib, were also having problems
  being installed on the workers.
2018-11-14 13:45:09 -08:00
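
One way to do the spacy reinstall is Beam's standard setup.py custom-command pattern, sketched below (package name and model are hypothetical; the repo's actual setup may differ):

```python
# setup.py shipped to the Dataflow workers.
import subprocess
from distutils.command.build import build as _build

import setuptools

# Commands run on each Dataflow worker at package-install time.
CUSTOM_COMMANDS = [
    ["pip", "install", "--upgrade", "spacy"],
    ["python", "-m", "spacy", "download", "en"],
]

class build(_build):
    # Run our custom commands as part of the normal build.
    sub_commands = _build.sub_commands + [("CustomCommands", None)]

class CustomCommands(setuptools.Command):
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        for command in CUSTOM_COMMANDS:
            subprocess.check_call(command)

setuptools.setup(
    name="code-search-dataflow",
    version="0.1.0",
    packages=setuptools.find_packages(),
    cmdclass={"build": build, "CustomCommands": CustomCommands},
)
```
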
Yang Pan ee74868bec Fix build-dataflow makefile rule (#325) 2018-11-11 21:26:35 -08:00
Jeremy Lewi 65e89a599b Code search example: make distributed training work; create some components to train models (#317)
* Make distributed training work; create some components to train models

* Check in a ksonnet component to train a model using the tinyparam
  hyperparameter set.

* We want to check in the ksonnet component to facilitate reproducibility.
  We need a better way to separate the particular experiments used for
  the CS search demo effort from the jobs we want customers to try.

   Related to #239: train a high-quality model.

* Check in the cs_demo ks environment; this was being ignored as a result of
  .gitignore

Make distributed training work (#208)

* We got distributed synchronous training to work with Tensor2Tensor 1.10.
* This required creating a simple python script to start the TF standard
  server and run it as a sidecar of the chief pod and as the main container
  for the workers/ps.

* Rename the model to kf_similarity_transformer to be consistent with other
  code.
  * We don't want to use the default name because we don't want to
    inadvertently use the SimilarityTransformer model defined in the
    Tensor2Tensor project.

* Replace build.sh with a Makefile; makes it easier to add variant commands.
  * Use the Git hash, not a random id, as the tag.
  * Add a label to the Docker image to indicate the Git version.

* Put the Makefile at the top of the code_search tree; makes it easier
  to pull all the different sources for the Docker images.

* Add an option to build the Docker images with GCB; this is more efficient
  when you are on a poor network connection because you don't have to download
  images locally.
    * Use jsonnet to define and parameterize the GCB workflow.

* Build separate Docker images for running Dataflow and for running the trainer.
  This helps avoid versioning conflicts caused by different versions of protobuf
  pulled in by the TF version used as the base image vs. the version used
  with Apache Beam.

Fix #310: training fails with GPUs.

* Changes to support distributed training.
* Simplify t2t-entrypoint.sh so that all we do is parse TF_CONFIG
  and pass requisite config information as command line arguments
  (sketched after this commit); everything else can be set in the K8s spec.

* Upgrade to T2T 1.10.

* Add ksonnet prototypes for tensorboard.
2018-11-08 16:13:01 -08:00
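
A minimal sketch of the TF_CONFIG parsing the simplified entrypoint does (assumes the TF_CONFIG layout set by Kubeflow's TFJob; the flag names follow tensor2tensor's distributed-training flags, and the exact set passed may differ):

```python
import json
import os

def t2t_flags_from_tf_config():
    """Translate TF_CONFIG into command line flags for t2t-trainer."""
    tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
    cluster = tf_config.get("cluster", {})
    task = tf_config.get("task", {})
    return [
        "--master=grpc://%s" % cluster.get("master", [""])[0],
        "--ps_replicas=%d" % len(cluster.get("ps", [])),
        "--worker_replicas=%d" % (1 + len(cluster.get("worker", []))),
        "--worker_id=%d" % int(task.get("index", 0)),
        "--worker_job=/job:%s" % task.get("type", "master"),
        "--sync",  # we run distributed *synchronous* training
    ]

if __name__ == "__main__":
    print(" ".join(t2t_flags_from_tf_config()))
```

Everything else (image, volumes, replica counts) stays in the K8s spec, per the commit.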