Commit Graph

9 Commits

Author SHA1 Message Date
Jeremy Lewi 2e6e891a5b Update the ArgoCD app to use the kubeflow/examples repo (#440)
* We were using jlewi's fork because some PRs hadn't been merged yet, but
  all the relevant PRs have now been merged and master is the source of
  truth (see the sketch of the updated source stanza below).
2018-12-19 21:26:49 -08:00
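In ArgoCD terms this change just repoints the Application's source at the
upstream repo. A minimal sketch of the relevant stanza, assuming the standard
Application schema (the path is an illustrative placeholder):

```yaml
spec:
  source:
    # Previously this pointed at jlewi's fork of kubeflow/examples.
    repoURL: https://github.com/kubeflow/examples.git
    targetRevision: master         # master is now the source of truth
    path: code_search/ks-web-app   # hypothetical path to the ksonnet app
```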
Jeremy Lewi 9f061a0554 Update the central dashboard UI image to one that includes pipelines. (#430) 2018-12-12 09:34:21 -08:00
Jeremy Lewi b26f7e9a48 Add pods/logs permission to the jupyter notebook role. (#419)
* This is needed so that fairing can tail the logs (a sketch of the rule
  follows below).
2018-12-09 15:53:46 -08:00
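For reference, log access is the pods/log subresource, so the change amounts
to one line in the role's rules. A minimal sketch, with an assumed role name:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jupyter-notebook-role   # illustrative name
  namespace: kubeflow
rules:
- apiGroups: [""]
  # pods/log is the subresource fairing reads when tailing logs.
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
```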
Jeremy Lewi 67d42c4661 Expose ArgoCD UI behind Ambassador. (#413)
* We need to disable TLS on the ArgoCD server (it's handled by the ingress);
  leaving it enabled leads to endless redirects.

* ArgoCD is running in the argocd namespace, but Ambassador is running in a
  different namespace and is currently only configured with RBAC to watch a
  single namespace.

* So we add a Service in the kubeflow namespace whose only purpose is to
  define the Ambassador mapping (sketched below).
2018-12-08 12:49:34 -08:00
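A minimal sketch of that Service, assuming the annotation-based Ambassador
configuration of this era and an argocd-server Service listening on port 80
in the argocd namespace (names, prefix, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-mapping    # illustrative name
  namespace: kubeflow     # the namespace Ambassador is allowed to watch
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: argocd-mapping
      prefix: /argocd/
      # Route to plain HTTP; TLS is terminated at the ingress.
      service: argocd-server.argocd:80
spec:
  # Routing is driven entirely by the annotation above; the Service itself
  # just needs to be valid.
  ports:
  - port: 80
```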
Jeremy Lewi e1e1422da4 Set up ArgoCD to synchronize the code search web app with the demo cluster. (#359)
* Follow the argocd instructions
  https://github.com/argoproj/argo-cd/blob/master/docs/getting_started.md
  to install ArgoCD on the cluster.

  * Download the argocd manifest and update the namespace to argocd.
  * Check it in so ArgoCD can be deployed declaratively.

* Update README.md with the instructions for deploying ArgoCD.

Move the web app components into their own ksonnet app.

* We do this because we want to be able to sync the web app components using
  ArgoCD (see the Application sketch at the end of this entry).

* ArgoCD doesn't let us apply autosync at any granularity finer than the app,
  and we don't want to sync any of the components except the servers.

* Rename the t2t-code-search-serving component to query-embed-server because
  this is more descriptive.

* Check in a YAML spec defining the ksonnet application for the web UI.

Update the instructions in the notebook code-search.ipynb.

  * Provide updated instructions for deploying the web app now that it is a
    separate component.

  * Improve code-search.ipynb
    * Use gcloud to get sensible defaults for parameters like the project.
    * Provide more information about what the variables mean.
2018-11-26 18:19:19 -08:00
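A sketch of what the checked-in Application spec might look like, assuming
the standard ArgoCD Application schema; the name, path, and destination are
illustrative placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: code-search-web-app        # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/kubeflow/examples.git
    path: code_search/ks-web-app   # hypothetical path to the web-app ksonnet app
    targetRevision: master
  destination:
    server: https://kubernetes.default.svc
    namespace: kubeflow
  # Autosync applies to the whole Application, which is why the web app
  # components were split into their own ksonnet app.
  syncPolicy:
    automated: {}
```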
IronPan 4f95e85e63 Add pipeline component (#356)
* Add pipeline component.

* Update pipeline component.
2018-11-26 06:21:07 -08:00
Jeremy Lewi d2b68f15d7 Fix the K8s job to create the nmslib index. (#338)
* Install nmslib in the Dataflow container so it's suitable for running
  the index creation job (see the sketch at the end of this entry).

* Use command, not args, in the job specs.

* Dockerfile.dataflow should install nmslib so that we can use that Docker
  image to create the index.

* build.jsonnet should tag images as latest. We will use the latest images
  as a layer cache to speed up builds.

* Set logging level to info for start_search_server.py and
  create_search_index.py

* The create-search-index pod kept getting evicted because the node was
  running out of memory.

* Add a new node pool consisting of n1-standard-32 nodes to the demo cluster.
  These have 120 GB of RAM, compared to 30 GB in our default pool of
  n1-standard-8 nodes.

* Set resource requests and limits on the search-index-creator pod.

* Move all the config for the search-index-creator job into the
  search-index-creator.jsonnet file. We need to customize the memory
  resources, so there's not much value in trying to share config with other
  components.
2018-11-20 12:53:09 -08:00
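For context, building the index with nmslib follows the library's usual
pattern; a minimal sketch, assuming an HNSW cosine-similarity index over
precomputed embeddings (names and parameters are illustrative, not the actual
create_search_index.py code):

```python
import logging

import nmslib
import numpy as np

# Per this commit, the index-creation job logs at info level.
logging.basicConfig(level=logging.INFO)


def create_index(embeddings: np.ndarray, output_path: str) -> None:
    """Build an HNSW index over the embeddings and save it to disk."""
    index = nmslib.init(method="hnsw", space="cosinesimil")
    index.addDataPointBatch(embeddings)
    # Index construction holds everything in memory, which is why the pod
    # needed explicit requests/limits and the high-memory node pool above.
    index.createIndex({"post": 2}, print_progress=True)
    index.saveIndex(output_path)


if __name__ == "__main__":
    # Toy data, just to exercise the function.
    create_index(np.random.rand(1000, 128).astype(np.float32), "/tmp/code.index")
```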
Jeremy Lewi df278567f0 Fix performance of dataflow preprocessing job. (#302)

* Fix #300; the Dataflow preprocessing job is really slow.

  * The problem is that we were loading the spaCy tokenization model on every
    invocation of the tokenization function, and this is really expensive.
  * We should be doing this once per module import (see the sketch at the end
    of this entry).

* After fixing this issue, the job completed in approximately 20 minutes using
  5 workers.

  * We can process all 1.3 million records in ~20 minutes (elapsed time) using
    five 32-CPU workers and about 1 hour of CPU time altogether.

* Add options to the Dataflow job to read from files as opposed to BigQuery
  and to skip BigQuery writes. This is useful for testing.

* Add a "unittest" that verifies the Dataflow preprocessing job can run
  successfully using the DirectRunner.

* Update the Docker image and a ksonnet component for a K8s job that
  can be used to submit the Dataflow job.

* Fix #299; Add logging to the Dataflow preprocessing job to indicate that
  a Dataflow job was submitted.

* Add an option to the preprocessing Dataflow job to read an entire
  BigQuery table as the input rather than running a query to get the input.
  This is useful when the user wants to run a different query to select the
  repo paths and contents to process, write the results to some table, and
  then have the Dataflow job process that table.

* Fix lint.

* More lint fixes.
2018-11-06 14:14:28 -08:00
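The #300 fix is the standard load-once-at-import pattern; a minimal sketch
(the model name and function are illustrative, not the job's actual code):

```python
import spacy

# Load the spaCy model once per module import (i.e., once per Dataflow worker
# process) instead of on every invocation of the tokenization function.
_NLP = spacy.load("en_core_web_sm")  # illustrative model name


def tokenize(text: str) -> list:
    """Tokenize one document using the module-level model."""
    return [token.text for token in _NLP(text)]
```

Each worker then pays the model-load cost once rather than once per record,
which is consistent with the ~20 minute runtime reported above.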
Jeremy Lewi f87dfd8e53 Create a demo cluster for the code search example. (#298) 2018-11-05 06:07:52 -08:00