* We need to disable TLS on the ArgoCD server (it's handled by the ingress)
  because leaving it enabled leads to endless redirects.
* ArgoCD is running in namespace argo-cd, but Ambassador is running in a
  different namespace and is currently configured with RBAC to monitor only
  a single namespace.
* So we add a service in namespace kubeflow just to define the Ambassador mapping.
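A sketch of that pattern (the service name, prefix, and port here are illustrative, not the actual values): the Ambassador Mapping is embedded in a `getambassador.io/config` annotation on a Service in the kubeflow namespace, routing to the ArgoCD server in its own namespace.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argo-cd-mapping        # illustrative name
  namespace: kubeflow          # the namespace Ambassador watches
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: argo-cd-mapping
      prefix: /argo-cd/
      # route across namespaces to the ArgoCD server service
      service: argocd-server.argo-cd:80
spec:
  ports:
  - port: 80
    targetPort: 80
```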
* Follow the ArgoCD instructions
  https://github.com/argoproj/argo-cd/blob/master/docs/getting_started.md
  to install ArgoCD on the cluster.
* Download the ArgoCD manifest and update the namespace to argocd.
* Check it in so ArgoCD can be deployed declaratively.
* Update README.md with the instructions for deploying ArgoCD.
Move the web app components into their own ksonnet app.
* We do this because we want to be able to sync the web app components using
Argo CD
* ArgoCD doesn't allow us to apply autosync at any granularity finer than an
  app, and we don't want to sync any of the components except the servers.
* Rename the t2t-code-search-serving component to query-embed-server because
this is more descriptive.
* Check in a YAML spec defining the ksonnet application for the web UI.
Update the instructions in notebook code-search.ipynb
* Provide updated instructions for deploying the web app, since the web app
  is now a separate component.
* Improve code-search.ipynb
* Use gcloud to get sensible defaults for parameters like the project.
* Provide more information about what the variables mean.
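For example, a sensible default for the project can be pulled from the environment or from gcloud config; a best-effort sketch (the function name is ours, not from the notebook):

```python
import os
import subprocess


def default_project():
    """Best-effort default GCP project: env var first, then gcloud config."""
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if project:
        return project
    try:
        out = subprocess.check_output(
            ["gcloud", "config", "get-value", "project"],
            stderr=subprocess.DEVNULL)
        return out.decode("utf-8").strip() or None
    except (OSError, subprocess.CalledProcessError):
        # gcloud not installed or not configured; caller must supply a value
        return None
```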
* Install nmslib in the Dataflow container so it's suitable for running
  the index creation job.
* Use command, not args, in the job specs.
* Dockerfile.dataflow should install nmslib so that we can use that Docker
image to create the index.
* build.jsonnet should tag images as latest; we will use the latest images
  as a layer cache to speed up builds.
* Set logging level to info for start_search_server.py and
create_search_index.py
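The change amounts to configuring the root logger at INFO in each entry point; a minimal sketch (format string is illustrative):

```python
import logging


def configure_logging():
    """Log at INFO so progress from the server and index creation is visible."""
    # force=True (Python 3.8+) replaces any handlers configured earlier
    logging.basicConfig(
        level=logging.INFO,
        format="%(levelname)s|%(asctime)s|%(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
        force=True,
    )


configure_logging()
logging.info("search server starting")  # emitted rather than suppressed
```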
* The search-index-creator pod kept getting evicted because the node was
  running out of memory.
* Add a new node pool consisting of n1-standard-32 nodes to the demo cluster;
  these have 120 GB of RAM compared to 30 GB in our default pool of
  n1-standard-8 nodes.
* Set requests and limits on the search-index-creator pod.
* Move all the config for the search-index-creator job into the
  search-index-creator.jsonnet file. We need to customize the memory
  resources, so there's not much value in trying to share config with other
  components.
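In the pod spec this amounts to a resources stanza along these lines (the numbers here are illustrative; the real values live in search-index-creator.jsonnet):

```yaml
resources:
  requests:
    memory: 32Gi   # illustrative; sized so the scheduler picks a large node
    cpu: "4"
  limits:
    memory: 100Gi  # illustrative; below the 120 GB on an n1-standard-32
```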
* Fix performance of dataflow preprocessing job.
* Fix #300; the Dataflow job for preprocessing is really slow.
* The problem is we are loading the spacy tokenization model on every
invocation of the tokenization function and this is really expensive.
* We should be doing this once per module import.
* After fixing this issue, the job completed in approximately 20 minutes using
  5 workers.
* We can process all 1.3 million records in ~20 minutes (elapsed time) using
  five 32-CPU workers and about 1 hour of CPU time altogether.
* Add options to the Dataflow job to read from files as opposed to BigQuery
and to skip BigQuery writes. This is useful for testing.
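A sketch of the new flags (the flag names here are illustrative, not necessarily the job's actual names):

```python
import argparse


def build_parser():
    """Options for running the preprocessing job against local test data."""
    parser = argparse.ArgumentParser(description="Preprocessing job options")
    parser.add_argument(
        "--input-files", default=None,
        help="Read records from these files instead of BigQuery.")
    parser.add_argument(
        "--skip-bigquery-write", action="store_true",
        help="Do not write results back to BigQuery (useful for testing).")
    return parser
```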
* Add a "unittest" that verifies the Dataflow preprocessing job can run
successfully using the DirectRunner.
* Update the Docker image and a ksonnet component for a K8s job that
can be used to submit the Dataflow job.
* Fix #299; add logging to the Dataflow preprocessing job to indicate that
  a Dataflow job was submitted.
* Add an option to the preprocessing Dataflow job to read an entire
BigQuery table as the input rather than running a query to get the input.
This is useful in the case where the user wants to run a different
query to select the repo paths and contents to process and write them
to some table to be processed by the Dataflow job.
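One way to express the mutually exclusive table-vs-query choice (the helper name and returned shape are ours, for illustration):

```python
def bigquery_read_options(table=None, query=None):
    """Return kwargs for the BigQuery read: a whole table or a query, never both."""
    if bool(table) == bool(query):
        raise ValueError("specify exactly one of table or query")
    if table:
        # read the entire table the user prepared with their own query
        return {"table": table}
    return {"query": query, "use_standard_sql": True}
```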
* Fix lint.
* More lint fixes.