Commit Graph

9 Commits

Author SHA1 Message Date
Jeremy Lewi 2e6e891a5b Update the ArgoCD app to use the kubeflow/examples repo (#440)
* We were using jlewi's fork because some PRs hadn't been merged yet, but
  all the relevant PRs have now been merged and master is the source of
  truth (see the sketch of the updated source stanza below).
2018-12-19 21:26:49 -08:00
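In ArgoCD terms this change just repoints the Application's source at the
upstream repo. A minimal sketch of the relevant stanza, assuming the standard
Application schema (the path is an illustrative placeholder):

```yaml
spec:
  source:
    # Previously this pointed at jlewi's fork of kubeflow/examples.
    repoURL: https://github.com/kubeflow/examples.git
    targetRevision: master         # master is now the source of truth
    path: code_search/ks-web-app   # hypothetical path to the ksonnet app
```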
Jeremy Lewi 9f061a0554 Update the central dashboard UI image to one that includes pipelines. (#430) 2018-12-12 09:34:21 -08:00
Jeremy Lewi b26f7e9a48 Add pods/logs permission to the jupyter notebook role. (#419)
* This is needed so that fairing can tail the logs (a sketch of the rule
  follows below).
2018-12-09 15:53:46 -08:00
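For reference, log access is the pods/log subresource, so the change amounts
to one line in the role's rules. A minimal sketch, with an assumed role name:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jupyter-notebook-role   # illustrative name
  namespace: kubeflow
rules:
- apiGroups: [""]
  # pods/log is the subresource fairing reads when tailing logs.
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
```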
Jeremy Lewi 67d42c4661 Expose ArgoCD UI behind Ambassador. (#413)
* We need to disable TLS on the ArgoCD server (it's handled by the ingress);
  leaving it enabled leads to endless redirects.

* ArgoCD is running in the argocd namespace, but Ambassador is running in a
  different namespace and is currently only configured with RBAC to watch a
  single namespace.

* So we add a Service in the kubeflow namespace whose only purpose is to
  define the Ambassador mapping (sketched below).
2018-12-08 12:49:34 -08:00
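A minimal sketch of that Service, assuming the annotation-based Ambassador
configuration of this era and an argocd-server Service listening on port 80
in the argocd namespace (names, prefix, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-mapping    # illustrative name
  namespace: kubeflow     # the namespace Ambassador is allowed to watch
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: argocd-mapping
      prefix: /argocd/
      # Route to plain HTTP; TLS is terminated at the ingress.
      service: argocd-server.argocd:80
spec:
  # Routing is driven entirely by the annotation above; the Service itself
  # just needs to be valid.
  ports:
  - port: 80
```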
Jeremy Lewi e1e1422da4 Set up ArgoCD to synchronize the code search web app with the demo cluster. (#359)
* Follow the argocd instructions
  https://github.com/argoproj/argo-cd/blob/master/docs/getting_started.md
  to install ArgoCD on the cluster.

  * Download the argocd manifest and update the namespace to argocd.
  * Check it in so ArgoCD can be deployed declaratively.

* Update README.md with the instructions for deploying ArgoCD.

Move the web app components into their own ksonnet app.

* We do this because we want to be able to sync the web app components using
  ArgoCD (see the Application sketch at the end of this entry).

* ArgoCD doesn't let us apply autosync at any granularity finer than the app,
  and we don't want to sync any of the components except the servers.

* Rename the t2t-code-search-serving component to query-embed-server because
  this is more descriptive.

* Check in a YAML spec defining the ksonnet application for the web UI.

Update the instructions in the notebook code-search.ipynb.

  * Provide updated instructions for deploying the web app now that it is a
    separate component.

  * Improve code-search.ipynb
    * Use gcloud to get sensible defaults for parameters like the project.
    * Provide more information about what the variables mean.
2018-11-26 18:19:19 -08:00
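A sketch of what the checked-in Application spec might look like, assuming
the standard ArgoCD Application schema; the name, path, and destination are
illustrative placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: code-search-web-app        # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/kubeflow/examples.git
    path: code_search/ks-web-app   # hypothetical path to the web-app ksonnet app
    targetRevision: master
  destination:
    server: https://kubernetes.default.svc
    namespace: kubeflow
  # Autosync applies to the whole Application, which is why the web app
  # components were split into their own ksonnet app.
  syncPolicy:
    automated: {}
```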
IronPan 4f95e85e63 Add pipeline component (#356)
* Add pipeline component.

* Update pipeline component.
2018-11-26 06:21:07 -08:00
Jeremy Lewi d2b68f15d7 Fix the K8s job to create the nmslib index. (#338)
* Install nmslib in the Dataflow container so it's suitable for running
  the index creation job (see the sketch at the end of this entry).

* Use command, not args, in the job specs.

* Dockerfile.dataflow should install nmslib so that we can use that Docker
  image to create the index.

* build.jsonnet should tag images as latest. We will use the latest images
  as a layer cache to speed up builds.

* Set logging level to info for start_search_server.py and
  create_search_index.py

* The create-search-index pod kept getting evicted because the node was
  running out of memory.

* Add a new node pool consisting of n1-standard-32 nodes to the demo cluster.
  These have 120 GB of RAM, compared to 30 GB in our default pool of
  n1-standard-8 nodes.

* Set resource requests and limits on the search-index-creator pod.

* Move all the config for the search-index-creator job into the
  search-index-creator.jsonnet file. We need to customize the memory
  resources, so there's not much value in trying to share config with other
  components.
2018-11-20 12:53:09 -08:00
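For context, building the index with nmslib follows the library's usual
pattern; a minimal sketch, assuming an HNSW cosine-similarity index over
precomputed embeddings (names and parameters are illustrative, not the actual
create_search_index.py code):

```python
import logging

import nmslib
import numpy as np

# Per this commit, the index-creation job logs at info level.
logging.basicConfig(level=logging.INFO)


def create_index(embeddings: np.ndarray, output_path: str) -> None:
    """Build an HNSW index over the embeddings and save it to disk."""
    index = nmslib.init(method="hnsw", space="cosinesimil")
    index.addDataPointBatch(embeddings)
    # Index construction holds everything in memory, which is why the pod
    # needed explicit requests/limits and the high-memory node pool above.
    index.createIndex({"post": 2}, print_progress=True)
    index.saveIndex(output_path)


if __name__ == "__main__":
    # Toy data, just to exercise the function.
    create_index(np.random.rand(1000, 128).astype(np.float32), "/tmp/code.index")
```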
Jeremy Lewi df278567f0 Fix performance of dataflow preprocessing job. (#302)

* Fix #300; the Dataflow preprocessing job is really slow.

  * The problem is that we were loading the spaCy tokenization model on every
    invocation of the tokenization function, and this is really expensive.
  * We should be doing this once per module import (see the sketch at the end
    of this entry).

* After fixing this issue, the job completed in approximately 20 minutes using
  5 workers.

  * We can process all 1.3 million records in ~20 minutes (elapsed time) using
    five 32-CPU workers and about 1 hour of CPU time altogether.

* Add options to the Dataflow job to read from files as opposed to BigQuery
  and to skip BigQuery writes. This is useful for testing.

* Add a "unittest" that verifies the Dataflow preprocessing job can run
  successfully using the DirectRunner.

* Update the Docker image and a ksonnet component for a K8s job that
  can be used to submit the Dataflow job.

* Fix #299; Add logging to the Dataflow preprocessing job to indicate that
  a Dataflow job was submitted.

* Add an option to the preprocessing Dataflow job to read an entire
  BigQuery table as the input rather than running a query to get the input.
  This is useful when the user wants to run a different query to select the
  repo paths and contents to process, write the results to some table, and
  then have the Dataflow job process that table.

* Fix lint.

* More lint fixes.
2018-11-06 14:14:28 -08:00
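The #300 fix is the standard load-once-at-import pattern; a minimal sketch
(the model name and function are illustrative, not the job's actual code):

```python
import spacy

# Load the spaCy model once per module import (i.e., once per Dataflow worker
# process) instead of on every invocation of the tokenization function.
_NLP = spacy.load("en_core_web_sm")  # illustrative model name


def tokenize(text: str) -> list:
    """Tokenize one document using the module-level model."""
    return [token.text for token in _NLP(text)]
```

Each worker then pays the model-load cost once rather than once per record,
which is consistent with the ~20 minute runtime reported above.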
Jeremy Lewi f87dfd8e53 Create a demo cluster for the code search example. (#298) 2018-11-05 06:07:52 -08:00