* Fix performance of the Dataflow preprocessing job.
* Fix #300; the Dataflow job for preprocessing is really slow.
* The problem is that we load the spaCy tokenization model on every
invocation of the tokenization function, which is very expensive.
* We should do this once per module import, as sketched below.
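
A minimal sketch of the pattern, assuming the tokenizer lives in a small module of its own (the function and model names below are illustrative, not necessarily the ones used in the job):

```python
import spacy

# Load the expensive spaCy model once, at module import time, instead of on
# every call to the tokenization function. The model name is illustrative.
_NLP = spacy.load("en_core_web_sm")


def tokenize(text):
  """Tokenizes a single document with the module-level spaCy model."""
  return [token.text for token in _NLP(text)]
```
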
* After fixing this issue, the job processes all 1.3 million records in
roughly 20 minutes of elapsed time using 5 32-CPU workers and about 1 hour
of CPU time altogether.
* Add options to the Dataflow job to read from files as opposed to BigQuery
and to skip BigQuery writes. This is useful for testing.
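
A sketch of what these options could look like; the flag names and file format below are assumptions, not necessarily what the job uses:

```python
import argparse

import apache_beam as beam


def parse_args(argv):
  parser = argparse.ArgumentParser()
  parser.add_argument(
      "--input_files", default="",
      help="If set, read newline-delimited records from these files instead "
           "of BigQuery.")
  parser.add_argument(
      "--skip_bigquery_write", action="store_true",
      help="Do not write results back to BigQuery; useful for testing.")
  return parser.parse_known_args(argv)


def read_records(pipeline, args, query):
  # Read from local/GCS files when requested; otherwise read from BigQuery.
  if args.input_files:
    return pipeline | "ReadFiles" >> beam.io.ReadFromText(args.input_files)
  return pipeline | "ReadBigQuery" >> beam.io.ReadFromBigQuery(query=query)
```
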
* Add a "unittest" that verifies the Dataflow preprocessing job can run
successfully using the DirectRunner.
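
A rough shape of such a test; the module and flag names are hypothetical:

```python
import unittest

import preprocess  # Hypothetical module containing the preprocessing pipeline.


class PreprocessTest(unittest.TestCase):

  def test_runs_with_direct_runner(self):
    # Run the full pipeline locally on a small fixture; flag names are
    # illustrative.
    preprocess.run([
        "--runner=DirectRunner",
        "--input_files=test_data/records.jsonl",
        "--skip_bigquery_write",
    ])


if __name__ == "__main__":
  unittest.main()
```
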
* Update the Docker image and a ksonnet component for a K8s job that
can be used to submit the Dataflow job.
* Fix #299; add logging to the Dataflow preprocessing job to indicate that
a Dataflow job was submitted.
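
One way the logging could look; this is a sketch, and the job-id lookup assumes the result object returned by the DataflowRunner:

```python
import logging

import apache_beam as beam


def run(pipeline_options):
  p = beam.Pipeline(options=pipeline_options)
  # ... construct the preprocessing pipeline here ...
  result = p.run()
  # The DataflowRunner's result object exposes the job id; other runners may
  # not, so fall back to logging the pipeline state.
  try:
    logging.info("Submitted Dataflow job: %s", result.job_id())
  except (AttributeError, NotImplementedError):
    logging.info("Submitted pipeline; current state: %s", result.state)
  return result
```
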
* Add an option to the preprocessing Dataflow job to read an entire
BigQuery table as the input rather than running a query to get the input.
This is useful when the user wants to run a different query to select the
repo paths and contents to process, write them to some table, and have the
Dataflow job process that table.
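
A sketch of the table-vs-query branch, with assumed flag names:

```python
import apache_beam as beam


def read_from_bigquery(pipeline, args):
  # When --input_table is given (e.g. "project:dataset.table"), read the whole
  # table; otherwise fall back to running the configured query.
  if args.input_table:
    return pipeline | "ReadTable" >> beam.io.ReadFromBigQuery(
        table=args.input_table)
  return pipeline | "ReadQuery" >> beam.io.ReadFromBigQuery(
      query=args.query, use_standard_sql=True)
```
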
* Fix lint.
* More lint fixes.