Sanyam Kapoor
242c2e6d20
Add custom metrics, write raw tokens to GCS (#141)
* Add custom metrics, write raw tokens to GCS
* Change number of output file shards to 1
2018-06-13 12:03:27 -07:00
Sanyam Kapoor
f4c8b7f80d
Add error handling to Dataflow (#128)
* Add error handling to Dataflow
* Fix lint issues
* Update pipeline with error handling on tokenization and info splitting
2018-06-06 21:46:24 -07:00
Sanyam Kapoor
17dd02b803
Add num workers options to Dataflow (#125)
2018-06-05 17:05:56 -07:00
Sanyam Kapoor
26ff66d747
Semantic Code Search Example Data Ingestion (#120)
* Code Search Preprocessing Pipeline
* Add missing pipeline execution to git tree
* Move the preprocessing step into its own package
* Add docstrings
* Fix pylint errors
2018-05-31 15:28:56 -07:00