5 Commits

Author SHA1 Message Date
Sanyam Kapoor 5a9748bf8f Add similarity transformer body (#159)
* Add similarity transformer body

* Update pipeline to write a single CSV file

* Fix lint errors

* Use CSV writer to handle formatting rows

* Use direct transformer encoding methods with variable scopes

* Complete end-to-end training with new model and problem

* Read from multiple CSV files
2018-07-03 11:14:19 -07:00
Sanyam Kapoor 242c2e6d20 Add custom metrics, write raw tokens to GCS (#141)
* Add custom metrics, write raw tokens to GCS

* Change number of output file shards to 1
2018-06-13 12:03:27 -07:00
Sanyam Kapoor f4c8b7f80d Add error handling to Dataflow (#128)
* Add error handling to Dataflow

* Fix lint issues

* Update pipeline with error handling on tokenization and info splitting
2018-06-06 21:46:24 -07:00
Sanyam Kapoor 17dd02b803 Add num workers options to Dataflow (#125) 2018-06-05 17:05:56 -07:00
Sanyam Kapoor 26ff66d747 Semantic Code Search Example Data Ingestion (#120)
* Code Search Preprocessing Pipeline

* Add missing pipeline execution to git tree

* Move the preprocessing step into its own package

* Add docstrings

* Fix pylint errors
2018-05-31 15:28:56 -07:00