Commit Graph

2 Commits

Author SHA1 Message Date
Sanyam Kapoor 767c90ff20 Refactor dataflow pipelines (#197)
* Update to a new dataflow package

* [WIP] updating docstrings, fixing redundancies

* Limit the scope of GitHub Transform pipeline, make everything Unicode

* Add ability to start GitHub pipelines from transformed BigQuery dataset

* Upgrade batch prediction pipeline to be modular

* Fix lint errors

* Add write disposition to BigQuery transform

* Update documentation format

* Nicer names for modules

* Add unicode encoding to parsed function docstring tuples

* Use Apache Beam options parser to expose all CLI arguments
2018-07-27 06:26:56 -07:00
Sanyam Kapoor 636cf1c3d0 Integrate batch prediction (#184)
* Refactor the dataflow package

* Create placeholder for new prediction pipeline

* [WIP] add dofn for encoding

* Merge all modules under single package

* Pipeline data flow complete, wip prediction values

* Fallback to custom commands for extra dependency

* Working Dataflow runner installs, separate docker-related folder

* [WIP] Updated local user journey in README, fully working commands, easy container translation

* Working Batch Predictions.

* Remove docstring embeddings

* Complete batch prediction pipeline

* Update Dockerfiles and T2T Ksonnet components

* Fix linting

* Downgrade runtime to Python 2; memory issues are WIP, so use less data

* Pin master to index 0.

* Working batch prediction pipeline

* Modular GitHub Batch Prediction Pipeline, stores back to BigQuery

* Fix lint errors

* Fix module-wide imports, pin batch-prediction version

* Fix relative import, update docstrings

* Add references to issue and current workaround for Batch Prediction dependency.
2018-07-23 16:26:23 -07:00