11 Commits

Author SHA1 Message Date
Sanyam Kapoor 656e1e3e7c Extension of T2T Ksonnet component (#149)
* Add jobs derived from t2t component, GCP credentials assumed

* Add script to create IAM role bindings for Docker container to use

* Fix names to hyphens

* Add t2t-exporter wrapper

* Fix typos

* A temporary workaround for tensorflow/tensor2tensor#879

* Complete working pipeline of datagen, trainer and exporter

* Add docstring to create_secrets.sh
2018-06-25 15:09:22 -07:00
Sanyam Kapoor 21506ffc51 Python package for indexing and serving the index (#150)
* Add a utility python package for indexing and serving the index

* Add CLI arguments, conditional GCS download

* Complete skeleton CLIs for serving and index creation

* Fix lint issues
2018-06-20 15:34:05 -07:00
Sanyam Kapoor 4bd30a1e68 Language task on kubeflow (#143)
* [WIP] initialize ksonnet app

* Push images to GCR

* Upgrade Docker container to run T2T entrypoint with appropriate env vars

* Add a tf-job based t2t-job

* Fix GPU parameters
2018-06-15 18:16:34 -07:00
Sanyam Kapoor 242c2e6d20 Add custom metrics, write raw tokens to GCS (#141)
* Add custom metrics, write raw tokens to GCS

* Change number of output file shards to 1
2018-06-13 12:03:27 -07:00
Sanyam Kapoor 3bff3339f7 Isolate t2t execution into docker (#131)
* Isolate t2t execution into a docker

* Add image build script, update run interface

* Fix grammar typo
2018-06-12 12:53:29 -07:00
Sanyam Kapoor d3c781772c Language modeling using Transformer Networks (#129)
* Add Github language modeling problem

* Rename folders, update README with datagen and train scripts

* Fix linting
2018-06-07 06:31:22 -07:00
Sanyam Kapoor f4c8b7f80d Add error handling to Dataflow (#128)
* Add error handling to dataflow

* Fix lint issues

* Update pipeline with error handling on tokenization and info splitting
2018-06-06 21:46:24 -07:00
Sanyam Kapoor 6220907044 New tensor2tensor problem datagen for function summarization (#127)
* New tensor2tensor problem for function summarization

* Consolidate README with improved docs

* Remove old readme

* Add T2T Trainer using Transformer Networks

* Fix missing requirement for t2t-trainer
2018-06-06 00:38:58 -07:00
Sanyam Kapoor 17dd02b803 Add num workers options to Dataflow (#125)
2018-06-05 17:05:56 -07:00
Sanyam Kapoor e26a290f0f Fix utf-8 encoding issues (#122)
2018-06-01 10:35:56 -07:00
Sanyam Kapoor 26ff66d747 Semantic Code Search Example Data Ingestion (#120)
* Code Search Preprocessing Pipeline

* Add missing pipeline execution to git tree

* Move the preprocessing step into its own package

* Add docstrings

* Fix pylint errors
2018-05-31 15:28:56 -07:00