80 Commits

Author SHA1 Message Date
Sanyam Kapoor 133e054033 Refactor job and deployment specs into different functions 2018-08-09 10:53:23 -07:00
Sanyam Kapoor e34f9aca75 Build just one image with the correct tag instead of double the number 2018-08-09 10:53:23 -07:00
Sanyam Kapoor c86f306d79 Use kind Job instead of Pod 2018-08-09 10:53:23 -07:00
Sanyam Kapoor 6527aba7c1 Upgrade JS app to be served at any path prefix 2018-08-09 10:53:23 -07:00
Sanyam Kapoor 9ce23d9fc6 Working search index server 2018-08-09 10:53:23 -07:00
Sanyam Kapoor 02db0065c1 Make search index creation a one-off job 2018-08-09 10:53:23 -07:00
Sanyam Kapoor d4669467d8 Update Search Index server spec with new commands 2018-08-09 10:53:23 -07:00
Sanyam Kapoor f2151f66fc Merge UI and Search Server (#209)
* Use the nicer tf.gfile interface for search index creation

* Update documentation and more maintainable interface to search server

* Add ability to control number of outputs

* Serve React UI from the Flask server

* Update Dockerfile for the unified server and ui
2018-08-03 15:56:09 -07:00
Sanyam Kapoor e9e844022e Disable Distributed Training (#207)
* Upgrade TFJob and Ksonnet app

* Container name should be tensorflow. See #563.

* Working single node training and serving on Kubeflow

* Add issue link for fixme

* Remove redundant create secrets and use Kubeflow provided secrets
2018-08-02 23:02:05 -07:00
Sanyam Kapoor fd2e750990 Fix T2T memory problem (#205)
* Update T2T problems to workaround memory limitations

* Add max_samples_for_vocab to prevent memory overflow

* Fix a base URL to download data from, sweet spot for max samples

* Convert class variables to class properties

* Fix lint errors

* Use Python2/3 compatible code for StringIO

* Fix lint errors

* Fix source data files format

* Move to Text2TextProblem instead of TranslateProblem

* Update details for num_shards and T2T problem dataset
2018-08-01 13:37:41 -07:00
Sanyam Kapoor 767c90ff20 Refactor dataflow pipelines (#197)
* Update to a new dataflow package

* [WIP] updating docstrings, fixing redundancies

* Limit the scope of Github Transform pipeline, make everything unicode

* Add ability to start github pipelines from transformed bigquery dataset

* Upgrade batch prediction pipeline to be modular

* Fix lint errors

* Add write disposition to BigQuery transform

* Update documentation format

* Nicer names for modules

* Add unicode encoding to parsed function docstring tuples

* Use Apache Beam options parser to expose all CLI arguments
2018-07-27 06:26:56 -07:00
Sanyam Kapoor 994fdf82c0 Integrate nmslib (#194)
* Integrate NMSLib server with new data file

* Integrate UI with query URL of search server
2018-07-23 17:17:24 -07:00
Sanyam Kapoor 636cf1c3d0 Integrate batch prediction (#184)
* Refactor the dataflow package

* Create placeholder for new prediction pipeline

* [WIP] add dofn for encoding

* Merge all modules under single package

* Pipeline data flow complete, wip prediction values

* Fallback to custom commands for extra dependency

* Working Dataflow runner installs, separate docker-related folder

* [WIP] Updated local user journey in README, fully working commands, easy container translation

* Working Batch Predictions.

* Remove docstring embeddings

* Complete batch prediction pipeline

* Update Dockerfiles and T2T Ksonnet components

* Fix linting

* Downgrade runtime to Python2, wip memory issues so use lesser data

* Pin master to index 0.

* Working batch prediction pipeline

* Modular Github Batch Prediction Pipeline, stores back to BigQuery

* Fix lint errors

* Fix module-wide imports, pin batch-prediction version

* Fix relative import, update docstrings

* Add references to issue and current workaround for Batch Prediction dependency.
2018-07-23 16:26:23 -07:00
Sanyam Kapoor 2adbb7ace4 Fix transformer export (#169)
* Add auto-downloads for the data

* Make top() a no-op, working export

* Fix lint errors

* Integrate NMSlib server with TF Serving

* Clarify data URLs purpose
2018-07-16 14:06:52 -07:00
Sanyam Kapoor d692db36e8 Search UI Components (#168)
* Initialize search UI. Needs connection to search service

* Fix page title

* Add component for code search results, dummy values for now

* Fix title and manifest

* Add mock loading UI. Need to fill in real API results

* Wrap application into Dockerfile
2018-07-10 20:08:25 -07:00
Sanyam Kapoor c5f13464b4 Add negative sampling to Transformer network (#167)
* Add negative sampling to Transformer network

* Add generate data flag, can skip t2t-datagen step
2018-07-04 20:14:22 -07:00
Sanyam Kapoor 5a9748bf8f Add similarity transformer body (#159)
* Add similarity transformer body

* Update pipeline to Write a single CSV file

* Fix lint errors

* Use CSV writer to handle formatting rows

* Use direct transformer encoding methods with variable scopes

* Complete end-to-end training with new model and problem

* Read from multiple csv files
2018-07-03 11:14:19 -07:00
Sanyam Kapoor c1b2802313 Add new TF-Serving component with sample task (#152)
* Add new TF-Serving component with sample task

* Unify nmslib and t2t packages, need to be cohesive

* [WIP] update references to the package

* Replace old T2T problem

* Add representative code for encoding/decoding from tf serving service

* Add rest API port to TF serving (replaces custom http proxy)

* Fix linting

* Add NMSLib creator and server components

* Add docs to CLI module
2018-06-28 20:37:21 -07:00
Sanyam Kapoor f20161167e Add a new similarity transformer model, register new problem (#146)
* Add a new similarity transformer model, register new problem

* Remove useless constructor
2018-06-27 11:00:18 -07:00
Sanyam Kapoor 656e1e3e7c Extension of T2T Ksonnet component (#149)
* Add jobs derived from t2t component, GCP credentials assumed

* Add script to create IAM role bindings for Docker container to use

* Fix names to hyphens

* Add t2t-exporter wrapper

* Fix typos

* A temporary workaround for tensorflow/tensor2tensor#879

* Complete working pipeline of datagen, trainer and exporter

* Add docstring to create_secrets.sh
2018-06-25 15:09:22 -07:00
Sanyam Kapoor 21506ffc51 Python package for indexing and serving the index (#150)
* Add a utility python package for indexing and serving the index

* Add CLI arguments, conditional GCS download

* Complete skeleton CLIs for serving and index creation

* Fix lint issues
2018-06-20 15:34:05 -07:00
Sanyam Kapoor 4bd30a1e68 Language task on kubeflow (#143)
* [WIP] initialize ksonnet app

* Push images to GCR

* Upgrade Docker container to run T2T entrypoint with appropriate env vars

* Add a tf-job based t2t-job

* Fix GPU parameters
2018-06-15 18:16:34 -07:00
Sanyam Kapoor 242c2e6d20 Add custom metrics, write raw tokens to GCS (#141)
* Add custom metrics, write raw tokens to GCS

* Change number of output file shards to 1
2018-06-13 12:03:27 -07:00
Sanyam Kapoor 3bff3339f7 Isolate t2t execution into docker (#131)
* Isolate t2t execution into a docker

* Add image build script, update run interface

* Fix grammar typo
2018-06-12 12:53:29 -07:00
Sanyam Kapoor d3c781772c Language modeling using Transformer Networks (#129)
* Add Github language modeling problem

* Rename folders, update README with datagen and train scripts

* Fix linting
2018-06-07 06:31:22 -07:00
Sanyam Kapoor f4c8b7f80d Add error handling to Dataflow (#128)
* Add error handling to dataflow

* Fix lint issues

* Update pipeline with error handling on tokenization and info splitting
2018-06-06 21:46:24 -07:00
Sanyam Kapoor 6220907044 New tensor2tensor problem datagen for function summarization (#127)
* New tensor2tensor problem for function summarization

* Consolidate README with improved docs

* Remove old readme

* Add T2T Trainer using Transformer Networks

* Fix missing requirement for t2t-trainer
2018-06-06 00:38:58 -07:00
Sanyam Kapoor 17dd02b803 Add num workers options to Dataflow (#125) 2018-06-05 17:05:56 -07:00
Sanyam Kapoor e26a290f0f Fix utf-8 encoding issues (#122) 2018-06-01 10:35:56 -07:00
Sanyam Kapoor 26ff66d747 Semantic Code Search Example Data Ingestion (#120)
* Code Search Preprocessing Pipeline

* Add missing pipeline execution to git tree

* Move the preprocessing step into its own package

* Add docstrings

* Fix pylint errors
2018-05-31 15:28:56 -07:00