examples

Commit Graph

Author	SHA1	Message	Date
Sanyam Kapoor	133e054033	Refactor job and deployment specs into different functions	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	e34f9aca75	Build just one image with the correct tag instead of double the number	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	c86f306d79	Use kind Job instead of Pod	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	6527aba7c1	Upgrade JS app to be served at any path prefix	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	9ce23d9fc6	Working search index server	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	02db0065c1	Make search index creation a one-off job	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	d4669467d8	Update Search Index server spec with new commands	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	f2151f66fc	Merge UI and Search Server (#209) * Use the nicer tf.gfile interface for search index creation * Update documentation and more maintainable interface to search server * Add ability to control number of outputs * Serve React UI from the Flask server * Update Dockerfile for the unified server and ui	2018-08-03 15:56:09 -07:00
Sanyam Kapoor	e9e844022e	Disable Distributed Training (#207) * Upgrade TFJob and Ksonnet app * Container name should be tensorflow. See #563. * Working single node training and serving on Kubeflow * Add issue link for fixme * Remove redundant create secrets and use Kubeflow provided secrets	2018-08-02 23:02:05 -07:00
Sanyam Kapoor	fd2e750990	Fix T2T memory problem (#205) * Update T2T problems to workaround memory limitations * Add max_samples_for_vocab to prevent memory overflow * Fix a base URL to download data from, sweet spot for max samples * Convert class variables to class properties * Fix lint errors * Use Python2/3 compatible code for StringIO * Fix lint errors * Fix source data files format * Move to Text2TextProblem instead of TranslateProblem * Update details for num_shards and T2T problem dataset	2018-08-01 13:37:41 -07:00
Sanyam Kapoor	767c90ff20	Refactor dataflow pipelines (#197) * Update to a new dataflow package * [WIP] updating docstrings, fixing redundancies * Limit the scope of Github Transform pipeline, make everything unicode * Add ability to start github pipelines from transformed bigquery dataset * Upgrade batch prediction pipeline to be modular * Fix lint errors * Add write disposition to BigQuery transform * Update documentation format * Nicer names for modules * Add unicode encoding to parsed function docstring tuples * Use Apache Beam options parser to expose all CLI arguments	2018-07-27 06:26:56 -07:00
Sanyam Kapoor	994fdf82c0	Integrate nmslib (#194) * Integrate NMSLib server with new data file * Integrate UI with query URL of search server	2018-07-23 17:17:24 -07:00
Sanyam Kapoor	636cf1c3d0	Integrate batch prediction (#184) * Refactor the dataflow package * Create placeholder for new prediction pipeline * [WIP] add dofn for encoding * Merge all modules under single package * Pipeline data flow complete, wip prediction values * Fallback to custom commands for extra dependency * Working Dataflow runner installs, separate docker-related folder * [WIP] Updated local user journey in README, fully working commands, easy container translation * Working Batch Predictions. * Remove docstring embeddings * Complete batch prediction pipeline * Update Dockerfiles and T2T Ksonnet components * Fix linting * Downgrade runtime to Python2, wip memory issues so use lesser data * Pin master to index 0. * Working batch prediction pipeline * Modular Github Batch Prediction Pipeline, stores back to BigQuery * Fix lint errors * Fix module-wide imports, pin batch-prediction version * Fix relative import, update docstrings * Add references to issue and current workaround for Batch Prediction dependency.	2018-07-23 16:26:23 -07:00
Sanyam Kapoor	2adbb7ace4	Fix transformer export (#169) * Add auto-downloads for the data * Make top() a no-op, working export * Fix lint errors * Integrate NMSlib server with TF Serving * Clarify data URLs purpose	2018-07-16 14:06:52 -07:00
Sanyam Kapoor	d692db36e8	Search UI Components (#168) * Initialize search UI. Needs connection to search service * Fix page title * Add component for code search results, dummy values for now * Fix title and manifest * Add mock loading UI. Need to fill in real API results * Wrap application into Dockerfile	2018-07-10 20:08:25 -07:00
Sanyam Kapoor	c5f13464b4	Add negative sampling to Transformer network (#167) * Add negative sampling to Transformer network * Add generate data flag, can skip t2t-datagen step	2018-07-04 20:14:22 -07:00
Sanyam Kapoor	5a9748bf8f	Add similarity transformer body (#159) * Add similarity transformer body * Update pipeline to Write a single CSV file * Fix lint errors * Use CSV writer to handle formatting rows * Use direct transformer encoding methods with variable scopes * Complete end-to-end training with new model and problem * Read from mutliple csv files	2018-07-03 11:14:19 -07:00
Sanyam Kapoor	c1b2802313	Add new TF-Serving component with sample task (#152) * Add new TF-Serving component with sample task * Unify nmslib and t2t packages, need to be cohesive * [WIP] update references to the package * Replace old T2T problem * Add representative code for encoding/decoding from tf serving service * Add rest API port to TF serving (replaces custom http proxy) * Fix linting * Add NMSLib creator and server components * Add docs to CLI module	2018-06-28 20:37:21 -07:00
Sanyam Kapoor	f20161167e	Add a new similarity transformer model, register new problem (#146) * Add a new similarity transformer model, register new problem * Remove useless constructor	2018-06-27 11:00:18 -07:00
Sanyam Kapoor	656e1e3e7c	Extension of T2T Ksonnet component (#149) * Add jobs derived from t2t component, GCP credentials assumed * Add script to create IAM role bindings for Docker container to use * Fix names to hyphens * Add t2t-exporter wrapper * Fix typos * A temporary workaround for tensorflow/tensor2tensor#879 * Complete working pipeline of datagen, trainer and exporter * Add docstring to create_secrets.sh	2018-06-25 15:09:22 -07:00
Sanyam Kapoor	21506ffc51	Python package for indexing and serving the index (#150) * Add a utility python package for indexing and serving the index * Add CLI arguments, conditional GCS download * Complete skeleton CLIs for serving and index creation * Fix lint issues	2018-06-20 15:34:05 -07:00
Sanyam Kapoor	4bd30a1e68	Language task on kubeflow (#143) * [WIP] initialize ksonnet app * Push images to GCR * Upgrade Docker container to run T2T entrypoint with appropriate env vars * Add a tf-job based t2t-job * Fix GPU parameters	2018-06-15 18:16:34 -07:00
Sanyam Kapoor	242c2e6d20	Add custom metrics, write raw tokens to GCS (#141) * Add custom metrics, write raw tokens to GCS * Change number of output file shards to 1	2018-06-13 12:03:27 -07:00
Sanyam Kapoor	3bff3339f7	Isolate t2t execution into docker (#131) * Isolate t2t execution into a docker * Add image build script, update run interface * Fix grammar typo	2018-06-12 12:53:29 -07:00
Sanyam Kapoor	d3c781772c	Language modeling using Transformer Networks (#129) * Add Github language modeling problem * Rename folders, update README with datagen and train scripts * Fix linting	2018-06-07 06:31:22 -07:00
Sanyam Kapoor	f4c8b7f80d	Add error handling to Dataflow (#128) * Add error handling to dataflow * Fix lint issues * Update pipeline with error handling on tokenization and info splitting	2018-06-06 21:46:24 -07:00
Sanyam Kapoor	6220907044	New tensor2tensor problem datagen for function summarization (#127) * New tensor2tensor problem for function summarization * Consolidate README with improved docs * Remove old readme * Add T2T Trainer using Transformer Networks * Fix missing requirement for t2t-trainer	2018-06-06 00:38:58 -07:00
Sanyam Kapoor	17dd02b803	Add num workers options to Dataflow (#125)	2018-06-05 17:05:56 -07:00
Sanyam Kapoor	e26a290f0f	Fix utf-8 encoding issues (#122)	2018-06-01 10:35:56 -07:00
Sanyam Kapoor	26ff66d747	Semantic Code Search Example Data Ingestion (#120) * Code Search Preprocessing Pipeline * Add missing pipeline execution to git tree * Move the preprocessing step into its own package * Add docstrings * Fix pylint errors	2018-05-31 15:28:56 -07:00

80 Commits