examples

Commit Graph

Author	SHA1	Message	Date
Jeremy Lewi	f87dfd8e53	Create a demo cluster for the code search example. (#298 )	2018-11-05 06:07:52 -08:00
Jeremy Lewi	acd8007717	Use conditionals and add test for code search (#291 ) * Fix model export, loss function, and add some manual tests. Fix Model export to support computing code embeddings: Fix #260 * The previous exported model was always using the embeddings trained for the search query. * But we need to be able to compute embedding vectors for both the query and code. * To support this we add a new input feature "embed_code" and conditional ops. The exported model uses the value of the embed_code feature to determine whether to treat the inputs as a query string or code and computes the embeddings appropriately. * Originally based on #233 by @activatedgeek Loss function improvements * See #259 for a long discussion about different loss functions. * @activatedgeek was experimenting with different loss functions in #233 and this pulls in some of those changes. Add manual tests * Related to #258 * We add a smoke test for T2T steps so we can catch bugs in the code. * We also add a smoke test for serving the model with TFServing. * We add a sanity check to ensure we get different values for the same input based on which embeddings we are computing. Change Problem/Model name * Register the problem github_function_docstring with a different name to distinguish it from the version inside the Tensor2Tensor library. * * Skip the test when running under prow because it's a manual test. * Fix some lint errors. * * Fix lint and skip tests. * Fix lint. * * Fix lint * Revert loss function changes; we can do that in a follow on PR. * * Run generate_data as part of the test rather than reusing a cached vocab and processed input file. * Modify SimilarityTransformer so we can overwrite the number of shards used easily to facilitate testing. * Comment out py-test for now.	2018-11-02 09:52:11 -07:00
Jeremy Lewi	adf614fc5f	Add tensorboard and check in vendor for the code search example. (#255 ) * Add tensorboard and check in vendor for the code search example. * * Remove the default env; when I ran ks show I got errors but removing it and adding a fresh env worked. It also won't point to the correct cluster for users.	2018-10-04 10:18:58 -07:00
Sanyam Kapoor	f9873e6ac4	Upgrade notebook commands and other relevant changes (#229 ) * Replace double quotes for field values (ks convention) * Recreate the ksonnet application from scratch * Fix pip commands to find requirements and redo installation, fix ks param set * Use sed replace instead of ks param set. * Add cells to first show JobSpec and then apply * Upgrade T2T, fix conflicting problem types * Update docker images * Reduce to 200k samples for vocab * Use Jupyter notebook service account * Add illustrative gsutil commands to show output files, specify index files glob explicitly * List files after index creation step * Use the model in current repository and not upstream t2t * Update Docker images * Expose TF Serving Rest API at 9001 * Spawn terminal from the notebooks ui, no need to go to lab	2018-08-20 16:35:07 -07:00
Sanyam Kapoor	4e015e76a3	Cherry pick changes to PredictionDoFn (#226 ) * Cherry pick changes to PredictionDoFn * Disable lint checks for cherry picked file * Update TODO and notebook install instructions * Restore CUSTOM_COMMANDS todo	2018-08-15 06:21:00 -07:00
Sanyam Kapoor	18829159b0	Add a new github function docstring extended problem (#225 ) * Add a new github function docstring extended problem * Fix lint errors * Update images	2018-08-14 15:41:47 -07:00
Sanyam Kapoor	8fce4a7799	Allow ks param set for Code Search Ksonnet Application (#224 ) * Allow ks param set for t2t-code-search * Update notebook with working directory param set * Abstract out common variables for easy ks param set	2018-08-14 15:29:04 -07:00
Sanyam Kapoor	a687c51036	Add a Jupyter notebook to be used for Kubeflow codelabs (#217 ) * Add a Jupyter notebook to be used for Kubeflow codelabs * Add help command for create_function_embeddings module * Update README to point to Jupyter Notebook * Add prerequisites to readme * Update README and getting started with notebook guide * [wip] * Update notebook with BigQuery previews * Update notebook to automatically select the latest MODEL_VERSION	2018-08-13 21:43:26 -07:00
Sanyam Kapoor	6e9150bad6	Parametrize volumes and ports for nmslib containers	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	133e054033	Refactor job and deployment specs into different functions	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	e34f9aca75	Build just one image with the correct tag instead of double the number	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	c86f306d79	Use kind Job instead of Pod	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	6527aba7c1	Upgrade JS app to be served at any path prefix	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	9ce23d9fc6	Working search index server	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	02db0065c1	Make search index creation a one-off job	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	d4669467d8	Update Search Index server spec with new commands	2018-08-09 10:53:23 -07:00
Sanyam Kapoor	f2151f66fc	Merge UI and Search Server (#209 ) * Use the nicer tf.gfile interface for search index creation * Update documentation and more maintainable interface to search server * Add ability to control number of outputs * Serve React UI from the Flask server * Update Dockerfile for the unified server and ui	2018-08-03 15:56:09 -07:00
Sanyam Kapoor	e9e844022e	Disable Distributed Training (#207 ) * Upgrade TFJob and Ksonnet app * Container name should be tensorflow. See #563. * Working single node training and serving on Kubeflow * Add issue link for fixme * Remove redundant create secrets and use Kubeflow provided secrets	2018-08-02 23:02:05 -07:00
Sanyam Kapoor	fd2e750990	Fix T2T memory problem (#205 ) * Update T2T problems to workaround memory limitations * Add max_samples_for_vocab to prevent memory overflow * Fix a base URL to download data from, sweet spot for max samples * Convert class variables to class properties * Fix lint errors * Use Python2/3 compatible code for StringIO * Fix lint errors * Fix source data files format * Move to Text2TextProblem instead of TranslateProblem * Update details for num_shards and T2T problem dataset	2018-08-01 13:37:41 -07:00
Sanyam Kapoor	767c90ff20	Refactor dataflow pipelines (#197 ) * Update to a new dataflow package * [WIP] updating docstrings, fixing redundancies * Limit the scope of Github Transform pipeline, make everything unicode * Add ability to start github pipelines from transformed bigquery dataset * Upgrade batch prediction pipeline to be modular * Fix lint errors * Add write disposition to BigQuery transform * Update documentation format * Nicer names for modules * Add unicode encoding to parsed function docstring tuples * Use Apache Beam options parser to expose all CLI arguments	2018-07-27 06:26:56 -07:00
Sanyam Kapoor	994fdf82c0	Integrate nmslib (#194 ) * Integrate NMSLib server with new data file * Integrate UI with query URL of search server	2018-07-23 17:17:24 -07:00
Sanyam Kapoor	636cf1c3d0	Integrate batch prediction (#184 ) * Refactor the dataflow package * Create placeholder for new prediction pipeline * [WIP] add dofn for encoding * Merge all modules under single package * Pipeline data flow complete, wip prediction values * Fallback to custom commands for extra dependency * Working Dataflow runner installs, separate docker-related folder * [WIP] Updated local user journey in README, fully working commands, easy container translation * Working Batch Predictions. * Remove docstring embeddings * Complete batch prediction pipeline * Update Dockerfiles and T2T Ksonnet components * Fix linting * Downgrade runtime to Python2, wip memory issues so use lesser data * Pin master to index 0. * Working batch prediction pipeline * Modular Github Batch Prediction Pipeline, stores back to BigQuery * Fix lint errors * Fix module-wide imports, pin batch-prediction version * Fix relative import, update docstrings * Add references to issue and current workaround for Batch Prediction dependency.	2018-07-23 16:26:23 -07:00
Sanyam Kapoor	2adbb7ace4	Fix transformer export (#169 ) * Add auto-downloads for the data * Make top() a no-op, working export * Fix lint errors * Integrate NMSlib server with TF Serving * Clarify data URLs purpose	2018-07-16 14:06:52 -07:00
Sanyam Kapoor	d692db36e8	Search UI Components (#168 ) * Initialize search UI. Needs connection to search service * Fix page title * Add component for code search results, dummy values for now * Fix title and manifest * Add mock loading UI. Need to fill in real API results * Wrap application into Dockerfile	2018-07-10 20:08:25 -07:00
Sanyam Kapoor	c5f13464b4	Add negative sampling to Transformer network (#167 ) * Add negative sampling to Transformer network * Add generate data flag, can skip t2t-datagen step	2018-07-04 20:14:22 -07:00
Sanyam Kapoor	5a9748bf8f	Add similarity transformer body (#159 ) * Add similarity transformer body * Update pipeline to Write a single CSV file * Fix lint errors * Use CSV writer to handle formatting rows * Use direct transformer encoding methods with variable scopes * Complete end-to-end training with new model and problem * Read from multiple csv files	2018-07-03 11:14:19 -07:00
Sanyam Kapoor	c1b2802313	Add new TF-Serving component with sample task (#152 ) * Add new TF-Serving component with sample task * Unify nmslib and t2t packages, need to be cohesive * [WIP] update references to the package * Replace old T2T problem * Add representative code for encoding/decoding from tf serving service * Add rest API port to TF serving (replaces custom http proxy) * Fix linting * Add NMSLib creator and server components * Add docs to CLI module	2018-06-28 20:37:21 -07:00
Sanyam Kapoor	f20161167e	Add a new similarity transformer model, register new problem (#146 ) * Add a new similarity transformer model, register new problem * Remove useless constructor	2018-06-27 11:00:18 -07:00
Sanyam Kapoor	656e1e3e7c	Extension of T2T Ksonnet component (#149 ) * Add jobs derived from t2t component, GCP credentials assumed * Add script to create IAM role bindings for Docker container to use * Fix names to hyphens * Add t2t-exporter wrapper * Fix typos * A temporary workaround for tensorflow/tensor2tensor#879 * Complete working pipeline of datagen, trainer and exporter * Add docstring to create_secrets.sh	2018-06-25 15:09:22 -07:00
Sanyam Kapoor	21506ffc51	Python package for indexing and serving the index (#150 ) * Add a utility python package for indexing and serving the index * Add CLI arguments, conditional GCS download * Complete skeleton CLIs for serving and index creation * Fix lint issues	2018-06-20 15:34:05 -07:00
Sanyam Kapoor	4bd30a1e68	Language task on kubeflow (#143 ) * [WIP] initialize ksonnet app * Push images to GCR * Upgrade Docker container to run T2T entrypoint with appropriate env vars * Add a tf-job based t2t-job * Fix GPU parameters	2018-06-15 18:16:34 -07:00
Sanyam Kapoor	242c2e6d20	Add custom metrics, write raw tokens to GCS (#141 ) * Add custom metrics, write raw tokens to GCS * Change number of output file shards to 1	2018-06-13 12:03:27 -07:00
Sanyam Kapoor	3bff3339f7	Isolate t2t execution into docker (#131 ) * Isolate t2t execution into a docker * Add image build script, update run interface * Fix grammar typo	2018-06-12 12:53:29 -07:00
Sanyam Kapoor	d3c781772c	Language modeling using Transformer Networks (#129 ) * Add Github language modeling problem * Rename folders, update README with datagen and train scripts * Fix linting	2018-06-07 06:31:22 -07:00
Sanyam Kapoor	f4c8b7f80d	Add error handling to Dataflow (#128 ) * Add error handling to dataflow * Fix lint issues * Update pipeline with error handling on tokenization and info splitting	2018-06-06 21:46:24 -07:00
Sanyam Kapoor	6220907044	New tensor2tensor problem datagen for function summarization (#127 ) * New tensor2tensor problem for function summarization * Consolidate README with improved docs * Remove old readme * Add T2T Trainer using Transformer Networks * Fix missing requirement for t2t-trainer	2018-06-06 00:38:58 -07:00
Sanyam Kapoor	17dd02b803	Add num workers options to Dataflow (#125 )	2018-06-05 17:05:56 -07:00
Sanyam Kapoor	e26a290f0f	Fix utf-8 encoding issues (#122 )	2018-06-01 10:35:56 -07:00
Sanyam Kapoor	26ff66d747	Semantic Code Search Example Data Ingestion (#120 ) * Code Search Preprocessing Pipeline * Add missing pipeline execution to git tree * Move the preprocessing step into its own package * Add docstrings * Fix pylint errors	2018-05-31 15:28:56 -07:00

89 Commits