Commit Graph

24 Commits

Author SHA1 Message Date
Amy b23adc1f0b import of Pipelines Github issue summarization examples & tutorial (#507)
* initial import of Pipelines Github issue summarization examples & lab

* more linting/cleanup, fix tf version to 1.12

* bit more linting; pin some lib versions

* last? lint fixes

* another attempt to fix linting issues

* ughh

* changed test cluster config info

* update ktext package in a test docker image

* hmm, retrying fix for the ktext package update
2019-04-18 17:57:54 -07:00
cliveseldon 8d728f0b06 GitHub Summarization Seldon Update (#472)
* Update model inference wrapping to use S2I and update docs

* Add s2i reference in docs

* Fix typo highlighted in review

* Add pyLint annotation to allow protected-access on keras make predict function method
2019-01-17 16:07:34 -08:00
Jeremy Lewi 1cc4550b7d GIS E2E test verify the TFJob runs successfully (#456)
* Create a test for submitting the TFJob for the GitHub issue summarization example.

* This test needs to be run manually right now. In a follow on PR we will
  integrate it into CI.

* We use the image built from Dockerfile.estimator because that is the image
  we are running train_test.py in.

  * Note: The current version of the code now requires Python3 (I think this
    is due to an earlier PR which refactored the code into a shared
    implementation for using TF estimator and not TF estimator).

* Create a TFJob component for TFJob v1beta1; this is the version
  in KF 0.4.

TFJob component
  * Upgrade to v1beta to work with 0.4
  * Update command line arguments to match the versions in the current code
      * input & output are now single parameters rather than separate parameters
        for bucket and name

  * change default input to a CSV file because the current version of the
    code doesn't handle unzipping it.

* Use ks_util from kubeflow/testing

* Address comments.
2019-01-08 15:06:49 -08:00
Jeremy Lewi 959d072e68 Setup continuous building of Docker images for GH Issue Summarization Example (#449)
* Setup continuous building of Docker images and testing for GH Issue Summarization Example.

* This is the first step in setting up a continuously running CI test.

* Add support for building the Docker images using GCB; we will use GCB
  to trigger the builds from our CI system.

  * Make the Makefile top level (at root of GIS example) so that we can
    easily access all the different resources.

* Add a .gitignore file to avoid checking in the build directory used by
  the Makefile.

* Define an Argo workflow to use as the E2E test.

Related to #92: E2E test & CI for github issue summarization

* Trigger the test on pre & post submit

* Dockerfile.estimator don't install the data_download.sh script
  * It doesn't look like we are currently using data_download.sh in the
    Docker image
  * It looks like it only gets used via the ksonnet job which mounts the
    script via a config map

  * Copying data_download.sh to the Docker image is currently weird
    given the organization of the Dockerfile and context.

* Copy the test_data to the Docker images so that we can run the test
  inside the images.

* Invoke the python unittest for training from our CI system.

  * In a follow on PR we will update the test to emit a JUnit XML file to
    report results to prow.

* Fix image build.
2019-01-04 17:02:24 -08:00
Michelle Casbon 70a22d6d7b [GH Issue Summarization] Upgrade to kf v0.4.0-rc.2 (#450)
* Update tfjob components to v1beta1

Remove old version of tensor2tensor component

* Combine UI into a single jsonnet file

* Upgrade GH issue summarization to kf v0.4.0-rc.2

Use latest ksonnet v0.13.1
Use latest seldon v1alpha2
Remove ksonnet app with full kubeflow platform & replace with components specific to this example.
Remove outdated scripts
Add cluster creation links to Click-to-deploy & kfctl
Add warning not to use the Training with an Estimator guide
Replace commandline with bash for better syntax highlighting
Replace messy port-forwarding commands with svc/ambassador
Add modelUrl param to ui component
Modify teardown instructions to remove the deployment
Fix grammatical mistakes

* Rearrange tfjob instructions
2018-12-30 20:05:29 -08:00
Jeremy Lewi 1043bc0c26 A bunch of changes to support distributed training using tf.estimator (#265)
* Unify the code for training with Keras and TF.Estimator

Create a single train.py and trainer.py which uses Keras inside TensorFlow
Provide options to either train with Keras or TF.Estimator
The code to train with TF.estimator doesn't work

See #196
The original PR (#203) worked around a blocking issue with Keras and TF.Estimator by commenting
certain layers in the model architecture leading to a model that wouldn't generate meaningful
predictions
We weren't able to get TF.Estimator working but this PR should make it easier to troubleshoot further

We've unified the existing code so that we don't duplicate the code just to train with TF.estimator
We've added unit tests that can be used to verify training with TF.estimator works. This test
can also be used to reproduce the current errors with TF.estimator.
Add a Makefile to build the Docker image

Add a NFS PVC to our Kubeflow demo deployment.

Create a tfjob-estimator component in our ksonnet component.

changes to distributed/train.py as part of merging with notebooks/train.py
* Add command line arguments to specify paths rather than hard coding them.
* Remove the code at the start of train.py to wait until the input data
becomes available.
* I think the original intent was to allow the TFJob to be started simultaneously with the preprocessing
job and just block until the data is available
* That should be unnecessary since we can just run the preprocessing job as a separate job.

Fix notebooks/train.py (#186)

The code wasn't actually calling Model Fit
Add a unittest to verify we can invoke fit and evaluate without throwing exceptions.

* Address comments.
2018-11-07 16:23:59 -08:00
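Note on #265: the change from hard-coded paths to command line arguments can be sketched roughly as below. This is a minimal illustration only; the flag names `--input_data` and `--output_model` and the default value are assumptions made for this sketch and are not taken from the repository's train.py.

```python
import argparse


def parse_args(argv=None):
    """Parse training paths from the command line instead of hard coding them.

    The flag names and the default below are hypothetical, chosen only for
    illustration.
    """
    parser = argparse.ArgumentParser(description="Train the summarization model.")
    parser.add_argument("--input_data", required=True,
                        help="Path to the training data file.")
    parser.add_argument("--output_model", default="model.h5",
                        help="Where to write the trained model.")
    return parser.parse_args(argv)
```

Passing `argv` explicitly keeps the parser easy to exercise from a unit test without touching `sys.argv`.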
Michelle Casbon 836ad70421 Fix model file upload (#160)
* Add component parameters

Add model_url & port arguments to flask app
Add service_type, image, and model_url parameters to ui component
Fix problem argument in tensor2tensor component

* Fix broken UI component

Fix broken UI component structure by adding all, service, & deployment parts
Add parameter defaults for tfjob to resolve failures deploying other components

* Add missing imports in flask app

Fix syntax error in argument parsing
Remove underscores from parameter names to work around ksonnet bug #554: https://github.com/ksonnet/ksonnet/issues/554

* Fix syntax errors in t2t instructions

Add CPU image build arg to docker build command for t2t-training
Fix link to ksonnet app dir
Correct param names for tensor2tensor component
Add missing params for tensor2tensor component
Fix apply command syntax
Swap out log view pod for t2t-master instead of tf-operator
Fix link to training with tfjob

* Fix model file upload

Update default params for tfjob-v1alpha2
Fix build directory path in Makefile

* Resolve lint issues

Lines too long

* Add specific image tag to tfjob-v1alpha2 default

* Fix defaults for training output files

Update image tag
Add UI image tag

* Revert service account secret details

Update associated readme
2018-06-29 18:41:20 -07:00
Jeremy Lewi e12231bae3 Make it easier to demo serving and run in Katacoda (#107)
* Make it easier to demo serving and run in Katacoda

* Allow the model path to be specified via environment variables so that
  we could potentially load the model from PVC.

* Continue to bake the model into the image so that we don't need to train
  in order to serve.

* Parameterize download_data.sh so we could potentially fetch different sources.

* Update the Makefile so that we can build and set the image for the serving
  component.

* Fix lint.

* Update the serving docs.
2018-04-28 08:11:18 -07:00
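Note on #107: reading the model path from an environment variable while keeping a model baked into the image as the fallback might look like the sketch below. The variable name `MODEL_FILE` and the default path are assumptions made for illustration; the commit message does not state the actual names.

```python
import os

# Hypothetical names, not taken from the repository.
MODEL_PATH_ENV = "MODEL_FILE"
DEFAULT_MODEL_PATH = "/model/seq2seq_model.h5"


def resolve_model_path(environ=None):
    """Return the model path from the environment, else the baked-in default.

    An unset or empty variable falls back to the default, so serving still
    works without training first.
    """
    env = os.environ if environ is None else environ
    return env.get(MODEL_PATH_ENV) or DEFAULT_MODEL_PATH
```

With this shape, pointing the variable at a file on a mounted PVC overrides the baked-in model without rebuilding the image.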
Ankush Agarwal 26d68ead6c Replace kubeflow-images-staging with kubeflow-images-public (#99)
Fixes https://github.com/kubeflow/kubeflow/issues/534
2018-04-27 11:46:20 -07:00
Jeremy Lewi 4b33d44af6 Support training using a PVC for the data. (#98)
* Support training using a PVC for the data.

* This will make it easier to run the example on Katacoda and non-GCP platforms.

* Modify train.py so we can use a GCS location or local file paths.

* Update the Dockerfile. The Jupyter Docker images had a bunch of
  dependencies removed, and the latest images don't have the dependencies
  needed to run the examples.

* Create a tfjob-pvc component that trains reading/writing using PVC
  and not GCP.

* Address reviewer comments

* Ignore changes to the ksonnet parameters when determining whether to include
  dirty and sha of the diff in the image. This way we can update the
  ksonnet app with the newly built image without it leading to subsequent
  images being marked dirty.

* Fix lint issues.

* Fix lint import issue.
2018-04-27 04:08:19 -07:00
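Note on #98: accepting either a GCS location or a local file path usually comes down to checking the URL scheme before deciding how to read or write. The helper below is a sketch under that assumption; `split_location` is a hypothetical name and is not the function used in train.py.

```python
from urllib.parse import urlparse


def split_location(location):
    """Classify a location string as GCS or local.

    Returns ("gcs", bucket, object_path) for gs:// URLs and
    ("local", None, path) for everything else.
    """
    parsed = urlparse(location)
    if parsed.scheme == "gs":
        # netloc holds the bucket; strip the leading slash from the object path.
        return ("gcs", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", None, location)
```

A caller could then download from the bucket in the GCS case and open the file directly in the local case, which is what makes the same script usable on a PVC.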
Michelle Casbon fb2fb26f71 Add demo scripts & improvements to instructions (#84)
* Add setup scripts & github token param

* Clarify instructions

Add pointers to resolution for common friction points of new cluster
setup: GitHub rate limiting and RBAC permissions
Setup persistent disk before Jupyterhub so that it is only setup once
Clarify instructions about copying trained model files locally
Add version number to frontend image build
Add github_token ks parameter for frontend

* Change port to 8080

Fix indentation of bullet points

* Fix var name & link spacing

* Update description of serving script

* Use a single ksonnet environment

Move ksonnet app out of notebooks subdirectory
Rename ksonnet app to ks-kubeflow
Update instructions & scripts
Remove instructions to delete ksonnet app directory

* Remove github access token
2018-04-23 16:23:59 -07:00
Ankush Agarwal 6cf382f597 Distributed training using tensor2tensor (#86)
* Distributed training using tensor2tensor

* Use a transformer model to train the github issue summarization
problem
* Dockerfile for building training image
* ksonnet component for deploying tfjob

Fixes https://github.com/kubeflow/examples/issues/43

* Fix lint issues
2018-04-19 17:43:59 -07:00
Ankush Agarwal d01d8435bf Use ambassador to talk to the frontend ui (#71)
* Create a ksonnet app component to deploy to k8s
2018-04-06 21:50:08 -07:00
Ankush Agarwal e3b826a5af Rename issue_summarization.py to IssueSummarization.py (#68)
* Rename issue_summarization.py to IssueSummarization.py

* The module name is supposed to be the same as the class name
* Fix the predict method signature

* Fix lint
2018-04-06 21:40:08 -07:00
Ankush Agarwal b24152cf06 Github Issue Summarization - Train using TFJob (#55)
* Github Issue Summarization - Train using TFJob

* Create a Dockerfile to build the image for tf-job
* Create a manifest to deploy the tf-job
* Create instructions on how to do all of this

Fixes https://github.com/kubeflow/examples/issues/43

* Address comments

* Add gcloud commands
* Add ks app
* Update Dockerfile base image
* Python train.py fixes

* Remove tfjob.yaml as it is replaced by ksonnet app

* Remove plot_model_history as it is not required for tfjob training

* Don't change WORKDIR

* Address reviewer comments

* Fix links

* Fix lint issues using yapf

* Sort imports
2018-03-29 13:37:04 -07:00
Michelle Casbon 41372c9314 Add .pylintrc (#61)
* Add .pylintrc

* Resolve lint complaints in agents/trainer/task.py

* Resolve lint complaints with flask app.py

* Resolve linting issues

Remove duplicate seq2seq_utils.py from workflow/workspace/src

* Use python 3.5.2 with pylint to match prow

Put pybullet import back into agents/trainer/task.py with a pylint ignore statement
Use main(_) to ensure it works with tf.app.run
2018-03-29 08:25:02 -07:00
Hamel Husain 611e98ef1e Update Training.ipynb (#52)
Added Model Evaluation.  Deleted Table of Contents because you need Jupyter Extension to update that, so not worth it.
2018-03-19 16:08:01 -07:00
Hamel Husain 2ec3b03ed4 Update seq2seq_utils.py (#51)
Found a mistake with calculation of BLEU Score.
2018-03-18 12:25:58 -07:00
Michelle Casbon c50cda05ee Add file copy instructions after training (#47)
* Add file copy instructions after training

Fix broken link in cluster setup
Fix broken env variable in Training notebook
Change notebook name from Tutorial to Training

* Fix app selector value
2018-03-14 19:14:21 -07:00
Michelle Casbon 8ec9bac09e Add detail to cluster setup instructions (#44)
* Fix folder link

* Add detail to cluster setup instructions

Add a link to the image for this example.
In Tutorial.ipynb, move mounted directory into a variable to help avoid collisions on shared clusters.
2018-03-11 22:29:11 -07:00
Ankush Agarwal d1a2adfb01 Move from a custom tornado server to a seldon-core server for serving the model (#36)
* Create an end-to-end kubeflow example using seq2seq model (4/n)

* Move from a custom tornado server to a seldon-core model

Related to #11

* Update to use gcr.io registry for serving image
2018-03-09 14:36:12 -08:00
Ankush Agarwal ae6828cf3f Create an end-to-end kubeflow example using seq2seq model (3/n)
* Create a simple tornado server to serve the model
* TODO: Create a docker image for the server and deploy on kubeflow

Related to https://github.com/kubeflow/examples/issues/11
2018-03-07 09:27:38 -08:00
Michelle Casbon adad73bad0 Merge remote-tracking branch 'upstream/master' into third-party 2018-03-01 15:05:54 -05:00
Michelle Casbon 76862c5141 Remove third_party folder & MIT license file 2018-02-27 13:17:42 -05:00