examples

Commit Graph

Author	SHA1	Message	Date
Daniel Sanche	8a26b23e3d	Add Dan Sanche to OWNERS (#520)	2019-03-06 09:24:49 -08:00
Jin Chi He	bc11d20adf	resolve confict for the patch (#492)	2019-02-26 09:22:38 -08:00
Jeremy Lewi	7f7fbfd1cd	Trigger unittests on postsubmit and periodic runs. (#511) * Trigger unittests on postsubmit and periodic runs. * Rename the unittests workflow because its running unittests not E2E tests. Fix #510 * Shorten the name otherwise step names become two long.	2019-02-22 11:03:06 -08:00
Oleg Shepetjuk	90ea8cb8cd	[mnist] Add support for S3 in TensorBoard component; Update docs. (#499) * [mnist] Add support for S3 in TensorBoard component; Update docs. * [mnist] reverted autonumbering in README * [mnist] add expected fail for predict_test, until it'ss fixed	2019-02-20 06:34:23 -08:00
Hung-Ting Wen	45d157f238	Update gitignore for all VIM temp files. (#477) * Update gitignore for all VIM temp files. * remove existed format.	2019-02-19 11:06:02 -08:00
Michelle Casbon	ed82693304	Merge pull request #508 from CraigSterrett/pytorch_mnist Fixed typo in README and one bad link	2019-02-17 09:06:50 -08:00
Michelle Casbon	692c78550e	Merge pull request #399 from govindKAG/patch-1 fixed "setting persistent disk" link	2019-02-17 09:04:42 -08:00
Craig Sterrett	838ad79898	Fixed typo in README and one bad link Two small fixes I ran into when trying the example. One is a type, it says it's displaying an 8 when it is a 7. The other was a bad link.	2019-02-15 11:14:23 -08:00
Daniel Sanche	b18cec9b3b	Mnist fixes (#495) * removed environments * fixed issues with README * addressed PR comments * updated aws yaml to match master	2019-02-13 16:45:38 -08:00
Zhenghui Wang	74378a2990	Add end2end test for Xgboost housing example (#493) * Add e2e test for xgboost housing example * fix typo add ks apply add [ modify example to trigger tests add prediction test add xgboost ks param rename the job name without _ use - instead of _ libson params rm redudent component rename component in prow config add ames-hoursing-env use - for all names use _ for params names use xgboost_ames_accross rename component name shorten the name change deploy-test command change to xgboost- namespace init ks app fix type add confest.py change path change deploy command change dep change the query URL for seldon add ks_app with seldon lib update ks_app use ks init only rerun change to kf-v0-4-n00 cluster add ks_app use ks-13 remove --namespace use kubeflow as namespace delete seldon deployment simplify ks_app retry on 503 fix typo query 1285 move deletion after prediction wait 10s always retry till 10 mins move check to retry fix pylint move clean-up to the delete template * set up xgboost component * check in ks component& run it directly * change comments * add comment on why use 'ks delete' * add two modules to pylint whitelist * ignore tf_operator/py * disable pylint per line * reorder import	2019-02-12 06:37:05 -08:00
Gregory Godreau	329c53cea5	Fixes #497 (#498) * Fixes #497 - Removes slack invite link * Fixes #497 - Updates slack url to https://join.slack.com/t/kubeflow/shared_invite/enQtNDg5MTM4NTQyNjczLWUyZGI1ZmExZWExYWY4YzlkOWI4NjljNjJhZjhjMjEwNGFjNmVkNjg2NTg4M2I0ZTM5NDExZWI5YTIyMzVmNzM - This link allows a member of the general public (who doesn't have either a google, microsoft, or caicloud address) to sign up for the slack channel with an auto-generated invite	2019-02-04 10:45:54 -08:00
Jeremy Lewi	908ed3f2ba	Merge pull request #488 from agilestacks/master Add support for AWS access/secret keys in the train component (#466)	2019-01-30 16:38:32 -08:00
Oleg Shepetyuk	9505afa524	Removed note about S3 from README	2019-01-26 09:55:54 +02:00
Oleg Shepetyuk	ea86a41172	Updated mnist example README with AWS credentials setting	2019-01-25 17:26:56 +02:00
Oleg Shepetyuk	f85a8e970f	Made SecretRefs more generic and fixed failed test	2019-01-24 18:40:56 +02:00
Christopher Beitel	89e960202a	rm stale agents example (#487)	2019-01-23 16:27:50 -08:00
Oleg Shepetyuk	f89af01e2c	Add support for AWS access/secret keys in train component (#466)	2019-01-23 09:58:00 +02:00
Jeremy Lewi	2b0eec34c3	Enable periodic tests for mnist & GH issue examples. (#486) * Add a link to the E2E testing guide to the contributing page. Related to #485 - enable periodic mnist E2E testing.	2019-01-22 16:10:17 -08:00
govind cs	225a7e9f90	Merge branch 'master' into patch-1	2019-01-21 09:49:12 +05:30
govind cs	bf5e18a34e	Update 01_setup_a_kubeflow_cluster.md	2019-01-21 09:46:04 +05:30
Zhenghui Wang	22715c4900	build image for ames-housing-serving (#484)	2019-01-18 18:19:44 -08:00
Jeremy Lewi	5b797c871e	Create an E2E test for TFServing using the rest API (#479) * Create an E2E test for TFServing using the rest API * We use the pytest framework because 1. it has really good support for using command line arguments 2. can emit junit xml file to report results to prow. Related to #270: Create a generic test runner * Address comments. * Fix lint. * Add retries to the prediction. * Add some comments. * Fix model path. * * Fix the workflow labels * Set the K8s service name correctly on the test. * Fix the workflow. * Fix lint.	2019-01-18 16:29:42 -08:00
govind cs	b71a14396a	optimized apt-get to reduce image size (#482) * optimized apt-get to reduce image size * More verbose logging * minor fix removed no install recommends	2019-01-18 06:00:18 -08:00
cliveseldon	8d728f0b06	GitHub Summarization Seldon Update (#472) * Update model inference wrapping to use S2I and update docs * Add s2i reference in docs * Fix typo highlighted in review * Add pyLint annotation to allow protected-access on keras make predict function method	2019-01-17 16:07:34 -08:00
David Sabater Dinter	152c38b386	[mnist_pytorch] Optimise build and switch backend from MPI to GLOO (#480) * Refactor Python module: - Replace MPI by GLOO as backend to avoid having to recompily Pytorch - Replace DistributedDataParallel() class with official version when using GPUs - Remove unnecessary method to disable logs in workers - Refactor run() * Simplify Dockerfile by using Pytorch 0.4 official image with Cuda and remove mpirun call	2019-01-16 11:38:52 -08:00
Zhenghui Wang	1ed08b9af2	Fix model serving part of xgboost_ams_housing example. (#478) * Fix model serving for ames house example * change the step instructions * add public image	2019-01-15 12:30:51 -08:00
Jeremy Lewi	46a795693a	Minor fixes to the notebook. (#427) * Need to fix the import and compile commands. * Check if an experiment with the name already exists.	2019-01-15 08:33:19 -08:00
Zhenghui Wang	8f32202a36	Add richard and zhenghui to approvers for kubeflow/examples (#470) * add richard and zhenghui as approvers for examples * add owner file to xgboost example * reduce approvers * update	2019-01-14 17:50:30 -08:00
Richard Liu	64c3889071	Merge pull request #476 from richardsliu/hp_tuning Fix xgboost example for hyperparameter tuning	2019-01-14 17:41:07 -08:00
Hung-Ting Wen	c83ed09a77	revert back removed v1alpha2 yaml manifests (#475) * revert back removed v1alpha2 yaml manifests * Add documentation * Fix format	2019-01-14 17:08:29 -08:00
Richard Liu	3859564422	Fix pylint and log fmt	2019-01-14 17:01:27 -08:00
Richard Liu	1b29c2176e	Merge remote-tracking branch 'upstream/master' into hp_tuning	2019-01-14 16:00:30 -08:00
Richard Liu	8437ec9e5c	Fix logging	2019-01-14 15:54:25 -08:00
Jeremy Lewi	6770b4adcc	Add the web-ui for the mnist example (#473) * Add the web-ui for the mnist example Copy the mnist web app from https://github.com/googlecodelabs/kubeflow-introduction * Update the web app * Change "server-name" argument to "model-name" because this is what is. * Update the prediction client code; The prediction code was copied from https://github.com/googlecodelabs/kubeflow-introduction and that model used slightly different values for the input names and outputs. * Add a test for the mnist_client code; currently it needs to be run manually. * Fix the label selector for the mnist service so that it matches the TFServing deployment. * Delete the old copy of mnist_client.py; we will go with the copy in ewb-ui from https://github.com/googlecodelabs/kubeflow-introduction * Delete model-deploy.yaml, model-train.yaml, and tf-user.yaml. The K8s resources for training and deploying the model are now in ks_app. * Fix tensorboard; tensorboard only partially works behind Ambassador. It seems like some requests don't work behind a reverse proxy. * Fix lint.	2019-01-14 13:56:39 -08:00
Richard Liu	9e1ee20512	Fix xgboost for hp tuning	2019-01-14 11:50:13 -08:00
Zhenghui Wang	b3f06c204d	Fix the model training of ames-housing example (#468) * correct the image path * fix training part * rm downloading from github	2019-01-11 17:08:22 -08:00
Jeremy Lewi	2494fdf8c5	Update serving in mnist example; use 0.4 and add testing. (#469) * Add the TFServing component * Create TFServing components. * The model.py code doesn't appear to be exporting a model in saved model format; it was a missing a call to export. * I'm not sure how this ever worked. * It also looks like there is a bug in the code in that its using the cnn input fn even if the model is the linear one. I'm going to leave that as is for now. * Create a namespace for each test run; delete the namespace on teardown * We need to copy the GCP service account key to the new namespace. * Add a shell script to do that.	2019-01-11 14:36:43 -08:00
Jeremy Lewi	ef108dbbcc	Update training to use Kubeflow 0.4 and add testing. (#465) * Update training to use Kubeflow 0.4 and add testing. * To support testing we need to create a ksonnet template to train the model so we can easily subsitute in different parameters during training. * We create a ksonnet component for just training; we don't use Argo. This makes the example much simpler. * To support S3 we add a generic ksonnet parameter to take environment variables as a comma separated list of variables. This should make it easy for users to set the environment variables needed to talk to S3. This is compatible with the existing Argo workflow which supports S3. * By default the training job runs non-distributed; this is because to run distributed the user needs a shared filesystem (e.g. S3/GCS/NFS). * Update the mnist workflow to correctly build the images. * We didn't update the workflow in the previous example to actually build the correct images. * Update the workflow to run the tfjob_test * Related to #460 E2E test for mnist. * Add a parameter to specify a secret that can be used to mount a secret such as the GCP service account key. * Update the README with instructions for GCS and S3. * Remove the instructions about Argo; the Argo workflow is outdated. Using Argo adds complexity to the example and the thinking is to remove that to provide a simpler example and to mirror the pytorch example. * Add a TOC to the README * Update prerequisite instructions. * Delete instructions for installing Kubeflow; just link to the getting started guide. * Argo CLI should no longer be needed. * GitHub token shouldn't be needed; I think that was only needed for ksonnet to pull the registry. * * Fix instructions; access keys shouldn't be stored as ksonnet parameters as these will get checked into source control.	2019-01-10 12:42:45 -08:00
Hung-Ting Wen	4dda73afbf	Update pytorch_mnist example to use v1beta1 (#445) * Add job_mnist_DDP_CPU for v1beta1 * Add job_mnist_DDP_GPU for v1beta1 * Update 02_distributed_training.md to use v1beta1 * Remove pytorch v1alpha2 config * Add missing CPU training config	2019-01-09 05:27:35 -08:00
David Sabater Dinter	38daafa0c3	[mnist_pytorch] Update documentation (#463) * Fix link to next section, training the model * Added links to next and previous sections in training the model README * Fix link to previous section, training the model * Remove TODO list	2019-01-08 15:32:51 -08:00
Jeremy Lewi	d28ba7c4db	Continuously build the docker images used by mnist. (#462) * This is the first step in adding E2E tests for the mnist example. * Add a Makefile and .jsonnet file to build the Docker images using GCB * Define an Argo workflow to trigger the image builds on pre & post submit. Related to: #460	2019-01-08 15:21:49 -08:00
Jeremy Lewi	1cc4550b7d	GIS E2E test verify the TFJob runs successfully (#456) * Create a test for submitting the TFJob for the GitHub issue summarization example. * This test needs to be run manually right now. In a follow on PR we will integrate it into CI. * We use the image built from Dockerfile.estimator because that is the image we are running train_test.py in. * Note: The current version of the code now requires Python3 (I think this is due to an earlier PR which refactored the code into a shared implementation for using TF estimator and not TF estimator). * Create a TFJob component for TFJob v1beta1; this is the version in KF 0.4. TFJob component * Upgrade to v1beta to work with 0.4 * Update command line arguments to match the versions in the current code * input & output are now single parameters rather then separate parameters for bucket and name * change default input to a CSV file because the current version of the code doesn't handle unzipping it. * Use ks_util from kubeflow/testing * Address comments.	2019-01-08 15:06:49 -08:00
Jeremy Lewi	959d072e68	Setup continuous building of Docker images for GH Issue Summarization Example (#449) * Setup continuous building of Docker images and testing for GH Issue Summarization Example. * This is the first step in setting up a continuously running CI test. * Add support for building the Docker images using GCB; we will use GCB to trigger the builds from our CI system. * Make the Makefile top level (at root of GIS example) so that we can easily access all the different resources. * Add a .gitignore file to avoid checking in the build directory used by the Makefile. * Define an Argo workflow to use as the E2E test. Related to #92: E2E test & CI for github issue summarization * Trigger the test on pre & post submit * Dockerfile.estimator don't install the data_download.sh script * It doesn't look like we are currently using data_download.sh in the DockerImage * It looks like it only gets used vias the ksonnet job which mounts the script via a config map * Copying data_download.sh to the Docker image is currently weird given the organization of the Dockerfile and context. * Copy the test_data to the Docker images so that we can run the test inside the images. * Invoke the python unittest for training from our CI system. * In a follow on PR we will update the test to emit a JUnit XML file to report results to prow. * Fix image build.	2019-01-04 17:02:24 -08:00
Michelle Casbon	70a22d6d7b	[GH Issue Summarization] Upgrade to kf v0.4.0-rc.2 (#450) * Update tfjob components to v1beta1 Remove old version of tensor2tensor component * Combine UI into a single jsonnet file * Upgrade GH issue summarization to kf v0.4.0-rc.2 Use latest ksonnet v0.13.1 Use latest seldon v1alpha2 Remove ksonnet app with full kubeflow platform & replace with components specific to this example. Remove outdated scripts Add cluster creation links to Click-to-deploy & kfctl Add warning not to use the Training with an Estimator guide Replace commandline with bash for better syntax highlighting Replace messy port-forwarding commands with svc/ambassador Add modelUrl param to ui component Modify teardown instructions to remove the deployment Fix grammatical mistakes * Rearrange tfjob instructions	2018-12-30 20:05:29 -08:00
Jeremy Lewi	7990408207	Delete obsolete HP tuning code. (#451) * Katib no longer uses custom go programs. Instead it uses the new StudyJobController custom resource. * This code is no longer needed so delete it.	2018-12-29 19:00:14 -08:00
Hung-Ting Wen	37dd52f49d	Fix example documentation (#447)	2018-12-28 18:11:33 -08:00
Jeremy Lewi	e15bfffca4	An Argo workflow to use as the E2E test for code_search example. (#446) * An Argo workflow to use as the E2E test for code_search example. * The workflow builds the Docker images and then runs the python test to train and export a model * Move common utilities into util.libsonnet. * Add the workflow to the set of triggered workflows. * Update the test environment used by the test ksonnet app; we've since changed the location of the app. Related to #295 * Refactor the jsonnet file defining the GCB build workflow * Use an external variable to conditionally pull and use a previous Docker image as a cache * Reduce code duplication by building a shared template for all the different workflows. * BUILD_ID needs to be defined in the default parameters otherwise we get an error when adding a new environment. * Define suitable defaults.	2018-12-28 16:12:32 -08:00
David Sabater Dinter	a1f0d6dfec	Fixed some outdated comments to trigger pushing web-ui and model serve images to gcr.io/kubeflow-examples (#444)	2018-12-26 15:05:42 -08:00
Hougang Liu	1ed74b274c	create pv for pets-pv (#439) * create pv for pets-pv For a lot of user k8s clusters, dynamic volume provisioning isn't enabled. So the newcomer may be blocked since pets-pv will keep Pending. We can guide them to create a nfs PV as an option. * tell user how to check if a default storage class is defined * add link about how to create PV	2018-12-21 06:05:11 -08:00
Jeremy Lewi	2e6e891a5b	Update the ArgoCD app to use the kubeflow/examples repo (#440) * We were using jlewi's fork because PRs hadn't been committed but all the relevant PRs have been merged and master is the source of truth.	2018-12-19 21:26:49 -08:00

544 Commits