* Remove kustomize from mnist example.
* The mnist E2E guide has been updated to use notebooks and get rid
of kustomize
* We have notebooks for AWS, GCP, and Vanilla K8s.
* As such, we no longer need the old, outdated kustomization files or
  Docker containers
* The notebooks handle parameterizing the K8s resources using Python
  f-style strings.
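A rough sketch of the f-string parameterization; the variable names and image below are illustrative, not the notebook's actual ones:

```python
# Hypothetical values a user would set in the notebook.
namespace = "user-namespace"
train_image = "gcr.io/my-project/mnist-train:latest"

# The YAML spec is written as an f-string, so user-specific values
# are substituted directly without any templating tool.
tfjob_spec = f"""apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train
  namespace: {namespace}
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 1
      template:
        spec:
          containers:
          - name: tensorflow
            image: {train_image}
"""

print(tfjob_spec)
```

The rendered spec can then be applied via the K8s API or `kubectl apply`.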
* Update the README to remove the old instructions.
* Clean up more references.
* Add method to get ALB hostname for AWS users
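The idea can be sketched as reading the hostname out of the Ingress status; this is a minimal sketch assuming the Ingress object has already been fetched as a dict (the function name and sample hostname are hypothetical):

```python
def get_alb_hostname(ingress):
    """Extract the ALB hostname from an Ingress object represented as a dict.

    Assumes the ALB ingress controller has populated
    status.loadBalancer.ingress; returns None if it hasn't yet.
    """
    entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    return entries[0].get("hostname") if entries else None

# Example with a hand-written Ingress status:
ingress = {
    "status": {"loadBalancer": {"ingress": [
        {"hostname": "abc123.us-west-2.elb.amazonaws.com"}
    ]}}
}
print(get_alb_hostname(ingress))
```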
* Revoke setup based on the platform
* Add AWS notebook for mnist e2e example
* Remove legacy kustomize manifests for mnist example
* Address feedback from reviewers
* A notebook to run the mnist E2E example on GCP.
This fixes a number of issues with the example
* Use ISTIO instead of Ambassador to add reverse proxy routes
* The training job needs to be updated to run in a profile created namespace in order to have the required service accounts
* See kubeflow/examples#713
* Running inside a notebook on Kubeflow should ensure the user
  is running inside an appropriately set up namespace
* With ISTIO the default RBAC rules prevent the web UI from sending requests to the model server
* A short-term fix was to not include the ISTIO sidecar
* In the future we can add an appropriate ISTIO RBAC policy
* Using a notebook allows us to eliminate the use of kustomize
* This resolves kubeflow/examples#713, which required people to use
  an old version of kustomize
* Rather than using kustomize we can use Python f-style strings to
  write the YAML specs and then easily substitute in user-specific values
* This should be more informative; it avoids introducing kustomize and
users can see the resource specs.
* I've opted to make the notebook GCP specific. I think it's less confusing
to users to have separate notebooks focused on specific platforms rather
than having one notebook with a lot of caveats about what to do under
different conditions
* I've deleted the kustomize overlays for GCS since we don't want users to
use them anymore
* I used fairing and kaniko to eliminate the use of docker to build the images
so that everything can run from a notebook running inside the cluster.
* k8s_utils.py has some reusable functions to add some details from users
(e.g. low level calls to K8s APIs.)
* Change the mnist test to just run the notebook
* Copy the notebook test infra for xgboost_synthetic to py/kubeflow/examples/notebook_test to make it more reusable
* Fix lint.
* Update for lint.
* A notebook to run the mnist E2E example.
Related to: kubeflow/website#1553
* 1. Use fairing to build the model. 2. Construct the YAML spec directly in the notebook. 3. Use the TFJob Python SDK.
* Fix the ISTIO rule.
* Fix UI and serving; need to update TF serving to match version trained on.
* Get the IAP endpoint.
* Start writing some helper python functions for K8s.
* Commit before switching from replace to delete.
* Create a library to bulk create objects.
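The bulk-create helper can be sketched as iterating over specs and tolerating "already exists" errors; `create_fn` below is a stand-in for a real Kubernetes API call (e.g. via the kubernetes Python client), and the error handling is illustrative, not the library's actual code:

```python
def apply_all(create_fn, specs):
    """Bulk-create K8s objects, skipping ones that already exist.

    Sketch only: `create_fn` is any callable taking a spec dict; a real
    implementation would dispatch to the K8s API based on each spec's kind.
    """
    created = []
    for spec in specs:
        try:
            created.append(create_fn(spec))
        except RuntimeError as exc:  # stand-in for an "already exists" API error
            if "already exists" not in str(exc):
                raise
            created.append(None)
    return created

# Exercise the helper with a fake client that records what it created.
seen = []
def fake_create(spec):
    if spec["metadata"]["name"] in seen:
        raise RuntimeError("already exists")
    seen.append(spec["metadata"]["name"])
    return spec

specs = [{"metadata": {"name": "svc"}}, {"metadata": {"name": "svc"}}]
created = apply_all(fake_create, specs)
print(created)
```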
* Cleanup.
* Add back k8s_util.py
* Delete train.yaml; this shouldn't have been added.
* update the notebook image.
* Refactor code into k8s_util; print out links.
* Clean up the notebook. Should be working E2E.
* Added section to get logs from stackdriver.
* Add comment about profile.
* Latest.
* Override mnist_gcp.ipynb with mnist.ipynb
I accidentally put my latest changes in mnist.ipynb even though that file
was deleted.
* More fixes.
* Resolve some conflicts from the rebase; override with changes on remote branch.
* [mnist] Add support for S3 in TensorBoard component; Update docs.
* [mnist] reverted autonumbering in README
* [mnist] add expected fail for predict_test until it's fixed
* Add the web-ui for the mnist example
Copy the mnist web app from
https://github.com/googlecodelabs/kubeflow-introduction
* Update the web app
* Change "server-name" argument to "model-name" because that is what
  it actually refers to.
* Update the prediction client code; The prediction code was copied
from https://github.com/googlecodelabs/kubeflow-introduction and
that model used slightly different values for the input names
and outputs.
* Add a test for the mnist_client code; currently it needs to be run
manually.
* Fix the label selector for the mnist service so that it matches the
TFServing deployment.
* Delete the old copy of mnist_client.py; we will go with the copy in web-ui from https://github.com/googlecodelabs/kubeflow-introduction
* Delete model-deploy.yaml, model-train.yaml, and tf-user.yaml.
The K8s resources for training and deploying the model are now in ks_app.
* Fix tensorboard; tensorboard only partially works behind Ambassador. It seems like some requests don't work behind a reverse proxy.
* Fix lint.
* Add the TFServing component
* Create TFServing components.
* The model.py code doesn't appear to be exporting a model in saved model
  format; it was missing a call to export.
* I'm not sure how this ever worked.
* It also looks like there is a bug in the code in that it uses the CNN input fn even if the model is the linear one. I'm going to leave that as is for now.
* Create a namespace for each test run; delete the namespace on teardown
* We need to copy the GCP service account key to the new namespace.
* Add a shell script to do that.
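The actual fix is a shell script around kubectl; the equivalent transformation on the manifest itself can be sketched in Python as below (the secret name and data are hypothetical). Server-assigned metadata has to be dropped so the object can be re-created in the target namespace:

```python
import copy

def copy_secret_to_namespace(secret, target_namespace):
    """Return a copy of a Secret manifest retargeted at another namespace.

    Drops server-assigned metadata (resourceVersion, uid, ...) so the
    copy can be created cleanly via the API or `kubectl apply`.
    """
    new_secret = copy.deepcopy(secret)
    new_secret["metadata"] = {
        "name": secret["metadata"]["name"],
        "namespace": target_namespace,
    }
    return new_secret

secret = {
    "apiVersion": "v1", "kind": "Secret",
    "metadata": {"name": "gcp-sa-key", "namespace": "kubeflow",
                 "resourceVersion": "12345", "uid": "abc"},
    "data": {"key.json": "base64data"},
}
copied = copy_secret_to_namespace(secret, "test-run-123")
print(copied["metadata"])
```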
* Update training to use Kubeflow 0.4 and add testing.
* To support testing we need to create a ksonnet template to train
  the model so we can easily substitute in different parameters during
  training.
* We create a ksonnet component for just training; we don't use Argo.
This makes the example much simpler.
* To support S3 we add a generic ksonnet parameter to take environment
variables as a comma separated list of variables. This should make it
easy for users to set the environment variables needed to talk to S3.
This is compatible with the existing Argo workflow which supports S3.
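The env-var parameter's behavior can be sketched as follows; the actual parsing lives in the ksonnet/jsonnet component, and the variable names shown are illustrative:

```python
def parse_env_param(value):
    """Parse a comma-separated list of NAME=VALUE pairs into K8s env entries.

    Sketch of the idea behind the ksonnet parameter: users pass one string,
    and it expands into container env entries.
    """
    entries = []
    for pair in value.split(","):
        if not pair:
            continue
        name, _, val = pair.partition("=")
        entries.append({"name": name, "value": val})
    return entries

print(parse_env_param("S3_ENDPOINT=s3.amazonaws.com,AWS_REGION=us-west-2"))
```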
* By default the training job runs non-distributed; this is because to
run distributed the user needs a shared filesystem (e.g. S3/GCS/NFS).
* Update the mnist workflow to correctly build the images.
* We didn't update the workflow in the previous example to actually
build the correct images.
* Update the workflow to run the tfjob_test
* Related to #460 E2E test for mnist.
* Add a parameter to specify a secret that can be used to mount
a secret such as the GCP service account key.
* Update the README with instructions for GCS and S3.
* Remove the instructions about Argo; the Argo workflow is outdated.
Using Argo adds complexity to the example and the thinking is to remove
that to provide a simpler example and to mirror the pytorch example.
* Add a TOC to the README
* Update prerequisite instructions.
* Delete instructions for installing Kubeflow; just link to the
getting started guide.
* Argo CLI should no longer be needed.
* GitHub token shouldn't be needed; I think that was only needed
for ksonnet to pull the registry.
* Fix instructions; access keys shouldn't be stored as ksonnet parameters
  as these will get checked into source control.
* Add awscli tools container.
* Add initial readme.
* Add argo skeleton.
* Run an Argo job.
* Artifact support and argo test
* Use built container (#3)
* Fix artifacts and secrets
* Add work in progress tfflow (#14)
* Add kvc deployment to workflow.
* Switch aws repo.
* wip.
* Add working tfflow job.
* Add sidecar that waits for MASTER completion
* Pass in job-name
* Add volumemanager info step
* Add input parameters to step
* Adds nodeaffinity and hostpath
* Add fixes for workflow (#17)
- Use correct images for worker and ps
- Use correct aws keys
- Change volumemanager to mnist
- Comment unused steps
- Fix volume mount to correct containers
* Fix hostpath for tfjob
* Download all mnist files
* added GCS-stored artifact compatibility to Argo
* Add initial inference workflow. (#30)
* Initial serving step (#31)
* Adds fixes to initial serving step
* Ready for rough demo: Workflow in working state
* Move conflicting readme.
* Initial commit, everything boots without crashing.
* Working, with some python errors.
* Adding explicit flags
* Working with ins-outs
* Letting training job exit on success
* Adding documentation skeleton
* trying to properly save model
* Almost working
* Working
* Adding export script, refactored to allow model more reusability
* Starting documentation
* little further on docs
* More doc updates, fixing sleep logic
* adding urls for mnist data
* Removing download logic; it's too tied in with built-in tf examples.
* Added argo workflow instructions, minor cleanups.
* Adding mnist client.
* Fixing typos
* Adding instructions for installing components.
* Added ksonnet container
* Adding new entrypoint.
* Added helm install instructions for kvc
* doing things with variables
* Typos.
* Added better namespace support
* S3 refactor.
* Added missing region variables.
* Adding tensorboard support.
* Adding container for Tensorboard.
* Added temporary flag, added install instructions for CLI.
* Removing invalid ksonnet environment.
* Updating readme
* Cleanup currently unused pieces
* Add missing cluster-role
* Minor cleanup.
* Adding more parameters.
* added changes to allow model to train on multiple workers and fixed some doc typos
* Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step.
* Adding separate deployer workflow.
* Split serving working.
* Adding split workflow.
* More parameters.
* Updates per Elson's comments
* Revert "added changes to allow model to train on multiple workers and fixed s…"
* Initial working pure-s3 workflow.
* Removed wait sidecars.
* Remove unused flag.
* Added part two, minor doc fixes
* Inverted links...
* Adding diff.
* Fix url syntax
* Documentation updates.
* Added AWS Cli
* Parameterized export.
* Fixing image in s3 version.
* Fixed documentation issues.
* KVC snippet changes, need to find last working helm chart.
* Temporarily pinning kvc version.
* Working master model and some doc typo fixes (#13)
* added changes to allow model to train on multiple workers and fixed some doc typos
* Adding flag to enable/disable model serving. Adding s3 urls as outputs for future querying, renaming info step.
* Adding separate deployer workflow.
* Split serving working.
* Adding split workflow.
* More parameters.
* Updates per Elson's comments
* working master model and some doc typos
* Fixes per Elson's feedback
* Removing whitespace differences
* updating diff
* Changing parameters.
* Undoing whitespace.
* Changing termination policy on s3 version due to unknown issue.
* Updating mnist diff.
* Changing train steps.
* Syncing Demo changes.
* Update README.md
* Going S3-native for initial example. Getting rid of Master.
* Minor documentation tweaks, adding params, swapping aws cli for minio.
* Updating KVC version.
* Switching ksonnet repo, removing model name from client.
* Updating git url.
* Adding certificate hack to avoid RBAC errors.
* Pinning KVC to commit while working on PR.
* Updating version.
* Updates README with additional details (#14)
* Updates README with additional details
* Adding clarity to kubectl config commands
* Fixed comma placement
* Refactoring notes for github and kubernetes credentials.
* Forgot to add an overview of the argo template.
* Updating example based on feedback.
- Removed superfluous images
- Clarified use of KVC
- Added unaltered model
- Variable cleanup
* Refactored grpc image into generic base image.
* minor cleanup of resubmitting section.
* Switching Argo deployment to ksonnet, consolidating install instructions.
* Removing old cruft, clarifying cluster requirements.
* [WIP] Switching out model (#15)
* Switching to new mnist example.
* Parameterized model, testing export.
* Got CNN model exporting.
* Attempting to do distributed training with Estimator, removed separate export.
* Adding master back, otherwise Estimator complains about not having a chief.
* Switching to tf.estimator.train_and_evaluate.
* Minor path/var name refactor.
* Adding test data and new client.
* Fixed documentation to reflect new client.
* Getting rid of tf job shim.
* Removing KVC from example, renaming directory
* Modifying parent README
* Removed reference to export.
* Adding reference to export.
* Removing unused Dockerfile.
* Removing unneeded files, simplifying how to get status, refactoring the model serving workflow step.
* Renaming directory
* Minor doc improvements, removed extra clis.
* Making SSL configurable for clusters without secured s3 endpoints.
* Added a tf-user account for workflow. Fixed serving bug.
* Updating gke version.
* Re-ran through instructions, fixed errata.
* Fixing lint issues
* Pylint errors
* Pylint errors
* Adding parenthesis back.
* pylint Hacks
* Disabling argument filter, model bombs without empty arg.
* Removing unneeded lambdas