Commit Graph

56 Commits

Author SHA1 Message Date
Helber Belmiro 4132f35d3b
Fixed broken link (#1063) 2023-10-19 18:25:15 +00:00
Xin Ren 9665ff8a5c
Update link to Kubeflow on AWS (#948) 2022-07-06 04:11:44 +00:00
陳傑夫 ac7c1762fd
some link fixing (#888)
* some link fixing

* some link fixing

* some link fixing
2021-10-26 18:32:27 -07:00
Tommy Li 58afbf10b6
[MNIST] Update fairing to 1.0.2 to resolve import bug (#840) 2021-01-12 18:30:35 -08:00
Jeremy Lewi 7bde5b484d
Update fairing in mnist to 1.0.1 (#807)
This addresses kubeflow/kfserving#806. Fairing 1.0.0 isn't compatible with kfserving 0.3.2
2020-07-07 17:37:07 -07:00
Jeremy Lewi c880fdaa80
Delete the notebook tests because they are outdated. (#808)
* Delete the notebook tests because they are outdated.

* We have rewritten the test infra for notebooks to use Tekton.
  see:
    https://github.com/kubeflow/testing/blob/master/tekton/templates/pipelines/notebook-test-pipeline.yaml
    https://github.com/kubeflow/examples/tree/master/py/kubeflow/examples/notebook_tests

* We are also no longer regularly deploying the v1 clusters; we are now using
  blueprints so that's why the tests can no longer get credentials

* * Add the mnist notebook test as a postsubmit and periodic test.

* Fix.
2020-07-07 01:23:58 -07:00
Yash Jakhotiya 10b34b8dc8
Add link to the 'Deploying Kubeflow on GCP' doc (#779)
This is important as this is an E2E tutorial. Moreover, those docs mention the catch that the GCP Free Tier and the 12-month trial period with $300 credit do not offer enough resources to run the default GCP installation of Kubeflow.
2020-06-29 14:18:41 -07:00
Ian Coffey b8b7179fc2
Bump fairing commit to latest on master to fix examples (#795)
* fixes kubeflow/examples#792 bump fairing commit to latest on master to fix periodic tests

* Skip xgboost synthetic test until solution is found
2020-05-20 13:56:19 -07:00
Tommy Li e95c20112b
Update IBM Cloud instructions to use persistent storage. (#766) 2020-03-10 19:43:35 -07:00
Derrick Miller 95db89ad74
Update Dockerfile ENTRYPOINT to use /usr/local/bin/python (#744)
The base image `FROM tensorflow/tensorflow:1.15.2-py3` uses python3, so the python binary location is `/usr/bin/python3`. However, the [tensorflow base image creates a symlink](e5bf8de410/tensorflow/tools/dockerfiles/dockerfiles/cpu.Dockerfile (L45)) to the current python binary as `/usr/local/bin/python` regardless of whether that is python version 2 or version 3, so that binary location should be used in the *ENTRYPOINT* of `Dockerfile.model` instead of `/usr/bin/python`, which is customary for Python v2.x installations.
2020-03-03 08:41:38 -08:00
Sarah Maddox 7589c004d7
Tech writing updates for GCP MNIST notebook (#756)
* Tech writing updates for GCP MNIST notebook.

* Changed logic for defining webapp endpoints.
2020-03-02 13:23:38 -08:00
Bernd Verst 2b827ea139
Adds MNIST E2E Example for Azure. (#759)
* Adds MNIST E2E Example for Azure.

* Remove auto-generated ToC

* Remove incompatible script to retrieve Ingress URL

* Remove orphaned ToC entry
2020-02-28 18:15:53 -08:00
Tommy Li 222715031a
add ibm mnist example (#746)
remove cell outputs

update cos section

update missing typos
2020-02-26 15:27:19 -08:00
Jeremy Lewi 5b4b0c6c94
Remove kustomize from mnist example. (#745)
* Remove kustomize from mnist example.

* The mnist E2E guide has been updated to use notebooks and get rid
  of kustomize

* We have notebooks for AWS, GCP, and Vanilla K8s.

* As such we no longer need the old, outdated kustomization files or
  Docker containers anymore

  * The notebooks handle parameterizing the K8s resources using Python
    f style string.

* Update the README to remove the old instructions.

* Cleanup more references.
2020-02-21 14:14:47 -08:00
Jiaxin Shan 4c4f1c0f88
Create a notebook for mnist E2E on AWS (#740)
* Add method to get ALB hostname for aws users

* Revoke setup based on the platform

* Add AWS notebook for mnist e2e example

* Remove legacy kustomize manifests for mnist example

* Address feedback from reviewers
2020-02-20 18:32:32 -08:00
Adhita Selvaraj 40f6ec8fe7
Mnist vanilla k8s (#737)
* adds mnist example for vanilla k8s

* typo fix

* address review comments; get minio endpoint from k8s client;
2020-02-20 06:51:04 -08:00
Jeremy Lewi cc93a80420
Create a notebook for mnist E2E on GCP (#723)
* A notebook to run the mnist E2E example on GCP.

This fixes a number of issues with the example
* Use ISTIO instead of Ambassador to add reverse proxy routes
* The training job needs to be updated to run in a profile created namespace in order to have the required service accounts
     * See kubeflow/examples#713
     * Running inside a notebook running on Kubeflow should ensure user
       is running inside an appropriately setup namespace
* With ISTIO the default RBAC rules prevent the web UI from sending requests to the model server
     * A short term fix was to not include the ISTIO side car
     * In the future we can add an appropriate ISTIO rbac policy

* Using a notebook allows us to eliminate the use of kustomize
  * This resolves kubeflow/examples#713 which required people to use
    an old version of kustomize

  * Rather than using kustomize we can use python f style strings to
    write the YAML specs and then easily substitute in user specific values

  * This should be more informative; it avoids introducing kustomize and
    users can see the resource specs.

* I've opted to make the notebook GCP specific. I think it's less confusing
  to users to have separate notebooks focused on specific platforms rather
  than having one notebook with a lot of caveats about what to do under
  different conditions

* I've deleted the kustomize overlays for GCS since we don't want users to
  use them anymore

* I used fairing and kaniko to eliminate the use of docker to build the images
  so that everything can run from a notebook running inside the cluster.

* k8s_utils.py has some reusable functions to hide some details from users
  (e.g. low level calls to K8s APIs.)

* * Change the mnist test to just run the notebook
  * Copy the notebook test infra for xgboost_synthetic to py/kubeflow/examples/notebook_test to make it more reusable

* Fix lint.

* Update for lint.

* A notebook to run the mnist E2E example.

Related to: kubeflow/website#1553

* 1. Use fairing to build the model. 2. Construct the YAML spec directly in the notebook. 3. Use the TFJob python SDK.

* Fix the ISTIO rule.

* Fix UI and serving; need to update TF serving to match version trained on.

* Get the IAP endpoint.

* Start writing some helper python functions for K8s.

* Commit before switching from replace to delete.

* Create a library to bulk create objects.

* Cleanup.

* Add back k8s_util.py

* Delete train.yaml; this shouldn't have been added.

* update the notebook image.

* Refactor code into k8s_util; print out links.

* Clean up the notebook. Should be working E2E.

* Added section to get logs from stackdriver.

* Add comment about profile.

* Latest.

* Override mnist_gcp.ipynb with mnist.ipynb

I accidentally put my latest changes in mnist.ipynb even though that file
was deleted.

* More fixes.

* Resolve some conflicts from the rebase; override with changes on remote branch.
2020-02-16 19:15:28 -08:00
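The commit above describes replacing kustomize with Python f-style strings that write the YAML specs and substitute user-specific values. A minimal sketch of that idea follows; the function name, the Service fields, and the values `mnist-ui` and `kubeflow-user` are hypothetical illustrations, not taken from the notebook itself.

```python
import textwrap


def render_service_spec(name: str, namespace: str, port: int) -> str:
    """Return a Kubernetes Service manifest with user-specific values filled in.

    Hypothetical helper for illustration; the real notebook may structure
    its specs differently.
    """
    # The f-string substitutes the values first; dedent then strips the
    # common leading indentation so the result is plain YAML.
    return textwrap.dedent(f"""\
        apiVersion: v1
        kind: Service
        metadata:
          name: {name}
          namespace: {namespace}
        spec:
          selector:
            app: {name}
          ports:
          - port: {port}
            targetPort: {port}
        """)


# Example values are assumptions chosen for the sketch.
spec = render_service_spec("mnist-ui", "kubeflow-user", 80)
print(spec)
```

Because the spec is an ordinary string, users can read the full resource definition in the notebook before it is applied, which is the benefit the commit message points to.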
Amy 0c8d2fdfc1
mnist example namespace fix (#720)
* update mnist tutorial to use the user profile namespace for the tfjob

* add namespace arg to some kubectl commands
2020-02-10 10:11:54 -08:00
Hung-Ting Wen 06f9b3f880
kubeflow-kf-ci-v1-user (#719) 2020-02-06 20:09:42 -08:00
Amy 68f172c2ee
pin the web-ui version of TF to 1.7-- same as training (#658) 2020-01-24 08:33:10 -08:00
Jin Chi He 1e385247b0 update ci tests for mnist example (#684) 2019-12-06 16:55:54 -08:00
Daniel Sanche ec9020e851 Allow extra arguments (#625) 2019-08-21 19:34:31 -07:00
Mike Mainguy 0b33b536b7 Applied changes to README and Kustomize files to handle training, monitoring, and serving the mnist model in S3 using Kustomize (#543) 2019-08-19 17:41:33 -07:00
Simon Rey ef9484595f Add tensorboard support for local mnist example (#616)
* Add files via upload

* Update kustomization.yaml

* Update README.md

* Update README.md

* Update README.md
2019-08-12 16:03:38 -07:00
Xiao Kou 607533311e Fix mnist readme service name and deployments name typo (#611) 2019-07-29 20:12:51 -07:00
Jin Chi He da903549cc remove useless requirements.txt (#593) 2019-07-14 14:49:03 -07:00
Jin Chi He 871895c544 recommend using kustomize v2.0.3 for mnist (#584) 2019-07-04 19:20:35 -07:00
Zane Durante 45ece829aa Fixed typo (#572)
Changed "intalled" to "installed"
2019-06-10 19:32:18 -07:00
Jin Chi He 45f03e6fd9 fix issue for s3 case in mnist example (#566) 2019-06-09 19:17:05 -07:00
Karthik Vadla 8efa3e72fb MNIST: Update web-ui reference for GCS (#570)
* Update webui url for GCS

* clean up
2019-06-07 17:57:06 -07:00
Jin Chi He 4f8bbe1e9e update TFJob version to v1 (#568) 2019-06-06 21:11:59 -07:00
ssivart 21f76812ec [docs]: enhance documentation for chosen namespace (#555) 2019-05-15 02:44:19 -07:00
Iman Tabrizian fce8fa0a3f [docs]: fix minor bug in the documentation (#554) 2019-05-12 18:08:08 -07:00
Jin Chi He 5fac627725 drop_ksonnet_from_mnist (#546) 2019-05-07 19:54:32 -07:00
Jin Chi He 335c2a1d6e Enhance mnist training and add serving steps for local mode (#528) 2019-04-09 14:54:45 -07:00
Jin Chi He fb0c5eb115 fix import issue in the mnist e2e testing (#531) 2019-04-05 18:36:27 -07:00
Jin Chi He bc11d20adf resolve conflict for the patch (#492) 2019-02-26 09:22:38 -08:00
Oleg Shepetjuk 90ea8cb8cd [mnist] Add support for S3 in TensorBoard component; Update docs. (#499)
* [mnist] Add support for S3 in TensorBoard component; Update docs.

* [mnist] reverted autonumbering in README

* [mnist] add expected fail for predict_test, until it's fixed
2019-02-20 06:34:23 -08:00
Daniel Sanche b18cec9b3b Mnist fixes (#495)
* removed environments

* fixed issues with README

* addressed PR comments

* updated aws yaml to match master
2019-02-13 16:45:38 -08:00
Zhenghui Wang 74378a2990 Add end2end test for Xgboost housing example (#493)
* Add e2e test for xgboost housing example

* fix typo

add ks apply

add [

modify example to trigger tests

add prediction test

add xgboost ks param

rename the job name without _

use - instead of _

libson params

rm redundant component

rename component in prow config

add ames-hoursing-env

use - for all names

use _ for params names

use xgboost_ames_accross

rename component name

shorten the name

change deploy-test command

change to xgboost-
namespace

init ks app

fix type

add conftest.py

change path

change deploy command

change dep

change the query URL for seldon

add ks_app with seldon lib

update ks_app

use ks init only

rerun

change to kf-v0-4-n00 cluster

add ks_app

use ks-13

remove --namespace

use kubeflow as namespace

delete seldon deployment

simplify ks_app

retry on 503

fix typo

query 1285

move deletion after prediction

wait 10s

always retry till 10 mins

move check to retry

 fix pylint

move  clean-up to the delete template

* set up xgboost component

* check in ks component& run it directly

* change comments

* add comment on why use 'ks delete'

* add two modules to pylint whitelist

* ignore tf_operator/py

* disable pylint per line

* reorder import
2019-02-12 06:37:05 -08:00
Oleg Shepetyuk 9505afa524 Removed note about S3 from README 2019-01-26 09:55:54 +02:00
Oleg Shepetyuk ea86a41172 Updated mnist example README with AWS credentials setting 2019-01-25 17:26:56 +02:00
Oleg Shepetyuk f85a8e970f Made SecretRefs more generic and fixed failed test 2019-01-24 18:40:56 +02:00
Oleg Shepetyuk f89af01e2c Add support for AWS access/secret keys in train component (#466) 2019-01-23 09:58:00 +02:00
Jeremy Lewi 5b797c871e Create an E2E test for TFServing using the rest API (#479)
* Create an E2E test for TFServing using the rest API

* We use the pytest framework because
  1. it has really good support for using command line arguments
  2. can emit junit xml file to report results to prow.

Related to #270: Create a generic test runner

* Address comments.

* Fix lint.

* Add retries to the prediction.

* Add some comments.

* Fix model path.

* * Fix the workflow labels
* Set the K8s service name correctly on the test.

* Fix the workflow.

* Fix lint.
2019-01-18 16:29:42 -08:00
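The commit above mentions adding retries to the prediction call in the E2E test. One way such a retry loop could look is sketched below; `retry`, `flaky_predict`, and the choice of `ConnectionError` as the retryable error are assumptions made for illustration and do not reproduce the actual test code.

```python
import time


def retry(fn, attempts=5, delay_seconds=0.0, retry_on=(ConnectionError,)):
    """Call fn until it succeeds or the attempts are used up.

    Hypothetical helper: only exceptions listed in retry_on are retried;
    the last one is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


# Stand-in for a prediction request that fails until the server is ready.
calls = {"count": 0}


def flaky_predict():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("model server not ready")
    return {"predictions": [7]}


result = retry(flaky_predict, attempts=5, delay_seconds=0.0)
print(result, "after", calls["count"], "calls")
```

In a real test the delay would be non-zero so that a freshly deployed model server has time to start.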
govind cs b71a14396a optimized apt-get to reduce image size (#482)
* optimized apt-get to reduce image size

* More verbose logging

* minor fix

removed no install recommends
2019-01-18 06:00:18 -08:00
Jeremy Lewi 6770b4adcc Add the web-ui for the mnist example (#473)
* Add the web-ui for the mnist example

Copy the mnist web app from
https://github.com/googlecodelabs/kubeflow-introduction

* Update the web app

   * Change "server-name" argument to "model-name" because this is what
     it is.

   * Update the prediction client code; The prediction code was copied
     from https://github.com/googlecodelabs/kubeflow-introduction and
     that model used slightly different values for the input names
     and outputs.

  * Add a test for the mnist_client code; currently it needs to be run
    manually.

* Fix the label selector for the mnist service so that it matches the
  TFServing deployment.

* Delete the old copy of mnist_client.py; we will go with the copy in web-ui from https://github.com/googlecodelabs/kubeflow-introduction

* Delete model-deploy.yaml, model-train.yaml, and tf-user.yaml.
  The K8s resources for training and deploying the model are now in ks_app.

* Fix tensorboard; tensorboard only partially works behind Ambassador. It seems like some requests don't work behind a reverse proxy.

* Fix lint.
2019-01-14 13:56:39 -08:00
Jeremy Lewi 2494fdf8c5 Update serving in mnist example; use 0.4 and add testing. (#469)
* Add the TFServing component
* Create TFServing components.

* The model.py code doesn't appear to be exporting a model in saved model
  format; it was missing a call to export.

  * I'm not sure how this ever worked.

* It also looks like there is a bug in the code in that it's using the cnn input fn even if the model is the linear one. I'm going to leave that as is for now.

* Create a namespace for each test run; delete the namespace on teardown
* We need to copy the GCP service account key to the new namespace.
* Add a shell script to do that.
2019-01-11 14:36:43 -08:00
Jeremy Lewi ef108dbbcc Update training to use Kubeflow 0.4 and add testing. (#465)
* Update training to use Kubeflow 0.4 and add testing.

* To support testing we need to create a ksonnet template to train
  the model so we can easily substitute in different parameters during
  training.

* We create a ksonnet component for just training; we don't use Argo.
  This makes the example much simpler.

* To support S3 we add a generic ksonnet parameter to take environment
  variables as a comma separated list of variables. This should make it
  easy for users to set the environment variables needed to talk to S3.
  This is compatible with the existing Argo workflow which supports S3.

* By default the training job runs non-distributed; this is because to
  run distributed the user needs a shared filesystem (e.g. S3/GCS/NFS).

* Update the mnist workflow to correctly build the images.

  * We didn't update the workflow in the previous example to actually
    build the correct images.

* Update the workflow to run the tfjob_test

* Related to #460 E2E test for mnist.

* Add a parameter to specify a secret that can be used to mount
  a secret such as the GCP service account key.

* Update the README with instructions for GCS and S3.

* Remove the instructions about Argo; the Argo workflow is outdated.

  Using Argo adds complexity to the example and the thinking is to remove
  that to provide a simpler example and to mirror the pytorch example.

* Add a TOC to the README

* Update prerequisite instructions.

  * Delete instructions for installing Kubeflow; just link to the
    getting started guide.

  * Argo CLI should no longer be needed.

  * GitHub token shouldn't be needed; I think that was only needed
    for ksonnet to pull the registry.

* * Fix instructions; access keys shouldn't be stored as ksonnet parameters
  as these will get checked into source control.
2019-01-10 12:42:45 -08:00
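The commit above adds a generic parameter that takes environment variables as a comma separated list. A minimal sketch of how such a list could be turned into container `env` entries is shown below; the `NAME=VALUE` item format, the function name, and the example variables are assumptions, since the commit message does not spell out the exact syntax.

```python
def parse_env_vars(spec: str) -> list[dict[str, str]]:
    """Turn "A=1,B=2" into [{"name": "A", "value": "1"}, ...].

    Assumes each item has the form NAME=VALUE (an assumption for this
    sketch); empty items are skipped and malformed ones raise ValueError.
    """
    env = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {item!r}")
        env.append({"name": name, "value": value})
    return env


# Non-secret settings only; the values here are illustrative.
env = parse_env_vars("S3_ENDPOINT=s3.example.com,AWS_REGION=us-west-2")
print(env)
```

As the commit notes, access keys should not be passed this way, because parameters get checked into source control; secrets are a better fit for credentials.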
Jeremy Lewi d28ba7c4db Continuously build the docker images used by mnist. (#462)
* This is the first step in adding E2E tests for the mnist example.

* Add a Makefile and .jsonnet file to build the Docker images using GCB

* Define an Argo workflow to trigger the image builds on pre & post submit.

Related to: #460
2019-01-08 15:21:49 -08:00