Commit Graph

12 Commits

Jeremy Lewi c880fdaa80
Delete the notebook tests because they are outdated. (#808)
* Delete the notebook tests because they are outdated.

* We have rewritten the test infra for notebooks to use Tekton.
  see:
    https://github.com/kubeflow/testing/blob/master/tekton/templates/pipelines/notebook-test-pipeline.yaml
    https://github.com/kubeflow/examples/tree/master/py/kubeflow/examples/notebook_tests

* We are also no longer regularly deploying the v1 clusters; we are now using
  blueprints, which is why the tests can no longer get credentials

* Add the mnist notebook test as a postsubmit and periodic test.

* Fix.
2020-07-07 01:23:58 -07:00
Jeremy Lewi 197abc9daa
Some improvements to utilities for testing notebooks (#803)
* Changes pulled in from kubeflow/examples#764

* Notebook tests should print a link to the stackdriver logs for
  the actual notebook job.

* Related to kubeflow/testing#613

Co-authored-by: Gabriel Wen <gabrielwen@google.com>
2020-06-14 20:21:56 -07:00
Hung-Ting Wen c337d90e87
freeze papermill version #783 (#784) 2020-03-31 19:23:27 -07:00
Kubernetes Prow Robot c5222ddde8
add a general notebook test script (#763)
* add a general notebook test script

* fix join

* fix typo

* infer notebook name with path

* replace

* fix name

* add a log

* update comment
2020-03-04 14:54:39 -08:00
Jeremy Lewi b218d2b23c
Fix the mnist_gcp_test.py (#741)
* Fix the mnist_gcp_test.py

* The job spec was invalid; we were missing container name

* There were a bunch of other issues as well.

* Pull in the changes from xgboost_synthetic to upload an HTML version
  of the notebook output to GCS.

* Add exception

* Revert "Add exception"

This reverts commit 44f34d9d74.
2020-02-21 15:58:48 -08:00
Jeremy Lewi cc93a80420
Create a notebook for mnist E2E on GCP (#723)
* A notebook to run the mnist E2E example on GCP.

This fixes a number of issues with the example
* Use ISTIO instead of Ambassador to add reverse proxy routes
* The training job needs to be updated to run in a profile created namespace in order to have the required service accounts
     * See kubeflow/examples#713
     * Running the notebook on Kubeflow should ensure the user is
       inside an appropriately set-up namespace
* With ISTIO the default RBAC rules prevent the web UI from sending requests to the model server
     * A short term fix was to not include the ISTIO side car
     * In the future we can add an appropriate ISTIO rbac policy

* Using a notebook allows us to eliminate the use of kustomize
  * This resolves kubeflow/examples#713, which required people to use
    an old version of kustomize

  * Rather than using kustomize we can use python f style strings to
    write the YAML specs and then easily substitute in user specific values

  * This should be more informative; it avoids introducing kustomize and
    users can see the resource specs.
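The f-string approach described above can be sketched roughly as follows; the function name, manifest fields, and placeholder values here are illustrative, not taken from the actual notebook:

```python
# Minimal sketch of replacing kustomize with Python f-strings:
# user-specific values are substituted directly into the YAML spec.
def job_spec(name, namespace, image):
    return f"""apiVersion: batch/v1
kind: Job
metadata:
  name: {name}
  namespace: {namespace}
spec:
  template:
    spec:
      containers:
      - name: {name}
        image: {image}
      restartPolicy: Never
"""

# Users can read the generated spec directly before it is applied, e.g.
# print(job_spec("mnist-train", "kubeflow-user", "gcr.io/example/mnist:latest"))
```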

* I've opted to make the notebook GCP-specific. I think it's less confusing
  to users to have separate notebooks focused on specific platforms rather
  than having one notebook with a lot of caveats about what to do under
  different conditions.

* I've deleted the kustomize overlays for GCS since we don't want users to
  use them anymore

* I used fairing and kaniko to eliminate the use of docker to build the images
  so that everything can run from a notebook running inside the cluster.

* k8s_utils.py has some reusable functions that hide low-level details
  (e.g. direct calls to the K8s APIs) from users.

* * Change the mnist test to just run the notebook
  * Copy the notebook test infra for xgboost_synthetic to py/kubeflow/examples/notebook_test to make it more reusable

* Fix lint.

* Update for lint.

* A notebook to run the mnist E2E example.

Related to: kubeflow/website#1553

* 1. Use fairing to build the model. 2. Construct the YAML spec directly in the notebook. 3. Use the TFJob python SDK.

* Fix the ISTIO rule.

* Fix UI and serving; need to update TF serving to match version trained on.

* Get the IAP endpoint.

* Start writing some helper python functions for K8s.

* Commit before switching from replace to delete.

* Create a library to bulk create objects.

* Cleanup.

* Add back k8s_util.py

* Delete train.yaml; this shouldn't have been added.

* update the notebook image.

* Refactor code into k8s_util; print out links.

* Clean up the notebook. Should be working E2E.

* Added section to get logs from stackdriver.

* Add comment about profile.

* Latest.

* Override mnist_gcp.ipynb with mnist.ipynb

I accidentally put my latest changes in mnist.ipynb even though that file
was deleted.

* More fixes.

* Resolve some conflicts from the rebase; override with changes on remote branch.
2020-02-16 19:15:28 -08:00
Hung-Ting Wen b9a7719f29
Write xgboost_synthetic test output to html (#735)
* use nbconvert to write output as html

* write local file

* change dir

* write to gcs

* add kubeflow/testing

* update to env and checkout_repos

* format gcs path

* fix syntax

* fix

* add option notebook_artifacts_dir

* download to artifacts

* fix

* shorten name

* fix

* fix

* mkdirs

* fix

* fix

* log error

* use notebook_artifacts_path
2020-02-14 16:19:27 -08:00
Hung-Ting Wen 188ba8f091
xgboost test for v1 (#718)
* add param for cluster pattern

* add new entry to prow-config

* add info to error

* fix prow-config

* match prefix instead of exact test target name matching

* update prow-config

* remove master suffix for 63 char limit

* fix lint
2020-02-04 16:49:55 -08:00
Jin Chi He 1e385247b0 update ci tests for mnist example (#684) 2019-12-06 16:55:54 -08:00
Jeremy Lewi 7a2977ef11 Fix miscellaneous bugs with the xgboost_synthetic test (#676)
* The namespace where the test runs should correspond to the namespace of a
  Kubeflow profile

* There was a bug in the logging format string

* There was a bug in the print statement for the job
2019-11-07 19:46:19 -08:00
Jeremy Lewi e2198ce1e8 Fix the xgboost_synthetic test so it actually runs and produces signal (#674)
* Fix the xgboost_synthetic test so it actually runs and produces signal

* The test wasn't actually running because we were passing arguments that
  were unknown to pytest

* Remove the old role.yaml; we don't use it anymore

* Wait for the Job to finish and properly report status; kubeflow/testing#514
  contains the new routine

* The test still isn't passing because of kubeflow/examples#673

* In addition we need to fix the auto deployments kubeflow/testing#444

Related to kubeflow/examples#665

* Fix lint.
2019-11-04 21:56:38 -08:00
Jeremy Lewi 7e28cd6b23 Update xgboost_synthetic test infra; preliminary updates to work with 0.7.0 (#666)
* Update xgboost_synthetic test infra to use pytest and pyfunc.

* Related to #655 update xgboost_synthetic to use workload identity

* Related to #665: no signal about xgboost_synthetic

* We need to update the xgboost_synthetic example to work with 0.7.0;
  e.g. workload identity

* This PR focuses on updating the test infra and makes some preliminary
  updates to the notebook

* More fixes to the test and the notebook are probably needed in order
  to get it to actually pass

* Update job spec for 0.7; remove the secret and set the default service
  account.

  * This is to make it work with workload identity

* Instead of using kustomize to define the job to run the notebook we can just modify the YAML spec using python.
* Use the python API for K8s to create the job rather than shelling out.
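The two points above amount to patching a spec dict in Python and handing it to the Kubernetes client instead of rendering kustomize overlays and shelling out to kubectl. A minimal sketch, in which the base spec and all names are hypothetical placeholders rather than the repo's actual values:

```python
import copy

# Hypothetical base Job spec; the real code would load it from the
# example's YAML rather than hard-coding it here.
BASE_JOB = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "xgboost-synthetic", "namespace": "default"},
    "spec": {
        "template": {
            "spec": {
                # Default service account so workload identity supplies
                # credentials; no GCP secret is mounted.
                "serviceAccountName": "default-editor",
                "containers": [{"name": "notebook", "image": "placeholder"}],
                "restartPolicy": "Never",
            }
        }
    },
}

def customize_job(namespace, image):
    """Return a copy of the base spec with user-specific values patched in."""
    job = copy.deepcopy(BASE_JOB)
    job["metadata"]["namespace"] = namespace
    job["spec"]["template"]["spec"]["containers"][0]["image"] = image
    # The patched dict can then be submitted via the K8s Python client,
    # e.g. kubernetes.client.BatchV1Api().create_namespaced_job(namespace, job),
    # rather than shelling out to kubectl.
    return job
```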

* Notebook should do a 0.7 compatible check for credentials

  * We don't want to assume GOOGLE_APPLICATION_CREDENTIALS is set
    because we will be using workload identity.
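A 0.7-compatible credentials check along these lines (the function name and return values are mine, not from the repo):

```python
import os

def credential_mode():
    """Decide how the notebook should authenticate to GCP.

    Under workload identity the pod obtains credentials from the GKE
    metadata server, so GOOGLE_APPLICATION_CREDENTIALS may legitimately
    be unset and must not be assumed.
    """
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "service-account-key"
    return "workload-identity"
```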

* Take in repos as an argument akin to what checkout_repos.sh requires

* Convert xgboost_test.py to a pytest.

  * This allows us to mark it as expected to fail so we can start to get
    signal without blocking

  * We also need to emit junit files to show up in test grid.
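The expected-to-fail conversion can be sketched as below; the test name is illustrative and the body is a stand-in for the real notebook run:

```python
import pytest

# Mark the test as expected to fail so it still reports signal in test
# grid (via the emitted junit files) without blocking merges.
@pytest.mark.xfail(reason="tracked by kubeflow/examples#673")
def test_xgboost_synthetic():
    raise RuntimeError("stand-in for the notebook run, which is not yet passing")
```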

* Convert the jsonnet workflow for the E2E test to a python function to
  define the workflow.

  * Remove the old jsonnet workflow.

* Address comments.

* Fix issues with the notebook
* Install pip packages in user space
  * 0.7.0 images are based on TF images and they have different permissions
* Install a newer version of fairing sdk that works with workload identity

* Split pip installing dependencies out of util.py and into notebook_setup.py

  * That's because util.py could depend on the packages being installed by
    notebook_setup.py

* After pip installing the modules into user space, we need to add the local
  pip package path to the Python path; otherwise we get import-not-found
  errors.
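The sys.path fix described in that last bullet, roughly (the helper name is mine):

```python
import site
import sys

def ensure_user_site_on_path():
    """After `pip install --user`, make sure the user site-packages
    directory is importable; otherwise imports of the freshly installed
    packages fail with import-not-found errors."""
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.insert(0, user_site)
    return user_site
```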
2019-10-24 19:53:38 -07:00