Create a notebook for mnist E2E on GCP (#723)

* A notebook to run the mnist E2E example on GCP.

This fixes a number of issues with the example
* Use ISTIO instead of Ambassador to add reverse proxy routes
* The training job needs to be updated to run in a profile created namespace in order to have the required service accounts
     * See kubeflow/examples#713
     * Running the notebook on Kubeflow should ensure the user is
       running inside an appropriately set up namespace
* With ISTIO the default RBAC rules prevent the web UI from sending requests to the model server
     * A short term fix was to not include the ISTIO sidecar
     * In the future we can add an appropriate ISTIO RBAC policy

* Using a notebook allows us to eliminate the use of kustomize
  * This resolves kubeflow/examples#713 which required people to use
    an old version of kustomize

  * Rather than using kustomize we can use python f style strings to
    write the YAML specs and then easily substitute in user specific values

  * This should be more informative; it avoids introducing kustomize and
    users can see the resource specs.
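  For example, a minimal sketch of the f-string approach (the name, namespace, and image values below are illustrative placeholders, not the notebook's actual values):

  ```python
  # Build a Kubernetes YAML spec with a Python f-string instead of kustomize.
  # All values below are illustrative placeholders.
  name = "mnist-train"
  namespace = "kubeflow-user"
  image = "gcr.io/my-project/mnist:latest"

  train_spec = f"""apiVersion: kubeflow.org/v1
  kind: TFJob
  metadata:
    name: {name}
    namespace: {namespace}
  spec:
    tfReplicaSpecs:
      Chief:
        replicas: 1
        template:
          spec:
            containers:
              - name: tensorflow
                image: {image}
  """

  print(train_spec)
  ```

  The rendered string can then be passed to `kubectl apply -f -` or to a Kubernetes client library.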

* I've opted to make the notebook GCP specific. I think it's less confusing
  to users to have separate notebooks focused on specific platforms rather
  than having one notebook with a lot of caveats about what to do under
  different conditions

* I've deleted the kustomize overlays for GCS since we don't want users to
  use them anymore

* I used fairing and kaniko to eliminate the use of docker to build the images
  so that everything can run from a notebook running inside the cluster.

* k8s_util.py has some reusable functions that hide low-level details from
  users (e.g. direct calls to the K8s APIs.)

* Change the mnist test to just run the notebook
  * Copy the notebook test infra for xgboost_synthetic to py/kubeflow/examples/notebook_test to make it more reusable

* Fix lint.

* Update for lint.

* A notebook to run the mnist E2E example.

Related to: kubeflow/website#1553

* 1. Use fairing to build the model. 2. Construct the YAML spec directly in the notebook. 3. Use the TFJob python SDK.

* Fix the ISTIO rule.

* Fix UI and serving; need to update TF serving to match version trained on.

* Get the IAP endpoint.

* Start writing some helper python functions for K8s.

* Commit before switching from replace to delete.

* Create a library to bulk create objects.

* Cleanup.

* Add back k8s_util.py

* Delete train.yaml; this shouldn't have been added.

* update the notebook image.

* Refactor code into k8s_util; print out links.

* Clean up the notebook. Should be working E2E.

* Added section to get logs from stackdriver.

* Add comment about profile.

* Latest.

* Override mnist_gcp.ipynb with mnist.ipynb

I accidentally put my latest changes in mnist.ipynb even though that file
was deleted.

* More fixes.

* Resolve some conflicts from the rebase; override with changes on remote branch.
This commit is contained in:
Jeremy Lewi 2020-02-16 19:15:28 -08:00 committed by GitHub
parent b9a7719f29
commit cc93a80420
28 changed files with 2576 additions and 1564 deletions


@@ -56,7 +56,10 @@ confidence=
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use"--disable=all --enable=classes
# --disable=W"
disable=import-star-module-level,old-octal-literal,oct-method,print-statement,unpacking-in-except,parameter-unpacking,backtick,old-raise-syntax,old-ne-operator,long-suffix,dict-view-method,dict-iter-method,metaclass-assignment,next-method-called,raising-string,indexing-exception,raw_input-builtin,long-builtin,file-builtin,execfile-builtin,coerce-builtin,cmp-builtin,buffer-builtin,basestring-builtin,apply-builtin,filter-builtin-not-iterating,using-cmp-argument,useless-suppression,range-builtin-not-iterating,suppressed-message,missing-docstring,no-absolute-import,old-division,cmp-method,reload-builtin,zip-builtin-not-iterating,intern-builtin,unichr-builtin,reduce-builtin,standarderror-builtin,unicode-builtin,xrange-builtin,coerce-method,delslice-method,getslice-method,setslice-method,input-builtin,round-builtin,hex-method,nonzero-method,map-builtin-not-iterating,relative-import,invalid-name,bad-continuation,no-member,locally-disabled,fixme,import-error,too-many-locals,no-name-in-module,too-many-instance-attributes,no-self-use
# Kubeflow disables string-interpolation because we are starting to use f
# style strings
disable=import-star-module-level,old-octal-literal,oct-method,print-statement,unpacking-in-except,parameter-unpacking,backtick,old-raise-syntax,old-ne-operator,long-suffix,dict-view-method,dict-iter-method,metaclass-assignment,next-method-called,raising-string,indexing-exception,raw_input-builtin,long-builtin,file-builtin,execfile-builtin,coerce-builtin,cmp-builtin,buffer-builtin,basestring-builtin,apply-builtin,filter-builtin-not-iterating,using-cmp-argument,useless-suppression,range-builtin-not-iterating,suppressed-message,missing-docstring,no-absolute-import,old-division,cmp-method,reload-builtin,zip-builtin-not-iterating,intern-builtin,unichr-builtin,reduce-builtin,standarderror-builtin,unicode-builtin,xrange-builtin,coerce-method,delslice-method,getslice-method,setslice-method,input-builtin,round-builtin,hex-method,nonzero-method,map-builtin-not-iterating,relative-import,invalid-name,bad-continuation,no-member,locally-disabled,fixme,import-error,too-many-locals,no-name-in-module,too-many-instance-attributes,no-self-use,logging-fstring-interpolation
[REPORTS]


@@ -1,5 +1,6 @@
#This container contains your model and any helper scripts specific to your model.
FROM tensorflow/tensorflow:1.7.0
# When building the image inside mnist.ipynb the base docker image will be overwritten
FROM tensorflow/tensorflow:1.15.2-py3
ADD model.py /opt/model.py
RUN chmod +x /opt/model.py


@@ -19,6 +19,8 @@
# To override variables do
# make ${TARGET} ${VAR}=${VALUE}
#
#
# TODO(jlewi): We should probably switch to Skaffold and Tekton
# IMG is the base path for images..
# Individual images will be


@@ -3,6 +3,8 @@
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
- [MNIST on Kubeflow](#mnist-on-kubeflow)
- [MNIST on Kubeflow on GCP](#mnist-on-kubeflow-on-gcp)
- [MNIST on other platforms](#mnist-on-other-platforms)
- [Prerequisites](#prerequisites)
- [Deploy Kubeflow](#deploy-kubeflow)
- [Local Setup](#local-setup)
@@ -13,21 +15,17 @@
- [Preparing your Kubernetes Cluster](#preparing-your-kubernetes-cluster)
- [Training your model](#training-your-model)
- [Local storage](#local-storage)
- [Using GCS](#using-gcs)
- [Using S3](#using-s3)
- [Monitoring](#monitoring)
- [Tensorboard](#tensorboard)
- [Local storage](#local-storage-1)
- [Using GCS](#using-gcs-1)
- [Using S3](#using-s3-1)
- [Deploying TensorBoard](#deploying-tensorboard)
- [Serving the model](#serving-the-model)
- [GCS](#gcs)
- [S3](#s3)
- [Local storage](#local-storage-2)
- [Web Front End](#web-front-end)
- [Connecting via port forwarding](#connecting-via-port-forwarding)
- [Using IAP on GCP](#using-iap-on-gcp)
- [Conclusion and Next Steps](#conclusion-and-next-steps)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -37,6 +35,45 @@
This example guides you through the process of taking an example model, modifying it to run better within Kubeflow, and serving the resulting trained model.
Follow the version of the guide that is specific to how you have deployed Kubeflow
1. [MNIST on Kubeflow on GCP](#gcp)
1. [MNIST on other platforms](#other)
<a id=gcp></a>
# MNIST on Kubeflow on GCP
Follow these instructions to run the MNIST tutorial on GCP
1. Follow the [GCP instructions](https://www.kubeflow.org/docs/gke/deploy/) to deploy Kubeflow with IAP
1. Launch a Jupyter notebook
* The tutorial has been tested using the Jupyter Tensorflow 1.15 image
1. Launch a terminal in Jupyter and clone the kubeflow examples repo
```
git clone https://github.com/kubeflow/examples.git git_kubeflow-examples
```
* **Tip** When you start a terminal in Jupyter, run the command `bash` to start
  a bash terminal which is much more friendly than the default shell
* **Tip** You can change the URL from '/tree' to '/lab' to switch to using Jupyterlab
1. Open the notebook `mnist/mnist_gcp.ipynb`
1. Follow the notebook to train and deploy MNIST on Kubeflow
<a id=other></a>
# MNIST on other platforms
The tutorial is currently not up to date for Kubeflow 1.0. Please check the issues
* [kubeflow/examples#724](https://github.com/kubeflow/examples/issues/724) for AWS
* [kubeflow/examples#725](https://github.com/kubeflow/examples/issues/725) for other platforms
## Prerequisites
Before we get started there are a few requirements.
@@ -166,100 +203,6 @@ And to check the logs
kubectl logs mnist-train-local-chief-0
```
#### Using GCS
In this section we describe how to save the model to Google Cloud Storage (GCS).
Storing the model in GCS has the advantages:
* The model is readily available after the job finishes
* We can run distributed training
* Distributed training requires a storage system accessible to all the machines
Enter the `training/GCS` from the `mnist` application directory.
```
cd training/GCS
```
Set an environment variable that points to your GCP project Id
```
PROJECT=<your project id>
```
Create a bucket on GCS to store our model. The name must be unique across all GCS buckets
```
BUCKET=distributed-$(date +%s)
gsutil mb gs://$BUCKET/
```
Give the job a different name (to distinguish it from your job which didn't use GCS)
```
kustomize edit add configmap mnist-map-training --from-literal=name=mnist-train-dist
```
Optionally, if you want to use your custom training image, configure that as below.
```
kustomize edit set image training-image=$DOCKER_URL
```
Next we configure it to run distributed by setting the number of parameter servers and workers to use. `numPs` is the number of parameter servers and `numWorkers` is the number of workers.
```
../base/definition.sh --numPs 1 --numWorkers 2
```
Set the training parameters, such as training steps, batch size and learning rate.
```
kustomize edit add configmap mnist-map-training --from-literal=trainSteps=200
kustomize edit add configmap mnist-map-training --from-literal=batchSize=100
kustomize edit add configmap mnist-map-training --from-literal=learningRate=0.01
```
Now we need to configure parameters and tell the code to save the model to GCS.
```
MODEL_PATH=my-model
kustomize edit add configmap mnist-map-training --from-literal=modelDir=gs://${BUCKET}/${MODEL_PATH}
kustomize edit add configmap mnist-map-training --from-literal=exportDir=gs://${BUCKET}/${MODEL_PATH}/export
```
Build a yaml file for the `TFJob` specification based on your kustomize config:
```
kustomize build . > mnist-training.yaml
```
Then, in `mnist-training.yaml`, search for this line: `namespace: kubeflow`.
Edit it to **replace `kubeflow` with the name of your user profile namespace**,
which will probably have the form `kubeflow-<username>`. (If you're not sure what this
namespace is called, you can find it in the top menubar of the Kubeflow Central
Dashboard.)
After you've updated the namespace, apply the `TFJob` specification to the
Kubeflow cluster:
```
kubectl apply -f mnist-training.yaml
```
You can then check the job status:
```
kubectl get tfjobs -n <your-user-namespace> -o yaml mnist-train-dist
```
And to check the logs:
```
kubectl logs -n <your-user-namespace> -f mnist-train-dist-chief-0
```
#### Using S3
To use S3 we need to configure TensorFlow to use S3 credentials and variables. These credentials will be provided as kubernetes secrets and the variables will be passed in as environment variables. Modify the below values to suit your environment.
@@ -426,27 +369,6 @@ kustomize edit add configmap mnist-map-monitoring --from-literal=pvcMountPath=/m
kustomize edit add configmap mnist-map-monitoring --from-literal=logDir=/mnt
```
#### Using GCS
Enter the `monitoring/GCS` from the `mnist` application directory.
```
cd monitoring/GCS
```
Configure TensorBoard to point to your model location
```
kustomize edit add configmap mnist-map-monitoring --from-literal=logDir=${LOGDIR}
```
Assuming you followed the directions above, if you used GCS you can use the following value
```
LOGDIR=gs://${BUCKET}/${MODEL_PATH}
```
#### Using S3
Enter the `monitoring/S3` from the `mnist` application directory.
@@ -551,64 +473,6 @@ The model code will export the model in saved model format which is suitable for
To serve the model follow the instructions below. The instructions vary slightly based on where you are storing your model (e.g. GCS, S3, PVC). Depending on the storage system we provide different kustomization as a convenience for setting relevant environment variables.
### GCS
Here we show how to serve the model when it is stored on GCS. This assumes that when you trained the model you set `exportDir` to a GCS URI; if not you can always copy it to GCS using `gsutil`.
Check that a model was exported
```
EXPORT_DIR=gs://${BUCKET}/${MODEL_PATH}/export
gsutil ls -r ${EXPORT_DIR}
```
The output should look something like
```
${EXPORT_DIR}/1547100373/saved_model.pb
${EXPORT_DIR}/1547100373/variables/:
${EXPORT_DIR}/1547100373/variables/
${EXPORT_DIR}/1547100373/variables/variables.data-00000-of-00001
${EXPORT_DIR}/1547100373/variables/variables.index
```
The number `1547100373` is a version number auto-generated by TensorFlow; it will vary on each run but should be monotonically increasing if you save a model to the same location as a previous location.
Enter the `serving/GCS` from the `mnist` application directory.
```
cd serving/GCS
```
Set a different name for the tf-serving.
```
kustomize edit add configmap mnist-map-serving --from-literal=name=mnist-gcs-dist
```
Set your model path
```
kustomize edit add configmap mnist-map-serving --from-literal=modelBasePath=${EXPORT_DIR}
```
Deploy it, and run a service to make the deployment accessible to other pods in the cluster
```
kustomize build . |kubectl apply -f -
```
You can check the deployment by running
```
kubectl describe deployments mnist-gcs-dist
```
The service should make the `mnist-gcs-dist` deployment accessible over port 9000
```
kubectl describe service mnist-gcs-dist
```
### S3
We can also serve the model when it is stored on S3. This assumes that when you trained the model you set `exportDir` to a S3
@@ -799,16 +663,7 @@ POD_NAME=$(kubectl get pods --selector=app=web-ui --template '{{range .items}}{{
kubectl port-forward ${POD_NAME} 8080:5000
```
You should now be able to open up the web app at your localhost. [Local Storage](http://localhost:8080) or [GCS](http://localhost:8080/?addr=mnist-gcs-dist) or [S3](http://localhost:8080/?addr=mnist-s3-serving).
You should now be able to open up the web app at your localhost. [Local Storage](http://localhost:8080) or [S3](http://localhost:8080/?addr=mnist-s3-serving).
### Using IAP on GCP
If you are using GCP and have set up IAP then you can access the web UI at
```
https://${DEPLOYMENT}.endpoints.${PROJECT}.cloud.goog/${NAMESPACE}/mnist/
```
## Conclusion and Next Steps

mnist/k8s_util.py Normal file (147 lines)
@@ -0,0 +1,147 @@
"""Some utilities for working with Kubernetes.
TODO: These should probably be replaced by functions in fairing.
"""
import logging
import re
import yaml
from kubernetes import client as k8s_client
from kubernetes.client import rest as k8s_rest
def camel_to_snake(name):
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
K8S_CREATE = "K8S_CREATE"
K8S_REPLACE = "K8S_REPLACE"
K8S_CREATE_OR_REPLACE = "K8S_CREATE_OR_REPLACE"
def _get_result_name(result):
# For custom objects the result is a dict but for other objects
# its a python class
if isinstance(result, dict):
result_name = result["metadata"]["name"]
result_namespace = result["metadata"]["name"]
else:
result_name = result.metadata.name
result_namespace = result.metadata.namespace
return result_namespace, result_name
def apply_k8s_specs(specs, mode=K8S_CREATE): # pylint: disable=too-many-branches,too-many-statements
"""Run apply on the provided Kubernetes specs.
Args:
specs: A list of strings or dicts providing the YAML specs to
apply.
mode: (Optional): Mode indicates how the resources should be created.
K8S_CREATE - Use the create verb. Works with generateName
K8S_REPLACE - Issue a delete of existing resources before doing a create
K8s_CREATE_OR_REPLACE - Try to create an object; if it already exists
replace it
"""
# TODO(jlewi): How should we handle patching existing updates?
results = []
if mode not in [K8S_CREATE, K8S_CREATE_OR_REPLACE, K8S_REPLACE]:
raise ValueError(f"Unknown mode {mode}")
for s in specs:
spec = s
if not isinstance(spec, dict):
spec = yaml.load(spec)
name = spec["metadata"]["name"]
namespace = spec["metadata"]["namespace"]
kind = spec["kind"]
kind_snake = camel_to_snake(kind)
plural = spec["kind"].lower() + "s"
result = None
if not "/" in spec["apiVersion"]:
group = None
else:
group, version = spec["apiVersion"].split("/", 1)
if group is None or group.lower() == "apps":
if group is None:
api = k8s_client.CoreV1Api()
else:
api = k8s_client.AppsV1Api()
create_method_name = f"create_namespaced_{kind_snake}"
create_method_args = [namespace, spec]
replace_method_name = f"delete_namespaced_{kind_snake}"
replace_method_args = [name, namespace]
else:
api = k8s_client.CustomObjectsApi()
create_method_name = f"create_namespaced_custom_object"
create_method = getattr(api, create_method_name)
create_method_args = [group, version, namespace, plural, spec]
delete_options = k8s_client.V1DeleteOptions()
replace_method_name = f"delete_namespaced_custom_object"
replace_method_args = [group, version, namespace, plural, name, delete_options]
create_method = getattr(api, create_method_name)
replace_method = getattr(api, replace_method_name)
if mode in [K8S_CREATE, K8S_CREATE_OR_REPLACE]:
try:
result = create_method(*create_method_args)
result_namespace, result_name = _get_result_name(result)
logging.info(f"Created {kind} {result_namespace}.{result_name}")
results.append(result)
continue
except k8s_rest.ApiException as e:
# 409 is conflict indicates resource already exists
if e.status == 409 and mode == K8S_CREATE_OR_REPLACE:
pass
else:
raise
# Using replace didn't work for virtualservices so we explicitly delete
# and then issue a create
result = replace_method(*replace_method_args)
logging.info(f"Deleted {kind} {namespace}.{name}")
result = create_method(*create_method_args)
result_namespace, result_name = _get_result_name(result)
logging.info(f"Created {kind} {result_namespace}.{result_name}")
# Now recreate it
results.append(result)
return results
def get_iap_endpoint():
"""Return the URL of the IAP endpoint"""
extensions = k8s_client.ExtensionsV1beta1Api()
kf_ingress = None
try:
kf_ingress = extensions.read_namespaced_ingress("envoy-ingress", "istio-system")
except k8s_rest.ApiException as e:
if e.status == 403:
logging.warning(f"The service account doesn't have sufficient privileges "
f"to get the istio-system ingress. "
f"You will have to manually enter the Kubeflow endpoint. "
f"To make this function work ask someone with cluster "
f"priveleges to create an appropriate "
f"clusterrolebinding by running a command.\n"
f"kubectl create --namespace=istio-system rolebinding "
"--clusterrole=kubeflow-view "
"--serviceaccount=$NAMESPACE}:default-editor "
"${NAMESPACE}-istio-view")
return ""
raise
return f"https://{kf_ingress.spec.rules[0].host}"

mnist/mnist_gcp.ipynb Normal file (1791 lines)
File diff suppressed because it is too large


@@ -1,8 +0,0 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../base
configurations:
- params.yaml

mnist/notebook_setup.py Normal file (57 lines)
@@ -0,0 +1,57 @@
"""Some routines to setup the notebook.
This is separated out from util.py because this module installs some of the pip packages
that util depends on.
"""
import sys
import logging
import os
import subprocess
from importlib import reload
from pathlib import Path
TF_OPERATOR_COMMIT = "9238906"
def notebook_setup():
# Install the SDK
logging.basicConfig(format='%(message)s')
logging.getLogger().setLevel(logging.INFO)
home = str(Path.home())
logging.info("pip installing requirements.txt")
subprocess.check_call(["pip3", "install", "--user", "-r", "requirements.txt"])
clone_dir = os.path.join(home, "git_tf-operator")
if not os.path.exists(clone_dir):
logging.info("Cloning the tf-operator repo")
subprocess.check_call(["git", "clone", "https://github.com/kubeflow/tf-operator.git",
clone_dir])
logging.info(f"Checkout kubeflow/tf-operator @{TF_OPERATOR_COMMIT}")
subprocess.check_call(["git", "checkout", TF_OPERATOR_COMMIT], cwd=clone_dir)
logging.info("Configure docker credentials")
subprocess.check_call(["gcloud", "auth", "configure-docker", "--quiet"])
if os.getenv("GOOGLE_APPLICATION_CREDENTIALS"):
logging.info("Activating service account")
subprocess.check_call(["gcloud", "auth", "activate-service-account",
"--key-file=" +
os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
"--quiet"])
# Installing the python packages locally doesn't appear to have them automatically
# added the path so we need to manually add the directory
local_py_path = os.path.join(home, ".local/lib/python3.6/site-packages")
tf_operator_py_path = os.path.join(clone_dir, "sdk/python")
for p in [local_py_path, tf_operator_py_path]:
if p not in sys.path:
logging.info("Adding %s to python path", p)
# Insert at front because we want to override any installed packages
sys.path.insert(0, p)
# Force a reload of kubeflow; since kubeflow is a multi namespace module
# if we've loaded up some new kubeflow subpackages we need to force a reload to see them.
import kubeflow
reload(kubeflow)
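The sys.path manipulation in notebook_setup can be illustrated in isolation (the directory below is a hypothetical placeholder):

```python
import sys

# Directories inserted at the front of sys.path shadow already-installed
# packages with the same name; this is why notebook_setup inserts rather
# than appends. The path here is a hypothetical placeholder.
local_py_path = "/tmp/example/.local/lib/python3.6/site-packages"

if local_py_path not in sys.path:
    sys.path.insert(0, local_py_path)

print(sys.path[0])
```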

mnist/requirements.txt Normal file (2 lines)
@@ -0,0 +1,2 @@
git+git://github.com/kubeflow/fairing.git@9b0d4ed4796ba349ac6067bbd802ff1d6454d015
retrying==1.3.3


@@ -1,5 +0,0 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../base


@@ -1,11 +1,11 @@
kind: ConfigMap
apiVersion: v1
metadata:
  name: mnist-deploy-config
  namespace: kubeflow
data:
  monitoring_config.txt: |-
    prometheus_config: {
      enable: true,
      path: "/monitoring/prometheus/metrics"
    }
kind: ConfigMap
metadata:
  name: mnist-deploy-config
  namespace: kubeflow


@@ -1,116 +0,0 @@
import os

import pytest

def pytest_addoption(parser):
  parser.addoption(
      "--tfjob_name", help="Name for the TFJob.",
      type=str, default="mnist-test-" + os.getenv('BUILD_ID'))

  parser.addoption(
      "--namespace", help=("The namespace to run in. This should correspond to"
                           "a namespace associated with a Kubeflow namespace."),
      type=str, default="kubeflow-kf-ci-v1-user")

  parser.addoption(
      "--repos", help="The repos to checkout; leave blank to use defaults",
      type=str, default="")

  parser.addoption(
      "--trainer_image", help="TFJob training image",
      type=str, default="gcr.io/kubeflow-examples/mnist/model:build-" + os.getenv('BUILD_ID'))

  parser.addoption(
      "--train_steps", help="train steps for mnist testing",
      type=str, default="200")

  parser.addoption(
      "--batch_size", help="batch size for mnist training",
      type=str, default="100")

  parser.addoption(
      "--learning_rate", help="mnist learning rate",
      type=str, default="0.01")

  parser.addoption(
      "--num_ps", help="The number of parameter servers",
      type=str, default="1")

  parser.addoption(
      "--num_workers", help="The number of workers",
      type=str, default="2")

  parser.addoption(
      "--model_dir", help="Path for model saving",
      type=str, default="gs://kubeflow-ci-deployment_ci-temp/mnist/models/" + os.getenv('BUILD_ID'))

  parser.addoption(
      "--export_dir", help="Path for model exporting",
      type=str, default="gs://kubeflow-ci-deployment_ci-temp/mnist/models/" + os.getenv('BUILD_ID'))

  parser.addoption(
      "--deploy_name", help="Name for the service deployment",
      type=str, default="mnist-test-" + os.getenv('BUILD_ID'))

  parser.addoption(
      "--master", action="store", default="", help="IP address of GKE master")

  parser.addoption(
      "--service", action="store", default="mnist-test-" + os.getenv('BUILD_ID'),
      help="The name of the mnist K8s service")

@pytest.fixture
def master(request):
  return request.config.getoption("--master")

@pytest.fixture
def namespace(request):
  return request.config.getoption("--namespace")

@pytest.fixture
def service(request):
  return request.config.getoption("--service")

@pytest.fixture
def tfjob_name(request):
  return request.config.getoption("--tfjob_name")

@pytest.fixture
def repos(request):
  return request.config.getoption("--repos")

@pytest.fixture
def trainer_image(request):
  return request.config.getoption("--trainer_image")

@pytest.fixture
def train_steps(request):
  return request.config.getoption("--train_steps")

@pytest.fixture
def batch_size(request):
  return request.config.getoption("--batch_size")

@pytest.fixture
def learning_rate(request):
  return request.config.getoption("--learning_rate")

@pytest.fixture
def num_ps(request):
  return request.config.getoption("--num_ps")

@pytest.fixture
def num_workers(request):
  return request.config.getoption("--num_workers")

@pytest.fixture
def model_dir(request):
  return request.config.getoption("--model_dir")

@pytest.fixture
def export_dir(request):
  return request.config.getoption("--export_dir")

@pytest.fixture
def deploy_name(request):
  return request.config.getoption("--deploy_name")


@@ -1,84 +0,0 @@
"""Test deploying the mnist model.
This file tests that we can deploy the model.
It is an integration test as it depends on having access to
a Kubeflow deployment to deploy on. It also depends on having a model.
Python Path Requirements:
kubeflow/testing/py - https://github.com/kubeflow/testing/tree/master/py
* Provides utilities for testing
Manually running the test
pytest deploy_test.py \
name=mnist-deploy-test-${BUILD_ID} \
namespace=${namespace} \
modelBasePath=${modelDir} \
exportDir=${modelDir} \
"""
import logging
import os
import pytest
from kubernetes.config import kube_config
from kubernetes import client as k8s_client
from kubeflow.testing import util
def test_deploy(record_xml_attribute, deploy_name, namespace, model_dir, export_dir):
util.set_pytest_junit(record_xml_attribute, "test_deploy")
util.maybe_activate_service_account()
app_dir = os.path.join(os.path.dirname(__file__), "../serving/GCS")
app_dir = os.path.abspath(app_dir)
logging.info("--app_dir not set defaulting to: %s", app_dir)
# TODO (@jinchihe) Using kustomize 2.0.3 to work around below issue:
# https://github.com/kubernetes-sigs/kustomize/issues/1295
kusUrl = 'https://github.com/kubernetes-sigs/kustomize/' \
'releases/download/v2.0.3/kustomize_2.0.3_linux_amd64'
util.run(['wget', '-q', '-O', '/usr/local/bin/kustomize', kusUrl], cwd=app_dir)
util.run(['chmod', 'a+x', '/usr/local/bin/kustomize'], cwd=app_dir)
# TODO (@jinchihe): The kubectl need to be upgraded to 1.14.0 due to below issue.
# Invalid object doesn't have additional properties ...
kusUrl = 'https://storage.googleapis.com/kubernetes-release/' \
'release/v1.14.0/bin/linux/amd64/kubectl'
util.run(['wget', '-q', '-O', '/usr/local/bin/kubectl', kusUrl], cwd=app_dir)
util.run(['chmod', 'a+x', '/usr/local/bin/kubectl'], cwd=app_dir)
# Configure custom parameters using kustomize
configmap = 'mnist-map-serving'
util.run(['kustomize', 'edit', 'set', 'namespace', namespace], cwd=app_dir)
util.run(['kustomize', 'edit', 'add', 'configmap', configmap,
'--from-literal=name' + '=' + deploy_name], cwd=app_dir)
util.run(['kustomize', 'edit', 'add', 'configmap', configmap,
'--from-literal=modelBasePath=' + model_dir], cwd=app_dir)
util.run(['kustomize', 'edit', 'add', 'configmap', configmap,
'--from-literal=exportDir=' + export_dir], cwd=app_dir)
# Apply the components
util.run(['kustomize', 'build', app_dir, '-o', 'generated.yaml'], cwd=app_dir)
util.run(['kubectl', 'apply', '-f', 'generated.yaml'], cwd=app_dir)
kube_config.load_kube_config()
api_client = k8s_client.ApiClient()
util.wait_for_deployment(api_client, namespace, deploy_name, timeout_minutes=4)
# We don't delete the resources. We depend on the namespace being
# garbage collected.
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO,
format=('%(levelname)s|%(asctime)s'
'|%(pathname)s|%(lineno)d| %(message)s'),
datefmt='%Y-%m-%dT%H:%M:%S',
)
logging.getLogger().setLevel(logging.INFO)
pytest.main()


@@ -1,123 +0,0 @@
"""Test mnist_client.
This file tests that we can send predictions to the model
using REST.
It is an integration test as it depends on having access to
a deployed model.
We use the pytest framework because
1. It can output results in junit format for prow/gubernator
2. It has good support for configuring tests using command line arguments
(https://docs.pytest.org/en/latest/example/simple.html)
Python Path Requirements:
kubeflow/testing/py - https://github.com/kubeflow/testing/tree/master/py
* Provides utilities for testing
Manually running the test
1. Configure your KUBECONFIG file to point to the desired cluster
"""
import json
import logging
import os
import subprocess
import requests
from retrying import retry
import six
from kubernetes.config import kube_config
from kubernetes import client as k8s_client
import pytest
from kubeflow.testing import util
def is_retryable_result(r):
if r.status_code == requests.codes.NOT_FOUND:
message = "Request to {0} returned 404".format(r.url)
logging.error(message)
return True
return False
@retry(wait_exponential_multiplier=1000, wait_exponential_max=10000,
stop_max_delay=5*60*1000,
retry_on_result=is_retryable_result)
def send_request(*args, **kwargs):
# We don't use util.run because that ends up including the access token
# in the logs
token = subprocess.check_output(["gcloud", "auth", "print-access-token"])
if six.PY3 and hasattr(token, "decode"):
token = token.decode()
token = token.strip()
headers = {
"Authorization": "Bearer " + token,
}
if "headers" not in kwargs:
kwargs["headers"] = {}
kwargs["headers"].update(headers)
r = requests.post(*args, **kwargs)
return r
@pytest.mark.xfail
def test_predict(master, namespace, service):
app_credentials = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
if app_credentials:
print("Activate service account")
util.run(["gcloud", "auth", "activate-service-account",
"--key-file=" + app_credentials])
if not master:
print("--master not set; using kubeconfig")
# util.load_kube_config appears to hang on python3
kube_config.load_kube_config()
api_client = k8s_client.ApiClient()
host = api_client.configuration.host
print("host={0}".format(host))
master = host.rsplit("/", 1)[-1]
this_dir = os.path.dirname(__file__)
test_data = os.path.join(this_dir, "test_data", "instances.json")
with open(test_data) as hf:
instances = json.load(hf)
# We proxy the request through the APIServer so that we can connect
# from outside the cluster.
url = ("https://{master}/api/v1/namespaces/{namespace}/services/{service}:8500"
"/proxy/v1/models/mnist:predict").format(
master=master, namespace=namespace, service=service)
logging.info("Request: %s", url)
r = send_request(url, json=instances, verify=False)
if r.status_code != requests.codes.OK:
msg = "Request to {0} exited with status code: {1} and content: {2}".format(
url, r.status_code, r.content)
logging.error(msg)
raise RuntimeError(msg)
content = r.content
if six.PY3 and hasattr(content, "decode"):
content = content.decode()
result = json.loads(content)
assert len(result["predictions"]) == 1
predictions = result["predictions"][0]
assert "classes" in predictions
assert "predictions" in predictions
assert len(predictions["predictions"]) == 10
logging.info("URL %s returned; %s", url, content)
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO,
format=('%(levelname)s|%(asctime)s'
'|%(pathname)s|%(lineno)d| %(message)s'),
datefmt='%Y-%m-%dT%H:%M:%S',
)
logging.getLogger().setLevel(logging.INFO)
pytest.main()
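The `send_request` helper above combines two patterns: attaching a bearer token, and retrying on 404 while the model endpoint warms up. A stdlib-only sketch of the retry-on-result behavior that the `retrying` decorator provides (function names and delay parameters here are illustrative):

```python
# Stdlib sketch of retry-on-result with capped exponential backoff,
# mimicking what the `retrying` decorator does for send_request above.
import time

def retry_on_result(func, is_retryable, max_attempts=5,
                    base_delay=0.01, max_delay=0.05):
    delay = base_delay
    for attempt in range(max_attempts):
        result = func()
        if not is_retryable(result):
            return result
        if attempt < max_attempts - 1:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return result

# Simulate an endpoint that returns 404 twice before becoming ready.
responses = [404, 404, 200]
status = retry_on_result(lambda: responses.pop(0), lambda r: r == 404)
print(status)  # 200
```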


@ -1,792 +0,0 @@
{
"instances": [
{
"x": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.011764707043766975,
0.07058823853731155,
0.07058823853731155,
0.07058823853731155,
0.4941176772117615,
0.5333333611488342,
0.686274528503418,
0.10196079313755035,
0.6509804129600525,
1.0,
0.9686275124549866,
0.49803924560546875,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.11764706671237946,
0.1411764770746231,
0.3686274588108063,
0.6039215922355652,
0.6666666865348816,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.8823530077934265,
0.6745098233222961,
0.9921569228172302,
0.9490196704864502,
0.7647059559822083,
0.250980406999588,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.19215688109397888,
0.9333333969116211,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9843137860298157,
0.364705890417099,
0.32156863808631897,
0.32156863808631897,
0.2196078598499298,
0.15294118225574493,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.07058823853731155,
0.8588235974311829,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.7764706611633301,
0.7137255072593689,
0.9686275124549866,
0.9450981020927429,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.3137255012989044,
0.6117647290229797,
0.41960787773132324,
0.9921569228172302,
0.9921569228172302,
0.803921639919281,
0.04313725605607033,
0.0,
0.16862745583057404,
0.6039215922355652,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.05490196496248245,
0.003921568859368563,
0.6039215922355652,
0.9921569228172302,
0.3529411852359772,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.545098066329956,
0.9921569228172302,
0.7450980544090271,
0.007843137718737125,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.04313725605607033,
0.7450980544090271,
0.9921569228172302,
0.27450981736183167,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.13725490868091583,
0.9450981020927429,
0.8823530077934265,
0.6274510025978088,
0.4235294461250305,
0.003921568859368563,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.3176470696926117,
0.9411765336990356,
0.9921569228172302,
0.9921569228172302,
0.46666669845581055,
0.09803922474384308,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.1764705926179886,
0.729411780834198,
0.9921569228172302,
0.9921569228172302,
0.5882353186607361,
0.10588236153125763,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.062745101749897,
0.364705890417099,
0.988235354423523,
0.9921569228172302,
0.7333333492279053,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.9764706492424011,
0.9921569228172302,
0.9764706492424011,
0.250980406999588,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.18039216101169586,
0.5098039507865906,
0.7176470756530762,
0.9921569228172302,
0.9921569228172302,
0.8117647767066956,
0.007843137718737125,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.15294118225574493,
0.5803921818733215,
0.8980392813682556,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9803922176361084,
0.7137255072593689,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0941176563501358,
0.44705885648727417,
0.8666667342185974,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.7882353663444519,
0.30588236451148987,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.09019608050584793,
0.25882354378700256,
0.8352941870689392,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.7764706611633301,
0.3176470696926117,
0.007843137718737125,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.07058823853731155,
0.6705882549285889,
0.8588235974311829,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.7647059559822083,
0.3137255012989044,
0.03529411926865578,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.21568629145622253,
0.6745098233222961,
0.8862745761871338,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.9568628072738647,
0.5215686559677124,
0.04313725605607033,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.5333333611488342,
0.9921569228172302,
0.9921569228172302,
0.9921569228172302,
0.8313726186752319,
0.529411792755127,
0.5176470875740051,
0.062745101749897,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0
]
}
]
}


@ -1,142 +0,0 @@
"""Test training using TFJob.
This file tests that we can submit the job
and that the job runs to completion.
It is an integration test as it depends on having access to
a Kubeflow deployment to submit the TFJob to.
Python Path Requirements:
kubeflow/tf-operator/py - https://github.com/kubeflow/tf-operator
* Provides utilities for testing TFJobs
kubeflow/testing/py - https://github.com/kubeflow/testing/tree/master/py
* Provides utilities for testing
Manually running the test
pytest tfjobs_test.py \
tfjob_name=tfjobs-test-${BUILD_ID} \
namespace=${test_namespace} \
trainer_image=${training_image} \
train_steps=10 \
batch_size=10 \
learning_rate=0.01 \
num_ps=1 \
num_workers=2 \
model_dir=${model_dir} \
export_dir=${model_dir} \
"""
import json
import logging
import os
import pytest
from kubernetes.config import kube_config
from kubernetes import client as k8s_client
from kubeflow.tf_operator import tf_job_client #pylint: disable=no-name-in-module
from kubeflow.testing import util
def test_training(record_xml_attribute, tfjob_name, namespace, trainer_image, num_ps, #pylint: disable=too-many-arguments
num_workers, train_steps, batch_size, learning_rate, model_dir, export_dir):
util.set_pytest_junit(record_xml_attribute, "test_mnist")
util.maybe_activate_service_account()
app_dir = os.path.join(os.path.dirname(__file__), "../training/GCS")
app_dir = os.path.abspath(app_dir)
logging.info("--app_dir not set defaulting to: %s", app_dir)
# TODO (@jinchihe) Using kustomize 2.0.3 to work around below issue:
# https://github.com/kubernetes-sigs/kustomize/issues/1295
kusUrl = 'https://github.com/kubernetes-sigs/kustomize/' \
'releases/download/v2.0.3/kustomize_2.0.3_linux_amd64'
util.run(['wget', '-q', '-O', '/usr/local/bin/kustomize', kusUrl], cwd=app_dir)
util.run(['chmod', 'a+x', '/usr/local/bin/kustomize'], cwd=app_dir)
# TODO (@jinchihe): kubectl needs to be upgraded to 1.14.0 due to the issue:
# Invalid object doesn't have additional properties ...
kubectlUrl = 'https://storage.googleapis.com/kubernetes-release/' \
'release/v1.14.0/bin/linux/amd64/kubectl'
util.run(['wget', '-q', '-O', '/usr/local/bin/kubectl', kubectlUrl], cwd=app_dir)
util.run(['chmod', 'a+x', '/usr/local/bin/kubectl'], cwd=app_dir)
# Configure custom parameters using kustomize
util.run(['kustomize', 'edit', 'set', 'namespace', namespace], cwd=app_dir)
util.run(['kustomize', 'edit', 'set', 'image', 'training-image=' + trainer_image], cwd=app_dir)
util.run(['../base/definition.sh', '--numPs', num_ps], cwd=app_dir)
util.run(['../base/definition.sh', '--numWorkers', num_workers], cwd=app_dir)
training_config = {
"name": tfjob_name,
"trainSteps": train_steps,
"batchSize": batch_size,
"learningRate": learning_rate,
"modelDir": model_dir,
"exportDir": export_dir,
}
configmap = 'mnist-map-training'
for key, value in training_config.items():
util.run(['kustomize', 'edit', 'add', 'configmap', configmap,
'--from-literal=' + key + '=' + value], cwd=app_dir)
# Create the TFJob.
util.run(['kustomize', 'build', app_dir, '-o', 'generated.yaml'], cwd=app_dir)
util.run(['kubectl', 'apply', '-f', 'generated.yaml'], cwd=app_dir)
logging.info("Created job %s in namespace %s", tfjob_name, namespace)
kube_config.load_kube_config()
api_client = k8s_client.ApiClient()
# Wait for the job to complete.
logging.info("Waiting for job to finish.")
results = tf_job_client.wait_for_job(
api_client,
namespace,
tfjob_name,
status_callback=tf_job_client.log_status)
logging.info("Final TFJob:\n %s", json.dumps(results, indent=2))
# Check for errors creating pods and services. Can potentially
# help debug failed test runs.
creation_failures = tf_job_client.get_creation_failures_from_tfjob(
api_client, namespace, results)
if creation_failures:
logging.warning(creation_failures)
if not tf_job_client.job_succeeded(results):
failure = "Job {0} in namespace {1} in status {2}".format( # pylint: disable=attribute-defined-outside-init
tfjob_name, namespace, results.get("status", {}))
logging.error(failure)
# if the TFJob failed, print out the pod logs for debugging.
pod_names = tf_job_client.get_pod_names(
api_client, namespace, tfjob_name)
logging.info("The pod names:\n %s", pod_names)
core_api = k8s_client.CoreV1Api(api_client)
for pod in pod_names:
logging.info("Getting logs of Pod %s.", pod)
try:
pod_logs = core_api.read_namespaced_pod_log(pod, namespace)
logging.info("The logs of Pod %s:\n %s", pod, pod_logs)
except k8s_client.rest.ApiException as e:
logging.info("Exception when calling CoreV1Api->read_namespaced_pod_log: %s\n", e)
return
# We don't delete the jobs. We rely on TTLSecondsAfterFinished
# to delete old jobs. Leaving jobs around should make it
# easier to debug.
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO,
format=('%(levelname)s|%(asctime)s'
'|%(pathname)s|%(lineno)d| %(message)s'),
datefmt='%Y-%m-%dT%H:%M:%S',
)
logging.getLogger().setLevel(logging.INFO)
pytest.main()
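The loop over `training_config` above turns a plain dict into a series of kustomize configmap edits. The command construction can be sketched standalone (the config values below are made-up examples, not the test's actual parameters):

```python
# Build the `kustomize edit add configmap --from-literal=key=value`
# commands from a config dict, as the test loop above does.
training_config = {
    "name": "mnist-train",
    "trainSteps": "200",
    "batchSize": "100",
}
configmap = "mnist-map-training"
commands = [
    ["kustomize", "edit", "add", "configmap", configmap,
     "--from-literal=" + key + "=" + value]
    for key, value in training_config.items()
]
print(commands[0][-1])  # --from-literal=name=mnist-train
```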


@ -1,11 +0,0 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../base
images:
- name: training-image
newName: gcr.io/kubeflow-examples/mnist/model
newTag: build-1202842504546750464


@ -27,7 +27,7 @@ from tensorflow.examples.tutorials.mnist import input_data
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
- from PIL import Image
+ from PIL import Image # pylint: disable=wrong-import-order
def get_prediction(image, server_host='127.0.0.1', server_port=9000,


@ -55,6 +55,7 @@ workflows:
  - postsubmit
  include_dirs:
  - xgboost_synthetic/*
+ - mnist/*
  - py/kubeflow/examples/create_e2e_workflow.py
# E2E test for various notebooks
@ -67,17 +68,7 @@ workflows:
  - postsubmit
  include_dirs:
  - xgboost_synthetic/*
+ - mnist/*
  - py/kubeflow/examples/create_e2e_workflow.py
  kwargs:
    cluster_pattern: kf-v1-(?!n\d\d)
- # E2E test for mnist example
- - py_func: kubeflow.examples.create_e2e_workflow.create_workflow
-   name: mnist
-   job_types:
-   - periodic
-   - presubmit
-   - postsubmit
-   include_dirs:
-   - mnist/*
-   - py/kubeflow/examples/create_e2e_workflow.py


@ -261,82 +261,23 @@ class Builder:
"xgboost_synthetic",
"testing")
- def _build_tests_dag_mnist(self):
- """Build the dag for the set of tests to run mnist TFJob tests."""
- task_template = self._build_task_template()
- # ***************************************************************************
- # Build mnist image
- step_name = "build-image"
- train_image_base = "gcr.io/kubeflow-examples/mnist"
- train_image_tag = "build-" + PROW_DICT['BUILD_ID']
- command = ["/bin/bash",
- "-c",
- "gcloud auth activate-service-account --key-file=$(GOOGLE_APPLICATION_CREDENTIALS) \
- && make build-gcb IMG=" + train_image_base + " TAG=" + train_image_tag,
- ]
- dependencies = []
- build_step = self._build_step(step_name, self.workflow, TESTS_DAG_NAME, task_template,
- command, dependencies)
- build_step["container"]["workingDir"] = os.path.join(self.src_dir, "mnist")
- # ***************************************************************************
- # Test mnist TFJob
- step_name = "tfjob-test"
- # Using python2 to run the test to avoid dependency error.
- command = ["python2", "-m", "pytest", "tfjob_test.py",
- # Increase the log level so that info level log statements show up.
- "--log-cli-level=info",
- "--log-cli-format='%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s'",
- # Test timeout in seconds.
- "--timeout=1800",
- "--junitxml=" + self.artifacts_dir + "/junit_tfjob-test.xml",
- ]
- dependencies = [build_step['name']]
- tfjob_step = self._build_step(step_name, self.workflow, TESTS_DAG_NAME, task_template,
- command, dependencies)
- tfjob_step["container"]["workingDir"] = os.path.join(self.src_dir,
- "mnist",
- "testing")
- # ***************************************************************************
- # Test mnist deploy
- step_name = "deploy-test"
- command = ["python2", "-m", "pytest", "deploy_test.py",
- # Increase the log level so that info level log statements show up.
- "--log-cli-level=info",
- "--log-cli-format='%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s'",
- # Test timeout in seconds.
- "--timeout=1800",
- "--junitxml=" + self.artifacts_dir + "/junit_deploy-test.xml",
- ]
- dependencies = [tfjob_step["name"]]
- deploy_step = self._build_step(step_name, self.workflow, TESTS_DAG_NAME, task_template,
- command, dependencies)
- deploy_step["container"]["workingDir"] = os.path.join(self.src_dir,
- "mnist",
- "testing")
- # ***************************************************************************
- # Test mnist predict
- step_name = "predict-test"
- command = ["pytest", "predict_test.py",
- # Increase the log level so that info level log statements show up.
- "--log-cli-level=info",
- "--log-cli-format='%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s'",
- # Test timeout in seconds.
- "--timeout=1800",
- "--junitxml=" + self.artifacts_dir + "/junit_predict-test.xml",
- ]
- dependencies = [deploy_step["name"]]
- predict_step = self._build_step(step_name, self.workflow, TESTS_DAG_NAME, task_template,
- command, dependencies)
- predict_step["container"]["workingDir"] = os.path.join(self.src_dir,
- "mnist",
- "testing")
+ # ***************************************************************************
+ # Test mnist
+ step_name = "mnist"
+ command = ["pytest", "mnist_gcp_test.py",
+ # Increase the log level so that info level log statements show up.
+ "--log-cli-level=info",
+ "--log-cli-format='%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s'",
+ # Test timeout in seconds.
+ "--timeout=1800",
+ "--junitxml=" + self.artifacts_dir + "/junit_mnist-gcp-test.xml",
+ ]
+ dependencies = []
+ mnist_step = self._build_step(step_name, self.workflow, TESTS_DAG_NAME, task_template,
+ command, dependencies)
+ mnist_step["container"]["workingDir"] = os.path.join(
+ self.src_dir, "py/kubeflow/examples/notebook_tests")
def _build_exit_dag(self):
"""Build the exit handler dag"""
@ -432,8 +373,6 @@ class Builder:
# Run a dag of tests
if self.test_target_name.startswith("notebooks"):
self._build_tests_dag_notebooks()
- elif self.test_target_name == "mnist":
- self._build_tests_dag_mnist()
else:
raise RuntimeError('Invalid test_target_name ' + self.test_target_name)


@ -0,0 +1,34 @@
import pytest
def pytest_addoption(parser):
parser.addoption(
"--name", help="Name for the job. If not specified, one will be created "
"automatically.", type=str, default="")
parser.addoption(
"--namespace", help=("The namespace to run in. This should correspond to "
"a namespace associated with a Kubeflow profile."),
type=str,
default="kubeflow-kf-ci-v1-user")
parser.addoption(
"--image", help="Notebook image to use", type=str,
default="gcr.io/kubeflow-images-public/"
"tensorflow-1.15.2-notebook-cpu:1.0.0")
parser.addoption(
"--repos", help="The repos to checkout; leave blank to use defaults",
type=str, default="")
@pytest.fixture
def name(request):
return request.config.getoption("--name")
@pytest.fixture
def namespace(request):
return request.config.getoption("--namespace")
@pytest.fixture
def image(request):
return request.config.getoption("--image")
@pytest.fixture
def repos(request):
return request.config.getoption("--repos")
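pytest wires these options to tests by matching fixture names to test-function argument names; each fixture simply reads the parsed flag off `request.config`. A self-contained sketch of that lookup, with a plain class standing in for pytest's config object (all names here are illustrative):

```python
# A stand-in for pytest's request.config: fixtures simply forward
# getoption() lookups, so tests receive parsed flag values by name.
class FakeConfig:
    def __init__(self, options):
        self._options = options

    def getoption(self, flag):
        return self._options.get(flag, "")

config = FakeConfig({"--namespace": "kubeflow-kf-ci-v1-user"})

def namespace_fixture(config):
    return config.getoption("--namespace")

print(namespace_fixture(config))  # kubeflow-kf-ci-v1-user
```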


@ -0,0 +1,58 @@
import logging
import os
import subprocess
import tempfile
import fire
logger = logging.getLogger(__name__)
def prepare_env():
subprocess.check_call(["pip3", "install", "-U", "papermill"])
subprocess.check_call(["pip3", "install", "-r", "../requirements.txt"])
def execute_notebook(notebook_path, parameters=None):
import papermill #pylint: disable=import-error
temp_dir = tempfile.mkdtemp()
notebook_output_path = os.path.join(temp_dir, "out.ipynb")
papermill.execute_notebook(notebook_path, notebook_output_path,
cwd=os.path.dirname(notebook_path),
parameters=parameters,
log_output=True)
return notebook_output_path
def run_notebook_test(notebook_path, expected_messages, parameters=None):
output_path = execute_notebook(notebook_path, parameters=parameters)
with open(output_path, 'r') as hf:
actual_output = hf.read()
for expected_message in expected_messages:
if expected_message not in actual_output:
logger.error(actual_output)
assert False, "Unable to find expected message in output: " + expected_message
class NotebookExecutor:
@staticmethod
def test(notebook_path):
"""Test a notebook.
Args:
notebook_path: Absolute path of the notebook.
"""
prepare_env()
FILE_DIR = os.path.dirname(__file__)
EXPECTED_MSGS = [
"Finished upload of",
"Model export success: mockup-model.dat",
"Pod started running True",
"Cluster endpoint: http:",
]
run_notebook_test(notebook_path, EXPECTED_MSGS)
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO,
format=('%(levelname)s|%(asctime)s'
'|%(message)s|%(pathname)s|%(lineno)d|'),
datefmt='%Y-%m-%dT%H:%M:%S',
)
fire.Fire(NotebookExecutor)


@ -0,0 +1,51 @@
# A batch job to run a notebook using papermill.
# TODO(jlewi): We should switch to using Tekton
apiVersion: batch/v1
kind: Job
metadata:
name: nb-test
labels:
app: nb-test
spec:
backoffLimit: 1
template:
metadata:
annotations:
# TODO(jlewi): Do we really want to disable sidecar injection
# in the test? Would it be better to use istio to mimic what happens
# in notebooks?
sidecar.istio.io/inject: "false"
labels:
app: nb-test
spec:
restartPolicy: Never
securityContext:
runAsUser: 0
initContainers:
# This init container checks out the source code.
- command:
- /usr/local/bin/checkout_repos.sh
- --repos=kubeflow/examples@$(CHECK_TAG)
- --src_dir=/src
name: checkout
image: gcr.io/kubeflow-ci/test-worker:v20190802-c6f9140-e3b0c4
volumeMounts:
- mountPath: /src
name: src
containers:
- env:
- name: PYTHONPATH
value: /src/kubeflow/examples/py/
- name: executing-notebooks
image: execute-image
command: ["python3", "-m",
"kubeflow.examples.notebook_tests.execute_notebook",
"test", "/src/kubeflow/examples/mnist/mnist_gcp.ipynb"]
workingDir: /src/kubeflow/examples/py/kubeflow/examples/notebook_tests
volumeMounts:
- mountPath: /src
name: src
serviceAccount: default-editor
volumes:
- name: src
emptyDir: {}


@ -0,0 +1,29 @@
import logging
import os
import pytest
from kubeflow.examples.notebook_tests import nb_test_util
from kubeflow.testing import util
# TODO(jlewi): This test is new; there's some work to be done to make it
# reliable. So for now we mark it as expected to fail in presubmits.
# We only mark it as expected to fail on presubmits because expected
# failures don't show up in testgrid, and we want signal in postsubmits
# and periodics.
@pytest.mark.xfail(os.getenv("JOB_TYPE") == "presubmit", reason="Flaky")
def test_mnist_gcp(record_xml_attribute, name, namespace, # pylint: disable=too-many-branches,too-many-statements
repos, image):
'''Generate the Job and submit it.'''
util.set_pytest_junit(record_xml_attribute, "test_mnist_gcp")
nb_test_util.run_papermill_job(name, namespace, repos, image)
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO,
format=('%(levelname)s|%(asctime)s'
'|%(pathname)s|%(lineno)d| %(message)s'),
datefmt='%Y-%m-%dT%H:%M:%S',
)
logging.getLogger().setLevel(logging.INFO)
pytest.main()


@ -0,0 +1,80 @@
"""Some utilities for running notebook tests."""
import datetime
import logging
import uuid
import yaml
from kubernetes import client as k8s_client
from kubeflow.testing import argo_build_util
from kubeflow.testing import util
def run_papermill_job(name, namespace, # pylint: disable=too-many-branches,too-many-statements
repos, image):
"""Generate a K8s job to run a notebook using papermill
Args:
name: Name for the K8s job
namespace: The namespace where the job should run.
repos: (Optional) Which repos to checkout; if not specified tries
to infer based on PROW environment variables.
image: The docker image to run the notebook in.
"""
util.maybe_activate_service_account()
with open("job.yaml") as hf:
job = yaml.safe_load(hf)
# We need to checkout the correct version of the code
# in presubmits and postsubmits. We check the prow environment
# variables to get the appropriate values.
# See
# https://github.com/kubernetes/test-infra/blob/45246b09ed105698aa8fb928b7736d14480def29/prow/jobs.md#job-environment-variables
if not repos:
repos = argo_build_util.get_repo_from_prow_env()
logging.info("Repos set to %s", repos)
job["spec"]["template"]["spec"]["initContainers"][0]["command"] = [
"/usr/local/bin/checkout_repos.sh",
"--repos=" + repos,
"--src_dir=/src",
"--depth=all",
]
job["spec"]["template"]["spec"]["containers"][0]["image"] = image
util.load_kube_config(persist_config=False)
if name:
job["metadata"]["name"] = name
else:
job["metadata"]["name"] = ("notebook-test-" +
datetime.datetime.now().strftime("%H%M%S")
+ "-" + uuid.uuid4().hex[0:3])
name = job["metadata"]["name"]
job["metadata"]["namespace"] = namespace
# Create an API client object to talk to the K8s master.
api_client = k8s_client.ApiClient()
batch_api = k8s_client.BatchV1Api(api_client)
logging.info("Creating job:\n%s", yaml.dump(job))
actual_job = batch_api.create_namespaced_job(job["metadata"]["namespace"],
job)
logging.info("Created job %s.%s:\n%s", namespace, name,
yaml.safe_dump(actual_job.to_dict()))
final_job = util.wait_for_job(api_client, namespace, name,
timeout=datetime.timedelta(minutes=30))
logging.info("Final job:\n%s", yaml.safe_dump(final_job.to_dict()))
if not final_job.status.conditions:
raise RuntimeError("Job {0}.{1} did not complete".format(namespace, name))
last_condition = final_job.status.conditions[-1]
if last_condition.type not in ["Complete"]:
logging.error("Job didn't complete successfully")
raise RuntimeError("Job {0}.{1} failed".format(namespace, name))
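The success check at the end of `run_papermill_job` inspects the Job's status conditions: no conditions means the Job never finished, and anything other than a final `Complete` condition is treated as failure. The same logic in isolation, with plain dicts standing in for kubernetes client objects (a sketch, not the shared library code):

```python
# Decide whether a K8s batch Job succeeded from its status conditions,
# mirroring the check at the end of run_papermill_job above.
def job_succeeded(job_status):
    conditions = job_status.get("conditions") or []
    if not conditions:
        raise RuntimeError("Job did not complete")
    return conditions[-1]["type"] == "Complete"

print(job_succeeded({"conditions": [{"type": "Complete"}]}))  # True
print(job_succeeded({"conditions": [{"type": "Failed"}]}))    # False
```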


@ -0,0 +1,252 @@
{
// TODO(https://github.com/ksonnet/ksonnet/issues/222): Taking namespace as an argument is a work around for the fact that ksonnet
// doesn't support automatically piping in the namespace from the environment to prototypes.
// convert a list of two items into a map representing an environment variable
// TODO(jlewi): Should we move this into kubeflow/core/util.libsonnet
listToMap:: function(v)
{
name: v[0],
value: v[1],
},
// Function to turn comma separated list of prow environment variables into a dictionary.
parseEnv:: function(v)
local pieces = std.split(v, ",");
if v != "" && std.length(pieces) > 0 then
std.map(
function(i) $.listToMap(std.split(i, "=")),
std.split(v, ",")
)
else [],
parts(namespace, name):: {
// Workflow to run the e2e test.
e2e(prow_env, bucket):
// mountPath is the directory where the volume to store the test data
// should be mounted.
local mountPath = "/mnt/" + "test-data-volume";
// testDir is the root directory for all data for a particular test run.
local testDir = mountPath + "/" + name;
// outputDir is the directory to sync to GCS to contain the output for this job.
local outputDir = testDir + "/output";
local artifactsDir = outputDir + "/artifacts";
local goDir = testDir + "/go";
// Source directory where all repos should be checked out
local srcRootDir = testDir + "/src";
// The directory containing the kubeflow/examples repo
local srcDir = srcRootDir + "/kubeflow/examples";
local image = "gcr.io/kubeflow-ci/test-worker";
// The name of the NFS volume claim to use for test files.
// local nfsVolumeClaim = "kubeflow-testing";
local nfsVolumeClaim = "nfs-external";
// The name to use for the volume to use to contain test data.
local dataVolume = "kubeflow-test-volume";
local versionTag = name;
// The directory within the kubeflow_testing submodule containing
// py scripts to use.
local kubeflowExamplesPy = srcDir;
local kubeflowTestingPy = srcRootDir + "/kubeflow/testing/py";
local project = "kubeflow-ci";
// GKE cluster to use
// We need to truncate the cluster to no more than 40 characters because
// cluster names can be a max of 40 characters.
// We expect the suffix of the cluster name to be unique salt.
// We prepend a z because cluster name must start with an alphanumeric character
// and if we cut the prefix we might end up starting with "-" or other invalid
// character for first character.
local cluster =
if std.length(name) > 40 then
"z" + std.substr(name, std.length(name) - 39, 39)
else
name;
local zone = "us-east1-d";
local chart = srcDir + "/bin/examples-chart-0.2.1-" + versionTag + ".tgz";
{
// Build an Argo template to execute a particular command.
// step_name: Name for the template
// command: List to pass as the container command.
buildTemplate(step_name, command):: {
name: step_name,
container: {
command: command,
image: image,
workingDir: srcDir,
env: [
{
// Add the source directories to the python path.
name: "PYTHONPATH",
value: kubeflowExamplesPy + ":" + kubeflowTestingPy,
},
{
// Set the GOPATH
name: "GOPATH",
value: goDir,
},
{
name: "GOOGLE_APPLICATION_CREDENTIALS",
value: "/secret/gcp-credentials/key.json",
},
{
name: "GIT_TOKEN",
valueFrom: {
secretKeyRef: {
name: "github-token",
key: "github_token",
},
},
},
{
name: "EXTRA_REPOS",
value: "kubeflow/testing@HEAD",
},
] + prow_env,
volumeMounts: [
{
name: dataVolume,
mountPath: mountPath,
},
{
name: "github-token",
mountPath: "/secret/github-token",
},
{
name: "gcp-credentials",
mountPath: "/secret/gcp-credentials",
},
],
},
}, // buildTemplate
apiVersion: "argoproj.io/v1alpha1",
kind: "Workflow",
metadata: {
name: name,
namespace: namespace,
},
// TODO(jlewi): Use OnExit to run cleanup steps.
spec: {
entrypoint: "e2e",
volumes: [
{
name: "github-token",
secret: {
secretName: "github-token",
},
},
{
name: "gcp-credentials",
secret: {
secretName: "kubeflow-testing-credentials",
},
},
{
name: dataVolume,
persistentVolumeClaim: {
claimName: nfsVolumeClaim,
},
},
], // volumes
// onExit specifies the template that should always run when the workflow completes.
onExit: "exit-handler",
templates: [
{
name: "e2e",
steps: [
[{
name: "checkout",
template: "checkout",
}],
[
{
name: "create-pr-symlink",
template: "create-pr-symlink",
},
// test_py_checks runs all py files matching "_test.py"
// This is currently commented out because the only matching tests
// are manual tests for some of the examples and/or they require
// dependencies (e.g. tensorflow) not in the generic test worker image.
//
//
// test_py_checks doesn't have options to exclude specific directories.
// Since there are no other tests we just comment it out.
//
// TODO(https://github.com/kubeflow/testing/issues/240): Modify py_test
// so we can exclude specific files.
//
// {
// name: "py-test",
// template: "py-test",
// },
// {
// name: "py-lint",
// template: "py-lint",
//},
],
],
},
{
name: "exit-handler",
steps: [
[{
name: "copy-artifacts",
template: "copy-artifacts",
}],
],
},
{
name: "checkout",
container: {
command: [
"/usr/local/bin/checkout.sh",
srcRootDir,
],
env: prow_env + [{
name: "EXTRA_REPOS",
value: "kubeflow/testing@HEAD",
}],
image: image,
volumeMounts: [
{
name: dataVolume,
mountPath: mountPath,
},
],
},
}, // checkout
$.parts(namespace, name).e2e(prow_env, bucket).buildTemplate("py-test", [
"python",
"-m",
"kubeflow.testing.test_py_checks",
"--artifacts_dir=" + artifactsDir,
"--src_dir=" + srcDir,
]), // py test
$.parts(namespace, name).e2e(prow_env, bucket).buildTemplate("py-lint", [
"python",
"-m",
"kubeflow.testing.test_py_lint",
"--artifacts_dir=" + artifactsDir,
"--src_dir=" + srcDir,
]), // py lint
$.parts(namespace, name).e2e(prow_env, bucket).buildTemplate("create-pr-symlink", [
"python",
"-m",
"kubeflow.testing.prow_artifacts",
"--artifacts_dir=" + outputDir,
"create_pr_symlink",
"--bucket=" + bucket,
]), // create-pr-symlink
$.parts(namespace, name).e2e(prow_env, bucket).buildTemplate("copy-artifacts", [
"python",
"-m",
"kubeflow.testing.prow_artifacts",
"--artifacts_dir=" + outputDir,
"copy_artifacts",
"--bucket=" + bucket,
]), // copy-artifacts
], // templates
},
}, // e2e
}, // parts
}
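The `parseEnv` helper at the top of this jsonnet file converts prow's comma-separated `NAME=value` string into a list of environment-variable maps. The same transformation in Python, for illustration only (this sketch splits on the first `=`, a minor difference from jsonnet's `std.split`):

```python
# Python equivalent of the jsonnet parseEnv helper above: turn
# "K1=v1,K2=v2" into [{"name": ..., "value": ...}, ...].
def parse_env(v):
    if not v:
        return []
    return [{"name": name, "value": value}
            for name, value in (piece.split("=", 1) for piece in v.split(","))]

print(parse_env("JOB_NAME=test,BUILD_ID=123"))
# [{'name': 'JOB_NAME', 'value': 'test'}, {'name': 'BUILD_ID', 'value': '123'}]
```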


@ -1257,7 +1257,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.5rc1"
+ "version": "3.6.9"
}
},
"nbformat": 4,


@ -0,0 +1 @@
TODO: We should reuse/share logic in py/kubeflow/examples/notebook_tests