{ "cells": [ { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# MNIST E2E on Kubeflow on Vanilla k8s\n", "\n", "This example guides you through:\n", " \n", " 1. Taking an example TensorFlow model and modifying it to support distributed training\n", " 1. Serving the resulting model using TFServing\n", " 1. Deploying and using a web-app that uses the model\n", " \n", "## Requirements\n", "\n", " * You must be running Kubeflow 1.0 using the k8s istio config or the istio dex config.\n", " " ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Prepare model\n", "\n", "There is a delta between existing distributed mnist examples and what's needed to run well as a TFJob.\n", "\n", "Basically, we must:\n", "\n", "1. Add options in order to make the model configurable.\n", "1. Use `tf.estimator.train_and_evaluate` to enable model exporting and serving.\n", "1. Define serving signatures for model serving.\n", "\n", "The resulting model is [model.py](model.py)." ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Configure external service credentials\n", "\n", "\n", "## Step 1 - Pushing to DockerHub\n", "\n", "Source documentation: [Kaniko docs](https://github.com/GoogleContainerTools/kaniko#pushing-to-docker-hub)\n", "\n", "### Why do we need this?\n", "\n", "Kaniko is used by fairing to build the model every time the notebook is run and deploy a fresh model.\n", "The newly built image is pushed into the DOCKER_REGISTRY and pulled from there by subsequent resources.\n", "\n", "### Configure docker credentials\n", "\n", "Get your docker registry user and password encoded in base64
\n", "\n", "`echo -n USER:PASSWORD | base64`
\n", "\n", "Create a config.json file with your Docker registry URL and the previously generated base64 string
\n", "```json\n", "{\n", "\t\"auths\": {\n", "\t\t\"https://index.docker.io/v1/\": {\n", "\t\t\t\"auth\": \"xxxxxxxxxxxxxxx\"\n", "\t\t}\n", "\t}\n", "}\n", "```\n", "\n", "
\n", "\n", "### Create a config-map in the namespace you're using with the docker config\n", "\n", "`kubectl create --namespace ${NAMESPACE} configmap docker-config --from-file=`\n", "\n", "## Step 2 - Set DOCKER_REGISTRY\n", "\n", "The **DOCKER_REGISTRY** variable is used to push the newly built image.
\n", "Please change the variable to the registry for which you've configured credentials." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "\n", "from kubernetes import client as k8s_client\n", "from kubernetes.client import rest as k8s_rest\n", "from kubernetes.client.rest import ApiException\n", "from kubeflow import fairing\n", "from kubeflow.fairing import utils as fairing_utils\n", "from kubeflow.fairing.builders import append\n", "from kubeflow.fairing.deployers import job\n", "from kubeflow.fairing.preprocessors import base as base_preprocessor\n", "\n", "DOCKER_REGISTRY = \"ciscoai\"\n", "namespace = fairing_utils.get_current_k8s_namespace()\n", "\n", "api_client = k8s_client.CoreV1Api()\n", "minio_service_endpoint = None\n", "try:\n", "    minio_service_endpoint = api_client.read_namespaced_service(name='minio-service', namespace='kubeflow').spec.cluster_ip\n", "except ApiException as e:\n", "    if e.status == 403:\n", "        logging.warning(f\"The service account doesn't have sufficient privileges \"\n", "                        f\"to get the kubeflow minio-service. \"\n", "                        f\"You will have to manually enter the minio cluster-ip. \"\n", "                        f\"To make this function work ask someone with cluster \"\n", "                        f\"privileges to create an appropriate \"\n", "                        f\"rolebinding by running this command:\\n\"\n", "                        f\"kubectl create --namespace=kubeflow rolebinding \"\n", "                        \"--clusterrole=kubeflow-view \"\n", "                        \"--serviceaccount=${NAMESPACE}:default-editor \"\n", "                        \"${NAMESPACE}-minio-view\")\n", "    logging.error(f\"API access denied with reason: {e.reason}\")\n", "\n", "# If the lookup above failed, set s3_endpoint to the minio cluster IP manually.\n", "s3_endpoint = minio_service_endpoint\n", "minio_endpoint = \"http://\" + s3_endpoint\n", "minio_username = \"minio\"\n", "minio_key = \"minio123\"\n", "minio_region = \"us-east-1\"\n", "\n", "logging.info(f\"Running in namespace {namespace}\")\n", "logging.info(f\"Using docker registry {DOCKER_REGISTRY}\")\n", "logging.info(f\"Using minio instance with endpoint '{s3_endpoint}'\")" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Install Required Libraries\n", "\n", "Import the libraries required to train this model." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "import os\n", "import uuid\n", "from importlib import reload\n", "import notebook_setup\n", "reload(notebook_setup)\n", "notebook_setup.notebook_setup()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import k8s_util\n", "# Force a reload of kubeflow; since kubeflow is a multi namespace module\n", "# it looks like doing this in notebook_setup may not be sufficient\n", "import kubeflow\n", "reload(kubeflow)\n", "from kubernetes import client as k8s_client\n", "from kubernetes import config as k8s_config\n", "from kubeflow.tfjob.api import tf_job_client as tf_job_client_module\n", "from IPython.core.display import display, HTML\n", "import yaml" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO(https://github.com/kubeflow/fairing/issues/426): We should get rid of this once the default \n", "# Kaniko image is updated to a newer image than 0.7.0.\n", "from kubeflow.fairing import constants\n", "constants.constants.KANIKO_IMAGE = \"gcr.io/kaniko-project/executor:v0.14.0\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from kubeflow.fairing.builders import cluster\n", "\n", "# output_map is a map of extra files to add to the notebook.\n", "# It is a map from source location to the location inside the context.\n", "output_map = {\n", " \"Dockerfile.model\": \"Dockerfile\",\n", " \"model.py\": \"model.py\"\n", "}\n", "\n", "preprocessor = base_preprocessor.BasePreProcessor(\n", " command=[\"python\"], # The base class will set this.\n", " input_files=[],\n", " path_prefix=\"/app\", # irrelevant since we aren't preprocessing any files\n", " output_map=output_map)\n", "\n", "preprocessor.preprocess()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use a 
Tensorflow image as the base image\n", "# We use a custom Dockerfile \n", "from kubeflow.fairing.cloud.k8s import MinioUploader\n", "from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource\n", "\n", "minio_uploader = MinioUploader(endpoint_url=minio_endpoint, minio_secret=minio_username, minio_secret_key=minio_key, region_name=minio_region)\n", "minio_context_source = MinioContextSource(endpoint_url=minio_endpoint, minio_secret=minio_username, minio_secret_key=minio_key, region_name=minio_region)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cluster_builder = cluster.cluster.ClusterBuilder(registry=DOCKER_REGISTRY,\n", " base_image=\"\", # base_image is set in the Dockerfile\n", " preprocessor=preprocessor,\n", " image_name=\"mnist\",\n", " dockerfile_path=\"Dockerfile\",\n", " context_source=minio_context_source)\n", "cluster_builder.build()\n", "logging.info(f\"Built image {cluster_builder.image_tag}\")" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Create a Minio Bucket\n", "\n", "* Create a minio bucket to store our models and other results." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_bucket = f\"{DOCKER_REGISTRY}-mnist\"\n", "minio_uploader.create_bucket(mnist_bucket)\n", "logging.info(f\"Bucket {mnist_bucket} created or already exists\")" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Distributed training\n", "\n", "* We will train the model by using TFJob to run a distributed training job" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Training job parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_name = f\"mnist-train-{uuid.uuid4().hex[:4]}\"\n", "num_ps = 1\n", "num_workers = 2\n", "model_dir = f\"s3://{mnist_bucket}/mnist\"\n", "export_path = f\"s3://{mnist_bucket}/mnist/export\"\n", "train_steps = 200\n", "batch_size = 100\n", "learning_rate = 0.01\n", "image = cluster_builder.image_tag" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_spec = f\"\"\"apiVersion: kubeflow.org/v1\n", "kind: TFJob\n", "metadata:\n", "  name: {train_name}\n", "spec:\n", "  tfReplicaSpecs:\n", "    Ps:\n", "      replicas: {num_ps}\n", "      template:\n", "        metadata:\n", "          annotations:\n", "            sidecar.istio.io/inject: \"false\"\n", "        spec:\n", "          serviceAccount: default-editor\n", "          containers:\n", "          - name: tensorflow\n", "            command:\n", "            - python\n", "            - /opt/model.py\n", "            - --tf-model-dir={model_dir}\n", "            - --tf-export-dir={export_path}\n", "            - --tf-train-steps={train_steps}\n", "            - --tf-batch-size={batch_size}\n", "            - --tf-learning-rate={learning_rate}\n", "            env:\n", "            - name: S3_ENDPOINT\n", "              value: {s3_endpoint}\n", "            - name: AWS_ENDPOINT_URL\n", "              value: {minio_endpoint}\n", "            - name: AWS_REGION\n", "              value: {minio_region}\n", "            - name: BUCKET_NAME\n", "              value: {mnist_bucket}\n", "            - name: S3_USE_HTTPS\n", "              value: \"0\"\n", "            - name: S3_VERIFY_SSL\n", "              value: \"0\"\n", "            - name: AWS_ACCESS_KEY_ID\n", "              value: {minio_username}\n", "            - name: AWS_SECRET_ACCESS_KEY\n", "              value: {minio_key}\n", "            image: {image}\n", "            workingDir: /opt\n", "          restartPolicy: OnFailure\n", "    Chief:\n", "      replicas: 1\n", "      template:\n", "        metadata:\n", "          annotations:\n", "            sidecar.istio.io/inject: \"false\"\n", "        spec:\n", "          serviceAccount: default-editor\n", "          containers:\n", "          - name: tensorflow\n", "            command:\n", "            - python\n", "            - /opt/model.py\n", "            - --tf-model-dir={model_dir}\n", "            - --tf-export-dir={export_path}\n", "            - --tf-train-steps={train_steps}\n", "            - --tf-batch-size={batch_size}\n", "            - --tf-learning-rate={learning_rate}\n", "            env:\n", "            - name: S3_ENDPOINT\n", "              value: {s3_endpoint}\n", "            - name: AWS_ENDPOINT_URL\n", "              value: {minio_endpoint}\n", "            - name: AWS_REGION\n", "              value: {minio_region}\n", "            - name: BUCKET_NAME\n", "              value: {mnist_bucket}\n", "            - name: S3_USE_HTTPS\n", "              value: \"0\"\n", "            - name: S3_VERIFY_SSL\n", "              value: \"0\"\n", "            - name: AWS_ACCESS_KEY_ID\n", "              value: {minio_username}\n", "            - name: AWS_SECRET_ACCESS_KEY\n", "              value: {minio_key}\n", "            image: {image}\n", "            workingDir: /opt\n", "          restartPolicy: OnFailure\n", "    Worker:\n", "      replicas: {num_workers}\n", "      template:\n", "        metadata:\n", "          annotations:\n", "            sidecar.istio.io/inject: \"false\"\n", "        spec:\n", "          serviceAccount: default-editor\n", "          containers:\n", "          - name: tensorflow\n", "            command:\n", "            - python\n", "            - /opt/model.py\n", "            - --tf-model-dir={model_dir}\n", "            - --tf-export-dir={export_path}\n", "            - --tf-train-steps={train_steps}\n", "            - --tf-batch-size={batch_size}\n", "            - --tf-learning-rate={learning_rate}\n", "            env:\n", "            - name: S3_ENDPOINT\n", "              value: {s3_endpoint}\n", "            - name: AWS_ENDPOINT_URL\n", "              value: {minio_endpoint}\n", "            - name: AWS_REGION\n", "              value: {minio_region}\n", "            - name: BUCKET_NAME\n", "              value: {mnist_bucket}\n", "            - name: S3_USE_HTTPS\n", "              value: \"0\"\n", "            - name: S3_VERIFY_SSL\n", "              value: \"0\"\n", "            - name: AWS_ACCESS_KEY_ID\n", "              value: {minio_username}\n", "            - name: AWS_SECRET_ACCESS_KEY\n", "              value: {minio_key}\n", "            image: {image}\n", "            workingDir: /opt\n", "          restartPolicy: OnFailure\n", "\"\"\"" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Create the training job\n", "\n", "* You could write the spec to a YAML file and then do `kubectl apply -f {FILE}`\n", "* Since you are running in Jupyter, you will use the TFJob client instead\n", "* You will run the TFJob in a namespace created by a Kubeflow profile\n", "  * The namespace will be the same namespace you are running the notebook in\n", "  * Creating a profile ensures the namespace is provisioned with service accounts and other resources needed for Kubeflow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf_job_client = tf_job_client_module.TFJobClient()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf_job_body = yaml.safe_load(train_spec)\n", "tf_job = tf_job_client.create(tf_job_body, namespace=namespace)\n", "\n", "logging.info(f\"Created job {namespace}.{train_name}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from kubeflow.tfjob import TFJobClient\n", "tfjob_client = TFJobClient()\n", "tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Get TF Job logs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfjob_client.get_logs(train_name, namespace=namespace)" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Check the model in Minio" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#TODO(swiftdiaries): Check object 
key for model specifically\n", "from botocore.exceptions import ClientError\n", "\n", "try:\n", "    model_response = minio_uploader.client.list_objects(Bucket=mnist_bucket)\n", "    # Minimal check to see if at least the bucket is created\n", "    if model_response[\"ResponseMetadata\"][\"HTTPStatusCode\"] == 200:\n", "        logging.info(f\"{model_dir} found in {mnist_bucket} bucket\")\n", "except ClientError as err:\n", "    logging.error(err)" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Deploy Tensorboard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tb_name = \"mnist-tensorboard\"\n", "tb_deploy = f\"\"\"apiVersion: apps/v1\n", "kind: Deployment\n", "metadata:\n", "  labels:\n", "    app: mnist-tensorboard\n", "  name: {tb_name}\n", "  namespace: {namespace}\n", "spec:\n", "  selector:\n", "    matchLabels:\n", "      app: mnist-tensorboard\n", "  template:\n", "    metadata:\n", "      labels:\n", "        app: mnist-tensorboard\n", "        version: v1\n", "    spec:\n", "      serviceAccount: default-editor\n", "      containers:\n", "      - command:\n", "        - /usr/local/bin/tensorboard\n", "        - --logdir={model_dir}\n", "        - --port=80\n", "        image: tensorflow/tensorflow:1.15.2-py3\n", "        env:\n", "        - name: S3_ENDPOINT\n", "          value: {s3_endpoint}\n", "        - name: AWS_ENDPOINT_URL\n", "          value: {minio_endpoint}\n", "        - name: AWS_REGION\n", "          value: {minio_region}\n", "        - name: BUCKET_NAME\n", "          value: {mnist_bucket}\n", "        - name: S3_USE_HTTPS\n", "          value: \"0\"\n", "        - name: S3_VERIFY_SSL\n", "          value: \"0\"\n", "        - name: AWS_ACCESS_KEY_ID\n", "          value: {minio_username}\n", "        - name: AWS_SECRET_ACCESS_KEY\n", "          value: {minio_key}\n", "        name: tensorboard\n", "        ports:\n", "        - containerPort: 80\n", "\"\"\"\n", "tb_service = f\"\"\"apiVersion: v1\n", "kind: Service\n", "metadata:\n", "  labels:\n", "    app: mnist-tensorboard\n", "  name: {tb_name}\n", "  namespace: {namespace}\n", "spec:\n", "  ports:\n", "  - name: http-tb\n", "    port: 80\n", "    targetPort: 80\n", "  selector:\n", "    app: mnist-tensorboard\n", "  type: ClusterIP\n", "\"\"\"\n", "\n", "tb_virtual_service = f\"\"\"apiVersion: networking.istio.io/v1alpha3\n", "kind: VirtualService\n", "metadata:\n", "  name: {tb_name}\n", "  namespace: {namespace}\n", "spec:\n", "  gateways:\n", "  - kubeflow/kubeflow-gateway\n", "  hosts:\n", "  - '*'\n", "  http:\n", "  - match:\n", "    - uri:\n", "        prefix: /mnist/{namespace}/tensorboard/\n", "    rewrite:\n", "      uri: /\n", "    route:\n", "    - destination:\n", "        host: {tb_name}.{namespace}.svc.cluster.local\n", "        port:\n", "          number: 80\n", "    timeout: 300s\n", "\"\"\"\n", "\n", "tb_specs = [tb_deploy, tb_service, tb_virtual_service]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "k8s_util.apply_k8s_specs(tb_specs, k8s_util.K8S_CREATE_OR_REPLACE)" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Get Tensorboard URL" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Run this with the appropriate RBAC permissions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "istio_ingress_endpoint = None\n", "try:\n", "    istio_ingress_endpoint = api_client.read_namespaced_service(name='istio-ingressgateway', namespace='istio-system')\n", "    istio_ports = istio_ingress_endpoint.spec.ports\n", "    for istio_port in istio_ports:\n", "        if istio_port.name == \"http2\":\n", "            logging.warning(\"get worker-node-ip by running 'kubectl get nodes -o wide'\")\n", "            logging.info(f\"Tensorboard URL: worker-node-ip:{istio_port.node_port}/mnist/{namespace}/tensorboard/\")\n", "except ApiException as e:\n", "    if e.status == 403:\n", "        logging.warning(f\"The service account doesn't have sufficient privileges \"\n", "                        f\"to get the istio-ingressgateway service. \"\n", "                        f\"You will have to look up the ingress node port manually. \"\n", "                        f\"To make this function work ask someone with cluster \"\n", "                        f\"privileges to create an appropriate \"\n", "                        f\"rolebinding by running this command:\\n\"\n", "                        f\"kubectl create --namespace=istio-system rolebinding \"\n", "                        \"--clusterrole=kubeflow-view \"\n", "                        \"--serviceaccount=${NAMESPACE}:default-editor \"\n", "                        \"${NAMESPACE}-ingressgateway-view\")\n", "    logging.warning(\"API Access restricted. Please get URL by running the kubectl commands at the end of the notebook\")" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Serve the model\n", "\n", "* Deploy the model using TensorFlow Serving\n", "* We need to create\n", "  1. A Kubernetes Deployment\n", "  1. A Kubernetes Service\n", "  1. 
(Optional) A ConfigMap containing the Prometheus monitoring config" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deploy_name = \"mnist-model\"\n", "model_base_path = export_path\n", "\n", "# The web UI defaults to mnist-service, so if you change it you will\n", "# need to change it in the UI as well to send predictions to the model\n", "model_service = \"mnist-service\"\n", "\n", "deploy_spec = f\"\"\"apiVersion: apps/v1\n", "kind: Deployment\n", "metadata:\n", "  labels:\n", "    app: mnist\n", "  name: {deploy_name}\n", "  namespace: {namespace}\n", "spec:\n", "  selector:\n", "    matchLabels:\n", "      app: mnist-model\n", "  template:\n", "    metadata:\n", "      # TODO(jlewi): Right now we disable the istio side car because otherwise ISTIO rbac will prevent the\n", "      # UI from sending RPCs to the server. We should create an appropriate ISTIO rbac authorization\n", "      # policy to allow traffic from the UI to the model server.\n", "      # https://istio.io/docs/concepts/security/#target-selectors\n", "      annotations:\n", "        sidecar.istio.io/inject: \"false\"\n", "      labels:\n", "        app: mnist-model\n", "        version: v1\n", "    spec:\n", "      serviceAccount: default-editor\n", "      containers:\n", "      - args:\n", "        - --port=9000\n", "        - --rest_api_port=8500\n", "        - --model_name=mnist\n", "        - --model_base_path={model_base_path}\n", "        command:\n", "        - /usr/bin/tensorflow_model_server\n", "        env:\n", "        - name: modelBasePath\n", "          value: {model_base_path}\n", "        - name: S3_ENDPOINT\n", "          value: {s3_endpoint}\n", "        - name: AWS_ENDPOINT_URL\n", "          value: {minio_endpoint}\n", "        - name: AWS_REGION\n", "          value: {minio_region}\n", "        - name: BUCKET_NAME\n", "          value: {mnist_bucket}\n", "        - name: S3_USE_HTTPS\n", "          value: \"0\"\n", "        - name: S3_VERIFY_SSL\n", "          value: \"0\"\n", "        - name: AWS_ACCESS_KEY_ID\n", "          value: {minio_username}\n", "        - name: AWS_SECRET_ACCESS_KEY\n", "          value: {minio_key}\n", "        image: tensorflow/serving:1.15.0\n", "        imagePullPolicy: IfNotPresent\n", "        livenessProbe:\n", "          initialDelaySeconds: 30\n", "          periodSeconds: 30\n", "          tcpSocket:\n", "            port: 9000\n", "        name: mnist\n", "        ports:\n", "        - containerPort: 9000\n", "        - containerPort: 8500\n", "        resources:\n", "          limits:\n", "            cpu: \"4\"\n", "            memory: 4Gi\n", "          requests:\n", "            cpu: \"1\"\n", "            memory: 1Gi\n", "        volumeMounts:\n", "        - mountPath: /var/config/\n", "          name: model-config\n", "      volumes:\n", "      - configMap:\n", "          name: {deploy_name}\n", "        name: model-config\n", "\"\"\"\n", "\n", "service_spec = f\"\"\"apiVersion: v1\n", "kind: Service\n", "metadata:\n", "  annotations:\n", "    prometheus.io/path: /monitoring/prometheus/metrics\n", "    prometheus.io/port: \"8500\"\n", "    prometheus.io/scrape: \"true\"\n", "  labels:\n", "    app: mnist-model\n", "  name: {model_service}\n", "  namespace: {namespace}\n", "spec:\n", "  ports:\n", "  - name: grpc-tf-serving\n", "    port: 9000\n", "    targetPort: 9000\n", "  - name: http-tf-serving\n", "    port: 8500\n", "    targetPort: 8500\n", "  selector:\n", "    app: mnist-model\n", "  type: ClusterIP\n", "\"\"\"\n", "\n", "monitoring_config = f\"\"\"kind: ConfigMap\n", "apiVersion: v1\n", "metadata:\n", "  name: {deploy_name}\n", "  namespace: {namespace}\n", "data:\n", "  monitoring_config.txt: |-\n", "    prometheus_config: {{\n", "      enable: true,\n", "      path: \"/monitoring/prometheus/metrics\"\n", "    }}\n", "\"\"\"\n", "\n", "model_specs = [deploy_spec, service_spec, monitoring_config]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "k8s_util.apply_k8s_specs(model_specs, k8s_util.K8S_CREATE_OR_REPLACE)" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Deploy the mnist UI\n", "\n", "* We will now deploy the UI to visualize the mnist results\n", "* Note: This uses a prebuilt, public Docker image for the UI" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ui_name = \"mnist-ui\"\n", "ui_deploy = f\"\"\"apiVersion: apps/v1\n", "kind: Deployment\n", "metadata:\n", "  name: {ui_name}\n", "  namespace: {namespace}\n", "spec:\n", "  replicas: 1\n", "  selector:\n", "    matchLabels:\n", "      app: mnist-web-ui\n", "  template:\n", "    metadata:\n", "      labels:\n", "        app: mnist-web-ui\n", "    spec:\n", "      containers:\n", "      - image: gcr.io/kubeflow-examples/mnist/web-ui:v20190112-v0.2-142-g3b38225\n", "        name: web-ui\n", "        ports:\n", "        - containerPort: 5000\n", "      serviceAccount: default-editor\n", "\"\"\"\n", "\n", "ui_service = f\"\"\"apiVersion: v1\n", "kind: Service\n", "metadata:\n", "  annotations:\n", "  name: {ui_name}\n", "  namespace: {namespace}\n", "spec:\n", "  ports:\n", "  - name: http-mnist-ui\n", "    port: 80\n", "    targetPort: 5000\n", "  selector:\n", "    app: mnist-web-ui\n", "  type: ClusterIP\n", "\"\"\"\n", "\n", "ui_virtual_service = f\"\"\"apiVersion: networking.istio.io/v1alpha3\n", "kind: VirtualService\n", "metadata:\n", "  name: {ui_name}\n", "  namespace: {namespace}\n", "spec:\n", "  gateways:\n", "  - kubeflow/kubeflow-gateway\n", "  hosts:\n", "  - '*'\n", "  http:\n", "  - match:\n", "    - uri:\n", "        prefix: /mnist/{namespace}/ui/\n", "    rewrite:\n", "      uri: /\n", "    route:\n", "    - destination:\n", "        host: {ui_name}.{namespace}.svc.cluster.local\n", "        port:\n", "          number: 80\n", "    timeout: 300s\n", "\"\"\"\n", "\n", "ui_specs = [ui_deploy, ui_service, ui_virtual_service]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "k8s_util.apply_k8s_specs(ui_specs, k8s_util.K8S_CREATE_OR_REPLACE)" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Access the web UI\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "istio_ingress_endpoint = None\n", "try:\n", "    istio_ingress_endpoint = api_client.read_namespaced_service(name='istio-ingressgateway', namespace='istio-system')\n", "    istio_ports = istio_ingress_endpoint.spec.ports\n", "    for istio_port in istio_ports:\n", "        if istio_port.name == \"http2\":\n", "            logging.warning(\"get worker-node-ip by running 'kubectl get nodes -o wide'\")\n", "            logging.info(f\"mnist-web-app URL: worker-node-ip:{istio_port.node_port}/mnist/{namespace}/ui/\")\n", "except ApiException as e:\n", "    if e.status == 403:\n", "        logging.warning(f\"The service account doesn't have sufficient privileges \"\n", "                        f\"to get the istio-ingressgateway service. \"\n", "                        f\"You will have to look up the ingress node port manually. \"\n", "                        f\"To make this function work ask someone with cluster \"\n", "                        f\"privileges to create an appropriate \"\n", "                        f\"rolebinding by running this command:\\n\"\n", "                        f\"kubectl create --namespace=istio-system rolebinding \"\n", "                        \"--clusterrole=kubeflow-view \"\n", "                        \"--serviceaccount=${NAMESPACE}:default-editor \"\n", "                        \"${NAMESPACE}-ingressgateway-view\")\n", "    logging.warning(\"API Access restricted. Please get URL by running the kubectl commands at the end of the notebook\")" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Get Tensorboard URL\n", "\n", "Run this with the appropriate RBAC permissions
\n", "**Note:** You can get the worker node IP from `kubectl get nodes -o wide`
\n", "```bash\n", "export INGRESS_HOST=\n", "export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name==\"http2\")].nodePort}')\n", "printf \"Tensorboard URL: \\n${INGRESS_HOST}:${INGRESS_PORT}/mnist/anonymous/tensorboard/\\n\"\n", "```" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Access the Web UI\n", "\n", "Run this with the appropriate RBAC permissions
\n", "**Note:** You can get the worker node IP from `kubectl get nodes -o wide`
\n", "```bash\n", "export INGRESS_HOST=\n", "export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name==\"http2\")].nodePort}')\n", "printf \"mnist-web-app URL: \\n${INGRESS_HOST}:${INGRESS_PORT}/mnist/anonymous/ui/\\n\"\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }