{
"cells": [
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# MNIST E2E on Kubeflow on Vanilla k8s\n",
"\n",
"This example guides you through:\n",
"\n",
" 1. Taking an example TensorFlow model and modifying it to support distributed training\n",
" 1. Serving the resulting model using TensorFlow Serving\n",
" 1. Deploying and using a web-app that uses the model\n",
"\n",
"## Requirements\n",
"\n",
" * You must be running Kubeflow 1.0 deployed with either the k8s istio config (`kfctl_k8s_istio`) or the istio dex config (`kfctl_istio_dex`).\n"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Prepare model\n",
"\n",
"The existing distributed MNIST examples need a few changes before they run well as a TFJob.\n",
"\n",
"Specifically, we must:\n",
"\n",
"1. Add options to make the model configurable.\n",
"1. Use `tf.estimator.train_and_evaluate` to enable model exporting and serving.\n",
"1. Define serving signatures for model serving.\n",
"\n",
"The resulting model is [model.py](model.py)."
]
},
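{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of step 1, the sketch below shows one way a training script could expose its options, assuming it uses Python's `argparse`. The flag names are the ones the TFJob spec later passes to `/opt/model.py`, and the defaults mirror this notebook's training parameters. This is a hypothetical example, not the actual contents of [model.py](model.py).\n",
"\n",
"```python\n",
"import argparse\n",
"\n",
"\n",
"def parse_args(argv=None):\n",
"    # Hypothetical parser; the real model.py may define different or additional options.\n",
"    parser = argparse.ArgumentParser(description='Distributed MNIST training (sketch)')\n",
"    parser.add_argument('--tf-model-dir', required=True, help='Checkpoint and summary directory')\n",
"    parser.add_argument('--tf-export-dir', required=True, help='Directory for the exported SavedModel')\n",
"    parser.add_argument('--tf-train-steps', type=int, default=200)\n",
"    parser.add_argument('--tf-batch-size', type=int, default=100)\n",
"    parser.add_argument('--tf-learning-rate', type=float, default=0.01)\n",
"    return parser.parse_args(argv)\n",
"\n",
"\n",
"args = parse_args(['--tf-model-dir=s3://ciscoai-mnist/mnist',\n",
"                   '--tf-export-dir=s3://ciscoai-mnist/mnist/export'])\n",
"print(args.tf_train_steps, args.tf_batch_size, args.tf_learning_rate)\n",
"```\n",
"\n",
"`argparse` converts the dashes in option names to underscores, so `--tf-train-steps` is read as `args.tf_train_steps`."
]
},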
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Configure external service credentials\n",
"\n",
"## Step 1 - Configure pushing to Docker Hub\n",
"\n",
"Source documentation: [Kaniko docs](https://github.com/GoogleContainerTools/kaniko#pushing-to-docker-hub)\n",
"\n",
"### Why do we need this?\n",
"\n",
"Fairing uses Kaniko to build a new model image every time the notebook runs, so each run deploys a fresh model.\n",
"Kaniko pushes the newly built image to `DOCKER_REGISTRY`, and the training job pulls it from there. Kaniko therefore needs push credentials for that registry.\n",
"\n",
"### Configure docker credentials\n",
"\n",
"Encode your Docker registry username and password in base64: <br>\n",
"\n",
"`echo -n USER:PASSWORD | base64` <br>\n",
"\n",
"Create a `config.json` file containing your Docker registry URL and the base64 string generated above: <br>\n",
"```json\n",
"{\n",
"\t\"auths\": {\n",
"\t\t\"https://index.docker.io/v1/\": {\n",
"\t\t\t\"auth\": \"xxxxxxxxxxxxxxx\"\n",
"\t\t}\n",
"\t}\n",
"}\n",
"```\n",
"\n",
"<br>\n",
"\n",
"### Create a ConfigMap with the Docker config in the namespace you're using\n",
"\n",
"`kubectl create --namespace ${NAMESPACE} configmap docker-config --from-file=<path to config.json>`\n",
"\n",
"## Step 2 - Set DOCKER_REGISTRY\n",
"\n",
"The **DOCKER_REGISTRY** variable is used to push the newly built image. <br>\n",
"Please change the variable to the registry for which you've configured credentials."
]
},
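{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `config.json` described above can also be produced from Python. This is a minimal sketch using only the standard library. `USER`, `PASSWORD`, and the output path are placeholders for your own values, and it assumes Docker Hub's `https://index.docker.io/v1/` registry URL, as in the example above.\n",
"\n",
"```python\n",
"import base64\n",
"import json\n",
"\n",
"\n",
"def docker_config(user, password, registry='https://index.docker.io/v1/'):\n",
"    # Equivalent of `echo -n USER:PASSWORD | base64`.\n",
"    auth = base64.b64encode(f'{user}:{password}'.encode('utf-8')).decode('ascii')\n",
"    return {'auths': {registry: {'auth': auth}}}\n",
"\n",
"\n",
"config = docker_config('USER', 'PASSWORD')\n",
"with open('config.json', 'w') as f:\n",
"    json.dump(config, f, indent=2)\n",
"```\n",
"\n",
"Pass the resulting file to the `kubectl create ... configmap docker-config` command. Keep it out of version control, because base64 is an encoding, not encryption."
]
},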
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from kubernetes import client as k8s_client\n",
"from kubernetes import config as k8s_config\n",
"from kubernetes.client.rest import ApiException\n",
"from kubeflow import fairing\n",
"from kubeflow.fairing import utils as fairing_utils\n",
"from kubeflow.fairing.builders import append\n",
"from kubeflow.fairing.deployers import job\n",
"from kubeflow.fairing.preprocessors import base as base_preprocessor\n",
"\n",
"# Change this to the registry you configured credentials for in Step 1.\n",
"DOCKER_REGISTRY = \"ciscoai\"\n",
"namespace = fairing_utils.get_current_k8s_namespace()\n",
"\n",
"# Use the in-cluster config inside a Kubeflow notebook server and fall back\n",
"# to the local kubeconfig when running elsewhere.\n",
"try:\n",
"    k8s_config.load_incluster_config()\n",
"except k8s_config.ConfigException:\n",
"    k8s_config.load_kube_config()\n",
"\n",
"# Look up the cluster IP of the minio service in the kubeflow namespace.\n",
"api_client = k8s_client.CoreV1Api()\n",
"minio_service_endpoint = None\n",
"try:\n",
"    minio_service_endpoint = api_client.read_namespaced_service(name='minio-service', namespace='kubeflow').spec.cluster_ip\n",
"except ApiException as e:\n",
"    if e.status == 403:\n",
"        logging.warning(\"The service account doesn't have sufficient privileges \"\n",
"                        \"to get the kubeflow minio-service. \"\n",
"                        \"You will have to manually enter the minio cluster-ip. \"\n",
"                        \"To make this lookup work, ask someone with cluster \"\n",
"                        \"privileges to create an appropriate \"\n",
"                        \"rolebinding by running:\\n\"\n",
"                        \"kubectl create --namespace=kubeflow rolebinding \"\n",
"                        \"--clusterrole=kubeflow-view \"\n",
"                        f\"--serviceaccount={namespace}:default-editor \"\n",
"                        f\"{namespace}-minio-view\")\n",
"    logging.error(f\"Failed to read minio-service: {e.reason}\")\n",
"\n",
"# If the lookup above failed, set s3_endpoint to the minio cluster IP manually.\n",
"s3_endpoint = minio_service_endpoint\n",
"minio_endpoint = f\"http://{s3_endpoint}\"\n",
"minio_username = \"minio\"\n",
"minio_key = \"minio123\"\n",
"minio_region = \"us-east-1\"\n",
"\n",
"logging.info(f\"Running in namespace {namespace}\")\n",
"logging.info(f\"Using docker registry {DOCKER_REGISTRY}\")\n",
"logging.info(f\"Using minio instance with endpoint '{s3_endpoint}'\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Import Required Libraries\n",
"\n",
"Import the libraries required to train this model and run the `notebook_setup` helper."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"import uuid\n",
"from importlib import reload\n",
"import notebook_setup\n",
"reload(notebook_setup)\n",
"notebook_setup.notebook_setup()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import k8s_util\n",
"# Force a reload of kubeflow. Because kubeflow is a namespace package shared by\n",
"# several distributions, reloading it in notebook_setup may not be sufficient.\n",
"import kubeflow\n",
"reload(kubeflow)\n",
"from kubernetes import client as k8s_client\n",
"from kubernetes import config as k8s_config\n",
"from kubeflow.tfjob.api import tf_job_client as tf_job_client_module\n",
"from IPython.display import display, HTML\n",
"import yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO(https://github.com/kubeflow/fairing/issues/426): We should get rid of this once the default \n",
"# Kaniko image is updated to a newer image than 0.7.0.\n",
"from kubeflow.fairing import constants\n",
"constants.constants.KANIKO_IMAGE = \"gcr.io/kaniko-project/executor:v0.14.0\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from kubeflow.fairing.builders import cluster\n",
"\n",
"# output_map lists extra files to add to the Docker build context.\n",
"# It maps each source location to its location inside the context.\n",
"output_map = {\n",
"    \"Dockerfile.model\": \"Dockerfile\",\n",
"    \"model.py\": \"model.py\"\n",
"}\n",
"\n",
"preprocessor = base_preprocessor.BasePreProcessor(\n",
"    command=[\"python\"],  # The base class will set this.\n",
"    input_files=[],\n",
"    path_prefix=\"/app\",  # irrelevant since we aren't preprocessing any files\n",
"    output_map=output_map)\n",
"\n",
"preprocessor.preprocess()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The TensorFlow base image is set in the custom Dockerfile.model.\n",
"# The build context is stored in minio, where the Kaniko build reads it.\n",
"from kubeflow.fairing.cloud.k8s import MinioUploader\n",
"from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource\n",
"\n",
"minio_uploader = MinioUploader(endpoint_url=minio_endpoint, minio_secret=minio_username,\n",
"                               minio_secret_key=minio_key, region_name=minio_region)\n",
"minio_context_source = MinioContextSource(endpoint_url=minio_endpoint, minio_secret=minio_username,\n",
"                                          minio_secret_key=minio_key, region_name=minio_region)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cluster_builder = cluster.cluster.ClusterBuilder(\n",
"    registry=DOCKER_REGISTRY,\n",
"    base_image=\"\",  # base_image is set in the Dockerfile\n",
"    preprocessor=preprocessor,\n",
"    image_name=\"mnist\",\n",
"    dockerfile_path=\"Dockerfile\",\n",
"    context_source=minio_context_source)\n",
"cluster_builder.build()\n",
"logging.info(f\"Built image {cluster_builder.image_tag}\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Create a Minio Bucket\n",
"\n",
"* Create a minio bucket to store our models and other results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mnist_bucket = f\"{DOCKER_REGISTRY}-mnist\"\n",
"minio_uploader.create_bucket(mnist_bucket)\n",
"logging.info(f\"Bucket {mnist_bucket} created or already exists\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Distributed training\n",
"\n",
"* We will train the model by using TFJob to run a distributed training job"
]
},
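{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a TFJob, each replica learns its role from the `TF_CONFIG` environment variable, which the TFJob operator sets in every pod and which `tf.estimator.RunConfig` (used by `train_and_evaluate`) parses. The sketch below shows how a replica could inspect it. The example value is illustrative only: the host names and port `2222` are assumptions, and in a real job the operator generates the value from the `Ps`, `Chief`, and `Worker` replica specs.\n",
"\n",
"```python\n",
"import json\n",
"import os\n",
"\n",
"# Illustrative value only; inside a TFJob pod the operator sets TF_CONFIG itself.\n",
"os.environ.setdefault('TF_CONFIG', json.dumps({\n",
"    'cluster': {\n",
"        'chief': ['mnist-train-abcd-chief-0:2222'],\n",
"        'ps': ['mnist-train-abcd-ps-0:2222'],\n",
"        'worker': ['mnist-train-abcd-worker-0:2222', 'mnist-train-abcd-worker-1:2222'],\n",
"    },\n",
"    'task': {'type': 'worker', 'index': 1},\n",
"}))\n",
"\n",
"\n",
"def describe_replica():\n",
"    # Return this replica's role, its index, and the number of replicas per role.\n",
"    tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))\n",
"    cluster = tf_config.get('cluster', {})\n",
"    task = tf_config.get('task', {})\n",
"    sizes = {role: len(hosts) for role, hosts in cluster.items()}\n",
"    return task.get('type'), task.get('index'), sizes\n",
"\n",
"\n",
"role, index, sizes = describe_replica()\n",
"print(f'This replica is {role} #{index}; cluster sizes: {sizes}')\n",
"```"
]
},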
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Training job parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_name = f\"mnist-train-{uuid.uuid4().hex[:4]}\"\n",
"num_ps = 1\n",
"num_workers = 2\n",
"model_dir = f\"s3://{mnist_bucket}/mnist\"\n",
"export_path = f\"s3://{mnist_bucket}/mnist/export\" \n",
"train_steps = 200\n",
"batch_size = 100\n",
"learning_rate = .01\n",
"image = cluster_builder.image_tag"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_spec = f\"\"\"apiVersion: kubeflow.org/v1\n",
"kind: TFJob\n",
"metadata:\n",
"  name: {train_name}\n",
"spec:\n",
"  tfReplicaSpecs:\n",
"    Ps:\n",
"      replicas: {num_ps}\n",
"      template:\n",
"        metadata:\n",
"          annotations:\n",
"            sidecar.istio.io/inject: \"false\"\n",
"        spec:\n",
"          serviceAccount: default-editor\n",
"          containers:\n",
"          - name: tensorflow\n",
"            command:\n",
"            - python\n",
"            - /opt/model.py\n",
"            - --tf-model-dir={model_dir}\n",
"            - --tf-export-dir={export_path}\n",
"            - --tf-train-steps={train_steps}\n",
"            - --tf-batch-size={batch_size}\n",
"            - --tf-learning-rate={learning_rate}\n",
"            env:\n",
"            - name: S3_ENDPOINT\n",
"              value: {s3_endpoint}\n",
"            - name: AWS_ENDPOINT_URL\n",
"              value: {minio_endpoint}\n",
"            - name: AWS_REGION\n",
"              value: {minio_region}\n",
"            - name: BUCKET_NAME\n",
"              value: {mnist_bucket}\n",
"            - name: S3_USE_HTTPS\n",
"              value: \"0\"\n",
"            - name: S3_VERIFY_SSL\n",
"              value: \"0\"\n",
"            - name: AWS_ACCESS_KEY_ID\n",
"              value: {minio_username}\n",
"            - name: AWS_SECRET_ACCESS_KEY\n",
"              value: {minio_key}\n",
"            image: {image}\n",
"            workingDir: /opt\n",
"      restartPolicy: OnFailure\n",
"    Chief:\n",
"      replicas: 1\n",
"      template:\n",
"        metadata:\n",
"          annotations:\n",
"            sidecar.istio.io/inject: \"false\"\n",
"        spec:\n",
"          serviceAccount: default-editor\n",
"          containers:\n",
"          - name: tensorflow\n",
"            command:\n",
"            - python\n",
"            - /opt/model.py\n",
"            - --tf-model-dir={model_dir}\n",
"            - --tf-export-dir={export_path}\n",
"            - --tf-train-steps={train_steps}\n",
"            - --tf-batch-size={batch_size}\n",
"            - --tf-learning-rate={learning_rate}\n",
"            env:\n",
"            - name: S3_ENDPOINT\n",
"              value: {s3_endpoint}\n",
"            - name: AWS_ENDPOINT_URL\n",
"              value: {minio_endpoint}\n",
"            - name: AWS_REGION\n",
"              value: {minio_region}\n",
"            - name: BUCKET_NAME\n",
"              value: {mnist_bucket}\n",
"            - name: S3_USE_HTTPS\n",
"              value: \"0\"\n",
"            - name: S3_VERIFY_SSL\n",
"              value: \"0\"\n",
"            - name: AWS_ACCESS_KEY_ID\n",
"              value: {minio_username}\n",
"            - name: AWS_SECRET_ACCESS_KEY\n",
"              value: {minio_key}\n",
"            image: {image}\n",
"            workingDir: /opt\n",
"      restartPolicy: OnFailure\n",
"    Worker:\n",
"      replicas: {num_workers}\n",
"      template:\n",
"        metadata:\n",
"          annotations:\n",
"            sidecar.istio.io/inject: \"false\"\n",
"        spec:\n",
"          serviceAccount: default-editor\n",
"          containers:\n",
"          - name: tensorflow\n",
"            command:\n",
"            - python\n",
"            - /opt/model.py\n",
"            - --tf-model-dir={model_dir}\n",
"            - --tf-export-dir={export_path}\n",
"            - --tf-train-steps={train_steps}\n",
"            - --tf-batch-size={batch_size}\n",
"            - --tf-learning-rate={learning_rate}\n",
"            env:\n",
"            - name: S3_ENDPOINT\n",
"              value: {s3_endpoint}\n",
"            - name: AWS_ENDPOINT_URL\n",
"              value: {minio_endpoint}\n",
"            - name: AWS_REGION\n",
"              value: {minio_region}\n",
"            - name: BUCKET_NAME\n",
"              value: {mnist_bucket}\n",
"            - name: S3_USE_HTTPS\n",
"              value: \"0\"\n",
"            - name: S3_VERIFY_SSL\n",
"              value: \"0\"\n",
"            - name: AWS_ACCESS_KEY_ID\n",
"              value: {minio_username}\n",
"            - name: AWS_SECRET_ACCESS_KEY\n",
"              value: {minio_key}\n",
"            image: {image}\n",
"            workingDir: /opt\n",
"      restartPolicy: OnFailure\n",
"\"\"\""
]
},
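{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three replica specs above differ only in their replica counts. As a hedged alternative (not what this notebook uses), the same structure can be built as a Python dict from one shared container template, so the three copies cannot drift apart. The helper name `make_tfjob` is hypothetical; the field names follow the TFJob spec above.\n",
"\n",
"```python\n",
"import copy\n",
"\n",
"\n",
"def make_tfjob(name, image, args, env, num_ps=1, num_workers=2):\n",
"    # One container definition shared by every replica type.\n",
"    container = {\n",
"        'name': 'tensorflow',\n",
"        'command': ['python', '/opt/model.py'] + list(args),\n",
"        'env': [{'name': k, 'value': str(v)} for k, v in env.items()],\n",
"        'image': image,\n",
"        'workingDir': '/opt',\n",
"    }\n",
"    template = {\n",
"        'metadata': {'annotations': {'sidecar.istio.io/inject': 'false'}},\n",
"        'spec': {'serviceAccount': 'default-editor', 'containers': [container]},\n",
"    }\n",
"    counts = {'Ps': num_ps, 'Chief': 1, 'Worker': num_workers}\n",
"    return {\n",
"        'apiVersion': 'kubeflow.org/v1',\n",
"        'kind': 'TFJob',\n",
"        'metadata': {'name': name},\n",
"        'spec': {'tfReplicaSpecs': {\n",
"            # Deep-copy so each replica type gets its own independent template.\n",
"            role: {'replicas': n, 'restartPolicy': 'OnFailure',\n",
"                   'template': copy.deepcopy(template)}\n",
"            for role, n in counts.items()\n",
"        }},\n",
"    }\n",
"\n",
"\n",
"body = make_tfjob('mnist-train-abcd', 'example/mnist:latest',\n",
"                  args=['--tf-train-steps=200'], env={'S3_USE_HTTPS': '0'})\n",
"```\n",
"\n",
"A dict built this way could be passed to `tf_job_client.create(body, namespace=namespace)` in place of `yaml.safe_load(train_spec)`."
]
},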
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Create the training job\n",
"\n",
"* You could write the spec to a YAML file and then run `kubectl apply -f {FILE}`\n",
"* Since you are running in Jupyter, you will use the TFJob Python client instead\n",
"* You will run the TFJob in a namespace created by a Kubeflow profile\n",
"  * The namespace will be the same namespace you are running the notebook in\n",
"  * Creating a profile ensures the namespace is provisioned with service accounts and other resources needed for Kubeflow"
]
},
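{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the `kubectl apply` route mentioned above could look like the sketch below. It assumes `kubectl` is installed and configured for your cluster, which is not guaranteed inside a notebook server. The helper name `write_and_apply` and the file name `tfjob.yaml` are hypothetical.\n",
"\n",
"```python\n",
"import subprocess\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def write_and_apply(spec_yaml, namespace, dry_run=True):\n",
"    # Write the rendered spec to a temporary file, then build the kubectl command.\n",
"    path = Path(tempfile.mkdtemp()) / 'tfjob.yaml'\n",
"    path.write_text(spec_yaml)\n",
"    cmd = ['kubectl', 'apply', '-n', namespace, '-f', str(path)]\n",
"    if not dry_run:\n",
"        # Raises CalledProcessError if kubectl exits with a non-zero status.\n",
"        subprocess.run(cmd, check=True)\n",
"    return path, cmd\n",
"\n",
"\n",
"path, cmd = write_and_apply('kind: TFJob', 'anonymous')\n",
"```\n",
"\n",
"With `dry_run=True` the helper only writes the file and returns the command. In the notebook you would pass `train_spec` and `namespace`, with `dry_run=False`, to actually run `kubectl apply`."
]
},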
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tf_job_client = tf_job_client_module.TFJobClient()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tf_job_body = yaml.safe_load(train_spec)\n",
"tf_job = tf_job_client.create(tf_job_body, namespace=namespace) \n",
"\n",
"logging.info(f\"Created job {namespace}.{train_name}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from kubeflow.tfjob import TFJobClient\n",
"tfjob_client = TFJobClient()\n",
"tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Get TF Job logs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tfjob_client.get_logs(train_name, namespace=namespace)"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Check the model in Minio"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO(swiftdiaries): Check object key for model specifically\n",
"from botocore.exceptions import ClientError\n",
"\n",
"# Objects are listed under the export prefix inside the bucket (see export_path).\n",
"export_prefix = export_path.replace(f\"s3://{mnist_bucket}/\", \"\", 1)\n",
"try:\n",
"    model_response = minio_uploader.client.list_objects(Bucket=mnist_bucket, Prefix=export_prefix)\n",
"    # \"Contents\" is only present when at least one object matches the prefix.\n",
"    if model_response.get(\"Contents\"):\n",
"        logging.info(f\"Exported model found under {export_path}\")\n",
"    else:\n",
"        logging.warning(f\"No objects found under {export_path}\")\n",
"except ClientError as err:\n",
"    logging.error(err)"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Deploy Tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb_name = \"mnist-tensorboard\"\n",
"tb_deploy = f\"\"\"apiVersion: apps/v1\n",
"kind: Deployment\n",
"metadata:\n",
"  labels:\n",
"    app: mnist-tensorboard\n",
"  name: {tb_name}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  selector:\n",
"    matchLabels:\n",
"      app: mnist-tensorboard\n",
"  template:\n",
"    metadata:\n",
"      labels:\n",
"        app: mnist-tensorboard\n",
"        version: v1\n",
"    spec:\n",
"      serviceAccount: default-editor\n",
"      containers:\n",
"      - command:\n",
"        - /usr/local/bin/tensorboard\n",
"        - --logdir={model_dir}\n",
"        - --port=80\n",
"        image: tensorflow/tensorflow:1.15.2-py3\n",
"        env:\n",
"        - name: S3_ENDPOINT\n",
"          value: {s3_endpoint}\n",
"        - name: AWS_ENDPOINT_URL\n",
"          value: {minio_endpoint}\n",
"        - name: AWS_REGION\n",
"          value: {minio_region}\n",
"        - name: BUCKET_NAME\n",
"          value: {mnist_bucket}\n",
"        - name: S3_USE_HTTPS\n",
"          value: \"0\"\n",
"        - name: S3_VERIFY_SSL\n",
"          value: \"0\"\n",
"        - name: AWS_ACCESS_KEY_ID\n",
"          value: {minio_username}\n",
"        - name: AWS_SECRET_ACCESS_KEY\n",
"          value: {minio_key}\n",
"        name: tensorboard\n",
"        ports:\n",
"        - containerPort: 80\n",
"\"\"\"\n",
"tb_service = f\"\"\"apiVersion: v1\n",
"kind: Service\n",
"metadata:\n",
"  labels:\n",
"    app: mnist-tensorboard\n",
"  name: {tb_name}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  ports:\n",
"  - name: http-tb\n",
"    port: 80\n",
"    targetPort: 80\n",
"  selector:\n",
"    app: mnist-tensorboard\n",
"  type: ClusterIP\n",
"\"\"\"\n",
"\n",
"tb_virtual_service = f\"\"\"apiVersion: networking.istio.io/v1alpha3\n",
"kind: VirtualService\n",
"metadata:\n",
"  name: {tb_name}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  gateways:\n",
"  - kubeflow/kubeflow-gateway\n",
"  hosts:\n",
"  - '*'\n",
"  http:\n",
"  - match:\n",
"    - uri:\n",
"        prefix: /mnist/{namespace}/tensorboard/\n",
"    rewrite:\n",
"      uri: /\n",
"    route:\n",
"    - destination:\n",
"        host: {tb_name}.{namespace}.svc.cluster.local\n",
"        port:\n",
"          number: 80\n",
"    timeout: 300s\n",
"\"\"\"\n",
"\n",
"tb_specs = [tb_deploy, tb_service, tb_virtual_service]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"k8s_util.apply_k8s_specs(tb_specs, k8s_util.K8S_CREATE_OR_REPLACE)"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Get Tensorboard URL"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Run the next cell with RBAC permissions to read the `istio-ingressgateway` service in the `istio-system` namespace. <br>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"istio_ingress_endpoint = None\n",
"try:\n",
"    istio_ingress_endpoint = api_client.read_namespaced_service(name='istio-ingressgateway', namespace='istio-system')\n",
"    for istio_port in istio_ingress_endpoint.spec.ports:\n",
"        if istio_port.name == \"http2\":\n",
"            logging.warning(\"Get <worker-node-ip> by running 'kubectl get nodes -o wide'\")\n",
"            logging.info(f\"Tensorboard URL: <worker-node-ip>:{istio_port.node_port}/mnist/{namespace}/tensorboard/\")\n",
"except ApiException as e:\n",
"    if e.status == 403:\n",
"        logging.warning(\"The service account doesn't have sufficient privileges \"\n",
"                        \"to get the istio-ingressgateway service. \"\n",
"                        \"To make this cell work, ask someone with cluster \"\n",
"                        \"privileges to create an appropriate \"\n",
"                        \"rolebinding by running:\\n\"\n",
"                        \"kubectl create --namespace=istio-system rolebinding \"\n",
"                        \"--clusterrole=kubeflow-view \"\n",
"                        f\"--serviceaccount={namespace}:default-editor \"\n",
"                        f\"{namespace}-ingressgateway-view\")\n",
"    else:\n",
"        logging.error(f\"Failed to read istio-ingressgateway: {e.reason}\")\n",
"    logging.warning(\"Could not look up the ingress port. Please get the URL by running the kubectl commands at the end of the notebook\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Serve the model\n",
"\n",
"* Deploy the model using TensorFlow Serving\n",
"* We need to create\n",
"  1. A Kubernetes Deployment\n",
"  1. A Kubernetes Service\n",
"  1. A ConfigMap containing the Prometheus monitoring config, which the Deployment mounts at `/var/config/`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"deploy_name = \"mnist-model\"\n",
"model_base_path = export_path\n",
"\n",
"# The web UI defaults to mnist-service, so if you change it you will\n",
"# need to change it in the UI as well to send predictions to the model.\n",
"model_service = \"mnist-service\"\n",
"\n",
"deploy_spec = f\"\"\"apiVersion: apps/v1\n",
"kind: Deployment\n",
"metadata:\n",
"  labels:\n",
"    app: mnist\n",
"  name: {deploy_name}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  selector:\n",
"    matchLabels:\n",
"      app: mnist-model\n",
"  template:\n",
"    metadata:\n",
"      # TODO(jlewi): Right now we disable the istio side car because otherwise ISTIO rbac will prevent the\n",
"      # UI from sending RPCs to the server. We should create an appropriate ISTIO rbac authorization\n",
"      # policy to allow traffic from the UI to the model server.\n",
"      # https://istio.io/docs/concepts/security/#target-selectors\n",
"      annotations:\n",
"        sidecar.istio.io/inject: \"false\"\n",
"      labels:\n",
"        app: mnist-model\n",
"        version: v1\n",
"    spec:\n",
"      serviceAccount: default-editor\n",
"      containers:\n",
"      - args:\n",
"        - --port=9000\n",
"        - --rest_api_port=8500\n",
"        - --model_name=mnist\n",
"        - --model_base_path={model_base_path}\n",
"        - --monitoring_config_file=/var/config/monitoring_config.txt\n",
"        command:\n",
"        - /usr/bin/tensorflow_model_server\n",
"        env:\n",
"        - name: modelBasePath\n",
"          value: {model_base_path}\n",
"        - name: S3_ENDPOINT\n",
"          value: {s3_endpoint}\n",
"        - name: AWS_ENDPOINT_URL\n",
"          value: {minio_endpoint}\n",
"        - name: AWS_REGION\n",
"          value: {minio_region}\n",
"        - name: BUCKET_NAME\n",
"          value: {mnist_bucket}\n",
"        - name: S3_USE_HTTPS\n",
"          value: \"0\"\n",
"        - name: S3_VERIFY_SSL\n",
"          value: \"0\"\n",
"        - name: AWS_ACCESS_KEY_ID\n",
"          value: {minio_username}\n",
"        - name: AWS_SECRET_ACCESS_KEY\n",
"          value: {minio_key}\n",
"        image: tensorflow/serving:1.15.0\n",
"        imagePullPolicy: IfNotPresent\n",
"        livenessProbe:\n",
"          initialDelaySeconds: 30\n",
"          periodSeconds: 30\n",
"          tcpSocket:\n",
"            port: 9000\n",
"        name: mnist\n",
"        ports:\n",
"        - containerPort: 9000\n",
"        - containerPort: 8500\n",
"        resources:\n",
"          limits:\n",
"            cpu: \"4\"\n",
"            memory: 4Gi\n",
"          requests:\n",
"            cpu: \"1\"\n",
"            memory: 1Gi\n",
"        volumeMounts:\n",
"        - mountPath: /var/config/\n",
"          name: model-config\n",
"      volumes:\n",
"      - configMap:\n",
"          name: {deploy_name}\n",
"        name: model-config\n",
"\"\"\"\n",
"\n",
"service_spec = f\"\"\"apiVersion: v1\n",
"kind: Service\n",
"metadata:\n",
"  annotations:\n",
"    prometheus.io/path: /monitoring/prometheus/metrics\n",
"    prometheus.io/port: \"8500\"\n",
"    prometheus.io/scrape: \"true\"\n",
"  labels:\n",
"    app: mnist-model\n",
"  name: {model_service}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  ports:\n",
"  - name: grpc-tf-serving\n",
"    port: 9000\n",
"    targetPort: 9000\n",
"  - name: http-tf-serving\n",
"    port: 8500\n",
"    targetPort: 8500\n",
"  selector:\n",
"    app: mnist-model\n",
"  type: ClusterIP\n",
"\"\"\"\n",
"\n",
"monitoring_config = f\"\"\"kind: ConfigMap\n",
"apiVersion: v1\n",
"metadata:\n",
"  name: {deploy_name}\n",
"  namespace: {namespace}\n",
"data:\n",
"  monitoring_config.txt: |-\n",
"    prometheus_config: {{\n",
"      enable: true,\n",
"      path: \"/monitoring/prometheus/metrics\"\n",
"    }}\n",
"\"\"\"\n",
"\n",
"model_specs = [deploy_spec, service_spec, monitoring_config]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"k8s_util.apply_k8s_specs(model_specs, k8s_util.K8S_CREATE_OR_REPLACE) "
]
},
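{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the Deployment is ready, the model can be queried through TensorFlow Serving's REST predict API on port 8500 of `mnist-service`. The sketch below only builds the request. It assumes the exported serving signature accepts a flat list of 784 pixel values per instance; that depends on the signatures defined in model.py, so check the actual input name and shape (for example via the `/v1/models/mnist/metadata` endpoint) before relying on it. The helper name `build_predict_request` is hypothetical.\n",
"\n",
"```python\n",
"import json\n",
"import urllib.request\n",
"\n",
"\n",
"def build_predict_request(namespace, pixels, service='mnist-service', model='mnist', port=8500):\n",
"    # TF Serving REST endpoint: POST /v1/models/<model>:predict\n",
"    url = f'http://{service}.{namespace}.svc.cluster.local:{port}/v1/models/{model}:predict'\n",
"    # Assumed input format: one instance per image, 784 floats.\n",
"    payload = json.dumps({'instances': [pixels]}).encode('utf-8')\n",
"    return urllib.request.Request(url, data=payload,\n",
"                                  headers={'Content-Type': 'application/json'}, method='POST')\n",
"\n",
"\n",
"req = build_predict_request('anonymous', [0.0] * 784)\n",
"# From a pod inside the cluster, the request could then be sent with:\n",
"# with urllib.request.urlopen(req, timeout=10) as resp:\n",
"#     print(json.loads(resp.read()))\n",
"```"
]
},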
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Deploy the mnist UI\n",
"\n",
"* We will now deploy the UI to visualize the mnist results\n",
"* Note: This uses a prebuilt, public Docker image for the UI"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ui_name = \"mnist-ui\"\n",
"ui_deploy = f\"\"\"apiVersion: apps/v1\n",
"kind: Deployment\n",
"metadata:\n",
"  name: {ui_name}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  replicas: 1\n",
"  selector:\n",
"    matchLabels:\n",
"      app: mnist-web-ui\n",
"  template:\n",
"    metadata:\n",
"      labels:\n",
"        app: mnist-web-ui\n",
"    spec:\n",
"      containers:\n",
"      - image: gcr.io/kubeflow-examples/mnist/web-ui:v20190112-v0.2-142-g3b38225\n",
"        name: web-ui\n",
"        ports:\n",
"        - containerPort: 5000\n",
"      serviceAccount: default-editor\n",
"\"\"\"\n",
"\n",
"ui_service = f\"\"\"apiVersion: v1\n",
"kind: Service\n",
"metadata:\n",
"  name: {ui_name}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  ports:\n",
"  - name: http-mnist-ui\n",
"    port: 80\n",
"    targetPort: 5000\n",
"  selector:\n",
"    app: mnist-web-ui\n",
"  type: ClusterIP\n",
"\"\"\"\n",
"\n",
"ui_virtual_service = f\"\"\"apiVersion: networking.istio.io/v1alpha3\n",
"kind: VirtualService\n",
"metadata:\n",
"  name: {ui_name}\n",
"  namespace: {namespace}\n",
"spec:\n",
"  gateways:\n",
"  - kubeflow/kubeflow-gateway\n",
"  hosts:\n",
"  - '*'\n",
"  http:\n",
"  - match:\n",
"    - uri:\n",
"        prefix: /mnist/{namespace}/ui/\n",
"    rewrite:\n",
"      uri: /\n",
"    route:\n",
"    - destination:\n",
"        host: {ui_name}.{namespace}.svc.cluster.local\n",
"        port:\n",
"          number: 80\n",
"    timeout: 300s\n",
"\"\"\"\n",
"\n",
"ui_specs = [ui_deploy, ui_service, ui_virtual_service]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"k8s_util.apply_k8s_specs(ui_specs, k8s_util.K8S_CREATE_OR_REPLACE) "
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Access the web UI\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"istio_ingress_endpoint = None\n",
"try:\n",
"    istio_ingress_endpoint = api_client.read_namespaced_service(name='istio-ingressgateway', namespace='istio-system')\n",
"    for istio_port in istio_ingress_endpoint.spec.ports:\n",
"        if istio_port.name == \"http2\":\n",
"            logging.warning(\"Get <worker-node-ip> by running 'kubectl get nodes -o wide'\")\n",
"            logging.info(f\"mnist web UI URL: <worker-node-ip>:{istio_port.node_port}/mnist/{namespace}/ui/\")\n",
"except ApiException as e:\n",
"    if e.status == 403:\n",
"        logging.warning(\"The service account doesn't have sufficient privileges \"\n",
"                        \"to get the istio-ingressgateway service. \"\n",
"                        \"To make this cell work, ask someone with cluster \"\n",
"                        \"privileges to create an appropriate \"\n",
"                        \"rolebinding by running:\\n\"\n",
"                        \"kubectl create --namespace=istio-system rolebinding \"\n",
"                        \"--clusterrole=kubeflow-view \"\n",
"                        f\"--serviceaccount={namespace}:default-editor \"\n",
"                        f\"{namespace}-ingressgateway-view\")\n",
"    else:\n",
"        logging.error(f\"Failed to read istio-ingressgateway: {e.reason}\")\n",
"    logging.warning(\"Could not look up the ingress port. Please get the URL by running the kubectl commands at the end of the notebook\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Get the Tensorboard URL with kubectl\n",
"\n",
"If the cells above could not read the ingress gateway service, run these commands from a terminal with the appropriate RBAC permissions. <br>\n",
"**Note:** You can get the worker node IP from `kubectl get nodes -o wide`. <br>\n",
"```bash\n",
"export NAMESPACE=<your-namespace>\n",
"export INGRESS_HOST=<worker-node-ip>\n",
"export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name==\"http2\")].nodePort}')\n",
"printf \"Tensorboard URL: \\n${INGRESS_HOST}:${INGRESS_PORT}/mnist/${NAMESPACE}/tensorboard/\\n\"\n",
"```"
]
},
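{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `jsonpath` expression above selects the `nodePort` of the port named `http2`. The sketch below shows the same lookup in plain Python over a Service object shaped like the output of `kubectl get service istio-ingressgateway -n istio-system -o json`. The sample dict is a trimmed, illustrative stand-in for real output, and its port numbers are made up. The helper names are hypothetical.\n",
"\n",
"```python\n",
"def http2_node_port(service):\n",
"    # Same selection as the jsonpath filter: the nodePort of the port named http2.\n",
"    for port in service.get('spec', {}).get('ports', []):\n",
"        if port.get('name') == 'http2':\n",
"            return port.get('nodePort')\n",
"    return None\n",
"\n",
"\n",
"def tensorboard_url(node_ip, node_port, namespace):\n",
"    # Matches the VirtualService prefix /mnist/<namespace>/tensorboard/\n",
"    return f'http://{node_ip}:{node_port}/mnist/{namespace}/tensorboard/'\n",
"\n",
"\n",
"sample = {'spec': {'ports': [{'name': 'status-port', 'nodePort': 31000},\n",
"                             {'name': 'http2', 'nodePort': 31380}]}}\n",
"print(tensorboard_url('10.0.0.5', http2_node_port(sample), 'anonymous'))\n",
"```"
]
},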
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Get the web UI URL with kubectl\n",
"\n",
"If the cells above could not read the ingress gateway service, run these commands from a terminal with the appropriate RBAC permissions. <br>\n",
"**Note:** You can get the worker node IP from `kubectl get nodes -o wide`. <br>\n",
"```bash\n",
"export NAMESPACE=<your-namespace>\n",
"export INGRESS_HOST=<worker-node-ip>\n",
"export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name==\"http2\")].nodePort}')\n",
"printf \"mnist-web-app URL: \\n${INGRESS_HOST}:${INGRESS_PORT}/mnist/${NAMESPACE}/ui/\\n\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}