mirror of https://github.com/kubeflow/examples.git
Kaggle notebook to kfp pipeline (#940)
* Create README.md
* kaggle to kfp
* Create README.md
* Update README.md
* Add files via upload
* Update README.md
* Update README.md
* Update README.md
* kaggle to kfp
This commit is contained in:
parent 68477e3bbf
commit 639f84a9d6
@ -0,0 +1,30 @@
# Objective

Here we convert the https://www.kaggle.com/competitions/digit-recognizer code to a kfp pipeline.

The objective of this task is to correctly identify digits from a dataset of tens of thousands of handwritten images.

# Testing environment

| Name          | Version       |
| ------------- |:-------------:|
| Kubeflow      | v1            |
| kfp           | 1.8.11        |

The kfp version used for testing can be installed with `pip install kfp==1.8.11`.

# Components used

## Kubeflow lightweight component method

Here, a Python function is created to carry out a certain task, and that function is passed to the kfp method `create_component_from_func`, which turns it into a pipeline component, as shown below.
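
A minimal sketch of the pattern, assuming kfp 1.8.x (the `add_one` function is a made-up example, not one of this pipeline's components):

```python
import kfp.components as comp

# any self-contained Python function can serve as a component body
def add_one(x: int) -> int:
    return x + 1

# wrap the function into a reusable, containerized pipeline component
add_one_op = comp.create_component_from_func(add_one, base_image="python:3.7.1")
```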

## Kubeflow pipelines

Kubeflow Pipelines connects the components in the order they are passed and builds them into a pipeline. The kfp `dsl.pipeline` decorator was used to create the pipeline function, and the kfp component annotations `InputPath` and `OutputPath` were used to pass data between components.

Finally, `create_run_from_pipeline_func` was used to submit the pipeline directly from the pipeline function, as sketched below.
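
A condensed sketch of how these pieces fit together, assuming kfp 1.8.x; the component bodies are stubbed out here, and the full implementations live in the notebook:

```python
import kfp
import kfp.dsl as dsl
import kfp.components as comp
from kfp.components import InputPath, OutputPath

def download_data(download_link: str, data_path: OutputPath(str)):
    ...  # download the zipped csv files and extract them under data_path

def load_data(data_path: InputPath(str), load_data_path: OutputPath(str)):
    ...  # read from data_path, write the combined dataset under load_data_path

download_op = comp.create_component_from_func(download_data, base_image="python:3.7.1")
load_op = comp.create_component_from_func(load_data, base_image="python:3.7.1")

@dsl.pipeline(name="digit-recognizer-pipeline",
              description="Performs preprocessing, training and prediction of digits")
def digit_recognize_pipeline(download_link: str):
    download_container = download_op(download_link)
    # the OutputPath of one step feeds the InputPath of the next
    load_container = load_op(download_container.output)

# submit the pipeline directly from the pipeline function
client = kfp.Client()
client.create_run_from_pipeline_func(digit_recognize_pipeline,
                                     arguments={"download_link": "<link to the zipped csv files>"})
```

The notebook also compiles the pipeline to a compressed YAML definition with `kfp.compiler.Compiler().compile(...)` before submitting the run.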
## To create pipeline

1. Navigate to the `data` directory, download the compressed Kaggle data, and place the zipped `train.csv.zip` and `test.csv.zip` files in the data folder.
2. Open your Kubeflow cluster, create a notebook server, and connect to it.
3. Clone this repo and navigate to this directory.
4. Run the kfp-digit-recognizer notebook from start to finish.
5. View run details immediately after submitting the pipeline.

@ -0,0 +1,16 @@

# Objective

Download the compressed data from https://www.kaggle.com/competitions/digit-recognizer/data and store it here.

Replace the download link with the repo link where the data is stored: https://github-repo/data-dir/{file}.csv.zip?raw=true
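
For reference, the notebook's download component fills in the `{file}` placeholder once per file; the URL below is only the illustrative pattern from above, not a real location:

```python
download_link = "https://github-repo/data-dir/{file}.csv.zip?raw=true"  # placeholder pattern
print(download_link.format(file="train"))  # .../train.csv.zip?raw=true
print(download_link.format(file="test"))   # .../test.csv.zip?raw=true
```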

<p>
<img src="https://github.com/josepholaide/examples/blob/master/digit_recognition/data/img1.PNG?raw=true" alt="kubeflow pipeline" width="850" height="250"/>
</p>

# Testing environment

| Name          | Version       |
| ------------- |:-------------:|
| Kubeflow      | v1            |
| kfp           | 1.8.11        |

The kfp version used for testing can be installed with `pip install kfp==1.8.11`.
@ -0,0 +1,719 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"# Digit Recognizer Kubeflow Pipeline\n",
|
||||
"\n",
|
||||
"In this [Kaggle competition](https://www.kaggle.com/competitions/digit-recognizer/overview) \n",
|
||||
"\n",
|
||||
">MNIST (\"Modified National Institute of Standards and Technology\") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.\n",
|
||||
"\n",
|
||||
">In this competition, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"# Install relevant libraries\n",
|
||||
"\n",
|
||||
"\n",
|
||||
">Update pip `pip install --user --upgrade pip`\n",
|
||||
"\n",
|
||||
">Install and upgrade kubeflow sdk `pip install kfp --upgrade --user --quiet`\n",
|
||||
"\n",
|
||||
"You may need to restart your notebook kernel after installing the kfp sdk"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"id": "38y13drotnXK",
|
||||
"outputId": "61184254-57cb-4f29-c0e5-dba30df0c914",
|
||||
"tags": [
|
||||
"imports"
|
||||
]
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Requirement already satisfied: pip in /usr/local/lib/python3.6/dist-packages (21.3.1)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!pip install --user --upgrade pip"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"id": "6a8V8LN9ttJT"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install kfp --upgrade --user --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"id": "taqx2u69tnXS",
|
||||
"outputId": "32ec75b1-7d1e-434e-8834-aa841fecd4d5"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Name: kfp\n",
|
||||
"Version: 1.8.11\n",
|
||||
"Summary: KubeFlow Pipelines SDK\n",
|
||||
"Home-page: https://github.com/kubeflow/pipelines\n",
|
||||
"Author: The Kubeflow Authors\n",
|
||||
"Author-email: \n",
|
||||
"License: UNKNOWN\n",
|
||||
"Location: /home/jovyan/.local/lib/python3.6/site-packages\n",
|
||||
"Requires: absl-py, click, cloudpickle, dataclasses, Deprecated, docstring-parser, fire, google-api-python-client, google-auth, google-cloud-storage, jsonschema, kfp-pipeline-spec, kfp-server-api, kubernetes, protobuf, pydantic, PyYAML, requests-toolbelt, strip-hints, tabulate, typer, typing-extensions, uritemplate\n",
|
||||
"Required-by: kubeflow-kale\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# confirm the kfp sdk\n",
|
||||
"! pip show kfp"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "4B2CjdRcuRgT"
|
||||
},
|
||||
"source": [
|
||||
"## Import kubeflow pipeline libraries"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"id": "_8P4-rCDtnXT"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import kfp\n",
|
||||
"import kfp.components as comp\n",
|
||||
"import kfp.dsl as dsl\n",
|
||||
"from kfp.components import InputPath, OutputPath\n",
|
||||
"from typing import NamedTuple\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "NWPLyw_DuzSl"
|
||||
},
|
||||
"source": [
|
||||
"## Kubeflow pipeline component creation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "98g9LoIcuaPB"
|
||||
},
|
||||
"source": [
|
||||
"Component 1: Download the digits Dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"id": "lGK3hlXdtnXV"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# download data step\n",
|
||||
"def download_data(download_link: str, data_path: OutputPath(str)):\n",
|
||||
" import zipfile\n",
|
||||
" import sys, subprocess;\n",
|
||||
" subprocess.run([\"python\", \"-m\", \"pip\", \"install\", \"--upgrade\", \"pip\"])\n",
|
||||
" subprocess.run([sys.executable, \"-m\", \"pip\", \"install\", \"wget\"])\n",
|
||||
" import wget\n",
|
||||
" import os\n",
|
||||
"\n",
|
||||
" if not os.path.exists(data_path):\n",
|
||||
" os.makedirs(data_path)\n",
|
||||
"\n",
|
||||
" # download files\n",
|
||||
" wget.download(download_link.format(file='train'), f'{data_path}/train_csv.zip')\n",
|
||||
" wget.download(download_link.format(file='test'), f'{data_path}/test_csv.zip')\n",
|
||||
" \n",
|
||||
" with zipfile.ZipFile(f\"{data_path}/train_csv.zip\",\"r\") as zip_ref:\n",
|
||||
" zip_ref.extractall(data_path)\n",
|
||||
" \n",
|
||||
" with zipfile.ZipFile(f\"{data_path}/test_csv.zip\",\"r\") as zip_ref:\n",
|
||||
" zip_ref.extractall(data_path)\n",
|
||||
" \n",
|
||||
" return(print('Done!'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "J9Yhdk3xudcn"
|
||||
},
|
||||
"source": [
|
||||
"Component 2: load the digits Dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"id": "oEdsQpH2tnXX"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load data\n",
|
||||
"\n",
|
||||
"def load_data(data_path: InputPath(str), \n",
|
||||
" load_data_path: OutputPath(str)):\n",
|
||||
" \n",
|
||||
" # import Library\n",
|
||||
" import sys, subprocess;\n",
|
||||
" subprocess.run([\"python\", \"-m\", \"pip\", \"install\", \"--upgrade\", \"pip\"])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','pandas'])\n",
|
||||
" # import Library\n",
|
||||
" import os, pickle;\n",
|
||||
" import pandas as pd\n",
|
||||
" import numpy as np\n",
|
||||
"\n",
|
||||
" #importing the data\n",
|
||||
" # Data Path\n",
|
||||
" train_data_path = data_path + '/train.csv'\n",
|
||||
" test_data_path = data_path + '/test.csv'\n",
|
||||
"\n",
|
||||
" # Loading dataset into pandas \n",
|
||||
" train_df = pd.read_csv(train_data_path)\n",
|
||||
" test_df = pd.read_csv(test_data_path)\n",
|
||||
" \n",
|
||||
" # join train and test together\n",
|
||||
" ntrain = train_df.shape[0]\n",
|
||||
" ntest = test_df.shape[0]\n",
|
||||
" all_data = pd.concat((train_df, test_df)).reset_index(drop=True)\n",
|
||||
" print(\"all_data size is : {}\".format(all_data.shape))\n",
|
||||
" \n",
|
||||
" #creating the preprocess directory\n",
|
||||
" os.makedirs(load_data_path, exist_ok = True)\n",
|
||||
" \n",
|
||||
" #Save the combined_data as a pickle file to be used by the preprocess component.\n",
|
||||
" with open(f'{load_data_path}/all_data', 'wb') as f:\n",
|
||||
" pickle.dump((ntrain, all_data), f)\n",
|
||||
" \n",
|
||||
" return(print('Done!'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"Component 3: Preprocess the digits Dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# preprocess data\n",
|
||||
"\n",
|
||||
"def preprocess_data(load_data_path: InputPath(str), \n",
|
||||
" preprocess_data_path: OutputPath(str)):\n",
|
||||
" \n",
|
||||
" # import Library\n",
|
||||
" import sys, subprocess;\n",
|
||||
" subprocess.run([\"python\", \"-m\", \"pip\", \"install\", \"--upgrade\", \"pip\"])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','pandas'])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','scikit-learn'])\n",
|
||||
" import os, pickle;\n",
|
||||
" import pandas as pd\n",
|
||||
" import numpy as np\n",
|
||||
" from sklearn.model_selection import train_test_split\n",
|
||||
"\n",
|
||||
" #loading the train data\n",
|
||||
" with open(f'{load_data_path}/all_data', 'rb') as f:\n",
|
||||
" ntrain, all_data = pickle.load(f)\n",
|
||||
" \n",
|
||||
" # split features and label\n",
|
||||
" all_data_X = all_data.drop('label', axis=1)\n",
|
||||
" all_data_y = all_data.label\n",
|
||||
" \n",
|
||||
" # Reshape image in 3 dimensions (height = 28px, width = 28px , channel = 1)\n",
|
||||
" all_data_X = all_data_X.values.reshape(-1,28,28,1)\n",
|
||||
"\n",
|
||||
" # Normalize the data\n",
|
||||
" all_data_X = all_data_X / 255.0\n",
|
||||
" \n",
|
||||
" #Get the new dataset\n",
|
||||
" X = all_data_X[:ntrain].copy()\n",
|
||||
" y = all_data_y[:ntrain].copy()\n",
|
||||
" \n",
|
||||
" # split into train and test\n",
|
||||
" X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)\n",
|
||||
" \n",
|
||||
" #creating the preprocess directory\n",
|
||||
" os.makedirs(preprocess_data_path, exist_ok = True)\n",
|
||||
" \n",
|
||||
" #Save the train_data as a pickle file to be used by the modelling component.\n",
|
||||
" with open(f'{preprocess_data_path}/train', 'wb') as f:\n",
|
||||
" pickle.dump((X_train, y_train), f)\n",
|
||||
" \n",
|
||||
" #Save the test_data as a pickle file to be used by the predict component.\n",
|
||||
" with open(f'{preprocess_data_path}/test', 'wb') as f:\n",
|
||||
" pickle.dump((X_test, y_test), f)\n",
|
||||
" \n",
|
||||
" return(print('Done!'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"Component 4: ML modeling"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"id": "8KGMGMEmtnXZ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def modeling(preprocess_data_path: InputPath(str), \n",
|
||||
" model_path: OutputPath(str)):\n",
|
||||
" \n",
|
||||
" # import Library\n",
|
||||
" import sys, subprocess;\n",
|
||||
" subprocess.run([\"python\", \"-m\", \"pip\", \"install\", \"--upgrade\", \"pip\"])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','pandas'])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','tensorflow'])\n",
|
||||
" import os, pickle;\n",
|
||||
" import numpy as np\n",
|
||||
" import tensorflow as tf\n",
|
||||
" from tensorflow import keras, optimizers\n",
|
||||
" from tensorflow.keras.metrics import SparseCategoricalAccuracy\n",
|
||||
" from tensorflow.keras.losses import SparseCategoricalCrossentropy\n",
|
||||
" from tensorflow.keras import layers\n",
|
||||
"\n",
|
||||
" #loading the train data\n",
|
||||
" with open(f'{preprocess_data_path}/train', 'rb') as f:\n",
|
||||
" train_data = pickle.load(f)\n",
|
||||
" \n",
|
||||
" # Separate the X_train from y_train.\n",
|
||||
" X_train, y_train = train_data\n",
|
||||
" \n",
|
||||
" #initializing the classifier model with its input, hidden and output layers\n",
|
||||
" hidden_dim1=56\n",
|
||||
" hidden_dim2=100\n",
|
||||
" DROPOUT=0.5\n",
|
||||
" model = tf.keras.Sequential([\n",
|
||||
" tf.keras.layers.Conv2D(filters = hidden_dim1, kernel_size = (5,5),padding = 'Same', \n",
|
||||
" activation ='relu'),\n",
|
||||
" tf.keras.layers.Dropout(DROPOUT),\n",
|
||||
" tf.keras.layers.Conv2D(filters = hidden_dim2, kernel_size = (3,3),padding = 'Same', \n",
|
||||
" activation ='relu'),\n",
|
||||
" tf.keras.layers.Dropout(DROPOUT),\n",
|
||||
" tf.keras.layers.Conv2D(filters = hidden_dim2, kernel_size = (3,3),padding = 'Same', \n",
|
||||
" activation ='relu'),\n",
|
||||
" tf.keras.layers.Dropout(DROPOUT),\n",
|
||||
" tf.keras.layers.Flatten(),\n",
|
||||
" tf.keras.layers.Dense(10, activation = \"softmax\")\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
" model.build(input_shape=(None,28,28,1))\n",
|
||||
" \n",
|
||||
" #Compiling the classifier model with Adam optimizer\n",
|
||||
" model.compile(optimizers.Adam(learning_rate=0.001), \n",
|
||||
" loss=SparseCategoricalCrossentropy(), \n",
|
||||
" metrics=SparseCategoricalAccuracy(name='accuracy'))\n",
|
||||
"\n",
|
||||
" # model fitting\n",
|
||||
" history = model.fit(np.array(X_train), np.array(y_train),\n",
|
||||
" validation_split=.1, epochs=1, batch_size=64)\n",
|
||||
" \n",
|
||||
" #loading the X_test and y_test\n",
|
||||
" with open(f'{preprocess_data_path}/test', 'rb') as f:\n",
|
||||
" test_data = pickle.load(f)\n",
|
||||
" # Separate the X_test from y_test.\n",
|
||||
" X_test, y_test = test_data\n",
|
||||
" \n",
|
||||
" # Evaluate the model and print the results\n",
|
||||
" test_loss, test_acc = model.evaluate(np.array(X_test), np.array(y_test), verbose=0)\n",
|
||||
" print(\"Test_loss: {}, Test_accuracy: {} \".format(test_loss,test_acc))\n",
|
||||
" \n",
|
||||
" #creating the preprocess directory\n",
|
||||
" os.makedirs(model_path, exist_ok = True)\n",
|
||||
" \n",
|
||||
" #saving the model\n",
|
||||
" model.save(f'{model_path}/model.h5') "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "XwXJuoHQui3d"
|
||||
},
|
||||
"source": [
|
||||
"Component 5: Prediction and evaluation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def prediction(model_path: InputPath(str), \n",
|
||||
" preprocess_data_path: InputPath(str), \n",
|
||||
" mlpipeline_ui_metadata_path: OutputPath(str)) -> NamedTuple('conf_m_result', [('mlpipeline_ui_metadata', 'UI_metadata')]):\n",
|
||||
" \n",
|
||||
" # import Library\n",
|
||||
" import sys, subprocess;\n",
|
||||
" subprocess.run([\"python\", \"-m\", \"pip\", \"install\", \"--upgrade\", \"pip\"])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','scikit-learn'])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','pandas'])\n",
|
||||
" subprocess.run([sys.executable, '-m', 'pip', 'install','tensorflow'])\n",
|
||||
" import pickle, json;\n",
|
||||
" import pandas as pd\n",
|
||||
" import numpy as np\n",
|
||||
" from collections import namedtuple\n",
|
||||
" from sklearn.metrics import confusion_matrix\n",
|
||||
" from tensorflow.keras.models import load_model\n",
|
||||
"\n",
|
||||
" #loading the X_test and y_test\n",
|
||||
" with open(f'{preprocess_data_path}/test', 'rb') as f:\n",
|
||||
" test_data = pickle.load(f)\n",
|
||||
" # Separate the X_test from y_test.\n",
|
||||
" X_test, y_test = test_data\n",
|
||||
" \n",
|
||||
" #loading the model\n",
|
||||
" model = load_model(f'{model_path}/model.h5')\n",
|
||||
" \n",
|
||||
" # prediction\n",
|
||||
" y_pred = np.argmax(model.predict(X_test), axis=-1)\n",
|
||||
" \n",
|
||||
" # confusion matrix\n",
|
||||
" cm = confusion_matrix(y_test, y_pred)\n",
|
||||
" vocab = list(np.unique(y_test))\n",
|
||||
" \n",
|
||||
" # confusion_matrix pair dataset \n",
|
||||
" data = []\n",
|
||||
" for target_index, target_row in enumerate(cm):\n",
|
||||
" for predicted_index, count in enumerate(target_row):\n",
|
||||
" data.append((vocab[target_index], vocab[predicted_index], count))\n",
|
||||
" \n",
|
||||
" # convert confusion_matrix pair dataset to dataframe\n",
|
||||
" df = pd.DataFrame(data,columns=['target','predicted','count'])\n",
|
||||
" \n",
|
||||
" # change 'target', 'predicted' to integer strings\n",
|
||||
" df[['target', 'predicted']] = (df[['target', 'predicted']].astype(int)).astype(str)\n",
|
||||
" \n",
|
||||
" # create kubeflow metric metadata for UI\n",
|
||||
" metadata = {\n",
|
||||
" \"outputs\": [\n",
|
||||
" {\n",
|
||||
" \"type\": \"confusion_matrix\",\n",
|
||||
" \"format\": \"csv\",\n",
|
||||
" \"schema\": [\n",
|
||||
" {\n",
|
||||
" \"name\": \"target\",\n",
|
||||
" \"type\": \"CATEGORY\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"predicted\",\n",
|
||||
" \"type\": \"CATEGORY\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"count\",\n",
|
||||
" \"type\": \"NUMBER\"\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"source\": df.to_csv(header=False, index=False),\n",
|
||||
" \"storage\": \"inline\",\n",
|
||||
" \"labels\": [\n",
|
||||
" \"0\",\n",
|
||||
" \"1\",\n",
|
||||
" \"2\",\n",
|
||||
" \"3\",\n",
|
||||
" \"4\",\n",
|
||||
" \"5\",\n",
|
||||
" \"6\",\n",
|
||||
" \"7\",\n",
|
||||
" \"8\",\n",
|
||||
" \"9\",\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
" \n",
|
||||
" with open(mlpipeline_ui_metadata_path, 'w') as metadata_file:\n",
|
||||
" json.dump(metadata, metadata_file)\n",
|
||||
"\n",
|
||||
" conf_m_result = namedtuple('conf_m_result', ['mlpipeline_ui_metadata'])\n",
|
||||
" \n",
|
||||
" return conf_m_result(json.dumps(metadata))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create light weight components\n",
|
||||
"download_op = comp.create_component_from_func(download_data,base_image=\"python:3.7.1\")\n",
|
||||
"load_op = comp.create_component_from_func(load_data,base_image=\"python:3.7.1\")\n",
|
||||
"preprocess_op = comp.create_component_from_func(preprocess_data,base_image=\"python:3.7.1\")\n",
|
||||
"modeling_op = comp.create_component_from_func(modeling, base_image=\"tensorflow/tensorflow:latest\")\n",
|
||||
"predict_op = comp.create_component_from_func(prediction, base_image=\"tensorflow/tensorflow:latest\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"Create kubeflow pipeline components from images"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "v3bNFpBOuuG-"
|
||||
},
|
||||
"source": [
|
||||
"## Kubeflow pipeline creation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"id": "LVbUms_ptnXc",
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create client that would enable communication with the Pipelines API server \n",
|
||||
"client = kfp.Client()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {
|
||||
"id": "ECZRaIgCtnXd",
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# define pipeline\n",
|
||||
"@dsl.pipeline(name=\"digit-recognizer-pipeline\", \n",
|
||||
" description=\"Performs Preprocessing, training and prediction of digits\")\n",
|
||||
"\n",
|
||||
"# Define parameters to be fed into pipeline\n",
|
||||
"def digit_recognize_pipeline(download_link: str,\n",
|
||||
" data_path: str,\n",
|
||||
" load_data_path: str, \n",
|
||||
" preprocess_data_path: str,\n",
|
||||
" model_path:str\n",
|
||||
" ):\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" # Create download container.\n",
|
||||
" download_container = download_op(download_link)\n",
|
||||
" # Create load container.\n",
|
||||
" load_container = load_op(download_container.output)\n",
|
||||
" # Create preprocess container.\n",
|
||||
" preprocess_container = preprocess_op(load_container.output)\n",
|
||||
" # Create modeling container.\n",
|
||||
" modeling_container = modeling_op(preprocess_container.output)\n",
|
||||
" # Create prediction container.\n",
|
||||
" predict_container = predict_op(modeling_container.output, preprocess_container.output)\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# replace download_link with the repo link where the data is stored https:github-repo/data-dir/{file}.csv.zip?raw=true\n",
|
||||
"download_link = 'https://github.com/josepholaide/KfaaS/blob/main/kale/data/{file}.csv.zip?raw=true'\n",
|
||||
"data_path = \"/mnt\"\n",
|
||||
"load_data_path = \"load\"\n",
|
||||
"preprocess_data_path = \"preprocess\"\n",
|
||||
"model_path = \"model\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {
|
||||
"id": "Jq2M3chhtnXd",
|
||||
"outputId": "cd75c395-f0d1-415f-8205-9ea23d52fdb5"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<a href=\"/pipeline/#/experiments/details/cd1408ad-670b-4fc2-90a0-f9fa177b5570\" target=\"_blank\" >Experiment details</a>."
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.HTML object>"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<a href=\"/pipeline/#/runs/details/178ac72f-97e3-455a-af7b-37af66f7b8e8\" target=\"_blank\" >Run details</a>."
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.HTML object>"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pipeline_func = digit_recognize_pipeline\n",
|
||||
"\n",
|
||||
"experiment_name = 'digit_recognizer_lightweight'\n",
|
||||
"run_name = pipeline_func.__name__ + ' run'\n",
|
||||
"\n",
|
||||
"arguments = {\"download_link\": download_link,\n",
|
||||
" \"data_path\": data_path,\n",
|
||||
" \"load_data_path\": load_data_path,\n",
|
||||
" \"preprocess_data_path\": preprocess_data_path,\n",
|
||||
" \"model_path\":model_path}\n",
|
||||
"\n",
|
||||
"# Compile pipeline to generate compressed YAML definition of the pipeline.\n",
|
||||
"kfp.compiler.Compiler().compile(pipeline_func, \n",
|
||||
" '{}.zip'.format(experiment_name))\n",
|
||||
"\n",
|
||||
"# Submit pipeline directly from pipeline function\n",
|
||||
"run_result = client.create_run_from_pipeline_func(pipeline_func, \n",
|
||||
" experiment_name=experiment_name, \n",
|
||||
" run_name=run_name, \n",
|
||||
" arguments=arguments\n",
|
||||
" )\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"name": "kubepipe.ipynb",
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"kubeflow_notebook": {
|
||||
"autosnapshot": true,
|
||||
"docker_image": "gcr.io/arrikto/jupyter-kale-py36@sha256:dd3f92ca66b46d247e4b9b6a9d84ffbb368646263c2e3909473c3b851f3fe198",
|
||||
"experiment": {
|
||||
"id": "6f6c9b81-54e3-414b-974a-6fe8b445a59e",
|
||||
"name": "digit_recognize_lightweight"
|
||||
},
|
||||
"experiment_name": "digit_recognize_lightweight",
|
||||
"katib_metadata": {
|
||||
"algorithm": {
|
||||
"algorithmName": "grid"
|
||||
},
|
||||
"maxFailedTrialCount": 3,
|
||||
"maxTrialCount": 12,
|
||||
"objective": {
|
||||
"objectiveMetricName": "",
|
||||
"type": "minimize"
|
||||
},
|
||||
"parallelTrialCount": 3,
|
||||
"parameters": []
|
||||
},
|
||||
"katib_run": false,
|
||||
"pipeline_description": "Performs Preprocessing, training and prediction of digits",
|
||||
"pipeline_name": "digit-recognizer-kfp",
|
||||
"snapshot_volumes": true,
|
||||
"steps_defaults": [
|
||||
"label:access-ml-pipeline:true",
|
||||
"label:access-rok:true"
|
||||
],
|
||||
"volume_access_mode": "rwm",
|
||||
"volumes": [
|
||||
{
|
||||
"annotations": [],
|
||||
"mount_point": "/home/jovyan",
|
||||
"name": "arikkto-workspace-7xzjm",
|
||||
"size": 5,
|
||||
"size_type": "Gi",
|
||||
"snapshot": false,
|
||||
"type": "clone"
|
||||
}
|
||||
]
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|