Kimonas Sotirchos 067ba59439 Update the examples with correct image paths and packages (#1016)	2022-11-22 21:49:42 +00:00
* fix ipynb images to be file paths, and not relevant urls
* Don't explicitly set the kale image
* Update packages
Signed-off-by: Kimonas Sotirchos <kimwnasptd@arrikto.com>
data	Kaggle notebook on digit recognition to kubeflow pipeline (#944 )	2022-05-20 00:36:24 +00:00
images	Kaggle notebook on digit recognition to kubeflow pipeline (#944 )	2022-05-20 00:36:24 +00:00
README.md	Kaggle notebook on digit recognition to kubeflow pipeline (#944 )	2022-05-20 00:36:24 +00:00
digit-recognizer-kale.ipynb	Update the examples with correct image paths and packages (#1016 )	2022-11-22 21:49:42 +00:00
digit-recognizer-kfp.ipynb	Update the examples with correct image paths and packages (#1016 )	2022-11-22 21:49:42 +00:00
digit-recognizer-orig.ipynb	Update the examples with correct image paths and packages (#1016 )	2022-11-22 21:49:42 +00:00
digit_recognizer_orig.ipynb	Update the examples with correct image paths and packages (#1016 )	2022-11-22 21:49:42 +00:00
requirements.txt	Bound example dependencies to avoid future warnings and issues (#997 )	2022-10-03 16:39:21 +00:00

README.md

Objective

Here we convert the Kaggle Digit Recognizer competition code (https://www.kaggle.com/competitions/digit-recognizer) to a Kubeflow pipeline. The objective of this task is to correctly identify digits from a dataset of tens of thousands of handwritten images.

Testing Environment

Environment:

Name	Version
Kubeflow	v1.4
kfp	1.8.11
kubeflow-kale	0.6.0
pip	21.3.1

The KFP version used for testing can be installed with pip install kfp==1.8.11.
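One way to confirm that a notebook server matches the tested versions is to compare the installed packages against the table above. The following is only a sketch: the helper names installed_version and version_mismatches are hypothetical, and it assumes Python 3.8 or newer so that the standard-library importlib.metadata module is available.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Package versions copied from the "Testing Environment" table.
TESTED_VERSIONS = {
    "kfp": "1.8.11",
    "kubeflow-kale": "0.6.0",
    "pip": "21.3.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs to (expected, installed)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(TESTED_VERSIONS).items():
        print(f"{package}: tested with {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the notebooks will fail; it only flags where the environment differs from the one used for testing.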

Section 1: KFP Pipeline

Kubeflow lightweight component method

Here, a Python function is written to carry out a specific task, and the function is then passed to the kfp component method create_component_from_func, which turns it into a pipeline component.
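As a minimal sketch of the lightweight component method, assuming the kfp 1.8.x SDK: the function normalize_pixel and the python:3.8 base image below are hypothetical illustrations and are not taken from the notebook. The kfp import is guarded so that the plain function can still be tried where kfp is not installed.

```python
def normalize_pixel(value: int, max_value: int = 255) -> float:
    """Scale a raw pixel intensity to the range [0, 1]."""
    return value / max_value


# A lightweight component is still an ordinary function, so it can be
# called locally before it is wrapped.
print(normalize_pixel(255))  # 1.0

try:
    from kfp.components import create_component_from_func
except ImportError:
    # kfp (1.8.x) is not available; only the local call above will work.
    create_component_from_func = None

if create_component_from_func is not None:
    # Wrap the function as a reusable pipeline component. The base image
    # is an assumption; any image with a suitable Python would do.
    normalize_pixel_op = create_component_from_func(
        normalize_pixel,
        base_image="python:3.8",
    )
```

Because the function source is copied into the component, it should be self-contained: any imports it needs belong inside the function body.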

Kubeflow pipelines

A Kubeflow pipeline connects all components together to create a directed acyclic graph (DAG). The kfp dsl.pipeline decorator was used to create a pipeline function. The kfp component annotations InputPath and OutputPath were used to pass data between components.

Finally, the kfp Client method create_run_from_pipeline_func was used to submit the pipeline directly from the pipeline function.
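The pieces described above can be sketched as follows, assuming the kfp 1.8.x SDK. The step functions write_numbers and sum_numbers, the pipeline name, and the python:3.8 base image are hypothetical and much simpler than the notebook's real steps. The stand-in InputPath and OutputPath definitions exist only so the step functions can be tried locally when kfp is not installed.

```python
import os
import tempfile

try:
    import kfp
    from kfp import dsl
    from kfp.components import InputPath, OutputPath, create_component_from_func
except ImportError:
    kfp = None

    # Stand-ins so the step functions can be defined and run without kfp.
    def InputPath(_type=None):
        return str

    def OutputPath(_type=None):
        return str


def write_numbers(count: int, numbers_path: OutputPath(str)):
    """Write the integers 0..count-1, one per line, to the given file."""
    with open(numbers_path, "w") as f:
        for i in range(count):
            f.write(f"{i}\n")


def sum_numbers(numbers_path: InputPath(str)) -> int:
    """Read the file produced upstream and return the sum of its lines."""
    with open(numbers_path) as f:
        return sum(int(line) for line in f if line.strip())


# Local check: in a pipeline run, KFP would supply these paths itself.
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "numbers.txt")
    write_numbers(5, local_path)
    local_total = sum_numbers(local_path)  # 0 + 1 + 2 + 3 + 4

if kfp is not None:
    write_numbers_op = create_component_from_func(
        write_numbers, base_image="python:3.8")
    sum_numbers_op = create_component_from_func(
        sum_numbers, base_image="python:3.8")

    @dsl.pipeline(name="file-passing-sketch",
                  description="Hypothetical two-step DAG")
    def file_passing_pipeline(count: int = 5):
        write_task = write_numbers_op(count=count)
        # The "_path" suffix is dropped from the input and output names.
        sum_numbers_op(numbers=write_task.outputs["numbers"])

    def submit():
        """Submit a run; needs a reachable Kubeflow Pipelines endpoint."""
        client = kfp.Client()
        return client.create_run_from_pipeline_func(
            file_passing_pipeline, arguments={"count": 5})
```

Passing one task's output into the next task's input is what creates the edge in the DAG; submit() is left uncalled here because it requires a running cluster.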

To create the pipeline on KFP

Open your Kubeflow cluster, create a Notebook Server and connect to it.
Clone this repo and navigate to this directory.
Navigate to the data directory, download the compressed Kaggle data using this link, and store the training.zip, test.zip and sample_submission.csv files in the data folder.
Run the digit-recognizer-kfp notebook from start to finish.
View the run details immediately after submitting the pipeline.
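Before running the notebook, it may help to confirm that the downloaded files are where the steps above expect them. This is a sketch under assumptions: the helper missing_data_files is hypothetical, the file names are the ones listed in the steps, and the relative path data assumes the script is run from this directory.

```python
from pathlib import Path
from typing import Iterable, List

# File names as listed in the setup steps.
EXPECTED_FILES = ("training.zip", "test.zip", "sample_submission.csv")


def missing_data_files(
    data_dir: str, expected: Iterable[str] = EXPECTED_FILES
) -> List[str]:
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files("data")
    if missing:
        print("Missing from the data folder:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```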

View Pipeline

kubeflow pipeline

Section 2: Kale Pipeline

To create the pipeline using the Kale JupyterLab extension

Clone this GitHub repo and navigate to this directory.
Install the packages listed in requirements.txt.
Launch the digit-recognizer-kale.ipynb notebook.
Enable the Kale extension in JupyterLab.
The notebook's cells are automatically annotated with Kale tags

With the use of Kale tags we define the following:
- Pipeline parameters are assigned using the "pipeline parameters" tag
- The libraries needed throughout the pipeline are passed through the "imports" tag
- Notebook cells are assigned to specific pipeline components (download data, load data, etc.) using the "pipeline step" tag
- Dependencies between the different pipeline steps are defined with the "depends on" flag
Compile and run the notebook using Kale
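The Kale tags described above are stored in each cell's metadata inside the .ipynb file, so they can be inspected without the JupyterLab extension. The sketch below rests on assumptions that should be checked against the notebook itself: that step cells carry a tag of the form step:<name> (or the older block:<name>) and that dependencies appear as prev:<name>. The helper step_dependencies, the sample notebook, and the train_model step name are hypothetical.

```python
from typing import Dict, List

# Assumed tag prefixes; verify against the tags in the actual notebook.
STEP_PREFIXES = ("step:", "block:")
PREV_PREFIX = "prev:"


def step_dependencies(notebook: dict) -> Dict[str, List[str]]:
    """Map each pipeline step name to the steps it depends on."""
    deps: Dict[str, List[str]] = {}
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        names = [t.split(":", 1)[1] for t in tags if t.startswith(STEP_PREFIXES)]
        if not names:
            continue  # e.g. "imports" or "pipeline-parameters" cells
        previous = [t[len(PREV_PREFIX):] for t in tags if t.startswith(PREV_PREFIX)]
        deps.setdefault(names[0], []).extend(previous)
    return deps


# A tiny stand-in for a tagged notebook (cell sources omitted).
sample_notebook = {
    "cells": [
        {"metadata": {"tags": ["imports"]}},
        {"metadata": {"tags": ["pipeline-parameters"]}},
        {"metadata": {"tags": ["step:download_data"]}},
        {"metadata": {"tags": ["step:load_data", "prev:download_data"]}},
        {"metadata": {"tags": ["step:train_model", "prev:load_data"]}},
    ]
}

print(step_dependencies(sample_notebook))
```

For a real notebook, the same function could be applied to the dictionary returned by json.load on the .ipynb file.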

View Pipeline

kubeflow pipeline