Objective
Here we convert the code from the Kaggle Digit Recognizer competition (https://www.kaggle.com/competitions/digit-recognizer) into a Kubeflow pipeline. The objective of the task is to correctly identify digits from a dataset of tens of thousands of handwritten images.
Testing Environment
Environment:
| Name | Version |
|---|---|
| Kubeflow | v1.4 |
| kfp | 1.8.11 |
| kubeflow-kale | 0.6.0 |
| pip | 21.3.1 |
The KFP version used for testing can be installed with `pip install kfp==1.8.11`.
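As a quick sanity check before running either notebook, the installed package versions can be compared with the table above. This is a minimal sketch using only the standard library; the `check_versions` helper and the `EXPECTED` mapping are illustrative names, not part of this repo.

```python
from importlib import metadata

# Versions copied from the environment table. Kubeflow itself is a cluster
# deployment, not a pip package, so it is not checked here.
EXPECTED = {"kfp": "1.8.11", "kubeflow-kale": "0.6.0", "pip": "21.3.1"}


def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means the environment matches the tested versions; other versions may still work, but they were not the ones used for testing.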
Section 1: KFP Pipeline
Kubeflow lightweight component method
Here, a Python function is created to carry out a specific task, and that function is passed to the kfp components method `create_component_from_func`.
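A minimal sketch of the lightweight component method, assuming the kfp 1.8.x SDK. The `normalize_pixels` function, the `make_normalize_op` helper, and the `python:3.8` base image are hypothetical examples, not taken from the notebook; only `create_component_from_func` comes from the text above.

```python
def normalize_pixels(pixels_csv: str) -> str:
    """Scale comma-separated 0-255 pixel values into the range [0, 1]."""
    values = [int(v) / 255.0 for v in pixels_csv.split(",")]
    return ",".join(f"{v:.3f}" for v in values)


def make_normalize_op():
    """Wrap the function as a KFP component (requires the kfp 1.8.x SDK)."""
    # Imported lazily so the plain function above stays usable without the SDK.
    from kfp.components import create_component_from_func

    return create_component_from_func(
        normalize_pixels,
        base_image="python:3.8",  # assumed image, not taken from the notebook
    )
```

Calling `make_normalize_op()` returns a component factory that can be used as a step inside a pipeline function. Because the wrapped function is plain Python, it can be checked locally before it ever runs in a container.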
Kubeflow pipelines
A Kubeflow pipeline connects all components together to create a directed acyclic graph (DAG). The kfp `dsl.pipeline` decorator was used to create a pipeline function. The kfp components annotations `InputPath` and `OutputPath` were used to pass data between components.
Finally, `create_run_from_pipeline_func` was used to submit the pipeline directly from the pipeline function.
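The data passing and submission described above can be sketched as follows, assuming kfp 1.8.x. The step functions, the pipeline name, and the base image are hypothetical. `kfp.Client()` with no arguments assumes the code runs in a notebook server that can reach the KFP API, which may need extra authentication settings on some clusters. The `try`/`except` fallback exists only so the step functions can be exercised locally without the SDK.

```python
import os
import tempfile

try:
    from kfp.components import InputPath, OutputPath
except ImportError:
    # Local stand-ins (not part of kfp) so the steps can run without the SDK.
    def InputPath(_type=None):
        return str

    def OutputPath(_type=None):
        return str


def write_labels(labels_path: OutputPath("CSV")):
    """Producer step: KFP supplies labels_path, the function writes to it."""
    with open(labels_path, "w") as f:
        f.write("label\n3\n7\n3\n")


def count_labels(labels_path: InputPath("CSV")) -> int:
    """Consumer step: KFP supplies the path of the upstream file."""
    with open(labels_path) as f:
        return len(f.read().splitlines()) - 1  # do not count the header row


def run_locally():
    """Chain both steps through a temporary file, without any cluster."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "labels.csv")
        write_labels(path)
        return count_labels(path)


def submit():
    """Build the DAG and submit it (requires kfp 1.8.x and a KFP endpoint)."""
    import kfp
    from kfp import dsl
    from kfp.components import create_component_from_func

    write_op = create_component_from_func(write_labels, base_image="python:3.8")
    count_op = create_component_from_func(count_labels, base_image="python:3.8")

    @dsl.pipeline(name="label-count-sketch")
    def sketch_pipeline():
        write_task = write_op()
        # KFP strips the "_path" suffix, so "labels_path" is exposed as "labels".
        count_op(labels=write_task.outputs["labels"])

    client = kfp.Client()
    return client.create_run_from_pipeline_func(sketch_pipeline, arguments={})
```

`run_locally()` exercises the same two functions outside the cluster, while `submit()` wires them into a two-step DAG and starts a run. The step bodies use only built-ins because a lightweight component carries just the function's own source into the container.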
To create the pipeline on KFP
- Open your Kubeflow cluster, create a Notebook Server and connect to it.
- Clone this repo and navigate to this directory.
- Navigate to the `data` directory, download the compressed Kaggle data using this link, and store the `training.zip`, `test.zip` and `sample_submission.csv` files in the data folder.
- Run the digit-recognizer-kfp notebook from start to finish.
- View the run details immediately after submitting the pipeline.
View Pipeline
Section 2: Kale Pipeline
To create the pipeline using the Kale JupyterLab extension
- Clone the GitHub repo and navigate to this directory.
- Install the packages listed in requirements.txt.
- Launch the digit-recognizer-kale.ipynb notebook.
- Enable the Kale extension in JupyterLab.
- The notebook's cells are automatically annotated with Kale tags.

  With the use of Kale tags we define the following:
  - Pipeline parameters are assigned using the "pipeline parameters" tag
  - The libraries that are used throughout the pipeline are passed through the "imports" tag
  - Notebook cells are assigned to specific pipeline components (download data, load data, etc.) using the "pipeline step" tag
  - Cell dependencies between the different pipeline steps are defined with the "depends on" flag
- Compile and run the notebook using Kale.
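To see what these annotations look like on disk, the sketch below reads Kale's tags from notebook cell metadata and derives the step dependencies. It assumes Kale stores them as cell tags of the form `imports`, `pipeline-parameters`, `block:<step>` and `prev:<step>`; verify this against the metadata in digit-recognizer-kale.ipynb for your Kale version. The helper names, the step names, and the trimmed example notebook are hypothetical.

```python
import json


def kale_steps(notebook: dict) -> dict:
    """Map each pipeline step name to the list of steps it depends on."""
    steps = {}
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        names = [t.split(":", 1)[1] for t in tags if t.startswith("block:")]
        if not names:
            continue  # imports, pipeline-parameters, or untagged cells
        deps = [t.split(":", 1)[1] for t in tags if t.startswith("prev:")]
        steps.setdefault(names[0], []).extend(deps)
    return steps


def load_steps(path: str) -> dict:
    """Read an .ipynb file (plain JSON) and return its step dependencies."""
    with open(path) as f:
        return kale_steps(json.load(f))


# Hypothetical, heavily trimmed notebook with only the metadata that matters.
EXAMPLE = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["imports"]}},
        {"cell_type": "code", "metadata": {"tags": ["pipeline-parameters"]}},
        {"cell_type": "code", "metadata": {"tags": ["block:download_data"]}},
        {
            "cell_type": "code",
            "metadata": {"tags": ["block:load_data", "prev:download_data"]},
        },
    ]
}

print(kale_steps(EXAMPLE))
```

Running `load_steps("digit-recognizer-kale.ipynb")` from this directory would list the steps Kale compiles into the pipeline, which is a quick way to confirm the annotations before compiling.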
View Pipeline