mirror of https://github.com/kubeflow/examples.git
Kaggle to kfp (#938)
* Add files via upload
* Kaggle to kfp: Converted Kaggle notebook of Facial-Keypoint-Detection to Kubeflow pipeline
* Kaggle to kfp
This commit is contained in:
parent
97cb872bcf
commit
7a02695ac4
@ -0,0 +1,43 @@
# Objective

Here we convert the code from the Kaggle competition https://www.kaggle.com/competitions/facial-keypoints-detection into a Kubeflow (kfp) pipeline.

The objective of the task is to predict keypoint positions on face images.

# Testing environment

The pipeline was tested on `Kubeflow 1.4` with `kfp 1.1.2`; it should also be compatible with earlier Kubeflow releases. The kfp SDK version used for testing is 1.1.2, which can be installed with `pip install kfp==1.1.2`.
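For example, install the SDK and confirm its version (the version check below is just a convenience, not part of the original instructions):

```
pip install kfp==1.1.2
python3 -c "import kfp; print(kfp.__version__)"
```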
# Components used

## Docker

Docker is used to create the environment in which each component runs.

## Kubeflow pipelines

Kubeflow Pipelines connects the Docker components into a pipeline. Each Kubeflow pipeline is a reproducible workflow: we pass in the input arguments and the entire workflow runs end to end.
# Docker

We start by creating a Docker account on Docker Hub (https://hub.docker.com/), signing up with an individual email. After signup is complete, log in to Docker from your terminal with the `docker login` command, using your username and password.
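For example (the username is a placeholder for your own Docker Hub account):

```
docker login -u <docker_username>
```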
## Build train image

Navigate to the `train` directory, create a folder named `my_data`, and put the `training.zip` and `test.zip` files from the Kaggle competition into this folder. Then build the Docker image with:
```
docker build -t <docker_username>/<docker_imagename>:<tag> .
```

In my case this is:

```
docker build -t hubdocker76/demotrain:v1 .
```
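Kubeflow needs to pull this image at run time, so push it to Docker Hub (this assumes the repository is public or your cluster has pull credentials configured):

```
docker push hubdocker76/demotrain:v1
```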
## Build evaluate image

Navigate to the `eval` directory and build the Docker image with:
```
docker build -t <docker_username>/<docker_imagename>:<tag> .
```

In my case this is:

```
docker build -t hubdocker76/demoeval:v2 .
```
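Push this image as well so the cluster can pull it (same assumption as above):

```
docker push hubdocker76/demoeval:v2
```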
# Kubeflow pipelines

Go to the `generate-pipeline` directory and run `python3 my_pipeline.py`. This generates a YAML file, which we can upload to the Kubeflow Pipelines UI and create a Run from.
# Sample pipeline to run on Kubeflow

Navigate to the `generate-pipeline` directory and run `python3 my_pipeline.py` to generate the YAML file. I have named this YAML `face_pipeline_01.yaml`. Upload this pipeline to Kubeflow and start a Run, as shown below.
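For example (the generated file name `my_pipeline.py.yaml` follows from the `__file__ + '.yaml'` argument in the compile step; renaming it to `face_pipeline_01.yaml` is optional):

```
cd generate-pipeline
python3 my_pipeline.py
mv my_pipeline.py.yaml face_pipeline_01.yaml
```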
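As an alternative to the UI, the run can also be started from the kfp SDK. A minimal sketch, assuming kfp 1.1.2 and a reachable Kubeflow Pipelines endpoint (the host URL and argument values below are placeholders):

```python
import kfp

# Placeholder endpoint; use your own Kubeflow Pipelines host
client = kfp.Client(host='http://<kubeflow-host>/pipeline')

# Start a run of the compiled pipeline with example argument values
client.create_run_from_pipeline_package(
    'face_pipeline_01.yaml',
    arguments={'trial': 1, 'epoch': 10, 'patience': 3},
)
```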
@ -0,0 +1,14 @@
FROM "ubuntu:bionic"
|
||||
RUN apt-get update && yes | apt-get upgrade
|
||||
RUN mkdir -p /tensorflow/models
|
||||
RUN apt-get install -y git python3-pip
|
||||
RUN pip3 install --upgrade pip
|
||||
RUN pip3 install tensorflow
|
||||
RUN pip3 install jupyter
|
||||
RUN pip3 install matplotlib
|
||||
RUN pip3 install kfp==1.1.2
|
||||
RUN pip install opencv-python-headless
|
||||
RUN pip3 install pandas keras
|
||||
RUN pip3 install sklearn
|
||||
RUN pip3 install autokeras
|
||||
COPY . /
@ -0,0 +1,28 @@
from tensorflow.keras.models import load_model
import autokeras as ak
import pandas as pd
import numpy as np

### Load the trained model exported by the train step (stored on the shared PVC)
loaded_model = load_model("/data/model_autokeras", custom_objects=ak.CUSTOM_OBJECTS)

### Print model summary
loaded_model.summary()

### Load the test data from the shared PVC
test_dir = '/data/test.csv'
test = pd.read_csv(test_dir)

### Each image is a space-separated string of 96x96 pixel values
X_test = []
for img in test['Image']:
    X_test.append(np.asarray(img.split(), dtype=float).reshape(96, 96, 1))
X_test = np.reshape(X_test, (-1, 96, 96, 1))
X_test = np.asarray(X_test).astype('float32')

### Predict keypoint locations
y_pred = loaded_model.predict(X_test)

### Create submission file
y_pred = y_pred.reshape(-1,)
submission = pd.DataFrame({'Location': y_pred})
submission.to_csv('/data/submission.csv', index=True, index_label='RowId')
@ -0,0 +1,93 @@
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: face-pipeline-
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.1.2, pipelines.kubeflow.org/pipeline_compilation_time: '2022-03-27T11:03:51.876586',
    pipelines.kubeflow.org/pipeline_spec: '{"description": "pipeline to detect facial
      landmarks", "inputs": [{"name": "trial"}, {"name": "epoch"}, {"name": "patience"}],
      "name": "face pipeline"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.1.2}
spec:
  entrypoint: face-pipeline
  templates:
  - name: evaluate
    container:
      command: [python3, eval.py]
      image: hubdocker76/demoeval:v2
      volumeMounts:
      - {mountPath: /data, name: pvc}
    inputs:
      parameters:
      - {name: pvc-name}
    volumes:
    - name: pvc
      persistentVolumeClaim: {claimName: '{{inputs.parameters.pvc-name}}'}
  - name: face-pipeline
    inputs:
      parameters:
      - {name: epoch}
      - {name: patience}
      - {name: trial}
    dag:
      tasks:
      - name: evaluate
        template: evaluate
        dependencies: [pvc, train]
        arguments:
          parameters:
          - {name: pvc-name, value: '{{tasks.pvc.outputs.parameters.pvc-name}}'}
      - {name: pvc, template: pvc}
      - name: train
        template: train
        dependencies: [pvc]
        arguments:
          parameters:
          - {name: epoch, value: '{{inputs.parameters.epoch}}'}
          - {name: patience, value: '{{inputs.parameters.patience}}'}
          - {name: pvc-name, value: '{{tasks.pvc.outputs.parameters.pvc-name}}'}
          - {name: trial, value: '{{inputs.parameters.trial}}'}
  - name: pvc
    resource:
      action: create
      manifest: |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: '{{workflow.name}}-pvc'
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
    outputs:
      parameters:
      - name: pvc-manifest
        valueFrom: {jsonPath: '{}'}
      - name: pvc-name
        valueFrom: {jsonPath: '{.metadata.name}'}
      - name: pvc-size
        valueFrom: {jsonPath: '{.status.capacity.storage}'}
  - name: train
    container:
      args: [--trial, '{{inputs.parameters.trial}}', --epoch, '{{inputs.parameters.epoch}}',
        --patience, '{{inputs.parameters.patience}}']
      command: [python3, train.py]
      image: hubdocker76/demotrain:v1
      volumeMounts:
      - {mountPath: /data, name: pvc}
    inputs:
      parameters:
      - {name: epoch}
      - {name: patience}
      - {name: pvc-name}
      - {name: trial}
    volumes:
    - name: pvc
      persistentVolumeClaim: {claimName: '{{inputs.parameters.pvc-name}}'}
  arguments:
    parameters:
    - {name: trial}
    - {name: epoch}
    - {name: patience}
  serviceAccountName: pipeline-runner
@ -0,0 +1,42 @@
import kfp
from kfp import dsl


def SendMsg(trial, epoch, patience):
    # Create a 1Gi PVC that is shared between the train and evaluate steps
    vop = dsl.VolumeOp(name="pvc",
                       resource_name="pvc", size='1Gi',
                       modes=dsl.VOLUME_MODE_RWO)

    # Training step: runs train.py in the train image with the PVC mounted at /data
    return dsl.ContainerOp(
        name='Train',
        image='hubdocker76/demotrain:v1',
        command=['python3', 'train.py'],
        arguments=[
            '--trial', trial,
            '--epoch', epoch,
            '--patience', patience
        ],
        pvolumes={
            '/data': vop.volume
        }
    )


def GetMsg(comp1):
    # Evaluation step: reuses the PVC from the training step to load the exported model
    return dsl.ContainerOp(
        name='Evaluate',
        image='hubdocker76/demoeval:v2',
        pvolumes={
            '/data': comp1.pvolumes['/data']
        },
        command=['python3', 'eval.py']
    )


@dsl.pipeline(
    name='face pipeline',
    description='pipeline to detect facial landmarks')
def passing_parameter(trial, epoch, patience):
    comp1 = SendMsg(trial, epoch, patience)
    comp2 = GetMsg(comp1)


if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(passing_parameter, __file__ + '.yaml')
@ -0,0 +1,14 @@
FROM "ubuntu:bionic"
|
||||
RUN apt-get update && yes | apt-get upgrade
|
||||
RUN mkdir -p /tensorflow/models
|
||||
RUN apt-get install -y git python3-pip
|
||||
RUN pip3 install --upgrade pip
|
||||
RUN pip3 install tensorflow
|
||||
RUN pip3 install jupyter
|
||||
RUN pip3 install matplotlib
|
||||
RUN pip3 install kfp==1.1.2
|
||||
RUN pip install opencv-python-headless
|
||||
RUN pip3 install pandas keras
|
||||
RUN pip3 install sklearn
|
||||
RUN pip3 install autokeras
|
||||
COPY . /
@ -0,0 +1,79 @@
import numpy as np
import os
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
from tensorflow.keras.models import load_model
import shutil
import argparse
import autokeras as ak

### Declare input arguments

parser = argparse.ArgumentParser()
parser.add_argument('--trial', type=int)
parser.add_argument('--epoch', type=int)
parser.add_argument('--patience', type=int)

args = vars(parser.parse_args())

trials = args['trial']
epochs = args['epoch']
patience = args['patience']

project = "Facial-keypoints"
run_id = "1.8"
resume_run = True

MAX_TRIALS = trials
EPOCHS = epochs
PATIENCE = patience

### Data extraction: unzip the data baked into the image and save it to the attached external PVC at /data ###

base_dir = 'my_data/'
train_dir_zip = base_dir + 'training.zip'
test_dir_zip = base_dir + 'test.zip'

from zipfile import ZipFile
with ZipFile(train_dir_zip, 'r') as zipObj:
    zipObj.extractall('/data')
    print("Train Archive unzipped")
with ZipFile(test_dir_zip, 'r') as zipObj:
    zipObj.extractall('/data')
    print("Test Archive unzipped")


## Data preprocessing

train_dir = '/data/training.csv'
test_dir = '/data/test.csv'
train = pd.read_csv(train_dir)
test = pd.read_csv(test_dir)

# Drop rows with missing keypoints
train = train.dropna()
train = train.reset_index(drop=True)

X_train = []
Y_train = []

# Each image is a space-separated string of 96x96 pixel values
for img in train['Image']:
    X_train.append(np.asarray(img.split(), dtype=float).reshape(96, 96, 1))
X_train = np.reshape(X_train, (-1, 96, 96, 1))
X_train = np.asarray(X_train).astype('float32')

# The first 30 columns of each row are the keypoint coordinates (targets)
for i in range(len(train)):
    Y_train.append(np.asarray(train.iloc[i][0:30].to_numpy()))
Y_train = np.asarray(Y_train).astype('float32')


## Model training

reg = ak.ImageRegressor(max_trials=MAX_TRIALS)
reg.fit(X_train, Y_train, validation_split=0.15, epochs=EPOCHS)

# Export the trained model to the externally attached PVC
my_model = reg.export_model()
my_model.save('/data/model_autokeras', save_format="tf")