# Objective This example is based on the Titanic OpenVaccine competition (https://www.kaggle.com/c/stanford-covid-vaccine). The objective of this exercise is to develop models and design rules for RNA degradation. ## Environment This pipeline was tested using Kubeflow 1.4 and kfp 1.1.2 and x86-64 and ARM based system which includes all Intel and AMD based CPU's and M1/M2 series Macbooks. ## Step 1: Setup Kubeflow as a Service - If you haven’t already, sign up (https://www.arrikto.com/kubeflow-as-a-service/) - Deploy Kubeflow ## Step 2: Launch a Notebook Server - Bump memory to 2GB and vCPUs to 2 ## Step 3: Clone the Project Repo to Your Notebook - (Kubeflow as a Service) Open up a terminal in the Notebook Server and git clone the `kubeflow/examples` repository ``` git clone https://github.com/kubeflow/examples ``` ## Step 4: Setup DockerHub and Docker - If you haven’t already, sign up (https://hub.docker.com/) for DockerHub - If you haven’t already, install Docker Desktop locally (https://www.docker.com/products/docker-desktop/) OR install the Docker command line utility (https://docs.docker.com/get-docker/) and enter `sudo docker login` command in your terminal and log into Docker with your your DockerHub username and password ## Step 5: Setup Kaggle - If you haven’t already done so, sign up (https://www.kaggle.com/) for Kaggle - (On Kaggle) Generate an API token (https://www.kaggle.com/docs/api) - (Kubeflow as a Service) Create a Kubernetes secret ``` kubectl create secret generic kaggle-secret --from-literal=KAGGLE_USERNAME= --from-literal=KAGGLE_KEY= ``` ## Step 6: Install Git - (Locally) If you don’t have it already, install Git (https://github.com/git-guides/install-git) ## Step 7: Clone the Project Repo Locally - (Locally) Git clone the `kubeflow/examples` repository ``` git clone https://github.com/kubeflow/examples ``` ## Step 8: Create a PodDefault Resource - (Kubeflow as a Service) Navigate to the `openvaccine-kaggle-competition` directory - Create a `resource.yaml` file resource.yaml: ``` apiVersion: "kubeflow.org/v1alpha1" kind: PodDefault metadata: name: kaggle-access spec: selector: matchLabels: kaggle-secret: "true" desc: "kaggle-access" volumeMounts: - name: secret-volume mountPath: /secret/kaggle volumes: - name: secret-volume secret: secretName: kaggle-secret ``` ![image2](https://user-images.githubusercontent.com/17012391/177001253-3e525eb6-3415-428c-a52b-11803326af6b.png) - Apply created resource using: `kubectl apply -f resource.yaml` ## Step 9: Explore the `load-data` directory - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/load-data` directory - Open up the `load.py` file - Note the code in this file that will perform the actions required in the “load-data” pipeline step image7 ## Step 10: Build the `load-data` Docker Image - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/load-data` directory - Build the Docker image if locally you are using arm64 (Apple M1) ``` docker build --platform=linux/amd64 -t /:-amd64 . ``` - OR build the Docker image if locally you are using amd64 ``` docker build -t /: . ``` ## Step 11: Push the `load-data` Docker Image to DockerHub - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/load-data` directory - Push the Docker image if locally you are using arm64 (Apple M1) ``` docker push /:-amd64 ``` - OR build the Docker image if locally you are using amd64 ``` docker push /: ``` ## Step 12: Explore the `preprocess-data` directory - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/preprocess-data` directory - Open up the `preprocess.py` file - Note the code in this file that will perform the actions required in the “preprocess” pipeline step image5 ## Step 13: Explore the `preprocess-data` directory - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/preprocess-data` directory - Build the Docker image if locally you are using arm64 (Apple M1) ``` docker build --platform=linux/amd64 -t /:-amd64 . ``` - OR build the Docker image if locally you are using amd64 ``` docker build -t /: . ``` ## Step 14: Push the `preprocess-data` Docker Image to DockerHub - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/preprocess-data` directory - Push the Docker image if locally you are using arm64 (Apple M1) ``` docker push /:-amd64 ``` - OR build the Docker image if locally you are using amd64 ``` docker push /: ``` ## Step 15: Explore the `model-training` directory - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/model-training` directory - Open up the `model.py` file - Note the code in this file that will perform the actions required in the “train” pipeline step ![image4](https://user-images.githubusercontent.com/17012391/177001740-a63f190c-284e-4328-ba01-17dbdbe61cee.png) ## Step 16: Build the `model-training` Docker Image - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/model-training` directory - Build the Docker image if locally you are using arm64 (Apple M1) ``` docker build --platform=linux/amd64 -t /:-amd64 . ``` - OR build the Docker image if locally you are using amd64 ``` docker build -t /: . ``` ## Step 17: Push the `model-training` Docker Image to DockerHub - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/model-training` directory - Push the Docker image if locally you are using arm64 (Apple M1) ``` docker push /:-amd64 ``` - OR build the Docker image if locally you are using amd64 ``` docker push /: ``` ## Step 18: Explore the `model-evaluation` directory - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/model-evaluation` directory - Open up the `eval.py` file - Note the code in this file that will perform the actions required in the “test” pipeline step ![image1](https://user-images.githubusercontent.com/17012391/177001951-1f7b13b9-adea-48c1-89a7-d8dc67214133.png) ## Step 19: Build the `model-evaluation` Docker Image - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/model-evaluation` directory - Build the Docker image if locally you are using arm64 (Apple M1) ``` docker build --platform=linux/amd64 -t /:-amd64 . ``` - OR build the Docker image if locally you are using amd64 ``` docker build -t /: . ``` ## Step 20: Push the `model-evaluation` Docker Image to DockerHub - (Locally) Navigate to the `openvaccine-kaggle-competition/pipeline-components/model-evaluation` directory - Push the Docker image if locally you are using arm64 (Apple M1) ``` docker push /:-amd64 ``` - OR build the Docker image if locally you are using amd64 ``` docker push /: ``` ## Step 21: Modify the openvaccine-kaggle-competiton-kfp.py file - (Kubeflow as a Service) Navigate to the `openvaccine-kaggle-competition` directory - Update the `openvaccine-kaggle-competiton-kfp.py` with accurate Docker Image inputs ``` return dsl.ContainerOp( name = 'load-data', image = '/:', —----- def GetMsg(comp1): return dsl.ContainerOp( name = 'preprocess', image = '/:', —----- def Train(comp2, trial, epoch, batchsize, embeddim, hiddendim, dropout, spdropout, trainsequencelength): return dsl.ContainerOp( name = 'train', image = '/:', —----- def Eval(comp1, trial, epoch, batchsize, embeddim, hiddendim, dropout, spdropout, trainsequencelength): return dsl.ContainerOp( name = 'Evaluate', image = '/:', ``` ## Step 22: Generate a KFP Pipeline yaml File - (Locally) Navigate to the `openvaccine-kaggle-competition` directory and delete the existing `openvaccine-kaggle-competition-kfp.yaml` file - (Kubeflow as a Service) Navigate to the openvaccine-kaggle-competition directory Build a python virtual environment : Step a) Update pip ``` python3 -m pip install --upgrade pip ``` Step b) Install virtualenv ``` sudo pip3 install virtualenv ``` Step c) Check the installed version of venv ``` virtualenv --version ``` Step d) Name your virtual enviornment as kfp ``` virtualenv kfp ``` Step e) Activate your venv. ``` source kfp/bin/activate ``` After this virtual environment will get activated. Now in our activated venv we need to install following packages: ``` sudo apt-get update sudo apt-get upgrade sudo apt-get install -y git python3-pip python3 -m pip install kfp==1.1.2 ``` After installing packages create the yaml file Inside venv point your terminal to a path which contains our kfp file to build pipeline (openvaccine-kaggle-competition-kfp.py) and run these commands to generate a `yaml` file for the Pipeline: ``` python3 openvaccine-kaggle-competition-kfp.py ``` image3 Download the `openvaccine-kaggle-competition-kfp.yaml` file that was created to your local `openvaccine-kaggle-competition` directory ## Step 23: Create an Experiment - (Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Experiments (KFP) > Create Experiment view - Name the experiment and click Next - Click on Experiments (KFP) to view the experiment you just created ## Step 24: Create a Pipeline - (Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Pipelines > +Upload Pipeline view - Name the pipeline - Click on Upload a file - Upload the local `openvaccine-kaggle-competition-kfp.yaml` file - Click Create ## Step 25: Create a Run - (Kubeflow as a Service) Click on Create Run in the view from the previous step - Choose the experiment we created in Step 23 - Input your desired run parameters. For example: ``` TRIAL = 1 EPOCHS = 2 BATCH_SIZE = 64 EMBED_DIM = 100 HIDDEN_DIM = 128 DROPOUT = .2 SP_DROPOUT = .3 TRAIN_SEQUENCE_LENGTH = 107 ``` - Click Start - Click on the run name to view the runtime execution graph ![image6](https://user-images.githubusercontent.com/17012391/177002214-8258e3fa-e669-43cc-979d-70c1059f6aae.png) ## Troubleshooting Tips: While running the pipeline as mentioned above you may come across this error: ![kaggle-secret-error-01](https://user-images.githubusercontent.com/17012391/175290593-aac58d80-0d9f-47bd-bd20-46e6f5207210.PNG) errorlog: ``` kaggle.rest.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Thu, 23 Jun 2022 11:31:18 GMT', 'Access-Control-Allow-Credentials': 'true', 'Set-Cookie': 'ka_sessionid=6817a347c75399a531148e19cad0aaeb; max-age=2626560; path=/, GCLB=CIGths3--ebbUg; path=/; HttpOnly', 'Transfer-Encoding': 'chunked', 'Vary': HTTP response body: b'{"code":403,"message":"You must accept this competition\\u0027s rules before you\\u0027ll be able to download files."}' ``` This error occours for two reasons: - Your Kaggle account is not verified with your phone number. - Rules for this specific competitions are not accepted. Lets accept Rules of competition ![rules](https://user-images.githubusercontent.com/17012391/175306310-10808262-07ce-4952-8fb0-3b7754e9fb46.png) Click on "I Understand and Accept". After this you will be prompted to verify your account using your phone number: ![kaggle-secret-error-03](https://user-images.githubusercontent.com/17012391/175291608-daad1a47-119a-4e47-b48b-4f878d65ddd7.PNG) Add your phone number and Kaggle will send the code to your number, enter this code and verify your account. ( Note: pipeline wont run if your Kaggle account is not verified ) ## Success After the kaggle account is verified pipeline run is successful we will get the following: Screenshot 2022-06-06 at 3 00 51 PM