Remove outdated video, Add KFP spark example (#902)

* Add js-ts as an approver
* Remove outdated video
* Add kfp-spark example
2022-01-12 14:38:10 +05:30 · 2022-01-12 14:38:10 +05:30 · 11ebbba517
parent 79418168c3
commit 11ebbba517
16 changed files with 871 additions and 63 deletions
--- a/kfp-spark/LICENSE
+++ b/kfp-spark/LICENSE
@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/kfp-spark/README.md
+++ b/kfp-spark/README.md
@ -0,0 +1,50 @@
+KFP version: 1.7.0+
+Kubernetes version: 1.17+
+
+# Orchestrate Spark Jobs using Kubeflow Pipelines

+## Install Kubeflow Pipelines standalone or full Kubeflow

+### For a standalone Kubeflow Pipelines installation
+https://www.kubeflow.org/docs/components/pipelines/installation/

+### For a full Kubeflow installation
+https://www.kubeflow.org/docs/started/installing-kubeflow/
+
+## Install Spark Operator
+
+https://github.com/GoogleCloudPlatform/spark-on-k8s-operator#installation
+
+## Create Spark Service Account and add permissions
+
+```
+kubectl apply -f ./spark-rbac.yaml
+```
+
+## Run the notebook kubeflow-pipeline.ipynb
+ 
+## Access the Kubeflow/KFP UI
+
+![image](/images/central-ui.png)
+
+## OR
+
+![image](/images/pipelines-ui.png)
+
+## Upload pipeline
+
+Upload the spark_job_pipeline.yaml file
+
+![image](/images/upload-pipeline.png)
+
+## Create a run

+![image](/images/create-run.png)

+## Start the pipeline and add the service account `spark-sa`

+![image](/images/start_run.png)

+## Wait until the execution finishes, then check the `print-message` logs to view the result
+
+![image](/images/final-output.png)
--- a/kfp-spark/images/central-ui.png
+++ b/kfp-spark/images/central-ui.png
--- a/kfp-spark/images/create-run.png
+++ b/kfp-spark/images/create-run.png
--- a/kfp-spark/images/final-output.png
+++ b/kfp-spark/images/final-output.png
--- a/kfp-spark/images/pipelines-ui.png
+++ b/kfp-spark/images/pipelines-ui.png
--- a/kfp-spark/images/start_run.png
+++ b/kfp-spark/images/start_run.png
--- a/kfp-spark/images/upload-pipeline.png
+++ b/kfp-spark/images/upload-pipeline.png
--- a/kfp-spark/k8s-apply-component.yaml
+++ b/kfp-spark/k8s-apply-component.yaml
@ -0,0 +1,32 @@
+name: Apply Kubernetes object
+inputs:
+  - {name: Object, type: JsonObject}
+outputs:
+  - {name: Name, type: String}
+  - {name: Kind, type: String}
+  - {name: Object, type: JsonObject}
+metadata:
+  annotations:
+    author: Alexey Volkov <alexey.volkov@ark-kun.com>
+implementation:
+  container:
+    image: bitnami/kubectl:1.17.17
+    command:
+      - bash
+      - -exc
+      - |
+        object_path=$0
+        output_name_path=$1
+        output_kind_path=$2
+        output_object_path=$3
+        mkdir -p "$(dirname "$output_name_path")"
+        mkdir -p "$(dirname "$output_kind_path")"
+        mkdir -p "$(dirname "$output_object_path")"
+        kubectl apply -f "$object_path" --output=json > "$output_object_path"
+
+        < "$output_object_path" jq '.metadata.name' --raw-output > "$output_name_path"
+        < "$output_object_path" jq '.kind' --raw-output > "$output_kind_path"
+      - {inputPath: Object}
+      - {outputPath: Name}
+      - {outputPath: Kind}
+      - {outputPath: Object}
--- a/kfp-spark/k8s-get-component.yaml
+++ b/kfp-spark/k8s-get-component.yaml
@ -0,0 +1,37 @@
+name: Get Kubernetes object
+inputs:
+  - {name: Name, type: String}
+  - {name: Kind, type: String}
+outputs:
+  - {name: Name, type: String}
+  - {name: ApplicationState, type: String}
+  - {name: Object, type: JsonObject}
+metadata:
+  annotations:
+    author: Alexey Volkov <alexey.volkov@ark-kun.com>
+implementation:
+  container:
+    image: bitnami/kubectl:1.17.17
+    command:
+      - bash
+      - -exc
+      - |
+        object_name=$0
+        object_type=$1
+        output_name_path=$2
+        output_state_path=$3
+        output_object_path=$4
+        mkdir -p "$(dirname "$output_name_path")"
+        mkdir -p "$(dirname "$output_state_path")"
+        mkdir -p "$(dirname "$output_object_path")"
+
+        kubectl get "$object_type" "$object_name" --output=json > "$output_object_path"
+
+        < "$output_object_path" jq '.metadata.name' --raw-output > "$output_name_path"
+        < "$output_object_path" jq '.status.applicationState.state' --raw-output > "$output_state_path"
+
+      - {inputValue: Name}
+      - {inputValue: Kind}
+      - {outputPath: Name}
+      - {outputPath: ApplicationState}
+      - {outputPath: Object}
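
For reference, the two `jq` filters used by the get component can be mirrored in plain Python. This is only an illustrative sketch and is not part of the commit: the helper name `extract_name_and_state` and the sample object are hypothetical, while the field paths `.metadata.name` and `.status.applicationState.state` are taken from the component above.

```python
import json


def extract_name_and_state(kubectl_json: str):
    """Mirror the component's two jq filters on `kubectl get ... --output=json` text."""
    obj = json.loads(kubectl_json)
    name = (obj.get("metadata") or {}).get("name")
    # The status block may be missing, in which case the state is None here
    # (jq --raw-output would print "null" for the same input).
    status = obj.get("status") or {}
    state = (status.get("applicationState") or {}).get("state")
    return name, state


# Hypothetical sample shaped like the SparkApplication object in this commit.
sample = json.dumps({
    "kind": "SparkApplication",
    "metadata": {"name": "spark-pi-1639502813", "namespace": "kubeflow"},
    "status": {"applicationState": {"state": "COMPLETED"}},
})
print(extract_name_and_state(sample))
```

Returning `None` for a missing state keeps the comparison against `"COMPLETED"` false, which matches how the pipeline treats any state other than completed.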
--- a/kfp-spark/kubeflow-pipeline.ipynb
+++ b/kfp-spark/kubeflow-pipeline.ipynb
@ -0,0 +1,264 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following command to install the Kubeflow Pipelines SDK. If you run this command in a Jupyter\n",
+    "    notebook, restart the kernel after installing the SDK. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install kfp --upgrade\n",
+    "# To install the Tekton compiler, uncomment the line below\n",
+    "# %pip install kfp_tekton"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import Packages"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import time\n",
+    "import yaml\n",
+    "\n",
+    "import kfp\n",
+    "import kfp.components as comp\n",
+    "import kfp.dsl as dsl"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "SPARK_COMPLETED_STATE = \"COMPLETED\"\n",
+    "SPARK_APPLICATION_KIND = \"sparkapplications\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "def get_spark_job_definition():\n",
+    "    \"\"\"\n",
+    "    Read Spark Operator job manifest file and return the corresponding dictionary and\n",
+    "    add some randomness in the job name\n",
+    "    :return: dictionary defining the spark job\n",
+    "    \"\"\"\n",
+    "    # Read manifest file\n",
+    "    with open(\"spark-job.yaml\", \"r\") as stream:\n",
+    "        spark_job_manifest = yaml.safe_load(stream)\n",
+    "\n",
+    "    # Add epoch time in the job name\n",
+    "    epoch = int(time.time())\n",
+    "    spark_job_manifest[\"metadata\"][\"name\"] = spark_job_manifest[\"metadata\"][\"name\"].format(epoch=epoch)\n",
+    "\n",
+    "    return spark_job_manifest"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def print_op(msg):\n",
+    "    \"\"\"\n",
+    "    Op to print a message.\n",
+    "    \"\"\"\n",
+    "    return dsl.ContainerOp(\n",
+    "        name=\"Print message.\",\n",
+    "        image=\"alpine:3.6\",\n",
+    "        command=[\"echo\", msg],\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@dsl.graph_component  # Graph component decorator is used to annotate recursive functions\n",
+    "def graph_component_spark_app_status(input_application_name):\n",
+    "    k8s_get_op = comp.load_component_from_file(\"k8s-get-component.yaml\")\n",
+    "    check_spark_application_status_op = k8s_get_op(\n",
+    "        name=input_application_name,\n",
+    "        kind=SPARK_APPLICATION_KIND\n",
+    "    )\n",
+    "    # Remove cache\n",
+    "    check_spark_application_status_op.execution_options.caching_strategy.max_cache_staleness = \"P0D\"\n",
+    "\n",
+    "    time.sleep(5)  # runs while the pipeline is compiled, not between status checks at run time\n",
+    "    with dsl.Condition(check_spark_application_status_op.outputs[\"applicationstate\"] != SPARK_COMPLETED_STATE):\n",
+    "        graph_component_spark_app_status(check_spark_application_status_op.outputs[\"name\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@dsl.pipeline(\n",
+    "    name=\"Spark Operator job pipeline\",\n",
+    "    description=\"Spark Operator job pipeline\"\n",
+    ")\n",
+    "def spark_job_pipeline():\n",
+    "\n",
+    "    # Load spark job manifest\n",
+    "    spark_job_definition = get_spark_job_definition()\n",
+    "\n",
+    "    # Load the kubernetes apply component\n",
+    "    k8s_apply_op = comp.load_component_from_file(\"k8s-apply-component.yaml\")\n",
+    "\n",
+    "    # Execute the apply command\n",
+    "    spark_job_op = k8s_apply_op(object=json.dumps(spark_job_definition))\n",
+    "\n",
+    "    # Fetch spark job name\n",
+    "    spark_job_name = spark_job_op.outputs[\"name\"]\n",
+    "\n",
+    "    # Remove cache for the apply operator\n",
+    "    spark_job_op.execution_options.caching_strategy.max_cache_staleness = \"P0D\"\n",
+    "\n",
+    "    spark_application_status_op = graph_component_spark_app_status(spark_job_op.outputs[\"name\"])\n",
+    "    spark_application_status_op.after(spark_job_op)\n",
+    "\n",
+    "    print_message = print_op(f\"Job {spark_job_name} is completed.\")\n",
+    "    print_message.after(spark_application_status_op)\n",
+    "    print_message.execution_options.caching_strategy.max_cache_staleness = \"P0D\"\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Compile and run your pipeline\n",
+    "\n",
+    "After defining the pipeline in Python as described in the preceding section, use one of the following options to compile the pipeline and submit it to the Kubeflow Pipelines service.\n",
+    "\n",
+    "#### Option 1: Compile and then upload in UI\n",
+    "\n",
+    "1.  Run the following to compile your pipeline and save it as `spark_job_pipeline.yaml`. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For Argo (Default)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create the pipeline file for the Argo backend (the default). If you use Tekton, use the block below instead.\n",
+    "if __name__ == \"__main__\":\n",
+    "    # Compile the pipeline\n",
+    "    import kfp.compiler as compiler\n",
+    "    import logging\n",
+    "    logging.basicConfig(level=logging.INFO)\n",
+    "    pipeline_func = spark_job_pipeline\n",
+    "    pipeline_filename = pipeline_func.__name__ + \".yaml\"\n",
+    "    compiler.Compiler().compile(pipeline_func, pipeline_filename)\n",
+    "    logging.info(f\"Generated pipeline file: {pipeline_filename}.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For Tekton"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# uncomment the block below to create pipeline file for tekton\n",
+    "\n",
+    "# if __name__ == '__main__':\n",
+    "#     from kfp_tekton.compiler import TektonCompiler\n",
+    "#     import logging\n",
+    "#     logging.basicConfig(level=logging.INFO)\n",
+    "#     pipeline_func = spark_job_pipeline\n",
+    "#     pipeline_filename = pipeline_func.__name__ + \".yaml\"\n",
+    "#     TektonCompiler().compile(pipeline_func, pipeline_filename)\n",
+    "#     logging.info(f\"Generated pipeline file: {pipeline_filename}.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "2.  Upload and run your `spark_job_pipeline.yaml` using the Kubeflow Pipelines user interface.\n",
+    "See the guide to [getting started with the UI][quickstart].\n",
+    "\n",
+    "[quickstart]: https://www.kubeflow.org/docs/components/pipelines/overview/quickstart"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Option 2: run the pipeline using Kubeflow Pipelines SDK client\n",
+    "\n",
+    "1.  Create an instance of the [`kfp.Client` class][kfp-client] by following the steps in [connecting to Kubeflow Pipelines using the SDK client][connect-api].\n",
+    "\n",
+    "[kfp-client]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.client.html#kfp.Client\n",
+    "[connect-api]: https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = kfp.Client() # change arguments accordingly"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client.create_run_from_pipeline_func(\n",
+    "   spark_job_pipeline)"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
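
The recursive graph component in the notebook re-checks the application state until it equals `COMPLETED`. The same stop condition can be sketched as an ordinary Python loop, which may help when reasoning about the pipeline's behavior. This is an assumption-laden illustration, not the notebook's code: `wait_until_completed`, `fetch_state`, `poll_interval_s`, and `max_polls` are hypothetical names, and the upper bound on polls is an addition that the pipeline itself does not have.

```python
import time
from typing import Callable, Optional

SPARK_COMPLETED_STATE = "COMPLETED"  # same constant as in the notebook


def wait_until_completed(
    fetch_state: Callable[[], Optional[str]],
    poll_interval_s: float = 5.0,
    max_polls: int = 100,
) -> int:
    """Poll fetch_state() until it returns COMPLETED; return the number of polls used."""
    for attempt in range(1, max_polls + 1):
        if fetch_state() == SPARK_COMPLETED_STATE:
            return attempt
        time.sleep(poll_interval_s)  # here the wait really happens between checks
    raise TimeoutError(f"state was not {SPARK_COMPLETED_STATE} after {max_polls} polls")


# A fake state source stands in for `kubectl get sparkapplications <name>`;
# the intermediate state names are placeholders.
states = iter(["SUBMITTED", "RUNNING", "COMPLETED"])
print(wait_until_completed(lambda: next(states), poll_interval_s=0.0))
```

Unlike this loop, the pipeline only stops recursing on `COMPLETED`, so an application that ends in any other terminal state would keep being polled.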
--- a/kfp-spark/spark-job.yaml
+++ b/kfp-spark/spark-job.yaml
@ -0,0 +1,54 @@
+#
+# Copyright 2017 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: "sparkoperator.k8s.io/v1beta2"
+kind: SparkApplication
+metadata:
+  name: spark-pi-{epoch}
+  namespace: kubeflow
+spec:
+  type: Scala
+  mode: cluster
+  image: "gcr.io/spark-operator/spark:v3.1.1"
+  imagePullPolicy: Always
+  mainClass: org.apache.spark.examples.SparkPi
+  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
+  sparkVersion: "3.1.1"
+  restartPolicy:
+    type: Never
+  volumes:
+    - name: "test-volume"
+      hostPath:
+        path: "/tmp"
+        type: Directory
+  driver:
+    cores: 1
+    coreLimit: "1200m"
+    memory: "512m"
+    labels:
+      version: 3.1.1
+    serviceAccount: spark-sa
+    volumeMounts:
+      - name: "test-volume"
+        mountPath: "/tmp"
+  executor:
+    cores: 1
+    instances: 2
+    memory: "1024m"
+    labels:
+      version: 3.1.1
+    volumeMounts:
+      - name: "test-volume"
+        mountPath: "/tmp"
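
The `{epoch}` placeholder in `metadata.name` is filled in by the notebook with `str.format` before the manifest is passed to the apply component. A minimal sketch of that substitution follows; the helper name `render_job_name` is hypothetical, and a plain dict stands in for the parsed YAML.

```python
import time


def render_job_name(name_template: str, epoch: int) -> str:
    """Fill the {epoch} placeholder the same way the notebook does with str.format."""
    return name_template.format(epoch=epoch)


# Stand-in for the parsed spark-job.yaml (only the fields needed here).
manifest = {"metadata": {"name": "spark-pi-{epoch}", "namespace": "kubeflow"}}
manifest["metadata"]["name"] = render_job_name(
    manifest["metadata"]["name"], int(time.time())
)
print(manifest["metadata"]["name"])

# With the epoch that appears in the compiled pipeline in this commit:
print(render_job_name("spark-pi-{epoch}", 1639502813))  # spark-pi-1639502813
```

Because the substitution happens when the pipeline is compiled, the compiled `spark_job_pipeline.yaml` carries a fixed name (`spark-pi-1639502813` in this commit), so each run started from that file applies an object with the same name.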
--- a/kfp-spark/spark-rbac.yaml
+++ b/kfp-spark/spark-rbac.yaml
@ -0,0 +1,32 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: spark-sa
+  namespace: kubeflow
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  namespace: kubeflow
+  name: spark-role
+rules:
+- apiGroups: [""]
+  resources: ["pods", "services", "configmaps", "pods/log"]
+  verbs: ["create", "get", "watch", "list", "post", "delete", "patch"]
+- apiGroups: ["sparkoperator.k8s.io"]
+  resources: ["sparkapplications"]
+  verbs: ["create", "get", "watch", "list", "post", "delete", "patch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: spark-role-binding
+  namespace: kubeflow
+subjects:
+- kind: ServiceAccount
+  name: spark-sa
+  namespace: kubeflow
+roleRef:
+  kind: Role
+  name: spark-role
+  apiGroup: rbac.authorization.k8s.io
--- a/kfp-spark/spark_job_pipeline.yaml
+++ b/kfp-spark/spark_job_pipeline.yaml
@ -0,0 +1,201 @@
+apiVersion: argoproj.io/v1alpha1
+kind: Workflow
+metadata:
+  generateName: spark-operator-job-pipeline-
+  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.10, pipelines.kubeflow.org/pipeline_compilation_time: '2021-12-14T17:26:58.647651',
+    pipelines.kubeflow.org/pipeline_spec: '{"description": "Spark Operator job pipeline",
+      "name": "Spark Operator job pipeline"}'}
+  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.10}
+spec:
+  entrypoint: spark-operator-job-pipeline
+  templates:
+  - name: apply-kubernetes-object
+    container:
+      args: []
+      command:
+      - bash
+      - -exc
+      - |
+        object_path=$0
+        output_name_path=$1
+        output_kind_path=$2
+        output_object_path=$3
+        mkdir -p "$(dirname "$output_name_path")"
+        mkdir -p "$(dirname "$output_kind_path")"
+        mkdir -p "$(dirname "$output_object_path")"
+        kubectl apply -f "$object_path" --output=json > "$output_object_path"
+
+        < "$output_object_path" jq '.metadata.name' --raw-output > "$output_name_path"
+        < "$output_object_path" jq '.kind' --raw-output > "$output_kind_path"
+      - /tmp/inputs/Object/data
+      - /tmp/outputs/Name/data
+      - /tmp/outputs/Kind/data
+      - /tmp/outputs/Object/data
+      image: bitnami/kubectl:1.17.17
+    inputs:
+      artifacts:
+      - name: Object
+        path: /tmp/inputs/Object/data
+        raw: {data: '{"apiVersion": "sparkoperator.k8s.io/v1beta2", "kind": "SparkApplication",
+            "metadata": {"name": "spark-pi-1639502813", "namespace": "kubeflow"},
+            "spec": {"type": "Scala", "mode": "cluster", "image": "gcr.io/spark-operator/spark:v3.1.1",
+            "imagePullPolicy": "Always", "mainClass": "org.apache.spark.examples.SparkPi",
+            "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar",
+            "sparkVersion": "3.1.1", "restartPolicy": {"type": "Never"}, "volumes":
+            [{"name": "test-volume", "hostPath": {"path": "/tmp", "type": "Directory"}}],
+            "driver": {"cores": 1, "coreLimit": "1200m", "memory": "512m", "labels":
+            {"version": "3.1.1"}, "serviceAccount": "spark-sa", "volumeMounts": [{"name":
+            "test-volume", "mountPath": "/tmp"}]}, "executor": {"cores": 1, "instances":
+            2, "memory": "1024m", "labels": {"version": "3.1.1"}, "volumeMounts":
+            [{"name": "test-volume", "mountPath": "/tmp"}]}}}'}
+    outputs:
+      parameters:
+      - name: apply-kubernetes-object-Name
+        valueFrom: {path: /tmp/outputs/Name/data}
+      artifacts:
+      - {name: apply-kubernetes-object-Kind, path: /tmp/outputs/Kind/data}
+      - {name: apply-kubernetes-object-Name, path: /tmp/outputs/Name/data}
+      - {name: apply-kubernetes-object-Object, path: /tmp/outputs/Object/data}
+    metadata:
+      annotations: {author: Alexey Volkov <alexey.volkov@ark-kun.com>, pipelines.kubeflow.org/component_spec: '{"implementation":
+          {"container": {"command": ["bash", "-exc", "object_path=$0\noutput_name_path=$1\noutput_kind_path=$2\noutput_object_path=$3\nmkdir
+          -p \"$(dirname \"$output_name_path\")\"\nmkdir -p \"$(dirname \"$output_kind_path\")\"\nmkdir
+          -p \"$(dirname \"$output_object_path\")\"\nkubectl apply -f \"$object_path\"
+          --output=json > \"$output_object_path\"\n\n< \"$output_object_path\" jq
+          ''.metadata.name'' --raw-output > \"$output_name_path\"\n< \"$output_object_path\"
+          jq ''.kind'' --raw-output > \"$output_kind_path\"\n", {"inputPath": "Object"},
+          {"outputPath": "Name"}, {"outputPath": "Kind"}, {"outputPath": "Object"}],
+          "image": "bitnami/kubectl:1.17.17"}}, "inputs": [{"name": "Object", "type":
+          "JsonObject"}], "metadata": {"annotations": {"author": "Alexey Volkov <alexey.volkov@ark-kun.com>"}},
+          "name": "Apply Kubernetes object", "outputs": [{"name": "Name", "type":
+          "String"}, {"name": "Kind", "type": "String"}, {"name": "Object", "type":
+          "JsonObject"}]}', pipelines.kubeflow.org/component_ref: '{"digest": "31e4123b45bebd4323a4ffd51fea3744046f9be8e77a2ccf06ba09f80359fcf5",
+          "url": "k8s-apply-component.yaml"}', pipelines.kubeflow.org/max_cache_staleness: P0D}
+      labels:
+        pipelines.kubeflow.org/kfp_sdk_version: 1.8.10
+        pipelines.kubeflow.org/pipeline-sdk-type: kfp
+        pipelines.kubeflow.org/enable_caching: "true"
+  - name: condition-2
+    inputs:
+      parameters:
+      - {name: get-kubernetes-object-Name}
+    dag:
+      tasks:
+      - name: graph-graph-component-spark-app-status-1
+        template: graph-graph-component-spark-app-status-1
+        arguments:
+          parameters:
+          - {name: apply-kubernetes-object-Name, value: '{{inputs.parameters.get-kubernetes-object-Name}}'}
+  - name: get-kubernetes-object
+    container:
+      args: []
+      command:
+      - bash
+      - -exc
+      - |
+        object_name=$0
+        object_type=$1
+        output_name_path=$2
+        output_state_path=$3
+        output_object_path=$4
+        mkdir -p "$(dirname "$output_name_path")"
+        mkdir -p "$(dirname "$output_state_path")"
+        mkdir -p "$(dirname "$output_object_path")"
+
+        kubectl get "$object_type" "$object_name" --output=json > "$output_object_path"
+
+        < "$output_object_path" jq '.metadata.name' --raw-output > "$output_name_path"
+        < "$output_object_path" jq '.status.applicationState.state' --raw-output > "$output_state_path"
+      - '{{inputs.parameters.apply-kubernetes-object-Name}}'
+      - sparkapplications
+      - /tmp/outputs/Name/data
+      - /tmp/outputs/ApplicationState/data
+      - /tmp/outputs/Object/data
+      image: bitnami/kubectl:1.17.17
+    inputs:
+      parameters:
+      - {name: apply-kubernetes-object-Name}
+    outputs:
+      parameters:
+      - name: get-kubernetes-object-ApplicationState
+        valueFrom: {path: /tmp/outputs/ApplicationState/data}
+      - name: get-kubernetes-object-Name
+        valueFrom: {path: /tmp/outputs/Name/data}
+      artifacts:
+      - {name: get-kubernetes-object-ApplicationState, path: /tmp/outputs/ApplicationState/data}
+      - {name: get-kubernetes-object-Name, path: /tmp/outputs/Name/data}
+      - {name: get-kubernetes-object-Object, path: /tmp/outputs/Object/data}
+    metadata:
+      annotations: {author: Alexey Volkov <alexey.volkov@ark-kun.com>, pipelines.kubeflow.org/component_spec: '{"implementation":
+          {"container": {"command": ["bash", "-exc", "object_name=$0\nobject_type=$1\noutput_name_path=$2\noutput_state_path=$3\noutput_object_path=$4\nmkdir
+          -p \"$(dirname \"$output_name_path\")\"\nmkdir -p \"$(dirname \"$output_state_path\")\"\nmkdir
+          -p \"$(dirname \"$output_object_path\")\"\n\nkubectl get \"$object_type\"
+          \"$object_name\" --output=json > \"$output_object_path\"\n\n< \"$output_object_path\"
+          jq ''.metadata.name'' --raw-output > \"$output_name_path\"\n< \"$output_object_path\"
+          jq ''.status.applicationState.state'' --raw-output > \"$output_state_path\"\n",
+          {"inputValue": "Name"}, {"inputValue": "Kind"}, {"outputPath": "Name"},
+          {"outputPath": "ApplicationState"}, {"outputPath": "Object"}], "image":
+          "bitnami/kubectl:1.17.17"}}, "inputs": [{"name": "Name", "type": "String"},
+          {"name": "Kind", "type": "String"}], "metadata": {"annotations": {"author":
+          "Alexey Volkov <alexey.volkov@ark-kun.com>"}}, "name": "Get Kubernetes object",
+          "outputs": [{"name": "Name", "type": "String"}, {"name": "ApplicationState",
+          "type": "String"}, {"name": "Object", "type": "JsonObject"}]}', pipelines.kubeflow.org/component_ref: '{"digest":
+          "fde6162e7783ca7b16b16ad04b667ab01a29c1fb133191941312cc4605114a2c", "url":
+          "k8s-get-component.yaml"}', pipelines.kubeflow.org/arguments.parameters: '{"Kind":
+          "sparkapplications", "Name": "{{inputs.parameters.apply-kubernetes-object-Name}}"}',
+        pipelines.kubeflow.org/max_cache_staleness: P0D}
+      labels:
+        pipelines.kubeflow.org/kfp_sdk_version: 1.8.10
+        pipelines.kubeflow.org/pipeline-sdk-type: kfp
+        pipelines.kubeflow.org/enable_caching: "true"
+  - name: graph-graph-component-spark-app-status-1
+    inputs:
+      parameters:
+      - {name: apply-kubernetes-object-Name}
+    dag:
+      tasks:
+      - name: condition-2
+        template: condition-2
+        when: '"{{tasks.get-kubernetes-object.outputs.parameters.get-kubernetes-object-ApplicationState}}"
+          != "COMPLETED"'
+        dependencies: [get-kubernetes-object]
+        arguments:
+          parameters:
+          - {name: get-kubernetes-object-Name, value: '{{tasks.get-kubernetes-object.outputs.parameters.get-kubernetes-object-Name}}'}
+      - name: get-kubernetes-object
+        template: get-kubernetes-object
+        arguments:
+          parameters:
+          - {name: apply-kubernetes-object-Name, value: '{{inputs.parameters.apply-kubernetes-object-Name}}'}
+  - name: print-message
+    container:
+      command: [echo, 'Job {{inputs.parameters.apply-kubernetes-object-Name}} is completed.']
+      image: alpine:3.6
+    inputs:
+      parameters:
+      - {name: apply-kubernetes-object-Name}
+    metadata:
+      labels:
+        pipelines.kubeflow.org/kfp_sdk_version: 1.8.10
+        pipelines.kubeflow.org/pipeline-sdk-type: kfp
+        pipelines.kubeflow.org/enable_caching: "true"
+      annotations: {pipelines.kubeflow.org/max_cache_staleness: P0D}
+  - name: spark-operator-job-pipeline
+    dag:
+      tasks:
+      - {name: apply-kubernetes-object, template: apply-kubernetes-object}
+      - name: graph-graph-component-spark-app-status-1
+        template: graph-graph-component-spark-app-status-1
+        dependencies: [apply-kubernetes-object]
+        arguments:
+          parameters:
+          - {name: apply-kubernetes-object-Name, value: '{{tasks.apply-kubernetes-object.outputs.parameters.apply-kubernetes-object-Name}}'}
+      - name: print-message
+        template: print-message
+        dependencies: [apply-kubernetes-object, graph-graph-component-spark-app-status-1]
+        arguments:
+          parameters:
+          - {name: apply-kubernetes-object-Name, value: '{{tasks.apply-kubernetes-object.outputs.parameters.apply-kubernetes-object-Name}}'}
+  arguments:
+    parameters: []
+  serviceAccountName: pipeline-runner
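
Note on the recursion in the compiled workflow above: `graph-graph-component-spark-app-status-1` runs `get-kubernetes-object`, and `condition-2` re-enters the same graph whenever the reported `ApplicationState` is not `COMPLETED`. The sketch below restates that control flow as a plain Python loop, for readers who find the nested templates hard to follow. It is an illustration only, not the pipeline's source. `wait_for_spark_app`, `fetch_state`, `max_polls`, and `interval_s` are hypothetical names introduced here; the compiled workflow has neither a poll limit nor a delay between polls.

```python
import time
from typing import Callable


def wait_for_spark_app(
    name: str,
    fetch_state: Callable[[str], str],
    max_polls: int = 100,
    interval_s: float = 0.0,
) -> int:
    """Poll until fetch_state(name) returns "COMPLETED".

    Returns the number of polls that were made. fetch_state stands in for the
    `kubectl get sparkapplications <name>` step, which extracts
    `.status.applicationState.state` from the returned object.
    """
    for attempt in range(1, max_polls + 1):
        state = fetch_state(name)
        # Mirrors the `when` clause on condition-2: keep going only while
        # the state differs from "COMPLETED".
        if state == "COMPLETED":
            return attempt
        time.sleep(interval_s)
    # Assumption for this sketch: give up after max_polls. The compiled
    # workflow has no equivalent bound.
    raise TimeoutError(f"{name} did not reach COMPLETED after {max_polls} polls")
```

Because the condition tests only for `COMPLETED`, any other state, including a failure state, re-enters the loop. A bound such as the hypothetical `max_polls` above, or an explicit check for failure states, would be one way to avoid polling indefinitely.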
--- a/videos/README.md
+++ b/videos/README.md
@ -1,12 +0,0 @@
-# Kubeflow Videos
-
-This repository contains the show notes for videos that highlight Kubeflow
-capabilities. Here you can find the Terminal commands and links from your favorite
-videos, to save on manual transcription.
-
-## Installation
-
-* [From Zero to Kubeflow](from_zero_to_kubeflow/): Michelle Casbon gives a
-walkthrough of two different ways to install Kubeflow from scratch on GCP:
-via the web and command-line.
-
--- a/videos/from_zero_to_kubeflow/README.md
+++ b/videos/from_zero_to_kubeflow/README.md
@ -1,51 +0,0 @@
-# From Zero to Kubeflow
-
-Video link: [YouTube](https://www.youtube.com/watch?v=AF-WH967_s4)
-
-## Description
-
-Michelle Casbon gives a straightforward walkthrough of two different ways to
-install Kubeflow from scratch on GCP:
-
-* Web-based - [Click-to-deploy](https://deploy.kubeflow.cloud)
-* CLI - [kfctl](https://www.kubeflow.org/docs/gke/deploy/deploy-cli/)
-
-## Commands
-
-The following Terminal commands are used.
-
-### Download the `kfctl` binary
-
-```
-export KUBEFLOW_TAG=0.5.1
-wget -P /tmp https://github.com/kubeflow/kubeflow/releases/download/v${KUBEFLOW_TAG}/kfctl_v${KUBEFLOW_TAG}_darwin.tar.gz
-tar -xvf /tmp/kfctl_v${KUBEFLOW_TAG}_darwin.tar.gz -C ${HOME}/bin
-```
-
-### Generate the project directory
-
-```
-export PROJECT_ID=<project_id>
-export CLIENT_ID=<oauth_client_id>
-export CLIENT_SECRET=<oauth_client_secret>
-kfctl init kubeflow-cli --platform gcp --project ${PROJECT_ID}
-```
-
-### Generate all files
-
-```
-kfctl generate all --zone us-central1-c
-```
-
-### Create all platform and Kubernetes objects
-
-```
-kfctl apply all 
-```
-
-## Links
-
-* [codelabs.developers.google.com](https://codelabs.developers.google.com/)
-* [github.com/kubeflow/examples](https://github.com/kubeflow/examples)
-* [kubeflow.org](https://www.kubeflow.org/)
-