{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building Python Function-based Components\n",
"> Building your own lightweight pipeline components using the Pipelines SDK v2 and Python\n",
"\n",
"A Kubeflow Pipelines component is a self-contained set of code that performs one step in your\n",
"ML workflow. A pipeline component is composed of:\n",
"\n",
"* The component code, which implements the logic needed to perform a step in your ML workflow.\n",
"* A component specification, which defines the following:\n",
"  \n",
"  * The component's metadata, its name and description.\n",
"  * The component's interface, the component's inputs and outputs.\n",
"  * The component's implementation, the Docker container image\n",
"    to run, how to pass inputs to your component code, and how\n",
"    to get the component's outputs.\n",
"\n",
"Python function-based components make it easier to iterate quickly by letting you build your\n",
"component code as a Python function and generating the [component specification][component-spec] for you.\n",
"This document describes how to build Python function-based components and use them in your pipeline.\n",
"\n",
"**Note:** This guide demonstrates how to build components using the Pipelines SDK v2.\n",
"Currently, Kubeflow Pipelines v2 is in development. You can use this guide to start\n",
"building and running pipelines that are compatible with the Pipelines SDK v2.\n",
"\n",
"[Learn more about Pipelines SDK v2][kfpv2].\n",
"\n",
"[kfpv2]: https://www.kubeflow.org/docs/components/pipelines/sdk-v2/v2-compatibility/\n",
"\n",
"[component-spec]: https://www.kubeflow.org/docs/components/pipelines/reference/component-spec/\n",
"\n",
"## Before you begin\n",
"\n",
"1. Run the following command to install the Kubeflow Pipelines SDK v1.6.2 or higher. If you run this command in a Jupyter\n",
"   notebook, restart the kernel after installing the SDK. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install --upgrade kfp"
]
},
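{
"cell_type": "markdown",
"metadata": {},
"source": [
"   Optionally, you can confirm that the installed SDK version meets the requirement above (1.6.2 or higher). This check is a convenience and is not required by the rest of this guide."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: verify the installed Kubeflow Pipelines SDK version.\n",
"# This guide assumes version 1.6.2 or higher.\n",
"import kfp\n",
"print(kfp.__version__)"
]
},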
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Import the `kfp`, `kfp.dsl`, and `kfp.v2.dsl` packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import kfp\n",
"import kfp.dsl as dsl\n",
"from kfp.v2.dsl import (\n",
"    component,\n",
"    Input,\n",
"    Output,\n",
"    Dataset,\n",
"    Metrics,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Create an instance of the [`kfp.Client` class][kfp-client] by following the steps in [connecting to Kubeflow Pipelines using the SDK client][connect-api].\n",
"\n",
"[kfp-client]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/client.html#kfp.Client\n",
"[connect-api]: https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = kfp.Client() # change arguments accordingly"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more information about the Kubeflow Pipelines SDK, see the [SDK reference guide][sdk-ref].\n",
"\n",
"[sdk-ref]: https://kubeflow-pipelines.readthedocs.io/en/stable/index.html\n",
"\n",
"## Getting started with Python function-based components\n",
"\n",
"This section demonstrates how to get started building Python function-based components by walking\n",
"through the process of creating a simple component.\n",
"\n",
"1. Define your component's code as a [standalone Python function](#standalone).\n",
"   In this example, the function adds two floats and returns the sum of the two\n",
"   arguments. Use the `kfp.v2.dsl.component` annotation to convert the function\n",
"   into a factory function that creates [`kfp.dsl.ContainerOp`][container-op]\n",
"   class instances that you can use as steps in your pipeline.\n",
"\n",
"[container-op]: https://kubeflow-pipelines.readthedocs.io/en/stable/source/dsl.html#kfp.dsl.ContainerOp"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@component\n",
"def add(a: float, b: float) -> float:\n",
"    '''Calculates sum of two arguments'''\n",
"    return a + b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Create your pipeline and specify its argument values. [Learn more about creating and running pipelines][build-pipelines].\n",
"\n",
"[build-pipelines]: https://www.kubeflow.org/docs/components/pipelines/sdk-v2/build-pipeline/#compile-and-run-your-pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import kfp.dsl as dsl\n",
"@dsl.pipeline(\n",
"    name='addition-pipeline',\n",
"    description='An example pipeline that performs addition calculations.',\n",
"    pipeline_root='gs://my-pipeline-root/example-pipeline'\n",
")\n",
"def add_pipeline(\n",
"    a: float=1,\n",
"    b: float=7,\n",
"):\n",
"    # Passes a pipeline parameter and a constant value to the `add` factory\n",
"    # function.\n",
"    first_add_task = add(a, 4)\n",
"    # Passes an output reference from `first_add_task` and a pipeline parameter\n",
"    # to the `add` factory function. For operations with a single return\n",
"    # value, the output reference can be accessed as `task.output` or\n",
"    # `task.outputs['output_name']`.\n",
"    second_add_task = add(first_add_task.output, b)\n",
"\n",
"# Specify pipeline argument values\n",
"arguments = {'a': 7, 'b': 8}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Compile and run your pipeline. [Learn more about compiling and running pipelines][build-pipelines].\n",
"\n",
"[build-pipelines]: https://www.kubeflow.org/docs/components/pipelines/sdk-v2/build-pipeline/#compile-and-run-your-pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Submit a pipeline run using the v2 compatible mode\n",
"client.create_run_from_pipeline_func(\n",
"    add_pipeline,\n",
"    arguments=arguments,\n",
"    mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE)"
]
},
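{
"cell_type": "markdown",
"metadata": {},
"source": [
"   As an optional alternative to submitting the run directly, you can compile the pipeline into a package file that you can store, share, or upload later. The sketch below uses the v2 compiler from the `kfp.v2` namespace; the output file name is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: compile the pipeline into a package file instead of submitting\n",
"# it directly. The package path below is an example value.\n",
"from kfp.v2 import compiler\n",
"\n",
"compiler.Compiler().compile(\n",
"    pipeline_func=add_pipeline,\n",
"    package_path='add_pipeline.json')"
]
},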
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building Python function-based components\n",
"\n",
"Use the following instructions to build a Python function-based component:\n",
"\n",
"<a name=\"standalone\"></a>\n",
"\n",
"1. Define a standalone Python function. This function must meet the following\n",
"   requirements:\n",
"\n",
"   * It should not use any code declared outside of the function definition.\n",
"   * Import statements must be added inside the function. [Learn more about\n",
"     using and installing Python packages in your component](#packages).\n",
"   * Helper functions must be defined inside this function.\n",
"\n",
"1. Kubeflow Pipelines uses your function's inputs and outputs to define your\n",
"   component's interface. [Learn more about passing data between\n",
"   components](#pass-data). Your function's inputs and outputs must meet the\n",
"   following requirements:\n",
"   \n",
"   * All your function's arguments must have data type annotations.\n",
"   * If the function accepts or returns large amounts of data or complex\n",
"     data types, you must annotate that argument as an _artifact_.\n",
"     [Learn more about using large amounts of data as inputs or outputs](#pass-by-file).\n",
"   * If your component returns multiple outputs, you can annotate your\n",
"     function with the [`typing.NamedTuple`][named-tuple-hint] type hint\n",
"     and use the [`collections.namedtuple`][named-tuple] function to return\n",
"     your function's outputs as a new subclass of tuple. For an example, read\n",
"     [Passing parameters by value](#pass-by-value).\n",
"\n",
"1. (Optional.) If your function has complex dependencies, choose or build a\n",
"   container image for your Python function to run in. [Learn more about\n",
"   selecting or building your component's container image](#containers).\n",
"   \n",
"1. Add the [`kfp.v2.dsl.component`][vs-dsl-component] decorator to convert your function\n",
"   into a pipeline component. You can specify the following arguments to the decorator:\n",
"   \n",
"   * **base_image**: (Optional.) Specify the Docker container image to run\n",
"     this function in. [Learn more about selecting or building a container\n",
"     image](#containers). \n",
"   * **output_component_file**: (Optional.) Writes your component definition\n",
"     to a file. You can use this file to share the component with colleagues\n",
"     or reuse it in different pipelines.\n",
"   * **packages_to_install**: (Optional.) A list of versioned Python\n",
"     packages to install before running your function. \n",
"\n",
"<a name=\"packages\"></a>\n",
"### Using and installing Python packages\n",
"\n",
"When Kubeflow Pipelines runs your pipeline, each component runs within a Docker\n",
"container image on a Kubernetes Pod. To load the packages that your Python\n",
"function depends on, one of the following must be true:\n",
"\n",
"* The package must be installed on the container image.\n",
"* The package must be defined using the `packages_to_install` parameter of the\n",
"  [`kfp.v2.dsl.component`][vs-dsl-component] decorator.\n",
"* Your function must install the package. For example, your function can use\n",
"  the [`subprocess` module][subprocess] to run a command like `pip install`\n",
"  that installs a package.\n",
"\n",
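"For example, the following sketch is illustrative only and is not used elsewhere in this guide;\n",
"the package pin, image, component name, and file name are placeholder values. It shows a component\n",
"that installs a versioned package with `packages_to_install`, runs on an explicit `base_image`, and\n",
"writes its definition to a component file (which you could later reload with\n",
"`kfp.components.load_component_from_file`):\n",
"\n",
"```python\n",
"@component(\n",
"    base_image='python:3.7',\n",
"    packages_to_install=['numpy==1.21.6'],\n",
"    output_component_file='double_component.yaml')\n",
"def double(a: float) -> float:\n",
"    # Placeholder example: numpy is imported inside the function so that the\n",
"    # component stays self-contained.\n",
"    import numpy as np\n",
"    return float(np.multiply(a, 2))\n",
"```\n",
"\n",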
"<a name=\"containers\"></a>\n",
|
||
"### Selecting or building a container image\n",
|
||
"\n",
|
||
"Currently, if you do not specify a container image, your Python-function based\n",
|
||
"component uses the [`python:3.7` container image][python37]. If your function\n",
|
||
"has complex dependencies, you may benefit from using a container image that has\n",
|
||
"your dependencies preinstalled, or building a custom container image.\n",
|
||
"Preinstalling your dependencies reduces the amount of time that your component\n",
|
||
"runs in, since your component does not need to download and install packages\n",
|
||
"each time it runs.\n",
|
||
"\n",
|
||
"Many frameworks, such as [TensorFlow][tf-docker] and [PyTorch][pytorch-docker],\n",
|
||
"and cloud service providers offer prebuilt container images that have common\n",
|
||
"dependencies installed.\n",
|
||
"\n",
|
||
"If a prebuilt container is not available, you can build a custom container\n",
|
||
"image with your Python function's dependencies. For more information about\n",
|
||
"building a custom container, read the [Dockerfile reference guide in the Docker\n",
|
||
"documentation][dockerfile].\n",
|
||
"\n",
|
||
"If you build or select a container image, instead of using the default\n",
|
||
"container image, the container image must use Python 3.5 or later.\n",
|
||
"\n",
|
||
"<a name=\"pass-data\"></a>\n",
|
||
"### Understanding how data is passed between components\n",
|
||
"\n",
|
||
"When Kubeflow Pipelines runs your component, a container image is started in a\n",
|
||
"Kubernetes Pod and your component's inputs are passed in as command-line\n",
|
||
"arguments. When your component has finished, the component’s outputs are\n",
|
||
"returned as files.\n",
|
||
"\n",
|
||
"Python function-based components make it easier to build pipeline components by\n",
|
||
"building the component specification for you. Python function-based components\n",
|
||
"also handle the complexity of passing inputs into your component and passing\n",
|
||
"your function's outputs back to your pipeline. \n",
|
||
"\n",
|
||
"Component inputs and outputs are classified as either _parameters_ or _artifacts_,\n",
|
||
"depending on their data type.\n",
|
||
"\n",
|
||
"* Parameters typically represent settings that affect the behavior of your pipeline.\n",
|
||
" Parameters are passed into your component by value, and can be of any of\n",
|
||
" the following types: `int`, `double`, `float`, or `str`. Since parameters are\n",
|
||
" passed by value, the quantity of data passed in a parameter must be appropriate\n",
|
||
" to pass as a command-line argument.\n",
|
||
"* Artifacts represent large or complex data structures like datasets or models, and\n",
|
||
" are passed into components as a reference to a file path. \n",
|
||
" \n",
|
||
" If you have large amounts of string data to pass to your component, such as a JSON\n",
|
||
" file, annotate that input or output as a type of [`Artifact`][kfp-artifact], such\n",
|
||
" as [`Dataset`][kfp-artifact], to let Kubeflow Pipelines know to pass this to\n",
|
||
" your component as a file. \n",
|
||
"\n",
|
||
" In addition to the artifact’s data, you can also read and write the artifact's\n",
|
||
" metadata. For output artifacts, you can record metadata as key-value pairs, such\n",
|
||
" as the accuracy of a trained model. For input artifacts, you can read the\n",
|
||
" artifact's metadata — for example, you could use metadata to decide if a\n",
|
||
" model is accurate enough to deploy for predictions.\n",
|
||
"\n",
|
||
"All outputs are returned as files, using the the paths that Kubeflow Pipelines\n",
|
||
"provides.\n",
|
||
"\n",
|
||
"[kfp-artifact]: https://github.com/kubeflow/pipelines/blob/sdk/release-1.8/sdk/python/kfp/dsl/io_types.py\n",
|
||
"\n",
|
||
"The following sections describe how to pass parameters and artifacts to your function. \n",
|
||
"\n",
|
||
"<a name=\"pass-by-value\"></a>\n",
|
||
"#### Passing parameters by value\n",
|
||
"\n",
|
||
"Python function-based components make it easier to pass parameters between\n",
|
||
"components by value (such as numbers, booleans, and short strings), by letting\n",
|
||
"you define your component’s interface by annotating your Python function.\n",
|
||
"Parameters can be of any type that is appropriate to pass as a command-line argument, such as `int`, `float`, `double`, or `str`.\n",
|
||
"\n",
|
||
"If your component returns multiple outputs by value, annotate your function\n",
|
||
"with the [`typing.NamedTuple`][named-tuple-hint] type hint and use the\n",
|
||
"[`collections.namedtuple`][named-tuple] function to return your function's\n",
|
||
"outputs as a new subclass of `tuple`.\n",
|
||
"\n",
|
||
"The following example demonstrates how to return multiple outputs by value. \n",
|
||
"\n",
|
||
"[python37]: https://hub.docker.com/layers/python/library/python/3.7/images/sha256-7eef781ed825f3b95c99f03f4189a8e30e718726e8490651fa1b941c6c815ad1?context=explore\n",
|
||
"[create-component-from-func]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.create_component_from_func\n",
|
||
"[subprocess]: https://docs.python.org/3/library/subprocess.html\n",
|
||
"[tf-docker]: https://www.tensorflow.org/install/docker\n",
|
||
"[pytorch-docker]: https://hub.docker.com/r/pytorch/pytorch/tags\n",
|
||
"[dockerfile]: https://docs.docker.com/engine/reference/builder/\n",
|
||
"[named-tuple-hint]: https://docs.python.org/3/library/typing.html#typing.NamedTuple\n",
|
||
"[named-tuple]: https://docs.python.org/3/library/collections.html#collections.namedtuple\n",
|
||
"[kfp-visualize]: https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/\n",
|
||
"[kfp-metrics]: https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/\n",
|
||
"[input-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.InputPath\n",
|
||
"[output-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.OutputPath\n",
|
||
"[vs-dsl-component]: https://github.com/kubeflow/pipelines/blob/sdk/release-1.8/sdk/python/kfp/v2/components/component_decorator.py"
|
||
]
|
||
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import NamedTuple\n",
"\n",
"@component\n",
"def multiple_return_values_example(a: float, b: float) -> NamedTuple(\n",
"    'ExampleOutputs',\n",
"    [\n",
"        ('sum', float),\n",
"        ('product', float)\n",
"    ]):\n",
"    \"\"\"Example function that demonstrates how to return multiple values.\"\"\"\n",
"    sum_value = a + b\n",
"    product_value = a * b\n",
"\n",
"    from collections import namedtuple\n",
"    example_output = namedtuple('ExampleOutputs', ['sum', 'product'])\n",
"    return example_output(sum_value, product_value)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"pass-by-file\"></a>\n",
"#### Passing artifacts by file\n",
"\n",
"Python function-based components make it easier to pass files to your\n",
"component, or to return files from your component, by letting you annotate\n",
"your Python function's arguments as _artifacts_.\n",
"Artifacts represent large or complex data structures like datasets or models, and are passed into components as a reference to a file path.\n",
"\n",
"In addition to the artifact’s data, you can also read and write the artifact's metadata. For output artifacts, you can record metadata as key-value pairs, such as the accuracy of a trained model. For input artifacts, you can read the artifact's metadata — for example, you could use metadata to decide if a model is accurate enough to deploy for predictions.\n",
"\n",
"If your artifact is an output file, Kubeflow Pipelines passes your function a\n",
"path or stream that you can use to store your output file. This path is a\n",
"location within your pipeline's `pipeline_root` that your component can write to.\n",
"\n",
"The following example accepts a file as an input and returns two files as outputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@component\n",
"def split_text_lines(\n",
"    source: Input[Dataset],\n",
"    odd_lines: Output[Dataset],\n",
"    even_lines_path: Output[Dataset]):\n",
"    \"\"\"Splits a text file into two files, with even lines going to one file\n",
"    and odd lines to the other.\"\"\"\n",
"\n",
"    with open(source.path, 'r') as reader:\n",
"        with open(odd_lines.path, 'w') as odd_writer:\n",
"            # Output[Dataset] arguments are artifact objects, so use their\n",
"            # `path` property when opening the file to write.\n",
"            with open(even_lines_path.path, 'w') as even_writer:\n",
"                while True:\n",
"                    line = reader.readline()\n",
"                    if line == \"\":\n",
"                        break\n",
"                    odd_writer.write(line)\n",
"                    line = reader.readline()\n",
"                    if line == \"\":\n",
"                        break\n",
"                    even_writer.write(line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, the inputs and outputs are defined as arguments of the\n",
"`split_text_lines` function. This lets Kubeflow Pipelines pass the path to the\n",
"source data file and the paths to the output data files into the function.\n",
"\n",
"To accept a file as an input parameter, use one of the following type annotations:\n",
"\n",
"* [`kfp.dsl.Input`][input]: Use this generic type hint to specify that your\n",
"  function expects this argument to be an [`Artifact`][kfp-artifact]. Your\n",
"  function can use the argument's `path` property to get the\n",
"  artifact's path, and the `metadata` property to read its key/value metadata.\n",
"* [`kfp.components.InputBinaryFile`][input-binary]: Use this annotation to\n",
"  specify that your function expects an argument to be an\n",
"  [`io.BytesIO`][bytesio] instance that this function can read.\n",
"* [`kfp.components.InputPath`][input-path]: Use this annotation to specify that\n",
"  your function expects an argument to be the path to the input file as\n",
"  a `string`.\n",
"* [`kfp.components.InputTextFile`][input-text]: Use this annotation to specify\n",
"  that your function expects an argument to be an\n",
"  [`io.TextIOWrapper`][textiowrapper] instance that this function can read.\n",
"\n",
"To return a file as an output, use one of the following type annotations:\n",
"\n",
"* [`kfp.dsl.Output`][output]: Use this generic type hint to specify that your\n",
"  function expects this argument to be an [`Artifact`][kfp-artifact]. Your\n",
"  function can use the argument's `path` property to get the\n",
"  artifact path to write to, and the `metadata` property to log key/value metadata.\n",
"* [`kfp.components.OutputBinaryFile`][output-binary]: Use this annotation to\n",
"  specify that your function expects an argument to be an\n",
"  [`io.BytesIO`][bytesio] instance that this function can write to.\n",
"* [`kfp.components.OutputPath`][output-path]: Use this annotation to specify that\n",
"  your function expects an argument to be the path to store the output file at\n",
"  as a `string`.\n",
"* [`kfp.components.OutputTextFile`][output-text]: Use this annotation to specify\n",
"  that your function expects an argument to be an\n",
"  [`io.TextIOWrapper`][textiowrapper] that this function can write to.\n",
"\n",
"[input-binary]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.InputBinaryFile\n",
"[input-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.InputPath\n",
"[input-text]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.InputTextFile\n",
"[output-binary]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.OutputBinaryFile\n",
"[output-path]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.OutputPath\n",
"[output-text]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.OutputTextFile\n",
"[bytesio]: https://docs.python.org/3/library/io.html#io.BytesIO\n",
"[textiowrapper]: https://docs.python.org/3/library/io.html#io.TextIOWrapper\n",
"\n",
"[input]: https://github.com/kubeflow/pipelines/blob/c5daa7532d18687b180badfca8d750c801805712/sdk/python/kfp/dsl/io_types.py\n",
"[output]: https://github.com/kubeflow/pipelines/blob/c5daa7532d18687b180badfca8d750c801805712/sdk/python/kfp/dsl/io_types.py\n",
"[kfp-artifact]: https://github.com/kubeflow/pipelines/blob/sdk/release-1.8/sdk/python/kfp/dsl/io_types.py\n",
"\n"
]
},
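{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sketch is illustrative only and is not used elsewhere in this guide; it shows how a\n",
"component can record key/value metadata on an output artifact through its `metadata` property, and\n",
"how a downstream component can read that metadata from an input artifact. The artifact contents,\n",
"metric name, and threshold are placeholder values, and `Dataset` is used simply because it is\n",
"already imported above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@component\n",
"def train_model(model: Output[Dataset]):\n",
"    # Illustrative sketch: write the artifact's data to `model.path` and\n",
"    # record metadata as key-value pairs. The accuracy value is a placeholder.\n",
"    with open(model.path, 'w') as f:\n",
"        f.write('placeholder model contents')\n",
"    model.metadata['accuracy'] = 0.9\n",
"\n",
"@component\n",
"def check_model(model: Input[Dataset]) -> str:\n",
"    # Read the metadata recorded by the upstream component and use it to\n",
"    # decide whether to deploy. The 0.8 threshold is a placeholder.\n",
"    if model.metadata.get('accuracy', 0.0) >= 0.8:\n",
"        return 'deploy'\n",
"    return 'skip'"
]
},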
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example Python function-based component\n",
"\n",
"This section demonstrates how to build a Python function-based component that uses imports and\n",
"helper functions, and produces multiple outputs.\n",
"\n",
"1. Define your function. This example function uses the `numpy` package to calculate the\n",
"   quotient and remainder for a given dividend and divisor in a helper function. In\n",
"   addition to the quotient and remainder, the function also returns two metrics.\n",
"\n",
"   By adding the `@component` annotation, you convert your function into a factory function\n",
"   that creates pipeline steps that execute this function. This example also specifies the\n",
"   base container image to run your component in."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import NamedTuple\n",
"\n",
"@component(base_image='tensorflow/tensorflow:1.11.0-py3')\n",
"def my_divmod(\n",
"    dividend: float,\n",
"    divisor: float,\n",
"    metrics: Output[Metrics]) -> NamedTuple(\n",
"        'MyDivmodOutput',\n",
"        [\n",
"            ('quotient', float),\n",
"            ('remainder', float),\n",
"        ]):\n",
"    '''Divides two numbers and calculates the quotient and remainder'''\n",
"\n",
"    # Import the numpy package inside the component function\n",
"    import numpy as np\n",
"\n",
"    # Define a helper function\n",
"    def divmod_helper(dividend, divisor):\n",
"        return np.divmod(dividend, divisor)\n",
"\n",
"    (quotient, remainder) = divmod_helper(dividend, divisor)\n",
"\n",
"    # Export two metrics\n",
"    metrics.log_metric('quotient', float(quotient))\n",
"    metrics.log_metric('remainder', float(remainder))\n",
"\n",
"    from collections import namedtuple\n",
"    divmod_output = namedtuple('MyDivmodOutput',\n",
"                               ['quotient', 'remainder'])\n",
"    return divmod_output(quotient, remainder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Define your pipeline. This example pipeline uses the `my_divmod` factory\n",
"   function and the `add` factory function from an earlier example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import kfp.dsl as dsl\n",
"@dsl.pipeline(\n",
"    name='calculation-pipeline',\n",
"    description='An example pipeline that performs arithmetic calculations.',\n",
"    pipeline_root='gs://my-pipeline-root/example-pipeline'\n",
")\n",
"def calc_pipeline(\n",
"    a: float=1,\n",
"    b: float=7,\n",
"    c: float=17,\n",
"):\n",
"    # Passes a pipeline parameter and a constant value as operation arguments.\n",
"    add_task = add(a, 4)  # The `add` factory function returns\n",
"                          # a dsl.ContainerOp class instance.\n",
"\n",
"    # Passes the output of the add_task and a pipeline parameter as operation\n",
"    # arguments. For an operation with a single return value, the output\n",
"    # reference is accessed using `task.output` or\n",
"    # `task.outputs['output_name']`.\n",
"    divmod_task = my_divmod(add_task.output, b)\n",
"\n",
"    # For an operation with multiple return values, output references are\n",
"    # accessed as `task.outputs['output_name']`.\n",
"    result_task = add(divmod_task.outputs['quotient'], c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Compile and run your pipeline. [Learn more about compiling and running pipelines][build-pipelines].\n",
"\n",
"[build-pipelines]: https://www.kubeflow.org/docs/components/pipelines/sdk-v2/build-pipeline/#compile-and-run-your-pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Specify pipeline argument values\n",
"arguments = {'a': 7, 'b': 8}\n",
"\n",
"# Submit a pipeline run\n",
"client.create_run_from_pipeline_func(\n",
"    calc_pipeline,\n",
"    arguments=arguments,\n",
"    mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE)"
]
}
],
"metadata": {
"environment": {
"name": "tf2-2-3-gpu.2-3.m56",
"type": "gcloud",
"uri": "gcr.io/deeplearning-platform-release/tf2-2-3-gpu.2-3:m56"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}