mirror of https://github.com/kubeflow/website.git
KFP v2 pipeline authoring documentation refresh (#3472)
This commit is contained in:
parent
6954e56420
commit
0f0c0d0e3c
|
@ -1,5 +0,0 @@
|
|||
+++
|
||||
title = "Author a Pipeline"
|
||||
description = "Concepts and objects for authoring pipelines"
|
||||
weight = 4
|
||||
+++
|
|
@ -1,406 +0,0 @@
|
|||
+++
|
||||
title = "Component I/O"
|
||||
description = "Use parameter/artifact inputs and outputs"
|
||||
weight = 4
|
||||
+++
|
||||
|
||||
Components may accept inputs and create outputs. Inputs and outputs can be one of two types: parameters or artifacts. The following matrix describes possible component inputs and outputs:
|
||||
|
||||
|        | Parameter        | Artifact        |
|
||||
| ------ | ---------------- | --------------- |
|
||||
| Input | Input Parameter | Input Artifact |
|
||||
| Output | Output Parameter | Output Artifact |
|
||||
|
||||
Throughout the remainder of this section, we will use the following example dataset creation pipeline to understand the behavior and usage of input and output parameters and artifacts:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Input, Output, Dataset
|
||||
|
||||
|
||||
@dsl.container_component
|
||||
def create_dataset(
|
||||
initial_text: str,
|
||||
output_dataset: Output[Dataset],
|
||||
):
|
||||
"""Create a dataset containing the string `initial_text`."""
|
||||
return dsl.ContainerSpec(
|
||||
image='alpine',
|
||||
command=['sh', '-c', 'mkdir --parents $(dirname "$1") && echo "$0" > "$1"',],
|
||||
args=[initial_text, output_dataset.path])
|
||||
|
||||
|
||||
@dsl.component
|
||||
def augment_dataset(
|
||||
existing_dataset: Input[Dataset],
|
||||
resulting_dataset: Output[Dataset],
|
||||
text: str,
|
||||
num: int = 10,
|
||||
) -> int:
|
||||
"""Append `text` `num` times to an existing dataset, then write it as a new dataset."""
|
||||
additional_data = ' '.join(text for _ in range(num))
|
||||
|
||||
with open(existing_dataset.path, 'r') as f:
|
||||
existing_dataset_text = f.read()
|
||||
|
||||
resulting_dataset_text = existing_dataset_text + ' ' + additional_data
|
||||
|
||||
with open(resulting_dataset.path, 'w') as f:
|
||||
f.write(resulting_dataset_text)
|
||||
|
||||
return len(resulting_dataset_text)
|
||||
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline(initial_text: str = 'initial dataset text'):
|
||||
create_task = create_dataset(initial_text=initial_text)
|
||||
augment_dataset(
|
||||
existing_dataset=create_task.outputs['output_dataset'],
|
||||
text='additional text')
|
||||
```
|
||||
|
||||
This pipeline uses a [custom container component][custom-container-component] `create_dataset` to construct an initial `Dataset` artifact containing `initial_text`. Then, the downstream [lightweight Python component][lightweight-python-component] `augment_dataset` appends `text` repeated `num` times to the dataset and saves it as a new dataset.
|
||||
|
||||
## Inputs
|
||||
Component inputs are specified by the component function's signature. This applies for all authoring approaches: [lightweight Python components][lightweight-python-component], [containerized Python components][containerized-python-component], and [custom container components][custom-container-component].
|
||||
|
||||
Ultimately, each authoring style creates a component defined by an `image`, `command`, and `args`. When you use an input, it is represented as a placeholder in the `command` or `args` and is interpolated at component runtime.
|
||||
|
||||
There is one additional type of input, the struct `PipelineTaskFinalStatus`, which allows access to the metadata of one task from within another via a system-provided value at runtime. This input is a special case, as it is neither a typical parameter nor an artifact and it is only usable in `dsl.ExitHandler` exit tasks. Use of this input is covered in [Authoring: Pipelines][pipelines].
|
||||
|
||||
### Input parameters
|
||||
Input parameters are declared when you use a `str`, `int`, `float`, `bool`, `dict` or `list` type annotation. The data passed to parameters typed with `dict` or `list` may only contain JSON-serializable Python primitives. `Union` types are not permitted.
|
||||
|
||||
In the example `create_dataset` component, `initial_text` is an input parameter. In `augment_dataset`, `text` and `num` are input parameters.
|
||||
|
||||
Input parameters may have default values. For example, `augment_dataset`'s `num` parameter has a default value of `10`.
|
||||
|
||||
Within a component function body, use input parameters just as you would in a normal Python function.
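For example, a minimal sketch (the component and parameter names here are illustrative) showing several parameter inputs, including one with a default value:

```python
from kfp import dsl


@dsl.component
def report(name: str, tags: list, config: dict, threshold: float = 0.5):
    # parameter inputs behave like ordinary Python values inside the function body
    print(name, tags, config, threshold)
```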
|
||||
|
||||
### Input artifacts
|
||||
Input artifacts are defined when you use an `Input[<ArtifactClass>]` annotation. For more information about artifacts, see [Component I/O][component-io].
|
||||
|
||||
At component runtime, input artifacts are copied to the local filesystem by the executing backend. This abstracts away the need for the component author to know where artifacts are stored in remote storage and allows component authors to only interact with the local filesystem when implementing a component that uses an artifact. All artifacts implement a `.path` attribute, which can be used to access the local path where the artifact file has been copied.
|
||||
|
||||
Let's see how this works in practice. In our example pipeline, `augment_dataset` specifies the input `existing_dataset: Input[Dataset]`. In the pipeline definition, we pass the output dataset from `create_dataset` to this parameter. When the `augment_dataset` component runs, the executing backend copies the `output_dataset` artifact file to the container filesystem and passes in an instance of `Dataset` as an argument to `existing_dataset`. The `Dataset` instance has a `.path` handle to its location in the container filesystem, allowing the component to read it:
|
||||
|
||||
```python
|
||||
with open(existing_dataset.path, 'r') as f:
|
||||
existing_dataset_text = f.read()
|
||||
```
|
||||
|
||||
## Outputs
|
||||
Like inputs, component outputs are also specified by the component function's signature. Depending on the component authoring approach and the type of output (parameter or artifact), outputs may be specified by the function return type annotation (e.g., `-> int`), the type annotation generic `Output[]`, or the type annotation class `OutputPath`. Uses for each are explained in the sections to follow.
|
||||
|
||||
For all output types and authoring styles, outputs from a component are persisted to a remote file store, such as [Minio][minio], [Google Cloud Storage][gcs], or [AWS S3][aws-s3], so that they outlast the ephemeral container that creates them and can be picked up for use by a downstream task.
|
||||
|
||||
### Output parameters
|
||||
Output parameters are declared in different ways depending on the authoring approach.
|
||||
|
||||
#### Python components
|
||||
For lightweight Python components and containerized Python components, output parameters are declared by the Python component function return type annotation (e.g., `-> int`). Like parameter inputs, return type annotations may be `str`, `int`, `float`, `bool`, `dict` or `list`.
|
||||
|
||||
In our example, `augment_dataset` has a single integer output.
|
||||
|
||||
You may also specify multiple output parameters by using these annotations within a `typing.NamedTuple` as follows:
|
||||
|
||||
```python
|
||||
from typing import NamedTuple
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def my_component() -> NamedTuple('Outputs', [('name', str), ('id', int)]):
|
||||
from typing import NamedTuple
|
||||
|
||||
output = NamedTuple('Outputs', [('name', str), ('id', int)])
|
||||
return output('my_dataset', 123)
|
||||
```
|
||||
|
||||
|
||||
|
||||
#### Custom container components
|
||||
For [custom container components][custom-container-component], output parameters are declared via an `OutputPath` annotation, which is a class that takes a type as its only argument (e.g., `OutputPath(int)`). At runtime, the backend will pass a filepath string to parameters with this annotation. This string indicates where in the container filesystem the component should write this parameter output. The backend will copy the file specified by this path to remote storage after component execution.
|
||||
|
||||
While the lightweight component executor handles writing the output parameters to the correct local filepath, custom container component authors must implement this in the container logic.
|
||||
|
||||
For example, the following very simple `create_text_output_parameter` component creates the output parameter string `"some text"` by using an `OutputPath(str)` annotation and writing the parameter to the path in the variable `output_string_path`:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import OutputPath
|
||||
|
||||
@dsl.container_component
|
||||
def create_text_output_parameter(output_string_path: OutputPath(str)):
|
||||
return dsl.ContainerSpec(
|
||||
image='alpine',
|
||||
command=[
|
||||
'sh', '-c',
|
||||
'mkdir --parents $(dirname "$0") && echo "some text" > "$0"'
|
||||
],
|
||||
        args=[output_string_path])
|
||||
```
|
||||
|
||||
### Output artifacts
|
||||
Output artifacts are declared when you use an `Output[<ArtifactClass>]` annotation. For more information about artifacts, see [Component I/O][component-io].
|
||||
|
||||
Output artifacts are treated inversely to input artifacts at component runtime: instead of being _copied to the container_ from remote storage, they are _copied to remote storage_ from the `.path` location in the container's filesystem after the component executes. This abstracts away the need for the component author to know where artifacts are stored in remote storage and allows component authors to only interact with the local filesystem when implementing a component that creates an artifact. As with using an artifact input, component authors should write artifacts to `.path`:
|
||||
|
||||
```python
|
||||
with open(resulting_dataset.path, 'w') as f:
|
||||
f.write(resulting_dataset_text)
|
||||
```
|
||||
|
||||
## Pipeline I/O
|
||||
A pipeline may be used like a component by instantiating it as a task within another pipeline.
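For example, a minimal sketch (all names are illustrative) of a pipeline instantiated as a task inside another pipeline:

```python
from kfp import dsl


@dsl.component
def double(num: int) -> int:
    return 2 * num


@dsl.pipeline
def double_pipeline(num: int) -> int:
    return double(num=num).output


@dsl.pipeline
def outer_pipeline(num: int = 3):
    # the inner pipeline is instantiated as a task, just like a component
    double_pipeline(num=num)
```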
|
||||
|
||||
### Inputs
|
||||
All pipeline inputs must include type annotations. Valid input parameter annotations include `str`, `int`, `float`, `bool`, `dict`, `list`. Input parameters may also have defaults. The only valid input artifact annotation is `Input[<Artifact>]` (where `<Artifact>` is any KFP-compatible artifact class). Input artifacts may not have defaults.
|
||||
|
||||
The following simple pipeline has a `str` parameter `text` and an `int` parameter `number`. `number` has a default value of `10`.
|
||||
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline(text: str, number: int = 10):
|
||||
...
|
||||
```
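As a sketch of an artifact input (with the pipeline body elided, as above), a pipeline may also accept an `Input[Dataset]`, which must be supplied when the pipeline is run because artifact inputs may not have defaults:

```python
from kfp import dsl
from kfp.dsl import Dataset, Input


@dsl.pipeline
def my_artifact_pipeline(dataset: Input[Dataset], text: str = 'hello'):
    ...
```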
|
||||
|
||||
Ultimately, all inputs must be passed to an inner "primitive" component in order to perform computation on the input. See [Passing data between tasks: From a pipeline input](#from-a-pipeline-input) for information about how to pass data from a pipeline input to a component within the pipeline.
|
||||
|
||||
### Outputs
|
||||
Pipelines may also have outputs. All outputs are specified by a normal Python function return type annotation indicated by the `->` token (e.g., `-> int`). Valid parameter type annotations include `str`, `int`, `float`, `bool`, `dict`, and `list`. Valid artifact return annotations include `<Artifact>` (where `<Artifact>` is a KFP-compatible artifact class). You may specify multiple outputs using a `typing.NamedTuple` return annotation (see [Python Components](#python-components) for more information on how to use named tuple return types).
|
||||
|
||||
Ultimately, all outputs must be created by an inner "primitive" component. A pipeline may then return this output as its own output.
|
||||
|
||||
For example, the following `double` pipeline returns the single `int` output of the `multiply` component:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def multiply(a: int, b: int) -> int:
|
||||
return a * b
|
||||
|
||||
@dsl.pipeline
|
||||
def double(number: int) -> int:
|
||||
    return multiply(a=number, b=2).output
|
||||
```
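To return more than one output, you can use the same `typing.NamedTuple` pattern described above. The following is a sketch (reusing `multiply`; assuming the backend supports multiple pipeline outputs declared this way):

```python
from typing import NamedTuple

from kfp import dsl


@dsl.pipeline
def double_and_square(number: int) -> NamedTuple('Outputs', [('doubled', int), ('squared', int)]):
    doubled_task = multiply(a=number, b=2)
    squared_task = multiply(a=number, b=number)
    outputs = NamedTuple('Outputs', [('doubled', int), ('squared', int)])
    # each field of the returned named tuple is an output from an inner task
    return outputs(doubled=doubled_task.output, squared=squared_task.output)
```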
|
||||
|
||||
In the next example, the `training_workflow` pipeline returns a `Model` from the inner `train_model` component:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Dataset, Input, Model, Output
|
||||
|
||||
@dsl.component
|
||||
def train_model(dataset: Input[Dataset], model: Output[Model]):
|
||||
# do training
|
||||
trained_model = ...
|
||||
trained_model.save(model.path)
|
||||
|
||||
@dsl.pipeline
|
||||
def training_workflow() -> Model:
|
||||
    get_dataset_op = get_dataset()  # get_dataset is assumed to be a component defined elsewhere
|
||||
train_model_op = train_model(dataset=get_dataset_op.outputs['dataset'])
|
||||
return train_model_op.outputs['model']
|
||||
```
|
||||
|
||||
|
||||
## Passing data between tasks
|
||||
|
||||
To instantiate a component as a task, you must pass to it any required inputs. Required inputs include all input parameters without default values and all input artifacts.
|
||||
|
||||
Output parameters (e.g., `OutputPath`) and output artifacts (e.g., `Output[<ArtifactClass>]`) should not be passed explicitly by the pipeline author; they will be passed at component runtime by the executing backend. This allows component internals to know where output parameters and artifacts should be written in the container filesystem in order to be copied to remote storage by the backend.
|
||||
|
||||
Task inputs may come from one of three different places: a static variable, a pipeline parameter, or an upstream task output. Let's walk through each, using the following `identity` component to help illustrate each approach:
|
||||
|
||||
```python
|
||||
@dsl.component
|
||||
def identity(x: int) -> int:
|
||||
return x
|
||||
```
|
||||
|
||||
### From a static variable
|
||||
|
||||
To provide static data as an input to a component, simply pass it as you would when using a normal function:
|
||||
|
||||
```python
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
task = identity(x=10)
|
||||
```
|
||||
|
||||
Note: Input artifacts cannot be passed as static variables; they must always be passed from an upstream task or an [`importer` component][importer-component].
|
||||
|
||||
### From a pipeline input
|
||||
To pass data from a pipeline input to an inner task, simply pass the variable name as you normally would when calling one function within another:
|
||||
|
||||
```python
|
||||
@dsl.pipeline()
|
||||
def my_pipeline(pipeline_var_x: int):
|
||||
task = identity(x=pipeline_var_x)
|
||||
```
|
||||
|
||||
### From a task output
|
||||
Tasks provide references to their outputs in order to support passing data between tasks in a pipeline.
|
||||
|
||||
In nearly all cases, outputs are accessed via `.outputs['<parameter>']`, where `'<parameter>'` is the parameter name or named tuple field name from the task that produced the output which you wish to access. The `.outputs['<parameter>']` access pattern is used to access `Output[]` artifacts, `OutputPath` output parameters, and `NamedTuple` output parameters.
|
||||
|
||||
The only exception to this access pattern is when you wish to access a single return value from a lightweight Python component, which can be accessed through the task's `.output` attribute.
|
||||
|
||||
The following two subsections demonstrate this for parameters then artifacts.
|
||||
|
||||
#### Passing parameters from task to task
|
||||
|
||||
Let's introduce two more components for the sake of demonstrating passing parameters between components:
|
||||
```python
|
||||
from typing import NamedTuple

from kfp import dsl
from kfp.dsl import OutputPath
|
||||
|
||||
@dsl.component
|
||||
def named_tuple(an_id: int) -> NamedTuple('Outputs', [('name', str), ('id', int)]):
|
||||
"""Lightweight Python component with a NamedTuple output."""
|
||||
from typing import NamedTuple
|
||||
outputs = NamedTuple('Outputs', [('name', str), ('id', int)])
|
||||
return outputs('my_dataset', an_id)
|
||||
|
||||
@dsl.container_component
|
||||
def identity_container(integer: int, output_int: OutputPath(int)):
|
||||
"""Custom container component that creates an integer output parameter."""
|
||||
return dsl.ContainerSpec(
|
||||
image='alpine',
|
||||
command=[
|
||||
'sh', '-c',
|
||||
'mkdir --parents $(dirname "$0") && echo "$1" > "$0"'
|
||||
],
|
||||
args=[output_int, integer])
|
||||
```
|
||||
|
||||
Using the new `named_tuple` and `identity_container` components with our original `identity` component, the following pipeline shows the full range of task-to-task data passing styles:
|
||||
|
||||
```python
|
||||
@dsl.pipeline()
|
||||
def my_pipeline(pipeline_parameter_id: int):
|
||||
named_tuple_task = named_tuple(an_id=pipeline_parameter_id)
|
||||
|
||||
# access a named tuple parameter output via .outputs['<parameter>']
|
||||
identity_container_task = identity_container(integer=named_tuple_task.outputs['id'])
|
||||
|
||||
# access an OutputPath parameter output via .outputs['<parameter>']
|
||||
identity_task_1 = identity(x=identity_container_task.outputs['output_int'])
|
||||
|
||||
# access a lightweight component return value via .output
|
||||
identity_task_2 = identity(x=identity_task_1.output)
|
||||
```
|
||||
|
||||
#### Passing artifacts from task to task
|
||||
Artifacts may only be annotated via `Input[<ArtifactClass>]`/`Output[<ArtifactClass>]` annotations and may only be accessed via the `.outputs['<parameter>']` syntax. This makes passing them between tasks somewhat simpler than for parameters.
|
||||
|
||||
The pipeline below demonstrates passing an artifact between tasks using an artifact producer and an artifact consumer:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Artifact, Input, Output
|
||||
|
||||
@dsl.component
|
||||
def producer(output_artifact: Output[Artifact]):
|
||||
    with open(output_artifact.path, 'w') as f:
|
||||
f.write('my artifact')
|
||||
|
||||
@dsl.component
|
||||
def consumer(input_artifact: Input[Artifact]):
|
||||
    with open(input_artifact.path, 'r') as f:
|
||||
print(f.read())
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
producer_task = producer()
|
||||
consumer(input_artifact=producer_task.outputs['output_artifact'])
|
||||
```
|
||||
|
||||
## Special input values
|
||||
There are a few special input values that may be used to access pipeline or task metadata within a component. These values can be passed to input parameters typed with `str`. For example, the following pipeline passes the pipeline job name to a `print_op` component at runtime by using `dsl.PIPELINE_JOB_NAME_PLACEHOLDER`:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
print_op(text=dsl.PIPELINE_JOB_NAME_PLACEHOLDER)
|
||||
```
|
||||
|
||||
There are several placeholders that may be used in this style, including:
|
||||
* `dsl.PIPELINE_JOB_NAME_PLACEHOLDER`
|
||||
* `dsl.PIPELINE_JOB_RESOURCE_NAME_PLACEHOLDER`
|
||||
* `dsl.PIPELINE_JOB_ID_PLACEHOLDER`
|
||||
* `dsl.PIPELINE_TASK_NAME_PLACEHOLDER`
|
||||
* `dsl.PIPELINE_TASK_ID_PLACEHOLDER`
|
||||
* `dsl.PIPELINE_JOB_CREATE_TIME_UTC_PLACEHOLDER`
|
||||
* `dsl.PIPELINE_JOB_SCHEDULE_TIME_UTC_PLACEHOLDER`
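A minimal sketch (the `print_op` component here is a stand-in defined only for illustration) that forwards two of these placeholders as string inputs:

```python
from kfp import dsl


@dsl.component
def print_op(text: str, when: str):
    print(text, when)


@dsl.pipeline()
def my_pipeline():
    # both placeholders are resolved by the backend at runtime
    print_op(
        text=dsl.PIPELINE_JOB_NAME_PLACEHOLDER,
        when=dsl.PIPELINE_JOB_CREATE_TIME_UTC_PLACEHOLDER)
```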
|
||||
|
||||
|
||||
## Placeholders
|
||||
In general, each of the three component authoring styles handles the injection of placeholders into your container `command` and `args`, so the component author does not have to worry about them. However, there are two types of placeholders you may wish to use directly: `ConcatPlaceholder` and `IfPresentPlaceholder`. These placeholders may only be used when authoring [custom container components][custom-container-component] via the `@dsl.container_component` decorator.
|
||||
|
||||
### ConcatPlaceholder
|
||||
|
||||
When you provide a container `command` or container `args` as a list of strings, each element in the list is concatenated using a space separator, then issued to the container. Concatenating one input to another string without a space separator requires special handling provided by `ConcatPlaceholder`.
|
||||
|
||||
`ConcatPlaceholder` takes one argument, `items`, which may be a list of any combination of static strings, parameter inputs, or other instances of `ConcatPlaceholder` or `IfPresentPlaceholder`. At runtime, these strings will be concatenated together without a separator.
|
||||
|
||||
For example, you can use `ConcatPlaceholder` to concatenate a file path prefix, suffix, and extension:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.container_component
|
||||
def concatenator(prefix: str, suffix: str):
|
||||
return dsl.ContainerSpec(
|
||||
image='alpine',
|
||||
command=[
|
||||
'my_program.sh'
|
||||
],
|
||||
args=['--input', dsl.ConcatPlaceholder([prefix, suffix, '.txt'])]
|
||||
)
|
||||
```
|
||||
|
||||
### IfPresentPlaceholder
|
||||
`IfPresentPlaceholder` is used to conditionally provide command line arguments. The `IfPresentPlaceholder` takes three arguments: `input_name`, `then`, and optionally `else_`. This placeholder is easiest to understand through an example:
|
||||
|
||||
```python
|
||||
@dsl.container_component
|
||||
def hello_someone(optional_name: str = None):
|
||||
return dsl.ContainerSpec(
|
||||
image='python:3.7',
|
||||
command=[
|
||||
'echo', 'hello',
|
||||
dsl.IfPresentPlaceholder(
|
||||
input_name='optional_name', then=[optional_name])
|
||||
])
|
||||
```
|
||||
|
||||
If the `hello_someone` component is passed `'world'` as an argument for `optional_name`, the component will print `hello world`. If not, it will only print `hello`.
|
||||
|
||||
The third parameter `else_` can be used to provide a default value to fall back to if `input_name` is not provided.
|
||||
|
||||
Arguments to `then` and `else_` may be a list of any combination of static strings, parameter inputs, or other instances of `ConcatPlaceholder` or `IfPresentPlaceholder`.
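For example, a sketch of the `hello_someone` component extended with `else_` so that it falls back to printing `hello world` when `optional_name` is not provided:

```python
from kfp import dsl


@dsl.container_component
def hello_someone_or_world(optional_name: str = None):
    return dsl.ContainerSpec(
        image='python:3.7',
        command=[
            'echo', 'hello',
            dsl.IfPresentPlaceholder(
                input_name='optional_name',
                then=[optional_name],
                # used only when optional_name is not provided
                else_=['world']),
        ])
```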
|
||||
|
||||
## Component interfaces and type checking
|
||||
The KFP SDK compiler has the ability to use the type annotations you provide to type check your pipeline definition for mismatches between input and output types. The type checking logic is simple yet handy, particularly for complex pipelines. The type checking logic is:
|
||||
|
||||
* Parameter outputs may only be passed to parameter inputs. Artifact outputs may only be passed to artifact inputs.
|
||||
* A parameter output type (`int`, `str`, etc.) must match the annotation of the parameter input to which it is passed.
|
||||
* An artifact output type (`Dataset`, `Model`, etc.) must match the artifact input type to which it is passed _or_ either of the two artifact annotations must use the generic KFP `Artifact` class.
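For example, a small sketch (component names are illustrative) in which a `Dataset` output is accepted by an input annotated with the generic `Artifact` class, so the pipeline passes type checking:

```python
from kfp import dsl
from kfp.dsl import Artifact, Dataset, Input, Output


@dsl.component
def produce_dataset(out_data: Output[Dataset]):
    with open(out_data.path, 'w') as f:
        f.write('some data')


@dsl.component
def consume_anything(in_artifact: Input[Artifact]):
    # the generic Artifact annotation accepts any artifact type
    print(in_artifact.uri)


@dsl.pipeline()
def ok_pipeline():
    dataset_task = produce_dataset()
    consume_anything(in_artifact=dataset_task.outputs['out_data'])
```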
|
||||
|
||||
|
||||
[components]: /docs/components/pipelines/v2/author-a-pipeline/components
|
||||
[pipelines]: /docs/components/pipelines/v2/author-a-pipeline/pipelines
|
||||
[lightweight-python-component]: /docs/components/pipelines/v2/author-a-pipeline/components/#1-lightweight-python-function-based-components
|
||||
[containerized-python-component]: /docs/components/pipelines/v2/author-a-pipeline/components/#2-containerized-python-components
|
||||
[custom-container-component]: /docs/components/pipelines/v2/author-a-pipeline/components/#3-custom-container-components
|
||||
[minio]: https://min.io/
|
||||
[gcs]: https://cloud.google.com/storage
|
||||
[aws-s3]: https://aws.amazon.com/s3/
|
||||
[importer-component]: /docs/components/pipelines/v2/author-a-pipeline/components/#special-case-importer-components
|
||||
[component-io]: /docs/components/pipelines/v2/author-a-pipeline/component-io/
|
|
@ -1,282 +0,0 @@
|
|||
+++
|
||||
title = "Components"
|
||||
description = "Author KFP components"
|
||||
weight = 1
|
||||
+++
|
||||
|
||||
|
||||
## Summary
|
||||
A *component* is the basic unit of execution logic in KFP. A component is a named template for how to run a container using an image, a command, and arguments. Components may also have inputs and outputs, making a component a computational template, analogous to a function.
|
||||
|
||||
Component inputs are dynamic data used in either the container commands or arguments.
|
||||
|
||||
Component outputs may be machine learning artifacts or other JSON-serializable data.
|
||||
|
||||
## Author a component
|
||||
|
||||
At the lowest level of execution, all components define their execution logic via a container image, command, and arguments. An importer component is a special case and the only exception to this.
|
||||
|
||||
The KFP SDK exposes three ways of authoring components with these three properties.
|
||||
|
||||
### 1. Lightweight Python function-based components
|
||||
The simplest way to author a component is via a lightweight Python function-based component (also known as a lightweight component).
|
||||
|
||||
Lightweight components provide a fully Pythonic approach to creating a component that executes a single Python function within a container at runtime.
|
||||
|
||||
To create a lightweight component, you must:
|
||||
1. Define a standalone function.
|
||||
|
||||
A standalone Python function is a function that does not reference any symbols defined outside of its scope. This means the function must define all objects it uses within the scope of the function and must include all import statements within the function body.
|
||||
|
||||
2. Include type annotations for the function parameters and return values.
|
||||
|
||||
Type annotations indicate what the component inputs and outputs are and tell the KFP lightweight component executor how to serialize and deserialize the data as it is passed within a pipeline. This also (optionally) allows the KFP DSL compiler to type check your pipeline.
|
||||
|
||||
Valid parameter annotations include `str`, `int`, `float`, `bool`, `dict`, `list`, `OutputPath`, `InputPath`, `Input[<Artifact>]`, and `Output[<Artifact>]`.
|
||||
|
||||
Valid return annotations include `str`, `int`, `float`, `bool`, `dict`, and `list`. You may also specify multiple return values by using these annotations within a `typing.NamedTuple`.
|
||||
|
||||
For detailed discussion on type annotations and runtime behavior, see [Data Passing][data-passing].
|
||||
|
||||
3. Decorate your function with the `@kfp.dsl.component` decorator.
|
||||
This decorator transforms a Python function into a component that can be used within a pipeline.
|
||||
|
||||
For a comprehensive list of `@kfp.dsl.component` decorator arguments, see the DSL [reference documentation][dsl-reference-documentation].
|
||||
|
||||
|
||||
The following is an example of a lightweight component that trains a model on an existing input `Dataset` artifact for `num_epochs` epochs, then saves the output `Model` artifact.
|
||||
|
||||
```python
|
||||
from kfp import dsl
from kfp.dsl import Dataset, Input, Model, Output
|
||||
|
||||
@dsl.component(
|
||||
base_image='python:3.7',
|
||||
packages_to_install=['tensorflow==2.9.1']
|
||||
)
|
||||
def train_model(
|
||||
dataset: Input[Dataset],
|
||||
model: Output[Model],
|
||||
num_epochs: int,
|
||||
):
|
||||
from tensorflow import keras
|
||||
|
||||
# load and process the Dataset artifact
|
||||
with open(dataset.path) as f:
|
||||
x, y = ...
|
||||
|
||||
my_model = keras.Sequential(
|
||||
[
|
||||
            keras.layers.Dense(4, activation='relu', name='layer1'),
|
||||
            keras.layers.Dense(3, activation='relu', name='layer2'),
|
||||
            keras.layers.Dense(2, activation='relu', name='layer3'),
|
||||
            keras.layers.Dense(1, name='layer4'),
|
||||
]
|
||||
)
|
||||
|
||||
my_model.compile(...)
|
||||
# train for num_epochs
|
||||
my_model.fit(x, y, epochs=num_epochs)
|
||||
|
||||
# save the Model artifact
|
||||
my_model.save(model.path)
|
||||
```
|
||||
|
||||
Notice the `base_image` argument to the `@kfp.dsl.component` decorator. Despite not having the word "container" in their name, lightweight components are still executed as a container at runtime. The `@kfp.dsl.component` decorator merely provides a convenient Pythonic interface for defining this container image, command, and arguments. [`python:3.7`][python-docker-image] is the default image, but it can be changed to any image accessible to the executing backend, as long as the image has a Python interpreter available as `python3`. Packages in `packages_to_install` will be pip installed at container runtime.
|
||||
|
||||
**When to use?** Lightweight components should be used if your component implementation can be written as a standalone Python function and does not require an abundance of source code. This is the preferred authoring approach for quick demos and when authoring components in a notebook.
|
||||
|
||||
For more involved components and for production usage, prefer containerized components and custom container components for their increased flexibility.
|
||||
|
||||
Note: This authoring approach replaces `kfp.components.create_component_from_func` in KFP v1.
|
||||
|
||||
### 2. Containerized Python components
|
||||
Containerized Python components extend lightweight components by allowing users to package and build their Python function-based components into containers.
|
||||
|
||||
Unlike lightweight components, containerized Python components allow authors to use additional source code outside of the component's Python function definition, including source code across multiple files. This is the preferred approach for authoring Python components that require more source code than can cleanly be included in the body of a standalone function or in cases where you wish to reuse the same source code in multiple components.
|
||||
|
||||
To create a containerized component, you must:
|
||||
|
||||
1) Define a component using the `@kfp.dsl.component` decorator.
|
||||
|
||||
A containerized Python component definition is very similar to a lightweight component definition, but with a few key differences:
|
||||
|
||||
a) The `@kfp.dsl.component` decorator is given a `target_image`. This is the name of the containerized component image that will be created from the `base_image` in Step 2 below.
|
||||
|
||||
b) The `tensorflow` import is included outside of the `train_model` function. This is possible because the entire module will be executed at component runtime, not only the Python function as in a lightweight component.
|
||||
|
||||
c) The component uses functions defined in `my_helper_module` imported via a [relative import](https://docs.python.org/3/reference/import.html#package-relative-imports). This is possible because `my_helper_module.py` will be included in the container image created in Step 2 below. This is unlike a lightweight component, which only uses the source code included in the Python function definition. This helper code could have also been defined within the same module outside of the `train_model` function.
|
||||
|
||||
The following containerized component adapts the lightweight component in the previous section to a containerized component. Notice that most of the logic is extracted into helper functions in `my_helper_module`, permitting a cleaner, modular component function:
|
||||
|
||||
```python
|
||||
# my_component.py
|
||||
from kfp import dsl
from kfp.dsl import Dataset, Input, Model, Output
|
||||
from tensorflow import keras
|
||||
from .my_helper_module import compile_and_train, get_model, split_dataset
|
||||
|
||||
@dsl.component(
|
||||
base_image='python:3.7',
|
||||
target_image='gcr.io/my-project/my-component:v1',
|
||||
packages_to_install=['tensorflow'],
|
||||
)
|
||||
def train_model(
|
||||
dataset: Input[Dataset],
|
||||
model: Output[Model],
|
||||
num_epochs: int,
|
||||
):
|
||||
# load and process the Dataset artifact
|
||||
with open(dataset.path) as f:
|
||||
x, y = split_dataset(f)
|
||||
|
||||
untrained_model = get_model()
|
||||
|
||||
# train for num_epochs
|
||||
trained_model = compile_and_train(untrained_model, epochs=num_epochs)
|
||||
|
||||
# save the Model artifact
|
||||
trained_model.save(model.path)
|
||||
```
|
||||
|
||||
The `my_component.py` module, the `my_helper_module.py` module, and any other source code files you wish to include in the container image should be grouped together in a directory. When you build the component in Step 2 below, this directory will by [COPY](https://docs.docker.com/engine/reference/builder/#copy)'d into the image:
|
||||
|
||||
```
|
||||
src/
|
||||
├── my_component.py
|
||||
├── my_helper_module.py
|
||||
└── another_module.py
|
||||
```
|
||||
|
||||
2) Build the component.
|
||||
|
||||
Once you've written a component and its associated source code files and put them in a standalone directory, you can use the KFP CLI to build your component. The command to do this takes the form:
|
||||
|
||||
```shell
|
||||
kfp component build [OPTIONS] COMPONENTS_DIRECTORY [ARGS]...
|
||||
```
|
||||
When you run this command, KFP will build an image with all the source code found in `COMPONENTS_DIRECTORY`. KFP will find your component definition in `src/` and execute the component function you defined at component runtime. Include the `--push-image` flag to push your image to a remote registry from which the executing backend can pull your image at runtime. For example:
|
||||
|
||||
```shell
|
||||
kfp component build src/ --push-image
|
||||
```
|
||||
|
||||
For detailed information about all arguments/flags, see [CLI reference documentation](https://kubeflow-pipelines.readthedocs.io/en/master/source/cli.html#kfp-component-build).
|
||||
|
||||
**When to use?** Containerized Python components should be used any time your component is implemented as Python code, but cannot be written as a standalone Python function or you wish to organize source code outside of the component Python function definition.
|
||||
|
||||
### 3. Custom container components
|
||||
|
||||
Custom container components allow you to specify a container to execute as your component. The `dsl.ContainerSpec` object allows you to specify a container via an image, command, and args.
|
||||
|
||||
To define a custom container component, you must:
|
||||
1) Write your component’s code as a Python function that returns a `dsl.ContainerSpec` object specifying the container image and the commands to be run in the container, and wrap the function with the `@dsl.container_component` decorator. The function should do nothing other than return a `dsl.ContainerSpec` object, with the following parameters:
|
||||
* `image`: The image that the container will run. You can use `command` and `args` to control the entrypoint.
|
||||
* `command` (optional): The command to be executed.
|
||||
* `args` (optional): The arguments of the command. It’s recommended to place the inputs of the component in the `args` section instead of the `command` section.
|
||||
|
||||
The decorator will then compose a component using the `ContainerSpec`, which can be used the same way as a Python component. (Learn more about [`ContainerSpec`][dsl-reference-documentation] in the documentation.)
|
||||
|
||||
2) Specify your function's inputs and outputs in the function's signature ([learn more about passing data between components][data-passing]). Specifically for custom container components, your function's inputs and outputs must meet the following requirements:
|
||||
|
||||
* All your function's arguments must have data type annotations.
|
||||
* Different from a Python component, your return type annotation for the function
|
||||
must either be `dsl.ContainerSpec` or omitted.
|
||||
* If the function accepts or returns large amounts of data or complex
|
||||
data types, you must annotate that argument as an _artifact_. Note that in the function you define, you can only access artifacts via their `.uri`, `.path`, or `.metadata` attributes. Accessing any other attribute or the artifact variable by itself is not allowed.
|
||||
|
||||
Below is an example that authors a pipeline from two custom container components. Just as with a Python component, you can access the outputs of a `container_component` from downstream tasks, as demonstrated in the pipeline:
|
||||
```python
|
||||
from kfp.dsl import (
|
||||
container_component,
|
||||
ContainerSpec,
|
||||
Dataset,
|
||||
Input,
|
||||
pipeline,
|
||||
Output,
|
||||
)
|
||||
|
||||
@container_component
|
||||
def create_dataset(text: str, output_gcs: Output[Dataset]):
|
||||
return ContainerSpec(
|
||||
image='alpine',
|
||||
command=[
|
||||
'sh',
|
||||
'-c',
|
||||
'mkdir --parents $(dirname "$1") && echo "$0" > "$1"',
|
||||
],
|
||||
args=[text, output_gcs.path])
|
||||
|
||||
|
||||
@container_component
|
||||
def print_dataset(input_gcs: Input[Dataset]):
|
||||
return ContainerSpec(image='alpine', command=['cat'], args=[input_gcs.path])
|
||||
|
||||
@pipeline
|
||||
def two_step_pipeline_containerized(text: str):
|
||||
    create_dataset_task = create_dataset(text=text)
|
||||
print_dataset_task = print_dataset(input_gcs=create_dataset_task.outputs['output_gcs'])
|
||||
```
|
||||
In the above example, the `create_dataset` component takes in text and writes it to a path as a `Dataset` artifact. Then, the `print_dataset` component retrieves the artifact output by the `create_dataset` component and prints it.
|
||||
|
||||
## Special case: Importer components
|
||||
Unlike the previous three authoring approaches, an importer component is not a general authoring style but a pre-baked component for a specific use case: loading a machine learning artifact from remote storage into machine learning metadata (MLMD).
|
||||
|
||||
**Before you continue:** Understand how KFP [Artifacts][data-passing] work.
|
||||
|
||||
Typically, the input artifact to a task is an output from an upstream task. In this case, the artifact can be easily accessed from the upstream task using `my_task.outputs['artifact_name']`. The artifact is also registered in MLMD when it is created by the upstream task.
|
||||
|
||||
If you wish to use an existing artifact that is not generated by a task in the current pipeline _or_ wish to use as an artifact an external file that was not generated by a pipeline at all, you can use an importer component to load an artifact from its URI.
|
||||
|
||||
You do not need to write an importer component; it can be imported from the `dsl` module and used directly:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
task = get_date_string()
|
||||
importer_task = dsl.importer(
|
||||
artifact_uri='gs://ml-pipeline-playground/shakespeare1.txt',
|
||||
artifact_class=dsl.Dataset,
|
||||
reimport=True,
|
||||
metadata={'date': task.output})
|
||||
other_component(dataset=importer_task.output)
|
||||
```
|
||||
|
||||
In addition to the `artifact_uri`, you must provide an `artifact_class`, indicating the type of the artifact.
|
||||
|
||||
The `importer` component permits setting artifact metadata via the `metadata` argument. Metadata can be constructed with outputs from upstream tasks, as is done for the `'date'` value in the example pipeline.
|
||||
|
||||
You may also specify a boolean `reimport` argument. If `reimport` is `False`, KFP will use an existing MLMD artifact if it already exists from an earlier importer execution. If `reimport` is `True`, KFP will reimport the artifact as a new artifact, irrespective of whether it was previously imported.
|
||||
|
||||
|
||||
## Compile (save) a component
|
||||
Once you've written a component, you may wish to write the component definition to YAML for future use or submission for execution. This can be done via the KFP SDK DSL compiler:
|
||||
|
||||
```python
|
||||
from kfp import compiler
|
||||
|
||||
compiler.Compiler().compile(pipeline_func=addition_component, package_path='addition_component.yaml')
|
||||
```
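The `addition_component` above is assumed to be a simple Python component, for example:

```python
from kfp import dsl


@dsl.component
def addition_component(num1: int, num2: int) -> int:
    return num1 + num2
```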
|
||||
|
||||
## Load a component
|
||||
You can load saved components via the `kfp.components` module. This is helpful for integrating existing components stored as YAML into a larger pipeline definition:
|
||||
|
||||
```python
|
||||
from kfp import components
|
||||
|
||||
addition_component = components.load_component_from_file('addition_component.yaml')
|
||||
```
|
||||
|
||||
Once loaded, you can use the component in a pipeline just as you would a component defined in Python:
|
||||
|
||||
```python
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
addition_task = addition_component(num1=1, num2=2)
|
||||
```
|
||||
|
||||
The `components` module also includes `.load_component_from_text` and `.load_component_from_url` for loading YAML from different sources.
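For example, a sketch that loads the same component definition from a URL (the URL here is hypothetical):

```python
from kfp import components

addition_component = components.load_component_from_url(
    'https://example.com/components/addition_component.yaml')
```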
|
||||
|
||||
[data-passing]: /docs/components/pipelines/v2/author-a-pipeline/component-io
|
||||
[dsl-reference-documentation]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html
|
||||
[python-docker-image]: https://hub.docker.com/_/python
|
|
@ -1,194 +0,0 @@
|
|||
+++
|
||||
title = "Pipelines"
|
||||
description = "Create a pipeline"
|
||||
weight = 3
|
||||
+++
|
||||
|
||||
A *pipeline* is a description of a multi-task workflow, including how tasks relate to each other to form a computational graph. Pipelines may have inputs which can be passed to tasks within the pipeline.
|
||||
|
||||
## Author a pipeline
|
||||
Unlike components, which have three authoring approaches, pipelines have only one authoring approach: they are defined using a Python pipeline function decorated with `@dsl.pipeline`. For example:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline(text: str):
|
||||
my_task = my_component(arg1=text)
|
||||
```
|
||||
|
||||
The `@dsl.pipeline` decorator takes three optional arguments.
|
||||
* `name` is the name of your pipeline. If not provided, the name defaults to a sanitized version of the pipeline function name.
|
||||
* `description` is a description of the pipeline.
|
||||
* `pipeline_root` is the remote storage root path from which your pipeline will write and read artifacts (e.g., `gs://my/path`).
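For example, a sketch that sets all three arguments (the name, description, and `pipeline_root` values here are illustrative, and `my_component` is a stand-in component):

```python
from kfp import dsl


@dsl.pipeline(
    name='my-pipeline',
    description='A pipeline that runs my_component on some text.',
    pipeline_root='gs://my-bucket/pipeline-root',
)
def my_pipeline(text: str):
    my_task = my_component(arg1=text)
```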
|
||||
|
||||
A pipeline function is a function that may have inputs, instantiates components as tasks and uses them to form a computational graph, and only uses the KFP domain-specific language objects and syntax within the function scope. Let's walk through each of these parts one-by-one.
|
||||
|
||||
First, like a component, a pipeline function may have inputs and outputs. This allows your pipeline to serve as a computational template that can be executed with different input parameters to create a specified set of outputs. See [Component I/O: Pipeline I/O][component-io-pipeline-io] for how to use type annotations in a pipeline.
|
||||
|
||||
Second, a pipeline function instantiates components as tasks and uses them to form a computational graph. For information on how to instantiate components as tasks and pass data between them, see [Component I/O: Passing data between tasks][component-io]. For information on task dependencies, see [Tasks][tasks].
|
||||
|
||||
Third, a pipeline function only uses domain-specific language (DSL) objects and syntax within the function scope. Because the body of a Python pipeline function must ultimately be compiled to IR YAML, pipeline functions only support a very narrow set of Python language features, as specified by the KFP DSL. In addition to instantiation and data passing between tasks, the only three other features permitted are `dsl.Condition`, `dsl.ParallelFor`, and `dsl.ExitHandler`. Use of these three features is covered in the next section. Use of classes, list comprehensions, lambda functions, and other arbitrary Python language features is not permitted within the scope of a Python pipeline function.
|
||||
|
||||
|
||||
## DSL control flow features
|
||||
A critical difference between components and pipelines is how control flow is authored and executed. Within a Python component, control flow is authored using arbitrary Python language features and the raw Python code is executed at component runtime. Within the scope of a pipeline, control flow acts on tasks, is authored using DSL features, and is executed by the KFP backend through the creation of Kubernetes Pods to execute those tasks. `dsl.Condition`, `dsl.ParallelFor` and `dsl.ExitHandler` can be used to orchestrate the completion of tasks within a pipeline function body. Each is implemented as a Python context manager.
|
||||
|
||||
### dsl.Condition
|
||||
|
||||
The [`dsl.Condition`][dsl-reference-docs] context manager allows conditional execution of tasks within its scope based on the output of an upstream task. The context manager takes two arguments: a required `condition` and an optional `name`. The `condition` is a comparative expression where at least one of the two operands is an output from an upstream task.
|
||||
|
||||
In the following pipeline, `conditional_task` only executes if `coin_flip_task` has the output `'heads'`.
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline():
|
||||
coin_flip_task = flip_coin()
|
||||
with dsl.Condition(coin_flip_task.output == 'heads'):
|
||||
conditional_task = my_comp()
|
||||
```
|
||||
|
||||
### dsl.ParallelFor
|
||||
|
||||
The [`dsl.ParallelFor`][dsl-reference-docs] context manager allows parallelized execution of tasks over a static set of items. The context manager takes three arguments: a required `items`, an optional `parallelism`, and an optional `name`. `items` is the static set of items to loop over, while `parallelism` is the maximum number of concurrent iterations permitted while executing the `dsl.ParallelFor` group. `parallelism=0` indicates unconstrained parallelism.
|
||||
|
||||
In the following pipeline, `train_model` will train a model for 1, 5, 10, and 25 epochs, with no more than two training tasks running at one time:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline():
|
||||
with dsl.ParallelFor(
|
||||
items=[1, 5, 10, 25],
|
||||
parallelism=2
|
||||
) as epochs:
|
||||
train_model(epochs=epochs)
|
||||
```
|
||||
|
||||
### dsl.Collected
|
||||
|
||||
Use [`dsl.Collected`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Collected) with `dsl.ParallelFor` to gather outputs from a parallel loop of tasks:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline():
|
||||
with dsl.ParallelFor(
|
||||
items=[1, 5, 10, 25],
|
||||
parallelism=2
|
||||
) as epochs:
|
||||
train_model_task = train_model(epochs=epochs)
|
||||
max_accuracy(models=dsl.Collected(train_model_task.outputs['model']))
|
||||
```
|
||||
|
||||
Downstream tasks might consume `dsl.Collected` outputs via an input annotated with a `List` of parameters or a `List` of artifacts. For example, `max_accuracy` in the preceding example has the input `models` with type `Input[List[Model]]`, as shown by the following component definition:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Input, Model
from typing import List
|
||||
|
||||
@dsl.component
|
||||
def max_accuracy(models: Input[List[Model]]) -> float:
|
||||
    return max(score_model(model) for model in models)  # score_model is a placeholder for your scoring logic
|
||||
```
|
||||
|
||||
You can use `dsl.Collected` to collect outputs from nested loops in a *nested list* of parameters. For example, output parameters from two nested `dsl.ParallelFor` groups are collected in a multilevel nested list of parameters, where each nested list contains the output parameters from one of the `dsl.ParallelFor` groups. The number of nested levels is based on the depth of the `ParallelFor` group.
|
||||
|
||||
By comparison, *artifacts* from nested loops are collected in a *flat* list.
|
||||
|
||||
You can also return a `dsl.Collected` from a pipeline. Use a `List` of parameters or a `List` of artifacts in the return annotation, as shown in the following example:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Model
from typing import List
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline() -> List[Model]:
|
||||
with dsl.ParallelFor(
|
||||
items=[1, 5, 10, 25],
|
||||
parallelism=2
|
||||
) as epochs:
|
||||
train_model_task = train_model(epochs=epochs)
|
||||
return dsl.Collected(train_model_task.outputs['model'])
|
||||
```
|
||||
|
||||
|
||||
### dsl.ExitHandler
|
||||
The [`dsl.ExitHandler`][dsl-reference-docs] context manager allows pipeline authors to specify an "exit handler" task which will run after the tasks within its scope finish execution or one of them fails. This is analogous to using `try:` followed by `finally:` in normal Python. The context manager takes two arguments: a required `exit_task` and an optional `name`. The `exit_task` is the "exit handler" task and must be instantiated before the `dsl.ExitHandler` context manager is entered.
|
||||
|
||||
In the following pipeline, `clean_up_task` will execute after either both `create_dataset` and `train_and_save_models` finish or one of them fails:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline():
|
||||
clean_up_task = clean_up_resources()
|
||||
with dsl.ExitHandler(exit_task=clean_up_task):
|
||||
dataset_task = create_datasets()
|
||||
train_task = train_and_save_models(dataset=dataset_task.output)
|
||||
```
|
||||
|
||||
The task you use as an exit task may use a special backend-provided input that provides access to pipeline and task status metadata, including pipeline failure or success status. You can use this special input by annotating your exit task with the `dsl.PipelineTaskFinalStatus` annotation. You should not provide any input to this annotation when you instantiate your exit task.
|
||||
|
||||
The following pipeline uses `PipelineTaskFinalStatus` to obtain information about the pipeline and task failure, even after `fail_op` fails:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import PipelineTaskFinalStatus
|
||||
|
||||
|
||||
@dsl.component
|
||||
def exit_op(user_input: str, status: PipelineTaskFinalStatus):
|
||||
"""Prints pipeline run status."""
|
||||
print(user_input)
|
||||
print('Pipeline status: ', status.state)
|
||||
print('Job resource name: ', status.pipeline_job_resource_name)
|
||||
print('Pipeline task name: ', status.pipeline_task_name)
|
||||
print('Error code: ', status.error_code)
|
||||
print('Error message: ', status.error_message)
|
||||
|
||||
@dsl.component
|
||||
def fail_op():
|
||||
import sys
|
||||
sys.exit(1)
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline():
|
||||
print_op()
|
||||
print_status_task = exit_op(user_input='Task execution status:')
|
||||
with dsl.ExitHandler(exit_task=print_status_task):
|
||||
fail_op()
|
||||
```
|
||||
#### Ignore upstream failure
|
||||
The [`.ignore_upstream_failure()`][ignore-upstream-failure] [task configuration][tasks-configurations] is useful when the caller task should ignore the failures of upstream tasks. If an upstream task fails, this method converts the caller task into an exit handler, which still attempts to collect outputs from the upstream tasks.
|
||||
|
||||
If the task has no upstream tasks, either through data exchange or an explicit dependency by using `.after()`, this method has no effect.
|
||||
|
||||
In the following pipeline definition, `clean_up_task` is executed after `fail_op`, regardless of whether the task fails or not:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline(text: str = 'message'):
|
||||
task = fail_op(message=text)
|
||||
clean_up_task = print_op(
|
||||
message=task.output).ignore_upstream_failure()
|
||||
```
|
||||
|
||||
Note that the component used for the caller task requires a default value for each input read from an upstream task. The default value is applied if the upstream task fails to produce the outputs that are passed as inputs to the caller task. Specifying default values ensures that the caller task always succeeds, regardless of the status of the upstream task.
|
||||
|
||||
|
||||
[component-io]: /docs/components/pipelines/v2/author-a-pipeline/component-io#passing-data-between-tasks
|
||||
[components]: /docs/components/pipelines/v2/author-a-pipeline/components
|
||||
[tasks]: /docs/components/pipelines/v2/author-a-pipeline/tasks
|
||||
[component-io-pipeline-io]: /docs/components/pipelines/v2/author-a-pipeline/component-io/#pipeline-io
|
||||
<!-- TODO: make this reference more precise throughout -->
|
||||
[dsl-reference-docs]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html
|
||||
[ignore-upstream-failure]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html#kfp.dsl.PipelineTask.ignore_upstream_failure
|
||||
[tasks-configurations]: https://www.kubeflow.org/docs/components/pipelines/v2/author-a-pipeline/tasks/#task-level-configurations
|
|
@ -1,105 +0,0 @@
|
|||
+++
|
||||
title = "Tasks"
|
||||
description = "Understand and use KFP tasks"
|
||||
weight = 2
|
||||
+++
|
||||
|
||||
## Summary
|
||||
A *task* is an execution of a [component][components] with a set of inputs. It can be thought of as an instantiation of a component template. A pipeline is composed of individual tasks that may or may not pass data between one another.
|
||||
|
||||
One component can be used to instantiate multiple tasks within a single pipeline. Tasks can also be created and executed dynamically using pipeline control flow features such as loops, conditions, and exit handlers.
|
||||
|
||||
Because tasks represent a runtime execution of a component, you may set additional runtime configuration on a task, such as environment variables, hardware resource requirements, and various other task-level configurations.
|
||||
|
||||
## Task dependencies
|
||||
### Independent tasks
|
||||
Tasks may or may not depend on one another. Two tasks are independent of one another if no outputs of one are inputs to the other and neither task calls `.after()` on the other. When two tasks are independent, they execute concurrently at pipeline runtime. In the following example, `my_task1` and `my_task2` have no dependency and will execute at the same time.
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
my_task1 = concat_comp(prefix='hello, ', text='world')
|
||||
my_task2 = concat_comp(prefix='hi, ', text='universe')
|
||||
```
|
||||
|
||||
### Implicitly dependent tasks
|
||||
When the output of one task is the input to another, an implicit dependency is created between the two tasks. When this is the case, the upstream task will execute first so that its output can be passed to the downstream task. In the following example, the argument to the `prefix` parameter of `my_task2` is the output from `my_task1`. This means `my_task2` implicitly depends on `my_task1` and will execute after it.
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
my_task1 = concat_comp(prefix='hello, ', text='world')
|
||||
my_task2 = concat_comp(prefix=my_task1.output, text='!')
|
||||
```
|
||||
|
||||
For more information on passing inputs and outputs between components, see [Component I/O: Passing data between tasks][component-io-passing-data-between-tasks].
|
||||
|
||||
|
||||
### Explicitly dependent tasks
|
||||
Sometimes you want to order execution of two tasks but not pass data between the tasks. When this is the case, you can call the intended second task's `.after()` on the intended first task to create an explicit dependency. In the following example, `my_task2` explicitly depends on `my_task1`, so `my_task1` will execute before `my_task2`:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
my_task1 = concat_comp(prefix='hello, ', text='world')
|
||||
my_task2 = concat_comp(prefix='hi, ', text='universe').after(my_task1)
|
||||
```
|
||||
|
||||
|
||||
## Task-level configurations
|
||||
The KFP SDK exposes several platform-agnostic task-level configurations for use during authoring. Platform-agnostic configurations are those that are expected to exhibit similar execution behavior on all KFP-conformant backends, such as the open source KFP backend or the Vertex Pipelines backend. The remainder of this section refers only to platform-agnostic task-level configurations.
|
||||
|
||||
All task-level configurations are set using a method on the task. Take the following environment variable example:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def print_env_var():
|
||||
import os
|
||||
print(os.environ.get('MY_ENV_VAR'))
|
||||
|
||||
@dsl.pipeline()
|
||||
def my_pipeline():
|
||||
task = print_env_var()
|
||||
task.set_env_variable('MY_ENV_VAR', 'hello')
|
||||
```
|
||||
|
||||
When executed, the `print_env_var` component should print `'hello'`.
|
||||
|
||||
Task-level configuration methods can also be chained:
|
||||
|
||||
```python
|
||||
print_env_var().set_env_variable('MY_ENV_VAR', 'hello').set_env_variable('OTHER_VAR', 'world')
|
||||
```
|
||||
|
||||
The KFP SDK provides the following task methods for setting task-level configurations:
|
||||
* `.add_accelerator_type`
|
||||
* `.set_accelerator_limit`
|
||||
* `.set_cpu_limit`
|
||||
* `.set_memory_limit`
|
||||
* `.set_env_variable`
|
||||
* `.set_caching_options`
|
||||
* `.set_display_name`
|
||||
* `.set_retry`
|
||||
* `.ignore_upstream_failure`
|
||||
|
||||
For detailed information on how to use the above methods, see the [`kfp.dsl.PipelineTask` reference documentation][dsl-reference-docs].
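For example, a compute-intensive task might combine several of these settings. The following is a minimal sketch reusing the `print_env_var` component from above; the specific limits and retry count are illustrative only:

```python
from kfp import dsl


@dsl.pipeline()
def my_pipeline():
    task = print_env_var()
    # Illustrative task-level settings
    task.set_cpu_limit('4')
    task.set_memory_limit('16G')
    task.set_retry(num_retries=3)
    task.set_display_name('environment-printer')
```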
|
||||
|
||||
### Caching
|
||||
KFP provides task-level output caching to reduce redundant computation by skipping the execution of tasks that were completed in a previous pipeline run. Caching is enabled by default, but can be disabled by calling `.set_caching_options(False)` on a task.
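For example, the following sketch (again reusing `print_env_var` from above) always re-executes the task, even if an identical execution exists in a previous run:

```python
from kfp import dsl


@dsl.pipeline()
def my_pipeline():
    task = print_env_var()
    # Disable caching for this task only
    task.set_caching_options(False)
```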
|
||||
|
||||
The cache key is determined by the task's component specification (image, command, arguments, input/output interface) and the task's provided inputs (the name and URI of artifacts and the name and value of parameters). Cache hit status is not determined until task runtime since input values may be unknown until pipeline runtime.
|
||||
|
||||
When a task has a cache hit and its execution is skipped, this is indicated on the KFP UI:
|
||||
<!-- TODO: add photo of cache on UI -->
|
||||
|
||||
[components]: /docs/components/pipelines/v2/author-a-pipeline/components
|
||||
[dsl-reference-docs]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html
|
||||
[component-io-passing-data-between-tasks]: /docs/components/pipelines/v2/author-a-pipeline/component-io-passing-data-between-tasks/#passing-data-between-tasks
|
|
@ -1,7 +1,7 @@
|
|||
+++
|
||||
title = "Command Line Interface"
|
||||
description = "Interact with KFP via the CLI"
|
||||
weight = 8
|
||||
weight = 9
|
||||
+++
|
||||
|
||||
<!-- TODO: Improve or standardize rendering of variables and placeholders -->
|
||||
|
@ -46,11 +46,16 @@ kfp run --help
|
|||
|
||||
You can use the KFP CLI to do the following:
|
||||
|
||||
* [Interact with KFP resources](#interact-with-kfp-resources)
|
||||
|
||||
* [Compile pipelines](#compile-pipelines)
|
||||
|
||||
* [Build containerized Python components](#build-containerized-python-components)
|
||||
- [Usage](#usage)
|
||||
- [Check availability of KFP CLI](#check-availability-of-kfp-cli)
|
||||
- [General syntax](#general-syntax)
|
||||
- [Get help for a command](#get-help-for-a-command)
|
||||
- [Main functions of the KFP CLI](#main-functions-of-the-kfp-cli)
|
||||
- [Interact with KFP resources](#interact-with-kfp-resources)
|
||||
- [Compile pipelines](#compile-pipelines)
|
||||
- [Build containerized Python components](#build-containerized-python-components)
|
||||
- [Before you begin](#before-you-begin)
|
||||
- [Build the component](#build-the-component)
|
||||
|
||||
### Interact with KFP resources
|
||||
|
||||
|
@ -124,7 +129,7 @@ You can use the `kfp dsl compile` command to compile pipelines or components def
|
|||
### Build containerized Python components
|
||||
<!-- TODO: Revisit the links after the refactoring is completed -->
|
||||
|
||||
You can author [containerized Python components][containerized-python-components] in the KFP SDK. This lets you use handle more source code with better code organization than the simpler [lightweight Python component][lightweight-python-component] authoring experience.
|
||||
You can author [Containerized Python Components][containerized-python-components] in the KFP SDK. This lets you handle more source code with better code organization than the simpler [Lightweight Python Component][lightweight-python-component] authoring experience.
|
||||
|
||||
<!-- TODO(GA): remove --pre -->
|
||||
|
||||
|
@ -150,10 +155,9 @@ For example:
|
|||
kfp component build src/ --component-filepattern my_component --push-image
|
||||
```
|
||||
|
||||
For more information about the arguments and flags supported by the `kfp component build` command, see [build](https://kubeflow-pipelines.readthedocs.io/en/master/source/cli.html#kfp-component-build) in the [KFP SDK API reference][kfp-sdk-api-ref]. For more information about creating containerized Python components, see [Authoring Python Containerized Components](/docs/components/pipelines/v2/author-a-pipeline/components/#2-containerized-python-components).
|
||||
For more information about the arguments and flags supported by the `kfp component build` command, see [build](https://kubeflow-pipelines.readthedocs.io/en/master/source/cli.html#kfp-component-build) in the [KFP SDK API reference][kfp-sdk-api-ref]. For more information about creating containerized Python components, see [Authoring Python Containerized Components][containerized-python-components].
|
||||
|
||||
[cli-reference-docs]: https://kubeflow-pipelines.readthedocs.io/en/master/source/cli.html
|
||||
[kfp-sdk-api-ref]: https://kubeflow-pipelines.readthedocs.io/en/master/index.html
|
||||
[author-a-pipeline]: /docs/components/pipelines/v2/author-a-pipeline
|
||||
[lightweight-python-component]: /docs/components/pipelines/v2/author-a-pipeline/components/#1-lighweight-python-function-based-components
|
||||
[containerized-python-components]: /docs/components/pipelines/v2/author-a-pipeline/components/#2-containerized-python-components
|
||||
[lightweight-python-component]: /docs/components/pipelines/v2/components/lightweight-python-components
|
||||
[containerized-python-components]: /docs/components/pipelines/v2/components/containerized-python-components
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
+++
|
||||
title = "Community and Support"
|
||||
description = "Where to get help, contribute, and learn more"
|
||||
weight = 11
|
||||
weight = 10
|
||||
+++
|
||||
|
||||
## Help
|
||||
|
|
|
@ -1,66 +1,80 @@
|
|||
+++
|
||||
title = "Compile a pipeline"
|
||||
description = "Compile a pipeline definition to YAML"
|
||||
weight = 5
|
||||
description = "Compile pipelines and components to YAML"
|
||||
weight = 7
|
||||
+++
|
||||
|
||||
You can compile your pipeline or component to intermediate representation (IR) YAML. The IR YAML definition preserves a static representation of the pipeline or component. You can submit the YAML definition to the KFP backend for execution, or deserialize it using the KFP SDK for integration into another pipeline.
|
||||
([View an example on GitHub][compiled-output-example]).
|
||||
To submit a pipeline for execution, you must compile it to YAML with the KFP SDK compiler:
|
||||
|
||||
**Note:** Pipelines as well as components are authored in Python. A pipeline is a template representing a multistep workflow, whereas a component is a template representing a single step workflow.
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp import compiler
|
||||
|
||||
## Compile a pipeline
|
||||
@dsl.component
|
||||
def comp(message: str) -> str:
|
||||
print(message)
|
||||
return message
|
||||
|
||||
You can compile a pipeline or a template into IR YAML using the `Compiler.compile` method. To do this, follow these steps:
|
||||
|
||||
1. Define a simple pipeline:
|
||||
@dsl.pipeline
|
||||
def my_pipeline(message: str) -> str:
|
||||
"""My ML pipeline."""
|
||||
return comp(message=message).output
|
||||
|
||||
```python
|
||||
from kfp import compiler
|
||||
from kfp import dsl
|
||||
compiler.Compiler().compile(my_pipeline, package_path='pipeline.yaml')
|
||||
```
|
||||
|
||||
@dsl.component
|
||||
def addition_component(num1: int, num2: int) -> int:
|
||||
return num1 + num2
|
||||
In this example, the compiler creates a file called `pipeline.yaml`, which contains a hermetic representation of your pipeline. The output is called intermediate representation (IR) YAML. You can view an example of IR YAML on [GitHub][compiled-output-example]. The contents of the file are the serialized [`PipelineSpec`][pipeline-spec] protocol buffer message and are not intended to be human-readable.
|
||||
|
||||
@dsl.pipeline(name='addition-pipeline')
|
||||
def my_pipeline(a: int, b: int, c: int = 10):
|
||||
add_task_1 = addition_component(num1=a, num2=b)
|
||||
add_task_2 = addition_component(num1=add_task_1.output, num2=c)
|
||||
```
|
||||
You can find human-readable information about the pipeline in the comments at the top of the compiled YAML:
|
||||
|
||||
1. Compile the pipeline to the file `my_pipeline.yaml`:
|
||||
```yaml
|
||||
# PIPELINE DEFINITION
|
||||
# Name: my-pipeline
|
||||
# Description: My ML pipeline.
|
||||
# Inputs:
|
||||
# message: str
|
||||
# Outputs:
|
||||
# Output: str
|
||||
...
|
||||
```
|
||||
|
||||
```python
|
||||
cmplr = compiler.Compiler()
|
||||
cmplr.compile(my_pipeline, package_path='my_pipeline.yaml')
|
||||
```
|
||||
You can also compile components to IR YAML:
|
||||
|
||||
1. Compile the component `addition_component` to the file `addition_component.yaml`:
|
||||
```python
|
||||
compiler.Compiler().compile(comp, package_path='component.yaml')
|
||||
```
|
||||
|
||||
```python
|
||||
cmplr.compile(addition_component, package_path='addition_component.yaml')
|
||||
```
|
||||
|
||||
<!-- TODO: Replace <br> with MD line breaks -->
|
||||
|
||||
The `Compiler.compile` method accepts the following parameters:
|
||||
## Compiler arguments
|
||||
The [`Compiler.compile`][compiler-compile] method accepts the following arguments:
|
||||
|
||||
| Name | Type | Description |
|
||||
|------|------|-------------|
|
||||
| `pipeline_func` | `function` | _Required_<br/>Pipeline function constructed with the @dsl.pipeline or component constructed with the @dsl.component decorator.
|
||||
| `pipeline_func` | `function` | _Required_<br/>Pipeline function constructed with the `@dsl.pipeline` decorator or component constructed with the `@dsl.component` decorator.
|
||||
| `package_path` | `string` | _Required_<br/>Output YAML file path. For example, `~/my_pipeline.yaml` or `~/my_component.yaml`.
|
||||
| `pipeline_name` | `string` | _Optional_<br/>If specified, sets the name of the pipeline template in the `pipelineInfo.name` field in the compiled IR YAML output. Overrides the name of the pipeline or component specified by the `name` parameter in the `@dsl.pipeline` decorator.
|
||||
| `pipeline_parameters` | `Dict[str, Any]` | _Optional_<br/>Map of parameter names to argument values. This lets you provide default values for pipeline or component parameters. You can override these default values during pipeline submission.
|
||||
| `type_check` | `bool` | _Optional_<br/>Indicates whether static type checking is enabled during compilation.<br/>For more information about type checking, see [Component I/O: Component interfaces and type checking][type-checking].
|
||||
| `type_check` | `bool` | _Optional_<br/>Indicates whether static type checking is enabled during compilation.<br/>
|
||||
|
||||
|
||||
## Type checking
|
||||
By default, the DSL compiler statically type checks your pipeline to ensure type consistency between components that pass data between one another. Static type checking helps identify component I/O inconsistencies without having to run the pipeline, shortening development iterations.
|
||||
|
||||
Specifically, the type checker checks for type equality between the type of data a component input expects and the type of the data provided. See [Data Types][data-types] for more information about KFP data types.
|
||||
|
||||
For example, for parameters, a list input may only be passed to parameters with a `typing.List` annotation. Similarly, a float may only be passed to parameters with a `float` annotation.
|
||||
|
||||
Input data types and annotations must also match for artifacts, with one exception: the `Artifact` type is compatible with all other artifact types. In this sense, the `Artifact` type is both the default artifact type and an artifact "any" type.
|
||||
|
||||
As described in the following section, you can disable type checking.
|
||||
|
||||
## IR YAML
|
||||
|
||||
The IR YAML is an intermediate representation of a compiled pipeline or component. It is an instance of the [`PipelineSpec`][pipeline-spec] protocol buffer message type, which is a platform-agnostic pipeline representation protocol. It is considered an intermediate representation because the KFP backend compiles `PipelineSpec` to [Argo Workflow][argo-workflow] YAML as the final pipeline definition for execution.
|
||||
|
||||
Unlike the v1 component YAML, the IR YAML is not intended to be written directly. To learn how to author pipelines and components in KFP v2 similar to authoring component YAML in KFP v1, see [Author a Pipeline: Custom Container Components][custom-container-component-authoring].
|
||||
Unlike the v1 component YAML, the IR YAML is not intended to be written directly.
|
||||
|
||||
The compiled IR YAML file contains the following sections:
|
||||
While IR YAML is not intended to be easily human readable, you can still inspect it if you know a bit about its contents:
|
||||
|
||||
| Section | Description | Example |
|
||||
|-------|-------------|---------|
|
||||
|
@ -73,9 +87,8 @@ The compiled IR YAML file contains the following sections:
|
|||
| [`default_pipeline_root`][default-pipeline-root-schema] | This section records the remote storage root path, such as a MiniIO URI or Google Cloud Storage URI, where the pipeline output is written. | [View on Github][default-pipeline-root-example]
|
||||
|
||||
|
||||
[pipeline-spec]: https://github.com/kubeflow/pipelines/blob/41b69fd90da812005965f2209b64fd1278f1cdc9/api/v2alpha1/pipeline_spec.proto#L50
|
||||
[pipeline-spec]: https://github.com/kubeflow/pipelines/blob/master/api/v2alpha1/pipeline_spec.proto#L50
|
||||
[argo-workflow]: https://argoproj.github.io/argo-workflows/
|
||||
[custom-container-component-authoring]: /docs/components/pipelines/v2/author-a-pipeline/components/#3-custom-container-components
|
||||
[compiled-output-example]: https://github.com/kubeflow/pipelines/blob/984d8a039d2ff105ca6b21ab26be057b9552b51d/sdk/python/test_data/pipelines/two_step_pipeline.yaml
|
||||
[components-example]: https://github.com/kubeflow/pipelines/blob/984d8a039d2ff105ca6b21ab26be057b9552b51d/sdk/python/test_data/pipelines/two_step_pipeline.yaml#L1-L21
|
||||
[deployment-spec-example]: https://github.com/kubeflow/pipelines/blob/984d8a039d2ff105ca6b21ab26be057b9552b51d/sdk/python/test_data/pipelines/two_step_pipeline.yaml#L23-L49
|
||||
|
@ -94,4 +107,5 @@ The compiled IR YAML file contains the following sections:
|
|||
[component-spec]: https://github.com/kubeflow/pipelines/blob/41b69fd90da812005965f2209b64fd1278f1cdc9/api/v2alpha1/pipeline_spec.proto#L85-L96
|
||||
[executor-spec]: https://github.com/kubeflow/pipelines/blob/41b69fd90da812005965f2209b64fd1278f1cdc9/api/v2alpha1/pipeline_spec.proto#L788-L803
|
||||
[dag-spec]: https://github.com/kubeflow/pipelines/blob/41b69fd90da812005965f2209b64fd1278f1cdc9/api/v2alpha1/pipeline_spec.proto#L98-L105
|
||||
[type-checking]: /docs/components/pipelines/v2/author-a-pipeline/component-io#component-interfaces-and-type-checking
|
||||
[data-types]: /docs/components/pipelines/v2/data-types
|
||||
[compiler-compile]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/compiler.html#kfp.compiler.Compiler.compile
|
|
@ -0,0 +1,15 @@
|
|||
+++
|
||||
title = "Components"
|
||||
description = "Author KFP components"
|
||||
weight = 4
|
||||
+++
|
||||
|
||||
Components are the building blocks of KFP pipelines. A component is a remote function definition; it specifies inputs, has user-defined logic in its body, and can create outputs. When the component template is instantiated with input parameters, we call it a task.
|
||||
|
||||
KFP provides two high-level ways to author components: **Python Components** and **Container Components.**
|
||||
|
||||
Python Components are a convenient way to author components implemented in pure Python. There are two specific types of Python components: **Lightweight Python Components** and **Containerized Python Components.**
|
||||
|
||||
Container Components expose a more flexible, advanced authoring approach by allowing you to define a component using an arbitrary container definition. This is the recommended approach for components that are not implemented in pure Python.
|
||||
|
||||
**Importer Components** are a special "pre-baked" component provided by KFP which allows you to import an artifact into your pipeline when that artifact was not created by tasks within the pipeline.
|
|
@ -0,0 +1,126 @@
|
|||
+++
|
||||
title = "Container Components"
|
||||
description = "Create a component via an arbitrary container definition"
|
||||
weight = 4
|
||||
+++
|
||||
|
||||
In KFP, each task execution corresponds to a container execution. This means that all components, even Python Components, are defined by an `image`, `command`, and `args`.
|
||||
|
||||
Python Components are unique because they abstract most aspects of the container definition away from the user, making it convenient to construct components that use pure Python. Under the hood, the KFP SDK sets the `image`, `command`, and `args` to the values needed to execute the Python component for the user.
|
||||
|
||||
**Container Components, unlike Python Components, enable component authors to set the `image`, `command`, and `args` directly.** This makes it possible to author components that execute shell scripts, use other languages and binaries, etc., all from within the KFP Python SDK.
|
||||
|
||||
### A simple Container Component
|
||||
|
||||
The following starts with a simple `say_hello` Container Component and gradually modifies it until it is equivalent to our `say_hello` component from the [Hello World Pipeline example][hello-world-pipeline]. Here is a simple Container Component:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.container_component
|
||||
def say_hello():
|
||||
return dsl.ContainerSpec(image='alpine', command=['echo'], args=['Hello'])
|
||||
```
|
||||
|
||||
To create a Container Component, use the [`dsl.container_component`][dsl-container-component] decorator and create a function that returns a [`dsl.ContainerSpec`][dsl-containerspec] object. `dsl.ContainerSpec` accepts three arguments: `image`, `command`, and `args`. The component above runs the command `echo` with the argument `Hello` in a container running the image [`alpine`][alpine].
|
||||
|
||||
Container Components can be used in pipelines just like Python Components:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp import compiler
|
||||
|
||||
@dsl.pipeline
|
||||
def hello_pipeline():
|
||||
say_hello()
|
||||
|
||||
compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')
|
||||
```
|
||||
|
||||
If you run this pipeline, you'll see the string `Hello` in `say_hello`'s logs.
|
||||
|
||||
### Use component inputs
|
||||
To be more useful, `say_hello` should be able to accept arguments. You can modify `say_hello` so that it accepts an input argument `name`:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.container_component
|
||||
def say_hello(name: str):
|
||||
return dsl.ContainerSpec(image='alpine', command=['echo'], args=[f'Hello, {name}!'])
|
||||
```
|
||||
|
||||
The parameters and annotations in the Container Component function declare the component's interface. In this case, the component has one input parameter `name` and no output parameters.
|
||||
|
||||
When you compile this component, `name` will be replaced with a placeholder. At runtime, this placeholder is replaced with the actual value for `name` provided to the `say_hello` component.
|
||||
|
||||
Another way to implement this component is to use `sh -c` to read the commands from a single string and pass the name as an argument. This approach tends to be more flexible, as it readily allows chaining multiple commands together.
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.container_component
|
||||
def say_hello(name: str):
|
||||
return dsl.ContainerSpec(image='alpine', command=['sh', '-c', 'echo Hello, $0!'], args=[name])
|
||||
```
|
||||
|
||||
When you run the component with the argument `name='World'`, you’ll see the string `'Hello, World!'` in `say_hello`’s logs.
|
||||
|
||||
### Create component outputs
|
||||
|
||||
Unlike Python functions, containers do not have a standard mechanism for returning values. To enable Container Components to have outputs, KFP requires you to write outputs to a file inside the container. KFP will read this file and persist the output to [ML Metadata][ml-metadata].
|
||||
|
||||
To return an output string from the `say_hello` component, you can add an output parameter to the function using a `dsl.OutputPath(str)` annotation:
|
||||
|
||||
```python
|
||||
@dsl.container_component
|
||||
def say_hello(name: str, greeting: dsl.OutputPath(str)):
|
||||
...
|
||||
```
|
||||
|
||||
This component now has one input parameter named `name` typed `str` and one output parameter named `greeting` also typed `str`. At runtime, parameters annotated with [`dsl.OutputPath`][dsl-outputpath] will be provided a system-generated path as an argument. Your component logic should write the output value to this path as JSON. The argument `str` in `greeting: dsl.OutputPath(str)` describes the type of the output `greeting` (e.g., the JSON written to the path `greeting` will be a string). You can fill in the `command` and `args` to write the output:
|
||||
|
||||
```python
|
||||
@dsl.container_component
|
||||
def say_hello(name: str, greeting: dsl.OutputPath(str)):
|
||||
"""Log a greeting and return it as an output."""
|
||||
|
||||
return dsl.ContainerSpec(
|
||||
image='alpine',
|
||||
command=[
|
||||
'sh', '-c', '''RESPONSE="Hello, $0!"\
|
||||
&& echo $RESPONSE\
|
||||
&& mkdir -p $(dirname $1)\
|
||||
&& echo $RESPONSE > $1
|
||||
'''
|
||||
],
|
||||
args=[name, greeting])
|
||||
```
|
||||
### Use in a pipeline
|
||||
|
||||
Finally, you can use the updated `say_hello` component in a pipeline:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp import compiler
|
||||
|
||||
@dsl.pipeline
|
||||
def hello_pipeline(person_to_greet: str) -> str:
|
||||
# greeting argument is provided automatically at runtime!
|
||||
hello_task = say_hello(name=person_to_greet)
|
||||
return hello_task.outputs['greeting']
|
||||
|
||||
compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')
|
||||
```
|
||||
|
||||
Note that you will never provide output parameters to components when constructing your pipeline; output parameters are always provided automatically by the backend at runtime.
|
||||
|
||||
This should look very similar to the [Hello World pipeline][hello-world-pipeline] with one key difference: since `greeting` is a named output parameter, we access it and return it from the pipeline using `hello_task.outputs['greeting']`, instead of `hello_task.output`. Data passing is discussed in more detail in [Pipelines Basics][pipeline-basics].
|
||||
|
||||
[hello-world-pipeline]: /docs/components/pipelines/v2/hello-world
|
||||
[pipeline-basics]: /docs/components/pipelines/v2/pipelines/pipeline-basics
|
||||
[alpine]: https://hub.docker.com/_/alpine
|
||||
[dsl-outputpath]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.OutputPath
|
||||
[dsl-container-component]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.container_component
|
||||
[dsl-containerspec]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.ContainerSpec
|
||||
[ml-metadata]: https://github.com/google/ml-metadata
|
|
@ -0,0 +1,107 @@
|
|||
+++
|
||||
title = "Containerized Python Components"
|
||||
description = "Create Python components with more complex dependencies"
|
||||
weight = 3
|
||||
+++
|
||||
|
||||
The following assumes a basic familiarity with [Lightweight Python Components][lightweight-python-components].
|
||||
|
||||
Containerized Python Components extend [Lightweight Python Components][lightweight-python-components] by relaxing the constraint that Lightweight Python Components be hermetic (i.e., fully self-contained). This means Containerized Python Component functions can depend on symbols defined outside of the function, imports outside of the function, code in adjacent Python modules, etc. To achieve this, the KFP SDK provides a convenient way to package your Python code into a container.
|
||||
|
||||
The following shows how to use Containerized Python Components by modifying the `add` component from the [Lightweight Python Components][lightweight-python-components] example:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def add(a: int, b: int) -> int:
|
||||
return a + b
|
||||
```
|
||||
|
||||
### 1. Source code setup
|
||||
Start by creating an empty `src/` directory to contain your source code:
|
||||
|
||||
```txt
|
||||
src/
|
||||
```
|
||||
|
||||
Next, add the following simple module `src/math_utils.py` with one helper function:
|
||||
|
||||
```python
|
||||
# src/math_utils.py
|
||||
def add_numbers(num1, num2):
|
||||
return num1 + num2
|
||||
```
|
||||
|
||||
Lastly, move your component to `src/my_component.py` and modify it to use the helper function:
|
||||
|
||||
```python
|
||||
# src/my_component.py
|
||||
from kfp import dsl
|
||||
from math_utils import add_numbers
|
||||
|
||||
@dsl.component
|
||||
def add(a: int, b: int) -> int:
|
||||
return add_numbers(a, b)
|
||||
```
|
||||
|
||||
`src` now looks like this:
|
||||
|
||||
```txt
|
||||
src/
|
||||
├── my_component.py
|
||||
└── math_utils.py
|
||||
```
|
||||
|
||||
### 2. Modify the dsl.component decorator
|
||||
|
||||
In this step, you'll provide `base_image` and `target_image` arguments to the `@dsl.component` decorator of your component in `src/my_component.py`:
|
||||
|
||||
```python
|
||||
@dsl.component(base_image='python:3.7',
|
||||
target_image='gcr.io/my-project/my-component:v1')
|
||||
def add(a: int, b: int) -> int:
|
||||
return add_numbers(a, b)
|
||||
```
|
||||
|
||||
Setting `target_image` both specifies the [tag][image-tag] for the image you'll build in Step 3 and instructs KFP to run the decorated Python function in a container that uses the image with that tag.
|
||||
|
||||
In a Containerized Python Component, `base_image` specifies the base image that KFP will use when building your new container image. Specifically, KFP uses the `base_image` argument for the [`FROM`][docker-from] instruction in the Dockerfile used to build your image.
|
||||
|
||||
The previous example includes `base_image` for clarity, but this is not necessary, as `base_image` defaults to `'python:3.7'` if omitted.
|
||||
|
||||
### 3. Build the component
|
||||
Now that your code is in a standalone directory and you've specified a target image, you can conveniently build an image using the [`kfp component build`][kfp-component-build] CLI command:
|
||||
|
||||
```sh
|
||||
kfp component build src/ --component-filepattern my_component.py --no-push-image
|
||||
```
|
||||
|
||||
If you have [configured Docker to use a private image registry](https://docs.docker.com/engine/reference/commandline/login/), you can replace the `--no-push-image` flag with `--push-image` to automatically push the image after building.
|
||||
|
||||
### 4. Use the component in a pipeline
|
||||
|
||||
Finally, you can use the component in a pipeline:
|
||||
|
||||
```python
|
||||
from kfp import dsl
from kfp import compiler
|
||||
|
||||
@dsl.pipeline
|
||||
def addition_pipeline(x: int, y: int) -> int:
|
||||
task1 = add(a=x, b=y)
|
||||
task2 = add(a=task1.output, b=x)
|
||||
return task2.output
|
||||
|
||||
compiler.Compiler().compile(addition_pipeline, 'pipeline.yaml')
|
||||
```
|
||||
|
||||
Since `add`'s `target_image` uses [Google Cloud Artifact Registry][artifact-registry] (indicated by the `gcr.io` URI), the pipeline shown here assumes you have pushed your image to Google Cloud Artifact Registry, you are running your pipeline on [Google Cloud Vertex AI Pipelines][vertex-pipelines], and you have configured [IAM permissions][iam] so that Vertex AI Pipelines can pull images from Artifact Registry.
|
||||
|
||||
|
||||
[kfp-component-build]: https://kubeflow-pipelines.readthedocs.io/en/master/source/cli.html#kfp-component-build
|
||||
[lightweight-python-components]: /docs/components/pipelines/v2/components/lightweight-python-components
|
||||
[image-tag]: https://docs.docker.com/engine/reference/commandline/tag/
|
||||
[docker-from]: https://docs.docker.com/engine/reference/builder/#from
|
||||
[artifact-registry]: https://cloud.google.com/artifact-registry/docs/docker/authentication
|
||||
[vertex-pipelines]: https://cloud.google.com/vertex-ai/docs/pipelines/introduction
|
||||
[iam]: https://cloud.google.com/iam
|
|
@ -0,0 +1,38 @@
|
|||
+++
|
||||
title = "Special case: Importer Components"
|
||||
description = "Import artifacts from outside your pipeline"
|
||||
weight = 5
|
||||
+++
|
||||
|
||||
Unlike the other three authoring approaches, an importer component is not a general authoring style but a pre-baked component for a specific use case: loading a machine learning artifact from a URI into the current pipeline and, as a result, into [ML Metadata][ml-metadata]. This section assumes basic familiarity with KFP [artifacts][artifacts].
|
||||
|
||||
As described in [Pipeline Basics][pipeline-basics], inputs to a task are typically outputs of an upstream task. When this is the case, artifacts are easily accessed on the upstream task using `my_task.outputs['<output-key>']`. The artifact is also registered in ML Metadata when it is created by the upstream task.
|
||||
|
||||
If you wish to use an existing artifact that is not generated by a task in the current pipeline or wish to use as an artifact an external file that was not generated by a pipeline at all, you can use a [`dsl.importer`][dsl-importer] component to load the artifact from its URI.
|
||||
|
||||
You do not need to write an importer component; it can be imported from the `dsl` module and used directly:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline():
|
||||
task = get_date_string()
|
||||
importer_task = dsl.importer(
|
||||
artifact_uri='gs://ml-pipeline-playground/shakespeare1.txt',
|
||||
artifact_class=dsl.Dataset,
|
||||
reimport=True,
|
||||
metadata={'date': task.output})
|
||||
other_component(dataset=importer_task.output)
|
||||
```
|
||||
|
||||
In addition to an `artifact_uri` argument, you must provide an `artifact_class` argument to specify the type of the artifact.
|
||||
|
||||
The `importer` component permits setting artifact metadata via the `metadata` argument. Metadata can be constructed with outputs from upstream tasks, as is done for the `'date'` value in the example pipeline.
|
||||
|
||||
You may also specify a boolean `reimport` argument. If `reimport` is `False`, KFP will check to see if the artifact has already been imported to ML Metadata and, if so, use it. This is useful for avoiding duplicative artifact entries in ML Metadata when multiple pipeline runs import the same artifact. If `reimport` is `True`, KFP will reimport the artifact as a new artifact in ML Metadata regardless of whether it was previously imported.
|
||||
|
||||
[pipeline-basics]: /docs/components/pipelines/v2/pipelines/pipeline-basics
|
||||
[dsl-importer]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.importer
|
||||
[artifacts]: /docs/components/pipelines/v2/data-types/artifacts
|
||||
[ml-metadata]: https://github.com/google/ml-metadata
|
|
@ -0,0 +1,137 @@
|
|||
+++
|
||||
title = "Lightweight Python Components"
|
||||
description = "Create a component from a self-contained Python function"
|
||||
weight = 1
|
||||
+++
|
||||
|
||||
The easiest way to get started authoring components is by creating a Lightweight Python Component. We saw an example of a Lightweight Python Component with `say_hello` in the [Hello World pipeline example][hello-world-pipeline]. Here is another Lightweight Python Component that adds two integers together:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def add(a: int, b: int) -> int:
|
||||
return a + b
|
||||
```
|
||||
|
||||
Lightweight Python Components are constructed by decorating Python functions with the [`@dsl.component`][dsl-component] decorator. The `@dsl.component` decorator transforms your function into a KFP component that can be executed as a remote function by a KFP-conformant backend, either independently or as a single step in a larger pipeline.
|
||||
|
||||
### Python function requirements
|
||||
To decorate a function with the `@dsl.component` decorator, it must meet two requirements:
|
||||
|
||||
|
||||
1. **Type annotations:** The function inputs and outputs must have valid KFP [type annotations][data-types].
|
||||
|
||||
    There are two categories of inputs and outputs in KFP: [parameters][parameters] and [artifacts][artifacts]. There are specific types of parameters and artifacts within each category. Every input and output has a specific type indicated by its type annotation.
|
||||
|
||||
In the preceding `add` component, both inputs `a` and `b` are parameters typed `int`. There is one output, also typed `int`.
|
||||
|
||||
Valid parameter annotations include Python's built-in `int`, `float`, `str`, `bool`, `typing.Dict`, and `typing.List`. Artifact annotations are discussed in detail in [Data Types: Artifacts][artifacts].
|
||||
|
||||
|
||||
2. **Hermetic:** The Python function may not reference any symbols defined outside of its body.
|
||||
|
||||
For example, if you wish to use a constant, the constant must be defined inside the function:
|
||||
|
||||
```python
|
||||
@dsl.component
|
||||
def double(a: int) -> int:
|
||||
"""Succeeds at runtime."""
|
||||
VALID_CONSTANT = 2
|
||||
return VALID_CONSTANT * a
|
||||
```
|
||||
|
||||
By comparison, the following is invalid and will fail at runtime:
|
||||
```python
|
||||
# non-example!
|
||||
INVALID_CONSTANT = 2
|
||||
|
||||
@dsl.component
|
||||
def errored_double(a: int) -> int:
|
||||
"""Fails at runtime."""
|
||||
return INVALID_CONSTANT * a
|
||||
```
|
||||
|
||||
Imports must also be included in the function body:
|
||||
|
||||
```python
|
||||
@dsl.component
|
||||
def print_env():
|
||||
import os
|
||||
print(os.environ)
|
||||
```
|
||||
|
||||
|
||||
For many realistic components, hermeticism can be a fairly constraining requirement. [Containerized Python Components][containerized-python-components] is a more flexible authoring approach that drops this requirement.
|
||||
|
||||
### dsl.component decorator arguments
|
||||
In the above examples we used the [`@dsl.component`][dsl-component] decorator with only one argument: the Python function. The decorator accepts some additional arguments.
|
||||
|
||||
#### packages_to_install
|
||||
|
||||
Most realistic Lightweight Python Components will depend on other Python libraries. You can pass a list of requirements to `packages_to_install` and the component will install these packages at runtime before executing the component function.
|
||||
|
||||
|
||||
This is similar to including requirements in a [`requirements.txt`][requirements-txt] file.
|
||||
|
||||
```python
|
||||
@dsl.component(packages_to_install=['numpy==1.21.6'])
|
||||
def sin(val: float = 3.14) -> float:
|
||||
    import numpy as np

    return np.sin(val).item()
|
||||
```
|
||||
#### pip_index_urls
|
||||
|
||||
`pip_index_urls` exposes the ability to pip install `packages_to_install` from package indices other than the default [PyPI.org][pypi-org].
|
||||
|
||||
When you set `pip_index_urls`, KFP passes these indices to [`pip install`][pip-install]'s [`--index-url`][pip-index-url] and [`--extra-index-url`][pip-extra-index-url] options. It also sets each index as a `--trusted-host`.
|
||||
|
||||
Take the following component:
|
||||
|
||||
```python
|
||||
@dsl.component(packages_to_install=['custom-ml-package==0.0.1', 'numpy==1.21.6'],
|
||||
pip_index_urls=['http://myprivaterepo.com/simple', 'http://pypi.org/simple'],
|
||||
)
|
||||
def comp():
|
||||
from custom_ml_package import model_trainer
|
||||
import numpy as np
|
||||
...
|
||||
```
|
||||
|
||||
These arguments approximately translate to the following `pip install` command:
|
||||
|
||||
```sh
|
||||
pip install custom-ml-package==0.0.1 numpy==1.21.6 kfp==2 --index-url http://myprivaterepo.com/simple --trusted-host http://myprivaterepo.com/simple --extra-index-url http://pypi.org/simple --trusted-host http://pypi.org/simple
|
||||
```
|
||||
|
||||
Note that when you set `pip_index_urls`, KFP does not include `'http://pypi.org/simple'` automatically. If you wish to pip install packages from a private repository _and_ the default public repository, you should include both the private and default URLs as shown in the preceding component `comp`.
|
||||
|
||||
#### base_image
|
||||
|
||||
When you create a Lightweight Python Component, your Python function code is extracted by the KFP SDK to be executed inside a container at pipeline runtime. By default, the container image used is [`python:3.7`](https://hub.docker.com/_/python). You can override this image by providing an argument to `base_image`. This can be useful if your code requires a specific Python version or other dependencies not included in the default image.
|
||||
|
||||
```python
|
||||
@dsl.component(base_image='python:3.8')
|
||||
def print_py_version():
|
||||
import sys
|
||||
print(sys.version)
|
||||
```
|
||||
|
||||
#### install_kfp_package
|
||||
`install_kfp_package` can be used together with `pip_index_urls` to provide granular control over installation of the `kfp` package at component runtime.
|
||||
|
||||
By default, Python Components install `kfp` at runtime. This is required to define symbols used by your component (such as [artifact annotations][artifacts]) and to access additional KFP library code required to execute your component remotely. If `install_kfp_package` is `False`, `kfp` will not be installed via the normal automatic mechanism. Instead, you can use `packages_to_install` and `pip_index_urls` to install a different version of `kfp`, possibly from a non-default pip index URL.
|
||||
|
||||
Note that setting `install_kfp_package` to `False` is rarely necessary and is discouraged for the majority of use cases.
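If you do need it, a minimal sketch looks like the following; the pinned `kfp` version and the private index URL are illustrative assumptions:

```python
from kfp import dsl


@dsl.component(
    install_kfp_package=False,
    # Pin kfp explicitly, since automatic installation is disabled
    packages_to_install=['kfp==2.0.1'],
    # Hypothetical private index, plus the default index for public packages
    pip_index_urls=['http://myprivaterepo.com/simple', 'http://pypi.org/simple'],
)
def my_comp():
    import kfp
    print(kfp.__version__)
```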
|
||||
|
||||
|
||||
[hello-world-pipeline]: /docs/components/pipelines/v2/hello-world
|
||||
[containerized-python-components]: /docs/components/pipelines/v2/components/containerized-python-components
|
||||
[dsl-component]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html#kfp.dsl.component
|
||||
[data-types]: /docs/components/pipelines/v2/data-types
|
||||
[parameters]: /docs/components/pipelines/v2/data-types/parameters
|
||||
[artifacts]: /docs/components/pipelines/v2/data-types/artifacts
|
||||
[requirements-txt]: https://pip.pypa.io/en/stable/reference/requirements-file-format/
|
||||
[pypi-org]: https://pypi.org/
|
||||
[pip-install]: https://pip.pypa.io/en/stable/cli/pip_install/
|
||||
[pip-index-url]: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-0
|
||||
[pip-extra-index-url]: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-extra-index-url
|
|
@ -0,0 +1,17 @@
|
|||
+++
|
||||
title = "Data Types"
|
||||
description = "Component and pipeline I/O types"
|
||||
weight = 6
|
||||
+++
|
||||
|
||||
KFP components and pipelines can accept inputs and create outputs. To do so, they must declare typed interfaces through their function signatures and annotations.
|
||||
|
||||
There are two groups of types in KFP: parameters and artifacts. Parameters are useful for passing small amounts of data between components. Artifact types are the mechanism by which KFP provides first-class support for ML artifact outputs, such as datasets, models, metrics, etc.
|
||||
|
||||
So far, the [Hello World pipeline][hello-world] and the examples in [Components][components] have demonstrated how to use input and output parameters.
|
||||
|
||||
KFP automatically tracks the way parameters and artifacts are passed between components and stores this data passing history in [ML Metadata][ml-metadata]. This enables out-of-the-box ML artifact lineage tracking and easily reproducible pipeline executions. Furthermore, KFP's strongly-typed components provide a data contract between tasks in a pipeline.
|
||||
|
||||
[hello-world]: /docs/components/pipelines/v2/hello-world
|
||||
[components]: /docs/components/pipelines/v2/components
|
||||
[ml-metadata]: https://github.com/google/ml-metadata
|
|
@ -0,0 +1,213 @@
|
|||
+++
|
||||
title = "Artifacts"
|
||||
description = "Create, use, pass, and track ML artifacts"
|
||||
weight = 2
|
||||
+++
|
||||
|
||||
Most machine learning pipelines aim to create one or more machine learning artifacts, such as a model, dataset, evaluation metrics, etc.
|
||||
|
||||
KFP provides first-class support for creating machine learning artifacts via the [`dsl.Artifact`][dsl-artifact] class and other artifact subclasses. KFP maps these artifacts to their underlying [ML Metadata][ml-metadata] schema title, the canonical name for the artifact type.
|
||||
|
||||
In general, artifacts and their associated annotations serve several purposes:
|
||||
* To provide logical groupings of component/pipeline input/output types
|
||||
* To provide a convenient mechanism for writing to object storage via the task's local filesystem
|
||||
* To enable [type checking][type-checking] of pipelines that create ML artifacts
|
||||
* To make the contents of some artifact types easily observable via special UI rendering
|
||||
|
||||
The following `training_component` demonstrates standard usage of both input and output artifacts:
|
||||
|
||||
```python
|
||||
from kfp import dsl
from kfp.dsl import Input, Output, Dataset, Model
|
||||
|
||||
@dsl.component
|
||||
def training_component(dataset: Input[Dataset], model: Output[Model]):
|
||||
"""Trains an output Model on an input Dataset."""
|
||||
with open(dataset.path) as f:
|
||||
contents = f.read()
|
||||
|
||||
    # ... train a TensorFlow model, `tf_model`, on the contents of `dataset` (omitted) ...
|
||||
|
||||
tf_model.save(model.path)
|
||||
tf_model.metadata['framework'] = 'tensorflow'
|
||||
```
|
||||
|
||||
This `training_component` does the following:
|
||||
1. Accepts an input dataset
|
||||
2. Reads the input dataset's content from the local filesystem
|
||||
3. Trains a model (omitted)
|
||||
4. Saves the model as a component output
|
||||
5. Sets some metadata about the saved model
|
||||
|
||||
As illustrated by `training_component`, artifacts are simply a thin wrapper around some artifact properties, including the `.path` from which the artifact can be read/written and the artifact's `.metadata`. The following sections describe these properties and other aspects of artifacts in detail.
|
||||
|
||||
### Artifact types
|
||||
|
||||
The artifact annotation indicates the type of the artifact. KFP provides several artifact types within the DSL:
|
||||
|
||||
| DSL object | Artifact schema title |
|
||||
| ----------------------------- | ---------------------------------- |
|
||||
| [`Artifact`][dsl-artifact] | system.Artifact |
|
||||
| [`Dataset`][dsl-dataset] | system.Dataset |
|
||||
| [`Model`][dsl-model] | system.Model |
|
||||
| [`Metrics`][dsl-metrics] | system.Metrics |
|
||||
| [`ClassificationMetrics`][dsl-classificationmetrics] | system.ClassificationMetrics |
|
||||
| [`SlicedClassificationMetrics`][dsl-slicedclassificationmetrics] | system.SlicedClassificationMetrics |
|
||||
| [`HTML`][dsl-html] | system.HTML |
|
||||
| [`Markdown`][dsl-markdown] | system.Markdown |
|
||||
|
||||
|
||||
`Artifact`, `Dataset`, `Model`, and `Metrics` are the most generic and commonly used artifact types. `Artifact` is the default artifact base type and should be used in cases where the artifact type does not fit neatly into another artifact category. `Artifact` is also compatible with all other artifact types. In this sense, the `Artifact` type is also an artifact "any" type.
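For example, a component annotated with the generic `Artifact` type can accept an artifact of any type produced upstream. The following is a minimal sketch:

```python
from kfp import dsl
from kfp.dsl import Artifact, Input


@dsl.component
def describe_artifact(artifact: Input[Artifact]):
    # Accepts a Dataset, Model, Metrics, or any other artifact type
    print('Name:', artifact.name)
    print('URI:', artifact.uri)
    print('Metadata:', artifact.metadata)
```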
|
||||
|
||||
On the [KFP open source][oss-be] UI, `ClassificationMetrics`, `SlicedClassificationMetrics`, `HTML`, and `Markdown` provide special UI rendering to make the contents of the artifact easily observable.
|
||||
|
||||
<!-- TODO: describe strongly-typed schemas -->
|
||||
|
||||
### Declaring Input/Output artifacts
|
||||
|
||||
In _components_, an artifact annotation must always be wrapped in an `Input` or `Output` type marker to indicate the artifact's I/O type. This is required, as it would otherwise be ambiguous whether an artifact is an input or output since input and output artifacts are both declared via Python function parameters.
|
||||
|
||||
In _pipelines_, input artifact annotations should be wrapped in an `Input` type marker and, unlike in components, output artifacts should be provided as a return annotation as shown in `concat_pipeline`'s `Dataset` output:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Dataset, Input, Output
|
||||
|
||||
@dsl.component
|
||||
def concat_component(
|
||||
dataset1: Input[Dataset],
|
||||
dataset2: Input[Dataset],
|
||||
out_dataset: Output[Dataset],
|
||||
):
|
||||
with open(dataset1.path) as f:
|
||||
contents1 = f.read()
|
||||
with open(dataset2.path) as f:
|
||||
contents2 = f.read()
|
||||
with open(out_dataset.path, 'w') as f:
|
||||
f.write(contents1 + contents2)
|
||||
|
||||
@dsl.pipeline
|
||||
def concat_pipeline(
|
||||
d1: Input[Dataset],
|
||||
d2: Input[Dataset],
|
||||
) -> Dataset:
|
||||
return concat_component(
|
||||
dataset1=d1,
|
||||
dataset2=d2
|
||||
    ).outputs['out_dataset']
|
||||
```
|
||||
|
||||
You can specify multiple pipeline artifact outputs, just as you would for parameters. This is shown by `concat_pipeline2`'s outputs `intermediate_dataset` and `final_dataset`:
|
||||
|
||||
```python
|
||||
from typing import NamedTuple
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Dataset, Input
|
||||
|
||||
@dsl.pipeline
|
||||
def concat_pipeline2(
|
||||
d1: Input[Dataset],
|
||||
d2: Input[Dataset],
|
||||
d3: Input[Dataset],
|
||||
) -> NamedTuple('Outputs',
|
||||
intermediate_dataset=Dataset,
|
||||
final_dataset=Dataset):
|
||||
Outputs = NamedTuple('Outputs',
|
||||
intermediate_dataset=Dataset,
|
||||
final_dataset=Dataset)
|
||||
concat1 = concat_component(
|
||||
dataset1=d1,
|
||||
dataset2=d2
|
||||
)
|
||||
concat2 = concat_component(
|
||||
dataset1=concat1.outputs['out_dataset'],
|
||||
dataset2=d3
|
||||
)
|
||||
return Outputs(intermediate_dataset=concat1.outputs['out_dataset'],
|
||||
final_dataset=concat2.outputs['out_dataset'])
|
||||
```
|
||||
|
||||
The [KFP SDK compiler][compiler] will type check artifact usage according to the rules described in [Type Checking][type-checking].
|
||||
|
||||
### Using output artifacts
|
||||
|
||||
When you use an input or output annotation in a component, your component effectively makes a request at runtime for a URI path to the artifact.
|
||||
|
||||
For output artifacts, the artifact being created does not yet exist (your component is going to create it!). To make it easy for components to create artifacts, the KFP backend provides a unique system-generated URI where the component should write the output artifact. For both input and output artifacts, the URI is a path within the cloud object storage bucket specified as the pipeline root. The URI uniquely identifies the output by its name, producer task, and pipeline. The system-generated URI is accessible via the `.uri` attribute of the artifact instance automatically passed to the component at runtime:
|
||||
|
||||
<!-- TODO: need to document pipeline_root and link here -->
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Model
|
||||
from kfp.dsl import Output
|
||||
|
||||
@dsl.component
|
||||
def print_artifact(model: Output[Model]):
|
||||
    print('URI:', model.uri)
|
||||
```
|
||||
|
||||
Note that you will never pass an output artifact to a component directly when composing your pipeline. For example, in `concat_pipeline2` above, we do not pass `out_dataset` to the `concat_component`. The output artifact will be passed to the component automatically with the correct system-generated URI at runtime.
|
||||
|
||||
While you can write output artifacts directly to the URI, KFP provides an even easier mechanism via the artifact's `.path` attribute:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Model
|
||||
from kfp.dsl import Output
|
||||
|
||||
@dsl.component
|
||||
def print_and_create_artifact(model: Output[Model]):
|
||||
    print('path:', model.path)
|
||||
    with open(model.path, 'w') as f:
|
||||
        f.write('my model!')
|
||||
```
|
||||
|
||||
After the task executes, KFP handles copying the file at `.path` to the URI at `.uri` automatically, allowing you to create artifact files by only interacting with the local filesystem. This approach works when the output artifact is stored as a file or directory.
|
||||
|
||||
For cases where the output artifact is not easily represented by a file (for example, the output is a container image containing a model), you should override the system-generated `.uri` by setting it on the artifact directly, then write the output to that location. KFP will store the updated URI in ML Metadata. The artifact's `.path` attribute will not be useful.
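The following is a minimal sketch of this pattern, assuming a hypothetical component whose output is a model packaged as a container image; the image URI is illustrative:

```python
from kfp import dsl
from kfp.dsl import Model, Output


@dsl.component
def build_model_image(model: Output[Model]):
    # The output is a container image, not a file, so .path is not used
    image_uri = 'gcr.io/my-project/my-model-image:v1'

    # ... build and push the image to image_uri (omitted) ...

    # Override the system-generated URI; KFP stores the updated URI in ML Metadata
    model.uri = image_uri
    model.metadata['format'] = 'container image'
```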
|
||||
|
||||
### Using input artifacts
|
||||
|
||||
For input artifacts, the artifact URI already exists, since the artifact has already been created. KFP handles passing the correct URI to your component based on the data exchange established in your pipeline. As with output artifacts, KFP handles copying the existing file at `.uri` to the path at `.path` so that your component can read it from the local filesystem.
|
||||
|
||||
Input artifacts should be treated as immutable. You should not try to modify the contents of the file at `.path` and any changes to `.metadata` will not affect the artifact's metadata in [ML Metadata][ml-metadata].
|
||||
|
||||
### Artifact name and metadata
|
||||
|
||||
In addition to `.uri` and `.path`, artifacts also have a `.name` and `.metadata`.
|
||||
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
from kfp.dsl import Dataset
|
||||
from kfp.dsl import Input
|
||||
|
||||
@dsl.component
|
||||
def count_rows(dataset: Input[Dataset]) -> int:
|
||||
with open(dataset.path) as f:
|
||||
lines = f.readlines()
|
||||
|
||||
print('Information about the artifact:')
|
||||
print('Name:', dataset.name)
|
||||
print('URI:', dataset.uri)
|
||||
print('Path:', dataset.path)
|
||||
print('Metadata:', dataset.metadata)
|
||||
|
||||
return len(lines)
|
||||
```
|
||||
|
||||
In KFP artifacts can have metadata, which can be accessed in a component via the artifact's `.metadata` attribute. Metadata is useful for recording information about the artifact such as which ML framework generated the artifact, what its downstream uses are, etc. For output artifacts, metadata can be set directly on the `.metadata` dictionary, as shown for `model` in the preceding `training_component`.
|
||||
|
||||
|
||||
[ml-metadata]: https://github.com/google/ml-metadata
|
||||
[compiler]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/compiler.html#kfp.compiler.Compiler
|
||||
[dsl-artifact]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Artifact
|
||||
[dsl-dataset]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Dataset
|
||||
[dsl-model]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Model
|
||||
[dsl-metrics]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Metrics
|
||||
[dsl-classificationmetrics]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.ClassificationMetrics
|
||||
[dsl-slicedclassificationmetrics]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.SlicedClassificationMetrics
|
||||
[dsl-html]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.HTML
|
||||
[dsl-markdown]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Markdown
|
||||
[type-checking]: /docs/components/pipelines/v2/compile-a-pipeline#type-checking
|
||||
[oss-be]: /docs/components/pipelines/v2/installation/
|
|
@ -0,0 +1,175 @@
|
|||
+++
|
||||
title = "Parameters"
|
||||
description = "Pass small amounts of data between components"
|
||||
weight = 1
|
||||
+++
|
||||
|
||||
Parameters are useful for passing small amounts of data between components and when the data created by a component does not represent a machine learning artifact such as a model, dataset, or more complex data type.
|
||||
|
||||
Specify parameter inputs and outputs using built-in Python type annotations:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def join_words(word: str, count: int = 10) -> str:
|
||||
return ' '.join(word for _ in range(count))
|
||||
```
|
||||
|
||||
|
||||
KFP maps Python type annotations to the types stored in [ML Metadata][ml-metadata] according to the following table:
|
||||
|
||||
| Python object | KFP type |
|
||||
| ---------------------- | -------- |
|
||||
| `str` | string |
|
||||
| `int` | number |
|
||||
| `float` | number |
|
||||
| `bool` | boolean |
|
||||
| `typing.List` / `list` | object |
|
||||
| `typing.Dict` / `dict` | object |
|
||||
|
||||
As with normal Python functions, input parameters can have default values, indicated in the standard way: `def func(my_string: str = 'default'):`
|
||||
|
||||
Under the hood KFP passes all parameters to and from components by serializing them as JSON.
|
||||
|
||||
For all Python Components ([Lightweight Python Components][lightweight-python-components] and [Containerized Python Components][containerized-python-components]), parameter serialization and deserialization is invisible to the user; KFP handles this automatically.
|
||||
|
||||
For [Container Components][container-component], input parameter deserialization is invisible to the user; KFP passes inputs to the component automatically. For Container Component *outputs*, the user code in the Container Component must handle serializing the output parameters as described in [Container Components: Create component outputs][container-component-outputs].
|
||||
|
||||
### Input parameters
|
||||
Using input parameters is very easy. Simply annotate your component function with the types and, optionally, defaults. This is demonstrated by the following pipeline, which uses a Python Component, a Container Component, and a pipeline with all parameter types as inputs:
|
||||
|
||||
<!-- TODO: document None default -->
|
||||
|
||||
```python
|
||||
from typing import Dict, List
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def python_comp(
|
||||
string: str = 'hello',
|
||||
integer: int = 1,
|
||||
floating_pt: float = 0.1,
|
||||
boolean: bool = True,
|
||||
dictionary: Dict = {'key': 'value'},
|
||||
array: List = [1, 2, 3],
|
||||
):
|
||||
print(string)
|
||||
print(integer)
|
||||
print(floating_pt)
|
||||
print(boolean)
|
||||
print(dictionary)
|
||||
print(array)
|
||||
|
||||
|
||||
@dsl.container_component
|
||||
def container_comp(
|
||||
string: str = 'hello',
|
||||
integer: int = 1,
|
||||
floating_pt: float = 0.1,
|
||||
boolean: bool = True,
|
||||
dictionary: Dict = {'key': 'value'},
|
||||
array: List = [1, 2, 3],
|
||||
):
|
||||
return dsl.ContainerSpec(
|
||||
image='alpine',
|
||||
        command=['sh', '-c', """echo $0 $1 $2 $3 $4 $5"""],
|
||||
args=[
|
||||
string,
|
||||
integer,
|
||||
floating_pt,
|
||||
boolean,
|
||||
dictionary,
|
||||
array,
|
||||
])
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline(
|
||||
string: str = 'Hey!',
|
||||
integer: int = 100,
|
||||
floating_pt: float = 0.1,
|
||||
boolean: bool = False,
|
||||
dictionary: Dict = {'key': 'value'},
|
||||
array: List = [1, 2, 3],
|
||||
):
|
||||
python_comp(
|
||||
string='howdy',
|
||||
integer=integer,
|
||||
array=[4, 5, 6],
|
||||
)
|
||||
container_comp(
|
||||
string=string,
|
||||
integer=20,
|
||||
dictionary={'other key': 'other val'},
|
||||
boolean=boolean,
|
||||
)
|
||||
```
|
||||
|
||||
### Output parameters
|
||||
|
||||
For Python Components and pipelines, output parameters are indicated via return annotations:
|
||||
|
||||
```python
|
||||
from kfp import dsl
|
||||
|
||||
@dsl.component
|
||||
def my_comp() -> int:
|
||||
return 1
|
||||
|
||||
@dsl.pipeline
|
||||
def my_pipeline() -> int:
|
||||
task = my_comp()
|
||||
return task.output
|
||||
```
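
Output parameters can also be consumed by downstream tasks in the same pipeline by passing `task.output` as an argument. The following is a minimal sketch of that pattern, reusing `my_comp` from the example above; the `double` component is hypothetical and not part of the original example:

```python
from kfp import dsl


@dsl.component
def double(num: int) -> int:
    # Consumes the upstream output parameter as a normal typed input.
    return num * 2


@dsl.pipeline
def double_pipeline() -> int:
    task = my_comp()
    return double(num=task.output).output
```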

For Container Components, output parameters are indicated using a [`dsl.OutputPath`][dsl-outputpath] annotation:

```python
from kfp import dsl


@dsl.container_component
def my_comp(int_path: dsl.OutputPath(int)):
    return dsl.ContainerSpec(
        image='alpine',
        command=[
            'sh', '-c',
            f"""mkdir -p $(dirname {int_path}) && echo 1 > {int_path}"""
        ])


@dsl.pipeline
def my_pipeline() -> int:
    task = my_comp()
    return task.outputs['int_path']
```

See [Container Components: Create component outputs][container-component-outputs] for more information on how to use `dsl.OutputPath`.
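
A downstream task can consume a Container Component's output parameter like any other parameter through `task.outputs['int_path']`. The following is a minimal sketch; the `print_int` consumer component is hypothetical:

```python
from kfp import dsl


@dsl.component
def print_int(num: int):
    # Receives the deserialized integer written by my_comp.
    print(num)


@dsl.pipeline
def consumer_pipeline():
    task = my_comp()
    print_int(num=task.outputs['int_path'])
```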

### Multiple output parameters
You can specify multiple named output parameters using a [`typing.NamedTuple`][typing-namedtuple]. You can access a named output using `.outputs['<output-key>']` on [`PipelineTask`][pipelinetask]:

```python
from typing import NamedTuple

from kfp import dsl


@dsl.component
def my_comp() -> NamedTuple('outputs', a=int, b=str):
    outputs = NamedTuple('outputs', a=int, b=str)
    return outputs(1, 'hello')


@dsl.pipeline
def my_pipeline() -> NamedTuple('pipeline_outputs', c=int, d=str):
    task = my_comp()
    pipeline_outputs = NamedTuple('pipeline_outputs', c=int, d=str)
    return pipeline_outputs(task.outputs['a'], task.outputs['b'])
```
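
Named outputs can also feed downstream tasks within the pipeline by passing the individual entries of `task.outputs`. A minimal sketch, assuming a hypothetical `print_outputs` consumer component:

```python
from kfp import dsl


@dsl.component
def print_outputs(an_int: int, a_string: str):
    print(an_int, a_string)


@dsl.pipeline
def consumer_pipeline():
    task = my_comp()
    print_outputs(an_int=task.outputs['a'], a_string=task.outputs['b'])
```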

[ml-metadata]: https://github.com/google/ml-metadata
[lightweight-python-components]: /docs/components/pipelines/v2/components/lightweight-python-components
[containerized-python-components]: /docs/components/pipelines/v2/components/containerized-python-components
[container-component]: /docs/components/pipelines/v2/components/container-components
[container-component-outputs]: /docs/components/pipelines/v2/components/container-components#create-component-outputs
[pipelinetask]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html#kfp.dsl.PipelineTask
[dsl-outputpath]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.OutputPath
[typing-namedtuple]: https://docs.python.org/3/library/typing.html#typing.NamedTuple

@@ -0,0 +1,64 @@
+++
title = "Hello World Pipeline"
description = "Create your first pipeline"
weight = 3
+++

To get started with the tutorials, pip install `kfp` v2:

```sh
pip install kfp --pre
```

Here is a simple pipeline that prints a greeting:

```python
from kfp import dsl


@dsl.component
def say_hello(name: str) -> str:
    hello_text = f'Hello, {name}!'
    print(hello_text)
    return hello_text


@dsl.pipeline
def hello_pipeline(recipient: str) -> str:
    hello_task = say_hello(name=recipient)
    return hello_task.output
```

You can [compile the pipeline][compile-a-pipeline] to YAML with the KFP SDK DSL [`Compiler`][compiler]:

```python
from kfp import compiler


compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')
```

The [`dsl.component`][dsl-component] and [`dsl.pipeline`][dsl-pipeline] decorators turn your type-annotated Python functions into components and pipelines, respectively. The KFP SDK compiler compiles the domain-specific language (DSL) objects to a hermetic pipeline [YAML file][ir-yaml].

You can submit the YAML file to a KFP-conformant backend for execution. If you have already deployed a [KFP open source backend instance][installation] and obtained the endpoint for your deployment, you can submit the pipeline for execution using the KFP SDK [`Client`][client]. The following submits the pipeline for execution with the argument `recipient='World'`:

```python
from kfp.client import Client


client = Client(host='<MY-KFP-ENDPOINT>')
run = client.create_run_from_pipeline_package(
    'pipeline.yaml',
    arguments={
        'recipient': 'World',
    },
)
```

The client will print a link to view the pipeline execution graph and logs in the UI. In this case, the pipeline has one task that prints and returns `'Hello, World!'`.
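
If you prefer to wait for the run to finish programmatically instead of watching the UI, the client can block until the run completes. The following is a minimal sketch that assumes the `client` and `run` objects from the snippet above; the timeout value is illustrative:

```python
# Block until the run completes or the timeout (in seconds) elapses.
client.wait_for_run_completion(run_id=run.run_id, timeout=600)
```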

In the next few sections you'll learn more about the core concepts of authoring pipelines and how to create more expressive, useful pipelines.

[installation]: /docs/components/pipelines/v2/installation/
[client]: https://kubeflow-pipelines.readthedocs.io/en/master/source/client.html#kfp.client.Client
[compiler]: https://kubeflow-pipelines.readthedocs.io/en/master/source/compiler.html#kfp.compiler.Compiler
[ir-yaml]: /docs/components/pipelines/v2/compile-a-pipeline#ir-yaml
[compile-a-pipeline]: /docs/components/pipelines/v2/compile-a-pipeline/
[dsl-pipeline]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html#kfp.dsl.pipeline
[dsl-component]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html#kfp.dsl.component

@@ -1,7 +1,7 @@
+++
title = "Installation"
description = "Options to install Kubeflow Pipelines"
weight = 2
description = "Options for deploying Kubeflow Pipelines"
weight = 3
+++

This page will be available soon. For similar information, see [KFP v1 installation documentation][v1-installation].

@@ -1,7 +1,7 @@
+++
title = "Quickstart"
description = "Get started with Kubeflow Pipelines"
weight = 3
weight = 2

+++

@@ -15,7 +15,7 @@ summary {
</style>

<!-- TODO: add UI screenshots for final pipeline -->
This tutorial helps you get started with KFP.
This tutorial helps you get started with a KFP deployment and a pipeline created with the KFP SDK.

Before you begin, you need the following prerequisites:

@@ -88,7 +88,7 @@ print(url)

The above code consists of the following parts:

* In the first part, the following lines create a [lightweight Python component][lightweight-python-component] by using the `@dsl.component` decorator:
* In the first part, the following lines create a [Lightweight Python Component][lightweight-python-component] by using the `@dsl.component` decorator:
    ```python
    @dsl.component
    def addition_component(num1: int, num2: int) -> int:

@@ -303,15 +303,10 @@ Congratulations! You now have a KFP deployment, an end-to-end ML pipeline, and a

## Next steps
* See [Installation][installation] for additional ways to deploy KFP
* See [Author a Pipeline][author-a-pipeline] to learn more about features available when authoring pipelines
* See [Pipelines][pipelines] to learn more about features available when authoring pipelines

[kind]: https://kind.sigs.k8s.io/

[author-a-pipeline]: /docs/components/pipelines/v2/author-a-pipeline/
[pipelines]: /docs/components/pipelines/v2/author-a-pipeline/pipelines
[pipelines]: /docs/components/pipelines/v2/pipelines/
[installation]: /docs/components/pipelines/v2/installation/
[localhost]: http://localhost:8080
[chocolatey]: https://chocolatey.org/packages/kind
[authenticating-pipelines-gcp]: /docs/distributions/gke/authentication/#authentication-from-kubeflow-pipelines
[ir-yaml]: /docs/components/pipelines/v2/compile-a-pipeline/#ir-yaml
[lightweight-python-component]: /docs/components/pipelines/v2/author-a-pipeline/components/#1-lighweight-python-function-based-components
[lightweight-python-component]: /docs/components/pipelines/v2/components/lightweight-python-components

@@ -6,56 +6,41 @@ weight = 1
+++

Kubeflow Pipelines (KFP) is a platform for building and deploying portable and
scalable machine learning (ML) workflows by using Docker containers.
Kubeflow Pipelines (KFP) is a platform for building and deploying portable and scalable machine learning (ML) workflows using Docker containers.

With KFP you can author [components][components] and [pipelines][pipelines] using the [KFP Python SDK][pypi], compile pipelines to an [intermediate representation YAML][ir-yaml], and submit the pipeline to run on a KFP-conformant backend such as the [open source KFP backend][oss-be] or [Google Cloud Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction).

The [open source KFP backend][oss-be] is available as a core component of Kubeflow or as a standalone installation. Follow the [installation][installation] instructions and [Hello World Pipeline][hello-world-pipeline] example to quickly get started with KFP.

KFP is available as a core component of Kubeflow or as a standalone installation. To quickly get started with a KFP deployment and usage example, see the [Quickstart][quickstart] guide.

<!-- TODO: Include these links once the topic is available -->
<!-- [Learn more about installing Kubeflow][Installation]
[Learn more about installing Kubeflow Pipelines standalone][Installation] -->

## Objectives

The primary objectives of Kubeflow Pipelines are to enable the following:
* End-to-end orchestration of ML workflows
* Pipeline composability through reusable components and pipelines
* Easy management, tracking, and visualization of pipeline definitions, pipeline runs, experiments, and ML artifacts
* Efficient use of compute resources by eliminating redundant executions through [caching][caching]
* Cross-platform pipeline portability through a platform-neutral [IR YAML pipeline definition][ir-yaml]
## Why Kubeflow Pipelines?
KFP enables data scientists and machine learning engineers to:
* Author end-to-end ML workflows natively in Python
* Create fully custom ML components or leverage an ecosystem of existing components
* Easily manage, track, and visualize pipeline definitions, runs, experiments, and ML artifacts
* Efficiently use compute resources through parallel task execution and through caching to eliminate redundant executions
* Maintain cross-platform pipeline portability through a platform-neutral [IR YAML pipeline definition][ir-yaml]

## What is a pipeline?

A [_pipeline_][pipelines] is the definition of a workflow with one or more steps called [_tasks_][tasks]. A task is defined by a single container execution and includes output parameters. Each task in a pipeline might also include input parameters. By specifying the output of one task as the input of another task, a pipeline author can form a directed acyclic graph (DAG) of tasks.
A [pipeline][pipelines] is a definition of a workflow that composes one or more [components][components] together to form a computational directed acyclic graph (DAG). At runtime, each component execution corresponds to a single container execution, which may create ML artifacts. Pipelines may also feature [control flow][control-flow].

Pipelines are written in Python for an easy authoring experience, compiled to YAML for portability, and executed on Kubernetes for scalability.

## What does using KFP look like?

At a high level, using KFP consists of the following steps:

1. [Author a pipeline][author-a-pipeline] with one or more components using the **Python KFP SDK**'s domain-specific language (DSL). You can [author your own components][components] or use prebuilt components provided by other authors.
2. [Compile the pipeline][compile-a-pipeline] into a static representation (YAML) by using the **KFP SDK's DSL compiler**.
3. Submit the pipeline to run on the **KFP backend**. The KFP backend orchestrates the Kubernetes Pod creation and data passing required to execute your workflow.
4. View your runs, experiments, and ML artifacts on the **KFP Dashboard**.

## Next steps

* Follow the [pipelines quickstart guide][Quickstart] to deploy Kubeflow Pipelines and run your first pipeline
<!-- TODO: Uncomment these links once the topic is created -->
<!-- * Learn more about the [different ways to install KFP][installation] -->
* Learn more about [authoring pipelines][author-a-pipeline]
## Next steps
* [Install KFP][installation]
* Learn more about [authoring components][components]
* Learn more about [authoring pipelines][pipelines]

[quickstart]: /docs/components/pipelines/v2/quickstart
[author-a-pipeline]: /docs/components/pipelines/v2/author-a-pipeline
[components]: /docs/components/pipelines/v2/author-a-pipeline/components
[pipelines]: /docs/components/pipelines/v2/author-a-pipeline/pipelines
[tasks]: /docs/components/pipelines/v2/author-a-pipeline/tasks
[compile-a-pipeline]: /docs/components/pipelines/v2/compile-a-pipeline
[components]: /docs/components/pipelines/v2/components
[pipelines]: /docs/components/pipelines/v2/pipelines
[installation]: /docs/components/pipelines/v2/installation
[caching]: /docs/components/pipelines/v2/author-a-pipeline/tasks/#caching
[ir-yaml]: /docs/components/pipelines/v2/compile-a-pipeline/#ir-yaml
[ir-yaml]: /docs/components/pipelines/v2/compile-a-pipeline#ir-yaml
[oss-be]: /docs/components/pipelines/v2/installation/
<!-- GA TODO: drop /#history tag -->
[pypi]: https://pypi.org/project/kfp/#history
[hello-world-pipeline]: /docs/components/pipelines/v2/hello-world
[control-flow]: /docs/components/pipelines/v2/pipelines/control-flow

@@ -0,0 +1,44 @@
+++
title = "Load and share components"
description = "Load and use an ecosystem of components"
weight = 8
+++

This section describes how to load and use existing components. In this section, "components" refers to both single-step components and pipelines, which can also be [used as components][pipeline-as-component].

IR YAML serves as a portable, sharable computational template. This allows you to compile and share your components with others, as well as leverage an ecosystem of existing components.
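
Because components compile to the same IR YAML format as pipelines, one way to share a component is to compile it with the KFP SDK compiler and distribute the resulting file. The following is a minimal sketch of that workflow; `my_component` is a stand-in for any component you have defined:

```python
from kfp import compiler, dsl


@dsl.component
def my_component(text: str) -> str:
    return text


# Write the component's IR YAML so it can be shared and loaded elsewhere.
compiler.Compiler().compile(my_component, 'component.yaml')
```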

To use an existing component, you can load it using the [`components`][components-module] module and use it with other components in a pipeline:

```python
from kfp import components, dsl

loaded_comp = components.load_component_from_file('component.yaml')


@dsl.pipeline
def my_pipeline():
    loaded_comp()
```

You can also load a component directly from a URL, such as a GitHub URL:

```python
loaded_comp = components.load_component_from_url('https://github.com/kubeflow/pipelines/blob/984d8a039d2ff105ca6b21ab26be057b9552b51d/sdk/python/test_data/pipelines/two_step_pipeline.yaml')
```

Lastly, you can load a component from a string using [`components.load_component_from_text`][components-load-component-from-text]:

```python
with open('component.yaml') as f:
    component_str = f.read()

loaded_comp = components.load_component_from_text(component_str)
```

Some libraries, such as the [Google Cloud Pipeline Components][gcpc] library, package and provide reusable components in a pip-installable [Python package][gcpc-pypi].

[pipeline-as-component]: /docs/components/pipelines/v2/pipelines/pipeline-basics#pipelines-as-components
[gcpc]: https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction
[gcpc-pypi]: https://pypi.org/project/google-cloud-pipeline-components/
[components-module]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html
[components-load-component-from-text]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/components.html#kfp.components.load_component_from_text

@@ -0,0 +1,7 @@
+++
title = "Pipelines"
description = "Author KFP pipelines"
weight = 5
+++

A *pipeline* is a definition of a workflow containing one or more tasks, including how tasks relate to each other to form a computational graph. Pipelines may have inputs which can be passed to tasks within the pipeline and may surface outputs created by tasks within the pipeline. Pipelines can themselves be used as components within other pipelines.

@@ -0,0 +1,165 @@
+++
title = "Control Flow"
description = "Create pipelines with control flow"
weight = 2
+++

Although a KFP pipeline decorated with the `@dsl.pipeline` decorator looks like a normal Python function, it is actually an expression of pipeline topology and control flow semantics, constructed using the KFP domain-specific language (DSL). [Pipeline Basics][pipeline-basics] covered how data passing expresses [pipeline topology through task dependencies][data-passing]. This section describes how to use control flow in your pipelines using the KFP DSL. The DSL features three types of control flow, each implemented by a Python context manager:

1. Conditions
2. Looping
3. Exit handling

### Conditions (dsl.Condition)

The [`dsl.Condition`][dsl-condition] context manager enables conditional execution of tasks within its scope based on the output of an upstream task or pipeline input parameter. The context manager takes two arguments: a required `condition` and an optional `name`. The `condition` is a comparative expression where at least one of the two operands is an output from an upstream task or a pipeline input parameter.

In the following pipeline, `conditional_task` only executes if `coin_flip_task` has the output `'heads'`.

```python
from kfp import dsl


@dsl.pipeline
def my_pipeline():
    coin_flip_task = flip_coin()
    with dsl.Condition(coin_flip_task.output == 'heads'):
        conditional_task = my_comp()
```
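
This example assumes upstream `flip_coin` and `my_comp` components that are not shown. A minimal sketch of the `flip_coin` component might look like the following; the implementation details are illustrative:

```python
from kfp import dsl


@dsl.component
def flip_coin() -> str:
    # Return 'heads' or 'tails' at random; the string output drives the condition.
    import random
    return random.choice(['heads', 'tails'])
```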

### Parallel looping (dsl.ParallelFor)

The [`dsl.ParallelFor`][dsl-parallelfor] context manager allows parallel execution of tasks over a static set of items. The context manager takes three arguments: a required `items`, an optional `parallelism`, and an optional `name`. `items` is the static set of items to loop over and `parallelism` is the maximum number of concurrent iterations permitted while executing the `dsl.ParallelFor` group. `parallelism=0` indicates unconstrained parallelism.

In the following pipeline, `train_model` will train a model for 1, 5, 10, and 25 epochs, with no more than two training tasks running at one time:

```python
from kfp import dsl


@dsl.pipeline
def my_pipeline():
    with dsl.ParallelFor(
        items=[1, 5, 10, 25],
        parallelism=2
    ) as epochs:
        train_model(epochs=epochs)
```
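
The examples in this section assume a `train_model` component that accepts an `epochs` parameter and produces a `Model` artifact named `model` (referenced later as `train_model_task.outputs['model']`). The following is a minimal sketch of such a component; the body is illustrative only:

```python
from kfp import dsl
from kfp.dsl import Model, Output


@dsl.component
def train_model(epochs: int, model: Output[Model]):
    # Stand-in for a real training loop; writes a placeholder model artifact.
    with open(model.path, 'w') as f:
        f.write(f'model trained for {epochs} epochs')
```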

Use [`dsl.Collected`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Collected) with `dsl.ParallelFor` to gather outputs from a parallel loop of tasks:

```python
from kfp import dsl


@dsl.pipeline
def my_pipeline():
    with dsl.ParallelFor(
        items=[1, 5, 10, 25],
    ) as epochs:
        train_model_task = train_model(epochs=epochs)
    max_accuracy(models=dsl.Collected(train_model_task.outputs['model']))
```

Downstream tasks might consume `dsl.Collected` outputs via an input annotated with a `List` of parameters or a `List` of artifacts. For example, `max_accuracy` in the preceding example has the input `models` with type `Input[List[Model]]`, as shown by the following component definition:

```python
from typing import List

from kfp import dsl
from kfp.dsl import Model, Input


@dsl.component
def max_accuracy(models: Input[List[Model]]) -> float:
    # score_model is a user-defined helper that evaluates a single model (not shown).
    return max(score_model(model) for model in models)
```

You can use `dsl.Collected` to collect outputs from nested loops in a *nested list* of parameters. For example, output parameters from two nested `dsl.ParallelFor` groups are collected in a multilevel nested list of parameters, where each nested list contains the output parameters from one of the `dsl.ParallelFor` groups. The number of nested levels is based on the number of nested `dsl.ParallelFor` contexts.

By comparison, *artifacts* created in nested loops are collected in a *flat* list.

You can also return a `dsl.Collected` from a pipeline. Use a `List` of parameters or a `List` of artifacts in the return annotation, as shown in the following example:

```python
from typing import List

from kfp import dsl
from kfp.dsl import Model


@dsl.pipeline
def my_pipeline() -> List[Model]:
    with dsl.ParallelFor(
        items=[1, 5, 10, 25],
    ) as epochs:
        train_model_task = train_model(epochs=epochs)
    return dsl.Collected(train_model_task.outputs['model'])
```

### Exit handling (dsl.ExitHandler)
The [`dsl.ExitHandler`][dsl-exithandler] context manager allows pipeline authors to specify an exit task which will run after the tasks within the context manager's scope finish execution, even if one of those tasks fails. This is analogous to using a `try:` block followed by a `finally:` block in normal Python, where the exit task sits in the `finally:` block. The context manager takes two arguments: a required `exit_task` and an optional `name`. `exit_task` accepts an instantiated [`PipelineTask`][dsl-pipelinetask].

In the following pipeline, `clean_up_task` will execute after both `create_datasets` and `train_and_save_models` finish, or after either of them fails:

```python
from kfp import dsl


@dsl.pipeline
def my_pipeline():
    clean_up_task = clean_up_resources()
    with dsl.ExitHandler(exit_task=clean_up_task):
        dataset_task = create_datasets()
        train_task = train_and_save_models(dataset=dataset_task.output)
```

The task you use as an exit task may use a special input that provides access to pipeline and task status metadata, including pipeline failure or success status. You can use this special input by annotating your exit task with the [`dsl.PipelineTaskFinalStatus`][dsl-pipelinetaskfinalstatus] annotation. The argument for this parameter will be provided by the backend automatically at runtime. You should not provide any value for this parameter when you instantiate your exit task.

The following pipeline uses `dsl.PipelineTaskFinalStatus` to obtain information about the pipeline and task failure, even after `fail_op` fails:

```python
from kfp import dsl
from kfp.dsl import PipelineTaskFinalStatus


@dsl.component
def exit_op(user_input: str, status: PipelineTaskFinalStatus):
    """Prints pipeline run status."""
    print(user_input)
    print('Pipeline status: ', status.state)
    print('Job resource name: ', status.pipeline_job_resource_name)
    print('Pipeline task name: ', status.pipeline_task_name)
    print('Error code: ', status.error_code)
    print('Error message: ', status.error_message)


@dsl.component
def fail_op():
    import sys
    sys.exit(1)


@dsl.pipeline
def my_pipeline():
    print_op()
    print_status_task = exit_op(user_input='Task execution status:')
    with dsl.ExitHandler(exit_task=print_status_task):
        fail_op()
```

#### Ignore upstream failure
The [`.ignore_upstream_failure()`][ignore-upstream-failure] task method on [`PipelineTask`][dsl-pipelinetask] enables another approach to author pipelines with exit handling behavior. Calling this method on a task causes the task to ignore failures of any specified upstream tasks (as established by data exchange or by use of [`.after()`][dsl-pipelinetask-after]). If the task has no upstream tasks, this method has no effect.

In the following pipeline definition, `clean_up_task` is executed after `fail_op`, regardless of whether `fail_op` succeeds:

```python
from kfp import dsl


@dsl.pipeline()
def my_pipeline(text: str = 'message'):
    task = fail_op(message=text)
    clean_up_task = print_op(
        message=task.output).ignore_upstream_failure()
```

Note that the component used for the caller task (`print_op` in the example above) requires a default value for all inputs it consumes from an upstream task. The default value is applied if the upstream task fails to produce the outputs that are passed to the caller task. Specifying default values ensures that the caller task always succeeds, regardless of the status of the upstream task.
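
For instance, a `print_op` component compatible with this pattern might declare a default value for `message`, so the clean-up task can still run when `fail_op` produces no output. This is a minimal sketch under that assumption, not the exact component used above:

```python
from kfp import dsl


@dsl.component
def print_op(message: str = 'No message was produced by the upstream task.'):
    # The default is used when the upstream task fails before producing output.
    print(message)
```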

[data-passing]: /docs/components/pipelines/v2/pipelines/pipeline-basics#data-passing-and-task-dependencies
[pipeline-basics]: /docs/components/pipelines/v2/pipelines/pipeline-basics
[dsl-condition]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Condition
[dsl-exithandler]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.ExitHandler
[dsl-parallelfor]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.ParallelFor
[dsl-pipelinetaskfinalstatus]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTaskFinalStatus
[ignore-upstream-failure]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTask.ignore_upstream_failure
[dsl-pipelinetask]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTask
[dsl-pipelinetask-after]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTask.after

@@ -0,0 +1,164 @@
+++
title = "Pipeline Basics"
description = "Compose components into pipelines"
weight = 1
+++

While components have three authoring approaches, pipelines have one authoring approach: they are defined with a pipeline function decorated with the [`@dsl.pipeline`][dsl-pipeline] decorator. Take the following pipeline `pythagorean`, which implements the Pythagorean theorem as a pipeline via simple arithmetic components:

```python
from kfp import dsl


@dsl.component
def square(x: float) -> float:
    return x ** 2


@dsl.component
def add(x: float, y: float) -> float:
    return x + y


@dsl.component
def square_root(x: float) -> float:
    return x ** .5


@dsl.pipeline
def pythagorean(a: float, b: float) -> float:
    a_sq_task = square(x=a)
    b_sq_task = square(x=b)
    sum_task = add(x=a_sq_task.output, y=b_sq_task.output)
    return square_root(x=sum_task.output).output
```

Although a KFP pipeline decorated with the `@dsl.pipeline` decorator looks like a normal Python function, it is actually an expression of pipeline topology and [control flow][control-flow] semantics, constructed using the KFP domain-specific language (DSL).

A pipeline definition has five parts:
1. The pipeline decorator
2. Inputs and outputs declared in the function signature
3. Data passing and task dependencies
4. Task configurations
5. Pipeline control flow

This section covers the first four parts. [Control flow][control-flow] is covered in the next section.

### The pipeline decorator
KFP pipelines are defined inside functions decorated with the `@dsl.pipeline` decorator. The decorator takes three optional arguments:

* `name` is the name of your pipeline. If not provided, the name defaults to a sanitized version of the pipeline function name.
* `description` is a description of the pipeline.
* `pipeline_root` is the root path of the remote storage destination within which the tasks in your pipeline will create outputs. `pipeline_root` may also be set or overridden by pipeline submission clients.

You can modify the definition of `pythagorean` to use these arguments:

```python
@dsl.pipeline(name='pythagorean-theorem-pipeline',
              description='Solve for the length of a hypotenuse of a triangle with sides length `a` and `b`.',
              pipeline_root='gs://my-pipelines-bucket')
def pythagorean(a: float, b: float) -> float:
    ...
```

### Pipeline inputs and outputs
Like [components][components], pipeline inputs and outputs are defined by the parameters and annotations in the pipeline function signature.

In the preceding example, `pythagorean` accepts inputs `a` and `b`, each typed `float`, and creates one `float` output.

Pipeline inputs are declared via function input parameters/annotations and pipeline outputs are declared via function output annotations. Pipeline outputs will _never be declared via pipeline function input parameters_, unlike for components that use [output artifacts][output-artifacts] or [Container Components that use `dsl.OutputPath`][container-component-outputs].

For more information on how to declare pipeline function inputs and outputs, see [Data Types][data-types].

### Data passing and task dependencies

When you call a component in a pipeline definition, it constructs a [`PipelineTask`][pipelinetask] instance. You can pass data between tasks using the `PipelineTask`'s `.output` and `.outputs` attributes.

For a task with a single unnamed output indicated by a single return annotation, access the output using `PipelineTask.output`. This is the case for the components `square`, `add`, and `square_root`, which each have one unnamed output.

For tasks with multiple outputs or named outputs, access the output using `PipelineTask.outputs['<output-key>']`. Using named output parameters is described in more detail in [Data Types: Parameters][parameters-namedtuple].

In the absence of data exchange, tasks will run in parallel for efficient pipeline execution. This is the case for `a_sq_task` and `b_sq_task`, which do not exchange data.

When tasks exchange data, an execution ordering is established between those tasks. This is to ensure that upstream tasks create their outputs before downstream tasks attempt to consume those outputs. For example, in `pythagorean`, the backend will execute `a_sq_task` and `b_sq_task` before it executes `sum_task`. Similarly, it will execute `sum_task` before it executes the final task created from the `square_root` component.

In some cases, you may wish to establish execution ordering in the absence of data exchange. In these cases, you can call one task's `.after()` method on another task. For example, while `a_sq_task` and `b_sq_task` do not exchange data, we can specify `a_sq_task` to run before `b_sq_task`:

```python
@dsl.pipeline
def pythagorean(a: float, b: float) -> float:
    a_sq_task = square(x=a)
    b_sq_task = square(x=b)
    b_sq_task.after(a_sq_task)
    ...
```

### Task configurations
The KFP SDK exposes several platform-agnostic task-level configurations via task methods. Platform-agnostic configurations are those that are expected to exhibit similar execution behavior on all KFP-conformant backends, such as the [open source KFP backend][oss-be] or [Google Cloud Vertex AI Pipelines][vertex-pipelines].

All platform-agnostic task-level configurations are set using [`PipelineTask`][pipelinetask] methods. Take the following environment variable example:

```python
from kfp import dsl


@dsl.component
def print_env_var():
    import os
    print(os.environ.get('MY_ENV_VAR'))


@dsl.pipeline()
def my_pipeline():
    task = print_env_var()
    task.set_env_variable('MY_ENV_VAR', 'hello')
```

When executed, the `print_env_var` component should print `'hello'`.

Task-level configuration methods can also be chained:

```python
print_env_var().set_env_variable('MY_ENV_VAR', 'hello').set_env_variable('OTHER_VAR', 'world')
```

The KFP SDK provides the following task methods for setting task-level configurations:
* `.add_accelerator_type`
* `.set_accelerator_limit`
* `.set_cpu_limit`
* `.set_memory_limit`
* `.set_env_variable`
* `.set_caching_options`
* `.set_display_name`
* `.set_retry`
* `.ignore_upstream_failure`

See the [`PipelineTask` reference documentation][pipelinetask] for more information about these methods.
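
As a sketch of how a few of these methods might be combined on a single task (the resource values, retry count, and `train` component are illustrative, not taken from the original examples):

```python
from kfp import dsl


@dsl.component
def train():
    print('training...')


@dsl.pipeline
def my_pipeline():
    task = train()
    # Illustrative values; adjust to your workload and backend.
    task.set_cpu_limit('1')
    task.set_memory_limit('2G')
    task.set_retry(num_retries=3)
    task.set_display_name('train-with-retries')
```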

### Pipelines as components
Pipelines can themselves be used as components in other pipelines, just as you would use any other single-step component in a pipeline. For example, we could easily recompose the preceding `pythagorean` pipeline to use an inner helper pipeline `square_and_sum`:

```python
@dsl.pipeline
def square_and_sum(a: float, b: float) -> float:
    a_sq_task = square(x=a)
    b_sq_task = square(x=b)
    return add(x=a_sq_task.output, y=b_sq_task.output).output


@dsl.pipeline
def pythagorean(a: float = 1.2, b: float = 1.2) -> float:
    sq_and_sum_task = square_and_sum(a=a, b=b)
    return square_root(x=sq_and_sum_task.output).output
```

<!-- TODO: make this reference more precise throughout -->
[dsl-reference-docs]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html
[dsl-pipeline]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html#kfp.dsl.pipeline
[control-flow]: /docs/components/pipelines/v2/pipelines/control-flow
[components]: /docs/components/pipelines/v2/components
[pipelinetask]: https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html#kfp.dsl.PipelineTask
[vertex-pipelines]: https://cloud.google.com/vertex-ai/docs/pipelines/introduction
[oss-be]: /docs/components/pipelines/v2/installation/
[data-types]: /docs/components/pipelines/v2/data-types
[output-artifacts]: /docs/components/pipelines/v2/data-types/artifacts#using-output-artifacts
[container-component-outputs]: /docs/components/pipelines/v2/components/container-components#create-component-outputs
[parameters-namedtuple]: /docs/components/pipelines/v2/data-types/parameters#multiple-output-parameters

@@ -1,10 +1,10 @@
+++
title = "Run a Pipeline"
description = "Execute a pipeline on the KFP backend"
weight = 6
weight = 8
+++

The KFP SDK offers three ways to run a pipeline.
KFP offers three ways to run a pipeline.

## 1. Run from the KFP Dashboard
The first and easiest way to run a pipeline is by submitting it via the KFP dashboard.

@@ -74,7 +74,7 @@ kfp run create --experiment-name my-experiment --package-file path/to/pipeline.y

For more information about the `kfp run create` command, see the [KFP Command Line Interface reference documentation][kfp-run-create-reference-docs]. For more information on the KFP CLI generally, see [Command Line Interface user docs][kfp-cli].

[compile-a-pipeline]: /docs/components/pipelines/v2/compile-a-pipeline/
[compile-a-pipeline]: /docs/components/pipelines/v2/compile-a-pipeline
[kfp-sdk-api-ref-client]: https://kubeflow-pipelines.readthedocs.io/en/master/source/client.html
[kfp-cli]: /docs/components/pipelines/v2/cli/
[kfp-run-create-reference-docs]: https://kubeflow-pipelines.readthedocs.io/en/master/source/cli.html#kfp-run-create