pipelines/components
arena
aws
bigquery
dataflow
dataproc
gcp
ibm-components
kubeflow
local
nuclio
sample/keras/train_classifier
OWNERS
README.md
build_image.sh
license.sh
release.sh
test_load_all_components.sh
third_party_licenses.csv

README.md

Kubeflow pipeline components

Kubeflow pipeline components are implementations of Kubeflow pipeline tasks. Each task takes one or more artifacts as input and may produce one or more artifacts as output.

Example: XGBoost DataProc components
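To make the artifact flow concrete, here is a minimal sketch of a component definition loaded through the Kubeflow Pipelines SDK; the component name, image, and parameters are hypothetical placeholders, not a component shipped in this directory:

    import kfp.components as comp

    # Hypothetical component: declares one input artifact and one output
    # artifact, and points at the container image that runs the client code.
    mytask_op = comp.load_component_from_text('''
    name: Mytask
    description: Transforms a raw input artifact into a preprocessed output artifact.
    inputs:
    - {name: input_uri, type: String}
    outputs:
    - {name: output_uri, type: String}
    implementation:
      container:
        image: gcr.io/my-project/ml-pipeline-mytask:latest
        command: [python, /mytask.py]
        args: [--input, {inputValue: input_uri}, --output, {outputPath: output_uri}]
    ''')

Calling mytask_op(input_uri=...) inside a pipeline function then creates one task from this component.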

Each task usually includes three parts:

  • Client code: the code that talks to service endpoints to submit jobs. For example, code that calls the Google Dataproc API to submit a Spark job (sketched below).
  • Runtime code: the code that does the actual work and usually runs in the cluster. For example, Spark code that transforms raw data into preprocessed data.
  • Container: a container image that runs the client code.
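The split matters in practice: the client code stays thin, while the runtime code carries the heavy dependencies. As a hedged sketch of the client-code half, the following submits a Spark job through the Google Dataproc API using the google-api-python-client package; the project, region, cluster, and jar URI are hypothetical placeholders:

    from googleapiclient import discovery

    def submit_spark_job(project_id, region, cluster_name, main_jar_uri):
        """Submit a Spark job to Cloud Dataproc and return its job id."""
        dataproc = discovery.build('dataproc', 'v1')
        job = {
            'placement': {'clusterName': cluster_name},
            'sparkJob': {'mainJarFileUri': main_jar_uri},
        }
        result = dataproc.projects().regions().jobs().submit(
            projectId=project_id,
            region=region,
            body={'job': job},
        ).execute()
        return result['reference']['jobId']

    if __name__ == '__main__':
        print(submit_spark_job('my-project', 'us-central1', 'my-cluster',
                               'gs://my-bucket/jars/transform.jar'))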

Note the naming convention for client code and runtime code. For a task named "mytask":

  • The mytask.py program contains the client code.
  • The mytask directory contains all the runtime code.
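Under this convention, a hypothetical mytask.py client entry point could look like the sketch below; the flags and messages are illustrative only, and the heavy lifting would live in the sibling mytask directory:

    # mytask.py -- client code; the runtime code lives in the mytask/ directory.
    import argparse

    def main():
        parser = argparse.ArgumentParser(description='Client for the mytask component.')
        parser.add_argument('--input', required=True, help='URI of the input artifact.')
        parser.add_argument('--output', required=True, help='URI for the output artifact.')
        args = parser.parse_args()
        # A real client would submit the runtime code in mytask/ to a cluster
        # (for example via the Dataproc API sketched above) and wait for it.
        print('Would run mytask runtime on %s, writing to %s' % (args.input, args.output))

    if __name__ == '__main__':
        main()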

See how to use the Kubeflow Pipelines SDK and build your own components.
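As a minimal sketch of that workflow, the following assumes the kfp package and the hypothetical mytask image from above; it wires the component into a one-step pipeline and compiles it into an archive that can be uploaded to a Kubeflow Pipelines deployment:

    import kfp.dsl as dsl
    import kfp.compiler

    @dsl.pipeline(name='Mytask pipeline',
                  description='Runs the hypothetical mytask component.')
    def mytask_pipeline(input_uri='gs://my-bucket/data/raw'):
        # Each ContainerOp runs one component's client code in its container image.
        dsl.ContainerOp(
            name='mytask',
            image='gcr.io/my-project/ml-pipeline-mytask:latest',  # hypothetical
            command=['python', '/mytask.py'],
            arguments=['--input', input_uri, '--output', '/tmp/output.txt'],
            file_outputs={'output': '/tmp/output.txt'},
        )

    if __name__ == '__main__':
        kfp.compiler.Compiler().compile(mytask_pipeline, 'mytask_pipeline.tar.gz')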