* add test for keyword-only arguments in pipeline func
* fix: kwargs-only argument for pipeline func
* test: kwargs generate same yaml as args
* remove whole metadata
* assert -> self.assertEqual
* programmatic example --> fixed example
* same name for both
Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
* add placeholder to spec
* add output_directory to pipeline
* respect uri placeholder in file outputs
* wip: add data passing rewriting logic to respect the uri semantics
* merge input_uri and paths when instantiating ContainerOp
* fix
* fix workflow rewriting
* Add topology rewriting
* add a test case, and various fixes
* make the test case more complex
* Fix the case when working with OpsGroup
* Fix test case
* fix resolving test
* fix redundant cmd lines
* fix redundant cmd lines
* resolve comments
* fix file outputs
* resolve comments
* copy file outputs instead of modifying inplace.
* feat(sdk): add ability to set retry policy
This fixes the second part of the issue described in #4333
The first part was addressed in #4392
* feat(sdk): validate retry policy name
* feat(sdk): simplify retry policy interface
Reverting most of the #2334 which inadvertently broke those artifacts by causing the names to be mangled.
KFP's DSL compiler prepends template names to output names to ensure global uniqueness of *input* names (DSL's ContainerOp does not have concept of inputs, so the inputs are generated during the compilation including input names). But prepending template names to the output names stops the backend from recognizing the mlpipeline-ui-metadata and mlpipeline-metrics artifacts.
ContainerOp has no concept of inputs, so it looses any information about them such as input names and in some cases even the passed argument values (which are just injected into the command line).
This commit fixes that issue by preserving the paramater arguments map and ultimately storing it in an Argo template annotation.
Fixes https://github.com/kubeflow/pipelines/issues/4556
* Prepare SDK docs environment so its easier to understand how to build the docs locally so theyre consistent with ReadTheDocs.
* Clean up docstrings for kfp.Client
* Add in updates to the docs for compiler and components
* Update components area to add in code references and make formatting a little more consistent.
* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks
* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks
* Remove unused kfp.notebook package links
* Clean up a few more errant references
* Clean up the DSL docs some more
* Update SDK docs for KFP extensions to follow Sphinx guidelines
* Clean up formatting of docstrings after Ark-Kuns comments
* SDK - Compiler - Fixed the input argument mapping when using dsl.graph_component
Fixes https://github.com/kubeflow/pipelines/issues/3915
* Stopped relying on the argument order at all
This can make the compilation less fragile.
* SDK - Compiler - Added support for volume-based data passing
Currently artifact passing is performed by Argo sidecar containers what download input data and upload output data to artifact repository (usually, S3-compatible blob storage like Minio).
The performance of this method is not optimal and it requires that pod disks have enough capacity to hold all artifact data.
This commit adds support for volume-based data passing.
This method involves using a single milti-write Kubernetes data volume to pass all intermediate data.
Parts of the volume are mounted to the input/output artifact directories, so when the user program reads and writes files, the files actually reside in the data volume.
This method improves the performance and reduces storage resource requirements.
The data volume must exist and support "READ_WRITE_MANY".
Limitations:
* All artifact file names must be the same (e.g. "data"). All auto-generated paths are already consistent. Avoid using any hard-coded paths.
* Passing constant values (text) as arguments for artifact inputs is not supported.
* The feature is experimental.
* Added data_passing_methods.KubernetesVolume
This class represents a configured volume-based artifact passing method.
* Added PipelineConf.data_passing_method
This property allows setting the method that will be used for intermediate data passing.
Added the compiler support for the new feature.
Example:
```python
from kfp.dsl import PipelineConf, data_passing_methods
from kubernetes.client.models import V1Volume, V1PersistentVolumeClaim
pipeline_conf = PipelineConf()
pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
volume=V1Volume(
name='data',
persistent_volume_claim=V1PersistentVolumeClaim('data-volume'),
),
path_prefix='artifact_data/',
)
```
* Added unit test
* Fixed bug in the unit test
Kubernetes does not validate the structures at all...
* Fixed bug in the result structure
* Fixed the test data
The class should be V1PersistentVolumeClaimVolumeSource, not V1PersistentVolumeClaimSpec.
* Fixed the test
* SDK - Compiler - Using properly serialized pipeline parameter defaults
Fixes https://github.com/kubeflow/pipelines/issues/3806
* Sort the keys so that the serialized defaults are stable in python 3.5
* add OOB component dict and utility function
* add test
* add a transformer, which appends the component name label
* add transformer function, compiler and test
* move telemetry test
* fix none uri
* applies comments
* revert dependency on frozendict
* fixes some tests
* resolve comments
* SDK - Made outputs with original names available in ContainerOp.outputs
Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
Now ContainerOp supports any output names, so we're now using the original output names.
However to support legacy pipelines, we're also adding output references with pythonic names.
* Fixed the compiler test data
* Fixed the duplicate parameter outputs in the compiled workflow
* Fixed long line
* Stabilized the output naming conflict resolution
* Fix case of missing special outputs
* SDK - Annotate pods with component_ref
This preserves the information about the digest of the component and the location from which the component was loaded.
* Fixed compiler tests
* SDK - Compiler - Fixed ParallelFor name clashes
The ParallelFor argument reference resolving was really broken.
The logic "worked" like this - of the name of the referenced output
contained the name of the loop collection source output, then it was
considered to be the reference to the loop item.
This broke lots of scenarios especially in cases where there were
multiple components with same output name (e.g. the default "Output"
output name). The logic also did not distinguish between references to
the loop collection item vs. references to the loop collection source
itself.
I've rewritten the argument resolving logic, to fix the issues.
* Argo cannot use {{item}} when withParams items are dicts
* Stabilize the loop template names
* Renamed the test case