Commit Graph

185 Commits

Author SHA1 Message Date
Michalina Kotwica ce985bc287
fix(sdk): Allow keyword-only arguments in pipeline function signature (#4544)
* add test for keyword-only arguments in pipeline func

* fix: kwargs-only argument for pipeline func

* test: kwargs generate same yaml as args

* remove whole metadata

* assert -> self.assertEqual

* programmatic example --> fixed example

* same name for both

Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
2021-01-29 18:31:02 -08:00
Jiaxiao Zheng a36a62a700
feat(sdk): Artifact metadata related placeholder for components. (#5003)
* resolve comments.

* fix tests

* wip: add structures and skeleton for component resolution logic

* add generator

* fix the problem

* cleanup

* add a test

* fix tests
2021-01-19 08:57:45 -08:00
radcheb 5633b9abda
fix(sdk): fixes unresolved PipelineParam when static list passed to dsl.ParallelFor. Fixes #4890 (#4891)
* fix parallelfor compiling items + add tests

* remove debug print

* fix tests

* fix parallelfor_pipeline_param_in_items_resolving test

* debug test

* fix tests

* Revert "debug test"

This reverts commit 57451143bd.

* fix tests
2021-01-14 00:09:03 -08:00
Jiaxiao Zheng 7540ba5c3b
feat(sdk): Implements artifact URI placeholder. (#4932)
* add placeholder to spec

* add output_directory to pipeline

* respect uri placeholder in file outputs

* wip: add data passing rewriting logic to respect the uri semantics

* merge input_uri and paths when instantiating ContainerOp

* fix

* fix workflow rewriting

* Add topology rewriting

* add a test case, and various fixes

* make the test case more complex

* Fix the case when working with OpsGroup

* Fix test case

* fix resolving test

* fix redundant cmd lines

* fix redundant cmd lines

* resolve comments

* fix file outputs

* resolve comments

* copy file outputs instead of modifying inplace.
2021-01-05 20:39:51 -08:00
Niklas Hansson 24732b9dae
feat(compiler): add dsl operation for parallelism on sub dag level (#4199)
* Added subdag parallelism

Authored-by: NikeNano <niklas.sven.hansson@gmail.com>
Co-authored-by: guanhuichen <guanhuichen@gmail.com>

* added error handling, fixed comment and refactored

* updated with sleep and TODO

* fix imports

Co-authored-by: guanhuichen <guanhuichen@gmail.com>
2020-12-26 22:10:27 -08:00
Kenta Onishi 5a4b70e37c
feat(sdk): Add settings of the dnsConfig field. Fixes #4836 (#4837)
* feat(sdk): Add settings of the dnsConfig field. Fixes #4836

* feat(sdk): Add dnsConfig example and sample.

* feat(sdk): Refactor dnsConfig param.

* feat(sdk): Refactor dnsConfig param.
2020-12-14 20:05:49 -08:00
Vitalii Vokhmin 2f3a686e54
feat(sdk): add ability to set retry policy (#4858)
* feat(sdk): add ability to set retry policy

This fixes the second part of the issue described in #4333
The first part was addressed in #4392

* feat(sdk): validate retry policy name

* feat(sdk): simplify retry policy interface
2020-12-11 14:47:29 -08:00
Jiaxiao Zheng fb15223f7e
chore: Add doc strings marking the feature stages for SDK. (#4575)
* add doc strings

* Simplify the docstring

* fix unittest

* recover cli.py

* recover cli.py

* substitute docstring in resource ops with TODOs

* revert stable labels
2020-11-24 00:19:00 -08:00
Alexey Volkov f7874d38ff
fix(sdk): Compiler - Fixed pipeline parameters with empty default values (#4552)
Fixes https://github.com/kubeflow/pipelines/issues/4549
2020-11-12 15:52:28 -08:00
Asav Patel 9efc9e59b2
fix(sdk) - fixes missing import in KFP compiler (#4741) 2020-11-09 17:30:41 -08:00
Alexey Volkov 80e1d7063d
fix(sdk): Fixed UI metadata and metrics (#4672)
Reverting most of the #2334 which inadvertently broke those artifacts by causing the names to be mangled.

KFP's DSL compiler prepends template names to output names to ensure global uniqueness of *input* names (DSL's ContainerOp does not have concept of inputs, so the inputs are generated during the compilation including input names). But prepending template names to the output names stops the backend from recognizing the mlpipeline-ui-metadata and mlpipeline-metrics artifacts.
2020-10-25 19:37:00 -07:00
Alexey Volkov e8fb58a221
feat(sdk): Preserve parameter arguments and input names (#4563)
ContainerOp has no concept of inputs, so it looses any information about them such as input names and in some cases even the passed argument values (which are just injected into the command line).
This commit fixes that issue by preserving the paramater arguments map and ultimately storing it in an Argo template annotation.

Fixes https://github.com/kubeflow/pipelines/issues/4556
2020-10-11 20:32:48 -07:00
Niklas Hansson c32ea232d5
feat(compiled): set pod disruption budget for pipelines. Fixes #3877 (#4178)
* Update _client.py

* Update _client.py

* added pod disruption budget

* clean up

* Update sdk/python/kfp/dsl/_pipeline.py

* fixed parameter

* updated after feedback

* removed selector
2020-09-14 13:45:26 -07:00
Victor 22b7b99a8b
fix(sdk): Fix opsgroups dependency resolution (#4370) 2020-08-27 09:03:53 -07:00
Alexey Volkov d87c3be611
DEPRECATE(sdk): DSL - Deprecated output_artifact_paths parameter in ContainerOp constructor (#2334)
The users should switch to file_outputs instead.
Previously `file_outputs` only supported small data outputs, but now it supports big files.
2020-08-07 21:44:19 -07:00
Alex Latchford 704c8c7660
chore: Clean up KFP SDK docstrings, make formatting a little more consistent (#4218)
* Prepare SDK docs environment so its easier to understand how to build the docs locally so theyre consistent with ReadTheDocs.

* Clean up docstrings for kfp.Client

* Add in updates to the docs for compiler and components

* Update components area to add in code references and make formatting a little more consistent.

* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks

* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks

* Remove unused kfp.notebook package links

* Clean up a few more errant references

* Clean up the DSL docs some more

* Update SDK docs for KFP extensions to follow Sphinx guidelines

* Clean up formatting of docstrings after Ark-Kuns comments
2020-08-04 00:33:47 +08:00
Alexey Volkov bbc9ff5ec3
SDK - Compiler - Validating Argo validator (#3874)
* SDK - Compiler - Validating Argo validator

* Added warning if argo is available, but not working
2020-07-10 19:07:21 -07:00
Alexey Volkov db0af86e53
feat(sdk): SDK - Enable placeholders in task display names. Fixes #4163 (#4164) 2020-07-09 18:42:35 -07:00
Niklas Hansson c6ac83f72c
feat: add parallelism for dsl.ParallelFor. Fixes #4089 (#4149)
* Added parallism at sub-dag level

* updated the parallism

* remove yaml file

* reformatting

* Update sdk/python/kfp/compiler/compiler.py

* Update sdk/python/kfp/compiler/compiler.py

* Update samples/core/loop_parallelism/loop_parallelism.py

Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>

Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
2020-07-08 11:27:13 -07:00
Alexey Volkov 48889a99d1
fix(sdk): Compiler - Fixed input artifact name sanitization when using raw string arguments. Fixes #4110 (#4120) 2020-07-08 10:43:09 -07:00
Alexey Volkov 229eff2516
SDK - Compiler - Removed the deprecated dsl-compile --package command (#4055) 2020-07-01 19:12:01 -07:00
Alexey Volkov 6960366846
fix(sdk): Compiler - Fixed the input argument mapping when using dsl.graph_component. Fixes #3915 (4082)
* SDK - Compiler - Fixed the input argument mapping when using dsl.graph_component

Fixes https://github.com/kubeflow/pipelines/issues/3915

* Stopped relying on the argument order at all

This can make the compilation less fragile.
2020-06-29 02:31:37 -07:00
Jiaxiao Zheng b099c6f5d3
chore: Rollback telemetry related changes (4088)
* Revert "fix length (#3934)"

This reverts commit 7fbb7cae

* Revert "[SDK] Add first party component label (#3861)"

This reverts commit 1e2b9d4e

* Revert "[SDK] Add pod labels for telemetry purpose. (#3578)"

This reverts commit aa8da64b
2020-06-27 15:46:14 -07:00
Alexey Volkov 54a596abd8
SDK - Compiler - Added support for volume-based data passing (3371)
* SDK - Compiler - Added support for volume-based data passing

Currently artifact passing is performed by Argo sidecar containers what download input data and upload output data to artifact repository (usually, S3-compatible blob storage like Minio).
The performance of this method is not optimal and it requires that pod disks have enough capacity to hold all artifact data.

This commit adds support for volume-based data passing.
This method involves using a single milti-write Kubernetes data volume to pass all intermediate data.
Parts of the volume are mounted to the input/output artifact directories, so when the user program reads and writes files, the files actually reside in the data volume.
This method improves the performance and reduces storage resource requirements.

The data volume must exist and support "READ_WRITE_MANY".

Limitations:
* All artifact file names must be the same (e.g. "data"). All auto-generated paths are already consistent. Avoid using any hard-coded paths.
* Passing constant values (text) as arguments for artifact inputs is not supported.
* The feature is experimental.

* Added data_passing_methods.KubernetesVolume

This class represents a configured volume-based artifact passing method.

* Added PipelineConf.data_passing_method

This property allows setting the method that will be used for intermediate data passing.
Added the compiler support for the new feature.

Example:
```python
from kfp.dsl import PipelineConf, data_passing_methods
from kubernetes.client.models import V1Volume, V1PersistentVolumeClaim
pipeline_conf = PipelineConf()
pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaim('data-volume'),
    ),
    path_prefix='artifact_data/',
)
```

* Added unit test

* Fixed bug in the unit test

Kubernetes does not validate the structures at all...

* Fixed bug in the result structure

* Fixed the test data

The class should be V1PersistentVolumeClaimVolumeSource, not V1PersistentVolumeClaimSpec.

* Fixed the test
2020-06-25 16:11:31 -07:00
Alexey Volkov 757d43c7fd
SDK - Compiler - Fixed error message (#4053)
Fixes https://github.com/kubeflow/pipelines/issues/4021
2020-06-24 11:42:46 -07:00
Alexey Volkov 374b3b02d2
SDK - Compiler - Made compiler compatible with @wraps (#3956)
Fixes https://github.com/kubeflow/pipelines/issues/3367
2020-06-11 20:03:55 -07:00
Jiaxiao Zheng 7fbb7cae56
fix length (#3934) 2020-06-09 14:00:06 -07:00
Alexey Volkov 40372e5c86
SDK - Compiler - Using properly serialized pipeline parameter defaults (#3832)
* SDK - Compiler - Using properly serialized pipeline parameter defaults

Fixes https://github.com/kubeflow/pipelines/issues/3806

* Sort the keys so that the serialized defaults are stable in python 3.5
2020-06-09 13:10:04 -07:00
Jiaxiao Zheng 1e2b9d4e7e
[SDK] Add first party component label (#3861)
* add OOB component dict and utility function

* add test

* add a transformer, which appends the component name label

* add transformer function, compiler and test

* move telemetry test

* fix none uri

* applies comments

* revert dependency on frozendict

* fixes some tests

* resolve comments
2020-05-29 08:55:16 -07:00
Thi Nguyen ec9445aa01
Allow PipelineParams in dict keys too. (#3565)
Co-authored-by: Thi Nguyen <duongnt@users.noreply.github.com>
2020-05-19 17:54:19 -07:00
Alexey Volkov 8ba366b03f
SDK - Made outputs with original names available in ContainerOp.outputs (#3734)
* SDK - Made outputs with original names available in ContainerOp.outputs

Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
Now ContainerOp supports any output names, so we're now using the original output names.
However to support legacy pipelines, we're also adding output references with pythonic names.

* Fixed the compiler test data

* Fixed the duplicate parameter outputs in the compiled workflow

* Fixed long line

* Stabilized the output naming conflict resolution

* Fix case of missing special outputs
2020-05-12 19:08:26 -07:00
Alexey Volkov 2279bde698
SDK - Annotate pods with component_ref (#3727)
* SDK - Annotate pods with component_ref

This preserves the information about the digest of the component and the location from which the component was loaded.

* Fixed compiler tests
2020-05-11 17:18:21 -07:00
Niklas Hansson 05c1537f28
Add Nodeselector to pipelineconfig fix issue #2863 (#3616)
* updated version

* added pipeline nodeselector

* removed old legacy

* renaming

* update test

* Update sdk/python/kfp/compiler/compiler.py
2020-05-05 00:11:08 -07:00
Eterna2 9167da1b4e
Support execution throttling for executing the pipelines (#3346) (#3439)
* Add parallelism limits to pipeline in kfp sdk

* fix lint error
2020-05-04 23:25:08 -07:00
Jiaxiao Zheng aa8da64b4c
[SDK] Add pod labels for telemetry purpose. (#3578)
* add telemetry pod labels

* revert the id label

* update compiler tests

* update cli arg

* bypass tfx

* update docstring
2020-04-27 18:50:04 -07:00
Alexey Volkov 6cb92d45c8
SDK - Compiler - Include the SDK version information in the compiled workflows (#3583)
* SDK - Compiler - Include the SDK version information in the compiled workflows

* Fixed the unit tests

* Removed the sdk_version annotation.
2020-04-25 01:49:28 -07:00
Niklas Hansson 2354776e1e
fix #2802: Set ImagePullPolicy per pipeline. (#3534)
* bump version

* default image pull policy

* Update sdk/python/kfp/dsl/_pipeline.py

* task setting should dominate

* Update sdk/python/kfp/dsl/_pipeline.py

* fixed merge misstake
2020-04-23 07:09:13 -07:00
Alexey Volkov b63ad7e614
SDK - Removed the ArtifactLocation feature (#3517)
* SDK - Removed the ArtifactLocation feature

The feature was deprecated in v0.1.34 https://github.com/kubeflow/pipelines/pull/2326

* Removed the artifact_location sample
2020-04-23 00:49:44 -07:00
Alexey Volkov 08c7c0ef36
SDK - Made YAML dumping more awesome (#3520)
See the root cause explanation in https://github.com/kubeflow/pipelines/issues/3519
2020-04-16 21:23:07 -07:00
Alexey Volkov 95aec25db4
SDK - Support kubernetes client v11 (#3319)
Fixes https://github.com/kubeflow/pipelines/issues/3275
2020-03-22 02:54:44 -07:00
Alexey Volkov 734b43e3db
SDK - Added support for maxCacheStaleness (#3318)
* SDK - Added support for maxCacheStaleness

* Added the vendor prefix to the annotation
2020-03-20 13:38:09 -07:00
Alexey Volkov 03e064cee2
SDK - Compiler - Fix incompatibility with python3.5 (#3122) 2020-02-19 13:55:47 -08:00
Alexey Volkov a33ae25bc4
SDK - Compiler - Add optional Argo validation (#3094)
argo CLI tool must be in path for this feature to work
2020-02-18 23:12:25 -08:00
Alexey Volkov 4a1b282461
SDK - Compiler - Fixed ParallelFor argument resolving (#3029)
* SDK - Compiler - Fixed ParallelFor name clashes

The ParallelFor argument reference resolving was really broken.
The logic "worked" like this - of the name of the referenced output
contained the name of the loop collection source output, then it was
considered to be the reference to the loop item.
This broke lots of scenarios especially in cases where there were
multiple components with same output name (e.g. the default "Output"
output name). The logic also did not distinguish between references to
the loop collection item vs. references to the loop collection source
itself.

I've rewritten the argument resolving logic, to fix the issues.

* Argo cannot use {{item}} when withParams items are dicts

* Stabilize the loop template names

* Renamed the test case
2020-02-11 12:18:09 -08:00
Alexey Volkov c83aff2738
SDK - Components - Made it easier to access component spec classes (#2860)
* SDK - Components - Made it easier to access component spec classes

* Updated the imports
2020-01-31 11:41:21 -08:00
Jiaxiao Zheng 358e26adb1 [SDK/compiler] Sanitize op name for PipelineParam (#2711)
* sanitize op name for pipeline param

* refactor sanitization to compiler level, and add unittest
2019-12-27 18:01:39 -08:00
Alexey Volkov b8a2e6f400 SDK/Compiler - Preventing pipeline entrypoint template name from clashing with other template names (#1555)
Case exhibiting the problem:
```
def add(a, b):
    ...
@dsl.pipeline(name="add')
def some_name():
    add(...)
```
2019-12-05 18:08:49 -08:00
Jiaxiao Zheng 790fe99aca [SDK] Relax k8s sanitization (#2634)
* update

* add allow_capital

* fix

* fix volume_ops sample

* fix pipeline name sanitization

* fix unittests

* fix sanitization in _client.py

* fix component output sanitization
2019-11-26 10:28:10 -08:00
Lulu Cheng 07296bc5ba [fix] default yaml.dump to block style (#2591)
* [fix] default every field to block style

* [change] per comment

* [fix] per comment
2019-11-18 18:55:41 -08:00
Jiaxiao Zheng ead912c6f8 [SDK] Fix withItem loop (#2572)
* fix withItem

* clean up and revert sample change

* clean up

* clean up

* clean up

* clean up

* fix

* fix nit
2019-11-07 18:40:19 -08:00