pipelines

Author	SHA1	Message	Date
Michalina Kotwica	ce985bc287	fix(sdk): Allow keyword-only arguments in pipeline function signature (#4544 ) * add test for keyword-only arguments in pipeline func * fix: kwargs-only argument for pipeline func * test: kwargs generate same yaml as args * remove whole metadata * assert -> self.assertEqual * programmatic example --> fixed example * same name for both Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>	2021-01-29 18:31:02 -08:00
radcheb	5633b9abda	fix(sdk): fixes unresolved PipelineParam when static list passed to dsl.ParallelFor. Fixes #4890 (#4891 ) * fix parallelfor compiling items + add tests * remove debug print * fix tests * fix parallelfor_pipeline_param_in_items_resolving test * debug test * fix tests * Revert "debug test" This reverts commit `57451143bd`. * fix tests	2021-01-14 00:09:03 -08:00
Jiaxiao Zheng	7540ba5c3b	feat(sdk): Implements artifact URI placeholder. (#4932 ) * add placeholder to spec * add output_directory to pipeline * respect uri placeholder in file outputs * wip: add data passing rewriting logic to respect the uri semantics * merge input_uri and paths when instantiating ContainerOp * fix * fix workflow rewriting * Add topology rewriting * add a test case, and various fixes * make the test case more complex * Fix the case when working with OpsGroup * Fix test case * fix resolving test * fix redundant cmd lines * fix redundant cmd lines * resolve comments * fix file outputs * resolve comments * copy file outputs instead of modifying inplace.	2021-01-05 20:39:51 -08:00
Niklas Hansson	24732b9dae	feat(compiler): add dsl operation for parallelism on sub dag level (#4199 ) * Added subdag parallelism Authored-by: NikeNano <niklas.sven.hansson@gmail.com> Co-authored-by: guanhuichen <guanhuichen@gmail.com> * added error handling, fixed comment and refactored * updated with sleep and TODO * fix imports Co-authored-by: guanhuichen <guanhuichen@gmail.com>	2020-12-26 22:10:27 -08:00
Kenta Onishi	5a4b70e37c	feat(sdk): Add settings of the dnsConfig field. Fixes #4836 (#4837 ) * feat(sdk): Add settings of the dnsConfig field. Fixes #4836 * feat(sdk): Add dnsConfig example and sample. * feat(sdk): Refactor dnsConfig param. * feat(sdk): Refactor dnsConfig param.	2020-12-14 20:05:49 -08:00
Jiaxiao Zheng	fb15223f7e	chore: Add doc strings marking the feature stages for SDK. (#4575 ) * add doc strings * Simplify the docstring * fix unittest * recover cli.py * recover cli.py * substitute docstring in resource ops with TODOs * revert stable labels	2020-11-24 00:19:00 -08:00
Alexey Volkov	f7874d38ff	fix(sdk): Compiler - Fixed pipeline parameters with empty default values (#4552 ) Fixes https://github.com/kubeflow/pipelines/issues/4549	2020-11-12 15:52:28 -08:00
Asav Patel	9efc9e59b2	fix(sdk) - fixes missing import in KFP compiler (#4741 )	2020-11-09 17:30:41 -08:00
Niklas Hansson	c32ea232d5	feat(compiled): set pod disruption budget for pipelines. Fixes #3877 (#4178 ) * Update _client.py * Update _client.py * added pod disruption budget * clean up * Update sdk/python/kfp/dsl/_pipeline.py * fixed parameter * updated after feedback * removed selector	2020-09-14 13:45:26 -07:00
Victor	22b7b99a8b	fix(sdk): Fix opsgroups dependency resolution (#4370 )	2020-08-27 09:03:53 -07:00
Alex Latchford	704c8c7660	chore: Clean up KFP SDK docstrings, make formatting a little more consistent (#4218 ) * Prepare SDK docs environment so it's easier to understand how to build the docs locally so they're consistent with ReadTheDocs. * Clean up docstrings for kfp.Client * Add in updates to the docs for compiler and components * Update components area to add in code references and make formatting a little more consistent. * Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks * Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks * Remove unused kfp.notebook package links * Clean up a few more errant references * Clean up the DSL docs some more * Update SDK docs for KFP extensions to follow Sphinx guidelines * Clean up formatting of docstrings after Ark-Kun's comments	2020-08-04 00:33:47 +08:00
Alexey Volkov	bbc9ff5ec3	SDK - Compiler - Validating Argo validator (#3874 ) * SDK - Compiler - Validating Argo validator * Added warning if argo is available, but not working	2020-07-10 19:07:21 -07:00
Niklas Hansson	c6ac83f72c	feat: add parallelism for dsl.ParallelFor. Fixes #4089 (#4149 ) * Added parallelism at sub-dag level * updated the parallelism * remove yaml file * reformatting * Update sdk/python/kfp/compiler/compiler.py * Update sdk/python/kfp/compiler/compiler.py * Update samples/core/loop_parallelism/loop_parallelism.py Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com> Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>	2020-07-08 11:27:13 -07:00
Alexey Volkov	48889a99d1	fix(sdk): Compiler - Fixed input artifact name sanitization when using raw string arguments. Fixes #4110 (#4120 )	2020-07-08 10:43:09 -07:00
Alexey Volkov	6960366846	fix(sdk): Compiler - Fixed the input argument mapping when using dsl.graph_component. Fixes #3915 (4082) * SDK - Compiler - Fixed the input argument mapping when using dsl.graph_component Fixes https://github.com/kubeflow/pipelines/issues/3915 * Stopped relying on the argument order at all This can make the compilation less fragile.	2020-06-29 02:31:37 -07:00
Jiaxiao Zheng	b099c6f5d3	chore: Rollback telemetry related changes (4088) * Revert "fix length (#3934)" This reverts commit `7fbb7cae` * Revert "[SDK] Add first party component label (#3861)" This reverts commit `1e2b9d4e` * Revert "[SDK] Add pod labels for telemetry purpose. (#3578)" This reverts commit `aa8da64b`	2020-06-27 15:46:14 -07:00
Alexey Volkov	54a596abd8	SDK - Compiler - Added support for volume-based data passing (3371) * SDK - Compiler - Added support for volume-based data passing Currently artifact passing is performed by Argo sidecar containers that download input data and upload output data to the artifact repository (usually, S3-compatible blob storage like Minio). The performance of this method is not optimal and it requires that pod disks have enough capacity to hold all artifact data. This commit adds support for volume-based data passing. This method involves using a single multi-write Kubernetes data volume to pass all intermediate data. Parts of the volume are mounted to the input/output artifact directories, so when the user program reads and writes files, the files actually reside in the data volume. This method improves the performance and reduces storage resource requirements. The data volume must exist and support "READ_WRITE_MANY". Limitations: * All artifact file names must be the same (e.g. "data"). All auto-generated paths are already consistent. Avoid using any hard-coded paths. * Passing constant values (text) as arguments for artifact inputs is not supported. * The feature is experimental. * Added data_passing_methods.KubernetesVolume This class represents a configured volume-based artifact passing method. * Added PipelineConf.data_passing_method This property allows setting the method that will be used for intermediate data passing. Added the compiler support for the new feature. Example: ```python from kfp.dsl import PipelineConf, data_passing_methods from kubernetes.client.models import V1Volume, V1PersistentVolumeClaimVolumeSource pipeline_conf = PipelineConf() pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume( volume=V1Volume( name='data', persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(claim_name='data-volume'), ), path_prefix='artifact_data/', ) ``` * Added unit test * Fixed bug in the unit test Kubernetes does not validate the structures at all... * Fixed bug in the result structure * Fixed the test data The class should be V1PersistentVolumeClaimVolumeSource, not V1PersistentVolumeClaimSpec. * Fixed the test	2020-06-25 16:11:31 -07:00
Alexey Volkov	757d43c7fd	SDK - Compiler - Fixed error message (#4053 ) Fixes https://github.com/kubeflow/pipelines/issues/4021	2020-06-24 11:42:46 -07:00
Alexey Volkov	374b3b02d2	SDK - Compiler - Made compiler compatible with @wraps (#3956 ) Fixes https://github.com/kubeflow/pipelines/issues/3367	2020-06-11 20:03:55 -07:00
Alexey Volkov	40372e5c86	SDK - Compiler - Using properly serialized pipeline parameter defaults (#3832 ) * SDK - Compiler - Using properly serialized pipeline parameter defaults Fixes https://github.com/kubeflow/pipelines/issues/3806 * Sort the keys so that the serialized defaults are stable in python 3.5	2020-06-09 13:10:04 -07:00
Jiaxiao Zheng	1e2b9d4e7e	[SDK] Add first party component label (#3861 ) * add OOB component dict and utility function * add test * add a transformer, which appends the component name label * add transformer function, compiler and test * move telemetry test * fix none uri * applies comments * revert dependency on frozendict * fixes some tests * resolve comments	2020-05-29 08:55:16 -07:00
Thi Nguyen	ec9445aa01	Allow PipelineParams in dict keys too. (#3565 ) Co-authored-by: Thi Nguyen <duongnt@users.noreply.github.com>	2020-05-19 17:54:19 -07:00
Niklas Hansson	05c1537f28	Add Nodeselector to pipelineconfig fix issue #2863 (#3616 ) * updated version * added pipeline nodeselector * removed old legacy * renaming * update test * Update sdk/python/kfp/compiler/compiler.py	2020-05-05 00:11:08 -07:00
Eterna2	9167da1b4e	Support execution throttling for executing the pipelines (#3346 ) (#3439 ) * Add parallelism limits to pipeline in kfp sdk * fix lint error	2020-05-04 23:25:08 -07:00
Jiaxiao Zheng	aa8da64b4c	[SDK] Add pod labels for telemetry purpose. (#3578 ) * add telemetry pod labels * revert the id label * update compiler tests * update cli arg * bypass tfx * update docstring	2020-04-27 18:50:04 -07:00
Alexey Volkov	6cb92d45c8	SDK - Compiler - Include the SDK version information in the compiled workflows (#3583 ) * SDK - Compiler - Include the SDK version information in the compiled workflows * Fixed the unit tests * Removed the sdk_version annotation.	2020-04-25 01:49:28 -07:00
Niklas Hansson	2354776e1e	fix #2802 : Set ImagePullPolicy per pipeline. (#3534 ) * bump version * default image pull policy * Update sdk/python/kfp/dsl/_pipeline.py * task setting should dominate * Update sdk/python/kfp/dsl/_pipeline.py * fixed merge mistake	2020-04-23 07:09:13 -07:00
Alexey Volkov	b63ad7e614	SDK - Removed the ArtifactLocation feature (#3517 ) * SDK - Removed the ArtifactLocation feature The feature was deprecated in v0.1.34 https://github.com/kubeflow/pipelines/pull/2326 * Removed the artifact_location sample	2020-04-23 00:49:44 -07:00
Alexey Volkov	08c7c0ef36	SDK - Made YAML dumping more awesome (#3520 ) See the root cause explanation in https://github.com/kubeflow/pipelines/issues/3519	2020-04-16 21:23:07 -07:00
Alexey Volkov	03e064cee2	SDK - Compiler - Fix incompatibility with python3.5 (#3122 )	2020-02-19 13:55:47 -08:00
Alexey Volkov	a33ae25bc4	SDK - Compiler - Add optional Argo validation (#3094 ) argo CLI tool must be in path for this feature to work	2020-02-18 23:12:25 -08:00
Alexey Volkov	4a1b282461	SDK - Compiler - Fixed ParallelFor argument resolving (#3029 ) * SDK - Compiler - Fixed ParallelFor name clashes The ParallelFor argument reference resolving was really broken. The logic "worked" like this - if the name of the referenced output contained the name of the loop collection source output, then it was considered to be the reference to the loop item. This broke lots of scenarios, especially in cases where there were multiple components with the same output name (e.g. the default "Output" output name). The logic also did not distinguish between references to the loop collection item vs. references to the loop collection source itself. I've rewritten the argument resolving logic, to fix the issues. * Argo cannot use {{item}} when withParams items are dicts * Stabilize the loop template names * Renamed the test case	2020-02-11 12:18:09 -08:00
Alexey Volkov	c83aff2738	SDK - Components - Made it easier to access component spec classes (#2860 ) * SDK - Components - Made it easier to access component spec classes * Updated the imports	2020-01-31 11:41:21 -08:00
Jiaxiao Zheng	358e26adb1	[SDK/compiler] Sanitize op name for PipelineParam (#2711 ) * sanitize op name for pipeline param * refactor sanitization to compiler level, and add unittest	2019-12-27 18:01:39 -08:00
Alexey Volkov	b8a2e6f400	SDK/Compiler - Preventing pipeline entrypoint template name from clashing with other template names (#1555 ) Case exhibiting the problem: ``` def add(a, b): ... @dsl.pipeline(name="add") def some_name(): add(...) ```	2019-12-05 18:08:49 -08:00
Jiaxiao Zheng	790fe99aca	[SDK] Relax k8s sanitization (#2634 ) * update * add allow_capital * fix * fix volume_ops sample * fix pipeline name sanitization * fix unittests * fix sanitization in _client.py * fix component output sanitization	2019-11-26 10:28:10 -08:00
Lulu Cheng	07296bc5ba	[fix] default yaml.dump to block style (#2591 ) * [fix] default every field to block style * [change] per comment * [fix] per comment	2019-11-18 18:55:41 -08:00
Jiaxiao Zheng	ead912c6f8	[SDK] Fix withItem loop (#2572 ) * fix withItem * clean up and revert sample change * clean up * clean up * clean up * clean up * fix * fix nit	2019-11-07 18:40:19 -08:00
Alexey Volkov	735e627a03	SDK - Refactoring - Split the K8sHelper class (#2333 ) * SDK - Refactoring - Split the K8sHelper class One part was only used by container builder and provided higher-level API over K8s Client. Another was used by the compiler and did not use the kubernetes library. * Updated the license year.	2019-10-21 14:57:22 -07:00
Alexey Volkov	1b6047aa69	SDK - Improve errors when ContainerOp.output is unavailable (#1578 ) * SDK - Improve errors when ContainerOp.output is unavailable ContainerOp.output is only available when there is only one output. Right now, when there are multiple outputs it just holds `None` instead of a task output reference. In this case, however, it's indistinguishable from just passing a None argument. This PR gives a quick fix to make accessing the nonexistent `.output` a compile-time error. * Fixed the implementation and added tests * Trigger retests	2019-10-11 18:20:40 -07:00
Alexey Volkov	181de66cf9	SDK - Compiler - Move Argo volume specifications to templates (#2229 ) * SDK - Compiler - Move volumes to templates Argo v2.3.0+ supports per-template volume specs similar to Kubernetes. Prior to version 2.3.0 Argo only supported workflow-level volume specs. We had several outstanding issues caused by the need to put all volumes in the same place. There was also the issue with input parameter reference placeholders in volume specifications which were placed outside their home templates declaring the inputs. This change fixes those issues. * Removed dead code line	2019-10-07 16:55:12 -07:00
Jiaxiao Zheng	092845d134	[SDK/Compiler] Add _create_and_write_workflow method (#2321 ) * add _create_and_write_workflow * Add pointer to TFX dag runner usage.	2019-10-07 14:13:10 -07:00
Alexey Volkov	4b33f1b550	SDK - Compiler - Fixed deprecation warning when calling compile (#2303 )	2019-10-04 13:09:12 -07:00
Jiaxiao Zheng	9a9bd904ac	[SDK-compiler] Refactor Compiler to expose an API to write out yaml spec of pipeline. (#2146 ) * Refactor. * Remove redundant code. * Fix. * Move the implementation of create_workflow into a private api. * Change write_workflow to private. * deprecation warning	2019-10-03 16:45:56 -07:00
Alexey Volkov	c128b2a7b4	SDK - Compiler - Make it possible to create more portable pipelines (#2271 ) * SDK - Compiler - Allow creating portable pipelines This change allows directly passing the PipelineConf instance to compiler or launcher which makes it easier to create portable pipelines by allowing the environment-specific configuration to be directly passed to the environment-specific launcher. Background: PipelineConf holds all pipeline-level configuration including `op_transformers`, `image_pull_secrets` etc. Some of these are specific to a particular execution environment (e.g. GCP secret or Argo artifact location or Kubernetes-specific options). Previously, the only way to modify `PipelineConf` was to do it inside the pipeline function. That tied the pipeline function to a specific execution environment (e.g. GCP, Argo or Kubernetes) Solution: This change allows directly passing the PipelineConf instance to compiler or launcher. This allows writing portable, launcher- and environment-agnostic pipeline functions. All environment-specific configurations can be moved to the launching stage. Before: ```python # Defining pipeline def my_pipeline(): # portable pipeline code dsl.get_pipeline_conf().add_op_transformer(gcp.use_gcp_secret('user-gcp-sa')) # Launching pipeline kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={}) ``` After: ```python # Defining pipeline def my_pipeline(): # portable pipeline code # Launching pipeline pipeline_conf = dsl.PipelineConf() pipeline_conf.add_op_transformer(gcp.use_gcp_secret('user-gcp-sa')) kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={}, pipeline_conf=pipeline_conf) ``` After 2 (launching same portable pipeline using different launchers): ```python # Loading portable pipeline from portable_pipeline import my_pipeline # Launching pipeline on Kubeflow pipeline_conf = dsl.PipelineConf() pipeline_conf.add_op_transformer(gcp.use_gcp_secret('user-gcp-sa')) kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={}, pipeline_conf=pipeline_conf) # Launching pipeline locally (not implemented yet) kfp.run_pipeline_func_locally(my_pipeline, arguments={}) ``` Added parameter docstring	2019-10-02 20:58:08 -07:00
Alexey Volkov	ef63c653af	SDK - Compiler - Fix large data passing (#2173 ) * SDK - Compiler - Fix large data passing Stop outputting parameters unless they're consumed as parameters downstream. This prevents the situation where a component outputs a big file, but the DSL compiler instructs Argo to pick it up as a parameter (parameters only hold a few kilobytes of data). As a byproduct, this change fixes some minor compiler data passing bugs where some parameters were being passed around, but never consumed (happened with `ResourceOp`, `dsl.Condition` and recursion). * Replaced ... with `raise AssertionError` * Fixed small bug * Removed unused variables * Fixed names of the mark_upstream_ios_of_* functions * Fixed detection of parameter output references * Fixed handling of volumes	2019-09-20 15:05:27 -07:00
Alexey Volkov	0e2bf15dbc	SDK - Refactoring - Replaced the Meta classes with the Spec classes (#1944 ) * SDK - Refactoring - Replaced the ParameterMeta class with InputSpec and OutputSpec * SDK - Refactoring - Replaced the internal PipelineMeta class with ComponentSpec * SDK - Refactoring - Replaced the internal ComponentMeta class with ComponentSpec * SDK - Refactoring - Replaced the Meta classes with the Spec classes Replaced the ComponentMeta class with ComponentSpec Replaced the PipelineMeta class with ComponentSpec Replaced the ParameterMeta class with InputSpec and OutputSpec * Removed empty fields	2019-09-16 18:41:12 -07:00
Kevin Bache	2ca7d0ac31	WithParams (#2044 ) * first working commit * incremental commit * in the middle of converting loop args constructor to accept pipeline param * both cases working * output works, passed doesn't * about to redo compiler section * rewrite draft done * added withparam tests * removed sdk/python/comp.yaml * minor * subvars work * more tests * removed unneeded artifact outputs from test yaml * sort keys * removed dead artifact code	2019-09-16 17:58:22 -07:00
Jiaxiao Zheng	1449d08aee	Fix the logic of passing default values of pipeline parameters. (#2098 ) * Fix the logic of passing default values. * Modify unit test * Solve.	2019-09-12 17:10:33 -07:00
Jiaxiao Zheng	497d016e85	Expose an API for appending params/names/descriptions in a programmable way. (#2082 ) * Refactor. Expose a public API to append pipeline param without interacting with dsl.Pipeline obj. * Add unit test and fix. * Fix docstring. * Fix test * Fix test * Fix two nit problems * Refactor	2019-09-10 17:58:47 -07:00
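
The most recent entry (ce985bc287) fixed keyword-only arguments in pipeline function signatures. Its test checks that `*, a, b` and `a, b` generate the same YAML. The pure-Python sketch below shows one way such a bug can arise. It is not the KFP compiler's code, and the helper `collect_pipeline_params` and the two sample functions are hypothetical. `inspect.getfullargspec(func).args` lists only positional parameters and keeps keyword-only ones in `.kwonlyargs`. Iterating `inspect.signature(func).parameters`, by contrast, sees both kinds. The sketch also assumes, for illustration only, that variadic `*args`/`**kwargs` parameters should be rejected and that a missing default maps to `None`.

```python
import inspect


def collect_pipeline_params(func):
    """Hypothetical helper: return (name, default) pairs for a pipeline function.

    Positional-or-keyword and keyword-only parameters are treated alike.
    Parameters without a default map to None (an assumption of this sketch).
    """
    params = []
    for p in inspect.signature(func).parameters.values():
        # Variadic parameters have no fixed name/default to expose as a pipeline input.
        if p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            raise TypeError(f"Unsupported variadic parameter: {p.name}")
        default = None if p.default is inspect.Parameter.empty else p.default
        params.append((p.name, default))
    return params


def pipeline_with_args(a, b='x'):
    pass


def pipeline_with_kwargs(*, a, b='x'):
    pass


# getfullargspec() puts keyword-only parameters in .kwonlyargs, not .args.
print(inspect.getfullargspec(pipeline_with_kwargs).args)  # []

# signature() sees both kinds, so both functions yield the same parameter list.
print(collect_pipeline_params(pipeline_with_args))    # [('a', None), ('b', 'x')]
print(collect_pipeline_params(pipeline_with_kwargs))  # [('a', None), ('b', 'x')]
```

Code that reads only `.args` would silently drop keyword-only parameters. Iterating the full signature is the simpler way to treat both forms identically.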

104 Commits