* fix(sdk): compile ParallelFor in a deterministic manner
During compilation, ParallelFor components end up with randomized names,
which makes it very inconvenient to compare two compiled versions of a pipeline.
This commit makes the generated names deterministic.
* fix(sdk): fix new parallel-for test cases
* add test for keyword-only arguments in pipeline func
* fix: kwargs-only argument for pipeline func
* test: kwargs generate same yaml as args
* remove whole metadata
* assert -> self.assertEqual
* programmatic example --> fixed example
* same name for both
Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
* SDK - Components - Fixed python components that use \n
The escape sequence was being replaced by the `echo` command.
Apparently, unlike in the `bash` shell, the `echo` command of the `sh` shell expands escape sequences by default and does not support an option to turn this off. (For some reason the `-n` option works properly even though it should not.)
Fixes https://github.com/kubeflow/pipelines/issues/4939
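For illustration, a hedged sketch of the underlying behavior (not SDK code; it assumes a Debian-like image where `sh` is `dash`):
```python
import subprocess

# Program code containing a literal backslash-n escape sequence.
code = r'print("a\nb")'

# sh (dash) `echo` expands the escape, so the code is corrupted before Python sees it.
sh_out = subprocess.run(['sh', '-c', 'echo "$0"', code],
                        capture_output=True, text=True).stdout
# bash `echo` leaves the escape sequence alone by default.
bash_out = subprocess.run(['bash', '-c', 'echo "$0"', code],
                          capture_output=True, text=True).stdout

print(repr(sh_out))    # 'print("a\nb")\n' -- the \n became a real newline
print(repr(bash_out))  # 'print("a\\nb")\n' -- the escape is preserved
```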
* Fixed the test data
* Fixed the deprecated container component builder
* Fixed the new compiler test case
* Added test
* add placeholder to spec
* add output_directory to pipeline
* respect uri placeholder in file outputs
* wip: add data passing rewriting logic to respect the uri semantics
* merge input_uri and paths when instantiating ContainerOp
* fix
* fix workflow rewriting
* Add topology rewriting
* add a test case, and various fixes
* make the test case more complex
* Fix the case when working with OpsGroup
* Fix test case
* fix resolving test
* fix redundant cmd lines
* fix redundant cmd lines
* resolve comments
* fix file outputs
* resolve comments
* copy file outputs instead of modifying inplace.
When calling the delete() method of a ResourceOp we need to ensure we do
not wait for its deletion.
The reason for this is described in [1]: If a pipeline creates a
resource which is being consumed by its steps (e.g., a PVC), the step
deleting the resource will hang waiting for the Kubernetes resource
deletion which, in turn, is waiting for the other steps to get deleted.
As a result, the pipeline never finishes.
This commit allows specifying flags for the ResourceOp kubectl commands
and defaults to the '--wait=false' flag for the deletion.
Specifying flags for a ResourceTemplate is not supported in Argo v2.7
that we currently deploy. But they will be once we upgrade to v2.11+
[2]. This does not affect the delete() method because we don't rely on
Argo's ResourceTemplate for it.
[1] https://github.com/kubeflow/pipelines/issues/4506
[2] https://github.com/kubeflow/pipelines/issues/4553
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
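A hedged usage sketch of the behavior described above (the names used below follow the description, but the exact `delete()` return value and flag handling are assumptions, not a verified API):
```python
import kfp.dsl as dsl

@dsl.pipeline(name='volume-lifecycle')
def volume_pipeline():
    # Create a PVC that a pipeline step consumes.
    vop = dsl.VolumeOp(
        name='create-volume',
        resource_name='my-pvc',
        size='1Gi',
        modes=dsl.VOLUME_MODE_RWO,
    )
    step = dsl.ContainerOp(
        name='use-volume',
        image='library/bash:4.4.23',
        command=['sh', '-c', 'echo hello > /data/out.txt'],
        pvolumes={'/data': vop.volume},
    )
    # Delete the PVC after the consumer step. The deletion defaults to
    # kubectl's '--wait=false', so this step does not hang waiting for the
    # consumers of the resource to be removed first.
    vop.delete().after(step)
```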
Currently we're running the python code inline using `python -c <code>`.
This has two issues:
1) Python does not show source code lines in exception stack traces.
2) `inspect.getsource` does not work. This method is used in PyTorch JIT, for example.
We solve these issues by writing the code into a file before executing it.
The disadvantage of the new approach is that it adds complexity and a filesystem write operation, and it also requires the `sh` executable to be present (we could replace it with a python-based program if needed).
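A rough sketch of what the change in the generated container command might look like (illustrative only; the SDK's actual shell snippet and quoting may differ):
```python
program_code = 'def add(a, b):\n    return a + b\n'

# Before: the code is executed inline, so tracebacks cannot show source lines
# and inspect.getsource() fails inside the container.
old_command = ['python3', '-u', '-c', program_code]

# After: the code is written to a temporary file first and executed from there.
# The program code is passed to the sh script as "$0".
new_command = [
    'sh', '-ec',
    'program_path=$(mktemp)\n'
    'printf "%s" "$0" > "$program_path"\n'
    'python3 -u "$program_path" "$@"\n',
    program_code,
]
```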
* feat(sdk): add ability to set retry policy
This fixes the second part of the issue described in #4333
The first part was addressed in #4392
* feat(sdk): validate retry policy name
* feat(sdk): simplify retry policy interface
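A hedged example of what setting a retry policy might look like after these commits (the exact `set_retry` signature and the accepted policy names are assumptions):
```python
import kfp.dsl as dsl

@dsl.pipeline(name='retry-example')
def retry_pipeline():
    task = dsl.ContainerOp(
        name='flaky-step',
        image='python:3.7',
        command=['python', '-c', 'import random, sys; sys.exit(random.choice([0, 1]))'],
    )
    # Retry up to 3 times, only on failures; the policy name is validated
    # by the SDK and invalid names are rejected at compile time.
    task.set_retry(num_retries=3, policy='OnFailure')
```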
* Compile IR proto in setup.py
* compile to IR
* Fix importer node logic and lint
* cleanup and lint
* merge, undo setup.py change
* cleanup and lint
* remove currently unused code
* format _component_bridge.py
* cleanup and format
* cleanup
* upgrade protobuf in test
* restructure and test
* address review comments
* fix bug
* avoid f-strings formatting
* address review comments
* address review comments
* limit the primitive types to only int, double, and string.
* Fix test for python3.5
* use instance_schema instead of schema_title
* add v2 to setup.py
* address review comments
* move the tests closer to the code
* add more tests
* cleanup and linting
* add more tests
* fix bug on input paramter connection
* linting
* restructure tests
* fix python3.5 test failure
* support outputs.parameters placeholder
* remove pipeline decorator from v2.dsl
ContainerOp has no concept of inputs, so it loses any information about them, such as input names and, in some cases, even the passed argument values (which are just injected into the command line).
This commit fixes that issue by preserving the parameter arguments map and ultimately storing it in an Argo template annotation.
Fixes https://github.com/kubeflow/pipelines/issues/4556
* add tests for pythonic and non-pythonic component outputs
* fix: graph for non-pythonic container output names
Loading a container component from component.yaml creates both
pythonic and original output names. The graph component iterated over
all outputs, applying the pythonic-to-original-name conversion to each
of them. If some of the names were not identical to their pythonic
versions, the lookup table raised a KeyError.
This commit fixes the problem by using a default value for the lookup.
* remove depythonification of outputs - not needed anymore
* SDK - Added warning when not using components
We have long advised our users to create reusable components.
Creating reusable components is as easy as creating ContainerOp instances, but the components are shareable, portable and are easier to support going forward.
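For illustration, a minimal reusable component defined as YAML text and loaded with the SDK, instead of constructing a `ContainerOp` directly (a hedged sketch, not taken from this commit):
```python
from kfp.components import load_component_from_text

echo_op = load_component_from_text('''
name: Echo
inputs:
- {name: message, type: String}
implementation:
  container:
    image: alpine
    command: [echo, {inputValue: message}]
''')

# The loaded component is shareable and portable; calling it creates the task.
echo_task = echo_op(message='Hello, world')
```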
* Disable warning for TFX
* Fixed the warning disabling logic
* Added tests
* SDK - Compiler - Fixed the input argument mapping when using dsl.graph_component
Fixes https://github.com/kubeflow/pipelines/issues/3915
* Stopped relying on the argument order at all
This can make the compilation less fragile.
* SDK - Tests - Restored the ParallelFor compiler test data
Fixes https://github.com/kubeflow/pipelines/issues/4102
* Removed the pipeline-sdk-type annotations
* Fixed the test_artifact_passing_using_volume test data
* SDK - Compiler - Added support for volume-based data passing
Currently artifact passing is performed by Argo sidecar containers that download input data and upload output data to the artifact repository (usually an S3-compatible blob storage like Minio).
The performance of this method is not optimal and it requires that pod disks have enough capacity to hold all artifact data.
This commit adds support for volume-based data passing.
This method involves using a single multi-write Kubernetes data volume to pass all intermediate data.
Parts of the volume are mounted to the input/output artifact directories, so when the user program reads and writes files, the files actually reside in the data volume.
This method improves the performance and reduces storage resource requirements.
The data volume must exist and support the "ReadWriteMany" access mode.
Limitations:
* All artifact file names must be the same (e.g. "data"). All auto-generated paths are already consistent. Avoid using any hard-coded paths.
* Passing constant values (text) as arguments for artifact inputs is not supported.
* The feature is experimental.
* Added data_passing_methods.KubernetesVolume
This class represents a configured volume-based artifact passing method.
* Added PipelineConf.data_passing_method
This property allows setting the method that will be used for intermediate data passing.
Added the compiler support for the new feature.
Example:
```python
from kfp.dsl import PipelineConf, data_passing_methods
from kubernetes.client.models import V1Volume, V1PersistentVolumeClaimVolumeSource

pipeline_conf = PipelineConf()
pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaimVolumeSource('data-volume'),
    ),
    path_prefix='artifact_data/',
)
```
* Added unit test
* Fixed bug in the unit test
Kubernetes does not validate the structures at all...
* Fixed bug in the result structure
* Fixed the test data
The class should be V1PersistentVolumeClaimVolumeSource, not V1PersistentVolumeClaimSpec.
* Fixed the test
Previously the default image was set to an old version of the TensorFlow image. That image is now outdated; it's also framework-specific and pretty big.
We're switching to the official Python image, which is small and framework-agnostic.
Users can easily switch back to the old behavior by specifying `base_image='tensorflow/tensorflow:1.13.2-py3'` during component creation.
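For example, restoring the previous default (a hedged sketch using `func_to_container_op`; any component creation function that accepts `base_image` works the same way):
```python
from kfp.components import func_to_container_op

def train(epochs: int) -> str:
    return 'trained for %d epochs' % epochs

# Pin the old TensorFlow image instead of relying on the new python default.
train_op = func_to_container_op(train, base_image='tensorflow/tensorflow:1.13.2-py3')
```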
* SDK - Components - Stabilize JSON serialization by sorting keys
Otherwise serialization of the default values of the component/pipeline inputs is unstable on Python 3.5.
* Fixed the test data
* add OOB component dict and utility function
* add test
* add a transformer, which appends the component name label
* add transformer function, compiler and test
* move telemetry test
* fix none uri
* applies comments
* revert dependency on frozendict
* fixes some tests
* resolve comments
In some cases the input and output names need to be converted (for example, the input names need to be converted to python function parameter names).
With naive renaming, multiple inputs might be mapped to the same parameter name in some edge cases. The `generate_unique_name_conversion_table` creates a correct mapping.
However, in some really rare cases the resulting mapping could be confusing since it might rename an input whose name was already a correct parameter name and map a different input name to that parameter. E.g. {'AAA' -> 'aaa', 'aaa' -> 'aaa_2'}.
This PR fixes that. Names that do not change when applying the conversion_func will remain unchanged in the mapping. {'AAA' -> 'aaa_2', 'aaa' -> 'aaa'}.
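A minimal sketch of the desired mapping behavior (not the SDK's actual implementation of `generate_unique_name_conversion_table`):
```python
# Hedged sketch: names already valid under conversion_func keep their own name;
# everything else is converted and de-duplicated with a numeric suffix.
def generate_unique_name_conversion_table(names, conversion_func):
    result = {}
    # First pass: names unchanged by the conversion keep their own name.
    for name in names:
        if conversion_func(name) == name:
            result[name] = name
    # Second pass: convert the remaining names, resolving collisions.
    for name in names:
        if name in result:
            continue
        converted = conversion_func(name)
        candidate = converted
        i = 2
        while candidate in result.values():
            candidate = '%s_%d' % (converted, i)
            i += 1
        result[name] = candidate
    return result

# {'aaa': 'aaa', 'AAA': 'aaa_2'} -- 'aaa' keeps its name, as described above.
print(generate_unique_name_conversion_table(['AAA', 'aaa'], str.lower))
```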
* SDK - Made outputs with original names available in ContainerOp.outputs
Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
Now ContainerOp supports any output names, so we're now using the original output names.
However to support legacy pipelines, we're also adding output references with pythonic names.
* Fixed the compiler test data
* Fixed the duplicate parameter outputs in the compiled workflow
* Fixed long line
* Stabilized the output naming conflict resolution
* Fix case of missing special outputs
* SDK - Components - Calculate component hash digest
The digest is calculated when loading the component from URL, file or text.
Slightly refactored component loading - streams are no longer used, only bytes.
TODO: Calculate the digest if missing
TODO: Report possible digest conflicts
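A hedged sketch of how such a digest might be computed from the raw component bytes (the SDK's actual canonicalization and hash choice may differ):
```python
import hashlib

def calculate_component_digest(component_bytes: bytes) -> str:
    # Hash the raw bytes of the component definition (URL, file or text source).
    return hashlib.sha256(component_bytes).hexdigest()

with open('component.yaml', 'rb') as f:
    print(calculate_component_digest(f.read()))
```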
* Updated the test graph component
* Using the actual digest in the test
* SDK - Prioritize lib2to3 when stripping type annotations
It's a standard python library (although not well supported) and it does not leave trailing spaces.
* Fixed compiler test data
* SDK - Annotate pods with component_ref
This preserves the information about the digest of the component and the location from which the component was loaded.
* Fixed compiler tests
* Add kfp-container-builder sa
* Allow service account to be configurable
* Fix tests
* Fix test
* Use documentation for service account to introduce compatibility with different types of installation
* updated doc
* clean up
* Update container_builder_test.py
* Update _build_image_api.py
* Update kustomization.yaml
* Add executable permission for presubmit tests mkp.sh
Added test_fail_on_handling_list_arguments_containing_python_objects
Added test_handling_list_arguments_containing_serializable_python_objects
Moved test_handling_list_arguments_containing_pipelineparam to component_bridge_tests
* SDK - Tests - Testing command-line resolving explicitly
After the recent small refactoring of the task resolving flow in the component library, some tests were left un-updated, with compatibility shims added to make them pass.
This PR updates the remaining tests and removes the shims.
This mostly involves explicitly using `_resolve_command_line_and_paths`.
Some tests that validate the behavior of the dsl bridge were moved to `component_bridge_tests.py`
* Indented the component texts
* SDK/DSL: Enable the deletion of a resource via ResourceOp method
* Add the method delete() to ResourceOps
* Extend ResourceOp & VolumeOp tests
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
* Fix ValueError not being raised
* SDK - Reduce python component limitations - no import errors for custom type annotations
By default, create_component_from_func copies the source code of the function and creates a component using that source code. No global imports are captured. This is problematic for the function definition, since any annotation that uses a type that needs to be imported will cause an error. There were some special provisions for
NamedTuple, InputPath and OutputPath, but even they were brittle (for example, "typing.NamedTuple" or "components.InputPath" annotations still caused failures at runtime).
This commit fixes the issue by stripping the type annotations from function declarations.
Fixes cases that were failing before:
```python
import typing
import collections

MyFuncOutputs = typing.NamedTuple('Outputs', [('sum', int), ('product', int)])

@create_component_from_func
def my_func(
    param1: CustomType,  # This caused failure previously
    param2: collections.OrderedDict,  # This caused failure previously
) -> MyFuncOutputs:  # This caused failure previously
    pass
```
* Fixed the compiler tests
* Fixed crashes on print function
Code `print(line, end="")` was causing error: "lib2to3.pgen2.parse.ParseError: bad input: type=22, value='=', context=('', (2, 15))"
* Using the strip_hints library to strip the annotations
* Updating test workflow yamls
* Workaround for bug in untokenize
* Switched to the new strip_string_to_string method
* Fixed typo.
Co-Authored-By: Jiaxiao Zheng <jxzheng@google.com>
Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>
* [Testing] Use gke 1.15.8 to mitigate workload identity flakiness
* Upgrade gcloud version
* Update image builder image too
* Turn on workload identity
* Update deploy-cluster.sh
* secret sample uses python3 instead
* Increase xgboost time limit
* Revert files with bad format
* Update component and pipelines to use gcloud 279.0.0
* Fix secret sample using python3
* Upgrade frontend integration test image
* Rebuild frontend integration test image
* SDK - Compiler - Fixed ParallelFor name clashes
The ParallelFor argument reference resolving was really broken.
The logic "worked" like this: if the name of the referenced output
contained the name of the loop collection source output, then it was
considered to be a reference to the loop item.
This broke lots of scenarios, especially in cases where there were
multiple components with the same output name (e.g. the default "Output"
output name). The logic also did not distinguish between references to
the loop collection item vs. references to the loop collection source
itself.
I've rewritten the argument resolving logic to fix the issues.
* Argo cannot use {{item}} when withParams items are dicts
* Stabilize the loop template names
* Renamed the test case
* SDK - Components refactoring
This change is a pure refactoring of the implementation of component task creation.
For pipelines compiled using the DSL compiler (the compile() function or the command-line program) nothing should change.
The main goal of the refactoring is to change the way the component instantiation can be customized.
Previously, the flow was like this:
`ComponentSpec` + arguments --> `TaskSpec` --resolving+transform--> `ContainerOp`
This PR changes it to more direct path:
`ComponentSpec` + arguments --constructor--> `ContainerOp`
or
`ComponentSpec` + arguments --constructor--> `TaskSpec`
or
`ComponentSpec` + arguments --constructor--> `SomeCustomTask`
The original approach where the flow always passes through `TaskSpec` had some issues since TaskSpec only accepts string arguments (and two
other reference classes). This made it harder to handle custom types of arguments like PipelineParam or Channel.
Low-level refactoring changes:
Resolving of command-line argument placeholders has been extracted into a function usable by different task constructors.
Changed `_components._created_task_transformation_handler` to `_components._container_task_constructor`. Previously, the handler was receiving a `TaskSpec` instance. Now it receives `ComponentSpec` + arguments [+ `ComponentReference`].
Moved the `ContainerOp` construction handler setup to the `kfp.dsl.Pipeline` context class as planned.
Extracted `TaskSpec` creation to `_components._create_task_spec_from_component_and_arguments`.
Refactored `_dsl_bridge.create_container_op_from_task` to `_components._resolve_command_line_and_paths` which returns `_ResolvedCommandLineAndPaths`.
Renamed `_dsl_bridge._create_container_op_from_resolved_task` to `_dsl_bridge._create_container_op_from_component_and_arguments`.
The signature of `_components._resolve_graph_task` was changed and it now returns `_ResolvedGraphTask` instead of modified `TaskSpec`.
Some of the component tests still expect ContainerOp and its attributes.
These tests will be changed later.
* Adapted the _python_op tests
* Fixed linter failure
I do not want to add any top-level kfp imports in this file to prevent circular references.
* Added docstrings
* Fixed the return type forward reference
* Replaced `_instance_to_dict(obj)` with `obj.to_dict()`
* Fixed the capitalization in _python_function_name_to_component_name
It now only changes the case of the first letter.
* Replaced the _extract_component_metadata function with _extract_component_interface
* Stopped adding newline to the component description.
* Handling None inputs and outputs
* Not including empty inputs and outputs in component spec
* Renamed the private attributes that the @pipeline decorator sets
* Changed _extract_pipeline_metadata to use _extract_component_interface
* Fixed issues based on feedback
* SDK/DSL: Fix PipelineVolume name length
Volume name must be no more than 63 characters
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
* Change which part of the hash value we make use of
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
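A hedged sketch of the general idea (illustrative only; the SDK's actual naming scheme and the hash slice it uses may differ):
```python
import hashlib

def make_pipeline_volume_name(prefix: str, content: str, max_len: int = 63) -> str:
    # Derive a short, stable suffix from a hash of the volume-defining content,
    # then truncate so the final name fits the Kubernetes 63-character limit.
    digest = hashlib.sha256(content.encode()).hexdigest()
    name = '%s-%s' % (prefix, digest[:10])
    return name[:max_len]
```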
* Script to set up workload identity for standalone deployment
* Migrate tests to run on standalone + workload identity
* Fix test script
* Switch to static GSAs for testing, because they have name length limit
* Add workload identity binding for argo
* Fix argo workload identity bindings
* Remove user-gcp-sa from tests
* Remove use_gcp_secret from xgboost sample
* Allow debugging tests locally
* Wait for policies to take effect
* Update deploy-pipeline-lite.sh
* Update deploy-pipeline-lite.sh
* [WIP] test gcloud auth list with test-runner sa
* Add namespace
* test again
* Use new image builder
* test again
* Remove debug code
* Remove usages of use_gcp_secret
* Fix unit test and tensorboard pod template
* Add debug code again to test
* Try waiting until workload identity bindings are ready
* Fix some other samples
* Fix parameterized tfx oss sample
* Add retry to image building
* Try fixing tfx oss sample
* Fix compiled tfx oss sample
* Update all google/cloud-sdk to latest
* Try fixing parameterized tfx oss sample again
* Also verify pipeline-runner ksa is working
* Fix parameterized_tfx_oss sample
* Update gcp-workload-identity-setup.sh
* Revert unneeded change
* Pin to new google/cloud-sdk
* Remove wrongly commited binaries
* added new secret support
* updated the documentation and env settings
* updated after feedback
* added tests
* naming issue fixed
* renamed test to follow unittest standard
* updated after feedback
* the new test after renaming
* added the test to main
* updates after feedback
* added license agreement
* removed space
* updated the volume name to be generated
* secret_name as volume name and updated test
* updated the file structure
* fixed build
This makes the graph input references consistent with task output references.
This is a breaking change, but the graph components are not exposed in the documentation or samples yet.
* SDK - Refactoring - Split the K8sHelper class
One part was only used by container builder and provided higher-level API over K8s Client.
Another was used by the compiler and did not use the kubernetes library.
* Updated the license year.
* SDK - Python components - Fixed handling multiline decorators
* Switched to using dedent
* Added error checking
* Testing multiline decorator
* Test calling the component created from decorated function
Also fixed `helper_test_component_against_func_using_local_call`.
* SDK - Improve errors when ContainerOp.output is unavailable
ContainerOp.output is only available when there is exactly one output.
Right now, when there are multiple outputs it just holds `None` instead of a task output reference.
In this case, however, it's indistinguishable from just passing a None argument.
This PR gives a quick fix to make accessing the nonexistent `.output` a compile-time error.
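A hedged illustration of the behavior (the component below is hypothetical; the exact error raised by the SDK is not specified here):
```python
from typing import NamedTuple
from kfp.components import func_to_container_op

@func_to_container_op
def two_outputs() -> NamedTuple('Outputs', [('a', int), ('b', int)]):
    return (1, 2)

task = two_outputs()
a_ref = task.outputs['a']  # OK: outputs are always accessible by name
# task.output              # with multiple outputs this now fails at pipeline
#                          # construction time instead of silently being None
```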
* Fixed the implementation and added tests
* Trigger retests
* SDK - Containers - Added support for container image cache
This change makes `build_image_from_working_dir` fast when the working directory has not changed between invocations.
We cache pushed container images using specially-calculated context directory hash as the cache key.
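A hedged sketch of deriving a cache key from the build context directory (illustrative; the SDK's actual hashing of the context may differ):
```python
import hashlib
import os

def context_dir_digest(context_dir: str) -> str:
    # Hash relative file names and contents so the digest changes only when
    # the working directory's content changes.
    h = hashlib.sha256()
    for root, _, files in sorted(os.walk(context_dir)):
        for name in sorted(files):
            path = os.path.join(root, name)
            h.update(os.path.relpath(path, context_dir).encode())
            with open(path, 'rb') as f:
                h.update(f.read())
    return h.hexdigest()

# The digest can then serve as a cache key: if an image was already built and
# pushed for this digest, reuse it instead of rebuilding.
```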
* Moved the import to the top
* SDK - Tests - Test creating component from the real AutoML pipeline
Creating component from the AutoML retail_product_stockout_prediction pipeline.
* Ignoring flake8 error E821
* SDK - Compiler - Move volumes to templates
Argo v2.3.0+ supports per-template volume specs similar to Kubernetes. Prior to version 2.3.0, Argo only supported workflow-level volume specs.
We had several outstanding issues caused by the need to put all volumes in the same place.
There was also an issue with input parameter reference placeholders in volume specifications, which were placed outside the templates that declared those inputs.
This change fixes those issues.
* Removed dead code line
This part of the spec was unused, so this is not a breaking change.
Consolidating Kubernetes-related options under a single attribute: `TaskSpec.execution_options.kubernetes_options`.
`TaskSpec.k8s_container_options` -> `TaskSpec.execution_options.kubernetes_options.main_container`
`TaskSpec.k8s_pod_options.spec` -> `TaskSpec.execution_options.kubernetes_options.pod_spec`
Added the `TaskSpec.execution_options.retry_strategy.max_retries` attribute.
* SDK/Components - Creating graph components from python pipeline function
`create_graph_component_from_pipeline_func` converts a python pipeline function to a graph component object that can be saved, shared, composed or submitted for execution.
Example:
```python
producer_op = load_component(component_with_0_inputs_and_2_outputs)
processor_op = load_component(component_with_2_inputs_and_2_outputs)

def pipeline1(pipeline_param_1: int):
    producer_task = producer_op()
    processor_task = processor_op(pipeline_param_1, producer_task.outputs['Output 2'])

    return OrderedDict([
        ('Pipeline output 1', producer_task.outputs['Output 1']),
        ('Pipeline output 2', processor_task.outputs['Output 2']),
    ])

graph_component = create_graph_component_from_pipeline_func(pipeline1)
```
* Changed the signatures of exported functions
Non-public create_graph_component_spec_from_pipeline_func creates ComponentSpec
Public create_graph_component_from_pipeline_func creates component and writes it to file.
* Switched to using _extract_component_interface to analyze function signature
Stopped humanizing the input names for now. I think it's beneficial to extract the interface from the function signature the same way for both container and graph python components.
* Support outputs declared using pipeline function's return annotation
* Cleaned up the test
* Stop including the whole parent tasks in task output references
* By default, do not include task component specs in the graph component
Remove the component spec from component reference unless it will make the reference empty or unless explicitly asked by the user
* Exported the create_graph_component_from_pipeline_func function
* Fixed imports
* Updated the copyright year.
* SDK - Refactoring - Passing the parameters explicitly in python_op.
This helps avoid problems when new parameters are added.
* SDK - Components - Added package installation support to func_to_container_op
Example:
```python
op = func_to_container_op(my_func, packages_to_install=['pandas==0.24'])
```
* Make pip quieter
* Added the test_packages_to_install_feature test