* fix(sdk): compile ParallelFor in a deterministic manner
During compilataion ParallelFor components end up with randomized names,
which makes it very inconvenient to compare two versions of a pipeline.
This commit fixes this issue.
* fix(sdk): fix new parallel-for test cases
* add test for keyword-only arguments in pipeline func
* fix: kwargs-only argument for pipeline func
* test: kwargs generate same yaml as args
* remove whole metadata
* assert -> self.assertEqual
* programmatic example --> fixed example
* same name for both
Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
* SDK - Components - Fixed python components that use \n
The escape sequence was being replaced by the `echo` command.
Apparently, unlike in the `bash` shell, the `echo` command of the `sh` shell expands the escape sequences by default and does not support an option to turn it off. (For some reason the -n option works properly even though it should not).
Fixes https://github.com/kubeflow/pipelines/issues/4939
* Fixed the test data
* Fixed the deprecated container component builder
* Fixed the new compiler test case
* Added test
* add placeholder to spec
* add output_directory to pipeline
* respect uri placeholder in file outputs
* wip: add data passing rewriting logic to respect the uri semantics
* merge input_uri and paths when instantiating ContainerOp
* fix
* fix workflow rewriting
* Add topology rewriting
* add a test case, and various fixes
* make the test case more complex
* Fix the case when working with OpsGroup
* Fix test case
* fix resolving test
* fix redundant cmd lines
* fix redundant cmd lines
* resolve comments
* fix file outputs
* resolve comments
* copy file outputs instead of modifying inplace.
Currently were running the python code inline using `python -c <code>`.
This has two issues:
1) Python does not show source code line in exception stack traces
2) inspect.getsource does not work. This method is used in PyTorch JIT for example.
We solve these issues by writing the code into a file before executing it.
The disadvantage of the new approach is that it adds complexity, a filesystem write operation and also requires the `sh` executable to be present (we could replace it with python-based program if needed).
* feat(sdk): add ability to set retry policy
This fixes the second part of the issue described in #4333
The first part was addressed in #4392
* feat(sdk): validate retry policy name
* feat(sdk): simplify retry policy interface
* Compile IR proto in setup.py
* compile to IR
* Fix importer node logic and lint
* cleanup and lint
* merge, undo setup.py change
* cleanup and lint
* remove currently unused code
* format _component_bridge.py
* cleanup and format
* cleanup
* upgrade protobuf in test
* restructure and test
* address review comments
* fix bug
* avoid f-strings formatting
* address review comments
* address review comments
* limit the primitive types to only int, double, and string.
* Fix test for python3.5
* use instance_schema instead of schema_title
* add v2 to setup.py
* address review comments
* move the tests closer to the code
* add more tests
* cleanup and linting
* add more tests
* fix bug on input paramter connection
* linting
* restructure tests
* fix python3.5 test failure
* support outputs.parameters placeholder
* remove pipeline decorator from v2.dsl
ContainerOp has no concept of inputs, so it looses any information about them such as input names and in some cases even the passed argument values (which are just injected into the command line).
This commit fixes that issue by preserving the paramater arguments map and ultimately storing it in an Argo template annotation.
Fixes https://github.com/kubeflow/pipelines/issues/4556
* SDK - Compiler - Fixed the input argument mapping when using dsl.graph_component
Fixes https://github.com/kubeflow/pipelines/issues/3915
* Stopped relying on the argument order at all
This can make the compilation less fragile.
* SDK - Tests - Restored the ParallelFor compiler test data
Fixes https://github.com/kubeflow/pipelines/issues/4102
* Removed the pipeline-sdk-type annotations
* Fixed the test_artifact_passing_using_volume test data
* SDK - Compiler - Added support for volume-based data passing
Currently artifact passing is performed by Argo sidecar containers what download input data and upload output data to artifact repository (usually, S3-compatible blob storage like Minio).
The performance of this method is not optimal and it requires that pod disks have enough capacity to hold all artifact data.
This commit adds support for volume-based data passing.
This method involves using a single milti-write Kubernetes data volume to pass all intermediate data.
Parts of the volume are mounted to the input/output artifact directories, so when the user program reads and writes files, the files actually reside in the data volume.
This method improves the performance and reduces storage resource requirements.
The data volume must exist and support "READ_WRITE_MANY".
Limitations:
* All artifact file names must be the same (e.g. "data"). All auto-generated paths are already consistent. Avoid using any hard-coded paths.
* Passing constant values (text) as arguments for artifact inputs is not supported.
* The feature is experimental.
* Added data_passing_methods.KubernetesVolume
This class represents a configured volume-based artifact passing method.
* Added PipelineConf.data_passing_method
This property allows setting the method that will be used for intermediate data passing.
Added the compiler support for the new feature.
Example:
```python
from kfp.dsl import PipelineConf, data_passing_methods
from kubernetes.client.models import V1Volume, V1PersistentVolumeClaim
pipeline_conf = PipelineConf()
pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
volume=V1Volume(
name='data',
persistent_volume_claim=V1PersistentVolumeClaim('data-volume'),
),
path_prefix='artifact_data/',
)
```
* Added unit test
* Fixed bug in the unit test
Kubernetes does not validate the structures at all...
* Fixed bug in the result structure
* Fixed the test data
The class should be V1PersistentVolumeClaimVolumeSource, not V1PersistentVolumeClaimSpec.
* Fixed the test
Previously the default image was set to an old version of tensorflow image. That image is now outdated. It's also framework-specific and pretty big.
We're switching to the official python image which is small, official and framework-agnostic.
The users can easily switch to the old behavior by just specifying `base_image='tensorflow/tensorflow:1.13.2-py3'` during the component creation.
* SDK - Components - Stabilize JSON serialization by sorting keys
Otherwise serialization of the default values of the component/pipeline inputs is unstable on Python 3.5.
* Fixed the test data
* add OOB component dict and utility function
* add test
* add a transformer, which appends the component name label
* add transformer function, compiler and test
* move telemetry test
* fix none uri
* applies comments
* revert dependency on frozendict
* fixes some tests
* resolve comments
* SDK - Made outputs with original names available in ContainerOp.outputs
Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
Now ContainerOp supports any output names, so we're now using the original output names.
However to support legacy pipelines, we're also adding output references with pythonic names.
* Fixed the compiler test data
* Fixed the duplicate parameter outputs in the compiled workflow
* Fixed long line
* Stabilized the output naming conflict resolution
* Fix case of missing special outputs
* SDK - Prioritize lib2to3 when stripping type annotations
It's a standard python library (although not well supported) and it doe not leave training spaces.
* Fixed compiler test data
* SDK - Annotate pods with component_ref
This preserves the information about the digest of the component and the location from which the component was loaded.
* Fixed compiler tests
* Add kfp-container-builder sa
* Allow service account to be configurable
* Fix tests
* Fix test
* Use documentation for service account to introduce compatibility with different types of installation
* updated doc
* clean up
* Update container_builder_test.py
* Update _build_image_api.py
* Update kustomization.yaml
* Add executable permission for presubmit tests mkp.sh
* SDK - Reduce python component limitations - no import errors for custom type annotations
By default, create_component_from_func copies the source code of the function and creates a component using that source code. No global imports are captured. This is problematic for the function definition, since any annotation, that uses a type that needs to be imported, will cause error. There were some special provisions for
NamedTuple, InputPath and OutputPath, but even they were brittle (for example, "typing.NamedTuple" or "components.InputPath" annotations still caused failures at runtime).
This commit fixes the issue by stripping the type annotations from function declarations.
Fixes cases that were failing before:
```python
import typing
import collections
MyFuncOutputs = typing.NamedTuple('Outputs', [('sum', int), ('product', int)])
@create_component_from_func
def my_func(
param1: CustomType, # This caused failure previously
param2: collections.OrderedDict, # This caused failure previously
) -> MyFuncOutputs: # This caused failure previously
pass
```
* Fixed the compiler tests
* Fixed crashes on print function
Code `print(line, end="")` was causing error: "lib2to3.pgen2.parse.ParseError: bad input: type=22, value='=', context=('', (2, 15))"
* Using the strip_hints library to strip the annotations
* Updating test workflow yamls
* Workaround for bug in untokenize
* Switched to the new strip_string_to_string method
* Fixed typo.
Co-Authored-By: Jiaxiao Zheng <jxzheng@google.com>
Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>
* [Testing] Use gke 1.15.8 to mitigate workload identity flakiness
* Upgrade gcloud version
* Update image builder image too
* Turn on workload identity
* Update deploy-cluster.sh
* secret sample uses python3 instead
* Increase xgboost time limit
* Revert files with bad format
* Update component and pipelines to use gcloud 279.0.0
* Fix secret sample using python3
* Upgrade frontend integration test image
* Rebuild frontend integration test image
* SDK - Compiler - Fixed ParallelFor name clashes
The ParallelFor argument reference resolving was really broken.
The logic "worked" like this - of the name of the referenced output
contained the name of the loop collection source output, then it was
considered to be the reference to the loop item.
This broke lots of scenarios especially in cases where there were
multiple components with same output name (e.g. the default "Output"
output name). The logic also did not distinguish between references to
the loop collection item vs. references to the loop collection source
itself.
I've rewritten the argument resolving logic, to fix the issues.
* Argo cannot use {{item}} when withParams items are dicts
* Stabilize the loop template names
* Renamed the test case
* Replaced `_instance_to_dict(obj)` with `obj.to_dict()`
* Fixed the capitalization in _python_function_name_to_component_name
It now only changes the case of the first letter.
* Replaced the _extract_component_metadata function with _extract_component_interface
* Stopped adding newline to the component description.
* Handling None inputs and outputs
* Not including emply inputs and outputs in component spec
* Renamed the private attributes that the @pipeline decorator sets
* Changged _extract_pipeline_metadata to use _extract_component_interface
* Fixed issues based on feedback
* Script to set up workload identity for standalone deployment
* Migrate tests to run on standalone + workload identity
* Fix test script
* Switch to static GSAs for testing, because they have name length limit
* Add workload identity binding for argo
* Fix argo workload identity bindings
* Remove user-gcp-sa from tests
* Remove use_gcp_secret from xgboost sample
* Allow debugging tests locally
* Wait for policies to take effect
* Update deploy-pipeline-lite.sh
* Update deploy-pipeline-lite.sh
* [WIP] test gcloud auth list with test-runner sa
* Add namespace
* test again
* Use new image builder
* test again
* Remove debug code
* Remove usages of use_gcp_secret
* Fix unit test and tensorboard pod template
* Add debug code again to test
* Try waiting until workload identity bindings are ready
* Fix some other samples
* Fix parameterized tfx oss sample
* Add retry to image building
* Try fixing tfx oss sample
* Fix compiled tfx oss sample
* Update all google/cloud-sdk to latest
* Try fixing parameterized tfx oss sample again
* Also verify pipeline-runner ksa is working
* Fix parameterized_tfx_oss sample
* Update gcp-workload-identity-setup.sh
* Revert unneeded change
* Pin to new google/cloud-sdk
* Remove wrongly commited binaries