* SDK - Made outputs with original names available in ContainerOp.outputs
Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
Now that ContainerOp supports any output names, we use the original output names.
However, to support legacy pipelines, we also add output references with pythonic names.
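For illustration, a minimal sketch of the difference (the component text and the pythonic alias below are illustrative, not taken from this PR):
```python
from kfp import components, dsl

# A hypothetical component whose output name is not a valid Python identifier.
produce_op = components.load_component_from_text('''
name: Produce data
outputs:
- {name: Output data}
implementation:
  container:
    image: alpine
    command: [sh, -c, 'echo hello > "$0"', {outputPath: Output data}]
''')

@dsl.pipeline(name='output-names-demo')
def my_pipeline():
    task = produce_op()
    original = task.outputs['Output data']  # original name, now available
    legacy = task.outputs['output_data']    # pythonic alias, kept for legacy pipelines
```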
* Fixed the compiler test data
* Fixed the duplicate parameter outputs in the compiled workflow
* Fixed long line
* Stabilized the output naming conflict resolution
* Fix case of missing special outputs
* SDK - Components - Calculate component hash digest
The digest is calculated when loading the component from a URL, file or text.
Slightly refactored component loading - streams are no longer used, only bytes.
TODO: Calculate the digest if missing
TODO: Report possible digest conflicts
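Conceptually, the digest is just a hash of the raw component bytes; a simplified sketch (the actual implementation details may differ):
```python
import hashlib

def calculate_component_digest(component_bytes: bytes) -> str:
    # Loading from a URL, file or text all reduce to bytes first,
    # which is why streams are no longer needed.
    return hashlib.sha256(component_bytes).hexdigest()
```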
* Updated the test graph component
* Using the actual digest in the test
* SDK - Prioritize lib2to3 when stripping type annotations
It's a standard Python library (although not well supported) and it does not leave trailing spaces.
* Fixed compiler test data
* SDK - Annotate pods with component_ref
This preserves the information about the digest of the component and the location from which the component was loaded.
* Fixed compiler tests
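For illustration, one way to inspect the effect on a compiled workflow; the annotation key used below is an assumption, not taken from this PR:
```python
import json
import yaml  # assumption: PyYAML is available

with open('pipeline.yaml') as f:
    workflow = yaml.safe_load(f)

for template in workflow['spec']['templates']:
    annotations = template.get('metadata', {}).get('annotations', {})
    # Assumed annotation key; it should carry the component digest and source location.
    component_ref = annotations.get('pipelines.kubeflow.org/component_ref')
    if component_ref:
        print(template['name'], json.loads(component_ref).get('digest'))
```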
* Add kfp-container-builder sa
* Allow service account to be configurable
* Fix tests
* Fix test
* Use documentation for service account to introduce compatibility with different types of installation
* updated doc
* clean up
* Update container_builder_test.py
* Update _build_image_api.py
* Update kustomization.yaml
* Add executable permission for presubmit tests mkp.sh
Added test_fail_on_handling_list_arguments_containing_python_objects
Added test_handling_list_arguments_containing_serializable_python_objects
Moved test_handling_list_arguments_containing_pipelineparam to component_bridge_tests
* SDK - Tests - Testing command-line resolving explicitly
After the recent small refactoring of the task resolving flow in the component library, some tests were left unupdated, with compatibility shims added to make them pass.
This PR updates the remaining tests and removes the shims.
This mostly involves explicitly using `_resolve_command_line_and_paths`.
Some tests that validate the behavior of the dsl bridge were moved to `component_bridge_tests.py`.
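A rough sketch of the explicit resolution in a test (the exact keyword arguments and result fields are assumptions based on the description above):
```python
from kfp.components import load_component_from_text
from kfp.components._components import _resolve_command_line_and_paths

echo_op = load_component_from_text('''
name: Echo
inputs:
- {name: msg}
implementation:
  container:
    image: alpine
    command: [echo, {inputValue: msg}]
''')

resolved = _resolve_command_line_and_paths(
    component_spec=echo_op.component_spec,
    arguments={'msg': 'hello'},
)
full_command = list(resolved.command) + list(resolved.args)  # e.g. ['echo', 'hello']
```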
* Indented the component texts
* SDK/DSL: Enable the deletion of a resource via ResourceOp method
* Add the method delete() to ResourceOps
* Extend ResourceOp & VolumeOp tests
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
* Fix ValueError not being raised
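A minimal sketch of how the new delete() method can be used (the resource body is illustrative, and it is assumed that delete() returns a new op):
```python
from kfp import dsl

@dsl.pipeline(name='resourceop-delete-demo')
def my_pipeline():
    # Create an arbitrary Kubernetes resource from the pipeline...
    create_task = dsl.ResourceOp(
        name='create-config',
        k8s_resource={
            'apiVersion': 'v1',
            'kind': 'ConfigMap',
            'metadata': {'generateName': 'my-config-'},
            'data': {'key': 'value'},
        },
        action='create',
    )
    # ...and produce a step that deletes it again using the new method.
    delete_task = create_task.delete()
```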
* SDK - Reduce python component limitations - no import errors for custom type annotations
By default, create_component_from_func copies the source code of the function and creates a component using that source code. No global imports are captured. This is problematic for the function definition, since any annotation that uses a type that needs to be imported will cause an error. There were some special provisions for
NamedTuple, InputPath and OutputPath, but even those were brittle (for example, `typing.NamedTuple` or `components.InputPath` annotations still caused failures at runtime).
This commit fixes the issue by stripping the type annotations from function declarations.
Fixes cases that were failing before:
```python
import typing
import collections

from kfp.components import create_component_from_func

MyFuncOutputs = typing.NamedTuple('Outputs', [('sum', int), ('product', int)])

@create_component_from_func
def my_func(
    param1: CustomType,               # This caused failure previously
    param2: collections.OrderedDict,  # This caused failure previously
) -> MyFuncOutputs:                   # This caused failure previously
    pass
```
* Fixed the compiler tests
* Fixed crashes on print function
The code `print(line, end="")` was causing the error: "lib2to3.pgen2.parse.ParseError: bad input: type=22, value='=', context=('', (2, 15))"
* Using the strip_hints library to strip the annotations
* Updating test workflow yamls
* Workaround for bug in untokenize
* Switched to the new strip_string_to_string method
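For reference, a small sketch of stripping annotations with the strip_hints library (arguments beyond the source string are omitted, since they may vary):
```python
from strip_hints import strip_string_to_string

source = '''
def my_func(param1: "CustomType", param2: int = 0) -> str:
    return str(param2)
'''

# Strip the annotations so the generated component code does not need the
# imports that the annotations would otherwise require.
stripped_source = strip_string_to_string(source)
print(stripped_source)
```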
* Fixed typo.
Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>
* [Testing] Use gke 1.15.8 to mitigate workload identity flakiness
* Upgrade gcloud version
* Update image builder image too
* Turn on workload identity
* Update deploy-cluster.sh
* secret sample uses python3 instead
* Increase xgboost time limit
* Revert files with bad format
* Update component and pipelines to use gcloud 279.0.0
* Fix secret sample using python3
* Upgrade frontend integration test image
* Rebuild frontend integration test image
* SDK - Compiler - Fixed ParallelFor name clashes
The ParallelFor argument reference resolving was really broken.
The logic "worked" like this - of the name of the referenced output
contained the name of the loop collection source output, then it was
considered to be the reference to the loop item.
This broke lots of scenarios especially in cases where there were
multiple components with same output name (e.g. the default "Output"
output name). The logic also did not distinguish between references to
the loop collection item vs. references to the loop collection source
itself.
I've rewritten the argument resolving logic, to fix the issues.
* Argo cannot use {{item}} when withParams items are dicts
* Stabilize the loop template names
* Renamed the test case
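A small sketch of the kind of pipeline that used to clash: both upstream ops use the default "Output" output name (images and commands are illustrative):
```python
from kfp import dsl

@dsl.pipeline(name='parallelfor-name-clash-demo')
def my_pipeline():
    # Produces the loop collection; its single output gets the default name "Output".
    produce_list = dsl.ContainerOp(
        name='produce-list',
        image='alpine',
        command=['sh', '-c', 'echo \'["a", "b", "c"]\' > /tmp/out.txt'],
        file_outputs={'Output': '/tmp/out.txt'},
    )
    # Another op with the same default output name, referenced inside the loop.
    produce_value = dsl.ContainerOp(
        name='produce-value',
        image='alpine',
        command=['sh', '-c', 'echo 42 > /tmp/out.txt'],
        file_outputs={'Output': '/tmp/out.txt'},
    )
    with dsl.ParallelFor(produce_list.output) as item:
        # Before the fix, references like produce_value.output could be
        # mistaken for references to the loop item.
        dsl.ContainerOp(
            name='consume',
            image='alpine',
            command=['echo', item, produce_value.output],
        )
```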
* SDK - Components refactoring
This change is a pure refactoring of the implementation of component task creation.
For pipelines compiled using the DSL compiler (the compile() function or the command-line program) nothing should change.
The main goal of the refactoring is to change the way the component instantiation can be customized.
Previously, the flow was like this:
`ComponentSpec` + arguments --> `TaskSpec` --resolving+transform--> `ContainerOp`
This PR changes it to a more direct path:
`ComponentSpec` + arguments --constructor--> `ContainerOp`
or
`ComponentSpec` + arguments --constructor--> `TaskSpec`
or
`ComponentSpec` + arguments --constructor--> `SomeCustomTask`
The original approach where the flow always passes through `TaskSpec` had some issues since TaskSpec only accepts string arguments (and two
other reference classes). This made it harder to handle custom types of arguments like PipelineParam or Channel.
Low-level refactoring changes:
Resolving of command-line argument placeholders has been extracted into a function usable by different task constructors.
Changed `_components._created_task_transformation_handler` to `_components._container_task_constructor`. Previously, the handler was receiving a `TaskSpec` instance. Now it receives `ComponentSpec` + arguments [+ `ComponentReference`].
Moved the `ContainerOp` construction handler setup to the `kfp.dsl.Pipeline` context class as planned.
Extracted `TaskSpec` creation to `_components._create_task_spec_from_component_and_arguments`.
Refactored `_dsl_bridge.create_container_op_from_task` to `_components._resolve_command_line_and_paths` which returns `_ResolvedCommandLineAndPaths`.
Renamed `_dsl_bridge._create_container_op_from_resolved_task` to `_dsl_bridge._create_container_op_from_component_and_arguments`.
The signature of `_components._resolve_graph_task` was changed and it now returns `_ResolvedGraphTask` instead of modified `TaskSpec`.
Some of the component tests still expect ContainerOp and its attributes.
These tests will be changed later.
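A rough sketch of the new extension point; the hook name and signature follow the description above, but the exact details should be treated as assumptions:
```python
from kfp.components import _components

class SomeCustomTask:
    def __init__(self, component_spec, arguments, component_ref=None):
        self.component_spec = component_spec
        self.arguments = arguments
        self.component_ref = component_ref

def _my_container_task_constructor(component_spec, arguments, component_ref=None):
    # Receives ComponentSpec + arguments (+ ComponentReference) directly,
    # instead of a pre-built TaskSpec, so non-string arguments can be handled.
    return SomeCustomTask(component_spec, arguments, component_ref)

# kfp.dsl.Pipeline installs the ContainerOp constructor through the same hook.
_components._container_task_constructor = _my_container_task_constructor
```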
* Adapted the _python_op tests
* Fixed linter failure
I do not want to add any top-level kfp imports in this file to prevent circular references.
* Added docstrings
* Fixed the return type forward reference
* Replaced `_instance_to_dict(obj)` with `obj.to_dict()`
* Fixed the capitalization in _python_function_name_to_component_name
It now only changes the case of the first letter.
* Replaced the _extract_component_metadata function with _extract_component_interface
* Stopped adding a newline to the component description.
* Handling None inputs and outputs
* Not including empty inputs and outputs in component spec
* Renamed the private attributes that the @pipeline decorator sets
* Changed _extract_pipeline_metadata to use _extract_component_interface
* Fixed issues based on feedback
* SDK/DSL: Fix PipelineVolume name length
Volume name must be no more than 63 characters
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
* Change which part of the hash value we make use of
Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
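A simplified sketch of staying within the 63-character limit (the actual PipelineVolume naming scheme may differ):
```python
import hashlib

def make_volume_name(prefix: str, content: str, max_len: int = 63) -> str:
    # Kubernetes object names are limited to 63 characters, so append only a
    # short slice of the hash and truncate the prefix to fit.
    digest = hashlib.sha256(content.encode()).hexdigest()[:10]
    return (prefix[:max_len - len(digest) - 1] + '-' + digest).lower()

print(make_volume_name('my-very-long-pipeline-volume-name-' * 3, 'some spec'))
```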
* Script to set up workload identity for standalone deployment
* Migrate tests to run on standalone + workload identity
* Fix test script
* Switch to static GSAs for testing, because they have name length limit
* Add workload identity binding for argo
* Fix argo workload identity bindings
* Remove user-gcp-sa from tests
* Remove use_gcp_secret from xgboost sample
* Allow debugging tests locally
* Wait for policies to take effect
* Update deploy-pipeline-lite.sh
* Update deploy-pipeline-lite.sh
* [WIP] test gcloud auth list with test-runner sa
* Add namespace
* test again
* Use new image builder
* test again
* Remove debug code
* Remove usages of use_gcp_secret
* Fix unit test and tensorboard pod template
* Add debug code again to test
* Try waiting until workload identity bindings are ready
* Fix some other samples
* Fix parameterized tfx oss sample
* Add retry to image building
* Try fixing tfx oss sample
* Fix compiled tfx oss sample
* Update all google/cloud-sdk to latest
* Try fixing parameterized tfx oss sample again
* Also verify pipeline-runner ksa is working
* Fix parameterized_tfx_oss sample
* Update gcp-workload-identity-setup.sh
* Revert unneeded change
* Pin to new google/cloud-sdk
* Remove wrongly commited binaries
* added new secret support
* updated the documentation and env settings
* updated after feedback
* added tests
* naming issue fixed
* renamed test to follow unittest standard
* updated after feedback
* the new test after renaming
* added the test to main
* updates after feedback
* added license agreement
* removed space
* updated the volume name to be generated
* secret_name as volume name and updated test
* updated the file structure
* fixed build
This makes the graph input references consistent with task output references.
This is a breaking change, but the graph components are not exposed in the documentation or samples yet.
* SDK - Refactoring - Split the K8sHelper class
One part was only used by the container builder and provided a higher-level API over the K8s client.
Another was used by the compiler and did not use the kubernetes library.
* Updated the license year.
* SDK - Python components - Fixed handling multiline decorators
* Switched to using dedent
* Added error checking
* Testing multiline decorator
* Test calling the component created from decorated function
Also fixed `helper_test_component_against_func_using_local_call`.
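For reference, a decorator spanning multiple lines, which previously broke the source handling (a sketch; parameters are illustrative):
```python
from kfp.components import func_to_container_op

@func_to_container_op(
    base_image='python:3.7',
    output_component_file='add.component.yaml',
)
def add(a: float, b: float) -> float:
    '''Returns the sum of two arguments.'''
    return a + b
```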
* SDK - Improve errors when ContainerOp.output is unavailable
ContainerOp.output is only available when there is only one output.
Right now, when there are multiple outputs, it just holds `None` instead of a task output reference.
In that case, however, it's indistinguishable from just passing a None argument.
This PR gives a quick fix to make accessing the nonexistent `.output` a compile-time error.
* Fixed the implementation and added tests
* Trigger retests
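A sketch of the situation (component text is illustrative): with multiple outputs, `.outputs[...]` should be used instead of `.output`:
```python
from kfp import components, dsl

two_outputs_op = components.load_component_from_text('''
name: Two outputs
outputs:
- {name: out1}
- {name: out2}
implementation:
  container:
    image: alpine
    command: [sh, -c, 'echo a > "$0"; echo b > "$1"',
              {outputPath: out1}, {outputPath: out2}]
''')

@dsl.pipeline(name='output-error-demo')
def my_pipeline():
    task = two_outputs_op()
    value = task.outputs['out1']   # OK: reference a specific output by name
    # value = task.output          # After this PR: an error at compile time
```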
* SDK - Containers - Added support for container image cache
This change makes `build_image_from_working_dir` fast when the working directory has not changed between invocations.
We cache pushed container images, using a specially-calculated context directory hash as the cache key.
* Moved the import to the top
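A simplified sketch of hashing a context directory to get a cache key (the real key calculation may differ):
```python
import hashlib
import os

def context_dir_digest(context_dir: str) -> str:
    # Hash file paths and contents so the digest changes whenever the
    # working directory changes between invocations.
    digest = hashlib.sha256()
    for root, _, files in sorted(os.walk(context_dir)):
        for file_name in sorted(files):
            path = os.path.join(root, file_name)
            digest.update(os.path.relpath(path, context_dir).encode())
            with open(path, 'rb') as f:
                digest.update(f.read())
    return digest.hexdigest()
```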
* SDK - Tests - Test creating component from the real AutoML pipeline
Creating component from the AutoML retail_product_stockout_prediction pipeline.
* Ignoring flake8 error F821