* Use pipeline_spec proto from kfp-pipeline-spec package
* update imports
* use pre-released kfp-pipeline-spec (temporarily)
* Revert "use pre-released kfp-pipeline-spec (temporarily)"
This reverts commit 77f2e9a39c.
* test_requires
* version
* make kfp a namespace package
* move to the top
* version rc0
* Fix bug where we missed injecting importer node
* moved files
* address review comments
* Add InputUriPlaceholder and OutputUriPlaceholder
* support uri placeholder in v2
* lint
* test
* Preserve a test case with inputPath and outputPath usage.
* fix ut
* fix import and setup
* address comments
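The URI placeholders added above let a component reference the runtime URI of an artifact instead of a local path. A hedged sketch of what such a component might look like, assuming the serialized placeholder keys are `inputUri` and `outputUri` (as the class names suggest); the component name, image, and command are illustrative:
```python
from kfp import components

# Illustrative component text, not taken from the tests in this change.
component_text = '''
name: Consume and produce by URI
inputs:
- {name: input_data}
outputs:
- {name: output_data}
implementation:
  container:
    image: python:3.7
    command:
    - python3
    - -c
    - |
      import sys
      print('reading from', sys.argv[1], 'and writing to', sys.argv[2])
    - {inputUri: input_data}
    - {outputUri: output_data}
'''

consume_and_produce_op = components.load_component_from_text(component_text)
```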
* skeleton of code
* commit resource spec in IR proto
* add resource setter
* add accelerator setters
* fix unit conversion
* fix attribute proxy
* add and fix unittests
* add e2e test
* clean up
* clean up
* clean up
* clean up
* bypass subclass overriding
* clean up
* clean up
* clean up
* resolve comments
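The resource and accelerator setters recorded in the IR by the change above correspond to the familiar `kfp.dsl` task methods. A minimal sketch, assuming the standard setters; the component, resource limits, and accelerator type are illustrative:
```python
from kfp import components, dsl

# Illustrative component; the image and command are placeholders.
train_op = components.load_component_from_text('''
name: Train
implementation:
  container:
    image: python:3.7
    command: [echo, training]
''')

@dsl.pipeline(name='resource-spec-sample')
def resource_spec_pipeline():
    task = train_op()
    task.set_cpu_limit('4')
    task.set_memory_limit('15G')
    # Accelerator settings that the compiler now carries into the IR proto.
    task.add_node_selector_constraint(
        'cloud.google.com/gke-accelerator', 'nvidia-tesla-k80')
    task.set_gpu_limit(1)
```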
Reverting most of #2334, which inadvertently broke the mlpipeline-ui-metadata and mlpipeline-metrics artifacts by causing their names to be mangled.
KFP's DSL compiler prepends template names to output names to ensure global uniqueness of *input* names (DSL's ContainerOp has no concept of inputs, so the inputs, including their names, are generated during compilation). But prepending template names to the output names stops the backend from recognizing the mlpipeline-ui-metadata and mlpipeline-metrics artifacts.
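For context, the backend recognizes these artifacts solely by their exact output names. An illustrative ContainerOp (not taken from this change) that emits the UI-metadata artifact:
```python
from kfp import dsl

@dsl.pipeline(name='ui-metadata-sample')
def ui_metadata_pipeline():
    # The backend looks for the exact output name 'mlpipeline-ui-metadata',
    # which is why mangling the name broke the feature.
    dsl.ContainerOp(
        name='produce-ui-metadata',
        image='python:3.7',
        command=['sh', '-c',
                 'echo \'{"outputs": []}\' > /tmp/mlpipeline-ui-metadata.json'],
        output_artifact_paths={
            'mlpipeline-ui-metadata': '/tmp/mlpipeline-ui-metadata.json',
        },
    )
```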
A recent PR added changes that architecturally belong to a different module (the component bridge). This introduced unintended dependencies and coupling between the modules. This PR restores the module separation; it also makes the code simpler.
* Compile IR proto in setup.py
* compile to IR
* Fix importer node logic and lint
* cleanup and lint
* merge, undo setup.py change
* cleanup and lint
* remove currently unused code
* format _component_bridge.py
* cleanup and format
* cleanup
* upgrade protobuf in test
* restructure and test
* address review comments
* fix bug
* avoid f-string formatting
* address review comments
* address review comments
* limit the primitive types to only int, double, and string.
* Fix test for python3.5
* use instance_schema instead of schema_title
* add v2 to setup.py
* address review comments
* move the tests closer to the code
* add more tests
* cleanup and linting
* add more tests
* fix bug on input parameter connection
* linting
* restructure tests
* fix python3.5 test failure
* support outputs.parameters placeholder
* remove pipeline decorator from v2.dsl
Previously the process that was used to resolve a child task of a graph component was convoluted:
* Generate a dynamic task factory function for the child task component
* Convert input argument names from original to pythonic names
* Call the generated dynamic factory function using the python arguments to get back a task object
* Convert the task object outputs from pythonic back to original names (recently removed)
This PR significantly simplifies this process to just:
* Directly construct a task object based on the task component and the original arguments
ContainerOp has no concept of inputs, so it loses any information about them, such as input names and, in some cases, even the passed argument values (which are just injected into the command line).
This commit fixes that issue by preserving the parameter arguments map and ultimately storing it in an Argo template annotation.
Fixes https://github.com/kubeflow/pipelines/issues/4556
* SDK - Components - Added Bool as a known type name
Some components are already using this type name and are starting to fail due to stricter type checking during constant argument serialization.
* Fixed syntax error
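A hedged sketch of a component that relies on the newly recognized Bool type name; the component and pipeline are illustrative and only show a constant boolean argument flowing into a Bool-typed input:
```python
from kfp import components, dsl

flag_op = components.load_component_from_text('''
name: Flagged step
inputs:
- {name: verbose, type: Bool}
implementation:
  container:
    image: python:3.7
    command: [echo, {inputValue: verbose}]
''')

@dsl.pipeline(name='bool-flag-sample')
def bool_flag_pipeline():
    # With Bool as a known type name, the constant True serializes cleanly.
    flag_op(verbose=True)
```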
* add tests for pythonic and non-pythonic component outputs
* fix: graph for non-pythonic container output names
Loading a container component from component.yaml creates both
pythonic and original output names. The graph component iterated over
all outputs, applying the pythonic-to-original conversion to each. If
some of the names are not identical to their pythonic versions, the
lookup table raised a KeyError.
This commit fixes the problem by using a default value for the lookup.
* remove depythonification of outputs - not needed anymore
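A minimal sketch of the lookup fix described above, with illustrative names:
```python
# Illustrative mapping from pythonic output names to original names.
pythonic_to_original = {'output_1': 'Output 1'}

def to_original_name(pythonic_name: str) -> str:
    # Previously a plain indexing operation raised KeyError for outputs whose
    # names are already non-pythonic; now the name itself is the default.
    return pythonic_to_original.get(pythonic_name, pythonic_name)
```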
* branch pipeline IR proto under sdk
* Update ImporterSpec uri type
* proto generated file
* remove the copy of pipeline_spec.proto from sdk
* move proto generated file to under kfp/v2
* add readme
* regenerate proto and add copyright
In some cases, an extra header is needed to handle the API calls.
Directly add `header_name` and `header_value` props in
`.config/kfp/context.json` and pass the name/value pair
to APIClient.
Here is one of the use cases:
if the service is protected by Istio RBAC and needs a JWT header
for authentication, you can specify the JWT id-token in
.config/kfp/context.json with these two new props.
The id-token is then carried in the specified header name,
so the API call can be properly authenticated and checked
to confirm that the user has permission to access the service.
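A hedged sketch of setting the two new props; the header name and token value are placeholders, and the file location follows the description above:
```python
import json
import pathlib

context_path = pathlib.Path.home() / '.config' / 'kfp' / 'context.json'
context_path.parent.mkdir(parents=True, exist_ok=True)

# Merge the new props into any existing context file.
context = json.loads(context_path.read_text()) if context_path.exists() else {}
context.update({
    'header_name': 'Authorization',           # illustrative header name
    'header_value': 'Bearer <jwt-id-token>',  # illustrative JWT id-token
})
context_path.write_text(json.dumps(context, indent=2))
```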
* Prepare the SDK docs environment so it's easier to understand how to build the docs locally and keep them consistent with ReadTheDocs.
* Clean up docstrings for kfp.Client
* Add in updates to the docs for compiler and components
* Update components area to add in code references and make formatting a little more consistent.
* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks
* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks
* Remove unused kfp.notebook package links
* Clean up a few more errant references
* Clean up the DSL docs some more
* Update SDK docs for KFP extensions to follow Sphinx guidelines
* Clean up formatting of docstrings after Ark-Kun's comments
* SDK - Components - Replaced Kubernetes options with generic launcher options
This reduces the schema size and makes the task launcher options more flexible.
* Removed the launcherOptions for now
* SDK - Added warning when not using components
We have long advised our users to create reusable components.
Creating reusable components is as easy as creating ContainerOp instances, but the components are shareable, portable, and easier to support going forward.
* Disable warning for TFX
* Fixed the warning disabling logic
* Added tests
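For reference, the reusable-component pattern the warning points users toward can be as simple as this hedged sketch (the function, image, and factory call are illustrative):
```python
from kfp.components import create_component_from_func

def add(a: float, b: float) -> float:
    """A reusable, shareable component instead of a hand-built ContainerOp."""
    return a + b

# The factory yields a component that can be exported as component.yaml
# and instantiated inside pipelines like any other op.
add_op = create_component_from_func(add, base_image='python:3.7')
```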
* Update _client.py
* Allow for passing name instead of only pipeline_id
* fixed old rebase issue
* protobuf fix
* raise error and remove f-string
* moved file
* updated python proto packages versions
* Update sdk/python/kfp/_client.py
* restructured and added to build
* change the structure of imports
* Updated the files
* further updates of client and filter.proto
* alternative
* clean up
* clean up
* remove helperfiles
* further clean up
* clean up for either name or id
* update doc strings
* remove page_size
* Update sdk/python/kfp/_client.py
Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>
* Update sdk/python/kfp/cli/pipeline.py
Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>
* updated to classical string formatting
Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>
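A hedged sketch of the resulting name-based pipeline lookup, assuming the kfp.Client methods behave as described above; 'my-pipeline' is a placeholder name:
```python
import kfp

client = kfp.Client()

# Look the pipeline up by name instead of requiring the pipeline_id directly.
pipeline_id = client.get_pipeline_id(name='my-pipeline')
pipeline = client.get_pipeline(pipeline_id)
print(pipeline.id, pipeline.name)
```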
* SDK - Compiler - Fixed the input argument mapping when using dsl.graph_component
Fixes https://github.com/kubeflow/pipelines/issues/3915
* Stopped relying on the argument order at all
This can make the compilation less fragile.
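For context, dsl.graph_component is typically used for recursive sub-graphs like the sketch below; the coin-flip component and pipeline are illustrative, and the fix ensures the recursive call's arguments are mapped to the graph component's inputs by name rather than by position:
```python
from kfp import components, dsl

# Illustrative coin-flip component; the image and command are placeholders.
flip_coin_op = components.load_component_from_text('''
name: Flip coin
outputs:
- {name: result, type: String}
implementation:
  container:
    image: python:3.7
    command:
    - sh
    - -ec
    - |
      mkdir -p "$(dirname "$0")"
      python3 -c "import random; print(random.choice(['heads', 'tails']))" > "$0"
    - {outputPath: result}
''')

@dsl.graph_component
def flip_until_heads(previous_result):
    # With the fix, this argument is matched to the input by name.
    with dsl.Condition(previous_result == 'tails'):
        flip = flip_coin_op()
        flip_until_heads(flip.outputs['result'])

@dsl.pipeline(name='flip-until-heads')
def flip_pipeline():
    first = flip_coin_op()
    flip_until_heads(first.outputs['result'])
```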
* SDK - Compiler - Added support for volume-based data passing
Currently artifact passing is performed by Argo sidecar containers that download input data and upload output data to the artifact repository (usually an S3-compatible blob storage like Minio).
The performance of this method is not optimal and it requires that pod disks have enough capacity to hold all artifact data.
This commit adds support for volume-based data passing.
This method involves using a single multi-write Kubernetes data volume to pass all intermediate data.
Parts of the volume are mounted to the input/output artifact directories, so when the user program reads and writes files, the files actually reside in the data volume.
This method improves the performance and reduces storage resource requirements.
The data volume must already exist and support "ReadWriteMany".
Limitations:
* All artifact file names must be the same (e.g. "data"). All auto-generated paths are already consistent. Avoid using any hard-coded paths.
* Passing constant values (text) as arguments for artifact inputs is not supported.
* The feature is experimental.
* Added data_passing_methods.KubernetesVolume
This class represents a configured volume-based artifact passing method.
* Added PipelineConf.data_passing_method
This property allows setting the method that will be used for intermediate data passing.
Added the compiler support for the new feature.
Example:
```python
from kfp.dsl import PipelineConf, data_passing_methods
from kubernetes.client.models import V1Volume, V1PersistentVolumeClaimVolumeSource

pipeline_conf = PipelineConf()
pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(
            claim_name='data-volume'),
    ),
    path_prefix='artifact_data/',
)
```
* Added unit test
* Fixed bug in the unit test
Kubernetes does not validate the structures at all...
* Fixed bug in the result structure
* Fixed the test data
The class should be V1PersistentVolumeClaimVolumeSource, not V1PersistentVolumeClaimSpec.
* Fixed the test
Previously the default image was set to an old version of the TensorFlow image. That image is now outdated, framework-specific, and pretty big.
We're switching to the official Python image, which is small and framework-agnostic.
Users can easily switch back to the old behavior by specifying `base_image='tensorflow/tensorflow:1.13.2-py3'` during component creation.
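A hedged sketch of opting back into the old default; the component function is illustrative:
```python
from kfp.components import func_to_container_op

def train() -> str:
    return 'done'

# Pin the old TensorFlow image explicitly instead of relying on the default.
train_op = func_to_container_op(
    train, base_image='tensorflow/tensorflow:1.13.2-py3')
```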
Removed path resolving in two tests.
The `.resolve()` call should be harmless, but I've seen these tests fail in some unusually strict hermetic systems: the test code file path resolved to a tree different from the file's parent, so the tests could not find the test data.
* Fix #3906 - check that the ops to be transformed are ContainerOps
* Update docstring for add_op_transformer to clarify that not only ContainerOps will be transformed.
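A hedged sketch of an op transformer that guards on the op type, in line with the clarified docstring; the label and transformer are illustrative:
```python
from kfp import dsl

def add_team_label(op):
    # Transformers receive every op in the pipeline, not only ContainerOp
    # instances, so guard before applying ContainerOp-specific behavior.
    if isinstance(op, dsl.ContainerOp):
        op.add_pod_label('team', 'ml-platform')
    return op

pipeline_conf = dsl.PipelineConf()
pipeline_conf.add_op_transformer(add_team_label)
```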
* SDK - Compiler - Using properly serialized pipeline parameter defaults
Fixes https://github.com/kubeflow/pipelines/issues/3806
* Sort the keys so that the serialized defaults are stable in python 3.5
* SDK - Components - Stabilize JSON serialization by sorting keys
Otherwise serialization of the default values of the component/pipeline inputs is unstable on Python 3.5.
* Fixed the test data
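A minimal sketch of the stabilization described above: serializing default values with sorted keys so the output is deterministic regardless of dict ordering:
```python
import json

# Without sort_keys the serialized text could differ between runs on
# Python 3.5, where dict ordering is not guaranteed.
default_value = {'learning_rate': 0.1, 'batch_size': 32}
serialized_default = json.dumps(default_value, sort_keys=True)
print(serialized_default)  # {"batch_size": 32, "learning_rate": 0.1}
```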