* SDK - Made outputs with original names available in ContainerOp.outputs
Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
ContainerOp now supports any output names, so we pass the original output names through.
However, to support legacy pipelines, we also add output references with pythonic names.
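A minimal sketch of the new behavior (the component, image, and output names here are hypothetical): the original, non-pythonic name is used directly, while a sanitized alias remains available for legacy code.
```python
import kfp.compiler
from kfp import dsl

def my_pipeline():
    # Hypothetical producer with a non-pythonic output name.
    producer = dsl.ContainerOp(
        name='producer',
        image='alpine',
        command=['sh', '-c', 'echo hello > /tmp/out.txt'],
        file_outputs={'Output data': '/tmp/out.txt'},
    )
    # The original output name is now usable as-is...
    dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['echo', producer.outputs['Output data']],
    )
    # ...while a sanitized alias (e.g. producer.outputs['output_data']) is
    # still added for legacy pipelines.

kfp.compiler.Compiler().compile(my_pipeline, 'my_pipeline.yaml')
```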
* Fixed the compiler test data
* Fixed the duplicate parameter outputs in the compiled workflow
* Fixed long line
* Stabilized the output naming conflict resolution
* Fixed the case of missing special outputs
* SDK - Annotate pods with component_ref
This preserves the information about the digest of the component and the location from which the component was loaded.
* Fixed compiler tests
* SDK - Compiler - Fixed ParallelFor name clashes
The ParallelFor argument reference resolution was quite broken.
The logic "worked" like this: if the name of the referenced output
contained the name of the loop collection source output, then it was
considered to be a reference to the loop item.
This broke many scenarios, especially when multiple components had the
same output name (e.g. the default "Output" output name). The logic
also did not distinguish between references to the loop collection item
and references to the loop collection source itself.
I've rewritten the argument resolution logic to fix these issues.
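A hedged sketch of the kind of clash the old substring-based matching caused (all components here are hypothetical; the default "Output" name is the collision point): a task output that merely shares a name with the loop collection source could be misread as a loop-item reference.
```python
from kfp import dsl

def my_pipeline():
    # Task whose default 'Output' holds the loop collection, e.g. '[1, 2, 3]'.
    produce_list = dsl.ContainerOp(
        name='produce-list',
        image='alpine',
        command=['sh', '-c', 'echo "[1, 2, 3]" > /tmp/out'],
        file_outputs={'Output': '/tmp/out'},
    )
    # A second task that also uses the default 'Output' name.
    produce_value = dsl.ContainerOp(
        name='produce-value',
        image='alpine',
        command=['sh', '-c', 'echo 42 > /tmp/out'],
        file_outputs={'Output': '/tmp/out'},
    )
    with dsl.ParallelFor(produce_list.output) as item:
        # 'item' is a loop-item reference; produce_value.output is an ordinary
        # task-output reference. The old name-based matching could conflate them.
        dsl.ContainerOp(
            name='consume',
            image='alpine',
            command=['echo', item, produce_value.output],
        )
```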
* Argo cannot use {{item}} when withParams items are dicts
* Stabilize the loop template names
* Renamed the test case
* SDK - Refactoring - Split the K8sHelper class
One part was only used by the container builder and provided a higher-level API over the Kubernetes client.
The other part was used by the compiler and did not use the kubernetes library.
* Updated the license year.
* SDK - Improve errors when ContainerOp.output is unavailable
ContainerOp.output is only available when there is only one output.
Right now, when there are multiple outputs it just holds `None` instead of a task output reference.
In that case, however, it's indistinguishable from explicitly passing a `None` argument.
This PR gives a quick fix that makes accessing the nonexistent `.output` a compile-time error.
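A short sketch of the situation being fixed (hypothetical op): with more than one output, `.output` is ambiguous, so the per-name `.outputs` dictionary is the supported accessor, and touching `.output` should now fail at compile time rather than silently becoming `None`.
```python
from kfp import dsl

def my_pipeline():
    task = dsl.ContainerOp(
        name='two-outputs',
        image='alpine',
        command=['sh', '-c', 'echo a > /tmp/a && echo b > /tmp/b'],
        file_outputs={'a': '/tmp/a', 'b': '/tmp/b'},
    )
    # Supported: reference a specific output by name.
    a_ref = task.outputs['a']
    # Ambiguous: with more than one output, `.output` used to silently hold
    # None; after this fix, accessing it raises an error at compile time.
    # bad_ref = task.output
```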
* Fixed the implementation and added tests
* Trigger retests
Currently, most of the component spec information (original input/output names, types, descriptions) is lost during compilation.
This change adds that data to the workflow as an annotation.
This will make it possible for the Frontend (and other services) to get the full component spec information and use it to improve the UX.
* SDK - Compiler - Move volumes to templates
Argo v2.3.0+ supports per-template volume specs similar to Kubernetes. Prior to version 2.3.0, Argo only supported workflow-level volume specs.
We had several outstanding issues caused by the need to put all volumes in the same place.
There was also an issue with input parameter reference placeholders in volume specifications being placed outside the templates that declared the corresponding inputs.
This change fixes those issues.
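A sketch of per-op volume usage in the DSL (volume and claim names are hypothetical): with this change, a volume added to an op is expected to be emitted on that op's own Argo template rather than at the workflow level, keeping any input-parameter placeholders inside the template that declares those inputs.
```python
from kubernetes import client as k8s_client
from kfp import dsl

def my_pipeline():
    op = dsl.ContainerOp(
        name='reader',
        image='alpine',
        command=['cat', '/data/file.txt'],
    )
    # The volume attached to this op should now be compiled into the op's own
    # template (Argo v2.3.0+), not into the shared workflow-level spec.
    op.add_volume(k8s_client.V1Volume(
        name='my-data',
        persistent_volume_claim=k8s_client.V1PersistentVolumeClaimVolumeSource(
            claim_name='my-pvc')))
    op.container.add_volume_mount(k8s_client.V1VolumeMount(
        name='my-data', mount_path='/data'))
```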
* Removed dead code line
* SDK - Compiler - Allow creating portable pipelines
This change allows directly passing the PipelineConf instance to the compiler or launcher, which makes it easier to create portable pipelines by allowing the environment-specific configuration to be passed directly to the environment-specific launcher.
Background:
PipelineConf holds all pipeline-level configuration, including `op_transformers`, `image_pull_secrets`, etc. Some of these are specific to a particular execution environment (e.g. a GCP secret, an Argo artifact location, or Kubernetes-specific options).
Previously, the only way to modify `PipelineConf` was to do it inside the pipeline function. That tied the pipeline function to a specific execution environment (e.g. GCP, Argo or Kubernetes).
Solution: This change allows directly passing the PipelineConf instance to the compiler or launcher, which makes it possible to write launcher- and environment-agnostic pipeline functions. All environment-specific configuration can be moved to the launching stage.
Before:
```python
import kfp
from kfp import dsl
from kfp import gcp

# Defining the pipeline
def my_pipeline():
    # Portable pipeline code
    dsl.get_pipeline_conf().add_op_transformer(gcp.use_gcp_secret('user-gcp-sa'))

# Launching the pipeline
kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={})
```
After:
```python
# Defining the pipeline
def my_pipeline():
    # Portable pipeline code
    pass

# Launching the pipeline
pipeline_conf = dsl.PipelineConf()
pipeline_conf.add_op_transformer(gcp.use_gcp_secret('user-gcp-sa'))
kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={}, pipeline_conf=pipeline_conf)
```
After 2 (launching the same portable pipeline using different launchers):
```python
# Loading the portable pipeline
from portable_pipeline import my_pipeline

# Launching the pipeline on Kubeflow
pipeline_conf = dsl.PipelineConf()
pipeline_conf.add_op_transformer(gcp.use_gcp_secret('user-gcp-sa'))
kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={}, pipeline_conf=pipeline_conf)

# Launching the pipeline locally (not implemented yet)
kfp.run_pipeline_func_locally(my_pipeline, arguments={})
```
* Added parameter docstring
* SDK - Moved the _container_builder from kfp.compiler to kfp.containers
This only moves the files. The imports remain the same for now.
* Simplified the imports.
* SDK - Compiler - Fix large data passing
Stop outputting parameters unless they're consumed as parameters downstream.
This prevents the situation where a component outputs a big file, but the DSL compiler instructs Argo to pick it up as a parameter (parameters can only hold a few kilobytes of data).
As a byproduct, this change fixes some minor compiler data-passing bugs where some parameters were being passed around but never consumed (this happened with `ResourceOp`, `dsl.Condition` and recursion).
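A hedged illustration of the distinction (hypothetical ops): only an output that is referenced in a downstream command line counts as parameter consumption, so only it still needs an Argo parameter output; a large file output that nothing consumes as a parameter should remain an artifact only.
```python
from kfp import dsl

def my_pipeline():
    # Hypothetical trainer producing a large file output.
    train = dsl.ContainerOp(
        name='train',
        image='alpine',
        command=['sh', '-c', 'head -c 100000000 /dev/zero > /tmp/model'],
        file_outputs={'model': '/tmp/model'},
    )
    # Hypothetical evaluator producing a small value.
    evaluate = dsl.ContainerOp(
        name='evaluate',
        image='alpine',
        command=['sh', '-c', 'echo 0.95 > /tmp/metric'],
        file_outputs={'metric': '/tmp/metric'},
    )
    # 'metric' is consumed as a parameter (its value is inlined into the
    # command line), so the compiler still emits a parameter output for it.
    dsl.ContainerOp(
        name='report',
        image='alpine',
        command=['echo', evaluate.outputs['metric']],
    )
    # 'model' is never consumed as a parameter, so after this fix it is only
    # exported as an artifact and is not forced through Argo parameters.
```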
* Replaced ... with `raise AssertionError`
* Fixed small bug
* Removed unused variables
* Fixed names of the mark_upstream_ios_of_* functions
* Fixed detection of parameter output references
* Fixed handling of volumes
Currently, the parameter output values are not saved to storage, so they are lost as soon as the garbage collector removes the workflow object.
This change makes it so that the parameter output values are persisted.
* SDK - Refactoring - Replaced the ParameterMeta class with InputSpec and OutputSpec
* SDK - Refactoring - Replaced the internal PipelineMeta class with ComponentSpec
* SDK - Refactoring - Replaced the internal ComponentMeta class with ComponentSpec
* SDK - Refactoring - Replaced the *Meta classes with the *Spec classes
Replaced the ComponentMeta class with ComponentSpec
Replaced the PipelineMeta class with ComponentSpec
Replaced the ParameterMeta class with InputSpec and OutputSpec
* Removed empty fields
* first working commit
* incremental commit
* in the middle of converting loop args constructor to accept pipeline param
* both cases working
* output works, passed doesn't
* about to redo compiler section
* rewrite draft done
* added withparam tests
* removed sdk/python/comp.yaml
* minor
* subvars work
* more tests
* removed unneeded artifact outputs from test yaml
* sort keys
* removed dead artifact code
* Refactor. Expose a public API to append a pipeline param without interacting with the dsl.Pipeline object.
* Add unit test and fix.
* Fix docstring.
* Fix test
* Fix test
* Fix two nit problems
* Refactor
* SDK - Containers - Build container image from current environment
* Removed the ability to capture the active python environment (as requested by @hongye-sun)
* Added the type hint and docstring for the return type.
* Renamed `build_image_from_env` function to `build_image_from_working_dir`
as requested by @hongye-sun
* Explained the function behavior in the documentation.
* Removed extra empty line
* Improved caching by copying python files only after installing python packages
* Made test more portable
* Added support for specifying the base_image
`kfp.containers.default_base_image = ...`
The image can also be a callable returning the image name.
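A hedged usage sketch based on the names mentioned above; the exact signature of `build_image_from_working_dir` and the builder configuration it requires (e.g. a container-build staging location) may differ between SDK versions, and the image name below is a placeholder.
```python
import kfp.containers

# The default base image can be a string or a callable returning an image name.
kfp.containers.default_base_image = 'gcr.io/my-project/my-base:latest'  # placeholder image
# kfp.containers.default_base_image = lambda: 'gcr.io/my-project/my-base:latest'

# Builds an image from the current working directory; Python packages are
# installed before the Python files are copied, for better layer caching.
image_name = kfp.containers.build_image_from_working_dir()
```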
* Renamed `get_python_image` to `get_python_image_for_current_version`
* Switched the default base image to Google Deep Learning container image as requested by @hongye-sun
The size of this image is 4.35GB, which really concerns me. The GPU image size is 6.45GB.
* Stopped importing kfp.containers.* into kfp.*
* Fixed test
* Fixed the regex string
* Fixed the type annotation style
* Addressed @hongye-sun feedback
* Removed the container image size warning
* Fixed import failure
* Explicitly added mlpipeline outputs to the components that actually produce them
* Updated samples
* SDK - DSL - Stopped adding mlpipeline artifacts to every compiled template
Fixes https://github.com/kubeflow/pipelines/issues/1421
Fixes https://github.com/kubeflow/pipelines/issues/1422
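A hedged sketch of the explicit declaration mentioned above (hypothetical op; `output_artifact_paths` is the ContainerOp parameter I believe was used for this at the time, so treat it as an assumption): only the op that actually writes the special files declares them, and other ops no longer get these artifact outputs added automatically.
```python
from kfp import dsl

def my_pipeline():
    # Only this op writes the special metrics file, so only it declares it.
    dsl.ContainerOp(
        name='produce-metrics',
        image='alpine',
        command=['sh', '-c', "echo '{}' > /mlpipeline-metrics.json"],
        output_artifact_paths={
            'mlpipeline-metrics': '/mlpipeline-metrics.json',  # assumed parameter; see note above
        },
    )
    # Plain ops no longer receive mlpipeline-ui-metadata / mlpipeline-metrics
    # artifact outputs automatically.
    dsl.ContainerOp(name='plain-op', image='alpine', command=['echo', 'hello'])
```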
* Updated the Lightweight sample
* Updated the compiler tests
* Fixed the lightweight sample
* Reverted the change to one contrib/samples/openvino
The sample will still work fine as it is now.
I'll add the change to that file as a separate PR.