* SDK - Refactoring - Passing the parameters explicitly in python_op.
This helps avoid problems when new parameters are added.
* SDK - Components - Added package installation support to func_to_container_op
Example:
```python
op = func_to_container_op(my_func, packages_to_install=['pandas==0.24'])
```
* Make pip quieter
* Added the test_packages_to_install_feature test
Fixed accessing inputs and outputs without checking for None.
Fixed the case where the default value of a graph component input has to be passed to the component as an argument.
* SDK - Lightweight - Convert the names of file inputs and outputs
Removing the "_path" and "_file" suffixes from the names of file inputs and outputs.
Problem: When accepting file inputs (outputs), the function inside the component receives file paths (or file streams), so it's natural to call the function parameter "something_file_path" (e.g. model_file_path or number_file_path).
But from the outside perspective, there are no files or paths - the actual data objects (or references to them) are passed in.
It looks very strange when the argument-passing code looks like this: `component(number_file_path=42)`. This looks like an error, since 42 is not a path - it's not even a string.
It's much more natural to strip the "_file" or "_path" suffixes from the names of file inputs and outputs. Then the argument-passing code looks natural: `component(number=42)`
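A minimal sketch of how the suffix stripping changes the calling convention (the component below is hypothetical and assumes `InputPath` accepts a Python type):
```python
from kfp.components import func_to_container_op, InputPath

# Hypothetical component: the function parameter is a file path,
# so it is naturally named with a "_file_path" suffix.
@func_to_container_op
def print_number(number_file_path: InputPath(int)):
    with open(number_file_path) as number_file:
        print(number_file.read())

# In the pipeline code the "_file_path" suffix is stripped, so the caller
# passes the data itself under the natural name:
#   print_number(number=42)
```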
* Removed the _FEATURE_STRIP_FILE_IO_NAME_PARTS feature switch
Problem: It's hard to distinguish components loaded by name (e.g. using `ComponentStore`) from components that were never loaded (e.g. just created from a python function).
`component_ref.name` was previously being set, since it was a required parameter.
`component_ref.name` should only be set if component was loaded by name.
* SDK - Moved the _container_builder from kfp.compiler to kfp.containers
This only moves the files. The imports remain the same for now.
* Simplified the imports.
Lightweight components now allow a function to mark some outputs that it wants to produce by writing data to files instead of returning them as in-memory data objects.
This is useful when the data is expected to be big.
Example 1 (writing a big amount of data to an output file at a provided path):
```python
@func_to_container_op
def write_big_data(big_file_path: OutputPath(str)):
    with open(big_file_path, 'w') as big_file:
        for i in range(1000000):
            big_file.write('Hello world\n')
```
Example 2 (writing a big amount of data to a provided output file stream):
```python
@func_to_container_op
def write_big_data(big_file: OutputTextFile(str)):
    for i in range(1000000):
        big_file.write('Hello world\n')
```
Lightweight components now allow a function to mark some inputs that it wants to consume as files rather than as in-memory data objects (a small end-to-end sketch follows after the examples below).
This is useful when the data is expected to be big.
Example 1:
```python
def consume_big_file_path(big_file_path: InputPath(str)) -> int:
    line_count = 0
    with open(big_file_path) as f:
        while f.readline():
            line_count = line_count + 1
    return line_count
```
Example 2:
```python
def consume_big_file(big_file: InputTextFile(str)) -> int:
    line_count = 0
    while big_file.readline():
        line_count = line_count + 1
    return line_count
```
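A small end-to-end sketch of wiring a file output into a file input inside a pipeline (the component and parameter names here are illustrative, not the actual test code):
```python
import kfp.dsl as dsl
from kfp.components import func_to_container_op, InputPath, OutputPath

@func_to_container_op
def write_numbers(numbers_path: OutputPath(str), count: int = 100000):
    # Write a potentially large file to the provided output path.
    with open(numbers_path, 'w') as f:
        for i in range(count):
            f.write(str(i) + '\n')

@func_to_container_op
def count_lines(numbers_path: InputPath(str)) -> int:
    # Consume the data as a file instead of an in-memory object.
    with open(numbers_path) as f:
        return sum(1 for _ in f)

@dsl.pipeline(name='File passing pipeline')
def file_passing_pipeline():
    write_task = write_numbers(count=100000)
    # The single file output of write_numbers is passed to the file input.
    count_lines(write_task.output)
```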
* SDK - Tests - Added better helper functions for testing python components
* SDK - Python components - Properly serializing outputs
Background:
Component arguments are already properly serialized when calling the component program and then deserialized before the execution of the component function.
But the component outputs were only serialized using `str()` which is inadequate for data types like lists or dictionaries.
This commit fixes the mismatch - the outputs are now serialized the same way as arguments and default values.
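An illustrative snippet (generic Python, not the SDK code) of why `str()` is inadequate for structured outputs while JSON-style serialization round-trips:
```python
import json

output = {'accuracy': 0.9, 'labels': ['cat', 'dog']}

print(str(output))         # {'accuracy': 0.9, 'labels': ['cat', 'dog']}  - single quotes, not valid JSON
print(json.dumps(output))  # {"accuracy": 0.9, "labels": ["cat", "dog"]}  - can be deserialized downstream
assert json.loads(json.dumps(output)) == output
```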
* SDK - Compiler - Fix large data passing
Stop outputting parameters unless they're consumed as parameters downstream.
This prevents the situation where a component outputs a big file, but the DSL compiler instructs Argo to pick it up as a parameter (parameters can only hold a few kilobytes of data).
As a byproduct, this change fixes some minor compiler data-passing bugs where some parameters were passed around but never consumed (this happened with `ResourceOp`, `dsl.Condition` and recursion).
* Replaced ... with `raise AssertionError`
* Fixed small bug
* Removed unused variables
* Fixed names of the mark_upstream_ios_of_* functions
* Fixed detection of parameter output references
* Fixed handling of volumes
Two PRs have been merged that turned out to be slightly incompatible. This PR fixes the failing tests.
Root causes:
* The pipeline parameter default values were not properly serialized when constructing the metadata object.
* The `ParameterMeta` class did not validate the default value type, so the lack of serialization was not caught. The `ParameterMeta` class was replaced by `InputSpec`, which has strict type validation.
* Previously we did not have samples with complex pipeline parameter default values (e.g. lists) that could trigger the failures. Then two samples were added that had complex default values.
* Travis does not re-run tests before merging
* Prow does not re-run Travis tests before merging
Currently, the parameter output values are not saved to storage, and their values are lost as soon as the garbage collector removes the workflow object.
This change makes it so that the parameter output values are persisted.
* SDK - Refactoring - Replaced the ParameterMeta class with InputSpec and OutputSpec
* SDK - Refactoring - Replaced the internal PipelineMeta class with ComponentSpec
* SDK - Refactoring - Replaced the internal ComponentMeta class with ComponentSpec
* SDK - Refactoring - Replaced the *Meta classes with the *Spec classes
Replaced the ComponentMeta class with ComponentSpec
Replaced the PipelineMeta class with ComponentSpec
Replaced the ParameterMeta class with InputSpec and OutputSpec
* Removed empty fields
* first working commit
* incremental commit
* in the middle of converting loop args constructor to accept pipeline param
* both cases working
* output works, passed doesn't
* about to redo compiler section
* rewrite draft done
* added withparam tests
* removed sdk/python/comp.yaml
* minor
* subvars work
* more tests
* removed unneeded artifact outputs from test yaml
* sort keys
* removed dead artifact code
* Refactor. Expose a public API to append a pipeline param without interacting with the dsl.Pipeline object.
* Add unit test and fix.
* Fix docstring.
* Fix test
* Fix test
* Fix two nit problems
* Refactor
* SDK - Testing - Run some unit-tests in a more correct way
Replaced `@unittest.expectedFailure` with `with self.assertRaises(...):` (see the small illustrative sketch after this list).
Replaced bare `assert` with `self.assertEqual(...)`.
Stopped producing the stray "comp.yaml" file.
Enabled the test_load_component_from_url test.
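For illustration (generic `unittest` usage, not the actual SDK tests), the replaced patterns look like this:
```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_bad_argument_is_rejected(self):
        # assertRaises pins the expected failure to a single statement,
        # unlike @unittest.expectedFailure which passes if anything in the test fails.
        with self.assertRaises(TypeError):
            int(None)

    def test_values_match(self):
        # assertEqual produces a readable diff on failure, unlike a bare assert.
        self.assertEqual([1, 2, 3], list(range(1, 4)))
```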
* Removed a stray comment
* Added two tests for output_component_file
* SDK - Containers - Build container image from current environment
* Removed the ability to capture the active python environment (as requested by @hongye-sun)
* Added the type hint and docstring for the return type.
* Renamed `build_image_from_env` function to `build_image_from_working_dir`
as requested by @hongye-sun
* Explained the function behavior in the documentation.
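A hedged usage sketch - the argument-free call and the `base_image=` hand-off below are assumptions about the API, not confirmed signatures:
```python
import kfp.containers
from kfp.components import func_to_container_op

# Assumed usage: build an image from the code in the current working
# directory and use it as the base image for a lightweight component.
image = kfp.containers.build_image_from_working_dir()

def my_func(a: int, b: int) -> int:
    return a + b

my_op = func_to_container_op(my_func, base_image=image)
```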
* Removed extra empty line
* Improved caching by copying python files only after installing python packages
* Made test more portable
* Added support for specifying the base_image
`kfp.containers.default_base_image = ...`
The image can also be a callable returning the image name.
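A small sketch of the two forms (the image names below are placeholders):
```python
import kfp.containers

# Plain image name:
kfp.containers.default_base_image = 'gcr.io/my-project/my-base:latest'

# Or a callable that returns the image name:
kfp.containers.default_base_image = lambda: 'gcr.io/my-project/my-base:latest'
```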
* Renamed `get_python_image` to `get_python_image_for_current_version`
* Switched the default base image to Google Deep Learning container image as requested by @hongye-sun
The size of this image is 4.35GB which really concerns me. The GPU image size is 6.45GB.
* Stopped importing kfp.containers.* into kfp.*
* Fixed test
* Fixed the regex string
* Fixed the type annotation style
* Addressed @hongye-sun feedback
* Removed the container image size warning
* Fixed import failure
* SDK - Components - Hiding signature attribute from CloudPickle
Cloudpickle has some issues with pickling type annotations in python versions < 3.7, so they disabled it: https://github.com/cloudpipe/cloudpickle/issues/196
`create_component_from_airflow_op` spoofs the function signature by setting the `func.__signature__` attribute. cloudpickle then tries to pickle that attribute, which leads to failures during unpickling.
To prevent this we remove the `.__signature__` attribute before pickling.
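A minimal sketch of the idea (not the exact SDK code):
```python
import cloudpickle

def pickle_func_without_signature(func):
    # Temporarily drop the spoofed __signature__ attribute so cloudpickle
    # does not try to pickle the type annotations it contains.
    signature = getattr(func, '__signature__', None)
    if signature is not None:
        del func.__signature__
    try:
        return cloudpickle.dumps(func)
    finally:
        if signature is not None:
            func.__signature__ = signature
```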
* Added comments
# Hack to prevent cloudpickle from trying to pickle generic types that might be present in the signature. See https://github.com/cloudpipe/cloudpickle/issues/196
# Currently the __signature__ is only set by Airflow components as a means to spoof/pass the function signature to _func_to_component_spec
* Explicitly added mlpipeline outputs to the components that actually produce them
* Updated samples
* SDK - DSL - Stopped adding mlpipeline artifacts to every compiled template
Fixes https://github.com/kubeflow/pipelines/issues/1421
Fixes https://github.com/kubeflow/pipelines/issues/1422
* Updated the Lightweight sample
* Updated the compiler tests
* Fixed the lightweight sample
* Reverted the change to one contrib/samples/openvino
The sample will still work fine as it is now.
I'll add the change to that file as a separate PR.
If no `name` is provided to the PipelineVolume constructor, a custom name is
generated. It relies on `json.dumps()` of the struct after it has been
converted to a dict.
When `pvc` is provided and `name` is not, the following error is raised:
TypeError: Object of type PipelineParam is not JSON serializable
This commit fixes it and extends the tests to catch it.
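A hedged repro sketch of the failing case (the pipeline and parameter names are illustrative):
```python
import kfp.dsl as dsl

@dsl.pipeline(name='Volume pipeline')
def volume_pipeline(pvc_name='my-existing-pvc'):
    # Inside the pipeline function, pvc_name is a PipelineParam.
    # Previously this raised:
    #   TypeError: Object of type PipelineParam is not JSON serializable
    # because the auto-generated volume name used json.dumps() on a dict
    # that contained the PipelineParam.
    volume = dsl.PipelineVolume(pvc=pvc_name)
```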