172 Commits

Author SHA1 Message Date
Alexey Volkov 578d8de91d
SDK - Reduce python component limitations - no import errors for cust… (#3106)
* SDK - Reduce python component limitations - no import errors for custom type annotations

By default, create_component_from_func copies the source code of the function and creates a component using that source code. No global imports are captured. This is problematic for the function definition, since any annotation that uses a type that needs to be imported will cause an error. There were some special provisions for NamedTuple, InputPath and OutputPath, but even they were brittle (for example, "typing.NamedTuple" or "components.InputPath" annotations still caused failures at runtime).

This commit fixes the issue by stripping the type annotations from function declarations.

Fixes cases that were failing before:

```python
import typing
import collections

MyFuncOutputs = typing.NamedTuple('Outputs', [('sum', int), ('product', int)])

@create_component_from_func
def my_func(
    param1: CustomType,  # This caused failure previously
    param2: collections.OrderedDict,  # This caused failure previously
) -> MyFuncOutputs: # This caused failure previously
    pass
```

* Fixed the compiler tests

* Fixed crashes on print function

Code `print(line, end="")` was causing error: "lib2to3.pgen2.parse.ParseError: bad input: type=22, value='=', context=('', (2, 15))"

* Using the strip_hints library to strip the annotations

* Updating test workflow yamls

* Workaround for bug in untokenize

* Switched to the new strip_string_to_string method

* Fixed typo.

Co-Authored-By: Jiaxiao Zheng <jxzheng@google.com>
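
The annotation stripping described in this commit can be sketched with the standard library alone. This is only an illustration: it uses `ast` and `ast.unparse` (Python 3.9+) rather than the `strip_hints` library the commit adopted, it handles function signatures only, and `strip_annotations` is a hypothetical helper name.

```python
import ast


def strip_annotations(source: str) -> str:
    """Return `source` with parameter and return annotations removed."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.returns = None  # Drop the "-> ReturnType" part
        elif isinstance(node, ast.arg):
            node.annotation = None  # Drop the ": ParamType" part
    return ast.unparse(tree)


# None of the annotation types below are imported or defined here.
source = '''
def my_func(param1: CustomType, param2: collections.OrderedDict = None) -> MyFuncOutputs:
    return None
'''

stripped = strip_annotations(source)

# The stripped source can be executed without importing the annotation types.
namespace = {}
exec(stripped, namespace)
```

Because the copied source no longer mentions `CustomType`, `collections` or `MyFuncOutputs`, it can be defined inside a container that has none of them available.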
2020-02-24 20:50:48 -08:00
Alexey Volkov 2dee255643
SDK - Fix SDK on Python 3.8 (#3126)
* SDK - Fix SDK on Python 3.8

Fixes the following error: "TypeError: code() takes at least 14 arguments (13 given)".

The cause of the issue is a breaking change in CodeType constructor in Python 3.8.
https://bugs.python.org/issue37221
This should have been fixed by https://github.com/python/cpython/pull/13959 and https://github.com/python/cpython/pull/14505, but the code still fails.

* Simplified the replace call
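
A hedged sketch of how code that rebuilds code objects can avoid depending on the `CodeType` constructor signature: Python 3.8 added `CodeType.replace()`, which copies a code object with selected fields changed. The commit does not say which fields the SDK changes; `rename_code` and the `co_name` field are assumptions chosen for illustration.

```python
import types


def rename_code(code: types.CodeType, new_name: str) -> types.CodeType:
    """Return a copy of `code` with a different co_name."""
    if hasattr(code, 'replace'):  # Available in Python 3.8+
        return code.replace(co_name=new_name)
    # Older interpreters would need the positional CodeType constructor,
    # whose argument list changed in 3.8; that path is omitted here.
    raise NotImplementedError('CodeType.replace requires Python 3.8+')


def add(a, b):
    return a + b


renamed_code = rename_code(add.__code__, 'renamed_add')
renamed_add = types.FunctionType(renamed_code, add.__globals__, 'renamed_add')
```

Using `replace()` keeps the call independent of how many positional arguments the constructor takes in a given Python version.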
2020-02-24 10:22:48 -08:00
Alexey Volkov 7ee3244f5b
SDK - Components - Fixed dict-style type annotations (#3107)
Refactored `_data_passing.py` interface to expose functions instead of dictionaries.
2020-02-18 20:40:25 -08:00
Alexey Volkov 9b8e14cd9f
SDK - Components - create_graph_component_from_pipeline_func now returns a function (#2971) 2020-02-08 21:17:52 -08:00
Alexey Volkov c83aff2738
SDK - Components - Made it easier to access component spec classes (#2860)
* SDK - Components - Made it easier to access component spec classes

* Updated the imports
2020-01-31 11:41:21 -08:00
Alexey Volkov 6c72cc874a SDK - Components - Added the create_component_from_func alias (#2911)
Added the `create_component_from_func` function as alias for `func_to_container_op`.
It behaves exactly the same, but the name now does not imply that you'll always get `ContainerOp` from it.
Some function parameters are not added at this moment as they're not widely used and might be deprecated in the future.
2020-01-27 17:41:38 -08:00
Alexey Volkov 2d9f2524c1 SDK - Components refactoring (#2865)
* SDK - Components refactoring

This change is a pure refactoring of the implementation of component task creation.
For pipelines compiled using the DSL compiler (the compile() function or the command-line program) nothing should change.

The main goal of the refactoring is to change the way the component instantiation can be customized.
Previously, the flow was like this:

`ComponentSpec` + arguments --> `TaskSpec` --resolving+transform--> `ContainerOp`

This PR changes it to a more direct path:

`ComponentSpec` + arguments --constructor--> `ContainerOp`
or
`ComponentSpec` + arguments --constructor--> `TaskSpec`
or
`ComponentSpec` + arguments --constructor--> `SomeCustomTask`

The original approach where the flow always passes through `TaskSpec` had some issues since TaskSpec only accepts string arguments (and two other reference classes). This made it harder to handle custom types of arguments like PipelineParam or Channel.

Low-level refactoring changes:

Resolving of command-line argument placeholders has been extracted into a function usable by different task constructors.

Changed `_components._created_task_transformation_handler` to `_components._container_task_constructor`. Previously, the handler was receiving a `TaskSpec` instance. Now it receives `ComponentSpec` + arguments [+ `ComponentReference`].
Moved the `ContainerOp` construction handler setup to the `kfp.dsl.Pipeline` context class as planned.
Extracted `TaskSpec` creation to `_components._create_task_spec_from_component_and_arguments`.
Refactored `_dsl_bridge.create_container_op_from_task` to `_components._resolve_command_line_and_paths` which returns `_ResolvedCommandLineAndPaths`.
Renamed `_dsl_bridge._create_container_op_from_resolved_task` to `_dsl_bridge._create_container_op_from_component_and_arguments`.
The signature of `_components._resolve_graph_task` was changed and it now returns `_ResolvedGraphTask` instead of modified `TaskSpec`.

Some of the component tests still expect ContainerOp and its attributes.
These tests will be changed later.

* Adapted the _python_op tests

* Fixed linter failure

I do not want to add any top-level kfp imports in this file to prevent circular references.

* Added docstrings

* Fixed the return type forward reference
2020-01-25 08:39:01 -08:00
Alexey Volkov 27f7e77356 SDK - Unified the function signature parsing implementations (#2689)
* Replaced `_instance_to_dict(obj)` with `obj.to_dict()`

* Fixed the capitalization in _python_function_name_to_component_name
It now only changes the case of the first letter.

* Replaced the _extract_component_metadata function with _extract_component_interface

* Stopped adding newline to the component description.

* Handling None inputs and outputs

* Not including empty inputs and outputs in component spec

* Renamed the private attributes that the @pipeline decorator sets

* Changed _extract_pipeline_metadata to use _extract_component_interface

* Fixed issues based on feedback
2019-12-27 10:05:40 -08:00
Alexey Volkov 605ef804c6 Fixed the capitalization in _python_function_name_to_component_name (#2688)
It now only changes the case of the first letter.
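
The difference between changing only the first letter and capitalizing the whole name can be sketched as follows. This is a hypothetical re-implementation for illustration, not the SDK's actual code; the underscore-to-space conversion is an assumption.

```python
def python_function_name_to_component_name(name: str) -> str:
    """Turn a function name into a display name, upper-casing only the first letter."""
    name_with_spaces = name.replace('_', ' ').strip()
    return name_with_spaces[:1].upper() + name_with_spaces[1:]


component_name = python_function_name_to_component_name('train_XGBoost_model')

# str.capitalize() would also lower-case the rest of the string.
naive_name = 'train XGBoost model'.capitalize()
```

Here `component_name` keeps the inner capitals of `XGBoost`, whereas `naive_name` loses them.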
2019-12-10 12:00:11 -08:00
Alexey Volkov 33fa8eb7b4 SDK - Protobuf version of the component schema (#2636) 2019-11-21 12:01:29 -08:00
Alexey Volkov da5cbb82a7 SDK/Components - Added Json Schema spec for the component format (#669)
* SDK/Components - Added JsonSchema spec for the component format
Added the schema outline

* Fixed missing "required"

* Replaced PrimitiveTypes with just string

* Renamed CommandlineArgumentType to StringOrPlaceholder
Made more ContainerSpec properties support placeholders

* Removed support for type inheritance and generic types as requested by Ning and Ajay

* Some people are scared of graphs/pipelines - removing them

* Some people do not like optional inputs and conditionals - removing them
Sorry, Yasser and Bradley.

* Some people might be scared of predicates or conditional execution - removing them
Sure, Argo and DSL supports it, but some people care more about the spec file size even when that means dropping already supported features.
Sorry, Bradley and Riley.

* Reverting the last 4 commits

Making those big compromises did not have any noticeable effect on people asking for them.

* Removed list-style type specifications

We've standardized on the map-style specification.

* Renamed TypeType to TypeSpecType

* Updated the structure of graphInput

* Added the type attribute to taskOutput and graphInput

* Updated the execution options structure

* Using the official Kubernetes PodSpec schema instead of Argo's subset
2019-11-12 15:50:11 -08:00
Alexey Volkov 61506a0e88 SDK - Components - Fixed YAML formatting for some components (#2529)
* SDK - Components - Fixed YAML formatting for some components

This fixes formatting for components where function does not have a return annotation.
The low-level cause of issue: Trailing whitespace when there are no serializers.
Trailing whitespace triggers ugly YAML string formatting.

* Addressed feedback
2019-11-07 14:48:19 -08:00
Alexey Volkov 1282f16335 SDK - Python components - Fixed bug when mixing file outputs with return value outputs (#2473) 2019-10-23 19:45:05 -07:00
Alexey Volkov 681d873fc7 SDK - Components - Added type to graph input references (#2451)
This makes the graph input references consistent with task output references.
This is a breaking change, but the graph components are not exposed in the documentation or samples yet.
2019-10-23 17:03:05 -07:00
Alexey Volkov f4d689b4ed SDK - Python components - Fixed handling multiline decorators (#2345)
* SDK - Python components - Fixed handling multiline decorators

* Switched to using dedent

* Added error checking

* Testing multiline decorator

* Test calling the component created from decorated function

Also fixed `helper_test_component_against_func_using_local_call`.
2019-10-16 12:17:29 -07:00
Alexey Volkov ee527c8ad4 SDK - Components - Restored attribute order when generating component.yaml files (#2262)
This makes the generated files more readable.
The attributes were properly ordered before, but the ordering broke when the `.to_dict` methods started outputting `dict` instead of `OrderedDict`.
Also fixed the existing generated `component.yaml` files.
2019-10-07 18:33:26 -07:00
Alexey Volkov 8b0cb8a5b5 SDK - Components - Deprecate the get and set methods for default image in favor of plain variable (#2257) 2019-10-04 15:35:12 -07:00
Alexey Volkov b2f1d0071f SDK - Components - Added the ComponentSpec.save method (#2264)
* SDK - Components - Added the ComponentSpec.save method

* Fixed write call
2019-10-03 15:25:55 -07:00
Alexey Volkov 052a6ac0ce SDK - Components - Reorganized TaskSpec execution options (#2270)
This part of the spec was unused, so this is not a breaking change.
Consolidating Kubernetes-related options under a single attribute: `TaskSpec.execution_options.kubernetes_options`.
`TaskSpec.k8s_container_options` -> `TaskSpec.execution_options.kubernetes_options.main_container`
`TaskSpec.k8s_pod_options.spec` -> `TaskSpec.execution_options.kubernetes_options.pod_spec`
Added `TaskSpec.execution_options.retry_strategy.max_retries` attribute.
2019-10-02 18:44:08 -07:00
Alexey Volkov be4f5851ed SDK - Components - Creating graph components from python pipeline function (#2273)
* SDK/Components - Creating graph components from python pipeline function

`create_graph_component_from_pipeline_func` converts python pipeline function to a graph component object that can be saved, shared, composed or submitted for execution.

Example:

    producer_op = load_component(component_with_0_inputs_and_2_outputs)
    processor_op = load_component(component_with_2_inputs_and_2_outputs)

    def pipeline1(pipeline_param_1: int):
        producer_task = producer_op()
        processor_task = processor_op(pipeline_param_1, producer_task.outputs['Output 2'])

        return OrderedDict([
            ('Pipeline output 1', producer_task.outputs['Output 1']),
            ('Pipeline output 2', processor_task.outputs['Output 2']),
        ])

    graph_component = create_graph_component_from_pipeline_func(pipeline1)

* Changed the signatures of exported functions

Non-public create_graph_component_spec_from_pipeline_func creates ComponentSpec
Public create_graph_component_from_pipeline_func creates component and writes it to file.

* Switched to using _extract_component_interface to analyze function signature

Stopped humanizing the input names for now. I think it's beneficial to extract interface from function signature the same way for both container and graph python components.

* Support outputs declared using pipeline function's return annotation

* Cleaned up the test

* Stop including the whole parent tasks in task output references

* By default, do not include task component specs in the graph component

Remove the component spec from component reference unless it will make the reference empty or unless explicitly asked by the user

* Exported the create_graph_component_from_pipeline_func function

* Fixed imports

* Updated the copyright year.
2019-10-02 16:20:07 -07:00
Alexey Volkov 8f4f7bc8b6 SDK - Components - Verify the object type when serializing primitive arguments (#2272)
* SDK - Components - Verify the object type when serializing primitive arguments

Fixes an issue where if an input had a primitive type (e.g. `Integer`), you could pass anything to it (e.g. booleans, `ContainerOp`s, functions etc), because it just used `str` as serializer. Now the serializers check the value type and raise an error if the type is incorrect.

* Allow serializing integer when float is required.
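
A minimal sketch of type-checking serializers along the lines described above. The function names and the exact rules (for example, rejecting `bool` for integer inputs) are assumptions for illustration and may differ from the SDK's behavior.

```python
def serialize_integer(value) -> str:
    """Serialize an integer, rejecting any other type."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError('Expected int, got {}'.format(type(value).__name__))
    return str(value)


def serialize_float(value) -> str:
    """Serialize a float; integers are accepted and widened to float."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError('Expected float, got {}'.format(type(value).__name__))
    return str(float(value))
```

With a plain `str` serializer, `str(print)` or `str(True)` would have been passed through silently; here they raise `TypeError` before the pipeline is compiled.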
2019-10-01 11:55:35 -07:00
Alexey Volkov c676b838ef SDK - Lightweight - Added package installation support to func_to_container_op (#2245)
* SDK - Refactoring - Passing the parameters explicitly in python_op.
This helps avoid problems when new parameters are added.

* SDK - Components - Added package installation support to func_to_container_op

Example:
```python
op = func_to_container_op(my_func, packages_to_install=['pandas==0.24'])
```

* Make pip quieter

* Added the test_packages_to_install_feature test
2019-09-30 19:13:32 -07:00
Alexey Volkov 646c2890de SDK - Components - Fixed small bugs in graph component resolving (#2269)
Fixed accessing inputs and outputs without checking for None.
Fixed case where the default value of graph component input has to be passed to component as an argument.
2019-09-30 18:33:32 -07:00
Alexey Volkov 06f9322a78 SDK - Lightweight - Convert the names of file inputs and outputs (#2260)
* SDK - Lightweight - Convert the names of file inputs and outputs

Removing the "_path" and "_file" suffixes from the names of file inputs and outputs.
Problem: When accepting file inputs (outputs), the function inside the component receives file paths (or file streams), so it's natural to call the function parameter "something_file_path" (e.g. model_file_path or number_file_path).
But from the outside perspective, there are no files or paths - the actual data objects (or references to them) are passed in.
It looks very strange when argument passing code looks like this: `component(number_file_path=42)`. This looks like an error since 42 is not a path. It's not even a string.
It's much more natural to strip the names of file inputs and outputs of "_file" or "_path" suffixes. Then the argument passing code will look natural: "component(number=42)"

* Removed the _FEATURE_STRIP_FILE_IO_NAME_PARTS feature switch
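
The name conversion described above can be sketched like this. `strip_file_io_suffixes` is a hypothetical helper; the order in which suffixes are removed and the guard against producing an empty name are assumptions.

```python
def strip_file_io_suffixes(parameter_name: str) -> str:
    """Remove trailing "_path" and then "_file" from a parameter name."""
    for suffix in ('_path', '_file'):
        # Never strip the whole name away.
        if parameter_name.endswith(suffix) and len(parameter_name) > len(suffix):
            parameter_name = parameter_name[:-len(suffix)]
    return parameter_name
```

Under these assumptions `number_file_path` becomes `number`, `big_file` becomes `big`, and a name without either suffix is left unchanged, so a call such as `component(number=42)` reads naturally.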
2019-09-30 16:35:32 -07:00
Alexey Volkov 2f0f1e47a2 SDK - Components - Stop setting component_ref.name to component name (#2265)
Problem: It's hard to distinguish components loaded by name (e.g. using `ComponentStore`) from components that were never loaded (e.g. just created from python function).
`component_ref.name` was previously being set, since it was a required parameter.
`component_ref.name` should only be set if component was loaded by name.
2019-09-30 15:37:32 -07:00
Alexey Volkov 7735a14694 SDK - Components - Stop serializing string values (#2227)
This can happen with Lightweight component outputs if they've already been serialized manually.
2019-09-25 20:29:06 -07:00
Alexey Volkov 98fd6c8c32 SDK - Components - Fixed serialization of lists and dicts containing `PipelineParam` items (#2212)
Fixes https://github.com/kubeflow/pipelines/issues/2206
The issue is fixed for both `JSON`-based and `str()`-based serialization.
2019-09-24 19:43:59 -07:00
Alexey Volkov 3caba4e06f SDK - Lightweight - Added support for file outputs (#2221)
Lightweight components now allow function to mark some outputs that it wants to produce by writing data to files, not returning it as in-memory data objects.
This is useful when the data is expected to be big.

Example 1 (writing big amount of data to output file with provided path):
```python
@func_to_container_op
def write_big_data(big_file_path: OutputPath(str)):
    with open(big_file_path, 'w') as big_file:
        for i in range(1000000):
            big_file.write('Hello world\n')

```
Example 2 (writing big amount of data to provided output file stream):
```python
@func_to_container_op
def write_big_data(big_file: OutputTextFile(str)):
    for i in range(1000000):
        big_file.write('Hello world\n')
```
2019-09-24 18:11:58 -07:00
Alexey Volkov 2510a690f2 SDK - Lightweight - Added support for file inputs (#2207)
Lightweight components now allow function to mark some inputs that it wants to consume as files, not as in-memory data objects.
This is useful when the data is expected to be big.

Example 1:
```python
def consume_big_file_path(big_file_path: InputPath(str)) -> int:
    line_count = 0
    with open(big_file_path) as f:
        while f.readline():
            line_count = line_count + 1
    return line_count
```
Example 2:
```python
def consume_big_file(big_file: InputTextFile(str)) -> int:
    line_count = 0
    while big_file.readline():
        line_count = line_count + 1
    return line_count
```
2019-09-23 17:59:25 -07:00
Alexey Volkov 1c287f2f89 SDK - Components - Simplified arg-parsing code using argparse.SUPPRESS (#2193) 2019-09-23 13:45:24 -07:00
Alexey Volkov c914df542c SDK - Python components - Properly serializing outputs (#2198)
* SDK - Tests - Added better helper functions for testing python components

* SDK - Python components - Properly serializing outputs
Background:
Component arguments are already properly serialized when calling the component program and then deserialized before the execution of the component function.
But the component outputs were only serialized using `str()` which is inadequate for data types like lists or dictionaries.

This commit fixes the mismatch - the outputs are now serialized the same way as arguments and default values.
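
A short illustration of why `str()` is inadequate for structured outputs. It assumes JSON as the serialization format for lists and dictionaries; the helper names are hypothetical.

```python
import json


def serialize_output(value) -> str:
    """Serialize a list or dict so that it can be parsed back reliably."""
    return json.dumps(value)


def deserialize_input(text: str):
    """Inverse of serialize_output."""
    return json.loads(text)


value = {'labels': ['a', 'b'], 'ok': True}

round_tripped = deserialize_input(serialize_output(value))

# str() produces a Python repr (single quotes, True), which is not valid JSON.
try:
    deserialize_input(str(value))
    str_output_is_parseable = True
except ValueError:
    str_output_is_parseable = False
```

Serializing outputs with the same routine as arguments means a downstream component can deserialize them without guessing the format.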
2019-09-23 12:29:33 -07:00
Alexey Volkov db6625ff96 SDK - Removed some dead code (#2194) 2019-09-23 12:29:25 -07:00
Alexey Volkov e420940d67 SDK - Components - Fixed the output types for outputs with converted names (#2162)
Fixes https://github.com/kubeflow/pipelines/issues/2130
2019-09-18 20:53:00 -07:00
Alexey Volkov 0e2bf15dbc
SDK - Refactoring - Replaced the *Meta classes with the *Spec classes (#1944)
* SDK - Refactoring - Replaced the ParameterMeta class with InputSpec and OutputSpec

* SDK - Refactoring - Replaced the internal PipelineMeta class with ComponentSpec

* SDK - Refactoring - Replaced the internal ComponentMeta class with ComponentSpec

* SDK - Refactoring - Replaced the *Meta classes with the *Spec classes

Replaced the ComponentMeta class with ComponentSpec
Replaced the PipelineMeta class with ComponentSpec
Replaced the ParameterMeta class with InputSpec and OutputSpec

* Removed empty fields
2019-09-16 18:41:12 -07:00
Alexey Volkov c4c0bb8202 SDK - Components - Fixed kfp.components.set_default_base_image (#2118) 2019-09-16 15:30:26 -07:00
Alexey Volkov 647867bde1 SDK - Python components - Fixed the default base_image handling (#2119)
In Python, default parameter values are evaluated only once, when the function is defined.
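
The pitfall can be reproduced in a few lines. The names below are hypothetical and only mimic the shape of the problem: a module-level default that is read in a function signature is frozen at definition time, while a `None` sentinel defers the lookup to call time.

```python
default_base_image = 'python:3.7'  # Hypothetical module-level setting


def create_op_early(base_image=default_base_image):
    # The default was captured when the function was defined.
    return base_image


def create_op_late(base_image=None):
    # The setting is looked up every time the function is called.
    if base_image is None:
        base_image = default_base_image
    return base_image


# Changing the setting later only affects the late-binding version.
default_base_image = 'python:3.8'
```

After the reassignment, `create_op_early()` still returns the old image while `create_op_late()` returns the new one.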
2019-09-16 13:42:38 -07:00
Alexey Volkov 77d0ee014e SDK - Lightweight - Made wrapper code compatible with python2 (#2035) 2019-09-13 16:44:40 -07:00
Alexey Volkov 6c15f27f7e SDK - Components - Hiding signature attribute from CloudPickle (#2045)
* SDK - Components - Hiding signature attribute from CloudPickle

Cloudpickle has some issues with pickling type annotations in python versions < 3.7, so they disabled it. https://github.com/cloudpipe/cloudpickle/issues/196
`create_component_from_airflow_op` spoofs the function signature by setting the `func.__signature__` attribute. cloudpickle then tries to pickle that attribute which leads to failures during unpickling.
To prevent this we remove the `.__signature__` attribute before pickling.

* Added comments

        # Hack to prevent cloudpickle from trying to pickle generic types that might be present in the signature. See https://github.com/cloudpipe/cloudpickle/issues/196 
        # Currently the __signature__ is only set by Airflow components as a means to spoof/pass the function signature to _func_to_component_spec
2019-09-06 11:12:15 -07:00
Alexey Volkov e54fe67543 SDK - Components - Added type to TaskOutputReference (#1995)
* SDK - Components - Added type to TaskOutputReference
Now the task output references taken from TaskSpec instances can be
type-checked when passed to components.

* Renamed TypeType to TypeSpecType
2019-08-30 16:33:50 -07:00
Alexey Volkov efe9d87b31 SDK - Components - Enable loading graph components (#2010)
The graph components are now correctly loaded and instantiated.
Also added pre-configured ComponentStore.default_store
2019-08-30 15:06:03 -07:00
Alexey Volkov f5b2f24e06 SDK - Components - Added component properties to the task factory function (#1771)
Problem: When the user loads component using the load_component function, the object they get back is a task factory function. Since it's a normal function object, the user cannot inspect any of the attributes of the component they just loaded (they can only see the name, description and input names). For example, the user cannot see the list of component outputs, the annotations etc.

This change fixes the issue by adding the original component properties to the function object.

Example usage:

```python
train_op = load_component_from_url(...)
print(train_op.outputs)
```
2019-08-29 20:49:30 -07:00
Alexey Volkov d43de167df SDK - Components - Added output references to TaskSpec (#1991)
Also added TaskSpec.task and ComponentReference.spec attributes
2019-08-29 15:28:58 -07:00
Alexey Volkov 0fc68bbdd4 SDK - Added support for raw input artifact argument values to ContainerOp (#791)
* SDK - Added support for raw artifact values to ContainerOp

* `ContainerOp` now gets artifact arguments from command line instead of the constructor.

* Added back input_artifact_arguments to the ContainerOp constructor.
In some scenarios it's hard to provide the artifact arguments through the `command` list when it already has resolved artifact paths.

* Exporting InputArtifactArgument from kfp.dsl

* Updated the sample

* Properly passing artifact arguments as task arguments
as opposed to default input values.

* Renamed input_artifact_arguments to artifact_arguments to reduce confusion

* Renamed InputArtifactArgument to InputArgumentPath
Also renamed input_artifact_arguments to artifact_argument_paths in the ContainerOp's constructor

* Replaced getattr with isinstance checks.
getattr is too fragile and can be broken by renames.

* Fixed the type annotations

* Unlocked the input artifact support in components
Added the test_input_path_placeholder_with_constant_argument test
2019-08-28 21:09:57 -07:00
Alexey Volkov 4cbfdd8e1f SDK - Components - Only yaml component files can be used as source (#1966)
Previously, if the file was a .zip archive, some functions like exception printing would fail as it's not a text file.
2019-08-27 15:23:09 -07:00
Alexey Volkov 3a30b2bdcf SDK - Verifying that the serializer returns string (#1965)
This change was prompted by the failure when b64encode was returning bytes instead of str.
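
The failure mode is easy to reproduce: `base64.b64encode` returns `bytes`, not `str`. Below is a minimal sketch of a check that catches this; the `checked_serializer` wrapper is a hypothetical name, not the SDK's API.

```python
import base64


def checked_serializer(serializer):
    """Wrap a serializer so that non-string results raise immediately."""
    def wrapper(value):
        result = serializer(value)
        if not isinstance(result, str):
            raise TypeError(
                'Serializer returned {} instead of str'.format(type(result).__name__))
        return result
    return wrapper


# b64encode returns bytes, so this serializer fails the check.
bad_serializer = checked_serializer(base64.b64encode)

# Decoding the base64 bytes as ASCII yields a proper string.
good_serializer = checked_serializer(
    lambda data: base64.b64encode(data).decode('ascii'))

encoded = good_serializer(b'hi')
```

Failing early with a clear message is easier to debug than a `bytes` value leaking into a generated command line.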
2019-08-27 13:21:12 -07:00
Alexey Volkov d043d165a9 SDK - Components - Add support for the Base64Pickle type (#1946)
* SDK - Components - Add support for the Base64Pickle type

* Make flake8 happy
2019-08-26 18:56:37 -07:00
Alexey Volkov 5dbea6cb91 SDK - Components - Setting default base image or image factory (#1937)
Added kfp.components.set_default_base_image which sets the name of the container image that will be used for component creation when base_image is not specified.
Alternatively, the base image can also be set to a factory function that will be returning the image.

The support is added for both Lightweight components and python container components.
2019-08-26 17:48:40 -07:00
Alexey Volkov e48d563cb9 SDK - Components - Add support for the List, Dict and Json types (#1945) 2019-08-23 20:12:26 -07:00
Alexey Volkov 11de563852 SDK - Components - Add support for the Boolean type (#1936)
Fixes https://github.com/kubeflow/pipelines/issues/1488
2019-08-23 19:00:26 -07:00
Alexey Volkov 17e18a162e SDK - Components - Improved serialization and deserialization of arguments and defaults (#1934)
* SDK - Components - Improved serialization and deserialization of arguments and defaults

Properly serialize default values and passed arguments using the same code.
Check the types of passed argument values and issue warnings.
Improved argument reference type compatibility checking. When types do not match there is always either error or warning.
When creating component from python function, the input types are now canonicalized.

* Addressed the feedback
2019-08-23 18:18:25 -07:00
Alexey Volkov c01315a89d
SDK - Refactoring - Replaced the TypeMeta class (#1930)
* SDK - Refactoring - Replaced the TypeMeta class
The PipelineParam no longer exposes the private TypeMeta class
Fixes #1420

This refactoring PR is part of a series of PRs that unify the metadata and specification types.
2019-08-22 15:31:24 -07:00
Alexey Volkov 553885ffb1
SDK - Components - Fixed ModelBase comparison bug (#1874) 2019-08-21 16:38:12 -07:00
Alexey Volkov 203307dbaf
SDK - Lightweight - Fixed custom types in multi-output case (#1875)
The type was mistakenly serialized as `_ForwardRef('CustomType')`.
The input parameter types and single-output types were not affected.
2019-08-21 16:37:21 -07:00
Alexey Volkov 9adf16301d
SDK - Airflow - Fixed bug in airflow op creation (#1911)
This PR fixes a bug in AirFlow op creation.
The `_run_airflow_op` helper function was not captured along with the `_run_airflow_op_closure` function, because they belong to different modules (`_run_airflow_op_closure` was module-less).
This was not discovered during the notebook testing of the code since in that environment the `_run_airflow_op` was also module-less as it was defined in a notebook (not in .py file).
2019-08-21 16:29:54 -07:00
Christian Clauss 8e1e823139 Lint Python code for undefined names (#1721)
* Lint Python code for undefined names

* Lint Python code for undefined names

* Exclude tfdv.py to workaround an overzealous pytest

* Fixup for tfdv.py

* Fixup for tfdv.py

* Fixup for tfdv.py
2019-08-21 15:04:31 -07:00
Alexey Volkov 7917ea475e SDK - Lightweight - Added support for complex default values (#1696) 2019-08-12 02:35:13 -07:00
Alexey Volkov dd59bc2597 SDK - Lightweight - Fixed regression for components without outputs (#1726) 2019-08-05 21:47:53 -07:00
Alexey Volkov a7635f1cd4 SDK - Using Airflow ops in Pipelines (#1483)
* SDK - Using Airflow ops in Pipelines

* Documented the create_component_from_airflow_op function

* Need to set use_code_pickling=True now

* Using the original operator name as the component name

* Filtering out `*args` and `**kwargs` parameters that some operators have

* Fixed the function call

* Changed the default airflow base image
Airflow has removed most of the old images and tags.
See https://issues.apache.org/jira/browse/AIRFLOW-5093 and  https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-10+Multi-layered+and+multi-stage+official+Airflow+CI+image#AIP-10Multi-layeredandmulti-stageofficialAirflowCIimage-ProposedsetupoftheDockerHubandTravisCI .
2019-08-02 19:53:52 -07:00
Alexey Volkov 94969d6264 SDK/Lightweight - Updated default image to tensorflow:1.13.2-py3 (#1671) 2019-08-02 18:31:53 -07:00
Alexey Volkov f6cf9c5f55 SDK - Lightweight - Added support for "None" default values (#1626)
* SDK - Lightweight - Added support for "None" default values
Previously it was impossible to pass None to components since it was being converted to the string "None".

* is_required = not input.optional for now
As asked by @gaoning777
2019-07-25 18:49:59 -07:00
Alexey Volkov 3aeab312f2 SDK/Lightweight - Use argparse for command-line parsing (#1534)
It's required to correctly handle None arguments or None default values (also needed for optional and variable-number inputs).
It's easier to understand and generates better command-line code.
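
One way argparse can preserve `None` defaults is sketched below, assuming `argparse.SUPPRESS` is used for optional arguments. Omitted arguments are then left out of the parsed namespace, so the function's own default applies instead of the string "None". The `greet` function and argument names are hypothetical.

```python
import argparse


def greet(name, greeting=None):
    # The function sees a real None, not the string "None".
    return (name, greeting)


parser = argparse.ArgumentParser()
parser.add_argument('--name', required=True)
# SUPPRESS: if --greeting is absent, no attribute is added to the namespace.
parser.add_argument('--greeting', default=argparse.SUPPRESS)


def call_with_args(argv):
    parsed = vars(parser.parse_args(argv))
    return greet(**parsed)
```

Calling `call_with_args(['--name', 'x'])` passes only `name`, so `greeting` keeps its Python default.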
2019-06-23 16:45:53 -07:00
Alexey Volkov ce8df162a9 SDK/Lightweight - Added python version compatibility checks (#1524)
* SDK - Refactored the code in kfp.components._python_op._capture_function_code_using_cloudpickle

* SDK/Lightweight - Added python version compatibility checks

See my compatibility analysis: https://github.com/cloudpipe/cloudpickle/issues/293
2019-06-23 14:41:54 -07:00
Alexey Volkov 94f793c64a SDK - Generated paths will be in /tmp by default (#1531)
This makes them more compatible with images that have non-root user
2019-06-20 18:04:35 -07:00
Alexey Volkov 627b412f24 SDK/Lightweight - Disabled code pickling by default (#1512)
I introduced code pickling to capture dependencies in https://github.com/kubeflow/pipelines/pull/1372
Later I discovered that there is a serious opcode incompatibility between python versions 3.5 and 3.6+. See my analysis of the issue: https://github.com/cloudpipe/cloudpickle/issues/293

Due to this issue I decided to switch back to using source code copying by default and to continue improving it.

Until we stop supporting python 3.5 (https://github.com/kubeflow/pipelines/pull/668) it's too dangerous to use code pickling by default.

Code pickling can be enabled by specifying `pickle_code=True` when calling `func_to_container_op`
2019-06-18 19:44:30 -07:00
Alexey Volkov b935836c30 SDK/Lightweight - Enable cloudpickle installation from non-root users (#1511) 2019-06-17 18:56:15 -07:00
Alexey Volkov e90085ecb3 SDK - Refactored _func_to_component_spec to split code generation from signature analysis (#1334)
* SDK - Refactored _func_to_component_spec to split out the function signature analyzer

* Renamed function to _extract_component_interface
2019-06-17 18:02:16 -07:00
Alexey Volkov aee1b5e2e5 SDK - Improving python component logs by making stdout and stderr unbuffered (#1510)
Without this the output and error lines can be printed in wrong order and sometimes not printed at all.
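
The commit does not say how buffering is disabled. One common way, assumed here, is the interpreter's `-u` flag, which makes stdout and stderr unbuffered so lines appear in the order they were written.

```python
import subprocess
import sys

# A tiny program that writes to both streams.
program = "import sys; print('out'); print('err', file=sys.stderr)"

# -u disables buffering; stderr is merged into stdout to observe the ordering.
result = subprocess.run(
    [sys.executable, '-u', '-c', program],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
)
lines = result.stdout.splitlines()
```

Without `-u`, stdout written to a pipe is block-buffered, so the stderr line can overtake it in the combined log.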
2019-06-14 00:20:20 -07:00
Krassimir Valev 8938669d7d Base64 encode the pickled code (#1476)
Due to its nature, Argo will replace any strings it encounters
that are enclosed in double curly braces, which will make the code
non-executable. To work around this, the code is encoded in the Argo
yaml template and decoded on the fly, before the execution.
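
A minimal sketch of the idea, using a pickled dictionary as a stand-in for the pickled function code. The payload contents are made up; the point is that the base64 alphabet contains no curly braces, so the template engine has nothing to substitute.

```python
import base64
import pickle

# Stand-in for pickled code that happens to contain "{{...}}".
payload = {'source': "print('{{workflow.name}}')"}
pickled = pickle.dumps(payload)

# Encode before embedding into the workflow template.
encoded = base64.b64encode(pickled).decode('ascii')

# Decode on the fly inside the container, right before execution.
restored = pickle.loads(base64.b64decode(encoded))
```

The encoded text is safe to place in the YAML template, and the original object is recovered unchanged at run time.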
2019-06-13 23:30:25 -07:00
Alexey Volkov d724a4b68d SDK - Controlling which modules are captured with Lightweight components (#1435)
* SDK - Controlling which modules are captured with Lightweight components

All func_to_* functions now accept the modules_to_capture parameter: List of module names that will be captured (instead of just referencing) during the dependency scan. By default the func.__module__ is captured.

* Described the behavior more in depth.

* Added a test to check that only dependencies are captured
2019-06-07 18:47:06 -07:00
Alexey Volkov ab97d5708d SDK - Only install cloudpickle if it's not available (#1434)
This makes unit tests much faster.
Also:
Pinned the version to 1.1.1.
Made the installation quiet.
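
The install-only-if-missing logic could look roughly like this. `ensure_module` and its injectable `run` parameter are hypothetical, introduced so the behavior can be exercised without actually invoking pip.

```python
import importlib.util
import subprocess
import sys


def ensure_module(module_name, package_spec, run=subprocess.check_call):
    """Install `package_spec` with pip only if `module_name` cannot be imported.

    Returns True if an installation was triggered.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # Already available: skip the slow pip call
    run([sys.executable, '-m', 'pip', 'install', '--quiet', package_spec])
    return True


# Record the commands instead of running them.
recorded_commands = []
installed_json = ensure_module('json', 'json', run=recorded_commands.append)
installed_missing = ensure_module(
    'surely_missing_module_xyz', 'cloudpickle==1.1.1', run=recorded_commands.append)
```

Skipping the pip call when the module is importable is what makes repeated unit-test runs faster.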
2019-06-04 17:57:53 -07:00
Alexey Volkov 16213ba62d SDK - Dynamically installing cloudpickle module (#1429)
Fixes https://github.com/kubeflow/pipelines/issues/1426
2019-06-03 16:45:53 -07:00
Alexey Volkov 9a1d47a185 SDK - Capturing function dependencies when creating lightweight components (#1372)
* Transitively capturing code dependencies
Using cloudpickle.

* Got rid of func_type_declarations_code variable

* Extracted the function code extraction functions

* Improved support for capturing module-level dependencies

* Added test for capturing module-level dependencies

* Removed the _capture_function_code_using_source_copy function
As requested by Ning
2019-05-28 18:18:18 -07:00
Alexey Volkov a41bd106a1 SDK - Removing unneeded uses of dsl.Pipeline (#1229)
* SDK - Removing unneeded usages of dsl.Pipeline

* Fixed the naming-related issue
2019-05-14 18:48:18 -07:00
Alexey Volkov b61bef04a3 SDK - Renamed ModelBase.from_struct/to_struct to from_dict/to_dict (#1290) 2019-05-07 14:06:35 -07:00
Alexey Volkov 819d91d2f1 Retaining the component url, digest or tag when loading (#1090) 2019-05-03 16:55:38 -07:00
Alexey Volkov f40a22a3f4 SDK - Made ComponentSpec.implementation field optional (#1188)
* SDK - Made ComponentSpec.implementation field optional
Improved the error message when trying to convert tasks to ContainerOp.

* Switched from attribute checking to type checking
2019-04-24 12:54:46 -07:00
Alexey Volkov 6920aceeba SDK - Removed SourceSpec structure (#1119)
It has never been used and ComponentSpec.metadata.annotations['source'] is a better place for such metadata.
2019-04-24 12:06:26 -07:00
Alexey Volkov 848d4fb99c SDK - Replaced insecure yaml.load with yaml.safe_load (#1170)
This improves security and gets rid of security warnings.
See https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
2019-04-23 15:26:00 -07:00
Alexey Volkov 929ff52fd2 Passing the annotations and labels to the ContainerOp (#1077)
Currently the annotations and labels are not passed from component to the ContainerOp. This PR fixes that.

Fixes https://github.com/kubeflow/pipelines/issues/1013
2019-04-08 22:03:05 -07:00
Alexey Volkov 291691a9f9 SDK/Components - Handling public GCS URIs in load_component (#1057) 2019-03-28 15:55:56 -07:00
Eterna2 825f64d672 Feature: sidecar for ContainerOp (#879)
* Feature: sidecar for ContainerOp

* replace f-string with string format for compatibility with py3.5

* ContainerOp can now be updated with any k8s V1Container attributes, as well as with sidecars via the Sidecar class. ContainerOp accepts PipelineParam in any valid k8s property.

* WIP: fix conflicts and bugs with recent master. TODO: more complex template with pipeline params

* fix proxy args

* Fixed to work with latest master head

* Added container_kwargs to ContainerOp to pass in k8s container kwargs

* Fix comment bug, updated with example in ContainerOp docstring

* fix copyright year

* expose match_serialized_pipelineparam as public for compiler to process serialized pipeline params

* fixed pydoc example and removed unnecessary ContainerOp.container.parent

* Fix conflicts in compiler tests
2019-03-28 11:11:30 -07:00
Alexey Volkov e452385a55 Fixed handling parameters with default values in task factory construction (#1047)
* Fixed handling default inputs in task factory construction

* Added tests.
2019-03-26 19:14:47 -07:00
Ning 1c4f9eb431
exposing type checking (#1022)
* exposing types under dsl.types
2019-03-26 09:33:16 -07:00
Alexey Volkov 9b804688d3 Added the metadata property to ComponentSpec (#1023)
The `metadata` section contains the `annotations` and `labels` dictionaries.
2019-03-23 16:27:05 -07:00
Alexey Volkov 07aa5db70f Fixed bug in docstring construction (#1012) 2019-03-21 14:57:36 -07:00
Alexey Volkov 665d088030 Added the component name to the docstring (#976) 2019-03-19 21:50:24 -07:00
Ning c829115574 Add type check (#938)
* add core types and type checking function

* fix unit test bug

* avoid defining dynamic classes

* typo fix

* add component metadata format

* add a construct for the component decorator

* add default values for the meta classes

* add input/output types to the metadata

* add from_dict in TypeMeta

* small fix

* add unit tests

* use python struct for the openapi schema

* add default in parameter

* add default value

* remove the str restriction for the param default

* bug fix

* add pipelinemeta

* add pipeline metadata

* ignore annotation if it is not str/BaseType/dict

* update param name in the check_type functions
remove schema validators for GCRPath, and adjust for GCRPath, GCSPath
change _check_valid_dict to _check_valid_type_dict to avoid confusion
fix typo in the comments
adjust function order for readability

* remove default values for non-primitive types in the function signature
update the _check_valid_type_dict name

* pass metadata from component decorator and task factory to containerOp

* pass pipeline metadata to Pipeline

* fix unit test

* typo in the comments

* move the metadata classes to a separate module

* fix unit test

* small change

* add __eq__ to meta classes
not export _metadata classes

* nothing

* fix unit test

* unit test python component

* unit test python pipeline

* fix bug: duplicate variable of args

* fix unit tests

* move python_component and _component decorator in _component file

* remove the print

* change parameter default value to None

* add functools wraps around _component decorator

* TypeMeta accept both str and dict

* fix indent, add unit test for type as strings

* do not set default value for the name field in ParameterMeta, ComponentMeta, and PipelineMeta

* add type check in task factory

* output error message

* add type check in component decorator; move the metadata assignment out of the containerop __init__ function

* fix bug; add unit test

* add more unit tests

* more unit tests; fix bugs

* more unit tests; fix bugs

* add unit tests

* more unit tests

* add type check switch; add unit tests

* add compiler option for type check

* resolving pr comments

* add unit test for pipeline param check with component types; fix the bug; also fix the bug that occurs when there is no return annotation
2019-03-11 11:22:12 -07:00
Alexey Volkov 6d080c70f9
Added support for loading zip-packed components (#931)
The zip-packed components are supported in all load_component APIs:
`kfp.components.load_component`
`kfp.components.load_component_from_file`
`kfp.components.load_component_from_url`
`kfp.components.ComponentStore.load_component`
2019-03-06 23:00:03 -08:00
Alexey Volkov fa02e750da SDK/Components - Added naming.generate_unique_name_conversion_table (#716)
generate_unique_name_conversion_table replaces _make_name_unique_by_adding_index and simplifies code in several places.
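
The general idea of such a conversion table can be sketched as below. This is a hypothetical stand-in (`make_unique_name_table`), not the SDK function, and the suffixing scheme is an assumption:

```python
def make_unique_name_table(names, convert):
    """Map each original name to a converted name that is unique within the table."""
    table = {}
    used = set()
    for name in names:
        base = convert(name)
        candidate = base
        index = 2
        # Append an increasing index until the converted name no longer collides.
        while candidate in used:
            candidate = "{}_{}".format(base, index)
            index += 1
        used.add(candidate)
        table[name] = candidate
    return table


def to_identifier(name):
    """Example conversion: lower-case the name and replace spaces with underscores."""
    return name.lower().replace(" ", "_")


table = make_unique_name_table(["Input 1", "input_1", "Other"], to_identifier)
```

Building the whole table up front means callers look names up instead of de-duplicating one name at a time.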
2019-03-06 15:12:58 -08:00
Ning 974d602b74
Pass meta to containerop and pipeline (#905)
pass metadata from python conf to containerop and the pipeline
2019-03-06 13:42:23 -08:00
Alexey Volkov 5ab368ac10 Added support for default values to Lightweight python components (#890) 2019-03-01 14:51:18 -08:00
Alexey Volkov f5bdf2474e Added support for default values to load_component (#889) 2019-03-01 14:12:32 -08:00
Alexey Volkov 85738cbaaf Passing the environment variables to ContainerOp (#877)
When the DSL bridge code was written, ContainerOp did not support env, so we did not pass it. Now we're adding the passing code.
Added a test that checks that the env variables get to the ContainerOp.
2019-02-28 19:29:54 -08:00
Alexey Volkov d15c72470f SDK/Components - Improved error when type checking fails in constructor (#732) 2019-01-25 14:44:15 -08:00
Alexey Volkov edf9b5471a SDK/Components - convert_object_to_struct now uses __init__ to get field list (#733)
This stops serialization of any additional attributes set on an object
2019-01-24 20:01:23 -08:00
Alexey Volkov 8c4f5de1f7 SDK/Components - Command line args can only be strings or placeholders (#711)
Ultimately, command line is an array of strings. Component yaml files should have the arguments as strings instead of Python SDK doing conversion sometimes.
2019-01-24 19:13:50 -08:00
Alexey Volkov 4457e7e940 SDK/Components - More meaningful error when trying to convert graph component to ContainerOp (#710) 2019-01-24 18:15:07 -08:00
Alexey Volkov a53cb586fc SDK/Components - Added _naming._convert_to_human_name function (#715)
* SDK/Components - Moved naming-related functions to _naming.py

* SDK/Components - Added _naming._convert_to_human_name function
2019-01-24 16:07:46 -08:00
Alexey Volkov 32475bfafb SDK/Components/Python - Improved Python2 compatibility (#718)
Improved Python2 compatibility in Lightweight python components
2019-01-24 14:42:03 -08:00
Alexey Volkov 9b4088626c SDK/Components/Python - Made the typing.NamedTuple import optional (#717)
Now it's only imported if the return type is NamedTuple.
2019-01-23 16:31:13 -08:00