360 Commits

Author SHA1 Message Date
Alexey Volkov c676b838ef SDK - Lightweight - Added package installation support to func_to_container_op (#2245)
* SDK - Refactoring - Passing the parameters explicitly in python_op.
This helps avoid problems when new parameters are added.

* SDK - Components - Added package installation support to func_to_container_op

Example:
```python
op = func_to_container_op(my_func, packages_to_install=['pandas==0.24'])
```

* Make pip quieter

* Added the test_packages_to_install_feature test
2019-09-30 19:13:32 -07:00
Alexey Volkov 646c2890de SDK - Components - Fixed small bugs in graph component resolving (#2269)
Fixed accessing inputs and outputs without checking for None.
Fixed case where the default value of graph component input has to be passed to component as an argument.
2019-09-30 18:33:32 -07:00
Alexey Volkov 06f9322a78 SDK - Lightweight - Convert the names of file inputs and outputs (#2260)
* SDK - Lightweight - Convert the names of file inputs and outputs

Removing the "_path" and "_file" suffixes from the names of file inputs and outputs.
Problem: When accepting file inputs (outputs), the function inside the component receives file paths (or file streams), so it's natural to call the function parameter "something_file_path" (e.g. model_file_path or number_file_path).
But from the outside perspective, there are no files or paths - the actual data objects (or references to them) are passed in.
It looks very strange when argument passing code looks like this: `component(number_file_path=42)`. This looks like an error since 42 is not a path. It's not even a string.
It's much more natural to strip the names of file inputs and outputs of "_file" or "_path" suffixes. Then the argument passing code will look natural: "component(number=42)"

* Removed the _FEATURE_STRIP_FILE_IO_NAME_PARTS feature switch
2019-09-30 16:35:32 -07:00
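The renaming described in #2260 can be sketched roughly as follows. This is an illustration only, not the SDK's code: `strip_file_io_suffix` is a hypothetical helper, and the order in which the suffixes are removed (first `_path`, then `_file`) is an assumption.

```python
def strip_file_io_suffix(name: str) -> str:
    """Hypothetical helper: drop a trailing "_path", then a trailing "_file"."""
    for suffix in ("_path", "_file"):
        # Keep the name intact if stripping would leave an empty string.
        if name.endswith(suffix) and len(name) > len(suffix):
            name = name[: -len(suffix)]
    return name


# Function parameter names on the left, externally visible names on the right.
for parameter_name in ("number_file_path", "model_path", "big_file", "count"):
    print(parameter_name, "->", strip_file_io_suffix(parameter_name))
```

With such a conversion, a function parameter named `number_file_path` would be exposed to callers as `number`.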
Alexey Volkov 2f0f1e47a2 SDK - Components - Stop setting component_ref.name to component name (#2265)
Problem: It's hard to distinguish components loaded by name (e.g. using `ComponentStore`) from components that were never loaded (e.g. just created from python function).
`component_ref.name` was previously being set, since it was a required parameter.
`component_ref.name` should only be set if component was loaded by name.
2019-09-30 15:37:32 -07:00
Alexey Volkov c7b95dd711 SDK - Notebooks - Deprecated the docker magic (#2266) 2019-09-30 13:55:31 -07:00
sina chavoshi 02f5bc08ff remove default namespace (#2250)
* remove default namespace

* updated args description

* updated comment per feedback.
2019-09-27 16:23:40 -07:00
Timur Solovev 389b585de1 SDK: fix label check for ContainerOP entities (#2243) 2019-09-26 13:13:35 -07:00
Alexey Volkov 7735a14694 SDK - Components - Stop serializing string values (#2227)
This can happen with Lightweight component outputs if they've already been serialized manually.
2019-09-25 20:29:06 -07:00
Alexey Volkov 342abae27a SDK - Moved the _container_builder from kfp.compiler to kfp.containers (#2192)
* SDK - Moved the _container_builder from kfp.compiler to kfp.containers
This only moves the files. The imports remain the same for now.

* Simplified the imports.
2019-09-25 18:27:06 -07:00
Alexey Volkov 51585a7023 SDK - Containers - Do not create GCS bucket unless building the image (#1938)
Also, a missing default image name is no longer an error as long as the image name is provided to ContainerBuilder.build.
2019-09-24 20:21:58 -07:00
Alexey Volkov 98fd6c8c32 SDK - Components - Fixed serialization of lists and dicts containing `PipelineParam` items (#2212)
Fixes https://github.com/kubeflow/pipelines/issues/2206
The issue is fixed for both `JSON`-based and `str()`-based serialization.
2019-09-24 19:43:59 -07:00
Alexey Volkov 3caba4e06f SDK - Lightweight - Added support for file outputs (#2221)
Lightweight components now allow a function to mark some outputs that it wants to produce by writing data to files, rather than returning them as in-memory data objects.
This is useful when the data is expected to be big.

Example 1 (writing big amount of data to output file with provided path):
```python
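from kfp.components import func_to_container_op, OutputPath

@func_to_container_op
def write_big_data(big_file_path: OutputPath(str)):
    # The file must be opened for writing; the default mode is read-only.
    with open(big_file_path, 'w') as big_file:
        for i in range(1000000):
            big_file.write('Hello world\n')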
```
Example 2 (writing big amount of data to provided output file stream):
```python
@func_to_container_op
def write_big_data(big_file: OutputTextFile(str)):
    for i in range(1000000):
        big_file.write('Hello world\n')
```
2019-09-24 18:11:58 -07:00
Alexey Volkov 9c9fb9d87e Docs - Added kfp.containers module (#2182)
And fixed the docstring
2019-09-24 13:43:59 -07:00
Alexey Volkov 2510a690f2 SDK - Lightweight - Added support for file inputs (#2207)
Lightweight components now allow a function to mark some inputs that it wants to consume as files, not as in-memory data objects.
This is useful when the data is expected to be big.

Example 1:
```python
def consume_big_file_path(big_file_path: InputPath(str)) -> int:
    line_count = 0
    with open(big_file_path) as f:
        while f.readline():
            line_count = line_count + 1
    return line_count
```
Example 2:
```python
def consume_big_file(big_file: InputTextFile(str)) -> int:
    line_count = 0
    while big_file.readline():
        line_count = line_count + 1
    return line_count
```
2019-09-23 17:59:25 -07:00
Ning 46026e56ae add support for hard and soft constraint in the preemptible nodepools (#2205)
* add support for hard and soft constraint in the preemptible nodepools

* fix unit tests
2019-09-23 15:19:26 -07:00
Alexey Volkov 1c287f2f89 SDK - Components - Simplified arg-parsing code using argparse.SUPPRESS (#2193) 2019-09-23 13:45:24 -07:00
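For context on #2193, here is a minimal sketch of what `argparse.SUPPRESS` does in general. It is not the SDK's generated wrapper code; the `train` function and its options are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# With default=argparse.SUPPRESS, an option that is absent from the command
# line is left out of the namespace instead of being stored as None.
parser.add_argument("--learning-rate", dest="learning_rate", type=float,
                    default=argparse.SUPPRESS)
parser.add_argument("--epochs", dest="epochs", type=int,
                    default=argparse.SUPPRESS)


def train(learning_rate: float = 0.1, epochs: int = 3) -> str:
    """Made-up function whose own defaults should apply to missing options."""
    return "lr={} epochs={}".format(learning_rate, epochs)


parsed = vars(parser.parse_args(["--epochs", "5"]))
# Only "epochs" is present, so the function default for learning_rate is used.
result = train(**parsed)
print(parsed, result)
```

Because missing options never reach the namespace, the parsed arguments can be passed straight to the function without filtering out `None` values first.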
Alexey Volkov c914df542c SDK - Python components - Properly serializing outputs (#2198)
* SDK - Tests - Added better helper functions for testing python components

* SDK - Python components - Properly serializing outputs
Background:
Component arguments are already properly serialized when calling the component program and then deserialized before the execution of the component function.
But the component outputs were only serialized using `str()` which is inadequate for data types like lists or dictionaries.

This commit fixes the mismatch - the outputs are now serialized the same way as arguments and default values.
2019-09-23 12:29:33 -07:00
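The mismatch described in #2198 can be reproduced with the standard library alone. The sketch below assumes JSON as the structured format, which matches the "`JSON`-based" serialization mentioned elsewhere in this log; the sample value is made up.

```python
import json

value = [1, "two", {"three": 3.0}, None]

as_str = str(value)          # Python repr: single quotes, None
as_json = json.dumps(value)  # JSON text: double quotes, null

# The str() form is not valid JSON, so a JSON-based reader cannot parse it.
try:
    json.loads(as_str)
    str_round_trips = True
except json.JSONDecodeError:
    str_round_trips = False

# The JSON form round-trips back to an equal object.
json_round_trips = json.loads(as_json) == value
print(str_round_trips, json_round_trips)
```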
Alexey Volkov db6625ff96 SDK - Removed some dead code (#2194) 2019-09-23 12:29:25 -07:00
IronPan dc5aab6687 Release 57d9f7f1cf (#2184)
* Updated component images to version 57d9f7f1cf

* Updated components to version e598176c02

* good version
2019-09-23 00:19:55 +08:00
Alexey Volkov ef63c653af SDK - Compiler - Fix large data passing (#2173)
* SDK - Compiler - Fix large data passing

Stop outputting parameters unless they're consumed as parameters downstream.
This prevents the situation where a component outputs a big file, but the DSL compiler instructs Argo to pick it up as a parameter (parameters only hold a few kilobytes of data).

As a byproduct, this change fixes some minor compiler data passing bugs where some parameters were being passed around, but never consumed (happened with `ResourceOp`, `dsl.Condition` and recursion).

* Replaced ... with `raise AssertionError`

* Fixed small bug

* Removed unused variables

* Fixed names of the mark_upstream_ios_of_* functions

* Fixed detection of parameter output references

* Fixed handling of volumes
2019-09-20 15:05:27 -07:00
Alexey Volkov e420940d67 SDK - Components - Fixed the output types for outputs with converted names (#2162)
Fixes https://github.com/kubeflow/pipelines/issues/2130
2019-09-18 20:53:00 -07:00
Alexey Volkov dd071d39fa SDK - Containers - Raise exception on job failure (#2144) 2019-09-17 17:07:16 -07:00
Alexey Volkov 642dd13dde SDK - Testing - Fix metadata comparison instability (#2145)
* SDK - Testing - Fix metadata comparison instability

* Stopped comparing annotations at all
2019-09-17 15:37:22 -07:00
Alexey Volkov eae37fba33 SDK - Components - Fixed build_python_component (#2143) 2019-09-17 14:45:15 -07:00
Alexey Volkov 6afb91b902 SDK - Fix pipeline metadata serialization (#2137)
Two PRs have been merged that turned out to be slightly incompatible. This PR fixes the failing tests.
Root causes:
* The pipeline parameter default values were not properly serialized when constructing the metadata object.
* The `ParameterMeta` class did not validate the default value type, so the lack of serialization has not been caught. The `ParameterMeta` was replaced by `InputSpec` which has strict type validation.
* Previously we did not have samples with complex pipeline parameter default values (e.g. lists) that could trigger the failures. Then two samples were added that had complex default values.
* Travis does not re-run tests before merging
* Prow does not re-run Travis tests before merging
2019-09-17 13:07:34 -07:00
Alexey Volkov e3c72fc251 SDK - Persisting all output values (#2134)
Currently, the parameter output values are not saved to storage and their values are lost as soon as garbage collector removes the workflow object.
This change makes it so the parameter output values are persisted.
2019-09-16 19:44:24 -07:00
Alexey Volkov 0e2bf15dbc SDK - Refactoring - Replaced the *Meta classes with the *Spec classes (#1944)
* SDK - Refactoring - Replaced the ParameterMeta class with InputSpec and OutputSpec

* SDK - Refactoring - Replaced the internal PipelineMeta class with ComponentSpec

* SDK - Refactoring - Replaced the internal ComponentMeta class with ComponentSpec

* SDK - Refactoring - Replaced the *Meta classes with the *Spec classes

Replaced the ComponentMeta class with ComponentSpec
Replaced the PipelineMeta class with ComponentSpec
Replaced the ParameterMeta class with InputSpec and OutputSpec

* Removed empty fields
2019-09-16 18:41:12 -07:00
Kevin Bache 2ca7d0ac31 WithParams (#2044)
* first working commit

* incremental commit

* in the middle of converting loop args constructor to accept pipeline param

* both cases working

* output works, passed doesn't

* about to redo compiler section

* rewrite draft done

* added withparam tests

* removed sdk/python/comp.yaml

* minor

* subvars work

* more tests

* removed unneeded artifact outputs from test yaml

* sort keys

* removed dead artifact code
2019-09-16 17:58:22 -07:00
Alexey Volkov c4c0bb8202 SDK - Components - Fixed kfp.components.set_default_base_image (#2118) 2019-09-16 15:30:26 -07:00
Alexey Volkov 60f9da6c74 SDK - Containers - Fixed kfp.containers.get_default_image_builder (#2116) 2019-09-16 14:56:23 -07:00
Alexey Volkov 647867bde1 SDK - Python components - Fixed the default base_image handling (#2119)
In python the default parameter values are only evaluated once.
2019-09-16 13:42:38 -07:00
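The Python behavior behind #2119 can be shown with a small sketch. The function names and image names below are made up for illustration and are not the SDK's API.

```python
default_base_image = "python:3.7"


def build_eager(base_image=default_base_image):
    # The default was captured once, when the function was defined.
    return base_image


def build_lazy(base_image=None):
    # Looking the value up inside the body sees later changes.
    if base_image is None:
        base_image = default_base_image
    return base_image


# Changing the module-level default after the functions were defined.
default_base_image = "tensorflow/tensorflow:1.14.0-py3"

eager_result = build_eager()
lazy_result = build_lazy()
print(eager_result, lazy_result)
```

Only `build_lazy` picks up the updated value, which is why a configurable default is usually resolved at call time.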
IronPan fb9eb2d0a2 close the thread to get gcloud auth token (#2084)
Error message seen in some cases:
```
/tmpfs/BUILD_ENV/lib/python3.5/site-packages/kfp/_auth.py:34: ResourceWarning: unclosed file <_io.TextIOWrapper name=6 encoding='UTF-8'>
  return os.popen('gcloud auth print-access-token').read().rstrip()

```
2019-09-15 08:54:26 +08:00
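One way to avoid the `ResourceWarning` quoted in #2084 is to let `subprocess.run` manage the pipe. This is a sketch of the general technique and not necessarily how the commit fixes it; `read_command_output` is a hypothetical helper, and a harmless Python one-liner stands in for the `gcloud` call.

```python
import subprocess
import sys


def read_command_output(command):
    """Hypothetical helper: run a command and return its trimmed stdout."""
    # subprocess.run waits for the child process and closes its pipes,
    # so no file object is left open for the garbage collector to report.
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return completed.stdout.rstrip()


# Stand-in command; the real code would invoke the gcloud CLI instead.
token = read_command_output([sys.executable, "-c", "print('fake-token')"])
print(token)
```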
Alexey Volkov 3ec743a3e4 SDK - Started to explicitly import submodules into kfp namespace (#2117) 2019-09-13 19:06:39 -07:00
Alexey Volkov 77d0ee014e SDK - Lightweight - Made wrapper code compatible with python2 (#2035) 2019-09-13 16:44:40 -07:00
Ning fae0361fbf fix bug: list is not expecting keyword arg (#2107) 2019-09-13 13:54:29 -07:00
Ning 06aeb0a052 update sdk versions (#2100) 2019-09-13 10:15:44 -07:00
Jiaxiao Zheng 1449d08aee Fix the logic of passing default values of pipeline parameters. (#2098)
* Fix the logic of passing default values.

* Modify unit test

* Solve.
2019-09-12 17:10:33 -07:00
Alexey Volkov 1962715688 SDK - Stop adding empty descriptions and inputs (#1969) 2019-09-11 09:58:49 -07:00
Jiaxiao Zheng 497d016e85 Expose an API for appending params/names/descriptions in a programmable way. (#2082)
* Refactor. Expose a public API to append pipeline param without interacting with dsl.Pipeline obj.

* Add unit test and fix.

* Fix docstring.

* Fix test

* Fix test

* Fix two nit problems

* Refactor
2019-09-10 17:58:47 -07:00
Alexey Volkov a3c83f50b6 SDK - Testing - Run some unit-tests in a more correct way (#2036)
* SDK - Testing - Run some unit-tests in a more correct way
Replaced `@unittest.expectedFailure` with `with self.assertRaises(...):`.
Replaced `assert` with `self.assertEqual(...)`.
Stopped producing the stray "comp.yaml" file.
Enabled the test_load_component_from_url test.

* Removed a stray comment

* Added two tests for output_component_file
2019-09-10 08:35:05 -07:00
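The testing style adopted in #2036 looks roughly like the sketch below. The function under test, `parse_positive`, is made up for illustration; only the `unittest` calls are real.

```python
import unittest


def parse_positive(text):
    """Made-up function under test: parse a strictly positive integer."""
    value = int(text)
    if value <= 0:
        raise ValueError("expected a positive integer")
    return value


class ParsePositiveTest(unittest.TestCase):
    def test_rejects_negative(self):
        # Passes only if this exact block raises ValueError, unlike
        # @unittest.expectedFailure, which accepts any failure in the test.
        with self.assertRaises(ValueError):
            parse_positive("-1")

    def test_parses_number(self):
        # assertEqual reports both values on failure and is not removed
        # when Python runs with -O, unlike a bare assert statement.
        self.assertEqual(parse_positive("42"), 42)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePositiveTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```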
Alexey Volkov d83601d19a SDK - Compiler - Quoting the predicate operands (#2043)
Fixes https://github.com/kubeflow/pipelines/issues/1950
2019-09-06 17:05:21 -07:00
Alexey Volkov 979396702e SDK - Compiler - Failing when PipelineParam is unresolved (#2055)
Instead of silently producing a broken pipeline package, the compiler now raises error and instructs the user to submit a bug report.
2019-09-06 15:51:20 -07:00
Alexey Volkov 08104d6cf9 SDK - Containers - Build python container image based on current working directory (#1970)
* SDK - Containers - Build container image from current environment

* Removed the ability to capture the active python environment (as requested by @hongye-sun)

* Added the type hint and docstring for the return type.

* Renamed `build_image_from_env` function to `build_image_from_working_dir`
as requested by @hongye-sun

* Explained the function behavior in the documentation.

* Removed extra empty line

* Improved caching by copying python files only after installing python packages

* Made test more portable

* Added support for specifying the base_image
`kfp.containers.default_base_image = ...`
The image can also be a callable returning the image name.

* Renamed `get_python_image` to `get_python_image_for_current_version`

* Switched the default base image to Google Deep Learning container image as requested by @hongye-sun
The size of this image is 4.35GB which really concerns me. The GPU image size is 6.45GB.

* Stopped importing kfp.containers.* into kfp.*

* Fixed test

* Fixed the regex string

* Fixed the type annotation style

* Addressed @hongye-sun feedback

* Removed the container image size warning

* Fixed import failure
2019-09-06 15:19:19 -07:00
Jiaxiao Zheng bd9d6319c8 Refactor kfp.compiler for better modularity (#2052)
* init analyze

* Refactor

* Renaming
2019-09-06 13:52:23 -07:00
Alexey Volkov 6c15f27f7e SDK - Components - Hiding signature attribute from CloudPickle (#2045)
* SDK - Components - Hiding signature attribute from CloudPickle

Cloudpickle has some issues with pickling type annotations in python versions < 3.7, so they disabled it. https://github.com/cloudpipe/cloudpickle/issues/196
`create_component_from_airflow_op` spoofs the function signature by setting the `func.__signature__` attribute. cloudpickle then tries to pickle that attribute which leads to failures during unpickling.
To prevent this we remove the `.__signature__` attribute before pickling.

* Added comments

        # Hack to prevent cloudpickle from trying to pickle generic types that might be present in the signature. See https://github.com/cloudpipe/cloudpickle/issues/196 
        # Currently the __signature__ is only set by Airflow components as a means to spoof/pass the function signature to _func_to_component_spec
2019-09-06 11:12:15 -07:00
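The idea in #2045 of hiding `__signature__` from a serializer can be sketched with the standard library. `without_spoofed_signature` is a hypothetical helper, and restoring the attribute afterwards is an assumption of this sketch; the commit message only says the attribute is removed before pickling.

```python
import contextlib
import inspect


@contextlib.contextmanager
def without_spoofed_signature(func):
    """Hypothetical helper: temporarily remove func.__signature__."""
    missing = object()
    saved = func.__dict__.pop("__signature__", missing)
    try:
        yield func
    finally:
        # Put the attribute back only if there was one to begin with.
        if saved is not missing:
            func.__signature__ = saved


def operator_wrapper(*args, **kwargs):
    return args, kwargs


def reference(bucket: str, retries: int = 3):
    pass


# Spoof the wrapper's signature, as described in the commit message.
operator_wrapper.__signature__ = inspect.signature(reference)
spoofed = str(inspect.signature(operator_wrapper))

with without_spoofed_signature(operator_wrapper) as func:
    # A serializer called here would not see the spoofed signature.
    visible_inside = str(inspect.signature(func))

restored = str(inspect.signature(operator_wrapper))
print(spoofed, visible_inside, restored)
```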
Alexey Volkov 5360f3fcab SDK - Compiler - Stopped adding mlpipeline artifacts to every compiled template (#2046)
* Explicitly added mlpipeline outputs to the components that actually produce them

* Updated samples

* SDK - DSL - Stopped adding mlpipeline artifacts to every compiled template
Fixes https://github.com/kubeflow/pipelines/issues/1421
Fixes https://github.com/kubeflow/pipelines/issues/1422

* Updated the Lightweight sample

* Updated the compiler tests

* Fixed the lightweight sample

* Reverted the change to one contrib/samples/openvino
The sample will still work fine as it is now.
I'll add the change to that file as a separate PR.
2019-09-05 17:56:57 -07:00
Alexey Volkov f911742d1a SDK - Compiler - Fixed handling of PipelineParams in artifact arguments (#2042)
Previously only constant strings were supported and serialized PipelineParams were not resolved, producing incorrect workflows.
2019-09-05 15:16:58 -07:00
Alexey Volkov 301186cc87 SDK - Refactoring - Reduced the usage of dsl.Pipeline context (#2034)
Also reduced the unnecessary explicit usage of PipelineParam by the end users.
2019-09-05 01:26:52 -07:00
Alexey Volkov 9104fd327f SDK - Testing - Make dsl and compiler tests discoverable by unittest (#2038)
This makes it possible to execute all tests by running `python3 -m unittest discover --verbose -p *test*.py`
2019-09-04 12:38:22 -07:00
Ilias Katsakioris df4bc2365e SDK/DSL: Fix bug when using PipelineParam in `pvc` of PipelineVolume (#2018)
If no `name` is provided to PipelineVolume constructor, a custom name is
generated. It relies on `json.dumps()` of the struct after getting
converted to dict.
When `pvc` is provided and `name` is not, the following error is raised:
  TypeError: Object of type PipelineParam is not JSON serializable

This commit fixes it and extends tests to catch it.
2019-09-04 11:32:23 -07:00
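The failure mode in #2018 can be reproduced without the SDK. In the sketch below, `FakePipelineParam` is a stand-in class and `generate_volume_name` is a hypothetical helper; the `default=str` fallback, the placeholder text, and the hash-based name are assumptions chosen for illustration, not the fix the commit applies.

```python
import hashlib
import json


class FakePipelineParam:
    """Stand-in for a placeholder object; not the SDK's PipelineParam."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "{{pipelineparam:" + self.name + "}}"


def generate_volume_name(volume_spec: dict) -> str:
    """Hypothetical helper: derive a stable name from a volume spec."""
    # default=str is called for objects json cannot encode natively.
    serialized = json.dumps(volume_spec, sort_keys=True, default=str)
    return "pvolume-" + hashlib.sha256(serialized.encode()).hexdigest()[:12]


spec = {"persistentVolumeClaim": {"claimName": FakePipelineParam("pvc_name")}}

# Plain json.dumps raises TypeError on the custom object.
try:
    json.dumps(spec)
    plain_dumps_failed = False
except TypeError:
    plain_dumps_failed = True

name = generate_volume_name(spec)
print(plain_dumps_failed, name)
```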