pipelines

Commit Graph

Author	SHA1	Message	Date
Alexey Volkov	691eefc599	fix(sdk): Components - Fixed python components that use \n. Fixes #4939 (#4993 ) * SDK - Components - Fixed python components that use \n The escape sequence was being replaced by the `echo` command. Apparently, unlike in the `bash` shell, the `echo` command of the `sh` shell expands the escape sequences by default and does not support an option to turn it off. (For some reason the -n option works properly even though it should not). Fixes https://github.com/kubeflow/pipelines/issues/4939 * Fixed the test data * Fixed the deprecated container component builder * Fixed the new compiler test case * Added test	2021-01-14 18:21:51 -08:00
Jiaxiao Zheng	279694ec6d	feat(sdk): Container entrypoint used for new styled KFP component authoring (#4978 ) * skeleton * add entrypoint utils to parse param * wip: artifact parsing * add input param artifacts passing and clean unused code * wip * add output artifact inspection * add parameter output * finish entrypoint implementation * add entrypoint_utils_test.py * add entrypoint test * add entrypoint test * get rid of tf * fix test * fix file location * fix tests * fix tests * resolving comments * Partially rollback * resolve comments in entrypoint.py * resolve comments	2021-01-14 16:01:21 -08:00
Alexey Volkov	d629397654	feat(sdk): Components - Support annotations when creating components from python (#4996 ) The component specification has always supported component annotations, but there was no way to specify them for the components generated from python. This PR fixes that.	2021-01-14 13:59:31 -08:00
Alexey Volkov	7a66414cf7	feat(sdk): Components - Restored stack traces in lightweight python components. Fixes #4273 , #4849 (#4861 ) Currently we're running the python code inline using `python -c <code>`. This has two issues: 1) Python does not show source code line in exception stack traces 2) inspect.getsource does not work. This method is used in PyTorch JIT for example. We solve these issues by writing the code into a file before executing it. The disadvantage of the new approach is that it adds complexity, a filesystem write operation and also requires the `sh` executable to be present (we could replace it with python-based program if needed).	2020-12-14 14:33:49 -08:00
Jiaxiao Zheng	fb15223f7e	chore: Add doc strings marking the feature stages for SDK. (#4575 ) * add doc strings * Simplify the docstring * fix unittest * recover cli.py * recover cli.py * substitute docstring in resource ops with TODOs * revert stable labels	2020-11-24 00:19:00 -08:00
Alexey Volkov	8699a05c27	fix(sdk): Components - Fixed handling of typing.NamedTuple in Python 3.9 (#4614 ) Python 3.9 has dropped support for `typing.NamedTuple(...)`.`_field_types` in favor of `__annotations__` which in turn does not exist in Python 3.5.	2020-10-17 00:32:12 -07:00
Abhishek Vilas Munagekar	5613db02bc	feat(sdk): Python components - Parse component input/output descriptions from the function docstring (#4512 ) * cleanup imports * add description to inputs and outputs * update requirements * add test * improve component description * update tests * review changes: fix lint and requirements * upgrade docstring-parser	2020-09-19 23:22:29 -07:00
Alexey Volkov	03325848fc	feat(sdk): Components - Prevent passing unserializable objects to components. Fixes #4040 (#4496 )	2020-09-16 02:23:22 -07:00
Alex Latchford	704c8c7660	chore: Clean up KFP SDK docstrings, make formatting a little more consistent (#4218 ) * Prepare SDK docs environment so it's easier to understand how to build the docs locally so they're consistent with ReadTheDocs. * Clean up docstrings for kfp.Client * Add in updates to the docs for compiler and components * Update components area to add in code references and make formatting a little more consistent. * Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks * Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks * Remove unused kfp.notebook package links * Clean up a few more errant references * Clean up the DSL docs some more * Update SDK docs for KFP extensions to follow Sphinx guidelines * Clean up formatting of docstrings after Ark-Kun's comments	2020-08-04 00:33:47 +08:00
Alexey Volkov	aeb0401c8a	SDK - Components - Fixed examples in docstrings (#4074 )	2020-07-14 14:27:21 -07:00
Alexey Volkov	ceb860c594	SDK - Components - Python - Switched the default base image to python 3.7 (4054) Previously the default image was set to an old version of tensorflow image. That image is now outdated. It's also framework-specific and pretty big. We're switching to the official python image which is small, official and framework-agnostic. The users can easily switch to the old behavior by just specifying `base_image='tensorflow/tensorflow:1.13.2-py3'` during the component creation.	2020-06-25 15:15:31 -07:00
Alexey Volkov	da4acbbd73	SDK - Python Components - Stop generating output saving code if no outputs (#3836 ) Removed dead code from the generated python command-line wrapper.	2020-05-28 23:47:15 -07:00
Alexey Volkov	55d41df83d	SDK - Components - Removed the deprecated _python_op.get_default_base_image and set_default_base_image functions (#3773 )	2020-05-17 20:23:36 -07:00
Alexey Volkov	b9aa106bb5	SDK - Prioritize lib2to3 when stripping type annotations (#3724 ) * SDK - Prioritize lib2to3 when stripping type annotations It's a standard python library (although not well supported) and it does not leave trailing spaces. * Fixed compiler test data	2020-05-11 18:44:20 -07:00
Alexey Volkov	5ff7a65a0c	SDK - Components - Fixed bug in _strip_type_hints_using_lib2to3 (#3679 )	2020-05-04 22:41:08 -07:00
Alexey Volkov	9619655ed5	SDK - Enabled file inputs to be optional (#3620 ) * SDK - Enabled file inputs to be optional * Added unit tests	2020-04-27 19:34:04 -07:00
Alexey Volkov	be12ccf2a1	SDK - Moved the @python_component decorator test to dsl tests (#3324 ) * SDK - Moved the @python_component decorator test to dsl tests * Deprecate @python_component	2020-03-21 08:14:43 -07:00
Alexey Volkov	119e329108	SDK - Components - Fixed handling collection return values (#3263 ) * SDK - Components - Fixed handling collection return values Fixes https://github.com/kubeflow/pipelines/issues/3262 * Fixed the tests	2020-03-12 23:50:39 -07:00
Alexey Volkov	578d8de91d	SDK - Reduce python component limitations - no import errors for cust… (#3106 ) * SDK - Reduce python component limitations - no import errors for custom type annotations By default, create_component_from_func copies the source code of the function and creates a component using that source code. No global imports are captured. This is problematic for the function definition, since any annotation, that uses a type that needs to be imported, will cause error. There were some special provisions for NamedTuple, InputPath and OutputPath, but even they were brittle (for example, "typing.NamedTuple" or "components.InputPath" annotations still caused failures at runtime). This commit fixes the issue by stripping the type annotations from function declarations. Fixes cases that were failing before: ```python import typing import collections MyFuncOutputs = typing.NamedTuple('Outputs', [('sum', int), ('product', int)]) @create_component_from_func def my_func( param1: CustomType, # This caused failure previously param2: collections.OrderedDict, # This caused failure previously ) -> MyFuncOutputs: # This caused failure previously pass ``` * Fixed the compiler tests * Fixed crashes on print function Code `print(line, end="")` was causing error: "lib2to3.pgen2.parse.ParseError: bad input: type=22, value='=', context=('', (2, 15))" * Using the strip_hints library to strip the annotations * Updating test workflow yamls * Workaround for bug in untokenize * Switched to the new strip_string_to_string method * Fixed typo. Co-Authored-By: Jiaxiao Zheng <jxzheng@google.com> Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>	2020-02-24 20:50:48 -08:00
Alexey Volkov	7ee3244f5b	SDK - Components - Fixed dict-style type annotations (#3107 ) Refactored `_data_passing.py` interface to expose functions instead of dictionaries.	2020-02-18 20:40:25 -08:00
Alexey Volkov	c83aff2738	SDK - Components - Made it easier to access component spec classes (#2860 ) * SDK - Components - Made it easier to access component spec classes * Updated the imports	2020-01-31 11:41:21 -08:00
Alexey Volkov	6c72cc874a	SDK - Components - Added the create_component_from_func alias (#2911 ) Added the `create_component_from_func` function as alias for `func_to_container_op`. It behaves exactly the same, but the name now does not imply that you'll always get `ContainerOp` from it. Some function parameters are not added at this moment as they're not widely used and might be deprecated in the future.	2020-01-27 17:41:38 -08:00
Alexey Volkov	27f7e77356	SDK - Unified the function signature parsing implementations (#2689 ) * Replaced `_instance_to_dict(obj)` with `obj.to_dict()` * Fixed the capitalization in _python_function_name_to_component_name It now only changes the case of the first letter. * Replaced the _extract_component_metadata function with _extract_component_interface * Stopped adding newline to the component description. * Handling None inputs and outputs * Not including empty inputs and outputs in component spec * Renamed the private attributes that the @pipeline decorator sets * Changed _extract_pipeline_metadata to use _extract_component_interface * Fixed issues based on feedback	2019-12-27 10:05:40 -08:00
Alexey Volkov	605ef804c6	Fixed the capitalization in _python_function_name_to_component_name (#2688 ) It now only changes the case of the first letter.	2019-12-10 12:00:11 -08:00
Alexey Volkov	61506a0e88	SDK - Components - Fixed YAML formatting for some components (#2529 ) * SDK - Components - Fixed YAML formatting for some components This fixes formatting for components where function does not have a return annotation. The low-level cause of issue: Trailing whitespace when there are no serializers. Trailing whitespace triggers ugly YAML string formatting. * Addressed feedback	2019-11-07 14:48:19 -08:00
Alexey Volkov	1282f16335	SDK - Python components - Fixed bug when mixing file outputs with return value outputs (#2473 )	2019-10-23 19:45:05 -07:00
Alexey Volkov	f4d689b4ed	SDK - Python components - Fixed handling multiline decorators (#2345 ) * SDK - Python components - Fixed handling multiline decorators * Switched to using dedent * Added error checking * Testing multiline decorator * Test calling the component created from decorated function Also fixed `helper_test_component_against_func_using_local_call`.	2019-10-16 12:17:29 -07:00
Alexey Volkov	8b0cb8a5b5	SDK - Components - Deprecate the get and set methods for default image in favor of plain variable (#2257 )	2019-10-04 15:35:12 -07:00
Alexey Volkov	b2f1d0071f	SDK - Components - Added the ComponentSpec.save method (#2264 ) * SDK - Components - Added the ComponentSpec.save method * Fixed write call	2019-10-03 15:25:55 -07:00
Alexey Volkov	c676b838ef	SDK - Lightweight - Added package installation support to func_to_container_op (#2245 ) * SDK - Refactoring - Passing the parameters explicitly in python_op. This helps avoid problems when new parameters are added. * SDK - Components - Added package installation support to func_to_container_op Example: ```python op = func_to_container_op(my_func, packages_to_install=['pandas==0.24']) ``` * Make pip quieter * Added the test_packages_to_install_feature test	2019-09-30 19:13:32 -07:00
Alexey Volkov	06f9322a78	SDK - Lightweight - Convert the names of file inputs and outputs (#2260 ) * SDK - Lightweight - Convert the names of file inputs and outputs Removing the "_path" and "_file" suffixes from the names of file inputs and outputs. Problem: When accepting file inputs (outputs), the function inside the component receives file paths (or file streams), so it's natural to call the function parameter "something_file_path" (e.g. model_file_path or number_file_path). But from the outside perspective, there are no files or paths - the actual data objects (or references to them) are passed in. It looks very strange when argument passing code looks like this: `component(number_file_path=42)`. This looks like an error since 42 is not a path. It's not even a string. It's much more natural to strip the names of file inputs and outputs of "_file" or "_path" suffixes. Then the argument passing code will look natural: "component(number=42)" * Removed the _FEATURE_STRIP_FILE_IO_NAME_PARTS feature switch	2019-09-30 16:35:32 -07:00
Alexey Volkov	3caba4e06f	SDK - Lightweight - Added support for file outputs (#2221 ) Lightweight components now allow function to mark some outputs that it wants to produce by writing data to files, not returning it as in-memory data objects. This is useful when the data is expected to be big. Example 1 (writing big amount of data to output file with provided path): ```python @func_to_container_op def write_big_data(big_file_path: OutputPath(str)): with open(big_file_path, 'w') as big_file: for i in range(1000000): big_file.write('Hello world\n') ``` Example 2 (writing big amount of data to provided output file stream): ```python @func_to_container_op def write_big_data(big_file: OutputTextFile(str)): for i in range(1000000): big_file.write('Hello world\n') ```	2019-09-24 18:11:58 -07:00
Alexey Volkov	2510a690f2	SDK - Lightweight - Added support for file inputs (#2207 ) Lightweight components now allow function to mark some inputs that it wants to consume as files, not as in-memory data objects. This is useful when the data is expected to be big. Example 1: ```python def consume_big_file_path(big_file_path: InputPath(str)) -> int: line_count = 0 with open(big_file_path) as f: while f.readline(): line_count = line_count + 1 return line_count ``` Example 2: ```python def consume_big_file(big_file: InputTextFile(str)) -> int: line_count = 0 while big_file.readline(): line_count = line_count + 1 return line_count ```	2019-09-23 17:59:25 -07:00
Alexey Volkov	1c287f2f89	SDK - Components - Simplified arg-parsing code using argparse.SUPPRESS (#2193 )	2019-09-23 13:45:24 -07:00
Alexey Volkov	c914df542c	SDK - Python components - Properly serializing outputs (#2198 ) * SDK - Tests - Added better helper functions for testing python components * SDK - Python components - Properly serializing outputs Background: Component arguments are already properly serialized when calling the component program and then deserialized before the execution of the component function. But the component outputs were only serialized using `str()` which is inadequate for data types like lists or dictionaries. This commit fixes the mismatch - the outputs are now serialized the same way as arguments and default values.	2019-09-23 12:29:33 -07:00
Alexey Volkov	db6625ff96	SDK - Removed some dead code (#2194 )	2019-09-23 12:29:25 -07:00
Alexey Volkov	c4c0bb8202	SDK - Components - Fixed kfp.components.set_default_base_image (#2118 )	2019-09-16 15:30:26 -07:00
Alexey Volkov	647867bde1	SDK - Python components - Fixed the default base_image handling (#2119 ) In python the default parameter values are only evaluated once.	2019-09-16 13:42:38 -07:00
Alexey Volkov	77d0ee014e	SDK - Lightweight - Made wrapper code compatible with python2 (#2035 )	2019-09-13 16:44:40 -07:00
Alexey Volkov	6c15f27f7e	SDK - Components - Hiding signature attribute from CloudPickle (#2045 ) * SDK - Components - Hiding signature attribute from CloudPickle Cloudpickle has some issues with pickling type annotations in python versions < 3.7, so they disabled it. https://github.com/cloudpipe/cloudpickle/issues/196 `create_component_from_airflow_op` spoofs the function signature by setting the `func.__signature__` attribute. cloudpickle then tries to pickle that attribute which leads to failures during unpickling. To prevent this we remove the `.__signature__` attribute before pickling. * Added comments # Hack to prevent cloudpickle from trying to pickle generic types that might be present in the signature. See https://github.com/cloudpipe/cloudpickle/issues/196 # Currently the __signature__ is only set by Airflow components as a means to spoof/pass the function signature to _func_to_component_spec	2019-09-06 11:12:15 -07:00
Alexey Volkov	5dbea6cb91	SDK - Components - Setting default base image or image factory (#1937 ) Added kfp.components.set_default_base_image which sets the name of the container image that will be used for component creation when base_image is not specified. Alternatively, the base image can also be set to a factory function that will be returning the image. The support is added for both Lightweight components and python container components.	2019-08-26 17:48:40 -07:00
Alexey Volkov	17e18a162e	SDK - Components - Improved serialization and deserialization of arguments and defaults (#1934 ) * SDK - Components - Improved serialization and deserialization of arguments and defaults Properly serialize default values and passed arguments using the same code. Check the types of passed argument values and issue warnings. Improved argument reference type compatibility checking. When types do not match there is always either error or warning. When creating component from python function, the input types are now canonicalized. * Addressed the feedback	2019-08-23 18:18:25 -07:00
Alexey Volkov	203307dbaf	SDK - Lightweight - Fixed custom types in multi-output case (#1875 ) The type was mistakenly serialized as `_ForwardRef('CustomType')`. The input parameter types and single-output types were not affected.	2019-08-21 16:37:21 -07:00
Alexey Volkov	7917ea475e	SDK - Lightweight - Added support for complex default values (#1696 )	2019-08-12 02:35:13 -07:00
Alexey Volkov	dd59bc2597	SDK - Lightweight - Fixed regression for components without outputs (#1726 )	2019-08-05 21:47:53 -07:00
Alexey Volkov	94969d6264	SDK/Lightweight - Updated default image to tensorflow:1.13.2-py3 (#1671 )	2019-08-02 18:31:53 -07:00
Alexey Volkov	f6cf9c5f55	SDK - Lightweight - Added support for "None" default values (#1626 ) * SDK - Lightweight - Added support for "None" default values Previously it was impossible to pass None to components since it was being converted to the string "None". * is_required = not input.optional for now As asked by @gaoning777	2019-07-25 18:49:59 -07:00
Alexey Volkov	3aeab312f2	SDK/Lightweight - Use argparse for command-line parsing (#1534 ) It's required to correctly handle None arguments or None default values (also needed for optional and variable-number inputs). It's easier to understand and generates better command-line code.	2019-06-23 16:45:53 -07:00
Alexey Volkov	ce8df162a9	SDK/Lightweight - Added python version compatibility checks (#1524 ) * SDK - Refactored the code in kfp.components._python_op._capture_function_code_using_cloudpickle * SDK/Lightweight - Added python version compatibility checks See my compatibility analysis: https://github.com/cloudpipe/cloudpickle/issues/293	2019-06-23 14:41:54 -07:00
Alexey Volkov	627b412f24	SDK/Lightweight - Disabled code pickling by default (#1512 ) I've introduced code pickling to capture dependencies in https://github.com/kubeflow/pipelines/pull/1372 Later I've discovered that there is a serious opcode incompatibility between python versions 3.5 and 3.6+. See my analysis of the issue: https://github.com/cloudpipe/cloudpickle/issues/293 Due to this issue I decided to switch back to using source code copying by default and to continue improving it. Until we stop supporting python 3.5 (https://github.com/kubeflow/pipelines/pull/668) it's too dangerous to use code pickling by default. Code pickling can be enabled by specifying `pickle_code=True` when calling `func_to_container_op`	2019-06-18 19:44:30 -07:00

69 Commits