* SDK - Refactoring - Passing the parameters explicitly in python_op.
This helps avoid problems when new parameters are added.
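A minimal sketch of why explicit forwarding helps (the function and parameter names below are simplified stand-ins, not the actual kfp signatures):

```python
def build_component(func, base_image=None, packages_to_install=None):
    # Stand-in for the lower-level component builder.
    return {'func': func, 'base_image': base_image,
            'packages_to_install': packages_to_install}

def func_to_container_op(func, base_image=None, packages_to_install=None):
    # Each parameter is forwarded explicitly; adding a new parameter to the
    # signature without threading it through here is caught immediately,
    # instead of being silently swallowed by a **kwargs pass-through.
    return build_component(func=func, base_image=base_image,
                           packages_to_install=packages_to_install)

op = func_to_container_op(lambda: None, packages_to_install=['pandas==0.24'])
```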
* SDK - Components - Added package installation support to func_to_container_op
Example:
```python
op = func_to_container_op(my_func, packages_to_install=['pandas==0.24'])
```
* Make pip quieter
* Added the test_packages_to_install_feature test
Fixed accessing inputs and outputs without checking for None.
Fixed a case where the default value of a graph component input has to be passed to the component as an argument.
* SDK - Lightweight - Convert the names of file inputs and outputs
Removing the "_path" and "_file" suffixes from the names of file inputs and outputs.
Problem: When accepting file inputs (outputs), the function inside the component receives file paths (or file streams), so it's natural to call the function parameter "something_file_path" (e.g. model_file_path or number_file_path).
But from the outside perspective, there are no files or paths - the actual data objects (or references to them) are passed in.
It looks very strange when the argument-passing code reads `component(number_file_path=42)`. This looks like an error, since 42 is not a path; it's not even a string.
It's much more natural to strip the "_file" and "_path" suffixes from the names of file inputs and outputs. Then the argument-passing code looks natural: `component(number=42)`.
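A minimal sketch of the renaming rule (the helper name here is hypothetical, not the actual SDK implementation):

```python
def strip_file_io_suffixes(name: str) -> str:
    # Strip a trailing "_path", then a trailing "_file", so that
    # "number_file_path" -> "number" and "model_file" -> "model".
    for suffix in ('_path', '_file'):
        if name.endswith(suffix):
            name = name[:-len(suffix)]
    return name
```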
* Removed the _FEATURE_STRIP_FILE_IO_NAME_PARTS feature switch
Problem: It's hard to distinguish components loaded by name (e.g. using `ComponentStore`) from components that were never loaded (e.g. ones just created from a Python function).
`component_ref.name` was previously being set, since it was a required parameter.
`component_ref.name` should only be set if component was loaded by name.
* Change the approach for adding license files to the
third_party/metadata_envoy docker image. Instead of downloading the
license files during each build, they are now downloaded once into a
local license.txt file, which is checked in and copied into the docker
image.
* Add necessary data types/tables for pipeline version. Mostly based
on Yang's branch at https://github.com/IronPan/pipelines/tree/kfpci/.
Backward compatible.
* Modified comment
* Modify api converter according to the new pipeline (version) definition
* Change pipeline_store for DefaultVersionId field
* Add pipeline spec to pipeline version
* fix model converter
* fix a comment
* Add foreign key, pagination of list request, refactor code source
* Refactor code source
* Foreign key
* Change code source and package source type
* Fix ; separator
* Add versions table and modify existing pipeline apis
* Remove api pipeline definition change and leave it for a later PR
* Add comment
* Make schema changing and data backfilling a single transaction
* Tolerate null default version id in code
* fix status
* Revise delete pipeline func
* Use raw query to migrate data
* No need to update versions status
* rename and minor changes
* Restore an accidentally removed where clause
* Fix a model name prefix
* Refine comments
* Revise if condition
* Address comments
* address more comments
* Rearrange pipeline and version related parts inside CreatePipeline, to make them more separate.
* Add package url to pipeline version. Required when calling CreatePipelineVersionRequest
* Single code source url; remove pipeline id as sorting field; reformat
* resolve remote branch and local branch diff
* remove unused func
* Remove an empty line
* SDK - Moved the _container_builder from kfp.compiler to kfp.containers
This only moves the files. The imports remain the same for now.
* Simplified the imports.
* Components - Added AutoML Tables components
* Added the sample - AutoML Tables - Retail product stockout prediction
* Replaced the project ID with dummy placeholder
* Fixed the description parameter passing
* Replaced pip with pip3 and changed quotes
* Added licenses
* Updated the component links
* Revert "Replaced pip with pip3"
This partially reverts commit 65ed0a7fc6.
Here, `pip` is not the name of an executable; it's the module name, which is
just `pip`, not `pip3`.
* Changed quotes to single quotes
* Moved the components to the gcp folder
* Switched container images to python:3.7
* Updated component versions in sample
Lightweight components now allow a function to mark some outputs that it wants to produce by writing data to files rather than returning it as in-memory data objects.
This is useful when the data is expected to be big.
Example 1 (writing big amount of data to output file with provided path):
```python
@func_to_container_op
def write_big_data(big_file_path: OutputPath(str)):
    with open(big_file_path, 'w') as big_file:
        for i in range(1000000):
            big_file.write('Hello world\n')
```
Example 2 (writing big amount of data to provided output file stream):
```python
@func_to_container_op
def write_big_data(big_file: OutputTextFile(str)):
    for i in range(1000000):
        big_file.write('Hello world\n')
```
Lightweight components now allow a function to mark some inputs that it wants to consume as files rather than as in-memory data objects.
This is useful when the data is expected to be big.
Example 1:
```python
def consume_big_file_path(big_file_path: InputPath(str)) -> int:
    line_count = 0
    with open(big_file_path) as f:
        while f.readline():
            line_count = line_count + 1
    return line_count
```
Example 2:
```python
def consume_big_file(big_file: InputTextFile(str)) -> int:
    line_count = 0
    while big_file.readline():
        line_count = line_count + 1
    return line_count
```
* Testing - Output Argo workflow information when the workflow times out
* Further improved argo logging in case of failure or timeout
* Fixed bug and addressed the feedback
* Fixed failure and timeout detection
* Making the script quieter
* check-argo-status no longer exits the main script on success
* SDK - Tests - Added better helper functions for testing python components
* SDK - Python components - Properly serializing outputs
Background:
Component arguments are already properly serialized when calling the component program and then deserialized before the execution of the component function.
But the component outputs were only serialized using `str()`, which is inadequate for data types like lists or dictionaries.
This commit fixes the mismatch: the outputs are now serialized the same way as arguments and default values.
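To illustrate the mismatch in plain Python (this is not the kfp serialization code itself): `str()` on a dict produces Python-repr text that a downstream consumer cannot parse, while a real serializer round-trips the value.

```python
import json

output = {'accuracy': 0.9, 'labels': ['a', 'b']}

naive = str(output)          # Python repr with single quotes - not valid JSON
proper = json.dumps(output)  # valid JSON that round-trips

assert json.loads(proper) == output
```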
* SDK - Compiler - Fix large data passing
Stop outputting parameters unless they're consumed as parameters downstream.
This prevents the situation where a component outputs a big file, but the DSL compiler instructs Argo to pick it up as a parameter (parameters can only hold a few kilobytes of data).
As a byproduct, this change fixes some minor compiler data-passing bugs where parameters were passed around but never consumed (this happened with `ResourceOp`, `dsl.Condition`, and recursion).
* Replaced ... with `raise AssertionError`
* Fixed small bug
* Removed unused variables
* Fixed names of the mark_upstream_ios_of_* functions
* Fixed detection of parameter output references
* Fixed handling of volumes
* Utils to convert metadata api from callback paradigm to promise paradigm
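The actual utilities live in the TypeScript frontend; as a language-neutral sketch (in Python, with hypothetical names), converting a callback-style call into one that returns a future looks like:

```python
from concurrent.futures import Future

def get_artifact(artifact_id, callback):
    # Stand-in for a callback-style metadata API call:
    # callback(error, result) is invoked when the call completes.
    callback(None, {'id': artifact_id})

def promisify(fn):
    """Wrap a callback-style function so it returns a Future instead."""
    def wrapped(*args):
        future = Future()
        def callback(error, result):
            if error is not None:
                future.set_exception(error)
            else:
                future.set_result(result)
        fn(*args, callback)
        return future
    return wrapped

get_artifact_async = promisify(get_artifact)
```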
* Show input and output in execution details page
* Change execution detail page input/output table styling
* Make artifact names in execution detail page a deep link
* Change deep link to artifact ID instead
* Fix absolute import
* Fix lint errors