155 Commits

Author SHA1 Message Date
Alexey Volkov 8ba366b03f
SDK - Made outputs with original names available in ContainerOp.outputs (#3734)
* SDK - Made outputs with original names available in ContainerOp.outputs

Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
Now ContainerOp supports any output names, so we're now using the original output names.
However, to support legacy pipelines, we're also adding output references with pythonic names.
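
A minimal sketch of the dual-name idea described above, assuming a hypothetical sanitizer (`to_pythonic_name`) and plain tuples as stand-ins for task output references. It is not the SDK's actual implementation; it only illustrates how original names and legacy pythonic aliases might coexist in one mapping.

```python
import re


def to_pythonic_name(name):
    """Hypothetical sanitizer: lowercase, runs of other characters become '_'."""
    return re.sub(r'[^a-z0-9]+', '_', name.lower()).strip('_')


def build_output_refs(task_name, output_names):
    """Map both original and pythonic names to the same output reference."""
    refs = {}
    for name in output_names:
        ref = (task_name, name)  # stand-in for a real task output reference
        refs[name] = ref  # the original name always resolves to its own output
        # Legacy alias; setdefault never overwrites an existing entry.
        refs.setdefault(to_pythonic_name(name), ref)
    return refs


outputs = build_output_refs('train', ['Model URI', 'accuracy'])
```

Here `outputs['Model URI']` and `outputs['model_uri']` point at the same reference, so legacy pipelines that use the pythonic name keep working.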

* Fixed the compiler test data

* Fixed the duplicate parameter outputs in the compiled workflow

* Fixed long line

* Stabilized the output naming conflict resolution

* Fix case of missing special outputs
2020-05-12 19:08:26 -07:00
Alexey Volkov 2279bde698
SDK - Annotate pods with component_ref (#3727)
* SDK - Annotate pods with component_ref

This preserves the information about the digest of the component and the location from which the component was loaded.

* Fixed compiler tests
2020-05-11 17:18:21 -07:00
Niklas Hansson 05c1537f28
Add Nodeselector to pipelineconfig fix issue #2863 (#3616)
* updated version

* added pipeline nodeselector

* removed old legacy

* renaming

* update test

* Update sdk/python/kfp/compiler/compiler.py
2020-05-05 00:11:08 -07:00
Eterna2 9167da1b4e
Support execution throttling for executing the pipelines (#3346) (#3439)
* Add parallelism limits to pipeline in kfp sdk

* fix lint error
2020-05-04 23:25:08 -07:00
Jiaxiao Zheng aa8da64b4c
[SDK] Add pod labels for telemetry purpose. (#3578)
* add telemetry pod labels

* revert the id label

* update compiler tests

* update cli arg

* bypass tfx

* update docstring
2020-04-27 18:50:04 -07:00
Alexey Volkov 6cb92d45c8
SDK - Compiler - Include the SDK version information in the compiled workflows (#3583)
* SDK - Compiler - Include the SDK version information in the compiled workflows

* Fixed the unit tests

* Removed the sdk_version annotation.
2020-04-25 01:49:28 -07:00
Niklas Hansson 2354776e1e
fix #2802: Set ImagePullPolicy per pipeline. (#3534)
* bump version

* default image pull policy

* Update sdk/python/kfp/dsl/_pipeline.py

* task setting should dominate

* Update sdk/python/kfp/dsl/_pipeline.py

* fixed merge mistake
2020-04-23 07:09:13 -07:00
Alexey Volkov b63ad7e614
SDK - Removed the ArtifactLocation feature (#3517)
* SDK - Removed the ArtifactLocation feature

The feature was deprecated in v0.1.34 https://github.com/kubeflow/pipelines/pull/2326

* Removed the artifact_location sample
2020-04-23 00:49:44 -07:00
Alexey Volkov 08c7c0ef36
SDK - Made YAML dumping more awesome (#3520)
See the root cause explanation in https://github.com/kubeflow/pipelines/issues/3519
2020-04-16 21:23:07 -07:00
Alexey Volkov 95aec25db4
SDK - Support kubernetes client v11 (#3319)
Fixes https://github.com/kubeflow/pipelines/issues/3275
2020-03-22 02:54:44 -07:00
Alexey Volkov 734b43e3db
SDK - Added support for maxCacheStaleness (#3318)
* SDK - Added support for maxCacheStaleness

* Added the vendor prefix to the annotation
2020-03-20 13:38:09 -07:00
Alexey Volkov 03e064cee2
SDK - Compiler - Fix incompatibility with python3.5 (#3122) 2020-02-19 13:55:47 -08:00
Alexey Volkov a33ae25bc4
SDK - Compiler - Add optional Argo validation (#3094)
The argo CLI tool must be in PATH for this feature to work.
2020-02-18 23:12:25 -08:00
Alexey Volkov 4a1b282461
SDK - Compiler - Fixed ParallelFor argument resolving (#3029)
* SDK - Compiler - Fixed ParallelFor name clashes

The ParallelFor argument reference resolving was really broken.
The logic "worked" like this - if the name of the referenced output
contained the name of the loop collection source output, then it was
considered to be the reference to the loop item.
This broke lots of scenarios especially in cases where there were
multiple components with the same output name (e.g. the default "Output"
output name). The logic also did not distinguish between references to
the loop collection item vs. references to the loop collection source
itself.

I've rewritten the argument resolving logic, to fix the issues.
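
To illustrate why the old heuristic misfired, here is a small sketch. The function names, the `(task, output)` tuples and the `Output-loop-item` naming are all hypothetical; this is not the rewritten resolver, only a contrast between substring matching and comparing full references.

```python
def looks_like_loop_item_by_substring(ref_output_name, loop_source_output_name):
    # The heuristic described above: substring containment on output names.
    return loop_source_output_name in ref_output_name


def is_loop_item_by_identity(ref, loop_item_ref):
    # Hypothetical stricter check: compare whole (task, output) references.
    return ref == loop_item_ref


loop_source = ('make-list', 'Output')
loop_item = ('make-list', 'Output-loop-item')  # hypothetical naming
unrelated = ('other-task', 'Output')  # a different task using the default name

# Substring matching flags an unrelated output and the loop source itself.
false_positive = looks_like_loop_item_by_substring(unrelated[1], loop_source[1])
source_as_item = looks_like_loop_item_by_substring(loop_source[1], loop_source[1])

# Comparing full references keeps the three cases apart.
strict_unrelated = is_loop_item_by_identity(unrelated, loop_item)
strict_source = is_loop_item_by_identity(loop_source, loop_item)
strict_item = is_loop_item_by_identity(loop_item, loop_item)
```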

* Argo cannot use {{item}} when withParams items are dicts

* Stabilize the loop template names

* Renamed the test case
2020-02-11 12:18:09 -08:00
Alexey Volkov c83aff2738
SDK - Components - Made it easier to access component spec classes (#2860)
* SDK - Components - Made it easier to access component spec classes

* Updated the imports
2020-01-31 11:41:21 -08:00
Jiaxiao Zheng 358e26adb1 [SDK/compiler] Sanitize op name for PipelineParam (#2711)
* sanitize op name for pipeline param

* refactor sanitization to compiler level, and add unittest
2019-12-27 18:01:39 -08:00
Alexey Volkov b8a2e6f400 SDK/Compiler - Preventing pipeline entrypoint template name from clashing with other template names (#1555)
Case exhibiting the problem:
```
def add(a, b):
    ...
@dsl.pipeline(name='add')
def some_name():
    add(...)
```
2019-12-05 18:08:49 -08:00
Jiaxiao Zheng 790fe99aca [SDK] Relax k8s sanitization (#2634)
* update

* add allow_capital

* fix

* fix volume_ops sample

* fix pipeline name sanitization

* fix unittests

* fix sanitization in _client.py

* fix component output sanitization
2019-11-26 10:28:10 -08:00
Lulu Cheng 07296bc5ba [fix] default yaml.dump to block style (#2591)
* [fix] default every field to block style

* [change] per comment

* [fix] per comment
2019-11-18 18:55:41 -08:00
Jiaxiao Zheng ead912c6f8 [SDK] Fix withItem loop (#2572)
* fix withItem

* clean up and revert sample change

* clean up

* clean up

* clean up

* clean up

* fix

* fix nit
2019-11-07 18:40:19 -08:00
Alexey Volkov dd1bb7fe57 SDK - Compiler - Fixed failures on Jinja placeholders (#2522)
Also fixes handling other cases with double curly braces.
2019-10-31 19:03:24 -07:00
Alexey Volkov 735e627a03 SDK - Refactoring - Split the K8sHelper class (#2333)
* SDK - Refactoring - Split the K8sHelper class

One part was only used by container builder and provided higher-level API over K8s Client.
Another was used by the compiler and did not use the kubernetes library.

* Updated the license year.
2019-10-21 14:57:22 -07:00
Alexey Volkov 1b6047aa69 SDK - Improve errors when ContainerOp.output is unavailable (#1578)
* SDK - Improve errors when ContainerOp.output is unavailable

ContainerOp.output is only available when there is only one output.
Right now, when there are multiple outputs it just holds `None` instead of a task output reference.
In this case, however, it's indistinguishable from just passing a None argument.
This PR gives a quick fix to make accessing the nonexistent `.output` a compile-time error.
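
A rough sketch of the behavior described above, using a hypothetical `TaskSketch` class rather than the real ContainerOp. The idea is that `.output` raises a descriptive error instead of silently evaluating to `None` when there is not exactly one output.

```python
class TaskSketch:
    """Hypothetical stand-in for a task object; not the real ContainerOp."""

    def __init__(self, name, output_names):
        self.name = name
        self.outputs = {n: (name, n) for n in output_names}

    @property
    def output(self):
        # Only meaningful when there is exactly one output.
        if len(self.outputs) != 1:
            raise RuntimeError(
                f'Task "{self.name}" has {len(self.outputs)} outputs; '
                'use .outputs[<name>] instead of .output.')
        return next(iter(self.outputs.values()))


single = TaskSketch('train', ['model'])
multi = TaskSketch('split', ['train_data', 'test_data'])

try:
    multi.output
    multi_error = None
except RuntimeError as e:
    multi_error = str(e)
```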

* Fixed the implementation and added tests

* Trigger retests
2019-10-11 18:20:40 -07:00
Alexey Volkov 132940be45 SDK - Compiler - Added the component spec annotations to the compiled workflow (#2323)
Currently most of the component spec information (original input/output names, types, descriptions) is lost during the compilation.
This change adds that data to the workflow as annotation.
This will make it possible for the Frontend (and other services) to get the full component specs information and use it to improve the UX.
2019-10-07 19:31:11 -07:00
deepio-oc 6e4f423e7f SDK - Compiler - Fix bugs in the data passing rewriter (#2297)
* fix for #2295

* fix for #827
2019-10-07 17:03:47 -07:00
Alexey Volkov 181de66cf9 SDK - Compiler - Move Argo volume specifications to templates (#2229)
* SDK - Compiler - Move volumes to templates

Argo v2.3.0+ supports per-template volume specs similar to Kubernetes. Prior to version 2.3.0 Argo only supported workflow-level volume specs.
We had several outstanding issues caused by the need to put all volumes in the same place.
There was also the issue with input parameter reference placeholders in volume specifications which were placed outside their home templates declaring the inputs.

This change fixes those issues.
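
As an illustration only, the sketch below moves workflow-level volume specs into the templates that mount them, operating on a plain dict shaped like an Argo workflow. The helper name and the sample workflow are hypothetical, and the real compiler does not necessarily work this way.

```python
def move_volumes_to_templates(workflow):
    """Attach each workflow-level volume to the templates that mount it.

    Mutates and returns the given dict.
    """
    spec = workflow['spec']
    volumes = {v['name']: v for v in spec.pop('volumes', [])}
    for template in spec['templates']:
        # Non-container templates (e.g. DAGs) have no volume mounts.
        mounts = template.get('container', {}).get('volumeMounts', [])
        used = [volumes[m['name']] for m in mounts if m['name'] in volumes]
        if used:
            template['volumes'] = used
    return workflow


data_volume = {'name': 'data', 'persistentVolumeClaim': {'claimName': 'data-pvc'}}
workflow = {
    'spec': {
        'volumes': [data_volume],
        'templates': [
            {'name': 'train',
             'container': {'image': 'python:3.7',
                           'volumeMounts': [{'name': 'data', 'mountPath': '/data'}]}},
            {'name': 'main', 'dag': {'tasks': []}},
        ],
    }
}
result = move_volumes_to_templates(workflow)
```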

* Removed dead code line
2019-10-07 16:55:12 -07:00
Jiaxiao Zheng 092845d134 [SDK/Compiler] Add _create_and_write_workflow method (#2321)
* add _create_and_write_workflow

* Add pointer to TFX dag runner usage.
2019-10-07 14:13:10 -07:00
Alexey Volkov 4b33f1b550 SDK - Compiler - Fixed deprecation warning when calling compile (#2303) 2019-10-04 13:09:12 -07:00
Jiaxiao Zheng 9a9bd904ac [SDK-compiler] Refactor Compiler to expose an API to write out yaml spec of pipeline. (#2146)
* Refactor.

* Remove redundant code.

* Fix.

* Move the implementation of create_workflow into a private api.

* Change write_workflow to private.

* deprecation warning
2019-10-03 16:45:56 -07:00
Alexey Volkov c128b2a7b4 SDK - Compiler - Make it possible to create more portable pipelines (#2271)
* SDK - Compiler - Allow creating portable pipelines

This change allows directly passing the PipelineConf instance to compiler or launcher which makes it easier to create portable pipelines by allowing the environment-specific configuration to be directly passed to the environment-specific launcher.

Background:
PipelineConf holds all pipeline-level configuration including `op_transformers`, `image_pull_secrets` etc. Some of these are specific to a particular execution environment (e.g. GCP secret or Argo artifact location or Kubernetes-specific options).
Previously, the only way to modify `PipelineConf` was to do it inside the pipeline function. That tied the pipeline function to a specific execution environment (e.g. GCP, Argo or Kubernetes).

Solution: This change allows directly passing the PipelineConf instance to the compiler or launcher. This allows writing portable, launcher- and environment-agnostic pipeline functions. All environment-specific configuration can be moved to the launching stage.

Before:
```python
# Defining pipeline
def my_pipeline():
    # portable pipeline code

    dsl.get_pipeline_conf().add_op_transformer(gcp.use_gcp_secret('user-gcp-sa'))

# Launching pipeline
kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={})
```

After:
```python
# Defining pipeline
def my_pipeline():
    # portable pipeline code

# Launching pipeline
pipeline_conf = dsl.PipelineConf()
pipeline_conf.add_op_transformer(gcp.use_gcp_secret('user-gcp-sa'))
kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={}, pipeline_conf=pipeline_conf)
```

After 2 (launching the same portable pipeline using different launchers):
```python
# Loading portable pipeline
from portable_pipeline import my_pipeline

# Launching pipeline on Kubeflow
pipeline_conf = dsl.PipelineConf()
pipeline_conf.add_op_transformer(gcp.use_gcp_secret('user-gcp-sa'))
kfp.Client().create_run_from_pipeline_func(my_pipeline, arguments={}, pipeline_conf=pipeline_conf)

# Launching pipeline locally (not implemented yet)
kfp.run_pipeline_func_locally(my_pipeline, arguments={})
```

* Added parameter docstring
2019-10-02 20:58:08 -07:00
Alexey Volkov 66be0161be
SDK - Compiler - Fixed small bug in data passing rewriter (#2259) 2019-09-30 21:45:27 -07:00
Timur Solovev 389b585de1 SDK: fix label check for ContainerOP entities (#2243) 2019-09-26 13:13:35 -07:00
Alexey Volkov 342abae27a SDK - Moved the _container_builder from kfp.compiler to kfp.containers (#2192)
* SDK - Moved the _container_builder from kfp.compiler to kfp.containers
This only moves the files. The imports remain the same for now.

* Simplified the imports.
2019-09-25 18:27:06 -07:00
Alexey Volkov 51585a7023 SDK - Containers - Do not create GCS bucket unless building the image (#1938)
Also, a missing default image name is no longer an error as long as the image name is provided to ContainerBuilder.build.
2019-09-24 20:21:58 -07:00
Alexey Volkov ef63c653af SDK - Compiler - Fix large data passing (#2173)
* SDK - Compiler - Fix large data passing

Stop outputting parameters unless they're consumed as parameters downstream.
This prevents the situation where a component outputs a big file, but the DSL compiler instructs Argo to pick it up as a parameter (parameters can only hold a few kilobytes of data).

As a byproduct, this change fixes some minor compiler data passing bugs where some parameters were being passed around but never consumed (this happened with `ResourceOp`, `dsl.Condition` and recursion).
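
The selection rule described above can be sketched roughly as follows. The function and the dictionaries are hypothetical illustrations, not the compiler's data structures. An output is emitted as a parameter only when some downstream consumer actually reads it as one.

```python
def select_parameter_outputs(produced, consumed_as_parameter):
    """Keep only the outputs that a downstream consumer reads as a parameter."""
    return {
        task: sorted(set(outs) & consumed_as_parameter.get(task, set()))
        for task, outs in produced.items()
    }


# Every output each task produces.
produced = {'train': ['model', 'accuracy'], 'report': ['html']}
# Outputs that downstream tasks consume as parameters (small values only).
consumed_as_parameter = {'train': {'accuracy'}}

parameter_outputs = select_parameter_outputs(produced, consumed_as_parameter)
```

In this example the large `model` and `html` outputs are never declared as parameters, so nothing tries to load them as parameter values.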

* Replaced ... with `raise AssertionError`

* Fixed small bug

* Removed unused variables

* Fixed names of the mark_upstream_ios_of_* functions

* Fixed detection of parameter output references

* Fixed handling of volumes
2019-09-20 15:05:27 -07:00
Alexey Volkov dd071d39fa SDK - Containers - Raise exception on job failure (#2144) 2019-09-17 17:07:16 -07:00
Alexey Volkov eae37fba33 SDK - Components - Fixed build_python_component (#2143) 2019-09-17 14:45:15 -07:00
Alexey Volkov e3c72fc251 SDK - Persisting all output values (#2134)
Currently, the parameter output values are not saved to storage and their values are lost as soon as the garbage collector removes the workflow object.
This change makes it so the parameter output values are persisted.
2019-09-16 19:44:24 -07:00
Alexey Volkov 0e2bf15dbc
SDK - Refactoring - Replaced the *Meta classes with the *Spec classes (#1944)
* SDK - Refactoring - Replaced the ParameterMeta class with InputSpec and OutputSpec

* SDK - Refactoring - Replaced the internal PipelineMeta class with ComponentSpec

* SDK - Refactoring - Replaced the internal ComponentMeta class with ComponentSpec

* SDK - Refactoring - Replaced the *Meta classes with the *Spec classes

Replaced the ComponentMeta class with ComponentSpec
Replaced the PipelineMeta class with ComponentSpec
Replaced the ParameterMeta class with InputSpec and OutputSpec

* Removed empty fields
2019-09-16 18:41:12 -07:00
Kevin Bache 2ca7d0ac31 WithParams (#2044)
* first working commit

* incremental commit

* in the middle of converting loop args constructor to accept pipeline param

* both cases working

* output works, passed doesn't

* about to redo compiler section

* rewrite draft done

* added withparam tests

* removed sdk/python/comp.yaml

* minor

* subvars work

* more tests

* removed unneeded artifact outputs from test yaml

* sort keys

* removed dead artifact code
2019-09-16 17:58:22 -07:00
Ning fae0361fbf fix bug: list is not expecting keyword arg (#2107) 2019-09-13 13:54:29 -07:00
Jiaxiao Zheng 1449d08aee Fix the logic of passing default values of pipeline parameters. (#2098)
* Fix the logic of passing default values.

* Modify unit test

* Solve.
2019-09-12 17:10:33 -07:00
Jiaxiao Zheng 497d016e85 Expose an API for appending params/names/descriptions in a programmable way. (#2082)
* Refactor. Expose a public API to append pipeline param without interacting with dsl.Pipeline obj.

* Add unit test and fix.

* Fix docstring.

* Fix test

* Fix test

* Fix two nit problems

* Refactor
2019-09-10 17:58:47 -07:00
Alexey Volkov d83601d19a SDK - Compiler - Quoting the predicate operands (#2043)
Fixes https://github.com/kubeflow/pipelines/issues/1950
2019-09-06 17:05:21 -07:00
Alexey Volkov 979396702e SDK - Compiler - Failing when PipelineParam is unresolved (#2055)
Instead of silently producing a broken pipeline package, the compiler now raises an error and instructs the user to submit a bug report.
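
A minimal sketch of such a fail-fast check, assuming (for illustration only) that an unresolved parameter shows up in the compiled text as a `{{pipelineparam:...}}` placeholder. The pattern, function name and error text are hypothetical.

```python
import re

# Assumed placeholder shape, for illustration only.
_PLACEHOLDER = re.compile(r'\{\{pipelineparam:[^}]*\}\}')


def assert_no_unresolved_params(workflow_text):
    """Raise instead of returning a workflow that still contains placeholders."""
    leftovers = _PLACEHOLDER.findall(workflow_text)
    if leftovers:
        raise ValueError(
            'Unresolved pipeline parameters: %s. Please file a bug report.'
            % ', '.join(leftovers))
    return workflow_text


resolved = assert_no_unresolved_params('echo {{inputs.parameters.accuracy}}')

try:
    assert_no_unresolved_params('echo {{pipelineparam:op=train;name=accuracy}}')
    unresolved_error = None
except ValueError as e:
    unresolved_error = str(e)
```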
2019-09-06 15:51:20 -07:00
Alexey Volkov 08104d6cf9 SDK - Containers - Build python container image based on current working directory (#1970)
* SDK - Containers - Build container image from current environment

* Removed the ability to capture the active python environment (as requested by @hongye-sun)

* Added the type hint and docstring for the return type.

* Renamed `build_image_from_env` function to `build_image_from_working_dir`
as requested by @hongye-sun

* Explained the function behavior in the documentation.

* Removed extra empty line

* Improved caching by copying python files only after installing python packages

* Made test more portable

* Added support for specifying the base_image
`kfp.containers.default_base_image = ...`
The image can also be a callable returning the image name.

* Renamed `get_python_image` to `get_python_image_for_current_version`

* Switched the default base image to Google Deep Learning container image as requested by @hongye-sun
The size of this image is 4.35GB which really concerns me. The GPU image size is 6.45GB.

* Stopped importing kfp.containers.* into kfp.*

* Fixed test

* Fixed the regex string

* Fixed the type annotation style

* Addressed @hongye-sun feedback

* Removed the container image size warning

* Fixed import failure
2019-09-06 15:19:19 -07:00
Jiaxiao Zheng bd9d6319c8
Refactor kfp.compiler for better modularity (#2052)
* init analyze

* Refactor

* Renaming
2019-09-06 13:52:23 -07:00
Alexey Volkov 5360f3fcab SDK - Compiler - Stopped adding mlpipeline artifacts to every compiled template (#2046)
* Explicitly added mlpipeline outputs to the components that actually produce them

* Updated samples

* SDK - DSL - Stopped adding mlpipeline artifacts to every compiled template
Fixes https://github.com/kubeflow/pipelines/issues/1421
Fixes https://github.com/kubeflow/pipelines/issues/1422

* Updated the Lightweight sample

* Updated the compiler tests

* Fixed the lightweight sample

* Reverted the change to one contrib/samples/openvino
The sample will still work fine as it is now.
I'll add the change to that file as a separate PR.
2019-09-05 17:56:57 -07:00
Alexey Volkov f911742d1a SDK - Compiler - Fixed handling of PipelineParams in artifact arguments (#2042)
Previously only constant strings were supported and serialized PipelineParams were not resolved, producing incorrect workflows.
2019-09-05 15:16:58 -07:00
Alexey Volkov 301186cc87 SDK - Refactoring - Reduced the usage of dsl.Pipeline context (#2034)
Also reduced the unnecessary explicit usage of PipelineParam by the end users.
2019-09-05 01:26:52 -07:00