{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# KubeFlow Pipeline DSL Static Type Checking\n",
|
|
"\n",
|
|
"In this notebook, we will demo: \n",
|
|
"\n",
|
|
"* Defining a KubeFlow pipeline with Python DSL\n",
|
|
"* Compile the pipeline with type checking\n",
|
|
"\n",
|
|
"Static type checking helps users to identify component I/O inconsistencies without running the pipeline. It also shortens the development cycles by catching the errors early. This feature is especially useful in two cases: 1) when the pipeline is huge and manually checking the types is infeasible; 2) when some components are shared ones and the type information is not immediately avaiable to the pipeline authors.\n",
|
|
"\n",
|
|
"Since this sample focuses on the DSL type checking, we will use components that are not runnable in the system but with various type checking scenarios. \n",
|
|
"\n",
|
|
"## Component definition\n",
|
|
"Components can be defined in either YAML or functions decorated by dsl.component.\n",
|
|
"\n",
|
|
"## Type definition\n",
|
|
"Types can be defined as string or a dictionary with the openapi_schema_validator property formatted as:\n",
|
|
"```yaml\n",
|
|
"{\n",
|
|
" type_name: {\n",
|
|
" openapi_schema_validator: {\n",
|
|
" }\n",
|
|
" }\n",
|
|
"}\n",
|
|
"```\n",
|
|
"For example, the following yaml declares a GCSPath type with the openapi_schema_validator for output field_m.\n",
|
|
"The type could also be a plain string, such as the GcsUri. The type name could be either one of the core types or customized ones.\n",
|
|
"```yaml\n",
|
|
"name: component a\n",
|
|
"description: component a desc\n",
|
|
"inputs:\n",
|
|
" - {name: field_l, type: Integer}\n",
|
|
"outputs:\n",
|
|
" - {name: field_m, type: {GCSPath: {openapi_schema_validator: {type: string, pattern: \"^gs://.*$\" } }}}\n",
|
|
" - {name: field_n, type: customized_type}\n",
|
|
" - {name: field_o, type: GcsUri} \n",
|
|
"implementation:\n",
|
|
" container:\n",
|
|
" image: gcr.io/ml-pipeline/component-a\n",
|
|
" command: [python3, /pipelines/component/src/train.py]\n",
|
|
" args: [\n",
|
|
" --field-l, {inputValue: field_l},\n",
|
|
" ]\n",
|
|
" fileOutputs: \n",
|
|
" field_m: /schema.txt\n",
|
|
" field_n: /feature.txt\n",
|
|
" field_o: /output.txt\n",
|
|
"```\n",
|
|
"\n",
|
|
"If you define the component using the function decorator, there are a list of [core types](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/types.py).\n",
|
|
"For example, the following component declares a core type Integer for input field_l while\n",
|
|
"declares customized_type for its output field_n.\n",
|
|
"\n",
|
|
"```python\n",
|
|
"@component\n",
|
|
"def task_factory_a(field_l: Integer()) -> {'field_m': {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}, \n",
|
|
" 'field_n': 'customized_type',\n",
|
|
" 'field_o': 'Integer'\n",
|
|
" }:\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator a',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-a',\n",
|
|
" arguments = [\n",
|
|
" '--field-l', field_l,\n",
|
|
" ],\n",
|
|
" file_outputs = {\n",
|
|
" 'field_m': '/schema.txt',\n",
|
|
" 'field_n': '/feature.txt',\n",
|
|
" 'field_o': '/output.txt'\n",
|
|
" }\n",
|
|
" )\n",
|
|
"```\n",
|
|
"\n",
|
|
"## Type check switch\n",
|
|
"Type checking is enabled by default. It can be disabled as --disable-type-check argument if dsl-compile is run in the command line, or `dsl.compiler.Compiler().compile(type_check=False)`.\n",
|
|
"\n",
|
|
"If one wants to ignore the type for one parameter, call ignore_type() function in [PipelineParam](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_pipeline_param.py).\n",
|
|
"\n",
|
|
"## How does type checking work?\n",
|
|
"DSL compiler checks the type consistencies among components by checking the type_name as well as the openapi_schema_validator. Some special cases are listed here:\n",
|
|
"1. Type checking succeed: If the upstream/downstream components lack the type information.\n",
|
|
"2. Type checking succeed: If the type check is disabled.\n",
|
|
"3. Type checking succeed: If the parameter type is ignored."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Setup"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {
|
|
"tags": [
|
|
"parameters"
|
|
]
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Configure the KFP_PACKAGE\n",
|
|
"KFP_PACKAGE = 'https://storage.googleapis.com/ml-pipeline/release/0.1.16/kfp.tar.gz'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"scrolled": false
|
|
},
|
|
"source": [
|
|
"## Install Pipeline SDK"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Collecting https://storage.googleapis.com/ml-pipeline/release/0.1.12/kfp-experiment.tar.gz\n",
|
|
" Using cached https://storage.googleapis.com/ml-pipeline/release/0.1.12/kfp-experiment.tar.gz\n",
|
|
"Requirement already satisfied, skipping upgrade: urllib3>=1.15 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (1.22)\n",
|
|
"Requirement already satisfied, skipping upgrade: six>=1.10 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (1.11.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: certifi in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (2018.11.29)\n",
|
|
"Requirement already satisfied, skipping upgrade: python-dateutil in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (2.7.5)\n",
|
|
"Requirement already satisfied, skipping upgrade: PyYAML in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (3.13)\n",
|
|
"Requirement already satisfied, skipping upgrade: google-cloud-storage==1.13.0 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (1.13.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: kubernetes==8.0.0 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (8.0.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: PyJWT==1.6.4 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (1.6.4)\n",
|
|
"Requirement already satisfied, skipping upgrade: cryptography==2.4.2 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (2.4.2)\n",
|
|
"Requirement already satisfied, skipping upgrade: google-auth==1.6.1 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (1.6.1)\n",
|
|
"Requirement already satisfied, skipping upgrade: requests_toolbelt==0.8.0 in /opt/conda/lib/python3.6/site-packages (from kfp==0.1) (0.8.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: google-resumable-media>=0.3.1 in /opt/conda/lib/python3.6/site-packages (from google-cloud-storage==1.13.0->kfp==0.1) (0.3.1)\n",
|
|
"Requirement already satisfied, skipping upgrade: google-cloud-core<0.29dev,>=0.28.0 in /opt/conda/lib/python3.6/site-packages (from google-cloud-storage==1.13.0->kfp==0.1) (0.28.1)\n",
|
|
"Requirement already satisfied, skipping upgrade: google-api-core<2.0.0dev,>=0.1.1 in /opt/conda/lib/python3.6/site-packages (from google-cloud-storage==1.13.0->kfp==0.1) (1.6.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: adal>=1.0.2 in /opt/conda/lib/python3.6/site-packages (from kubernetes==8.0.0->kfp==0.1) (1.2.1)\n",
|
|
"Requirement already satisfied, skipping upgrade: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in /opt/conda/lib/python3.6/site-packages (from kubernetes==8.0.0->kfp==0.1) (0.54.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: requests-oauthlib in /opt/conda/lib/python3.6/site-packages (from kubernetes==8.0.0->kfp==0.1) (1.0.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: setuptools>=21.0.0 in /opt/conda/lib/python3.6/site-packages (from kubernetes==8.0.0->kfp==0.1) (38.4.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: requests in /opt/conda/lib/python3.6/site-packages (from kubernetes==8.0.0->kfp==0.1) (2.18.4)\n",
|
|
"Requirement already satisfied, skipping upgrade: asn1crypto>=0.21.0 in /opt/conda/lib/python3.6/site-packages (from cryptography==2.4.2->kfp==0.1) (0.24.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: idna>=2.1 in /opt/conda/lib/python3.6/site-packages (from cryptography==2.4.2->kfp==0.1) (2.6)\n",
|
|
"Requirement already satisfied, skipping upgrade: cffi!=1.11.3,>=1.7 in /opt/conda/lib/python3.6/site-packages (from cryptography==2.4.2->kfp==0.1) (1.11.4)\n",
|
|
"Requirement already satisfied, skipping upgrade: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.6/site-packages (from google-auth==1.6.1->kfp==0.1) (0.2.2)\n",
|
|
"Requirement already satisfied, skipping upgrade: rsa>=3.1.4 in /opt/conda/lib/python3.6/site-packages (from google-auth==1.6.1->kfp==0.1) (4.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: cachetools>=2.0.0 in /opt/conda/lib/python3.6/site-packages (from google-auth==1.6.1->kfp==0.1) (3.0.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: pytz in /opt/conda/lib/python3.6/site-packages (from google-api-core<2.0.0dev,>=0.1.1->google-cloud-storage==1.13.0->kfp==0.1) (2018.7)\n",
|
|
"Requirement already satisfied, skipping upgrade: googleapis-common-protos!=1.5.4,<2.0dev,>=1.5.3 in /opt/conda/lib/python3.6/site-packages (from google-api-core<2.0.0dev,>=0.1.1->google-cloud-storage==1.13.0->kfp==0.1) (1.5.5)\n",
|
|
"Requirement already satisfied, skipping upgrade: protobuf>=3.4.0 in /opt/conda/lib/python3.6/site-packages (from google-api-core<2.0.0dev,>=0.1.1->google-cloud-storage==1.13.0->kfp==0.1) (3.6.1)\n",
|
|
"Requirement already satisfied, skipping upgrade: oauthlib>=0.6.2 in /opt/conda/lib/python3.6/site-packages (from requests-oauthlib->kubernetes==8.0.0->kfp==0.1) (2.1.0)\n",
|
|
"Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests->kubernetes==8.0.0->kfp==0.1) (3.0.4)\n",
|
|
"Requirement already satisfied, skipping upgrade: pycparser in /opt/conda/lib/python3.6/site-packages (from cffi!=1.11.3,>=1.7->cryptography==2.4.2->kfp==0.1) (2.18)\n",
|
|
"Requirement already satisfied, skipping upgrade: pyasn1<0.5.0,>=0.4.1 in /opt/conda/lib/python3.6/site-packages (from pyasn1-modules>=0.2.1->google-auth==1.6.1->kfp==0.1) (0.4.4)\n",
|
|
"Building wheels for collected packages: kfp\n",
|
|
" Running setup.py bdist_wheel for kfp ... \u001b[?25ldone\n",
|
|
"\u001b[?25h Stored in directory: /home/jovyan/.cache/pip/wheels/06/14/fc/dd58bcc821d8067efa74a9e217db214d8a075c6b5d31ff24cf\n",
|
|
"Successfully built kfp\n",
|
|
"Installing collected packages: kfp\n",
|
|
" Found existing installation: kfp 0.1\n",
|
|
" Uninstalling kfp-0.1:\n",
|
|
" Successfully uninstalled kfp-0.1\n",
|
|
"Successfully installed kfp-0.1\n",
|
|
"\u001b[33mYou are using pip version 18.1, however version 19.0.3 is available.\n",
|
|
"You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"!pip3 install $KFP_PACKAGE --upgrade"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Type Check with YAML components: successful scenario"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author components in YAML"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# In yaml, one can optionally add the type information to both inputs and outputs.\n",
|
|
"# There are two ways to define the types: string or a dictionary with the openapi_schema_validator property.\n",
|
|
"# The openapi_schema_validator is a json schema object that describes schema of the parameter value.\n",
|
|
"component_a = '''\\\n",
|
|
"name: component a\n",
|
|
"description: component a desc\n",
|
|
"inputs:\n",
|
|
" - {name: field_l, type: Integer}\n",
|
|
"outputs:\n",
|
|
" - {name: field_m, type: {GCSPath: {openapi_schema_validator: {type: string, pattern: \"^gs://.*$\" } }}}\n",
|
|
" - {name: field_n, type: customized_type}\n",
|
|
" - {name: field_o, type: GcsUri} \n",
|
|
"implementation:\n",
|
|
" container:\n",
|
|
" image: gcr.io/ml-pipeline/component-a\n",
|
|
" command: [python3, /pipelines/component/src/train.py]\n",
|
|
" args: [\n",
|
|
" --field-l, {inputValue: field_l},\n",
|
|
" ]\n",
|
|
" fileOutputs: \n",
|
|
" field_m: /schema.txt\n",
|
|
" field_n: /feature.txt\n",
|
|
" field_o: /output.txt\n",
|
|
"'''\n",
|
|
"component_b = '''\\\n",
|
|
"name: component b\n",
|
|
"description: component b desc\n",
|
|
"inputs:\n",
|
|
" - {name: field_x, type: customized_type}\n",
|
|
" - {name: field_y, type: GcsUri}\n",
|
|
" - {name: field_z, type: {GCSPath: {openapi_schema_validator: {type: string, pattern: \"^gs://.*$\" } }}}\n",
|
|
"outputs:\n",
|
|
" - {name: output_model_uri, type: GcsUri}\n",
|
|
"implementation:\n",
|
|
" container:\n",
|
|
" image: gcr.io/ml-pipeline/component-a\n",
|
|
" command: [python3]\n",
|
|
" args: [\n",
|
|
" --field-x, {inputValue: field_x},\n",
|
|
" --field-y, {inputValue: field_y},\n",
|
|
" --field-z, {inputValue: field_z},\n",
|
|
" ]\n",
|
|
" fileOutputs: \n",
|
|
" output_model_uri: /schema.txt\n",
|
|
"'''"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author a pipeline with the above components"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import kfp.components as comp\n",
|
|
"import kfp.dsl as dsl\n",
|
|
"import kfp.compiler as compiler\n",
|
|
"# The components are loaded as task factories that generate container_ops.\n",
|
|
"task_factory_a = comp.load_component_from_text(text=component_a)\n",
|
|
"task_factory_b = comp.load_component_from_text(text=component_b)\n",
|
|
"\n",
|
|
"#Use the component as part of the pipeline\n",
|
|
"@dsl.pipeline(name='type_check_a',\n",
|
|
" description='')\n",
|
|
"def pipeline_a():\n",
|
|
" a = task_factory_a(field_l=12)\n",
|
|
" b = task_factory_b(field_x=a.outputs['field_n'], field_y=a.outputs['field_o'], field_z=a.outputs['field_m'])\n",
|
|
"\n",
|
|
"compiler.Compiler().compile(pipeline_a, 'pipeline_a.tar.gz', type_check=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Type Check with YAML components: failed scenario"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author components in YAML"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# In this case, the component_a contains an output field_o as GcrUri \n",
|
|
"# but the component_b requires an input field_y as GcsUri\n",
|
|
"component_a = '''\\\n",
|
|
"name: component a\n",
|
|
"description: component a desc\n",
|
|
"inputs:\n",
|
|
" - {name: field_l, type: Integer}\n",
|
|
"outputs:\n",
|
|
" - {name: field_m, type: {GCSPath: {openapi_schema_validator: {type: string, pattern: \"^gs://.*$\" } }}}\n",
|
|
" - {name: field_n, type: customized_type}\n",
|
|
" - {name: field_o, type: GcrUri} \n",
|
|
"implementation:\n",
|
|
" container:\n",
|
|
" image: gcr.io/ml-pipeline/component-a\n",
|
|
" command: [python3, /pipelines/component/src/train.py]\n",
|
|
" args: [\n",
|
|
" --field-l, {inputValue: field_l},\n",
|
|
" ]\n",
|
|
" fileOutputs: \n",
|
|
" field_m: /schema.txt\n",
|
|
" field_n: /feature.txt\n",
|
|
" field_o: /output.txt\n",
|
|
"'''\n",
|
|
"component_b = '''\\\n",
|
|
"name: component b\n",
|
|
"description: component b desc\n",
|
|
"inputs:\n",
|
|
" - {name: field_x, type: customized_type}\n",
|
|
" - {name: field_y, type: GcsUri}\n",
|
|
" - {name: field_z, type: {GCSPath: {openapi_schema_validator: {type: string, pattern: \"^gs://.*$\" } }}}\n",
|
|
"outputs:\n",
|
|
" - {name: output_model_uri, type: GcsUri}\n",
|
|
"implementation:\n",
|
|
" container:\n",
|
|
" image: gcr.io/ml-pipeline/component-a\n",
|
|
" command: [python3]\n",
|
|
" args: [\n",
|
|
" --field-x, {inputValue: field_x},\n",
|
|
" --field-y, {inputValue: field_y},\n",
|
|
" --field-z, {inputValue: field_z},\n",
|
|
" ]\n",
|
|
" fileOutputs: \n",
|
|
" output_model_uri: /schema.txt\n",
|
|
"'''"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author a pipeline with the above components"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"type name GcrUri is different from expected: GcsUri\n",
|
|
"Component \"component b\" is expecting field_y to be type(GcsUri), but the passed argument is type(GcrUri)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import kfp.components as comp\n",
|
|
"import kfp.dsl as dsl\n",
|
|
"import kfp.compiler as compiler\n",
|
|
"from kfp.dsl.types import InconsistentTypeException\n",
|
|
"task_factory_a = comp.load_component_from_text(text=component_a)\n",
|
|
"task_factory_b = comp.load_component_from_text(text=component_b)\n",
|
|
"\n",
|
|
"#Use the component as part of the pipeline\n",
|
|
"@dsl.pipeline(name='type_check_b',\n",
|
|
" description='')\n",
|
|
"def pipeline_b():\n",
|
|
" a = task_factory_a(field_l=12)\n",
|
|
" b = task_factory_b(field_x=a.outputs['field_n'], field_y=a.outputs['field_o'], field_z=a.outputs['field_m'])\n",
|
|
"\n",
|
|
"try:\n",
|
|
" compiler.Compiler().compile(pipeline_b, 'pipeline_b.tar.gz', type_check=True)\n",
|
|
"except InconsistentTypeException as e:\n",
|
|
" print(e)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Author a pipeline with the above components but type checking disabled."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Disable the type_check\n",
|
|
"compiler.Compiler().compile(pipeline_b, 'pipeline_b.tar.gz', type_check=False)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Type Check with decorated components: successful scenario"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author components with decorator"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from kfp.dsl import component\n",
|
|
"from kfp.dsl.types import Integer, GCSPath\n",
|
|
"from kfp.dsl import ContainerOp\n",
|
|
"# when components are defined based on the component decorator,\n",
|
|
"# the type information is annotated to the input or function returns.\n",
|
|
"# There are two ways to define the type: string or a dictionary with the openapi_schema_validator property\n",
|
|
"@component\n",
|
|
"def task_factory_a(field_l: Integer()) -> {'field_m': {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}, \n",
|
|
" 'field_n': 'customized_type',\n",
|
|
" 'field_o': 'Integer'\n",
|
|
" }:\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator a',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-a',\n",
|
|
" arguments = [\n",
|
|
" '--field-l', field_l,\n",
|
|
" ],\n",
|
|
" file_outputs = {\n",
|
|
" 'field_m': '/schema.txt',\n",
|
|
" 'field_n': '/feature.txt',\n",
|
|
" 'field_o': '/output.txt'\n",
|
|
" }\n",
|
|
" )\n",
|
|
"\n",
|
|
"# Users can also use the core types that are pre-defined in the SDK.\n",
|
|
"# For a full list of core types, check out: https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/types.py\n",
|
|
"@component\n",
|
|
"def task_factory_b(field_x: 'customized_type',\n",
|
|
" field_y: Integer(),\n",
|
|
" field_z: {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}) -> {'output_model_uri': 'GcsUri'}:\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator b',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-a',\n",
|
|
" command = [\n",
|
|
" 'python3',\n",
|
|
" field_x,\n",
|
|
" ],\n",
|
|
" arguments = [\n",
|
|
" '--field-y', field_y,\n",
|
|
" '--field-z', field_z,\n",
|
|
" ],\n",
|
|
" file_outputs = {\n",
|
|
" 'output_model_uri': '/schema.txt',\n",
|
|
" }\n",
|
|
" )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author a pipeline with the above components"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Use the component as part of the pipeline\n",
|
|
"@dsl.pipeline(name='type_check_c',\n",
|
|
" description='')\n",
|
|
"def pipeline_c():\n",
|
|
" a = task_factory_a(field_l=12)\n",
|
|
" b = task_factory_b(field_x=a.outputs['field_n'], field_y=a.outputs['field_o'], field_z=a.outputs['field_m'])\n",
|
|
"\n",
|
|
"compiler.Compiler().compile(pipeline_c, 'pipeline_c.tar.gz', type_check=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Type Check with decorated components: failure scenario"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author components with decorator"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from kfp.dsl import component\n",
|
|
"from kfp.dsl.types import Integer, GCSPath\n",
|
|
"from kfp.dsl import ContainerOp\n",
|
|
"# task_factory_a outputs an input field_m with the openapi_schema_validator different\n",
|
|
"# from the task_factory_b's input field_z.\n",
|
|
"# One is gs:// and the other is gcs://\n",
|
|
"@component\n",
|
|
"def task_factory_a(field_l: Integer()) -> {'field_m': {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}, \n",
|
|
" 'field_n': 'customized_type',\n",
|
|
" 'field_o': 'Integer'\n",
|
|
" }:\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator a',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-a',\n",
|
|
" arguments = [\n",
|
|
" '--field-l', field_l,\n",
|
|
" ],\n",
|
|
" file_outputs = {\n",
|
|
" 'field_m': '/schema.txt',\n",
|
|
" 'field_n': '/feature.txt',\n",
|
|
" 'field_o': '/output.txt'\n",
|
|
" }\n",
|
|
" )\n",
|
|
"\n",
|
|
"@component\n",
|
|
"def task_factory_b(field_x: 'customized_type',\n",
|
|
" field_y: Integer(),\n",
|
|
" field_z: {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gcs://.*$\"}'}}) -> {'output_model_uri': 'GcsUri'}:\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator b',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-a',\n",
|
|
" command = [\n",
|
|
" 'python3',\n",
|
|
" field_x,\n",
|
|
" ],\n",
|
|
" arguments = [\n",
|
|
" '--field-y', field_y,\n",
|
|
" '--field-z', field_z,\n",
|
|
" ],\n",
|
|
" file_outputs = {\n",
|
|
" 'output_model_uri': '/schema.txt',\n",
|
|
" }\n",
|
|
" )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author a pipeline with the above components"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"GCSPath has a property openapi_schema_validator with value: {\"type\": \"string\", \"pattern\": \"^gs://.*$\"} and {\"type\": \"string\", \"pattern\": \"^gcs://.*$\"}\n",
|
|
"Component \"task_factory_b\" is expecting field_z to be type({'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gcs://.*$\"}'}}), but the passed argument is type({'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}})\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"#Use the component as part of the pipeline\n",
|
|
"@dsl.pipeline(name='type_check_d',\n",
|
|
" description='')\n",
|
|
"def pipeline_d():\n",
|
|
" a = task_factory_a(field_l=12)\n",
|
|
" b = task_factory_b(field_x=a.outputs['field_n'], field_y=a.outputs['field_o'], field_z=a.outputs['field_m'])\n",
|
|
"\n",
|
|
"try:\n",
|
|
" compiler.Compiler().compile(pipeline_d, 'pipeline_d.tar.gz', type_check=True)\n",
|
|
"except InconsistentTypeException as e:\n",
|
|
" print(e)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Author a pipeline with the above components but ignoring types."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Use the component as part of the pipeline\n",
|
|
"@dsl.pipeline(name='type_check_d',\n",
|
|
" description='')\n",
|
|
"def pipeline_d():\n",
|
|
" a = task_factory_a(field_l=12)\n",
|
|
" # For each of the arguments, authors can also ignore the types by calling ignore_type function.\n",
|
|
" b = task_factory_b(field_x=a.outputs['field_n'], field_y=a.outputs['field_o'], field_z=a.outputs['field_m'].ignore_type())\n",
|
|
"compiler.Compiler().compile(pipeline_d, 'pipeline_d.tar.gz', type_check=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Type Check with missing type information"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author components(with missing types)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from kfp.dsl import component\n",
|
|
"from kfp.dsl.types import Integer, GCSPath\n",
|
|
"from kfp.dsl import ContainerOp\n",
|
|
"# task_factory_a lacks the type information for output filed_n\n",
|
|
"# task_factory_b lacks the type information for input field_y\n",
|
|
"# When no type information is provided, it matches all types.\n",
|
|
"@component\n",
|
|
"def task_factory_a(field_l: Integer()) -> {'field_m': {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}, \n",
|
|
" 'field_o': 'Integer'\n",
|
|
" }:\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator a',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-a',\n",
|
|
" arguments = [\n",
|
|
" '--field-l', field_l,\n",
|
|
" ],\n",
|
|
" file_outputs = {\n",
|
|
" 'field_m': '/schema.txt',\n",
|
|
" 'field_n': '/feature.txt',\n",
|
|
" 'field_o': '/output.txt'\n",
|
|
" }\n",
|
|
" )\n",
|
|
"\n",
|
|
"@component\n",
|
|
"def task_factory_b(field_x: 'customized_type',\n",
|
|
" field_y,\n",
|
|
" field_z: {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}) -> {'output_model_uri': 'GcsUri'}:\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator b',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-a',\n",
|
|
" command = [\n",
|
|
" 'python3',\n",
|
|
" field_x,\n",
|
|
" ],\n",
|
|
" arguments = [\n",
|
|
" '--field-y', field_y,\n",
|
|
" '--field-z', field_z,\n",
|
|
" ],\n",
|
|
" file_outputs = {\n",
|
|
" 'output_model_uri': '/schema.txt',\n",
|
|
" }\n",
|
|
" )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Author a pipeline with the above components"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Use the component as part of the pipeline\n",
|
|
"@dsl.pipeline(name='type_check_e',\n",
|
|
" description='')\n",
|
|
"def pipeline_e():\n",
|
|
" a = task_factory_a(field_l=12)\n",
|
|
" b = task_factory_b(field_x=a.outputs['field_n'], field_y=a.outputs['field_o'], field_z=a.outputs['field_m'])\n",
|
|
"\n",
|
|
"compiler.Compiler().compile(pipeline_e, 'pipeline_e.tar.gz', type_check=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Type Check with both named arguments and positional arguments"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Use the component as part of the pipeline\n",
|
|
"@dsl.pipeline(name='type_check_f',\n",
|
|
" description='')\n",
|
|
"def pipeline_f():\n",
|
|
" a = task_factory_a(field_l=12)\n",
|
|
" b = task_factory_b(a.outputs['field_n'], a.outputs['field_o'], field_z=a.outputs['field_m'])\n",
|
|
"\n",
|
|
"compiler.Compiler().compile(pipeline_f, 'pipeline_f.tar.gz', type_check=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Type Check between pipeline parameters and component parameters"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"@component\n",
|
|
"def task_factory_a(field_m: {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}, field_o: 'Integer'):\n",
|
|
" return ContainerOp(\n",
|
|
" name = 'operator a',\n",
|
|
" image = 'gcr.io/ml-pipeline/component-b',\n",
|
|
" arguments = [\n",
|
|
" '--field-l', field_m,\n",
|
|
" '--field-o', field_o,\n",
|
|
" ],\n",
|
|
" )\n",
|
|
"\n",
|
|
"# Pipeline input types are also checked against the component I/O types.\n",
|
|
"@dsl.pipeline(name='type_check_g',\n",
|
|
" description='')\n",
|
|
"def pipeline_g(a: {'GCSPath': {'openapi_schema_validator': '{\"type\": \"string\", \"pattern\": \"^gs://.*$\"}'}}='good', b: Integer()=12):\n",
|
|
" task_factory_a(field_m=a, field_o=b)\n",
|
|
"\n",
|
|
"try:\n",
|
|
" compiler.Compiler().compile(pipeline_g, 'pipeline_g.tar.gz', type_check=True)\n",
|
|
"except InconsistentTypeException as e:\n",
|
|
" print(e)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Clean up"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from pathlib import Path\n",
|
|
"for p in Path(\".\").glob(\"pipeline_[a-g].tar.gz\"):\n",
|
|
" p.unlink()"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"celltoolbar": "Tags",
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.5"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|