feat(components): Adds RoboMaker and SageMaker RLEstimator components (#4813)

* Adds RoboMaker and SageMaker RLEstimator components

* Genericise samples

* Genericise samples

* Adds better logging and updates shim component in samples

* Adds fixes for PR comments. Updates tests accordingly

* Adds docker image reference for integration tests. Allows for setting job_name for RLEstimator training jobs

* Separate RM and SM execution roles

* Remove README reference to VPC config items

* Adds more reliable integration test for RoboMaker Simulation Job

* Simplifies integration tests

* Reverted test container entrypoints

* Update black formatting

* Update components for redbackthomson repo

* Prefix RLEstimator job name

* Add RoboMakerFullAccess to generated roles

* Update version to official 1.1.0

* Formatting int test file

* Add PassRole IAM permission to OIDC

* Adds ROBOMAKER_EXECUTION_ROLE_ARN to build vars

Co-authored-by: Nicholas Thomson <nithomso@amazon.com>
This commit is contained in:
Leonard O' Sullivan 2020-12-12 07:27:27 +10:00 committed by GitHub
parent cab66700dc
commit 4aa11c3c7f
92 changed files with 5404 additions and 70 deletions


@ -4,6 +4,12 @@ The version of the AWS SageMaker Components is determined by the docker image ta
Repository: https://hub.docker.com/repository/docker/amazon/aws-sagemaker-kfp-components
---------------------------------------------
**Change log for version 1.1.0**
- Add SageMaker RLEstimator component
- Add RoboMaker create/delete simulation application and create simulation job components
> Pull request: [#4813](https://github.com/kubeflow/pipelines/pull/4813/)
**Change log for version 1.0.0**
- First release to guarantee backward compatibility within major version
- Internally refactored components


@ -18,6 +18,10 @@ There is no additional charge for using Amazon SageMaker Components for Kubeflow
The Training component allows you to submit Amazon SageMaker Training jobs directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker Training Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/train).
#### RLEstimator
The RLEstimator component allows you to submit RLEstimator (Reinforcement Learning) SageMaker Training jobs directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker RLEstimator Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/rlestimator).
#### Hyperparameter Optimization
The Hyperparameter Optimization component enables you to submit hyperparameter tuning jobs to Amazon SageMaker directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker Hyperparameter Optimization Kubeflow Pipeline component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/hyperparameter_tuning).
@ -49,3 +53,20 @@ The Ground Truth component enables you to submit Amazon SageMaker Ground Trut
The Workteam component enables you to create Amazon SageMaker private workteam jobs directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker create private workteam Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/workteam).
### RoboMaker components
#### Create Simulation Application
The Create Simulation Application component allows you to create a RoboMaker Simulation Application directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Create Simulation app Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/create_simulation_app).
#### Simulation Job
The Simulation Job component allows you to run a RoboMaker Simulation Job directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Simulation Job Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job).
#### Simulation Job Batch
The Simulation Job Batch component allows you to run a RoboMaker Simulation Job Batch directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Simulation Job Batch Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job_batch).
#### Delete Simulation Application
The Delete Simulation Application component allows you to delete a RoboMaker Simulation Application directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Delete Simulation app Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/delete_simulation_app).
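Chained together, the RoboMaker components above follow a create → simulate → delete lifecycle against the RoboMaker API. A minimal sketch of that call sequence using a stubbed client (all names here are illustrative; a real pipeline wires the components' `arn` outputs between steps instead of calling boto3 directly):

```python
class StubRoboMakerClient:
    """Stand-in for boto3's RoboMaker client, recording the call sequence."""

    def __init__(self):
        self.calls = []

    def create_simulation_application(self, **request):
        self.calls.append("create_simulation_application")
        return {"arn": "arn:aws:robomaker:us-east-1:123456789012:simulation-application/demo/1"}

    def create_simulation_job(self, **request):
        self.calls.append("create_simulation_job")
        return {"arn": "arn:aws:robomaker:us-east-1:123456789012:simulation-job/sim-abc"}

    def delete_simulation_application(self, **request):
        self.calls.append("delete_simulation_application")
        return {}


client = StubRoboMakerClient()

# Create the simulation application, run a job against it, then clean up
app_arn = client.create_simulation_application(name="demo")["arn"]
job_arn = client.create_simulation_job(
    simulationApplications=[{"application": app_arn}]
)["arn"]
client.delete_simulation_application(application=app_arn)
```

The components perform these calls internally; in a pipeline you only pass the `arn` output of one component as the input of the next.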


@ -1,4 +1,4 @@
** Amazon SageMaker Components for Kubeflow Pipelines; version 1.0.0 --
** Amazon SageMaker Components for Kubeflow Pipelines; version 1.1.0 --
https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker
Copyright 2019-2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
** boto3; version 1.14.12 -- https://github.com/boto/boto3/


@ -56,7 +56,7 @@ outputs:
- {name: output_location, description: S3 URI of the transform job results.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- batch_transform/src/sagemaker_transform_component.py


@ -2,7 +2,7 @@ version: 0.2
env:
variables:
CONTAINER_VARIABLES: "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI EKS_PRIVATE_SUBNETS EKS_PUBLIC_SUBNETS PYTEST_MARKER PYTEST_ADDOPTS S3_DATA_BUCKET EKS_EXISTING_CLUSTER SAGEMAKER_EXECUTION_ROLE_ARN REGION SKIP_FSX_TESTS"
CONTAINER_VARIABLES: "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI EKS_PRIVATE_SUBNETS EKS_PUBLIC_SUBNETS PYTEST_MARKER PYTEST_ADDOPTS S3_DATA_BUCKET EKS_EXISTING_CLUSTER SAGEMAKER_EXECUTION_ROLE_ARN REGION SKIP_FSX_TESTS ROBOMAKER_EXECUTION_ROLE_ARN"
phases:
pre_build:


@ -20,6 +20,7 @@ from botocore.credentials import (
JSONFileCache,
)
from botocore.session import Session as BotocoreSession
from sagemaker.session import Session as SageMakerSession
class Boto3Manager(object):
@ -45,7 +46,7 @@ class Boto3Manager(object):
@staticmethod
def _get_boto3_session(
region: str, role_arn: str = None, assume_duration: int = 3600
):
) -> Session:
"""Creates a boto3 session, optionally assuming a role.
Args:
@ -112,6 +113,64 @@ class Boto3Manager(object):
)
return client
@staticmethod
def get_sagemaker_session(
component_version: str,
region: str,
endpoint_url: str = None,
assume_role_arn: str = None,
):
"""Builds a SageMaker Session which can be used by any Estimator.
Args:
component_version: The version of the component to include in
the user agent.
region: The AWS region for the SageMaker client and SageMaker Session.
endpoint_url: A private link endpoint for SageMaker.
assume_role_arn: The ARN of a role for the boto3 client to assume.
Returns:
SageMakerSession: A SageMaker session wrapping the boto3 session and SageMaker client.
"""
return SageMakerSession(
boto_session=Boto3Manager._get_boto3_session(region, assume_role_arn),
sagemaker_client=Boto3Manager.get_sagemaker_client(
component_version, region, endpoint_url, assume_role_arn
),
)
@staticmethod
def get_robomaker_client(
component_version: str,
region: str,
endpoint_url: str = None,
assume_role_arn: str = None,
):
"""Builds a client to the AWS RoboMaker API.
Args:
component_version: The version of the component to include in
the user agent.
region: The AWS region for the RoboMaker client.
endpoint_url: A private link endpoint for RoboMaker.
assume_role_arn: The ARN of a role for the boto3 client to assume.
Returns:
object: A RoboMaker boto3 client.
"""
session = Boto3Manager._get_boto3_session(region, assume_role_arn)
session_config = Config(
user_agent=f"sagemaker-on-kubeflow-pipelines-v{component_version}",
retries={"max_attempts": 10, "mode": "standard"},
)
client = session.client(
"robomaker",
region_name=region,
endpoint_url=endpoint_url,
config=session_config,
)
return client
@staticmethod
def get_cloudwatch_client(region: str, assume_role_arn: str = None):
"""Builds a client to the AWS CloudWatch API.


@ -20,11 +20,16 @@ import common.sagemaker_component as component_module
COMPONENT_DIRECTORIES = [
"batch_transform",
"create_simulation_app",
"delete_simulation_app",
"deploy",
"ground_truth",
"hyperparameter_tuning",
"model",
"process",
"rlestimator",
"simulation_job",
"simulation_job_batch",
"train",
"workteam",
]


@ -18,6 +18,7 @@ import signal
import string
import logging
import json
from enum import Enum, auto
from types import FunctionType
import yaml
import random
@ -76,6 +77,25 @@ class SageMakerJobStatus(NamedTuple):
error_message: Optional[str] = None
class DebugRulesStatus(Enum):
COMPLETED = auto()
ERRORED = auto()
INPROGRESS = auto()
@classmethod
def from_describe(cls, response):
has_error = False
for debug_rule in response["DebugRuleEvaluationStatuses"]:
if debug_rule["RuleEvaluationStatus"] == "Error":
has_error = True
if debug_rule["RuleEvaluationStatus"] == "InProgress":
return DebugRulesStatus.INPROGRESS
if has_error:
return DebugRulesStatus.ERRORED
else:
return DebugRulesStatus.COMPLETED
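The precedence rule in `from_describe` — any rule still `InProgress` wins, even if another rule has already errored — can be seen with a small standalone sketch (the class is reproduced here so the snippet runs on its own; the `DebugRuleEvaluationStatuses` payloads are illustrative, shaped like a SageMaker describe-training-job response):

```python
from enum import Enum, auto


class DebugRulesStatus(Enum):
    COMPLETED = auto()
    ERRORED = auto()
    INPROGRESS = auto()

    @classmethod
    def from_describe(cls, response):
        has_error = False
        for debug_rule in response["DebugRuleEvaluationStatuses"]:
            if debug_rule["RuleEvaluationStatus"] == "Error":
                has_error = True
            # An in-progress rule short-circuits, regardless of earlier errors
            if debug_rule["RuleEvaluationStatus"] == "InProgress":
                return cls.INPROGRESS
        return cls.ERRORED if has_error else cls.COMPLETED


# One rule errored, another is still running: overall status is INPROGRESS
mixed = {
    "DebugRuleEvaluationStatuses": [
        {"RuleEvaluationStatus": "Error"},
        {"RuleEvaluationStatus": "InProgress"},
    ]
}
```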
class SageMakerComponent:
"""Base class for a KFP SageMaker component.


@ -44,16 +44,16 @@ class SpecInputParsers:
def yaml_or_json_list(value):
"""Parses a YAML or JSON list to a Python list."""
parsed = SpecInputParsers._yaml_or_json_str(value)
if not isinstance(parsed, List):
raise ArgumentTypeError(f"{value} is not a list")
if parsed is not None and not isinstance(parsed, List):
raise ArgumentTypeError(f"{value} (type {type(value)}) is not a list")
return parsed
@staticmethod
def yaml_or_json_dict(value):
"""Parses a YAML or JSON dictionary to a Python dictionary."""
parsed = SpecInputParsers._yaml_or_json_str(value)
if not isinstance(parsed, Dict):
raise ArgumentTypeError(f"{value} is not a dictionary")
if parsed is not None and not isinstance(parsed, Dict):
raise ArgumentTypeError(f"{value} (type {type(value)}) is not a dictionary")
return parsed
@staticmethod

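The behavioural change in the hunk above — a `None` result (e.g. from an empty optional input) now passes through instead of raising — can be sketched with a simplified reimplementation (not the component's exact helper; the real `_yaml_or_json_str` lives in `SpecInputParsers`):

```python
import json
from argparse import ArgumentTypeError

try:
    import yaml  # PyYAML, used by the component when available
except ImportError:
    yaml = None


def _yaml_or_json_str(value):
    """Parse a YAML or JSON string into a Python object; empty input -> None."""
    if value is None or value == "":
        return None
    if yaml is not None:
        return yaml.safe_load(value)
    return json.loads(value)  # every JSON document is also valid YAML


def yaml_or_json_list(value):
    parsed = _yaml_or_json_str(value)
    # After the fix: None is allowed through, only non-list values are rejected
    if parsed is not None and not isinstance(parsed, list):
        raise ArgumentTypeError(f"{value} (type {type(value)}) is not a list")
    return parsed
```

With the pre-fix check (`if not isinstance(parsed, List)`), an empty optional input such as `''` would have raised; now it yields `None`.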

@ -0,0 +1,15 @@
name: ''
sources:
s3Bucket:
s3Key:
architecture:
simulationSoftwareSuite:
name: ''
version: ''
robotSoftwareSuite:
name: ''
version: ''
renderingEngine:
name: ''
version: ''
tags: {}


@ -0,0 +1,2 @@
application: ''
applicationVersion: ''


@ -0,0 +1,5 @@
batchPolicy:
timeoutInSeconds:
maxConcurrency:
createSimulationJobRequests: []
tags: {}


@ -0,0 +1,18 @@
outputLocation:
s3Bucket:
s3Prefix:
loggingConfig:
recordAllRosTopics: False
maxJobDurationInSeconds: 28800
iamRole: ''
failureBehavior: 'Fail'
robotApplications: []
simulationApplications: []
dataSources: []
vpcConfig:
subnets: []
securityGroups: []
assignPublicIp: False
compute:
simulationUnitLimit: 15
tags: {}


@ -0,0 +1,47 @@
# RoboMaker Create Simulation Application Kubeflow Pipelines component
## Summary
Component to create RoboMaker Simulation Applications from a Kubeflow Pipelines workflow.
https://docs.aws.amazon.com/robomaker/latest/dg/create-simulation-application.html
## Intended Use
For running your simulation workloads using AWS RoboMaker.
## Runtime Arguments
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the cluster launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
app_name | The name of the simulation application. Must be unique within the same AWS account and AWS region | Yes | String | | SimulationApplication-[datetime]-[random id]|
role | The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf | No | String | | |
sources | The code sources of the simulation application | No | List of Dicts | | |
simulation_software_name | The simulation software used by the simulation application | No | String | | |
simulation_software_version | The simulation software version used by the simulation application | No | String | | |
robot_software_name | The robot software (ROS distribution) used by the simulation application | No | String | | |
robot_software_version | The robot software version (ROS distribution) used by the simulation application | No | String | | |
rendering_engine_name | The rendering engine for the simulation application | Yes | String | | |
rendering_engine_version | The rendering engine version for the simulation application | Yes | String | | |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
Notes:
* This component should be run as a precursor to the RoboMaker [`Simulation Job component`](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job/README.md)
* The format for the [`sources`](https://docs.aws.amazon.com/robomaker/latest/dg/API_SourceConfig.html) field is:
```
[
{
"s3Bucket": "string",
"s3Key": "string",
"architecture": "string",
}
]
```
## Output
The ARN of the created Simulation Application. This can be passed as an input to other components such as RoboMaker Simulation Job.
## Example code
Example of creating a Sim app, then a Sim job and finally deleting the Sim app : [robomaker_simulation_job_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_app.py)
## Resources
* [Create RoboMaker Simulation Application via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_application)
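Before handing `sources` to the component, it can help to sanity-check the list against the `SourceConfig` shape shown above. A hedged helper sketch (not part of the component; the architecture values follow the RoboMaker `SourceConfig` documentation):

```python
REQUIRED_KEYS = {"s3Bucket", "s3Key", "architecture"}
# Per the RoboMaker SourceConfig docs
VALID_ARCHITECTURES = {"X86_64", "ARM64", "ARMHF"}


def validate_sources(sources):
    """Raise ValueError if `sources` does not match the expected SourceConfig shape."""
    if not isinstance(sources, list) or not sources:
        raise ValueError("sources must be a non-empty list")
    for i, src in enumerate(sources):
        missing = REQUIRED_KEYS - set(src)
        if missing:
            raise ValueError(f"sources[{i}] is missing keys: {sorted(missing)}")
        if src["architecture"] not in VALID_ARCHITECTURES:
            raise ValueError(
                f"sources[{i}] has unknown architecture {src['architecture']!r}"
            )
    return sources
```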


@ -0,0 +1,69 @@
name: RoboMaker - Create Simulation Application
description: Creates a simulation application.
inputs:
- {name: region, type: String, description: The region for the SageMaker resource.}
- {name: endpoint_url, type: String, description: The URL to use when communicating
with the SageMaker service., default: ''}
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
when connecting to SageMaker., default: ''}
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
AWS resources.', default: '{}'}
- {name: app_name, type: String, description: The name of the simulation application.,
default: ''}
- {name: sources, type: JsonArray, description: The code sources of the simulation
application., default: '[]'}
- {name: simulation_software_name, type: String, description: The simulation software
used by the simulation application., default: ''}
- {name: simulation_software_version, type: String, description: The simulation software
version used by the simulation application., default: ''}
- {name: robot_software_name, type: String, description: The robot software used by
the simulation application., default: ''}
- {name: robot_software_version, type: String, description: The robot software version
used by the simulation application., default: ''}
- {name: rendering_engine_name, type: String, description: The rendering engine for
the simulation application., default: ''}
- {name: rendering_engine_version, type: String, description: The rendering engine
version for the simulation application., default: ''}
outputs:
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation application.}
- {name: app_name, description: The name of the simulation application.}
- {name: version, description: The version of the simulation application.}
- {name: revision_id, description: The revision id of the simulation application.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- create_simulation_app/src/robomaker_create_simulation_app_component.py
- --region
- {inputValue: region}
- --endpoint_url
- {inputValue: endpoint_url}
- --assume_role
- {inputValue: assume_role}
- --tags
- {inputValue: tags}
- --app_name
- {inputValue: app_name}
- --sources
- {inputValue: sources}
- --simulation_software_name
- {inputValue: simulation_software_name}
- --simulation_software_version
- {inputValue: simulation_software_version}
- --robot_software_name
- {inputValue: robot_software_name}
- --robot_software_version
- {inputValue: robot_software_version}
- --rendering_engine_name
- {inputValue: rendering_engine_name}
- --rendering_engine_version
- {inputValue: rendering_engine_version}
- --arn_output_path
- {outputPath: arn}
- --app_name_output_path
- {outputPath: app_name}
- --version_output_path
- {outputPath: version}
- --revision_id_output_path
- {outputPath: revision_id}


@ -0,0 +1,160 @@
"""RoboMaker component for creating a simulation application."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import Dict
from create_simulation_app.src.robomaker_create_simulation_app_spec import (
RoboMakerCreateSimulationAppSpec,
RoboMakerCreateSimulationAppInputs,
RoboMakerCreateSimulationAppOutputs,
)
from common.sagemaker_component import (
SageMakerComponent,
ComponentMetadata,
SageMakerJobStatus,
)
from common.boto3_manager import Boto3Manager
from common.common_inputs import SageMakerComponentCommonInputs
@ComponentMetadata(
name="RoboMaker - Create Simulation Application",
description="Creates a simulation application.",
spec=RoboMakerCreateSimulationAppSpec,
)
class RoboMakerCreateSimulationAppComponent(SageMakerComponent):
"""RoboMaker component for creating a simulation application."""
def Do(self, spec: RoboMakerCreateSimulationAppSpec):
self._app_name = (
spec.inputs.app_name
if spec.inputs.app_name
else RoboMakerCreateSimulationAppComponent._generate_unique_timestamped_id(
prefix="SimulationApplication"
)
)
super().Do(spec.inputs, spec.outputs, spec.output_paths)
def _get_job_status(self) -> SageMakerJobStatus:
try:
response = self._rm_client.describe_simulation_application(
application=self._arn
)
status = response["arn"]
if status is not None:
return SageMakerJobStatus(is_completed=True, raw_status=status)
else:
return SageMakerJobStatus(
is_completed=True,
has_error=True,
error_message="No ARN present",
raw_status=status,
)
except Exception as ex:
return SageMakerJobStatus(
is_completed=True,
has_error=True,
error_message=str(ex),
raw_status=str(ex),
)
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
"""Configures the internal AWS clients for the component.
Args:
inputs: A populated list of user inputs.
"""
self._rm_client = Boto3Manager.get_robomaker_client(
self._get_component_version(),
inputs.region,
endpoint_url=inputs.endpoint_url,
assume_role_arn=inputs.assume_role,
)
self._cw_client = Boto3Manager.get_cloudwatch_client(
inputs.region, assume_role_arn=inputs.assume_role
)
def _after_job_complete(
self,
job: Dict,
request: Dict,
inputs: RoboMakerCreateSimulationAppInputs,
outputs: RoboMakerCreateSimulationAppOutputs,
):
outputs.app_name = self._app_name
outputs.arn = job["arn"]
outputs.version = job["version"]
outputs.revision_id = job["revisionId"]
logging.info(
"Simulation Application in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationApplications/{}".format(
inputs.region, inputs.region, str(outputs.arn).split("/", 1)[1]
)
)
def _on_job_terminated(self):
self._rm_client.delete_simulation_application(application=self._arn)
def _create_job_request(
self,
inputs: RoboMakerCreateSimulationAppInputs,
outputs: RoboMakerCreateSimulationAppOutputs,
) -> Dict:
"""
Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_application
"""
request = self._get_request_template("robomaker.create.simulation.app")
request["name"] = self._app_name
request["sources"] = inputs.sources
request["simulationSoftwareSuite"]["name"] = inputs.simulation_software_name
request["simulationSoftwareSuite"][
"version"
] = inputs.simulation_software_version
request["robotSoftwareSuite"]["name"] = inputs.robot_software_name
request["robotSoftwareSuite"]["version"] = inputs.robot_software_version
if inputs.rendering_engine_name:
request["renderingEngine"]["name"] = inputs.rendering_engine_name
request["renderingEngine"]["version"] = inputs.rendering_engine_version
else:
request.pop("renderingEngine")
return request
def _submit_job_request(self, request: Dict) -> Dict:
return self._rm_client.create_simulation_application(**request)
def _after_submit_job_request(
self,
job: Dict,
request: Dict,
inputs: RoboMakerCreateSimulationAppInputs,
outputs: RoboMakerCreateSimulationAppOutputs,
):
outputs.arn = self._arn = job["arn"]
logging.info(
f"Created Robomaker Simulation Application with name: {self._app_name}"
)
def _print_logs_for_job(self):
pass
if __name__ == "__main__":
import sys
spec = RoboMakerCreateSimulationAppSpec(sys.argv[1:])
component = RoboMakerCreateSimulationAppComponent()
component.Do(spec)


@ -0,0 +1,141 @@
"""Specification for the RoboMaker create simulation application component."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import List
from common.sagemaker_component_spec import SageMakerComponentSpec
from common.spec_input_parsers import SpecInputParsers
from common.common_inputs import (
COMMON_INPUTS,
SageMakerComponentCommonInputs,
SageMakerComponentInput as Input,
SageMakerComponentOutput as Output,
SageMakerComponentBaseOutputs,
SageMakerComponentInputValidator as InputValidator,
SageMakerComponentOutputValidator as OutputValidator,
)
@dataclass(frozen=True)
class RoboMakerCreateSimulationAppInputs(SageMakerComponentCommonInputs):
"""Defines the set of inputs for the create simulation application component."""
app_name: Input
sources: Input
simulation_software_name: Input
simulation_software_version: Input
robot_software_name: Input
robot_software_version: Input
rendering_engine_name: Input
rendering_engine_version: Input
@dataclass
class RoboMakerCreateSimulationAppOutputs(SageMakerComponentBaseOutputs):
"""Defines the set of outputs for the create simulation application component."""
arn: Output
app_name: Output
version: Output
revision_id: Output
class RoboMakerCreateSimulationAppSpec(
SageMakerComponentSpec[
RoboMakerCreateSimulationAppInputs, RoboMakerCreateSimulationAppOutputs
]
):
INPUTS: RoboMakerCreateSimulationAppInputs = RoboMakerCreateSimulationAppInputs(
app_name=InputValidator(
input_type=str,
required=True,
description="The name of the simulation application.",
default="",
),
sources=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=True,
description="The code sources of the simulation application.",
default=[],
),
simulation_software_name=InputValidator(
input_type=str,
required=True,
description="The simulation software used by the simulation application.",
default="",
),
simulation_software_version=InputValidator(
input_type=str,
required=True,
description="The simulation software version used by the simulation application.",
default="",
),
robot_software_name=InputValidator(
input_type=str,
required=True,
description="The robot software used by the simulation application.",
default="",
),
robot_software_version=InputValidator(
input_type=str,
required=True,
description="The robot software version used by the simulation application.",
default="",
),
rendering_engine_name=InputValidator(
input_type=str,
required=False,
description="The rendering engine for the simulation application.",
default="",
),
rendering_engine_version=InputValidator(
input_type=str,
required=False,
description="The rendering engine version for the simulation application.",
default="",
),
**vars(COMMON_INPUTS),
)
OUTPUTS = RoboMakerCreateSimulationAppOutputs(
arn=OutputValidator(
description="The Amazon Resource Name (ARN) of the simulation application."
),
app_name=OutputValidator(description="The name of the simulation application."),
version=OutputValidator(
description="The version of the simulation application."
),
revision_id=OutputValidator(
description="The revision id of the simulation application."
),
)
def __init__(self, arguments: List[str]):
super().__init__(
arguments,
RoboMakerCreateSimulationAppInputs,
RoboMakerCreateSimulationAppOutputs,
)
@property
def inputs(self) -> RoboMakerCreateSimulationAppInputs:
return self._inputs
@property
def outputs(self) -> RoboMakerCreateSimulationAppOutputs:
return self._outputs
@property
def output_paths(self) -> RoboMakerCreateSimulationAppOutputs:
return self._output_paths


@ -0,0 +1,34 @@
# RoboMaker Delete Simulation Application Kubeflow Pipelines component
## Summary
Component to delete RoboMaker Simulation Applications from a Kubeflow Pipelines workflow.
https://docs.aws.amazon.com/robomaker/latest/dg/API_DeleteSimulationApplication.html
## Intended Use
For running your simulation workloads using AWS RoboMaker.
## Runtime Arguments
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the cluster launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
arn | The Amazon Resource Name (ARN) of the simulation application | No | String | | |
version | The version of the simulation application | Yes | String | | |
Notes:
* This component can be used to clean up any simulation apps that were created by other components such as the Create Simulation App component.
* This component should be run after the RoboMaker [`Simulation Job component`](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job/README.md)
## Output
The ARN of the deleted Simulation Application.
## Example code
Example of creating a Sim app, then a Sim job and finally deleting the Sim app : [robomaker_simulation_job_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_app.py)
## Resources
* [Delete RoboMaker Simulation Application via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.delete_simulation_application)
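The component builds its `delete_simulation_application` request from a template and drops `applicationVersion` when no version is supplied. A simplified sketch of that request construction (a mirror for illustration, not the component's code):

```python
def build_delete_request(arn, version=None):
    """Build a delete_simulation_application request, mirroring the component."""
    request = {"application": arn, "applicationVersion": version}
    if not version:
        # Omit the key entirely when not targeting a specific version
        request.pop("applicationVersion")
    return request
```

Passing only the ARN yields `{"application": arn}`; supplying a version keeps the `applicationVersion` key so only that version is targeted.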


@ -0,0 +1,36 @@
name: RoboMaker - Delete Simulation Application
description: Delete a simulation application.
inputs:
- {name: region, type: String, description: The region for the SageMaker resource.}
- {name: endpoint_url, type: String, description: The URL to use when communicating
with the SageMaker service., default: ''}
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
when connecting to SageMaker., default: ''}
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
AWS resources.', default: '{}'}
- {name: arn, type: String, description: The Amazon Resource Name (ARN) of the simulation
application., default: ''}
- {name: version, type: String, description: The version of the simulation application.,
default: ''}
outputs:
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation application.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- delete_simulation_app/src/robomaker_delete_simulation_app_component.py
- --region
- {inputValue: region}
- --endpoint_url
- {inputValue: endpoint_url}
- --assume_role
- {inputValue: assume_role}
- --tags
- {inputValue: tags}
- --arn
- {inputValue: arn}
- --version
- {inputValue: version}
- --arn_output_path
- {outputPath: arn}


@ -0,0 +1,127 @@
"""RoboMaker component for deleting a simulation application."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import Dict
from delete_simulation_app.src.robomaker_delete_simulation_app_spec import (
RoboMakerDeleteSimulationAppSpec,
RoboMakerDeleteSimulationAppInputs,
RoboMakerDeleteSimulationAppOutputs,
)
from common.sagemaker_component import (
SageMakerComponent,
ComponentMetadata,
SageMakerJobStatus,
)
from common.boto3_manager import Boto3Manager
from common.common_inputs import SageMakerComponentCommonInputs
@ComponentMetadata(
name="RoboMaker - Delete Simulation Application",
description="Delete a simulation application.",
spec=RoboMakerDeleteSimulationAppSpec,
)
class RoboMakerDeleteSimulationAppComponent(SageMakerComponent):
"""RoboMaker component for deleting a simulation application."""
def Do(self, spec: RoboMakerDeleteSimulationAppSpec):
self._arn = spec.inputs.arn
self._version = spec.inputs.version
super().Do(spec.inputs, spec.outputs, spec.output_paths)
def _get_job_status(self) -> SageMakerJobStatus:
try:
response = self._rm_client.describe_simulation_application(
application=self._arn
)
status = response["arn"]
if status is not None:
return SageMakerJobStatus(is_completed=False, raw_status=status,)
else:
return SageMakerJobStatus(is_completed=True, raw_status="Item deleted")
except Exception as ex:
return SageMakerJobStatus(is_completed=True, raw_status=str(ex))
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
"""Configures the internal AWS clients for the component.
Args:
inputs: A populated list of user inputs.
"""
self._rm_client = Boto3Manager.get_robomaker_client(
self._get_component_version(),
inputs.region,
endpoint_url=inputs.endpoint_url,
assume_role_arn=inputs.assume_role,
)
self._cw_client = Boto3Manager.get_cloudwatch_client(
inputs.region, assume_role_arn=inputs.assume_role
)
def _after_job_complete(
self,
job: Dict,
request: Dict,
inputs: RoboMakerDeleteSimulationAppInputs,
outputs: RoboMakerDeleteSimulationAppOutputs,
):
outputs.arn = self._arn
logging.info("Simulation Application {} has been deleted".format(outputs.arn))
def _on_job_terminated(self):
logging.info("Simulation Application {} failed to delete".format(self._arn))
def _create_job_request(
self,
inputs: RoboMakerDeleteSimulationAppInputs,
outputs: RoboMakerDeleteSimulationAppOutputs,
) -> Dict:
"""
Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.delete_simulation_application
"""
request = self._get_request_template("robomaker.delete.simulation.app")
request["application"] = self._arn
# If we have a version then use it, else remove it from request object
if inputs.version:
request["applicationVersion"] = inputs.version
else:
request.pop("applicationVersion")
return request
def _submit_job_request(self, request: Dict) -> Dict:
return self._rm_client.delete_simulation_application(**request)
def _after_submit_job_request(
self,
job: Dict,
request: Dict,
inputs: RoboMakerDeleteSimulationAppInputs,
outputs: RoboMakerDeleteSimulationAppOutputs,
):
logging.info(f"Submitted request to delete RoboMaker Simulation Application with arn: {self._arn}")
def _print_logs_for_job(self):
pass
if __name__ == "__main__":
import sys
spec = RoboMakerDeleteSimulationAppSpec(sys.argv[1:])
component = RoboMakerDeleteSimulationAppComponent()
component.Do(spec)


@@ -0,0 +1,88 @@
"""Specification for the RoboMaker delete simulation application component."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import List
from common.sagemaker_component_spec import SageMakerComponentSpec
from common.common_inputs import (
COMMON_INPUTS,
SageMakerComponentCommonInputs,
SageMakerComponentInput as Input,
SageMakerComponentOutput as Output,
SageMakerComponentBaseOutputs,
SageMakerComponentInputValidator as InputValidator,
SageMakerComponentOutputValidator as OutputValidator,
)
@dataclass(frozen=True)
class RoboMakerDeleteSimulationAppInputs(SageMakerComponentCommonInputs):
"""Defines the set of inputs for the delete simulation application component."""
arn: Input
version: Input
@dataclass
class RoboMakerDeleteSimulationAppOutputs(SageMakerComponentBaseOutputs):
"""Defines the set of outputs for the delete simulation application component."""
arn: Output
class RoboMakerDeleteSimulationAppSpec(
SageMakerComponentSpec[
RoboMakerDeleteSimulationAppInputs, RoboMakerDeleteSimulationAppOutputs
]
):
INPUTS: RoboMakerDeleteSimulationAppInputs = RoboMakerDeleteSimulationAppInputs(
arn=InputValidator(
input_type=str,
required=True,
description="The Amazon Resource Name (ARN) of the simulation application.",
default="",
),
version=InputValidator(
input_type=str,
required=False,
description="The version of the simulation application.",
default=None,
),
**vars(COMMON_INPUTS),
)
OUTPUTS = RoboMakerDeleteSimulationAppOutputs(
arn=OutputValidator(
description="The Amazon Resource Name (ARN) of the simulation application."
),
)
def __init__(self, arguments: List[str]):
super().__init__(
arguments,
RoboMakerDeleteSimulationAppInputs,
RoboMakerDeleteSimulationAppOutputs,
)
@property
def inputs(self) -> RoboMakerDeleteSimulationAppInputs:
return self._inputs
@property
def outputs(self) -> RoboMakerDeleteSimulationAppOutputs:
return self._outputs
@property
def output_paths(self) -> RoboMakerDeleteSimulationAppOutputs:
return self._output_paths


@@ -64,7 +64,7 @@ outputs:
- {name: endpoint_name, description: The created endpoint name.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- deploy/src/sagemaker_deploy_component.py


@@ -79,7 +79,7 @@ outputs:
SageMaker model trained as part of automated data labeling.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- ground_truth/src/sagemaker_ground_truth_component.py


@@ -98,7 +98,7 @@ outputs:
the training algorithm.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- hyperparameter_tuning/src/sagemaker_tuning_component.py


@@ -37,7 +37,7 @@ outputs:
- {name: model_name, description: The name of the model created by SageMaker.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- model/src/sagemaker_model_component.py


@@ -57,7 +57,7 @@ outputs:
- {name: output_artifacts, description: A dictionary containing the output S3 artifacts.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- process/src/sagemaker_process_component.py


@@ -0,0 +1,93 @@
# SageMaker RLEstimator Kubeflow Pipelines component
## Summary
Component to submit SageMaker RLEstimator (Reinforcement Learning) training jobs directly from a Kubeflow Pipelines workflow.
https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-rl-workflow.html
## Intended Use
For running reinforcement learning training jobs using Amazon SageMaker RL.
## Runtime Arguments
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the training job launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
job_name | The name of the training job. Must be unique within the same AWS account and AWS region | Yes | String | | RLEstimatorJob-[datetime]-[random id]|
role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | | |
image | The registry path of a custom Docker image containing your training code; if omitted, a prebuilt AWS RL image is selected based on the toolkit and framework inputs | Yes | String | | |
entry_point | Path (absolute or relative) to the Python source file which should be executed as the entry point to training | No | String | | |
source_dir | Path (S3 URI) to a directory with any other training source code dependencies aside from the entry point file | Yes | String | | |
toolkit | RL toolkit you want to use for executing your model training code | Yes | String | | |
toolkit_version | RL toolkit version you want to use for executing your model training code | Yes | String | | |
framework | Framework (MXNet, TensorFlow or PyTorch) you want to use as a toolkit backend for reinforcement learning training | Yes | String | | |
metric_definitions | The dictionary of name-regex pairs that specify the metrics the algorithm emits | Yes | Dict | | {} |
training_input_mode | The input mode that the algorithm supports | No | String | File, Pipe | File |
hyperparameters | Hyperparameters for the selected algorithm | No | Dict | [Depends on Algo](https://docs.aws.amazon.com/sagemaker/latest/dg/k-means-api-config.html)| |
instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge [and many more](https://aws.amazon.com/sagemaker/pricing/instance-types/) | ml.m4.xlarge |
instance_count | The number of ML compute instances to use in each training job | Yes | Int | ≥ 1 | 1 |
volume_size | The size of the ML storage volume that you want to provision in GB | Yes | Int | ≥ 1 | 30 |
max_run | The maximum run time in seconds per training job | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
model_artifact_path | The S3 path where you want Amazon SageMaker to store the model artifacts | No | String | | |
output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | String | | |
vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your RLEstimator job | Yes | String | | |
spot_instance | Use managed spot training if true | No | Boolean | False, True | False |
max_wait_time | The maximum time in seconds you are willing to wait for a managed spot training job to complete | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Dict | | {} |
debug_hook_config | Dictionary of configuration information for the debug hook parameters, collection configurations, and storage paths | Yes | Dict | | {} |
debug_rule_config | List of configuration information for debugging rules | Yes | List of Dicts | | [] |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
Notes:
* There are two ways to use this component: build your own Docker image with your training code baked in, or pass your code in via the source_dir input. In either case, use the entry_point input to provide the filename of the code entrypoint.
* The format for the [`debug_hook_config`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DebugHookConfig.html) field is:
```
{
"CollectionConfigurations": [
{
'CollectionName': 'string',
'CollectionParameters': {
'string' : 'string'
}
}
],
'HookParameters': {
'string' : 'string'
},
'LocalPath': 'string',
'S3OutputPath': 'string'
}
```
* The format for the [`debug_rule_config`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DebugRuleConfiguration.html) field is:
```
[
{
'InstanceType': 'string',
'LocalPath': 'string',
'RuleConfigurationName': 'string',
'RuleEvaluatorImage': 'string',
'RuleParameters': {
'string' : 'string'
},
'S3OutputPath': 'string',
'VolumeSizeInGB': number
}
]
```
## Output
Stores the trained model in the S3 location you specified via model_artifact_path.
# Example code
Simple example pipeline that uses a custom image: [rlestimator_pipeline_custom_image](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/rlestimator_pipeline/rlestimator_pipeline_custom_image.py)
Sample pipeline using an image selected for you by the RLEstimator class based on the framework and toolkit you provide: [rlestimator_pipeline_toolkit_image](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/rlestimator_pipeline/rlestimator_pipeline_toolkit_image.py)
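As a rough companion to the linked samples, the two usage modes boil down to two different argument sets. A hedged sketch, in which every ARN, URI, and filename is a placeholder:

```python
# Mode 1: custom image -- training code is baked into your own Docker image.
custom_image_mode = {
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-rl-image:latest",  # placeholder ECR URI
    "entry_point": "train.py",
    "role": "arn:aws:iam::123456789012:role/sagemaker-execution-role",  # placeholder role
    "model_artifact_path": "s3://my-bucket/model-output",
}

# Mode 2: toolkit image -- a prebuilt AWS RL image is selected from toolkit,
# toolkit_version, and framework, and your code is supplied via source_dir.
toolkit_mode = {
    "toolkit": "coach",
    "toolkit_version": "1.0.0",  # placeholder version
    "framework": "tensorflow",
    "entry_point": "train.py",
    "source_dir": "s3://my-bucket/sourcedir.tar.gz",
    "role": "arn:aws:iam::123456789012:role/sagemaker-execution-role",
    "model_artifact_path": "s3://my-bucket/model-output",
}

# Exactly one of the two modes supplies "image"; the other relies on toolkit/framework.
assert "image" in custom_image_mode and "image" not in toolkit_mode
```

In the first mode the training code ships inside the image; in the second, the RLEstimator class picks the image and pulls your code from `source_dir`.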
# Resources
* [Create RLEstimator Job API documentation](https://sagemaker.readthedocs.io/en/stable/frameworks/rl/sagemaker.rl.html)
* [Amazon SageMaker Debugger](https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html)
* [Debugger Built-In Rules](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html)
* [Debugger Custom Rules](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-custom-rules.html)
* [Debugger Registry URLs](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-docker-images-rules.html)
* [Debugger API Examples](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-createtrainingjob-api.html)


@@ -0,0 +1,148 @@
name: SageMaker - RLEstimator Training Job
description: Handle end-to-end training and deployment of custom RLEstimator code.
inputs:
- name: spot_instance
type: Bool
description: Use managed spot training.
default: "False"
- {name: max_wait_time, type: Integer, description: The maximum time in seconds you
are willing to wait for a managed spot training job to complete., default: '86400'}
- {name: max_run_time, type: Integer, description: The maximum run time in seconds
for the training job., default: '86400'}
- {name: checkpoint_config, type: JsonObject, description: Dictionary of information
about the output location for managed spot training checkpoint data., default: '{}'}
- {name: region, type: String, description: The region for the SageMaker resource.}
- {name: endpoint_url, type: String, description: The URL to use when communicating
with the SageMaker service., default: ''}
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
when connecting to SageMaker., default: ''}
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
AWS resources.', default: '{}'}
- {name: job_name, type: String, description: Training job name., default: ''}
- {name: role, type: String, description: The Amazon Resource Name (ARN) that Amazon
SageMaker assumes to perform tasks on your behalf.}
- {name: image, type: String, description: 'An ECR url. If specified, the estimator
will use this image for training and hosting', default: ''}
- {name: entry_point, type: String, description: Path (absolute or relative) to the
Python source file which should be executed as the entry point to training., default: ''}
- {name: source_dir, type: String, description: Path (S3 URI) to a directory with
any other training source code dependencies aside from the entry point file.,
default: ''}
- {name: toolkit, type: String, description: RL toolkit you want to use for executing
your model training code., default: ''}
- {name: toolkit_version, type: String, description: RL toolkit version you want to use for executing your model training code., default: ''}
- {name: framework, type: String, description: 'Framework (MXNet, TensorFlow or PyTorch) you want to use as a toolkit backend for reinforcement learning training.', default: ''}
- {name: metric_definitions, type: JsonArray, description: The dictionary of name-regex pairs that specify the metrics the algorithm emits., default: '[]'}
- {name: training_input_mode, type: String, description: The input mode that the algorithm
supports. File or Pipe., default: File}
- {name: hyperparameters, type: JsonObject, description: Hyperparameters that will
be used for training., default: '{}'}
- {name: instance_type, type: String, description: The ML compute instance type.,
default: ml.m4.xlarge}
- {name: instance_count, type: Integer, description: The number of ML compute instances
to use in the training job., default: '1'}
- {name: volume_size, type: Integer, description: The size of the ML storage volume
that you want to provision., default: '30'}
- {name: max_run, type: Integer, description: 'Timeout in seconds for training (default:
24 * 60 * 60).', default: '86400'}
- {name: model_artifact_path, type: String, description: Identifies the S3 path where
you want Amazon SageMaker to store the model artifacts.}
- {name: vpc_security_group_ids, type: JsonArray, description: 'The VPC security group
IDs, in the form sg-xxxxxxxx.', default: '[]'}
- {name: vpc_subnets, type: JsonArray, description: The ID of the subnets in the VPC to which you want to connect your training job., default: '[]'}
- name: network_isolation
type: Bool
description: Isolates the training container.
default: "False"
- name: traffic_encryption
type: Bool
description: Encrypts all communications between ML compute instances in distributed
training.
default: "False"
- {name: debug_hook_config, type: JsonObject, description: 'Configuration information
for the debug hook parameters, collection configuration, and storage paths.',
default: '{}'}
- {name: debug_rule_config, type: JsonArray, description: Configuration information
for debugging rules., default: '[]'}
outputs:
- {name: model_artifact_url, description: The model artifacts URL.}
- {name: job_name, description: The training job name.}
- {name: training_image, description: The registry path of the Docker image that contains
the training algorithm.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- rlestimator/src/sagemaker_rlestimator_component.py
- --spot_instance
- {inputValue: spot_instance}
- --max_wait_time
- {inputValue: max_wait_time}
- --max_run_time
- {inputValue: max_run_time}
- --checkpoint_config
- {inputValue: checkpoint_config}
- --region
- {inputValue: region}
- --endpoint_url
- {inputValue: endpoint_url}
- --assume_role
- {inputValue: assume_role}
- --tags
- {inputValue: tags}
- --job_name
- {inputValue: job_name}
- --role
- {inputValue: role}
- --image
- {inputValue: image}
- --entry_point
- {inputValue: entry_point}
- --source_dir
- {inputValue: source_dir}
- --toolkit
- {inputValue: toolkit}
- --toolkit_version
- {inputValue: toolkit_version}
- --framework
- {inputValue: framework}
- --metric_definitions
- {inputValue: metric_definitions}
- --training_input_mode
- {inputValue: training_input_mode}
- --hyperparameters
- {inputValue: hyperparameters}
- --instance_type
- {inputValue: instance_type}
- --instance_count
- {inputValue: instance_count}
- --volume_size
- {inputValue: volume_size}
- --max_run
- {inputValue: max_run}
- --model_artifact_path
- {inputValue: model_artifact_path}
- --vpc_security_group_ids
- {inputValue: vpc_security_group_ids}
- --vpc_subnets
- {inputValue: vpc_subnets}
- --network_isolation
- {inputValue: network_isolation}
- --traffic_encryption
- {inputValue: traffic_encryption}
- --debug_hook_config
- {inputValue: debug_hook_config}
- --debug_rule_config
- {inputValue: debug_rule_config}
- --model_artifact_url_output_path
- {outputPath: model_artifact_url}
- --job_name_output_path
- {outputPath: job_name}
- --training_image_output_path
- {outputPath: training_image}


@@ -0,0 +1,277 @@
"""SageMaker component for RLEstimator."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import Dict
import os
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework
from rlestimator.src.sagemaker_rlestimator_spec import (
SageMakerRLEstimatorSpec,
SageMakerRLEstimatorInputs,
SageMakerRLEstimatorOutputs,
)
from common.sagemaker_component import (
SageMakerComponent,
ComponentMetadata,
SageMakerJobStatus,
DebugRulesStatus,
)
from common.boto3_manager import Boto3Manager
from common.common_inputs import SageMakerComponentCommonInputs
from common.spec_input_parsers import SpecInputParsers
@ComponentMetadata(
name="SageMaker - RLEstimator Training Job",
description="Handle end-to-end training and deployment of custom RLEstimator code.",
spec=SageMakerRLEstimatorSpec,
)
class SageMakerRLEstimatorComponent(SageMakerComponent):
"""SageMaker component for RLEstimator."""
def Do(self, spec: SageMakerRLEstimatorSpec):
self._rlestimator_job_name = (
spec.inputs.job_name
if spec.inputs.job_name
else SageMakerComponent._generate_unique_timestamped_id(
prefix="RLEstimatorJob"
)
)
super().Do(spec.inputs, spec.outputs, spec.output_paths)
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
"""Configures the internal AWS clients for the component.
Args:
inputs: A populated list of user inputs.
"""
self._sm_client = Boto3Manager.get_sagemaker_client(
self._get_component_version(),
inputs.region,
endpoint_url=inputs.endpoint_url,
assume_role_arn=inputs.assume_role,
)
self._cw_client = Boto3Manager.get_cloudwatch_client(
inputs.region, assume_role_arn=inputs.assume_role
)
self._sagemaker_session = Boto3Manager.get_sagemaker_session(
self._get_component_version(),
inputs.region,
assume_role_arn=inputs.assume_role,
)
def _get_job_status(self) -> SageMakerJobStatus:
response = self._sm_client.describe_training_job(
TrainingJobName=self._rlestimator_job_name
)
status = response["TrainingJobStatus"]
if status == "Completed" or status == "Stopped":
return self._get_debug_rule_status()
if status == "Failed":
message = response["FailureReason"]
return SageMakerJobStatus(
is_completed=True,
has_error=True,
error_message=message,
raw_status=status,
)
return SageMakerJobStatus(is_completed=False, raw_status=status)
def _get_debug_rule_status(self) -> SageMakerJobStatus:
"""Get the job status of the training debugging rules.
Returns:
SageMakerJobStatus: A status object.
"""
response = self._sm_client.describe_training_job(
TrainingJobName=self._rlestimator_job_name
)
# No debugging configured
if "DebugRuleEvaluationStatuses" not in response:
return SageMakerJobStatus(is_completed=True, has_error=False, raw_status="")
raw_status = DebugRulesStatus.from_describe(response)
if raw_status != DebugRulesStatus.INPROGRESS:
logging.info("Rules have ended with status:\n")
self._print_debug_rule_status(response, True)
return SageMakerJobStatus(
is_completed=True,
has_error=(raw_status == DebugRulesStatus.ERRORED),
raw_status=raw_status,
)
self._print_debug_rule_status(response)
return SageMakerJobStatus(is_completed=False, raw_status=raw_status)
def _print_debug_rule_status(self, response, last_print=False):
"""Prints the status of each debug rule.
Example of DebugRuleEvaluationStatuses:
response['DebugRuleEvaluationStatuses'] =
[{
"RuleConfigurationName": "VanishingGradient",
"RuleEvaluationStatus": "IssuesFound",
"StatusDetails": "There was an issue."
}]
If last_print is False:
INFO:root: - LossNotDecreasing: InProgress
INFO:root: - Overtraining: NoIssuesFound
ERROR:root:- CustomGradientRule: Error
If last_print is True:
INFO:root: - LossNotDecreasing: IssuesFound
INFO:root: - RuleEvaluationConditionMet: Evaluation of the rule LossNotDecreasing at step 10 resulted in the condition being met
Args:
response: A describe training job response.
last_print: If true, prints each of the debug rule issues if found.
"""
for debug_rule in response["DebugRuleEvaluationStatuses"]:
line_ending = "\n" if last_print else ""
if "StatusDetails" in debug_rule:
status_details = (
f"- {debug_rule['StatusDetails'].rstrip()}{line_ending}"
)
line_ending = ""
else:
status_details = ""
rule_status = f"- {debug_rule['RuleConfigurationName']}: {debug_rule['RuleEvaluationStatus']}{line_ending}"
if debug_rule["RuleEvaluationStatus"] == "Error":
log_fn = logging.error
status_padding = 1
else:
log_fn = logging.info
status_padding = 2
log_fn(f"{status_padding * ' '}{rule_status}")
if last_print and status_details:
log_fn(f"{(status_padding + 2) * ' '}{status_details}")
self._print_log_header(50)
def _after_job_complete(
self,
job: object,
request: Dict,
inputs: SageMakerRLEstimatorInputs,
outputs: SageMakerRLEstimatorOutputs,
):
outputs.job_name = self._rlestimator_job_name
outputs.model_artifact_url = self._get_model_artifacts_from_job(
self._rlestimator_job_name
)
outputs.training_image = self._get_image_from_job(self._rlestimator_job_name)
def _on_job_terminated(self):
self._sm_client.stop_training_job(TrainingJobName=self._rlestimator_job_name)
def _print_logs_for_job(self):
self._print_cloudwatch_logs(
"/aws/sagemaker/TrainingJobs", self._rlestimator_job_name
)
def _create_job_request(
self, inputs: SageMakerRLEstimatorInputs, outputs: SageMakerRLEstimatorOutputs,
) -> RLEstimator:
# Documentation: https://sagemaker.readthedocs.io/en/stable/frameworks/rl/sagemaker.rl.html
# Region is configured via the sagemaker_session, since it cannot be set through the RLEstimator class directly.
# Only use max wait time default value if electing to use spot instances
if not inputs.spot_instance:
max_wait_time = None
else:
max_wait_time = inputs.max_wait_time
estimator = RLEstimator(
entry_point=inputs.entry_point,
source_dir=inputs.source_dir,
image_uri=inputs.image,
toolkit=self._get_toolkit(inputs.toolkit),
toolkit_version=inputs.toolkit_version,
framework=self._get_framework(inputs.framework),
role=inputs.role,
debugger_hook_config=self._nullable(inputs.debug_hook_config),
rules=self._nullable(inputs.debug_rule_config),
instance_type=inputs.instance_type,
instance_count=inputs.instance_count,
output_path=inputs.model_artifact_path,
metric_definitions=inputs.metric_definitions,
input_mode=inputs.training_input_mode,
max_run=inputs.max_run,
hyperparameters=self._validate_hyperparameters(inputs.hyperparameters),
subnets=self._nullable(inputs.vpc_subnets),
security_group_ids=self._nullable(inputs.vpc_security_group_ids),
use_spot_instances=inputs.spot_instance,
enable_network_isolation=inputs.network_isolation,
encrypt_inter_container_traffic=inputs.traffic_encryption,
max_wait=max_wait_time,
sagemaker_session=self._sagemaker_session,
)
return estimator
def _submit_job_request(self, estimator: RLEstimator) -> object:
# By setting wait to false we don't block the current thread.
estimator.fit(job_name=self._rlestimator_job_name, wait=False)
job_name = estimator.latest_training_job.job_name
self._rlestimator_job_name = job_name
response = self._sm_client.describe_training_job(TrainingJobName=job_name)
return response
def _after_submit_job_request(
self,
job: object,
request: Dict,
inputs: SageMakerRLEstimatorInputs,
outputs: SageMakerRLEstimatorOutputs,
):
logging.info(f"Created Training Job with name: {self._rlestimator_job_name}")
logging.info(
"Training job in SageMaker: https://{}.console.aws.amazon.com/sagemaker/home?region={}#/jobs/{}".format(
inputs.region, inputs.region, self._rlestimator_job_name,
)
)
logging.info(
"CloudWatch logs: https://{}.console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/TrainingJobs;prefix={};streamFilter=typeLogStreamPrefix".format(
inputs.region, inputs.region, self._rlestimator_job_name,
)
)
@staticmethod
def _get_toolkit(toolkit_type: str) -> RLToolkit:
if toolkit_type == "":
return None
return RLToolkit[toolkit_type.upper()]
@staticmethod
def _get_framework(framework_type: str) -> RLFramework:
if framework_type == "":
return None
return RLFramework[framework_type.upper()]
@staticmethod
def _nullable(value: str):
if value:
return value
else:
return None
if __name__ == "__main__":
import sys
spec = SageMakerRLEstimatorSpec(sys.argv[1:])
component = SageMakerRLEstimatorComponent()
component.Do(spec)


@@ -0,0 +1,232 @@
"""Specification for the SageMaker RLEstimator component."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import List
from common.sagemaker_component_spec import (
SageMakerComponentSpec,
SageMakerComponentBaseOutputs,
)
from common.spec_input_parsers import SpecInputParsers
from common.common_inputs import (
COMMON_INPUTS,
SageMakerComponentCommonInputs,
SpotInstanceInputs,
SPOT_INSTANCE_INPUTS,
SageMakerComponentInput as Input,
SageMakerComponentOutput as Output,
SageMakerComponentInputValidator as InputValidator,
SageMakerComponentOutputValidator as OutputValidator,
)
@dataclass(frozen=True)
class SageMakerRLEstimatorInputs(SageMakerComponentCommonInputs, SpotInstanceInputs):
"""Defines the set of inputs for the rlestimator component."""
job_name: Input
role: Input
image: Input
entry_point: Input
source_dir: Input
toolkit: Input
toolkit_version: Input
framework: Input
metric_definitions: Input
training_input_mode: Input
hyperparameters: Input
instance_type: Input
instance_count: Input
volume_size: Input
max_run: Input
model_artifact_path: Input
vpc_security_group_ids: Input
vpc_subnets: Input
network_isolation: Input
traffic_encryption: Input
debug_hook_config: Input
debug_rule_config: Input
@dataclass
class SageMakerRLEstimatorOutputs(SageMakerComponentBaseOutputs):
"""Defines the set of outputs for the rlestimator component."""
model_artifact_url: Output
job_name: Output
training_image: Output
class SageMakerRLEstimatorSpec(
SageMakerComponentSpec[SageMakerRLEstimatorInputs, SageMakerRLEstimatorOutputs]
):
INPUTS: SageMakerRLEstimatorInputs = SageMakerRLEstimatorInputs(
job_name=InputValidator(
input_type=str,
required=False,
description="Training job name.",
default="",
),
role=InputValidator(
input_type=str,
required=True,
description="The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf.",
),
image=InputValidator(
input_type=str,
required=False,
description="An ECR url. If specified, the estimator will use this image for training and hosting",
default=None,
),
entry_point=InputValidator(
input_type=str,
required=True,
description="Path (absolute or relative) to the Python source file which should be executed as the entry point to training.",
default="",
),
source_dir=InputValidator(
input_type=str,
required=False,
description="Path (S3 URI) to a directory with any other training source code dependencies aside from the entry point file.",
default="",
),
toolkit=InputValidator(
input_type=str,
choices=["coach", "ray", ""],
required=False,
description="RL toolkit you want to use for executing your model training code.",
default="",
),
toolkit_version=InputValidator(
input_type=str,
required=False,
description="RL toolkit version you want to use for executing your model training code.",
default=None,
),
framework=InputValidator(
input_type=str,
choices=["tensorflow", "mxnet", "pytorch", ""],
required=False,
description="Framework (MXNet, TensorFlow or PyTorch) you want to use as a toolkit backend for reinforcement learning training.",
default="",
),
metric_definitions=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="The dictionary of name-regex pairs that specify the metrics the algorithm emits.",
default=[],
),
training_input_mode=InputValidator(
choices=["File", "Pipe"],
input_type=str,
description="The input mode that the algorithm supports. File or Pipe.",
default="File",
),
hyperparameters=InputValidator(
input_type=SpecInputParsers.yaml_or_json_dict,
required=False,
description="Hyperparameters that will be used for training.",
default={},
),
instance_type=InputValidator(
input_type=str,
required=False,
description="The ML compute instance type.",
default="ml.m4.xlarge",
),
instance_count=InputValidator(
input_type=int,
required=False,
description="The number of ML compute instances to use in the training job.",
default=1,
),
volume_size=InputValidator(
input_type=int,
required=True,
description="The size of the ML storage volume that you want to provision.",
default=30,
),
max_run=InputValidator(
input_type=int,
required=False,
description="Timeout in seconds for training (default: 24 * 60 * 60).",
default=24 * 60 * 60,
),
model_artifact_path=InputValidator(
input_type=str,
required=True,
description="Identifies the S3 path where you want Amazon SageMaker to store the model artifacts.",
),
vpc_security_group_ids=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="The VPC security group IDs, in the form sg-xxxxxxxx.",
default=[],
),
vpc_subnets=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="The ID of the subnets in the VPC to which you want to connect your training job.",
default=[],
),
network_isolation=InputValidator(
input_type=SpecInputParsers.str_to_bool,
description="Isolates the training container.",
default=False,
),
traffic_encryption=InputValidator(
input_type=SpecInputParsers.str_to_bool,
description="Encrypts all communications between ML compute instances in distributed training.",
default=False,
),
debug_hook_config=InputValidator(
input_type=SpecInputParsers.yaml_or_json_dict,
required=False,
description="Configuration information for the debug hook parameters, collection configuration, and storage paths.",
default={},
),
debug_rule_config=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="Configuration information for debugging rules.",
default=[],
),
**vars(COMMON_INPUTS),
**vars(SPOT_INSTANCE_INPUTS)
)
OUTPUTS = SageMakerRLEstimatorOutputs(
model_artifact_url=OutputValidator(description="The model artifacts URL."),
job_name=OutputValidator(description="The training job name."),
training_image=OutputValidator(
description="The registry path of the Docker image that contains the training algorithm."
),
)
def __init__(self, arguments: List[str]):
super().__init__(
arguments, SageMakerRLEstimatorInputs, SageMakerRLEstimatorOutputs
)
@property
def inputs(self) -> SageMakerRLEstimatorInputs:
return self._inputs
@property
def outputs(self) -> SageMakerRLEstimatorOutputs:
return self._outputs
@property
def output_paths(self) -> SageMakerRLEstimatorOutputs:
return self._output_paths


@@ -0,0 +1,77 @@
# RoboMaker Simulation Job Kubeflow Pipelines component
## Summary
Component to run a RoboMaker Simulation Job from a Kubeflow Pipelines workflow.
https://docs.aws.amazon.com/robomaker/latest/dg/API_CreateSimulationJob.html
## Intended Use
For running your simulation workloads using AWS RoboMaker.
## Runtime Arguments
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the simulation job launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
role | The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf | No | String | | |
output_bucket | The bucket to place outputs from the simulation job | No | String | | |
output_path | The S3 key where outputs from the simulation job are placed | No | String | | |
max_run | Timeout in seconds for the simulation job | No | Int | | 28800 (8 * 60 * 60) |
failure_behavior | The failure behavior of the simulation job | Yes | String | Continue, Fail | Fail |
sim_app_arn | The application ARN for the simulation application | Yes | String | | |
sim_app_version | The application version for the simulation application | Yes | String | | |
sim_app_launch_config | The launch configuration for the simulation application | Yes | String | | |
sim_app_world_config | A list of world configurations | Yes | List of Dicts | | [] |
robot_app_arn | The application ARN for the robot application | Yes | String | | |
robot_app_version | The application version for the robot application | Yes | String | | |
robot_app_launch_config | The launch configuration for the robot application | Yes | Dict | | {} |
data_sources | Specify data sources to mount read-only files from S3 into your simulation | Yes | List of Dicts | | [] |
vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your simulation job | Yes | String | | |
use_public_ip | A boolean indicating whether to assign a public IP address | Yes | Bool | | False |
sim_unit_limit | The simulation unit limit | Yes | Int | | 15 |
record_ros_topics | A boolean indicating whether to record all ROS topics (Used for logging) | Yes | Bool | | False |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
Notes:
* This component can be run in a pipeline with the Create Simulation App and Delete Simulation App components, or as a standalone component.
* At least one of sim_app_arn or robot_app_arn, along with its related inputs, must be provided.
* The format for the [`sim_app_launch_config`](https://docs.aws.amazon.com/robomaker/latest/dg/API_LaunchConfig.html) field is:
```
{
"packageName": "string",
"launchFile": "string",
"environmentVariables": {
"string": "string",
},
"streamUI": "bool",
}
```
* The format for the [`sim_app_world_config`](https://docs.aws.amazon.com/robomaker/latest/dg/API_WorldConfig.html) field is:
```
{
"world": "string"
}
```
* The format for the [`robot_app_launch_config`](https://docs.aws.amazon.com/robomaker/latest/dg/API_LaunchConfig.html) field is:
```
{
"packageName": "string",
"launchFile": "string",
"environmentVariables": {
"string": "string",
},
"streamUI": "bool",
}
```
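These configuration objects are plain JSON and can be assembled in Python before being passed to the component. A minimal sketch, in which the package name, launch file, environment variable, and world ARN are all placeholders rather than values from this repository:

```python
import json

# Placeholder launch configuration for a simulation application; swap in
# your own ROS package name and launch file.
sim_app_launch_config = {
    "packageName": "my_simulation_pkg",
    "launchFile": "simulation.launch",
    "environmentVariables": {"WORLD_NAME": "small_house"},
    "streamUI": False,
}

# Optional world configurations, matching the WorldConfig shape above.
sim_app_world_config = [
    {"world": "arn:aws:robomaker:us-east-1:123456789012:world/example"}
]

# Dict/List component inputs are passed on the command line as JSON strings.
print(json.dumps(sim_app_launch_config))
print(json.dumps(sim_app_world_config))
```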
## Output
The outputs of the simulation job are uploaded to the S3 location specified by `output_bucket` and `output_path`, and that location is returned in the `output_artifacts` output.
# Example code
Example of creating a Sim app, then a Sim job and finally deleting the Sim app : [robomaker_simulation_job_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_app.py)
# Resources
* [Create RoboMaker Simulation Job via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_job)
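For reference, an equivalent request can be made directly with Boto3 outside of Kubeflow. The sketch below is not the component's implementation; the role ARN, bucket, application ARN, and launch config values are placeholders, and submitting requires AWS credentials:

```python
def build_simulation_job_request(role_arn, bucket, prefix, sim_app_arn,
                                 launch_config, max_run=8 * 60 * 60):
    """Assemble a create_simulation_job request in the shape Boto3 expects."""
    return {
        "iamRole": role_arn,
        "maxJobDurationInSeconds": max_run,
        "outputLocation": {"s3Bucket": bucket, "s3Prefix": prefix},
        "simulationApplications": [
            {"application": sim_app_arn, "launchConfig": launch_config}
        ],
    }

def submit(request):
    """Submit the request to RoboMaker (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed
    client = boto3.client("robomaker", region_name="us-east-1")
    return client.create_simulation_job(**request)

request = build_simulation_job_request(
    role_arn="arn:aws:iam::123456789012:role/RoboMakerRole",  # placeholder
    bucket="my-output-bucket",                                # placeholder
    prefix="sim-outputs",
    sim_app_arn="arn:aws:robomaker:us-east-1:123456789012:simulation-application/example",  # placeholder
    launch_config={"packageName": "my_pkg", "launchFile": "sim.launch"},
)
```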


@@ -0,0 +1,109 @@
name: RoboMaker - Create Simulation Job
description: Creates a simulation job.
inputs:
- {name: region, type: String, description: The region for the SageMaker resource.}
- {name: endpoint_url, type: String, description: The URL to use when communicating
with the SageMaker service., default: ''}
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
when connecting to SageMaker., default: ''}
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
AWS resources.', default: '{}'}
- {name: role, type: String, description: The Amazon Resource Name (ARN) that Amazon
RoboMaker assumes to perform tasks on your behalf.}
- {name: output_bucket, type: String, description: The bucket to place outputs from
the simulation job., default: ''}
- {name: output_path, type: String, description: The S3 key where outputs from the
simulation job are placed., default: ''}
- {name: max_run, type: Integer, description: 'Timeout in seconds for simulation job
(default: 8 * 60 * 60).', default: '28800'}
- {name: failure_behavior, type: String, description: The failure behavior of the simulation
job (Continue|Fail)., default: Fail}
- {name: sim_app_arn, type: String, description: The application ARN for the simulation
application., default: ''}
- {name: sim_app_version, type: String, description: The application version for the
simulation application., default: ''}
- {name: sim_app_launch_config, type: JsonObject, description: The launch configuration
for the simulation application., default: '{}'}
- {name: sim_app_world_config, type: JsonArray, description: A list of world configurations.,
default: '[]'}
- {name: robot_app_arn, type: String, description: The application ARN for the robot
application., default: ''}
- {name: robot_app_version, type: String, description: The application version for
the robot application., default: ''}
- {name: robot_app_launch_config, type: JsonObject, description: The launch configuration
for the robot application., default: '{}'}
- {name: data_sources, type: JsonArray, description: Specify data sources to mount
read-only files from S3 into your simulation., default: '[]'}
- {name: vpc_security_group_ids, type: JsonArray, description: 'The VPC security group
IDs, in the form sg-xxxxxxxx.', default: '[]'}
- {name: vpc_subnets, type: JsonArray, description: The IDs of the subnets in the VPC
to which you want to connect your simulation job., default: '[]'}
- name: use_public_ip
type: Bool
description: A boolean indicating whether to assign a public IP address.
default: "False"
- {name: sim_unit_limit, type: Integer, description: The simulation unit limit., default: '15'}
- name: record_ros_topics
type: Bool
description: A boolean indicating whether to record all ROS topics. Used for logging.
default: "False"
outputs:
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation job.}
- {name: output_artifacts, description: The simulation job artifacts URL.}
- {name: job_id, description: The simulation job id.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- simulation_job/src/robomaker_simulation_job_component.py
- --region
- {inputValue: region}
- --endpoint_url
- {inputValue: endpoint_url}
- --assume_role
- {inputValue: assume_role}
- --tags
- {inputValue: tags}
- --role
- {inputValue: role}
- --output_bucket
- {inputValue: output_bucket}
- --output_path
- {inputValue: output_path}
- --max_run
- {inputValue: max_run}
- --failure_behavior
- {inputValue: failure_behavior}
- --sim_app_arn
- {inputValue: sim_app_arn}
- --sim_app_version
- {inputValue: sim_app_version}
- --sim_app_launch_config
- {inputValue: sim_app_launch_config}
- --sim_app_world_config
- {inputValue: sim_app_world_config}
- --robot_app_arn
- {inputValue: robot_app_arn}
- --robot_app_version
- {inputValue: robot_app_version}
- --robot_app_launch_config
- {inputValue: robot_app_launch_config}
- --data_sources
- {inputValue: data_sources}
- --vpc_security_group_ids
- {inputValue: vpc_security_group_ids}
- --vpc_subnets
- {inputValue: vpc_subnets}
- --use_public_ip
- {inputValue: use_public_ip}
- --sim_unit_limit
- {inputValue: sim_unit_limit}
- --record_ros_topics
- {inputValue: record_ros_topics}
- --arn_output_path
- {outputPath: arn}
- --output_artifacts_output_path
- {outputPath: output_artifacts}
- --job_id_output_path
- {outputPath: job_id}


@@ -0,0 +1,252 @@
"""RoboMaker component for creating a simulation job."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import Dict
from simulation_job.src.robomaker_simulation_job_spec import (
RoboMakerSimulationJobSpec,
RoboMakerSimulationJobInputs,
RoboMakerSimulationJobOutputs,
)
from common.sagemaker_component import (
SageMakerComponent,
ComponentMetadata,
SageMakerJobStatus,
)
from common.boto3_manager import Boto3Manager
from common.common_inputs import SageMakerComponentCommonInputs
@ComponentMetadata(
name="RoboMaker - Create Simulation Job",
description="Creates a simulation job.",
spec=RoboMakerSimulationJobSpec,
)
class RoboMakerSimulationJobComponent(SageMakerComponent):
"""RoboMaker component for creating a simulation job."""
def Do(self, spec: RoboMakerSimulationJobSpec):
super().Do(spec.inputs, spec.outputs, spec.output_paths)
def _get_job_status(self) -> SageMakerJobStatus:
response = self._rm_client.describe_simulation_job(job=self._arn)
status = response["status"]
if status in ["Completed"]:
return SageMakerJobStatus(
is_completed=True, has_error=False, raw_status=status
)
if status in ["Terminating", "Terminated", "Canceled"]:
if "failureCode" in response:
simulation_message = (
f"Simulation failed with code:{response['failureCode']}"
)
return SageMakerJobStatus(
is_completed=True,
has_error=True,
error_message=simulation_message,
raw_status=status,
)
else:
simulation_message = "Exited without error code.\n"
if "failureReason" in response:
simulation_message += (
f"Simulation exited with reason:{response['failureReason']}\n"
)
return SageMakerJobStatus(
is_completed=True,
has_error=False,
error_message=simulation_message,
raw_status=status,
)
if status in ["Failed", "RunningFailed"]:
failure_message = f"Simulation job is in status:{status}\n"
if "failureReason" in response:
failure_message += (
f"Simulation failed with reason:{response['failureReason']}"
)
if "failureCode" in response:
failure_message += (
f"Simulation failed with errorCode:{response['failureCode']}"
)
return SageMakerJobStatus(
is_completed=True,
has_error=True,
error_message=failure_message,
raw_status=status,
)
return SageMakerJobStatus(is_completed=False, raw_status=status)
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
"""Configures the internal AWS clients for the component.
Args:
inputs: A populated list of user inputs.
"""
self._rm_client = Boto3Manager.get_robomaker_client(
self._get_component_version(),
inputs.region,
endpoint_url=inputs.endpoint_url,
assume_role_arn=inputs.assume_role,
)
self._cw_client = Boto3Manager.get_cloudwatch_client(
inputs.region, assume_role_arn=inputs.assume_role
)
def _after_job_complete(
self,
job: Dict,
request: Dict,
inputs: RoboMakerSimulationJobInputs,
outputs: RoboMakerSimulationJobOutputs,
):
outputs.output_artifacts = self._get_job_outputs()
logging.info(
"Simulation Job in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobs/{}".format(
inputs.region, inputs.region, self._job_id
)
)
def _on_job_terminated(self):
self._rm_client.cancel_simulation_job(job=self._arn)
def _create_job_request(
self,
inputs: RoboMakerSimulationJobInputs,
outputs: RoboMakerSimulationJobOutputs,
) -> Dict:
"""
Documentation:https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_job
"""
# Need one of sim_app_arn or robot_app_arn to be provided
if not inputs.sim_app_arn and not inputs.robot_app_arn:
logging.error("Must specify a Simulation App ARN or a Robot App ARN.")
raise Exception("Could not create simulation job request")
request = self._get_request_template("robomaker.simulation.job")
# Set the required inputs
request["outputLocation"]["s3Bucket"] = inputs.output_bucket
request["outputLocation"]["s3Prefix"] = inputs.output_path
request["maxJobDurationInSeconds"] = inputs.max_run
request["iamRole"] = inputs.role
# Set networking inputs
if inputs.vpc_subnets:
request["vpcConfig"]["subnets"] = inputs.vpc_subnets
if inputs.vpc_security_group_ids:
request["vpcConfig"]["securityGroups"] = inputs.vpc_security_group_ids
if inputs.use_public_ip:
request["vpcConfig"]["assignPublicIp"] = inputs.use_public_ip
else:
request.pop("vpcConfig")
# Set simulation application inputs
if inputs.sim_app_arn:
if not inputs.sim_app_launch_config:
logging.error("Must specify a Launch Config for your Simulation App")
raise Exception("Could not create simulation job request")
sim_app = {
"application": inputs.sim_app_arn,
"launchConfig": inputs.sim_app_launch_config,
}
if inputs.sim_app_version:
sim_app["version"] = inputs.sim_app_version
if inputs.sim_app_world_config:
sim_app["worldConfigs"] = inputs.sim_app_world_config
request["simulationApplications"].append(sim_app)
else:
request.pop("simulationApplications")
# Set robot application inputs
if inputs.robot_app_arn:
if not inputs.robot_app_launch_config:
logging.error("Must specify a Launch Config for your Robot App")
raise Exception("Could not create simulation job request")
robot_app = {
"application": inputs.robot_app_arn,
"launchConfig": inputs.robot_app_launch_config,
}
if inputs.robot_app_version:
robot_app["version"] = inputs.robot_app_version
request["robotApplications"].append(robot_app)
else:
request.pop("robotApplications")
# Set optional inputs
if inputs.record_ros_topics:
request["loggingConfig"]["recordAllRosTopics"] = inputs.record_ros_topics
else:
request.pop("loggingConfig")
if inputs.failure_behavior:
request["failureBehavior"] = inputs.failure_behavior
else:
request.pop("failureBehavior")
if inputs.data_sources:
request["dataSources"] = inputs.data_sources
else:
request.pop("dataSources")
if inputs.sim_unit_limit:
request["compute"]["simulationUnitLimit"] = inputs.sim_unit_limit
self._enable_tag_support(request, inputs)
return request
def _submit_job_request(self, request: Dict) -> Dict:
return self._rm_client.create_simulation_job(**request)
def _after_submit_job_request(
self,
job: Dict,
request: Dict,
inputs: RoboMakerSimulationJobInputs,
outputs: RoboMakerSimulationJobOutputs,
):
outputs.arn = self._arn = job["arn"]
outputs.job_id = self._job_id = job["arn"].split("/")[-1]
logging.info(f"Started Robomaker Simulation Job with ID: {self._job_id}")
logging.info(
"Simulation Job in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobs/{}".format(
inputs.region, inputs.region, self._job_id
)
)
def _print_logs_for_job(self):
self._print_cloudwatch_logs("/aws/robomaker/SimulationJobs", self._job_id)
def _get_job_outputs(self):
"""Build the S3 URI where the outputs of the simulation job are stored.
Returns:
str: The S3 URI of the simulation job's output artifacts.
"""
response = self._rm_client.describe_simulation_job(job=self._arn)
artifact_uri = f"s3://{response['outputLocation']['s3Bucket']}/{response['outputLocation']['s3Prefix']}"
return artifact_uri
if __name__ == "__main__":
import sys
spec = RoboMakerSimulationJobSpec(sys.argv[1:])
component = RoboMakerSimulationJobComponent()
component.Do(spec)


@@ -0,0 +1,200 @@
"""Specification for the RoboMaker create simulation job component."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import List
from common.sagemaker_component_spec import SageMakerComponentSpec
from common.spec_input_parsers import SpecInputParsers
from common.common_inputs import (
COMMON_INPUTS,
SageMakerComponentCommonInputs,
SageMakerComponentInput as Input,
SageMakerComponentOutput as Output,
SageMakerComponentBaseOutputs,
SageMakerComponentInputValidator as InputValidator,
SageMakerComponentOutputValidator as OutputValidator,
)
@dataclass(frozen=True)
class RoboMakerSimulationJobInputs(SageMakerComponentCommonInputs):
"""Defines the set of inputs for the create simulation job component."""
role: Input
output_bucket: Input
output_path: Input
max_run: Input
failure_behavior: Input
sim_app_arn: Input
sim_app_version: Input
sim_app_launch_config: Input
sim_app_world_config: Input
robot_app_arn: Input
robot_app_version: Input
robot_app_launch_config: Input
data_sources: Input
vpc_security_group_ids: Input
vpc_subnets: Input
use_public_ip: Input
sim_unit_limit: Input
record_ros_topics: Input
@dataclass
class RoboMakerSimulationJobOutputs(SageMakerComponentBaseOutputs):
"""Defines the set of outputs for the create simulation job component."""
arn: Output
output_artifacts: Output
job_id: Output
class RoboMakerSimulationJobSpec(
SageMakerComponentSpec[RoboMakerSimulationJobInputs, RoboMakerSimulationJobOutputs]
):
INPUTS: RoboMakerSimulationJobInputs = RoboMakerSimulationJobInputs(
role=InputValidator(
input_type=str,
required=True,
description="The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf.",
),
output_bucket=InputValidator(
input_type=str,
required=True,
description="The bucket to place outputs from the simulation job.",
default="",
),
output_path=InputValidator(
input_type=str,
required=True,
description="The S3 key where outputs from the simulation job are placed.",
default="",
),
max_run=InputValidator(
input_type=int,
required=True,
description="Timeout in seconds for simulation job (default: 8 * 60 * 60).",
default=8 * 60 * 60,
),
failure_behavior=InputValidator(
input_type=str,
required=False,
description="The failure behavior of the simulation job (Continue|Fail).",
default="Fail",
),
sim_app_arn=InputValidator(
input_type=str,
required=False,
description="The application ARN for the simulation application.",
default="",
),
sim_app_version=InputValidator(
input_type=str,
required=False,
description="The application version for the simulation application.",
default="",
),
sim_app_launch_config=InputValidator(
input_type=SpecInputParsers.yaml_or_json_dict,
required=False,
description="The launch configuration for the simulation application.",
default={},
),
sim_app_world_config=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="A list of world configurations.",
default=[],
),
robot_app_arn=InputValidator(
input_type=str,
required=False,
description="The application ARN for the robot application.",
default="",
),
robot_app_version=InputValidator(
input_type=str,
required=False,
description="The application version for the robot application.",
default="",
),
robot_app_launch_config=InputValidator(
input_type=SpecInputParsers.yaml_or_json_dict,
required=False,
description="The launch configuration for the robot application.",
default={},
),
data_sources=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="Specify data sources to mount read-only files from S3 into your simulation.",
default=[],
),
vpc_security_group_ids=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="The VPC security group IDs, in the form sg-xxxxxxxx.",
default=[],
),
vpc_subnets=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=False,
description="The IDs of the subnets in the VPC to which you want to connect your simulation job.",
default=[],
),
use_public_ip=InputValidator(
input_type=bool,
description="A boolean indicating whether to assign a public IP address.",
default=False,
),
sim_unit_limit=InputValidator(
input_type=int,
required=False,
description="The simulation unit limit.",
default=15,
),
record_ros_topics=InputValidator(
input_type=bool,
description="A boolean indicating whether to record all ROS topics. Used for logging.",
default=False,
),
**vars(COMMON_INPUTS),
)
OUTPUTS = RoboMakerSimulationJobOutputs(
arn=OutputValidator(
description="The Amazon Resource Name (ARN) of the simulation job."
),
output_artifacts=OutputValidator(
description="The simulation job artifacts URL."
),
job_id=OutputValidator(description="The simulation job id."),
)
def __init__(self, arguments: List[str]):
super().__init__(
arguments, RoboMakerSimulationJobInputs, RoboMakerSimulationJobOutputs,
)
@property
def inputs(self) -> RoboMakerSimulationJobInputs:
return self._inputs
@property
def outputs(self) -> RoboMakerSimulationJobOutputs:
return self._outputs
@property
def output_paths(self) -> RoboMakerSimulationJobOutputs:
return self._output_paths


@@ -0,0 +1,72 @@
# RoboMaker Simulation Job Batch Kubeflow Pipelines component
## Summary
Component to run a RoboMaker Simulation Job Batch from a Kubeflow Pipelines workflow.
https://docs.aws.amazon.com/robomaker/latest/dg/API_StartSimulationJobBatch.html
## Intended Use
For running your simulation workloads using AWS RoboMaker.
## Runtime Arguments
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the simulation job batch launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
role | The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf | No | String | | |
timeout_in_secs | The amount of time, in seconds, to wait for the batch to complete | Yes | Int | | 0 |
max_concurrency | The number of active simulation jobs created as part of the batch that can be in an active state at the same time | Yes | Int | | 0 |
simulation_job_requests | A list of simulation job requests to create in the batch | No | List of Dicts | | [] |
sim_app_arn | The application ARN for the simulation application | Yes | String | | |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
Notes:
* This component can be run in a pipeline with the Create Simulation App and Delete Simulation App components, or as a standalone component.
* A sim_app_arn can be provided as an input, or the application ARN can be embedded as the 'application' value in any of the simulation_job_requests.
* The format for the [`simulation_job_requests`](https://docs.aws.amazon.com/robomaker/latest/dg/API_SimulationJobRequest.html) field is:
```
[
{
"outputLocation": {
"s3Bucket": "string",
"s3Prefix": "string",
},
"loggingConfig": {"recordAllRosTopics": "bool"},
"maxJobDurationInSeconds": "int",
"iamRole": "string",
"failureBehavior": "string",
"simulationApplications": [
{
"application": "string",
"launchConfig": {
"packageName": "string",
"launchFile": "string",
"environmentVariables": {
"string": "string",
},
"streamUI": "bool",
},
}
],
"vpcConfig": {
"subnets": "list",
"securityGroups": "list",
"assignPublicIp": "bool",
},
}
]
```
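A batch of requests of the shape above can be generated programmatically and serialized for the simulation_job_requests input. In the sketch below the package name, launch file, role ARN, application ARN, and bucket are placeholders:

```python
import json

def make_request(world, role_arn, sim_app_arn, bucket):
    """Build one SimulationJobRequest entry; here, one request per world."""
    return {
        "outputLocation": {"s3Bucket": bucket, "s3Prefix": f"outputs/{world}"},
        "maxJobDurationInSeconds": 3600,
        "iamRole": role_arn,
        "failureBehavior": "Fail",
        "simulationApplications": [
            {
                "application": sim_app_arn,
                "launchConfig": {
                    "packageName": "my_simulation_pkg",  # placeholder
                    "launchFile": "simulation.launch",   # placeholder
                    "environmentVariables": {"WORLD_NAME": world},
                },
            }
        ],
    }

worlds = ["small_house", "bookstore"]
requests = [
    make_request(
        w,
        "arn:aws:iam::123456789012:role/RoboMakerRole",                                    # placeholder
        "arn:aws:robomaker:us-east-1:123456789012:simulation-application/example",         # placeholder
        "my-output-bucket",                                                                # placeholder
    )
    for w in worlds
]
simulation_job_requests = json.dumps(requests)
```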
## Output
The ARN and ID of the batch job.
# Example code
Example of creating a Sim app, then a Sim job batch and finally deleting the Sim app : [robomaker_simulation_job_batch_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_batch_app.py)
# Resources
* [Create RoboMaker Simulation Job Batch via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.start_simulation_job_batch)
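The component only includes a batchPolicy in the start_simulation_job_batch request when a timeout or concurrency value is set. That behavior can be sketched as follows; the role ARN is a placeholder, and actually submitting requires AWS credentials:

```python
def build_batch_request(requests, timeout_in_secs=0, max_concurrency=0):
    """Only include batchPolicy when at least one policy value is set."""
    batch_request = {"createSimulationJobRequests": requests}
    policy = {}
    if timeout_in_secs:
        policy["timeoutInSeconds"] = timeout_in_secs
    if max_concurrency:
        policy["maxConcurrency"] = max_concurrency
    if policy:
        batch_request["batchPolicy"] = policy
    return batch_request

# With credentials configured, this could be submitted via:
#   import boto3
#   boto3.client("robomaker").start_simulation_job_batch(**batch_request)
batch_request = build_batch_request(
    [{"iamRole": "arn:aws:iam::123456789012:role/Example"}],  # placeholder request
    timeout_in_secs=7200,
    max_concurrency=2,
)
```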


@@ -0,0 +1,52 @@
name: RoboMaker - Create Simulation Job Batch
description: Creates a simulation job batch.
inputs:
- {name: region, type: String, description: The region for the SageMaker resource.}
- {name: endpoint_url, type: String, description: The URL to use when communicating
with the SageMaker service., default: ''}
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
when connecting to SageMaker., default: ''}
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
AWS resources.', default: '{}'}
- {name: role, type: String, description: The Amazon Resource Name (ARN) that Amazon
RoboMaker assumes to perform tasks on your behalf.}
- {name: timeout_in_secs, type: Integer, description: 'The amount of time, in seconds,
to wait for the batch to complete.', default: '0'}
- {name: max_concurrency, type: Integer, description: The number of active simulation
jobs created as part of the batch that can be in an active state at the same time.,
default: '0'}
- {name: simulation_job_requests, type: JsonArray, description: A list of simulation
job requests to create in the batch., default: '[]'}
- {name: sim_app_arn, type: String, description: The application ARN for the simulation
application., default: ''}
outputs:
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation job batch.}
- {name: batch_job_id, description: The simulation job batch id.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- simulation_job_batch/src/robomaker_simulation_job_batch_component.py
- --region
- {inputValue: region}
- --endpoint_url
- {inputValue: endpoint_url}
- --assume_role
- {inputValue: assume_role}
- --tags
- {inputValue: tags}
- --role
- {inputValue: role}
- --timeout_in_secs
- {inputValue: timeout_in_secs}
- --max_concurrency
- {inputValue: max_concurrency}
- --simulation_job_requests
- {inputValue: simulation_job_requests}
- --sim_app_arn
- {inputValue: sim_app_arn}
- --arn_output_path
- {outputPath: arn}
- --batch_job_id_output_path
- {outputPath: batch_job_id}


@@ -0,0 +1,194 @@
"""RoboMaker component for creating a simulation job batch."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import Dict
from simulation_job_batch.src.robomaker_simulation_job_batch_spec import (
RoboMakerSimulationJobBatchSpec,
RoboMakerSimulationJobBatchInputs,
RoboMakerSimulationJobBatchOutputs,
)
from common.sagemaker_component import (
SageMakerComponent,
ComponentMetadata,
SageMakerJobStatus,
)
from common.boto3_manager import Boto3Manager
from common.common_inputs import SageMakerComponentCommonInputs
@ComponentMetadata(
name="RoboMaker - Create Simulation Job Batch",
description="Creates a simulation job batch.",
spec=RoboMakerSimulationJobBatchSpec,
)
class RoboMakerSimulationJobBatchComponent(SageMakerComponent):
"""RoboMaker component for creating a simulation job batch."""
def Do(self, spec: RoboMakerSimulationJobBatchSpec):
super().Do(spec.inputs, spec.outputs, spec.output_paths)
def _get_job_status(self) -> SageMakerJobStatus:
batch_response = self._rm_client.describe_simulation_job_batch(batch=self._arn)
batch_status = batch_response["status"]
if batch_status in ["Completed"]:
return SageMakerJobStatus(
is_completed=True, has_error=False, raw_status=batch_status
)
if batch_status in ["TimedOut", "Canceled"]:
simulation_message = "Simulation jobs are completed\n"
has_error = False
for completed_request in batch_response["createdRequests"]:
self._sim_request_ids.add(completed_request["arn"].split("/")[-1])
simulation_response = self._rm_client.describe_simulation_job(
job=completed_request["arn"]
)
if "failureCode" in simulation_response:
simulation_message += f"Simulation job: {simulation_response['arn']} failed with errorCode:{simulation_response['failureCode']}\n"
has_error = True
return SageMakerJobStatus(
is_completed=True,
has_error=has_error,
error_message=simulation_message,
raw_status=batch_status,
)
if batch_status in ["Failed"]:
failure_message = f"Simulation batch job is in status:{batch_status}\n"
if "failureReason" in batch_response:
failure_message += (
f"Simulation failed with reason:{batch_response['failureReason']}"
)
if "failureCode" in batch_response:
failure_message += (
f"Simulation failed with errorCode:{batch_response['failureCode']}"
)
return SageMakerJobStatus(
is_completed=True,
has_error=True,
error_message=failure_message,
raw_status=batch_status,
)
return SageMakerJobStatus(is_completed=False, raw_status=batch_status)
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
"""Configures the internal AWS clients for the component.
Args:
inputs: A populated list of user inputs.
"""
self._rm_client = Boto3Manager.get_robomaker_client(
self._get_component_version(),
inputs.region,
endpoint_url=inputs.endpoint_url,
assume_role_arn=inputs.assume_role,
)
self._cw_client = Boto3Manager.get_cloudwatch_client(
inputs.region, assume_role_arn=inputs.assume_role
)
def _after_job_complete(
self,
job: Dict,
request: Dict,
inputs: RoboMakerSimulationJobBatchInputs,
outputs: RoboMakerSimulationJobBatchOutputs,
):
for sim_request_id in self._sim_request_ids:
logging.info(
"Simulation Job in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobBatches/{}".format(
inputs.region, inputs.region, sim_request_id
)
)
def _on_job_terminated(self):
self._rm_client.cancel_simulation_job_batch(batch=self._arn)
def _create_job_request(
self,
inputs: RoboMakerSimulationJobBatchInputs,
outputs: RoboMakerSimulationJobBatchOutputs,
) -> Dict:
"""
Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.start_simulation_job_batch
"""
request = self._get_request_template("robomaker.simulation.job.batch")
# Set batch policy inputs
if inputs.timeout_in_secs:
request["batchPolicy"]["timeoutInSeconds"] = inputs.timeout_in_secs
if inputs.max_concurrency:
request["batchPolicy"]["maxConcurrency"] = inputs.max_concurrency
if not inputs.timeout_in_secs and not inputs.max_concurrency:
request.pop("batchPolicy")
# Set the simulation job inputs
request["createSimulationJobRequests"] = inputs.simulation_job_requests
# Override with ARN of sim application from input. Can be used to pass ARN from create sim app component.
if inputs.sim_app_arn:
for sim_job_request in request["createSimulationJobRequests"]:
for sim_jobs in sim_job_request["simulationApplications"]:
sim_jobs["application"] = inputs.sim_app_arn
return request
def _submit_job_request(self, request: Dict) -> Dict:
return self._rm_client.start_simulation_job_batch(**request)
def _after_submit_job_request(
self,
job: Dict,
request: Dict,
inputs: RoboMakerSimulationJobBatchInputs,
outputs: RoboMakerSimulationJobBatchOutputs,
):
outputs.arn = self._arn = job["arn"]
outputs.batch_job_id = self._batch_job_id = job["arn"].split("/")[-1]
logging.info(
f"Started Robomaker Simulation Job Batch with ID: {self._batch_job_id}"
)
logging.info(
"Simulation Job Batch in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobBatches/{}".format(
inputs.region, inputs.region, self._batch_job_id
)
)
self._sim_request_ids = set()
for created_request in job["createdRequests"]:
self._sim_request_ids.add(created_request["arn"].split("/")[-1])
logging.info(
f"Started Robomaker Simulation Job with ID: {created_request['arn'].split('/')[-1]}"
)
# Inform if we have any pending or failed requests
if job["pendingRequests"]:
logging.info("Some Simulation Requests are in state Pending")
if job["failedRequests"]:
logging.info("Some Simulation Requests are in state Failed")
def _print_logs_for_job(self):
for sim_request_id in self._sim_request_ids:
self._print_cloudwatch_logs("/aws/robomaker/SimulationJobs", sim_request_id)
if __name__ == "__main__":
import sys
spec = RoboMakerSimulationJobBatchSpec(sys.argv[1:])
component = RoboMakerSimulationJobBatchComponent()
component.Do(spec)
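The `batchPolicy` handling in `_create_job_request` above can be condensed into a standalone sketch (the `timeoutInSeconds` and `maxConcurrency` field names come from the RoboMaker `start_simulation_job_batch` API; the helper itself is illustrative and not part of the component):

```python
def build_batch_policy(timeout_in_secs=0, max_concurrency=0):
    """Mirror the component's batchPolicy logic: keep the section only
    when at least one of the two inputs is set (0 means unset)."""
    request = {"batchPolicy": {}}
    if timeout_in_secs:
        request["batchPolicy"]["timeoutInSeconds"] = timeout_in_secs
    if max_concurrency:
        request["batchPolicy"]["maxConcurrency"] = max_concurrency
    if not timeout_in_secs and not max_concurrency:
        request.pop("batchPolicy")
    return request
```

For example, `build_batch_policy(900, 2)` yields a request with both fields set, while `build_batch_policy()` yields `{}`, matching the defaults of 0 declared in the component spec.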

@ -0,0 +1,111 @@
"""Specification for the RoboMaker create simulation job batch component."""
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import List
from common.sagemaker_component_spec import SageMakerComponentSpec
from common.spec_input_parsers import SpecInputParsers
from common.common_inputs import (
COMMON_INPUTS,
SageMakerComponentCommonInputs,
SageMakerComponentInput as Input,
SageMakerComponentOutput as Output,
SageMakerComponentBaseOutputs,
SageMakerComponentInputValidator as InputValidator,
SageMakerComponentOutputValidator as OutputValidator,
)
@dataclass(frozen=True)
class RoboMakerSimulationJobBatchInputs(SageMakerComponentCommonInputs):
"""Defines the set of inputs for the create simulation job batch component."""
role: Input
timeout_in_secs: Input
max_concurrency: Input
simulation_job_requests: Input
sim_app_arn: Input
@dataclass
class RoboMakerSimulationJobBatchOutputs(SageMakerComponentBaseOutputs):
"""Defines the set of outputs for the create simulation job batch component."""
arn: Output
batch_job_id: Output
class RoboMakerSimulationJobBatchSpec(
SageMakerComponentSpec[
RoboMakerSimulationJobBatchInputs, RoboMakerSimulationJobBatchOutputs
]
):
INPUTS: RoboMakerSimulationJobBatchInputs = RoboMakerSimulationJobBatchInputs(
role=InputValidator(
input_type=str,
required=True,
description="The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf.",
),
timeout_in_secs=InputValidator(
input_type=int,
required=False,
description="The amount of time, in seconds, to wait for the batch to complete.",
default=0,
),
max_concurrency=InputValidator(
input_type=int,
required=False,
description="The number of active simulation jobs create as part of the batch that can be in an active state at the same time.",
default=0,
),
simulation_job_requests=InputValidator(
input_type=SpecInputParsers.yaml_or_json_list,
required=True,
description="A list of simulation job requests to create in the batch.",
default=[],
),
sim_app_arn=InputValidator(
input_type=str,
required=False,
description="The application ARN for the simulation application.",
default="",
),
**vars(COMMON_INPUTS),
)
OUTPUTS = RoboMakerSimulationJobBatchOutputs(
arn=OutputValidator(
description="The Amazon Resource Name (ARN) of the simulation job."
),
batch_job_id=OutputValidator(description="The simulation job batch id."),
)
def __init__(self, arguments: List[str]):
super().__init__(
arguments,
RoboMakerSimulationJobBatchInputs,
RoboMakerSimulationJobBatchOutputs,
)
@property
def inputs(self) -> RoboMakerSimulationJobBatchInputs:
return self._inputs
@property
def outputs(self) -> RoboMakerSimulationJobBatchOutputs:
return self._outputs
@property
def output_paths(self) -> RoboMakerSimulationJobBatchOutputs:
return self._output_paths

@ -6,6 +6,7 @@
REGION=us-east-1
SAGEMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole-Example
ROBOMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::123456789012:role/service-role/AmazonRoboMaker-ExecutionRole-Example
S3_DATA_BUCKET=my-data-bucket
# If you hope to use an existing EKS cluster, rather than creating a new one.

@ -34,6 +34,7 @@ ENV PATH "/opt/conda/envs/kfp_test_env/bin":$PATH
# Environment variables to be used by tests
ENV REGION="us-west-2"
ENV SAGEMAKER_EXECUTION_ROLE_ARN="arn:aws:iam::1234567890:role/sagemaker-role"
ENV ROBOMAKER_EXECUTION_ROLE_ARN="arn:aws:iam::1234567890:role/robomaker-role"
ENV S3_DATA_BUCKET="kfp-test-data"
ENV MINIO_LOCAL_PORT=9000
ENV KFP_NAMESPACE="kubeflow"
@ -41,4 +42,6 @@ ENV KFP_NAMESPACE="kubeflow"
RUN mkdir pipelines
COPY ./ ./pipelines/
ENTRYPOINT [ "/bin/bash", "./pipelines/components/aws/sagemaker/tests/integration_tests/scripts/run_integration_tests" ]
WORKDIR /pipelines/components/aws/sagemaker/tests/integration_tests/scripts/
ENTRYPOINT [ "/bin/bash", "./run_integration_tests" ]

@ -1,16 +1,32 @@
## Requirements
1. [Docker](https://www.docker.com/)
1. [IAM Role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) with a SageMakerFullAccess and AmazonS3FullAccess
1. IAM User credentials with SageMakerFullAccess, AWSCloudFormationFullAccess, IAMFullAccess, AmazonEC2FullAccess, AmazonS3FullAccess permissions
1. [IAM Role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) with the SageMakerFullAccess, RoboMakerFullAccess, and AmazonS3FullAccess policies attached
1. IAM User credentials with SageMakerFullAccess, RoboMakerFullAccess, AWSCloudFormationFullAccess, IAMFullAccess, AmazonEC2FullAccess, AmazonS3FullAccess permissions
2. The SageMaker WorkTeam and GroundTruth Component tests expect that at least one private workteam already exists in the region where you are running these tests.
## Creating S3 buckets with datasets
1. Change the bucket name in the [`s3_sample_data_creator.py`](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker#the-sample-dataset) script and run it to create an S3 bucket with the sample MNIST dataset in the region where you want to run the tests.
2. To prepare the dataset for the SageMaker GroundTruth Component test, follow the steps in the `[GroundTruth Sample README](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/ground_truth_pipeline_demo#prep-the-dataset-label-categories-and-ui-template)`.
2. To prepare the dataset for the SageMaker GroundTruth Component test, follow the steps in the [GroundTruth Sample README](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/ground_truth_pipeline_demo#prep-the-dataset-label-categories-and-ui-template).
3. To prepare the processing script for the SageMaker Processing Component tests, upload the `scripts/kmeans_preprocessing.py` script to your bucket. This can be done by replacing `<my-bucket>` with your bucket name and running `aws s3 cp scripts/kmeans_preprocessing.py s3://<my-bucket>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py`
4. Prepare RoboMaker Simulation App sources and Robot App sources and place them in the data bucket under the `/robomaker` key. The easiest way to create the files you need is to copy them from the public buckets that are used to store the [RoboMaker Hello World](https://console.aws.amazon.com/robomaker/home?region=us-east-1#sampleSimulationJobs) demos:
```bash
aws s3 cp s3://aws-robomaker-samples-us-east-1-1fd12c306611/hello-world/melodic/gazebo9/1.4.0.62/1.2.0/simulation_ws.tar .
aws s3 cp ./simulation_ws.tar s3://<your_bucket_name>/robomaker/simulation_ws.tar
aws s3 cp s3://aws-robomaker-samples-us-east-1-1fd12c306611/hello-world/melodic/gazebo9/1.4.0.62/1.2.0/robot_ws.tar .
aws s3 cp ./robot_ws.tar s3://<your_bucket_name>/robomaker/robot_ws.tar
```
The files in the `/robomaker` directory on S3 should follow this pattern:
```
/robomaker/simulation_ws.tar
/robomaker/robot_ws.tar
```
5. Prepare RLEstimator sources and place them in the data bucket under the `/rlestimator` key. The easiest way to create the files you need is to follow the notebooks outlined in the [RLEstimator Samples README](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/rlestimator_pipeline/README.md).
The files in the `/rlestimator` directory on S3 should follow this pattern:
```
/rlestimator/sourcedir.tar.gz
```
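Both bucket layouts above can be sanity-checked offline before running the tests. A minimal sketch follows; the key list would come from `aws s3 ls --recursive` or boto3's `list_objects_v2` (both assumed here), and the expected keys simply restate the patterns described above:

```python
# Expected data-bucket keys, per the /robomaker and /rlestimator layouts above.
EXPECTED_KEYS = {
    "robomaker/simulation_ws.tar",
    "robomaker/robot_ws.tar",
    "rlestimator/sourcedir.tar.gz",
}

def missing_keys(bucket_keys):
    """Return the expected data-bucket keys that are not present."""
    return sorted(EXPECTED_KEYS - set(bucket_keys))
```

An empty result means the bucket is ready for the RoboMaker and RLEstimator tests.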
## Steps to run integration tests
1. Copy the `.env.example` file to `.env` and in the following steps modify the fields of this new file:
@ -21,5 +37,5 @@
1. Build the image by doing the following:
1. Navigate to the root of this github directory.
1. Run `docker build . -f components/aws/sagemaker/tests/integration_tests/Dockerfile -t amazon/integration_test`
1. Run the image, injecting your environment variable files:
1. Run `docker run --env-file components/aws/sagemaker/tests/integration_tests/.env amazon/integration_test`
1. Run the image, injecting your environment variable files and mounting the repo files into the container:
1. Run `docker run -v <path_to_this_repo_on_your_machine>:/pipelines --env-file components/aws/sagemaker/tests/integration_tests/.env amazon/integration_test`

@ -0,0 +1,127 @@
import random
import string
import pytest
import os
import utils
from utils import kfp_client_utils
from utils import minio_utils
from utils import sagemaker_utils
from utils import argo_utils
@pytest.mark.parametrize(
"test_file_dir",
[
pytest.param(
"resources/config/rlestimator-training", marks=pytest.mark.canary_test
),
],
)
def test_trainingjob(
kfp_client, experiment_id, region, sagemaker_client, test_file_dir
):
download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
test_params = utils.load_params(
utils.replace_placeholders(
os.path.join(test_file_dir, "config.yaml"),
os.path.join(download_dir, "config.yaml"),
)
)
test_params["Arguments"]["job_name"] = input_job_name = (
utils.generate_random_string(5) + "-" + test_params["Arguments"]["job_name"]
)
print(f"running test with job_name: {input_job_name}")
_, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
kfp_client,
experiment_id,
test_params["PipelineDefinition"],
test_params["Arguments"],
download_dir,
test_params["TestName"],
test_params["Timeout"],
)
outputs = {
"sagemaker-rlestimator-training-job": [
"job_name",
"model_artifact_url",
"training_image",
]
}
output_files = minio_utils.artifact_download_iterator(
workflow_json, outputs, download_dir
)
# Verify the training job ran on SageMaker (the test job is stopped once max_run elapses)
training_job_name = utils.read_from_file_in_tar(
output_files["sagemaker-rlestimator-training-job"]["job_name"]
)
print(f"training job name: {training_job_name}")
train_response = sagemaker_utils.describe_training_job(
sagemaker_client, training_job_name
)
assert train_response["TrainingJobStatus"] == "Stopped"
# Verify model artifacts output was generated from this run
model_artifact_url = utils.read_from_file_in_tar(
output_files["sagemaker-rlestimator-training-job"]["model_artifact_url"]
)
print(f"model_artifact_url: {model_artifact_url}")
assert model_artifact_url == train_response["ModelArtifacts"]["S3ModelArtifacts"]
assert training_job_name in model_artifact_url
# Verify training image output is an ECR image
training_image = utils.read_from_file_in_tar(
output_files["sagemaker-rlestimator-training-job"]["training_image"]
)
print(f"Training image used: {training_image}")
if "ExpectedTrainingImage" in test_params.keys():
assert test_params["ExpectedTrainingImage"] == training_image
else:
assert f"dkr.ecr.{region}.amazonaws.com" in training_image
assert not argo_utils.error_in_cw_logs(
workflow_json["metadata"]["name"]
), "Found the CloudWatch error message in the log output. Check SageMaker to see if the job has failed."
utils.remove_dir(download_dir)
def test_terminate_trainingjob(kfp_client, experiment_id, sagemaker_client):
test_file_dir = "resources/config/rlestimator-training"
download_dir = utils.mkdir(
os.path.join(test_file_dir + "/generated_test_terminate")
)
test_params = utils.load_params(
utils.replace_placeholders(
os.path.join(test_file_dir, "config.yaml"),
os.path.join(download_dir, "config.yaml"),
)
)
input_job_name = test_params["Arguments"]["job_name"] = (
"".join(random.choice(string.ascii_lowercase) for i in range(10))
+ "-terminate-job"
)
run_id, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
kfp_client,
experiment_id,
test_params["PipelineDefinition"],
test_params["Arguments"],
download_dir,
test_params["TestName"],
60,
"running",
)
print(f"Terminating run: {run_id} where Training job_name: {input_job_name}")
kfp_client_utils.terminate_run(kfp_client, run_id)
response = sagemaker_utils.describe_training_job(sagemaker_client, input_job_name)
assert response["TrainingJobStatus"] in ["Stopping", "Stopped"]
utils.remove_dir(download_dir)
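The assertions above read each KFP output artifact with `utils.read_from_file_in_tar`. Since KFP stores each output as a single file inside a gzipped tar, a minimal stand-in could look like the following (the real helper's signature and behavior are assumed, not taken from the repo):

```python
import tarfile

def read_from_file_in_tar(tar_path):
    """Return the stripped text content of the first member of a KFP
    output artifact archive -- illustrative stand-in only."""
    with tarfile.open(tar_path, "r:*") as tar:
        member = tar.getnames()[0]
        return tar.extractfile(member).read().decode("utf-8").strip()
```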

@ -0,0 +1,219 @@
import pytest
import os
import utils
from utils import kfp_client_utils
from utils import minio_utils
from utils import robomaker_utils
from utils import get_s3_data_bucket
def create_simulation_app(kfp_client, experiment_id, create_app_dir, app_name):
download_dir = utils.mkdir(os.path.join(create_app_dir + "/generated"))
test_params = utils.load_params(
utils.replace_placeholders(
os.path.join(create_app_dir, "config.yaml"),
os.path.join(download_dir, "config.yaml"),
)
)
# Generate random prefix for sim app name
sim_app_name = test_params["Arguments"]["app_name"] = (
utils.generate_random_string(5) + "-" + app_name
)
_, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
kfp_client,
experiment_id,
test_params["PipelineDefinition"],
test_params["Arguments"],
download_dir,
test_params["TestName"],
test_params["Timeout"],
)
return workflow_json, sim_app_name
def create_robot_app(client):
robomaker_sources = [
{
"s3Bucket": get_s3_data_bucket(),
"s3Key": "robomaker/robot_ws.tar",
"architecture": "X86_64",
}
]
robomaker_suite = {"name": "ROS", "version": "Melodic"}
app_name = utils.generate_random_string(5) + "-test-robot-app"
response = robomaker_utils.create_robot_application(
client, app_name, robomaker_sources, robomaker_suite
)
return response["arn"]
@pytest.mark.parametrize(
"test_file_dir", ["resources/config/robomaker-create-simulation-app"],
)
def test_create_simulation_app(
kfp_client, experiment_id, robomaker_client, test_file_dir
):
download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
test_params = utils.load_params(
utils.replace_placeholders(
os.path.join(test_file_dir, "config.yaml"),
os.path.join(download_dir, "config.yaml"),
)
)
# Create simulation app with random name
workflow_json, sim_app_name = create_simulation_app(
kfp_client, experiment_id, test_file_dir, test_params["Arguments"]["app_name"]
)
try:
print(f"running test with simulation application name: {sim_app_name}")
outputs = {"robomaker-create-simulation-application": ["arn"]}
output_files = minio_utils.artifact_download_iterator(
workflow_json, outputs, download_dir
)
sim_app_arn = utils.read_from_file_in_tar(
output_files["robomaker-create-simulation-application"]["arn"]
)
print(f"Simulation Application arn: {sim_app_arn}")
# Verify simulation application exists
assert (
robomaker_utils.describe_simulation_application(
robomaker_client, sim_app_arn
)["name"]
== sim_app_name
)
finally:
robomaker_utils.delete_simulation_application(robomaker_client, sim_app_arn)
@pytest.mark.parametrize(
"test_file_dir", ["resources/config/robomaker-delete-simulation-app"],
)
def test_delete_simulation_app(
kfp_client, experiment_id, robomaker_client, test_file_dir
):
download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
test_params = utils.load_params(
utils.replace_placeholders(
os.path.join(test_file_dir, "config.yaml"),
os.path.join(download_dir, "config.yaml"),
)
)
# Create simulation app with random name
workflow_json, sim_app_name = create_simulation_app(
kfp_client,
experiment_id,
"resources/config/robomaker-create-simulation-app",
"fake-app-name",
)
print(f"running test with simulation application name: {sim_app_name}")
create_outputs = {"robomaker-create-simulation-application": ["arn"]}
create_output_files = minio_utils.artifact_download_iterator(
workflow_json, create_outputs, download_dir
)
sim_app_arn = utils.read_from_file_in_tar(
create_output_files["robomaker-create-simulation-application"]["arn"]
)
print(f"Simulation Application arn: {sim_app_arn}")
# Here we perform the delete
test_params["Arguments"]["arn"] = sim_app_arn
_, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
kfp_client,
experiment_id,
test_params["PipelineDefinition"],
test_params["Arguments"],
download_dir,
test_params["TestName"],
test_params["Timeout"],
)
# Verify simulation application does not exist
simulation_applications = robomaker_utils.list_simulation_applications(
robomaker_client, sim_app_name
)
assert len(simulation_applications["simulationApplicationSummaries"]) == 0
@pytest.mark.parametrize(
"test_file_dir", ["resources/config/robomaker-simulation-job"],
)
def test_run_simulation_job(kfp_client, experiment_id, robomaker_client, test_file_dir):
download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
test_params = utils.load_params(
utils.replace_placeholders(
os.path.join(test_file_dir, "config.yaml"),
os.path.join(download_dir, "config.yaml"),
)
)
# Create simulation app with random name
sim_app_workflow_json, sim_app_name = create_simulation_app(
kfp_client,
experiment_id,
"resources/config/robomaker-create-simulation-app",
"random-app-name",
)
print(f"running test with simulation application name: {sim_app_name}")
sim_app_outputs = {"robomaker-create-simulation-application": ["arn"]}
sim_app_output_files = minio_utils.artifact_download_iterator(
sim_app_workflow_json, sim_app_outputs, download_dir
)
sim_app_arn = utils.read_from_file_in_tar(
sim_app_output_files["robomaker-create-simulation-application"]["arn"]
)
print(f"Simulation Application arn: {sim_app_arn}")
# Create Robot App by invoking api directly
robot_app_arn = create_robot_app(robomaker_client)
# Here we run the simulation job
test_params["Arguments"]["sim_app_arn"] = sim_app_arn
test_params["Arguments"]["robot_app_arn"] = robot_app_arn
_, _, sim_job_workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
kfp_client,
experiment_id,
test_params["PipelineDefinition"],
test_params["Arguments"],
download_dir,
test_params["TestName"],
test_params["Timeout"],
)
sim_job_outputs = {"robomaker-create-simulation-job": ["arn"]}
sim_job_output_files = minio_utils.artifact_download_iterator(
sim_job_workflow_json, sim_job_outputs, download_dir
)
sim_job_arn = utils.read_from_file_in_tar(
sim_job_output_files["robomaker-create-simulation-job"]["arn"]
)
print(f"Simulation Job arn: {sim_job_arn}")
# Verify simulation job ran successfully
assert robomaker_utils.describe_simulation_job(robomaker_client, sim_job_arn)[
"status"
] not in ["Failed", "RunningFailed"]

@ -16,7 +16,10 @@ def pytest_addoption(parser):
help="AWS region where test will run",
)
parser.addoption(
"--role-arn", required=True, help="SageMaker execution IAM role ARN",
"--sagemaker-role-arn", required=True, help="SageMaker execution IAM role ARN",
)
parser.addoption(
"--robomaker-role-arn", required=True, help="RoboMaker execution IAM role ARN",
)
parser.addoption(
"--assume-role-arn",
@ -73,9 +76,15 @@ def assume_role_arn(request):
@pytest.fixture(scope="session", autouse=True)
def role_arn(request):
os.environ["ROLE_ARN"] = request.config.getoption("--role-arn")
return request.config.getoption("--role-arn")
def sagemaker_role_arn(request):
os.environ["SAGEMAKER_ROLE_ARN"] = request.config.getoption("--sagemaker-role-arn")
return request.config.getoption("--sagemaker-role-arn")
@pytest.fixture(scope="session", autouse=True)
def robomaker_role_arn(request):
os.environ["ROBOMAKER_ROLE_ARN"] = request.config.getoption("--robomaker-role-arn")
return request.config.getoption("--robomaker-role-arn")
@pytest.fixture(scope="session", autouse=True)
@ -124,6 +133,11 @@ def sagemaker_client(boto3_session):
return boto3_session.client(service_name="sagemaker")
@pytest.fixture(scope="session")
def robomaker_client(boto3_session):
return boto3_session.client(service_name="robomaker")
@pytest.fixture(scope="session")
def s3_client(boto3_session):
return boto3_session.client(service_name="s3")

@ -44,4 +44,4 @@ Arguments:
instance_count: 1
volume_size: 50
max_run_time: 1800
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -30,4 +30,4 @@ Arguments:
spot_instance: "False"
max_wait_time: 3600
checkpoint_config: "{}"
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -33,4 +33,4 @@ Arguments:
spot_instance: "False"
max_wait_time: 3600
checkpoint_config: "{}"
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -4,7 +4,7 @@ Timeout: 300
StatusToCheck: 'running'
Arguments:
region: ((REGION))
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))
ground_truth_train_job_name: 'image-labeling'
ground_truth_label_attribute_name: 'category'
ground_truth_train_manifest_location: 's3://((DATA_BUCKET))/mini-image-classification/ground-truth-demo/train.manifest'

@ -43,4 +43,4 @@ Arguments:
instance_count: 1
volume_size: 50
max_run_time: 1800
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -11,7 +11,7 @@ Arguments:
instance_type: ml.m4.xlarge
instance_count: 1
network_isolation: "True"
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))
data_input: s3://((DATA_BUCKET))/mnist_kmeans_example/input
data_type: S3Prefix
content_type: text/csv

@ -17,5 +17,5 @@ Arguments:
instance_type_1: ml.m4.xlarge
initial_instance_count_1: 1
network_isolation: "True"
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -53,4 +53,4 @@ Arguments:
output_location: s3://((DATA_BUCKET))/mnist_kmeans_example/output
network_isolation: "True"
max_wait_time: 3600
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -7,5 +7,5 @@ Arguments:
image: ((KMEANS_REGISTRY)).dkr.ecr.((REGION)).amazonaws.com/kmeans:1
model_artifact_url: s3://((DATA_BUCKET))/mnist_kmeans_example/model/kmeans-mnist-model/model.tar.gz
network_isolation: "True"
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -19,6 +19,6 @@ Arguments:
instance_type_2: ml.m5.xlarge
initial_instance_count_1: 1
network_isolation: "True"
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))
update_endpoint: "True"

@ -0,0 +1,19 @@
PipelineDefinition: resources/definition/rlestimator_training_job_pipeline.py
TestName: rlestimator-pipeline-training
Timeout: 900
ExpectedTrainingImage: 462105765813.dkr.ecr.((REGION)).amazonaws.com/sagemaker-rl-ray-container:ray-0.8.5-tf-cpu-py36
Arguments:
region: ((REGION))
role: ((SAGEMAKER_ROLE_ARN))
entry_point: "train_news_vendor.py"
metric_definitions: "[]"
hyperparameters: "{}"
source_dir: s3://((DATA_BUCKET))/rlestimator/sourcedir.tar.gz
max_run: 300
job_name: rlestimator-test
model_artifact_path: s3://((DATA_BUCKET))/rlestimator/output/
instance_count: 1
instance_type: ml.c5.2xlarge
framework: tensorflow
toolkit: ray
toolkit_version: "0.8.5"

@ -0,0 +1,16 @@
PipelineDefinition: resources/definition/robomaker_create_simulation_app_pipeline.py
TestName: robomaker-create-simulation-app-test
Timeout: 300
Arguments:
region: ((REGION))
app_name: robomaker-create-simulation-app-test
sources:
- s3Bucket: ((DATA_BUCKET))
s3Key: "robomaker/simulation_ws.tar"
architecture: "X86_64"
simulation_software_name: "Gazebo"
simulation_software_version: "9"
robot_software_name: "ROS"
robot_software_version: "Melodic"
rendering_engine_name: "OGRE"
rendering_engine_version: "1.x"

@ -0,0 +1,6 @@
PipelineDefinition: resources/definition/robomaker_delete_simulation_app_pipeline.py
TestName: robomaker-delete-simulation-app-test
Timeout: 300
Arguments:
region: ((REGION))
arn: ""

@ -0,0 +1,22 @@
PipelineDefinition: resources/definition/robomaker_simulation_job_pipeline.py
TestName: robomaker-run-simulation-job-test
Timeout: 600
Arguments:
region: ((REGION))
output_bucket: ((DATA_BUCKET))
output_path: "robomaker-output-key"
max_run: 300
failure_behavior: "Fail"
sim_app_arn:
sim_app_launch_config:
packageName: "hello_world_simulation"
launchFile: "empty_world.launch"
environmentVariables:
TURTLEBOT3_MODEL: "waffle_pi"
robot_app_arn:
robot_app_launch_config:
packageName: "hello_world_robot"
launchFile: "rotate.launch"
environmentVariables:
TURTLEBOT3_MODEL: "waffle_pi"
role: ((ROBOMAKER_ROLE_ARN))

@ -29,4 +29,4 @@ Arguments:
spot_instance: "False"
max_wait_time: 3600
checkpoint_config: "{}"
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -22,4 +22,4 @@ Arguments:
checkpoint_config:
S3Uri: s3://((DATA_BUCKET))/mnist_kmeans_example/train-checkpoints
model_artifact_path: s3://((DATA_BUCKET))/mnist_kmeans_example/output
role: ((ROLE_ARN))
role: ((SAGEMAKER_ROLE_ARN))

@ -4,4 +4,4 @@ Timeout: 3600
ExpectedTrainingImage: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3
Arguments:
bucket_name: ((DATA_BUCKET))
role_arn: ((ROLE_ARN))
role_arn: ((SAGEMAKER_ROLE_ARN))

@ -0,0 +1,51 @@
import kfp
from kfp import components
from kfp import dsl
rlestimator_training_job_op = components.load_component_from_file(
"../../rlestimator/component.yaml"
)
@dsl.pipeline(
name="RLEstimator Toolkit & Framework Pipeline test",
description="RLEstimator training job test where the AWS Docker image is auto-selected based on the Toolkit and Framework we define",
)
def rlestimator_training_toolkit_pipeline_test(
region="",
entry_point="",
source_dir="",
toolkit="",
toolkit_version="",
framework="",
role="",
instance_type="",
instance_count="",
model_artifact_path="",
job_name="",
metric_definitions="",
max_run="",
hyperparameters="",
):
rlestimator_training_job_op(
region=region,
entry_point=entry_point,
source_dir=source_dir,
toolkit=toolkit,
toolkit_version=toolkit_version,
framework=framework,
role=role,
instance_type=instance_type,
instance_count=instance_count,
model_artifact_path=model_artifact_path,
job_name=job_name,
metric_definitions=metric_definitions,
max_run=max_run,
hyperparameters=hyperparameters,
)
if __name__ == "__main__":
kfp.compiler.Compiler().compile(
rlestimator_training_toolkit_pipeline_test, __file__ + ".zip"
)

@ -0,0 +1,42 @@
import kfp
from kfp import components
from kfp import dsl
robomaker_create_sim_app_op = components.load_component_from_file(
"../../create_simulation_app/component.yaml"
)
@dsl.pipeline(
name="RoboMaker Create Simulation App",
description="RoboMaker Create Simulation App test pipeline",
)
def robomaker_create_simulation_app_test(
region="",
app_name="",
sources="",
simulation_software_name="",
simulation_software_version="",
robot_software_name="",
robot_software_version="",
rendering_engine_name="",
rendering_engine_version="",
):
robomaker_create_sim_app_op(
region=region,
app_name=app_name,
sources=sources,
simulation_software_name=simulation_software_name,
simulation_software_version=simulation_software_version,
robot_software_name=robot_software_name,
robot_software_version=robot_software_version,
rendering_engine_name=rendering_engine_name,
rendering_engine_version=rendering_engine_version,
)
if __name__ == "__main__":
kfp.compiler.Compiler().compile(
robomaker_create_simulation_app_test, __file__ + ".yaml"
)

@ -0,0 +1,26 @@
import kfp
from kfp import components
from kfp import dsl
robomaker_delete_sim_app_op = components.load_component_from_file(
"../../delete_simulation_app/component.yaml"
)
@dsl.pipeline(
name="RoboMaker Delete Simulation App",
description="RoboMaker Delete Simulation App test pipeline",
)
def robomaker_delete_simulation_app_test(
region="", arn="",
):
robomaker_delete_sim_app_op(
region=region, arn=arn,
)
if __name__ == "__main__":
kfp.compiler.Compiler().compile(
robomaker_delete_simulation_app_test, __file__ + ".yaml"
)

@ -0,0 +1,42 @@
import kfp
from kfp import components
from kfp import dsl
robomaker_sim_job_op = components.load_component_from_file(
"../../simulation_job/component.yaml"
)
@dsl.pipeline(
name="Run RoboMaker Simulation Job",
description="RoboMaker Simulation Job test pipeline",
)
def robomaker_simulation_job_test(
region="",
role="",
output_bucket="",
output_path="",
max_run="",
failure_behavior="",
sim_app_arn="",
sim_app_launch_config="",
robot_app_arn="",
robot_app_launch_config="",
):
robomaker_sim_job_op(
region=region,
role=role,
output_bucket=output_bucket,
output_path=output_path,
max_run=max_run,
failure_behavior=failure_behavior,
sim_app_arn=sim_app_arn,
sim_app_launch_config=sim_app_launch_config,
robot_app_arn=robot_app_arn,
robot_app_launch_config=robot_app_launch_config,
)
if __name__ == "__main__":
kfp.compiler.Compiler().compile(robomaker_simulation_job_test, __file__ + ".yaml")

@ -46,6 +46,19 @@ function create_namespaced_iam_role {
echo "IAM Role does not exist, creating a new Role for the cluster"
aws iam create-role --role-name ${ROLE_NAME} --assume-role-policy-document file://${trust_file_path} --output=text --query "Role.Arn"
aws iam attach-role-policy --role-name ${ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
printf '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "*"
}
]
}' > ${assume_role_file}
aws iam put-role-policy --role-name ${ROLE_NAME} --policy-name AllowPassRole --policy-document file://${assume_role_file}
printf '{
"Version": "2012-10-17",

@ -34,6 +34,7 @@ PYTEST_MARKER=${PYTEST_MARKER:-""}
S3_DATA_BUCKET=${S3_DATA_BUCKET:-""}
SAGEMAKER_EXECUTION_ROLE_ARN=${SAGEMAKER_EXECUTION_ROLE_ARN:-""}
ASSUMED_ROLE_NAME=${ASSUMED_ROLE_NAME:-""}
ROBOMAKER_EXECUTION_ROLE_ARN=${ROBOMAKER_EXECUTION_ROLE_ARN:-""}
SKIP_FSX_TESTS=${SKIP_FSX_TESTS:-"false"}
@ -120,7 +121,7 @@ function delete_eks() {
time_unit=m
timeout=15
retry_interval=5
loop_counter=$timeout
while [ "$loop_counter" -gt "0" ]; do
eksctl delete cluster --name "$EKS_CLUSTER_NAME" --region "$REGION" --wait
@ -175,7 +176,9 @@ function install_oidc_role() {
function delete_oidc_role() {
# Delete the role associated with the cluster that's being deleted
aws iam detach-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam detach-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
aws iam delete-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-name AllowAssumeRole
aws iam delete-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-name AllowPassRole
aws iam delete-role --role-name "${OIDC_ROLE_NAME}"
}
@ -202,6 +205,7 @@ function generate_assumed_role() {
}' > "${assumed_trust_file}"
aws iam create-role --role-name "${ASSUMED_ROLE_NAME}" --assume-role-policy-document file://${assumed_trust_file} --output=text --query "Role.Arn"
aws iam attach-role-policy --role-name ${ASSUMED_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${ASSUMED_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
fi
# Generate the ARN using the role name
@ -213,6 +217,7 @@ function delete_assumed_role() {
if [[ ! -z "${ASSUMED_ROLE_NAME}" && "${CREATED_ASSUMED_ROLE:-false}" == "true" ]]; then
# Delete the role associated with the cluster that's being deleted
aws iam detach-role-policy --role-name "${ASSUMED_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam detach-role-policy --role-name "${ASSUMED_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
aws iam delete-role --role-name "${ASSUMED_ROLE_NAME}"
fi
}
@ -257,9 +262,10 @@ install_kfp
[ "${SKIP_KFP_OIDC_SETUP}" == "false" ] && install_oidc_role
generate_assumed_role
pytest_args=( --region "${REGION}" --role-arn "${SAGEMAKER_EXECUTION_ROLE_ARN}" \
pytest_args=( --region "${REGION}" --sagemaker-role-arn "${SAGEMAKER_EXECUTION_ROLE_ARN}" \
--s3-data-bucket "${S3_DATA_BUCKET}" --kfp-namespace "${KFP_NAMESPACE}" \
--minio-service-port "${MINIO_LOCAL_PORT}" --assume-role-arn "${ASSUMED_ROLE_ARN}")
--minio-service-port "${MINIO_LOCAL_PORT}" --assume-role-arn "${ASSUMED_ROLE_ARN}" \
--robomaker-role-arn "${ROBOMAKER_EXECUTION_ROLE_ARN}")
if [[ "${SKIP_FSX_TESTS}" == "true" ]]; then
pytest_args+=( -m "not fsx_test" )

View File

@ -1,3 +1,4 @@
import json
import os
import subprocess
import pytest
@ -14,8 +15,12 @@ def get_region():
return os.environ.get("AWS_REGION")
def get_role_arn():
return os.environ.get("ROLE_ARN")
def get_sagemaker_role_arn():
return os.environ.get("SAGEMAKER_ROLE_ARN")
def get_robomaker_role_arn():
return os.environ.get("ROBOMAKER_ROLE_ARN")
def get_s3_data_bucket():
@ -82,7 +87,7 @@ def replace_placeholders(input_filename, output_filename):
region = get_region()
variables_to_replace = {
"((REGION))": region,
"((ROLE_ARN))": get_role_arn(),
"((SAGEMAKER_ROLE_ARN))": get_sagemaker_role_arn(),
"((DATA_BUCKET))": get_s3_data_bucket(),
"((KMEANS_REGISTRY))": get_algorithm_image_registry("kmeans", region, "1"),
"((XGBOOST_REGISTRY))": get_algorithm_image_registry(
@ -93,6 +98,7 @@ def replace_placeholders(input_filename, output_filename):
"((FSX_SUBNET))": get_fsx_subnet(),
"((FSX_SECURITY_GROUP))": get_fsx_security_group(),
"((ASSUME_ROLE_ARN))": get_assume_role_arn(),
"((ROBOMAKER_ROLE_ARN))": get_robomaker_role_arn(),
}
filedata = ""

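The substitution step driven by this mapping is a straight string replace over the file contents. A minimal standalone sketch of that pattern (same placeholder convention, file handling simplified; the values below are placeholders):

```python
def replace_placeholders_in_text(text, variables_to_replace):
    """Substitute each ((PLACEHOLDER)) marker with its concrete value."""
    for placeholder, value in variables_to_replace.items():
        text = text.replace(placeholder, value)
    return text

pipeline_text = "region: ((REGION))\nrole: ((SAGEMAKER_ROLE_ARN))"
resolved = replace_placeholders_in_text(
    pipeline_text,
    {
        "((REGION))": "us-west-2",
        "((SAGEMAKER_ROLE_ARN))": "arn:aws:iam::111111111111:role/test",
    },
)
print(resolved)
```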
View File

@ -0,0 +1,34 @@
def describe_simulation_application(client, sim_app_arn):
return client.describe_simulation_application(application=sim_app_arn)
def describe_simulation_job(client, sim_job_arn):
return client.describe_simulation_job(job=sim_job_arn)
def describe_simulation_job_batch(client, batch_job_id):
return client.describe_simulation_job_batch(batch=batch_job_id)
def delete_simulation_application(client, sim_app_arn):
return client.delete_simulation_application(application=sim_app_arn)
def cancel_simulation_job(client, sim_job_arn):
return client.cancel_simulation_job(job=sim_job_arn)
def cancel_simulation_job_batch(client, batch_job_id):
return client.cancel_simulation_job_batch(batch=batch_job_id)
def list_simulation_applications(client, sim_app_name):
return client.list_simulation_applications(
filters=[{"name": "name", "values": [sim_app_name]}]
)
def create_robot_application(client, app_name, sources, robot_software_suite):
return client.create_robot_application(
name=app_name, sources=sources, robotSoftwareSuite=robot_software_suite
)
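These helpers are thin pass-throughs to the boto3 RoboMaker client, which means callers can be exercised with a mock in place of a real client. A hedged usage sketch (the ARN is a placeholder, and the wrapper is re-declared here so the snippet is self-contained):

```python
from unittest.mock import MagicMock

def describe_simulation_job(client, sim_job_arn):
    # Same thin-wrapper shape as the utility above.
    return client.describe_simulation_job(job=sim_job_arn)

# Stand-in for boto3.client("robomaker") so no AWS credentials are needed.
client = MagicMock()
client.describe_simulation_job.return_value = {"status": "Completed"}

arn = "arn:aws:robomaker:us-west-2:111111111111:simulation-job/sim-fake"
response = describe_simulation_job(client, arn)
print(response["status"])  # -> Completed
client.describe_simulation_job.assert_called_once_with(job=arn)
```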

View File

@ -1,3 +1,9 @@
import logging
import re
from datetime import datetime
from time import sleep
def describe_training_job(client, training_job_name):
return client.describe_training_job(TrainingJobName=training_job_name)

View File

@ -15,8 +15,9 @@
```
3. Run all unit tests
```
docker run -it amazon/unit-test-aws-sagemaker-kfp-components
docker run -it -v <path_to_this_repo_on_your_machine>:/app/ amazon/unit-test-aws-sagemaker-kfp-components:latest
```
This runs the tests against a mounted volume from your host machine. This means you can edit the files and rerun the tests immediately without having to rebuild the docker container.
--------------
@ -37,5 +38,4 @@
cd tests/unit_tests/
./run_unit_tests.sh
```
```

View File

@ -0,0 +1,459 @@
from common.sagemaker_component import SageMakerJobStatus
from rlestimator.src.sagemaker_rlestimator_spec import SageMakerRLEstimatorSpec
from rlestimator.src.sagemaker_rlestimator_component import (
SageMakerRLEstimatorComponent,
DebugRulesStatus,
)
from tests.unit_tests.tests.rlestimator.test_rlestimator_spec import (
RLEstimatorSpecTestCase,
)
import unittest
from unittest.mock import patch, MagicMock
HAS_ATTR_MESSAGE = "{} should have an attribute {}"
HAS_NOT_ATTR_MESSAGE = "{} should not have an attribute {}"
ATTR_NOT_NONE_MESSAGE = "{} attribute {} should be None"
class BaseTestCase(unittest.TestCase):
def assertHasAttr(self, obj, attrname, message=None):
if not hasattr(obj, attrname):
if message is not None:
self.fail(message)
else:
self.fail(HAS_ATTR_MESSAGE.format(obj, attrname))
def assertHasNotAttr(self, obj, attrname, message=None):
if hasattr(obj, attrname):
if message is not None:
self.fail(message)
else:
self.fail(HAS_NOT_ATTR_MESSAGE.format(obj, attrname))
def assertAttrNone(self, obj, attrname, message=None):
if not hasattr(obj, attrname):
if message is not None:
self.fail(message)
else:
self.fail(HAS_ATTR_MESSAGE.format(obj, attrname))
if getattr(obj, attrname) is not None:
if message is not None:
self.fail(message)
else:
self.fail(ATTR_NOT_NONE_MESSAGE.format(obj, attrname))
class RLEstimatorComponentTestCase(BaseTestCase):
CUSTOM_IMAGE_ARGS = RLEstimatorSpecTestCase.CUSTOM_IMAGE_ARGS
TOOLKIT_IMAGE_ARGS = RLEstimatorSpecTestCase.TOOLKIT_IMAGE_ARGS
@classmethod
def setUp(cls):
cls.component = SageMakerRLEstimatorComponent()
# Instantiate without calling Do()
cls.component._rlestimator_job_name = "test-job"
cls.component._sagemaker_session = MagicMock()
@patch("rlestimator.src.sagemaker_rlestimator_component.super", MagicMock())
def test_do_sets_name(self):
named_spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS + ["--job_name", "job-name"]
)
unnamed_spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
self.component.Do(named_spec)
self.assertEqual("job-name", self.component._rlestimator_job_name)
with patch(
"rlestimator.src.sagemaker_rlestimator_component.SageMakerComponent._generate_unique_timestamped_id",
MagicMock(return_value="unique"),
):
self.component.Do(unnamed_spec)
self.assertEqual("unique", self.component._rlestimator_job_name)
def test_create_rlestimator_custom_job(self):
spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertHasAttr(rlestimator, "image_uri")
self.assertHasAttr(rlestimator, "role")
self.assertHasAttr(rlestimator, "source_dir")
self.assertHasAttr(rlestimator, "entry_point")
self.assertHasNotAttr(rlestimator, "toolkit")
self.assertHasNotAttr(rlestimator, "toolkit_version")
self.assertHasNotAttr(rlestimator, "framework")
def test_create_rlestimator_toolkit_job(self):
spec = SageMakerRLEstimatorSpec(self.TOOLKIT_IMAGE_ARGS)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertHasAttr(rlestimator, "role")
self.assertHasAttr(rlestimator, "source_dir")
self.assertHasAttr(rlestimator, "entry_point")
self.assertHasAttr(rlestimator, "toolkit")
self.assertHasAttr(rlestimator, "toolkit_version")
self.assertHasAttr(rlestimator, "framework")
self.assertAttrNone(rlestimator, "image_uri")
def test_get_job_status(self):
self.component._sm_client = mock_client = MagicMock()
self.component._get_debug_rule_status = MagicMock(
return_value=SageMakerJobStatus(
is_completed=True, has_error=False, raw_status="Completed"
)
)
self.component._sm_client.describe_training_job.return_value = {
"TrainingJobStatus": "Starting"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=False, raw_status="Starting"),
)
self.component._sm_client.describe_training_job.return_value = {
"TrainingJobStatus": "Downloading"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=False, raw_status="Downloading"),
)
self.component._sm_client.describe_training_job.return_value = {
"TrainingJobStatus": "Completed"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=True, raw_status="Completed"),
)
self.component._sm_client.describe_training_job.return_value = {
"TrainingJobStatus": "Failed",
"FailureReason": "lolidk",
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(
is_completed=True,
raw_status="Failed",
has_error=True,
error_message="lolidk",
),
)
def test_after_job_completed(self):
self.component._get_model_artifacts_from_job = MagicMock(return_value="model")
self.component._get_image_from_job = MagicMock(return_value="image")
spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
self.component._after_job_complete({}, {}, spec.inputs, spec.outputs)
self.assertEqual(spec.outputs.job_name, "test-job")
self.assertEqual(spec.outputs.model_artifact_url, "model")
self.assertEqual(spec.outputs.training_image, "image")
def test_metric_definitions(self):
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS
+ [
"--metric_definitions",
'[ {"Name": "metric1", "Regex": "regexval1"},{"Name": "metric2", "Regex": "regexval2"},]',
]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
getattr(rlestimator, "metric_definitions"),
[
{"Name": "metric1", "Regex": "regexval1"},
{"Name": "metric2", "Regex": "regexval2"},
],
)
def test_no_defined_image(self):
# CUSTOM_IMAGE_ARGS includes --image so the arguments pass the parser
no_image_args = self.CUSTOM_IMAGE_ARGS.copy()
image_index = no_image_args.index("--image")
# Cut out --image and its associated value
no_image_args = no_image_args[:image_index] + no_image_args[image_index + 2 :]
spec = SageMakerRLEstimatorSpec(no_image_args)
with self.assertRaises(Exception):
self.component._create_job_request(spec.inputs, spec.outputs)
def test_valid_hyperparameters(self):
hyperparameters_str = '{"hp1": "val1", "hp2": "val2", "hp3": "val3"}'
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS + ["--hyperparameters", hyperparameters_str]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertIn("hp1", getattr(rlestimator, "_hyperparameters"))
self.assertIn("hp2", getattr(rlestimator, "_hyperparameters"))
self.assertIn("hp3", getattr(rlestimator, "_hyperparameters"))
self.assertEqual(getattr(rlestimator, "_hyperparameters")["hp1"], "val1")
self.assertEqual(getattr(rlestimator, "_hyperparameters")["hp2"], "val2")
self.assertEqual(getattr(rlestimator, "_hyperparameters")["hp3"], "val3")
def test_empty_hyperparameters(self):
hyperparameters_str = "{}"
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS + ["--hyperparameters", hyperparameters_str]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(getattr(rlestimator, "_hyperparameters"), {})
def test_object_hyperparameters(self):
hyperparameters_str = '{"hp1": {"innerkey": "innerval"}}'
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS + ["--hyperparameters", hyperparameters_str]
)
with self.assertRaises(Exception):
self.component._create_job_request(spec.inputs, spec.outputs)
def test_vpc_configuration(self):
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS
+ [
"--vpc_security_group_ids",
'["sg1", "sg2"]',
"--vpc_subnets",
'["subnet1", "subnet2"]',
]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertHasAttr(rlestimator, "subnets")
self.assertHasAttr(rlestimator, "security_group_ids")
self.assertIn("sg1", getattr(rlestimator, "security_group_ids"))
self.assertIn("sg2", getattr(rlestimator, "security_group_ids"))
self.assertIn("subnet1", getattr(rlestimator, "subnets"))
self.assertIn("subnet2", getattr(rlestimator, "subnets"))
def test_training_mode(self):
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS + ["--training_input_mode", "Pipe"]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(getattr(rlestimator, "input_mode"), "Pipe")
def test_wait_for_debug_rules(self):
self.component._sm_client = mock_client = MagicMock()
mock_client.describe_training_job.side_effect = [
{
"DebugRuleEvaluationStatuses": [
{
"RuleConfigurationName": "rule1",
"RuleEvaluationStatus": "InProgress",
},
{
"RuleConfigurationName": "rule2",
"RuleEvaluationStatus": "InProgress",
},
]
},
{
"DebugRuleEvaluationStatuses": [
{
"RuleConfigurationName": "rule1",
"RuleEvaluationStatus": "NoIssuesFound",
},
{
"RuleConfigurationName": "rule2",
"RuleEvaluationStatus": "InProgress",
},
]
},
{
"DebugRuleEvaluationStatuses": [
{
"RuleConfigurationName": "rule1",
"RuleEvaluationStatus": "NoIssuesFound",
},
{
"RuleConfigurationName": "rule2",
"RuleEvaluationStatus": "IssuesFound",
},
]
},
]
self.assertEqual(
self.component._get_debug_rule_status(),
SageMakerJobStatus(
is_completed=False,
has_error=False,
raw_status=DebugRulesStatus.INPROGRESS,
),
)
self.assertEqual(
self.component._get_debug_rule_status(),
SageMakerJobStatus(
is_completed=False,
has_error=False,
raw_status=DebugRulesStatus.INPROGRESS,
),
)
self.assertEqual(
self.component._get_debug_rule_status(),
SageMakerJobStatus(
is_completed=True,
has_error=False,
raw_status=DebugRulesStatus.COMPLETED,
),
)
def test_wait_for_errored_rule(self):
self.component._sm_client = mock_client = MagicMock()
mock_client.describe_training_job.side_effect = [
{
"DebugRuleEvaluationStatuses": [
{
"RuleConfigurationName": "rule1",
"RuleEvaluationStatus": "InProgress",
},
{
"RuleConfigurationName": "rule2",
"RuleEvaluationStatus": "InProgress",
},
]
},
{
"DebugRuleEvaluationStatuses": [
{"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
{
"RuleConfigurationName": "rule2",
"RuleEvaluationStatus": "InProgress",
},
]
},
{
"DebugRuleEvaluationStatuses": [
{"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
{
"RuleConfigurationName": "rule2",
"RuleEvaluationStatus": "NoIssuesFound",
},
]
},
]
self.assertEqual(
self.component._get_debug_rule_status(),
SageMakerJobStatus(
is_completed=False,
has_error=False,
raw_status=DebugRulesStatus.INPROGRESS,
),
)
self.assertEqual(
self.component._get_debug_rule_status(),
SageMakerJobStatus(
is_completed=False,
has_error=False,
raw_status=DebugRulesStatus.INPROGRESS,
),
)
self.assertEqual(
self.component._get_debug_rule_status(),
SageMakerJobStatus(
is_completed=True, has_error=True, raw_status=DebugRulesStatus.ERRORED
),
)
def test_hook_min_args(self):
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS
+ ["--debug_hook_config", '{"S3OutputPath": "s3://fake-uri/"}']
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
getattr(rlestimator, "debugger_hook_config")["S3OutputPath"],
"s3://fake-uri/",
)
def test_hook_max_args(self):
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS
+ [
"--debug_hook_config",
'{"S3OutputPath": "s3://fake-uri/", "LocalPath": "/local/path/", "HookParameters": {"key": "value"}, "CollectionConfigurations": [{"CollectionName": "collection1", "CollectionParameters": {"key1": "value1"}}, {"CollectionName": "collection2", "CollectionParameters": {"key2": "value2", "key3": "value3"}}]}',
]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
getattr(rlestimator, "debugger_hook_config")["S3OutputPath"],
"s3://fake-uri/",
)
self.assertEqual(
getattr(rlestimator, "debugger_hook_config")["LocalPath"], "/local/path/"
)
self.assertEqual(
getattr(rlestimator, "debugger_hook_config")["HookParameters"],
{"key": "value"},
)
self.assertEqual(
getattr(rlestimator, "debugger_hook_config")["CollectionConfigurations"],
[
{
"CollectionName": "collection1",
"CollectionParameters": {"key1": "value1"},
},
{
"CollectionName": "collection2",
"CollectionParameters": {"key2": "value2", "key3": "value3"},
},
],
)
def test_rule_max_args(self):
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS
+ [
"--debug_rule_config",
'[{"InstanceType": "ml.m4.xlarge", "LocalPath": "/local/path/", "RuleConfigurationName": "rule_name", "RuleEvaluatorImage": "test-image", "RuleParameters": {"key1": "value1"}, "S3OutputPath": "s3://fake-uri/", "VolumeSizeInGB": 1}]',
]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
getattr(rlestimator, "rules")[0]["InstanceType"], "ml.m4.xlarge"
)
self.assertEqual(getattr(rlestimator, "rules")[0]["LocalPath"], "/local/path/")
self.assertEqual(
getattr(rlestimator, "rules")[0]["RuleConfigurationName"], "rule_name"
)
self.assertEqual(
getattr(rlestimator, "rules")[0]["RuleEvaluatorImage"], "test-image"
)
self.assertEqual(
getattr(rlestimator, "rules")[0]["RuleParameters"], {"key1": "value1"}
)
self.assertEqual(
getattr(rlestimator, "rules")[0]["S3OutputPath"], "s3://fake-uri/"
)
self.assertEqual(getattr(rlestimator, "rules")[0]["VolumeSizeInGB"], 1)
def test_rule_min_good_args(self):
spec = SageMakerRLEstimatorSpec(
self.CUSTOM_IMAGE_ARGS
+ [
"--debug_rule_config",
'[{"RuleConfigurationName": "rule_name", "RuleEvaluatorImage": "test-image"}]',
]
)
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
getattr(rlestimator, "rules")[0]["RuleConfigurationName"], "rule_name"
)
self.assertEqual(
getattr(rlestimator, "rules")[0]["RuleEvaluatorImage"], "test-image"
)
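`test_wait_for_debug_rules` and `test_wait_for_errored_rule` above pin down how per-rule evaluation statuses collapse into one overall status. A simplified reimplementation of that aggregation for illustration (not the component's actual `_get_debug_rule_status`):

```python
from enum import Enum

class DebugRulesStatus(Enum):
    COMPLETED = "Completed"
    ERRORED = "Errored"
    INPROGRESS = "InProgress"

def aggregate_rule_status(rule_statuses):
    """Reduce DebugRuleEvaluationStatuses entries to a single overall status."""
    statuses = [r["RuleEvaluationStatus"] for r in rule_statuses]
    if any(s == "InProgress" for s in statuses):
        # Keep polling while any rule is still running, even if another errored.
        return DebugRulesStatus.INPROGRESS
    if any(s == "Error" for s in statuses):
        return DebugRulesStatus.ERRORED
    return DebugRulesStatus.COMPLETED

# Mirrors the second poll in test_wait_for_errored_rule: still in progress.
assert aggregate_rule_status([
    {"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
    {"RuleConfigurationName": "rule2", "RuleEvaluationStatus": "InProgress"},
]) is DebugRulesStatus.INPROGRESS

# Mirrors the final poll: all rules terminal, one errored.
assert aggregate_rule_status([
    {"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
    {"RuleConfigurationName": "rule2", "RuleEvaluationStatus": "NoIssuesFound"},
]) is DebugRulesStatus.ERRORED
```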

View File

@ -0,0 +1,62 @@
from rlestimator.src.sagemaker_rlestimator_spec import SageMakerRLEstimatorSpec
import unittest
class RLEstimatorSpecTestCase(unittest.TestCase):
CUSTOM_IMAGE_ARGS = [
"--region",
"us-east-1",
"--entry_point",
"train-unity.py",
"--source_dir",
"s3://input_bucket_name/input_key",
"--role",
"arn:aws:iam::123456789012:user/Development/product_1234/*",
"--image",
"test-image",
"--instance_type",
"ml.m4.xlarge",
"--instance_count",
"1",
"--volume_size",
"50",
"--max_run",
"900",
"--model_artifact_path",
"test-path",
]
TOOLKIT_IMAGE_ARGS = [
"--region",
"us-east-1",
"--entry_point",
"train-unity.py",
"--source_dir",
"s3://input_bucket_name/input_key",
"--role",
"arn:aws:iam::123456789012:user/Development/product_1234/*",
"--toolkit",
"ray",
"--toolkit_version",
"0.8.5",
"--framework",
"tensorflow",
"--instance_type",
"ml.m4.xlarge",
"--instance_count",
"1",
"--volume_size",
"50",
"--max_run",
"900",
"--model_artifact_path",
"test-path",
]
def test_custom_image_args(self):
# Will raise if the inputs are incorrect
spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
def test_toolkit_image_args(self):
# Will raise if the inputs are incorrect
spec = SageMakerRLEstimatorSpec(self.TOOLKIT_IMAGE_ARGS)
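The two argument sets above differ only in how the training image is resolved: `--image` for a custom container versus `--toolkit`/`--toolkit_version`/`--framework` for a SageMaker-built RL image. A hedged sketch of that either/or selection (illustrative only; the real spec's parser and validation may differ):

```python
def resolve_image_mode(args):
    """Return which image mode a set of arguments selects (illustrative sketch)."""
    if args.get("image"):
        return "custom"
    if all(args.get(k) for k in ("toolkit", "toolkit_version", "framework")):
        return "toolkit"
    raise ValueError(
        "Provide either --image, or --toolkit/--toolkit_version/--framework"
    )

print(resolve_image_mode({"image": "test-image"}))  # -> custom
print(resolve_image_mode(
    {"toolkit": "ray", "toolkit_version": "0.8.5", "framework": "tensorflow"}
))  # -> toolkit
```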

View File

@ -0,0 +1,131 @@
from common.sagemaker_component import SageMakerJobStatus
from create_simulation_app.src.robomaker_create_simulation_app_spec import (
RoboMakerCreateSimulationAppSpec,
)
from create_simulation_app.src.robomaker_create_simulation_app_component import (
RoboMakerCreateSimulationAppComponent,
)
from tests.unit_tests.tests.robomaker.test_robomaker_create_sim_app_spec import (
RoboMakerCreateSimAppSpecTestCase,
)
import unittest
from unittest.mock import patch, MagicMock
class RoboMakerCreateSimAppTestCase(unittest.TestCase):
REQUIRED_ARGS = RoboMakerCreateSimAppSpecTestCase.REQUIRED_ARGS
@classmethod
def setUp(cls):
cls.component = RoboMakerCreateSimulationAppComponent()
# Instantiate without calling Do()
cls.component._app_name = "test-app"
@patch(
"create_simulation_app.src.robomaker_create_simulation_app_component.super",
MagicMock(),
)
def test_do_sets_name(self):
named_spec = RoboMakerCreateSimulationAppSpec(
self.REQUIRED_ARGS + ["--app_name", "my-app-name"]
)
self.component.Do(named_spec)
self.assertEqual("my-app-name", self.component._app_name)
def test_create_simulation_application_request(self):
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)
request = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
request,
{
"name": "test-app",
"renderingEngine": {
"name": "rendering_engine_name",
"version": "rendering_engine_version",
},
"robotSoftwareSuite": {
"name": "robot_software_name",
"version": "robot_software_version",
},
"simulationSoftwareSuite": {
"name": "simulation_software_name",
"version": "simulation_software_version",
},
"sources": [
{
"architecture": "X86_64",
"s3Bucket": "sources_bucket",
"s3Key": "sources_key",
}
],
"tags": {},
},
)
def test_missing_required_input(self):
missing_input = self.REQUIRED_ARGS.copy()
missing_input.remove("--app_name")
missing_input.remove("app-name")
with self.assertRaises(SystemExit):
spec = RoboMakerCreateSimulationAppSpec(missing_input)
self.component._create_job_request(spec.inputs, spec.outputs)
def test_get_job_status(self):
self.component._rm_client = MagicMock()
self.component._arn = "cool-arn"
self.component._rm_client.describe_simulation_application.return_value = {
"arn": None
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(
is_completed=True,
has_error=True,
error_message="No ARN present",
raw_status=None,
),
)
self.component._rm_client.describe_simulation_application.return_value = {
"arn": "arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(
is_completed=True,
raw_status="arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
),
)
@patch(
"create_simulation_app.src.robomaker_create_simulation_app_component.logging"
)
def test_after_submit_job_request(self, mock_logging):
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)
self.component._after_submit_job_request(
{"arn": "cool-arn"}, {}, spec.inputs, spec.outputs
)
mock_logging.info.assert_called_once()
def test_after_job_completed(self):
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)
mock_job_response = {
"arn": "arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
"version": "latest",
"revisionId": "ee753e53-519c-4d37-895d-65e79bcd1914",
"tags": {},
}
self.component._after_job_complete(
mock_job_response, {}, spec.inputs, spec.outputs
)
self.assertEqual(
spec.outputs.arn,
"arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
)
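The request dict asserted in `test_create_simulation_application_request` is shaped so it can be passed straight to the RoboMaker client as keyword arguments. A mock-backed sketch of that hand-off (illustrative only; the real submit path lives inside the component, and the returned ARN here is a fixture value):

```python
from unittest.mock import MagicMock

request = {
    "name": "test-app",
    "robotSoftwareSuite": {
        "name": "robot_software_name",
        "version": "robot_software_version",
    },
    "simulationSoftwareSuite": {
        "name": "simulation_software_name",
        "version": "simulation_software_version",
    },
    "sources": [
        {"architecture": "X86_64", "s3Bucket": "sources_bucket", "s3Key": "sources_key"}
    ],
    "tags": {},
}

# Stand-in for boto3.client("robomaker") so the snippet runs without AWS.
rm_client = MagicMock()
rm_client.create_simulation_application.return_value = {
    "arn": "arn:aws:robomaker:us-west-2:111111111111:simulation-application/test-app/1551203301792"
}

response = rm_client.create_simulation_application(**request)
print(response["arn"])
rm_client.create_simulation_application.assert_called_once_with(**request)
```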

View File

@ -0,0 +1,32 @@
from create_simulation_app.src.robomaker_create_simulation_app_spec import (
RoboMakerCreateSimulationAppSpec,
)
import unittest
class RoboMakerCreateSimAppSpecTestCase(unittest.TestCase):
REQUIRED_ARGS = [
"--region",
"us-west-2",
"--app_name",
"app-name",
"--sources",
'[{"s3Bucket": "sources_bucket", "s3Key": "sources_key", "architecture": "X86_64"}]',
"--simulation_software_name",
"simulation_software_name",
"--simulation_software_version",
"simulation_software_version",
"--robot_software_name",
"robot_software_name",
"--robot_software_version",
"robot_software_version",
"--rendering_engine_name",
"rendering_engine_name",
"--rendering_engine_version",
"rendering_engine_version",
]
def test_minimum_required_args(self):
# Will raise if the inputs are incorrect
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)

View File

@ -0,0 +1,97 @@
from common.sagemaker_component import SageMakerJobStatus
from delete_simulation_app.src.robomaker_delete_simulation_app_spec import (
RoboMakerDeleteSimulationAppSpec,
)
from delete_simulation_app.src.robomaker_delete_simulation_app_component import (
RoboMakerDeleteSimulationAppComponent,
)
from tests.unit_tests.tests.robomaker.test_robomaker_delete_sim_app_spec import (
RoboMakerDeleteSimAppSpecTestCase,
)
import unittest
import json
from unittest.mock import patch, MagicMock
class RoboMakerDeleteSimAppTestCase(unittest.TestCase):
REQUIRED_ARGS = RoboMakerDeleteSimAppSpecTestCase.REQUIRED_ARGS
@classmethod
def setUp(cls):
cls.component = RoboMakerDeleteSimulationAppComponent()
# Instantiate without calling Do()
cls.component._arn = "cool-arn"
@patch(
"delete_simulation_app.src.robomaker_delete_simulation_app_component.super",
MagicMock(),
)
def test_do_sets_version(self):
named_spec = RoboMakerDeleteSimulationAppSpec(
self.REQUIRED_ARGS + ["--version", "cool-version"]
)
self.component.Do(named_spec)
self.assertEqual("cool-version", self.component._version)
def test_delete_simulation_application_request(self):
spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)
request = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(request, {"application": "cool-arn",})
def test_missing_required_input(self):
missing_input = self.REQUIRED_ARGS.copy()
missing_input.remove("--arn")
missing_input.remove("cool-arn")
with self.assertRaises(SystemExit):
spec = RoboMakerDeleteSimulationAppSpec(missing_input)
self.component._create_job_request(spec.inputs, spec.outputs)
def test_get_job_status(self):
self.component._rm_client = MagicMock()
self.component._arn = "cool-arn"
self.component._rm_client.describe_simulation_application.return_value = {
"arn": None
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=True, raw_status="Item deleted"),
)
self.component._rm_client.describe_simulation_application.return_value = {
"arn": "arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(
is_completed=False,
raw_status="arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
),
)
@patch(
"delete_simulation_app.src.robomaker_delete_simulation_app_component.logging"
)
def test_after_submit_job_request(self, mock_logging):
spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)
self.component._after_submit_job_request(
{"arn": "cool-arn"}, {}, spec.inputs, spec.outputs
)
mock_logging.info.assert_called_once()
def test_after_job_completed(self):
spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)
mock_job_response = {}
self.component._version = "cool-version"
self.component._after_job_complete(
mock_job_response, {}, spec.inputs, spec.outputs
)
# We expect to get returned the initial value we set for arn from REQUIRED_ARGS
# The response from the api for the delete call will always be empty or None
self.assertEqual(spec.outputs.arn, "cool-arn")

View File

@ -0,0 +1,18 @@
from delete_simulation_app.src.robomaker_delete_simulation_app_spec import (
RoboMakerDeleteSimulationAppSpec,
)
import unittest
class RoboMakerDeleteSimAppSpecTestCase(unittest.TestCase):
REQUIRED_ARGS = [
"--region",
"us-east-1",
"--arn",
"cool-arn",
]
def test_minimum_required_args(self):
# Will raise if the inputs are incorrect
spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)

View File

@ -0,0 +1,159 @@
from yaml.parser import ParserError
from common.sagemaker_component import SageMakerJobStatus
from simulation_job_batch.src.robomaker_simulation_job_batch_spec import (
RoboMakerSimulationJobBatchSpec,
)
from simulation_job_batch.src.robomaker_simulation_job_batch_component import (
RoboMakerSimulationJobBatchComponent,
)
from tests.unit_tests.tests.robomaker.test_robomaker_simulation_job_batch_spec import (
RoboMakerSimulationJobBatchSpecTestCase,
)
import unittest
from unittest.mock import MagicMock
class RoboMakerSimulationJobTestCase(unittest.TestCase):
REQUIRED_ARGS = RoboMakerSimulationJobBatchSpecTestCase.REQUIRED_ARGS
@classmethod
def setUp(cls):
cls.component = RoboMakerSimulationJobBatchComponent()
cls.component._arn = "fake-arn"
cls.component._batch_job_id = "fake-id"
cls.component._sim_request_ids = set()
def test_create_simulation_batch_job(self):
spec = RoboMakerSimulationJobBatchSpec(self.REQUIRED_ARGS)
request = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
request,
{
"batchPolicy": {"maxConcurrency": 3, "timeoutInSeconds": 5800},
"createSimulationJobRequests": [
{
"dataSources": {
"name": "data-source-name",
"s3Bucket": "data-source-bucket",
"s3Keys": [{"s3Key": "data-source-key"}],
},
"failureBehavior": "Fail",
"iamRole": "TestRole",
"loggingConfig": {"recordAllRosTopics": "True"},
"maxJobDurationInSeconds": "123",
"outputLocation": {
"s3Bucket": "fake-bucket",
"s3Prefix": "fake-key",
},
"simulationApplications": [
{
"application": "test-arn",
"applicationVersion": "1",
"launchConfig": {
"environmentVariables": {"Env": "var"},
"launchFile": "launch-file.py",
"packageName": "package-name",
"portForwardingConfig": {
"portMappings": [
{
"applicationPort": "123",
"enableOnPublicIp": "True",
"jobPort": "123",
}
]
},
"streamUI": "True",
},
}
],
}
],
"tags": {},
},
)
def test_get_job_status(self):
self.component._rm_client = MagicMock()
self.component._rm_client.describe_simulation_job_batch.return_value = {
"status": "Starting"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=False, raw_status="Starting"),
)
self.component._rm_client.describe_simulation_job_batch.return_value = {
"status": "Downloading"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=False, raw_status="Downloading"),
)
self.component._rm_client.describe_simulation_job_batch.return_value = {
"status": "Completed",
"createdRequests": [{"status": "Completed",}],
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=True, raw_status="Completed"),
)
self.component._rm_client.describe_simulation_job_batch.return_value = {
"status": "Canceled",
"createdRequests": [{"status": "Failed", "arn": "fake-arn"}],
}
self.component._rm_client.describe_simulation_job.return_value = {
"status": "Failed",
"arn": "fake-arn",
"failureCode": "InternalServiceError",
"failureReason": "Big Reason",
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(
is_completed=True,
raw_status="Canceled",
has_error=True,
error_message="Simulation jobs are completed\nSimulation job: fake-arn failed with errorCode:InternalServiceError\n",
),
)
self.component._rm_client.describe_simulation_job_batch.return_value = {
"status": "Failed",
"failureCode": "InternalServiceError",
"failureReason": "Big Reason",
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(
is_completed=True,
raw_status="Failed",
has_error=True,
error_message="Simulation batch job is in status:Failed\nSimulation failed with reason:Big ReasonSimulation failed with errorCode:InternalServiceError",
),
)
def test_no_simulation_job_requests(self):
no_job_requests = self.REQUIRED_ARGS.copy()
no_job_requests = no_job_requests[
: no_job_requests.index("--simulation_job_requests")
]
with self.assertRaises(SystemExit):
spec = RoboMakerSimulationJobBatchSpec(no_job_requests)
self.component._create_job_request(spec.inputs, spec.outputs)
def test_empty_simulation_job_requests(self):
empty_job_requests = self.REQUIRED_ARGS.copy()
        empty_job_requests[-1] = "[]"
with self.assertRaises(ParserError):
spec = RoboMakerSimulationJobBatchSpec(empty_job_requests)
self.component._create_job_request(spec.inputs, spec.outputs)

View File

@ -0,0 +1,64 @@
from simulation_job_batch.src.robomaker_simulation_job_batch_spec import (
RoboMakerSimulationJobBatchSpec,
)
import unittest
import json
class RoboMakerSimulationJobBatchSpecTestCase(unittest.TestCase):
REQUIRED_ARGS = [
"--region",
"us-west-2",
"--role",
"role-arn",
"--timeout_in_secs",
"5800",
"--max_concurrency",
"3",
"--simulation_job_requests",
json.dumps(
[
{
"outputLocation": {
"s3Bucket": "fake-bucket",
"s3Prefix": "fake-key",
},
"loggingConfig": {"recordAllRosTopics": "True"},
"maxJobDurationInSeconds": "123",
"iamRole": "TestRole",
"failureBehavior": "Fail",
"simulationApplications": [
{
"application": "test-arn",
"applicationVersion": "1",
"launchConfig": {
"packageName": "package-name",
"launchFile": "launch-file.py",
"environmentVariables": {"Env": "var",},
"portForwardingConfig": {
"portMappings": [
{
"jobPort": "123",
"applicationPort": "123",
"enableOnPublicIp": "True",
}
]
},
"streamUI": "True",
},
}
],
"dataSources": {
"name": "data-source-name",
"s3Bucket": "data-source-bucket",
"s3Keys": [{"s3Key": "data-source-key",}],
},
}
]
),
]
def test_minimum_required_args(self):
# Will raise if the inputs are incorrect
spec = RoboMakerSimulationJobBatchSpec(self.REQUIRED_ARGS)

View File

@ -0,0 +1,148 @@
from common.sagemaker_component import SageMakerJobStatus
from simulation_job.src.robomaker_simulation_job_spec import RoboMakerSimulationJobSpec
from simulation_job.src.robomaker_simulation_job_component import (
RoboMakerSimulationJobComponent,
)
from tests.unit_tests.tests.robomaker.test_robomaker_simulation_job_spec import (
RoboMakerSimulationJobSpecTestCase,
)
import unittest
from unittest.mock import MagicMock
class RoboMakerSimulationJobTestCase(unittest.TestCase):
REQUIRED_ARGS = RoboMakerSimulationJobSpecTestCase.REQUIRED_ARGS
@classmethod
def setUp(cls):
cls.component = RoboMakerSimulationJobComponent()
cls.component._arn = "fake-arn"
cls.component._job_id = "fake-id"
def test_create_simulation_job(self):
spec = RoboMakerSimulationJobSpec(self.REQUIRED_ARGS)
request = self.component._create_job_request(spec.inputs, spec.outputs)
self.assertEqual(
request,
{
"outputLocation": {
"s3Bucket": "output-bucket-name",
"s3Prefix": "output-bucket-key",
},
"maxJobDurationInSeconds": 900,
"iamRole": "role-arn",
"failureBehavior": "Fail",
"simulationApplications": [
{
"application": "simulation_app_arn",
"launchConfig": {
"environmentVariables": {"Env": "var"},
"launchFile": "launch-file.py",
"packageName": "package-name",
"portForwardingConfig": {
"portMappings": [
{
"applicationPort": "123",
"enableOnPublicIp": "True",
"jobPort": "123",
}
]
},
"streamUI": "True",
},
}
],
"dataSources": [
{
"name": "data-source-name",
"s3Bucket": "data-source-bucket",
"s3Keys": [{"s3Key": "data-source-key"}],
}
],
"compute": {"simulationUnitLimit": 15},
"tags": {},
},
)
def test_get_job_status(self):
self.component._rm_client = MagicMock()
self.component._rm_client.describe_simulation_job.return_value = {
"status": "Starting"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=False, raw_status="Starting"),
)
self.component._rm_client.describe_simulation_job.return_value = {
"status": "Downloading"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=False, raw_status="Downloading"),
)
self.component._rm_client.describe_simulation_job.return_value = {
"status": "Completed"
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(is_completed=True, raw_status="Completed"),
)
self.component._rm_client.describe_simulation_job.return_value = {
"status": "Failed",
"failureCode": "InternalServiceError",
"failureReason": "Big Reason",
}
self.assertEqual(
self.component._get_job_status(),
SageMakerJobStatus(
is_completed=True,
raw_status="Failed",
has_error=True,
error_message="Simulation job is in status:Failed\nSimulation failed with reason:Big ReasonSimulation failed with errorCode:InternalServiceError",
),
)
def test_after_job_completed(self):
spec = RoboMakerSimulationJobSpec(self.REQUIRED_ARGS)
mock_out = "s3://cool-bucket/fake-key"
self.component._get_job_outputs = MagicMock(return_value=mock_out)
self.component._after_job_complete({}, {}, spec.inputs, spec.outputs)
self.assertEqual(spec.outputs.output_artifacts, mock_out)
def test_get_job_outputs(self):
self.component._rm_client = mock_client = MagicMock()
mock_client.describe_simulation_job.return_value = {
"outputLocation": {"s3Bucket": "cool-bucket", "s3Prefix": "fake-key",}
}
self.assertEqual(
self.component._get_job_outputs(), "s3://cool-bucket/fake-key",
)
def test_no_simulation_app_defined(self):
no_sim_app = self.REQUIRED_ARGS.copy()
no_sim_app.remove("--sim_app_arn")
no_sim_app.remove("simulation_app_arn")
with self.assertRaises(Exception):
spec = RoboMakerSimulationJobSpec(no_sim_app)
self.component._create_job_request(spec.inputs, spec.outputs)
def test_no_launch_config_defined(self):
no_launch_config = self.REQUIRED_ARGS.copy()
no_launch_config = no_launch_config[
: no_launch_config.index("--sim_app_launch_config")
]
with self.assertRaises(Exception):
spec = RoboMakerSimulationJobSpec(no_launch_config)
self.component._create_job_request(spec.inputs, spec.outputs)

View File

@ -0,0 +1,55 @@
from simulation_job.src.robomaker_simulation_job_spec import RoboMakerSimulationJobSpec
import unittest
import json
class RoboMakerSimulationJobSpecTestCase(unittest.TestCase):
REQUIRED_ARGS = [
"--region",
"us-west-2",
"--role",
"role-arn",
"--output_bucket",
"output-bucket-name",
"--output_path",
"output-bucket-key",
"--max_run",
"900",
"--data_sources",
json.dumps(
[
{
"name": "data-source-name",
"s3Bucket": "data-source-bucket",
"s3Keys": [{"s3Key": "data-source-key",}],
}
]
),
"--sim_app_arn",
"simulation_app_arn",
"--sim_app_version",
"1",
"--sim_app_launch_config",
json.dumps(
{
"packageName": "package-name",
"launchFile": "launch-file.py",
"environmentVariables": {"Env": "var",},
"portForwardingConfig": {
"portMappings": [
{
"jobPort": "123",
"applicationPort": "123",
"enableOnPublicIp": "True",
}
]
},
"streamUI": "True",
}
),
]
def test_minimum_required_args(self):
# Will raise if the inputs are incorrect
spec = RoboMakerSimulationJobSpec(self.REQUIRED_ARGS)

View File

@ -36,8 +36,8 @@ inputs:
channels. Must have at least one.}
- {name: instance_type, type: String, description: The ML compute instance type.,
default: ml.m4.xlarge}
- {name: instance_count, type: Integer, description: The registry path of the Docker
image that contains the training algorithm., default: '1'}
- {name: instance_count, type: Integer, description: The number of ML compute instances
to use in the training job., default: '1'}
- {name: volume_size, type: Integer, description: The size of the ML storage volume
that you want to provision., default: '30'}
- {name: resource_encryption_key, type: String, description: The AWS KMS key that
@ -72,7 +72,7 @@ outputs:
the training algorithm.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- train/src/sagemaker_training_component.py

View File

@ -26,28 +26,10 @@ from common.sagemaker_component import (
SageMakerComponent,
ComponentMetadata,
SageMakerJobStatus,
DebugRulesStatus,
)
class DebugRulesStatus(Enum):
COMPLETED = auto()
ERRORED = auto()
INPROGRESS = auto()
@classmethod
def from_describe(cls, response):
has_error = False
for debug_rule in response["DebugRuleEvaluationStatuses"]:
if debug_rule["RuleEvaluationStatus"] == "Error":
has_error = True
if debug_rule["RuleEvaluationStatus"] == "InProgress":
return DebugRulesStatus.INPROGRESS
if has_error:
return DebugRulesStatus.ERRORED
else:
return DebugRulesStatus.COMPLETED
@ComponentMetadata(
name="SageMaker - Training Job",
description="Train Machine Learning and Deep Learning Models using SageMaker",

View File

@ -118,7 +118,7 @@ class SageMakerTrainingSpec(
instance_count=InputValidator(
required=True,
input_type=int,
description="The registry path of the Docker image that contains the training algorithm.",
description="The number of ML compute instances to use in the training job.",
default=1,
),
volume_size=InputValidator(

View File

@ -22,7 +22,7 @@ outputs:
- {name: workteam_arn, description: The ARN of the workteam.}
implementation:
container:
image: amazon/aws-sagemaker-kfp-components:1.0.0
image: amazon/aws-sagemaker-kfp-components:1.1.0
command: [python3]
args:
- workteam/src/sagemaker_workteam_component.py

View File

@ -0,0 +1,72 @@
The two examples in this directory each run a different type of RLEstimator Reinforcement Learning job as a SageMaker training job.
## Examples
Each example is based on a notebook from the [AWS SageMaker Examples](https://github.com/aws/amazon-sagemaker-examples) repo.
(Note that all of these examples are available by default on any SageMaker Notebook Instance.)
The `rlestimator_pipeline_custom_image` pipeline example is based on the
[`rl_unity_ray`](https://github.com/aws/amazon-sagemaker-examples/blob/master/reinforcement_learning/rl_unity_ray/rl_unity_ray.ipynb) notebook.
The `rlestimator_pipeline_toolkit_image` pipeline example is based on the
[`rl_news_vendor_ray_custom`](https://github.com/aws/amazon-sagemaker-examples/blob/master/reinforcement_learning/rl_resource_allocation_ray_customEnv/rl_news_vendor_ray_custom.ipynb) notebook.
## Prerequisites
To run these examples you will need to create a number of resources that will then be used as inputs for the pipeline component.
rlestimator_pipeline_custom_image required inputs:
```
output_bucket_name = <bucket used for outputs from the training job>
input_bucket_name = <bucket used for inputs, in this case custom code via a tar.gz>
input_key = <the path and file name of the source code tar.gz>
job_name_prefix = <not required, but can be useful to identify these training jobs>
image_uri = <docker image uri, can be docker.io if you have internet access, but might be easier to use ECR>
assume_role = <sagemaker execution role, this is created for you automatically when you launch a notebook instance>
```
rlestimator_pipeline_toolkit_image required inputs:
```
output_bucket_name = <bucket used for outputs from the training job>
input_bucket_name = <bucket used for inputs, in this case custom code via a tar.gz>
input_key = <the path and file name of the source code tar.gz>
job_name_prefix = <not required, but can be useful to identify these training jobs>
role = <sagemaker execution role, this is created for you automatically when you launch a notebook instance>
```
You could create all of these resources individually, but it is easier to run each of the notebooks
mentioned above and then use the resources they create. The input and output buckets will be created with a
name like 'sagemaker-us-east-1-520713654638', depending on your region and account number. Within these buckets
a key is created for each of your training job runs. After you have executed all cells in each notebook, a key
exists for each completed training job, and any custom code required for that job is placed there
as a .tar.gz file. The full S3 URI of that tar.gz file can be used as the source_dir input for these pipeline components.
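In other words, the source_dir input is just the full S3 URI assembled from the bucket and key. A minimal sketch (the bucket and key names below are placeholders, not real resources):

```python
# Assemble the source_dir input from the notebook-created bucket and key.
# Both names below are placeholders -- substitute your own.
input_bucket_name = "sagemaker-us-east-1-111122223333"
input_key = "rl-unity-ray-2020-11-11-10-43-30-556/source/sourcedir.tar.gz"

source_dir = "s3://{}/{}".format(input_bucket_name, input_key)
print(source_dir)
```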
## Compiling the pipeline template
Follow the guide to [building a pipeline](https://www.kubeflow.org/docs/guides/pipelines/build-pipeline/) to install the Kubeflow Pipelines SDK, then run the following command to compile the sample Python into a workflow specification. The specification takes the form of a YAML file compressed into a `.tar.gz` file.
```bash
dsl-compile --py rlestimator_pipeline_custom_image.py --output rlestimator_pipeline_custom_image.tar.gz
dsl-compile --py rlestimator_pipeline_toolkit_image.py --output rlestimator_pipeline_toolkit_image.tar.gz
```
## Deploying the pipeline
Open the Kubeflow pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
Once the pipeline has finished running, you can go to the S3 output path you specified to check the training job's output artifacts.
## Components source
RLEstimator Training Job:
[source code](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/rlestimator/src)

View File

@ -0,0 +1,76 @@
#!/usr/bin/env python3
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
import kfp
import os
from kfp import components
from kfp import dsl
from kfp.aws import use_aws_secret
from sagemaker.rl import RLEstimator, RLToolkit
cur_file_dir = os.path.dirname(__file__)
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
sagemaker_rlestimator_op = components.load_component_from_file(
components_dir + "/rlestimator/component.yaml"
)
output_bucket_name = "kf-pipelines-rlestimator-output"
input_bucket_name = "kf-pipelines-rlestimator-input"
input_key = "sourcedir.tar.gz"
job_name_prefix = "rlestimator-pipeline-custom-image"
image_uri = "your_sagemaker_image_name"
role = "your_sagemaker_role_name"
security_groups = ["sg-0490601e83f220e82"]
subnets = [
"subnet-0efc73526db16a4a4",
"subnet-0b8af626f39e7d462",
]
# You need to specify your own metric_definitions if using a custom image_uri
metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)
@dsl.pipeline(
name="RLEstimator Custom Docker Image",
description="RLEstimator training job where we provide a reference to a Docker image containing our training code",
)
def rlestimator_training_custom_pipeline(
region="us-east-1",
entry_point="train-unity.py",
source_dir="s3://{}/{}".format(input_bucket_name, input_key),
image_uri=image_uri,
assume_role=role,
instance_type="ml.c5.2xlarge",
instance_count=1,
output_path="s3://{}/".format(output_bucket_name),
base_job_name=job_name_prefix,
metric_definitions=metric_definitions,
hyperparameters={},
vpc_security_group_ids=security_groups,
vpc_subnets=subnets,
):
rlestimator_training_custom = sagemaker_rlestimator_op(
region=region,
entry_point=entry_point,
source_dir=source_dir,
image=image_uri,
role=assume_role,
model_artifact_path=output_path,
job_name=base_job_name,
metric_definitions=metric_definitions,
instance_type=instance_type,
instance_count=instance_count,
hyperparameters=hyperparameters,
vpc_security_group_ids=vpc_security_group_ids,
vpc_subnets=vpc_subnets,
) # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
if __name__ == "__main__":
kfp.compiler.Compiler().compile(
rlestimator_training_custom_pipeline, __file__ + ".zip"
)

View File

@ -0,0 +1,93 @@
#!/usr/bin/env python3
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
import kfp
import os
from kfp import components
from kfp import dsl
from kfp.aws import use_aws_secret
cur_file_dir = os.path.dirname(__file__)
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
sagemaker_rlestimator_op = components.load_component_from_file(
components_dir + "/rlestimator/component.yaml"
)
metric_definitions = [
{
"Name": "episode_reward_mean",
"Regex": "episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
},
{
"Name": "episode_reward_max",
"Regex": "episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
},
{
"Name": "episode_len_mean",
"Regex": "episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
},
{"Name": "entropy", "Regex": "entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"},
{
"Name": "episode_reward_min",
"Regex": "episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
},
{"Name": "vf_loss", "Regex": "vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"},
{
"Name": "policy_loss",
"Regex": "policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
},
]
output_bucket_name = "your_sagemaker_bucket_name"
input_bucket_name = "your_sagemaker_bucket_name"
input_key = "rl-newsvendor-2020-11-11-10-43-30-556/source/sourcedir.tar.gz"
job_name_prefix = "rlestimator-pipeline-toolkit-image"
role = "your_sagemaker_role_name"
@dsl.pipeline(
name="RLEstimator Toolkit & Framework Pipeline",
description="RLEstimator training job where the AWS Docker image is auto-selected based on the Toolkit and Framework we define",
)
def rlestimator_training_toolkit_pipeline(
region="us-east-1",
entry_point="train_news_vendor.py",
source_dir="s3://{}/{}".format(input_bucket_name, input_key),
toolkit="ray",
toolkit_version="0.8.5",
framework="tensorflow",
assume_role=role,
instance_type="ml.c5.2xlarge",
instance_count=1,
output_path="s3://{}/".format(output_bucket_name),
base_job_name=job_name_prefix,
metric_definitions=metric_definitions,
max_run=300,
hyperparameters={},
):
rlestimator_training_toolkit = sagemaker_rlestimator_op(
region=region,
entry_point=entry_point,
source_dir=source_dir,
toolkit=toolkit,
toolkit_version=toolkit_version,
framework=framework,
role=assume_role,
instance_type=instance_type,
instance_count=instance_count,
model_artifact_path=output_path,
job_name=base_job_name,
metric_definitions=metric_definitions,
max_run=max_run,
hyperparameters=hyperparameters,
) # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
if __name__ == "__main__":
kfp.compiler.Compiler().compile(
rlestimator_training_toolkit_pipeline, __file__ + ".zip"
)

View File

@ -0,0 +1,87 @@
Examples for creating a simulation application, running a simulation job, running a simulation job batch, and deleting a simulation application.
## Examples
The examples are based on a notebook from the [AWS SageMaker Examples](https://github.com/aws/amazon-sagemaker-examples) repo.
The simulation jobs that are launched by these examples are based on the
[`rl_objecttracker_robomaker_coach_gazebo`](https://github.com/aws/amazon-sagemaker-examples/tree/3de42334720a7197ea1f15395b66c44cf5ef7fd4/reinforcement_learning/rl_objecttracker_robomaker_coach_gazebo) notebook.
This is an older notebook example, but you can still download it from GitHub and upload it directly to JupyterLab in SageMaker.
## Prerequisites
To run these examples you will need to create a number of resources that will then be used as inputs for the pipeline component.
Some of the inputs are used to create the RoboMaker Simulation Application and some are used as inputs for the RoboMaker
Simulation Job.
required inputs for simulation job example:
```
role = <robomaker execution role, this is created for you automatically when you launch a notebook instance>
region = <region in which to deploy the robomaker resources>
app_name = <name to be given to the simulation application>
sources = <source code files for the simulation application>
simulation_software_suite = <select the simulation application software suite to use>
robot_software_suite = <select the simulation application robot software suite to use>
rendering_engine = <select the simulation application rendering engine suite to use>
output_bucket = <bucket used for outputs from the training job>
output_path = <key within the output bucket to use for output artifacts>
max_run = <the maximum time to run the simulation job for>
failure_behavior = <"Fail" or "Continue">
sim_app_arn = <used as input to simulation job component, comes as an output from simulation application component>
sim_app_launch_config = <dictionary containing launch configurations>
vpc_subnets = <subnets to launch the simulation job into>
vpc_security_group_ids = <security groups to use if launching in a VPC>
use_public_ip = <whether or not to use a public ip to access the simulation job>
```
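The `sim_app_launch_config` input is a dictionary following the RoboMaker `launchConfig` shape. A minimal sketch (the package and launch-file names are illustrative and must match your simulation application's sources):

```python
# A minimal sim_app_launch_config sketch. packageName and launchFile are
# illustrative -- they must correspond to your simulation application.
sim_app_launch_config = {
    "packageName": "object_tracker_simulation",
    "launchFile": "evaluation.launch",
    "environmentVariables": {"ROS_AWS_REGION": "us-east-1"},
    "streamUI": True,
}
print(sorted(sim_app_launch_config))
```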
required inputs for simulation job batch example:
```
role = <robomaker execution role, this is created for you automatically when you launch a notebook instance>
region = <region in which to deploy the robomaker resources>
app_name = <name to be given to the simulation application>
sources = <source code files for the simulation application>
simulation_software_suite = <select the simulation application software suite to use>
robot_software_suite = <select the simulation application robot software suite to use>
rendering_engine = <select the simulation application rendering engine suite to use>
timeout_in_secs = <maximum timeout to wait for simulation jobs in batch to launch>
max_concurrency = <maximum concurrency for simulation jobs in batch>
simulation_job_requests = <the definitions for the simulation jobs; launch configs, vpc configs, etc. are placed in here>
sim_app_arn = <used as input to the simulation job component; comes as an output from the simulation application component>
```
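Each entry in `simulation_job_requests` follows the RoboMaker CreateSimulationJob request shape. A minimal sketch (the bucket, prefix, role, and application ARN below are all placeholders; in the sample pipeline the application ARN is supplied separately via the `sim_app_arn` input):

```python
# A minimal simulation_job_requests value, following the RoboMaker
# CreateSimulationJob request shape. Bucket, prefix, role, and the
# application ARN are placeholders.
job_requests = [
    {
        "outputLocation": {"s3Bucket": "my-output-bucket", "s3Prefix": "my-output-key"},
        "maxJobDurationInSeconds": 900,
        "iamRole": "my-robomaker-execution-role-arn",
        "failureBehavior": "Fail",
        "simulationApplications": [
            {
                "application": "placeholder-arn",
                "launchConfig": {
                    "packageName": "object_tracker_simulation",
                    "launchFile": "evaluation.launch",
                    "streamUI": True,
                },
            }
        ],
    }
]
print(len(job_requests))
```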
You could create all of these resources individually, but it is easier to run the notebook
mentioned above and then use the resources it creates. The notebook creates the output bucket,
output key, VPC configuration, launch configuration, etc., and you can use those as the inputs for this example.
## Compiling the pipeline template
Follow the guide to [building a pipeline](https://www.kubeflow.org/docs/guides/pipelines/build-pipeline/) to install the Kubeflow Pipelines SDK, then run the following command to compile the sample Python into a workflow specification. The specification takes the form of a YAML file compressed into a `.tar.gz` file.
```bash
dsl-compile --py rlestimator_pipeline_custom_image.py --output rlestimator_pipeline_custom_image.tar.gz
dsl-compile --py rlestimator_pipeline_toolkit_image.py --output rlestimator_pipeline_toolkit_image.tar.gz
dsl-compile --py sagemaker_robomaker_rl_job.py --output sagemaker_robomaker_rl_job.tar.gz
```
## Deploying the pipeline
Open the Kubeflow pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
Once the pipeline has finished running, you can go to the S3 output location you specified to check the simulation job's output artifacts.
## Components source
RoboMaker Create Simulation Application:
[source code](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/create_simulation_application/src)

View File

@ -0,0 +1,128 @@
#!/usr/bin/env python3
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
import kfp
import os
from kfp import components
from kfp import dsl
import random
import string
from kfp.aws import use_aws_secret
cur_file_dir = os.path.dirname(__file__)
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
robomaker_create_sim_app_op = components.load_component_from_file(
components_dir + "/create_simulation_app/component.yaml"
)
robomaker_sim_job_op = components.load_component_from_file(
components_dir + "/simulation_job/component.yaml"
)
robomaker_delete_sim_app_op = components.load_component_from_file(
components_dir + "/delete_simulation_app/component.yaml"
)
launch_config = {
"packageName": "object_tracker_simulation",
"launchFile": "evaluation.launch",
"environmentVariables": {
"MODEL_S3_BUCKET": "your_sagemaker_bucket_name",
"MODEL_S3_PREFIX": "rl-object-tracker-sagemaker-201116-051751",
"ROS_AWS_REGION": "us-east-1",
"MARKOV_PRESET_FILE": "object_tracker.py",
"NUMBER_OF_ROLLOUT_WORKERS": "1",
},
"streamUI": True,
}
simulation_app_name = "robomaker-pipeline-simulation-application"
sources_bucket = "your_sagemaker_bucket_name"
sources_key = "object-tracker/simulation_ws.tar.gz"
sources_architecture = "X86_64"
simulation_software_name = "Gazebo"
simulation_software_version = "7"
robot_software_name = "ROS"
robot_software_version = "Kinetic"
rendering_engine_name = "OGRE"
rendering_engine_version = "1.x"
role = "your_sagemaker_role_name"
output_bucket = "kf-pipelines-robomaker-output"
output_key = "test-output-key"
security_groups = ["sg-0490601e83f220e82"]
subnets = [
"subnet-0efc73526db16a4a4",
"subnet-0b8af626f39e7d462",
]
@dsl.pipeline(
name="RoboMaker Simulation Job Pipeline",
description="RoboMaker simulation job and simulation application created via pipeline components",
)
def robomaker_simulation_job_app_pipeline(
region="us-east-1",
role=role,
name=simulation_app_name
+ "".join(random.choice(string.ascii_lowercase) for i in range(10)),
sources=[
{
"s3Bucket": sources_bucket,
"s3Key": sources_key,
"architecture": sources_architecture,
}
],
simulation_software_name=simulation_software_name,
simulation_software_version=simulation_software_version,
robot_software_name=robot_software_name,
robot_software_version=robot_software_version,
rendering_engine_name=rendering_engine_name,
rendering_engine_version=rendering_engine_version,
output_bucket=output_bucket,
output_path=output_key,
sim_app_launch_config=launch_config,
vpc_security_group_ids=security_groups,
vpc_subnets=subnets,
):
robomaker_create_sim_app = robomaker_create_sim_app_op(
region=region,
app_name=name,
sources=sources,
simulation_software_name=simulation_software_name,
simulation_software_version=simulation_software_version,
robot_software_name=robot_software_name,
robot_software_version=robot_software_version,
rendering_engine_name=rendering_engine_name,
rendering_engine_version=rendering_engine_version,
)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
robomaker_simulation_job = robomaker_sim_job_op(
region=region,
role=role,
output_bucket=output_bucket,
output_path=output_path,
max_run=300,
failure_behavior="Fail",
sim_app_arn=robomaker_create_sim_app.outputs["arn"],
sim_app_launch_config=sim_app_launch_config,
vpc_security_group_ids=vpc_security_group_ids,
vpc_subnets=vpc_subnets,
use_public_ip="True",
).after(robomaker_create_sim_app)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
robomaker_delete_sim_app = robomaker_delete_sim_app_op(
region=region, arn=robomaker_create_sim_app.outputs["arn"],
).after(robomaker_simulation_job, robomaker_create_sim_app)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
if __name__ == "__main__":
kfp.compiler.Compiler().compile(
robomaker_simulation_job_app_pipeline, __file__ + ".zip"
)

View File

@ -0,0 +1,136 @@
#!/usr/bin/env python3
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
import kfp
import os
from kfp import components
from kfp import dsl
import random
import string
from kfp.aws import use_aws_secret
cur_file_dir = os.path.dirname(__file__)
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
robomaker_create_sim_app_op = components.load_component_from_file(
components_dir + "/create_simulation_app/component.yaml"
)
robomaker_sim_job_batch_op = components.load_component_from_file(
components_dir + "/simulation_job_batch/component.yaml"
)
robomaker_delete_sim_app_op = components.load_component_from_file(
components_dir + "/delete_simulation_app/component.yaml"
)
simulation_app_name = "robomaker-pipeline-simulation-batch-application"
sources_bucket = "your_sagemaker_bucket_name"
sources_key = "object-tracker/simulation_ws.tar.gz"
sources_architecture = "X86_64"
simulation_software_name = "Gazebo"
simulation_software_version = "7"
robot_software_name = "ROS"
robot_software_version = "Kinetic"
rendering_engine_name = "OGRE"
rendering_engine_version = "1.x"
role = "your_sagemaker_role_name"
job_requests = [
{
"outputLocation": {
"s3Bucket": "kf-pipelines-robomaker-output",
"s3Prefix": "test-output-key",
},
"loggingConfig": {"recordAllRosTopics": True},
"maxJobDurationInSeconds": 900,
"iamRole": "your_sagemaker_role_name",
"failureBehavior": "Fail",
"simulationApplications": [
{
"application": "test-arn",
"launchConfig": {
"packageName": "object_tracker_simulation",
"launchFile": "evaluation.launch",
"environmentVariables": {
"MODEL_S3_BUCKET": "your_sagemaker_bucket_name",
"MODEL_S3_PREFIX": "rl-object-tracker-sagemaker-201116-051751",
"ROS_AWS_REGION": "us-east-1",
"MARKOV_PRESET_FILE": "object_tracker.py",
"NUMBER_OF_ROLLOUT_WORKERS": "1",
},
"streamUI": True,
},
}
],
"vpcConfig": {
"subnets": ["subnet-0efc73526db16a4a4", "subnet-0b8af626f39e7d462",],
"securityGroups": ["sg-0490601e83f220e82"],
"assignPublicIp": True,
},
}
]
@dsl.pipeline(
name="RoboMaker Job Batch Pipeline",
description="RoboMaker simulation job batch is launched via a pipeline component",
)
def robomaker_simulation_job_batch_app_pipeline(
region="us-east-1",
role=role,
name=simulation_app_name
+ "".join(random.choice(string.ascii_lowercase) for i in range(10)),
sources=[
{
"s3Bucket": sources_bucket,
"s3Key": sources_key,
"architecture": sources_architecture,
}
],
simulation_software_name=simulation_software_name,
simulation_software_version=simulation_software_version,
robot_software_name=robot_software_name,
robot_software_version=robot_software_version,
rendering_engine_name=rendering_engine_name,
rendering_engine_version=rendering_engine_version,
timeout_in_secs="900",
max_concurrency="3",
simulation_job_requests=job_requests,
):
robomaker_create_sim_app = robomaker_create_sim_app_op(
region=region,
app_name=name,
sources=sources,
simulation_software_name=simulation_software_name,
simulation_software_version=simulation_software_version,
robot_software_name=robot_software_name,
robot_software_version=robot_software_version,
rendering_engine_name=rendering_engine_name,
rendering_engine_version=rendering_engine_version,
)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
robomaker_simulation_batch_job = robomaker_sim_job_batch_op(
region=region,
role=role,
timeout_in_secs=timeout_in_secs,
max_concurrency=max_concurrency,
simulation_job_requests=simulation_job_requests,
sim_app_arn=robomaker_create_sim_app.outputs["arn"],
).after(robomaker_create_sim_app)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
robomaker_delete_sim_app = robomaker_delete_sim_app_op(
region=region, arn=robomaker_create_sim_app.outputs["arn"],
).after(robomaker_simulation_batch_job, robomaker_create_sim_app)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
if __name__ == "__main__":
kfp.compiler.Compiler().compile(
robomaker_simulation_job_batch_app_pipeline, __file__ + ".zip"
)

View File

@ -0,0 +1,196 @@
#!/usr/bin/env python3
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
import kfp
import os
from kfp import components
from kfp import dsl
import random
import string
from kfp.aws import use_aws_secret
cur_file_dir = os.path.dirname(__file__)
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
robomaker_create_sim_app_op = components.load_component_from_file(
components_dir + "/create_simulation_app/component.yaml"
)
robomaker_sim_job_op = components.load_component_from_file(
components_dir + "/simulation_job/component.yaml"
)
robomaker_delete_sim_app_op = components.load_component_from_file(
components_dir + "/delete_simulation_app/component.yaml"
)
sagemaker_rlestimator_op = components.load_component_from_file(
components_dir + "/rlestimator/component.yaml"
)
metric_definitions = [
{"Name": "reward-training", "Regex": "^Training>.*Total reward=(.*?),"},
{"Name": "ppo-surrogate-loss", "Regex": "^Policy training>.*Surrogate loss=(.*?),"},
{"Name": "ppo-entropy", "Regex": "^Policy training>.*Entropy=(.*?),"},
{"Name": "reward-testing", "Regex": "^Testing>.*Total reward=(.*?),"},
]
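The metric definitions above are regexes that SageMaker applies to the training container's log output; the first capture group becomes the metric value. A quick sanity check of the "reward-training" pattern against a hypothetical log line (the line format here is illustrative, not taken from an actual job):

```python
import re

# Hedged sketch: verify the "reward-training" regex extracts the reward value.
# The sample log line below is hypothetical.
pattern = r"^Training>.*Total reward=(.*?),"
line = "Training> Name=main_level, Worker=0, Episode=5, Total reward=42.5, Steps=100"
match = re.search(pattern, line)
assert match is not None
assert match.group(1) == "42.5"  # the first capture group is reported as the metric value
```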
# Simulation Application Inputs
region = "us-east-1"
simulation_software_name = "Gazebo"
simulation_software_version = "7"
robot_software_name = "ROS"
robot_software_version = "Kinetic"
rendering_engine_name = "OGRE"
rendering_engine_version = "1.x"
simulation_app_name = "robomaker-pipeline-objecttracker-sim-app" + "".join(
random.choice(string.ascii_lowercase) for i in range(10)
)
sources_bucket = "your_sagemaker_bucket_name"
sources_key = "object-tracker/simulation_ws.tar.gz"
sources_architecture = "X86_64"
sources = [
{
"s3Bucket": sources_bucket,
"s3Key": sources_key,
"architecture": sources_architecture,
}
]
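Each source entry points RoboMaker at a bundled application in S3 and names a target architecture. A minimal validation sketch, assuming the architecture values accepted by the RoboMaker CreateSimulationApplication API (X86_64, ARM64, ARMHF); the helper itself is hypothetical, not part of the components:

```python
# Hedged sketch: architectures assumed from the RoboMaker CreateSimulationApplication API.
VALID_ARCHITECTURES = {"X86_64", "ARM64", "ARMHF"}

def check_source(source: dict) -> bool:
    # A source needs an S3 bucket, an S3 key, and a recognized architecture
    return (
        bool(source.get("s3Bucket"))
        and bool(source.get("s3Key"))
        and source.get("architecture") in VALID_ARCHITECTURES
    )

assert check_source(
    {"s3Bucket": "my-bucket", "s3Key": "sim_ws.tar.gz", "architecture": "X86_64"}
)
assert not check_source(
    {"s3Bucket": "my-bucket", "s3Key": "sim_ws.tar.gz", "architecture": "x86"}
)
```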
# RLEstimator Inputs
entry_point = "training_worker.py"
rl_sources_key = "rl-object-tracker-sagemaker-201123-042019/source/sourcedir.tar.gz"
source_dir = "s3://{}/{}".format(sources_bucket, rl_sources_key)
rl_output_path = "s3://{}/".format(sources_bucket)
train_instance_type = "ml.c5.2xlarge"
train_instance_count = 1
toolkit = "coach"
toolkit_version = "0.11"
framework = "tensorflow"
job_name = "rl-kf-pipeline-objecttracker" + "".join(
random.choice(string.ascii_lowercase) for i in range(10)
)
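Both the simulation app name and the job name append a random lowercase suffix so repeated pipeline runs don't collide on resource names. The same pattern, factored into a hypothetical helper for illustration:

```python
import random
import string

def unique_name(prefix: str, n: int = 10) -> str:
    # Same suffix pattern the sample uses to avoid name collisions across runs
    return prefix + "".join(random.choice(string.ascii_lowercase) for _ in range(n))

name = unique_name("rl-kf-pipeline-objecttracker")
assert name.startswith("rl-kf-pipeline-objecttracker")
assert len(name) == len("rl-kf-pipeline-objecttracker") + 10
assert name[-10:].islower()
```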
max_run = 300
s3_prefix = "rl-object-tracker-sagemaker-201123-042019"
hyperparameters = {
"s3_bucket": sources_bucket,
"s3_prefix": s3_prefix,
"aws_region": "us-east-1",
"RLCOACH_PRESET": "object_tracker",
}
role = "your_sagemaker_role_name"
security_groups = ["sg-0490601e83f220e82"]
subnets = [
"subnet-0efc73526db16a4a4",
"subnet-0b8af626f39e7d462",
]
# Simulation Job Inputs
output_bucket = "kf-pipelines-robomaker-output"
output_key = "test-output-key"
@dsl.pipeline(
name="SageMaker & RoboMaker pipeline",
    description="Runs a SageMaker RLEstimator training job and a RoboMaker simulation job that work together to train an RL model",
)
def sagemaker_robomaker_rl_job(
region=region,
role=role,
name=simulation_app_name,
sources=sources,
simulation_software_name=simulation_software_name,
simulation_software_version=simulation_software_version,
robot_software_name=robot_software_name,
robot_software_version=robot_software_version,
rendering_engine_name=rendering_engine_name,
rendering_engine_version=rendering_engine_version,
output_bucket=output_bucket,
robomaker_output_path=output_key,
vpc_security_group_ids=security_groups,
vpc_subnets=subnets,
entry_point=entry_point,
source_dir=source_dir,
toolkit=toolkit,
toolkit_version=toolkit_version,
framework=framework,
assume_role=role,
instance_type=train_instance_type,
instance_count=train_instance_count,
output_path=rl_output_path,
job_name=job_name,
metric_definitions=metric_definitions,
max_run=max_run,
hyperparameters=hyperparameters,
sources_bucket=sources_bucket,
s3_prefix=s3_prefix,
):
robomaker_create_sim_app = robomaker_create_sim_app_op(
region=region,
app_name=name,
sources=sources,
simulation_software_name=simulation_software_name,
simulation_software_version=simulation_software_version,
robot_software_name=robot_software_name,
robot_software_version=robot_software_version,
rendering_engine_name=rendering_engine_name,
rendering_engine_version=rendering_engine_version,
)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
rlestimator_training_toolkit_coach = sagemaker_rlestimator_op(
region=region,
entry_point=entry_point,
source_dir=source_dir,
toolkit=toolkit,
toolkit_version=toolkit_version,
framework=framework,
role=assume_role,
instance_type=instance_type,
instance_count=instance_count,
model_artifact_path=output_path,
job_name=job_name,
max_run=max_run,
hyperparameters=hyperparameters,
metric_definitions=metric_definitions,
vpc_subnets=vpc_subnets,
vpc_security_group_ids=vpc_security_group_ids,
)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
robomaker_simulation_job = robomaker_sim_job_op(
region=region,
role=role,
output_bucket=output_bucket,
output_path=robomaker_output_path,
max_run=3800,
failure_behavior="Continue",
sim_app_arn=robomaker_create_sim_app.outputs["arn"],
sim_app_launch_config={
"packageName": "object_tracker_simulation",
"launchFile": "evaluation.launch",
"environmentVariables": {
"MODEL_S3_BUCKET": sources_bucket,
"MODEL_S3_PREFIX": s3_prefix,
"ROS_AWS_REGION": region,
"NUMBER_OF_ROLLOUT_WORKERS": "1",
"MARKOV_PRESET_FILE": "object_tracker.py",
},
"streamUI": True,
},
vpc_security_group_ids=vpc_security_group_ids,
vpc_subnets=vpc_subnets,
use_public_ip="True",
)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
robomaker_delete_sim_app = robomaker_delete_sim_app_op(
region=region, arn=robomaker_create_sim_app.outputs["arn"],
).after(robomaker_simulation_job, robomaker_create_sim_app)
# .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
if __name__ == "__main__":
kfp.compiler.Compiler().compile(sagemaker_robomaker_rl_job, __file__ + ".zip")