feat(components) Adds RoboMaker and SageMaker RLEstimator components (#4813)
* Adds RoboMaker and SageMaker RLEstimator components
* Genericise samples
* Genericise samples
* Adds better logging and updates shim component in samples
* Adds fixes for PR comments. Updates tests accordingly
* Adds docker image reference for integration tests. Allows for setting job_name for RLEstimator training jobs
* Separate RM and SM execution roles
* Remove README reference to VPC config items
* Adds more reliable integration test for RoboMaker Simulation Job
* Simplifies integration tests
* Reverted test container entrypoints
* Update black formatting
* Update components for redbackthomson repo
* Prefix RLEstimator job name
* Add RoboMakerFullAccess to generated roles
* Update version to official 1.1.0
* Formatting int test file
* Add PassRole IAM permission to OIDC
* Adds ROBOMAKER_EXECUTION_ROLE_ARN to build vars

Co-authored-by: Nicholas Thomson <nithomso@amazon.com>
parent cab66700dc
commit 4aa11c3c7f
|
|
@ -4,6 +4,12 @@ The version of the AWS SageMaker Components is determined by the docker image ta
|
|||
Repository: https://hub.docker.com/repository/docker/amazon/aws-sagemaker-kfp-components
|
||||
|
||||
---------------------------------------------
|
||||
**Change log for version 1.1.0**
|
||||
- Add SageMaker RLEstimator component
|
||||
- Add RoboMaker create/delete simulation application, create simulation job components
|
||||
|
||||
> Pull requests : [#4813](https://github.com/kubeflow/pipelines/pull/4813/)
|
||||
|
||||
**Change log for version 1.0.0**
|
||||
- First release to guarantee backward compatibility within major version
|
||||
- Internally refactored components
|
||||
|
|
|
|||
|
|
@ -18,6 +18,10 @@ There is no additional charge for using Amazon SageMaker Components for Kubeflow
|
|||
|
||||
The Training component allows you to submit Amazon SageMaker Training jobs directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker Training Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/train).
|
||||
|
||||
#### RLEstimator
|
||||
|
||||
The RLEstimator component allows you to submit RLEstimator (Reinforcement Learning) SageMaker Training jobs directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker RLEstimator Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/rlestimator).
|
||||
|
||||
#### Hyperparameter Optimization
|
||||
|
||||
The Hyperparameter Optimization component enables you to submit hyperparameter tuning jobs to Amazon SageMaker directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker Hyperparameter Optimization Kubeflow Pipeline component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/hyperparameter_tuning).
|
||||
|
|
@ -49,3 +53,20 @@ The Ground Truth component enables you to to submit Amazon SageMaker Ground Trut
|
|||
The Workteam component enables you to create Amazon SageMaker private workteam jobs directly from a Kubeflow Pipelines workflow. For more information, see [SageMaker create private workteam Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/workteam).
|
||||
|
||||
|
||||
### RoboMaker components
|
||||
|
||||
#### Create Simulation Application
|
||||
|
||||
The Create Simulation Application component allows you to create a RoboMaker Simulation Application directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Create Simulation app Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/create_simulation_app).
|
||||
|
||||
#### Simulation Job
|
||||
|
||||
The Simulation Job component allows you to run a RoboMaker Simulation Job directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Simulation Job Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job).
|
||||
|
||||
#### Simulation Job Batch
|
||||
|
||||
The Simulation Job Batch component allows you to run a RoboMaker Simulation Job Batch directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Simulation Job Batch Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job_batch).
|
||||
|
||||
#### Delete Simulation Application
|
||||
|
||||
The Delete Simulation Application component allows you to delete a RoboMaker Simulation Application directly from a Kubeflow Pipelines workflow. For more information, see [RoboMaker Delete Simulation app Kubeflow Pipelines component](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/delete_simulation_app).
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
** Amazon SageMaker Components for Kubeflow Pipelines; version 1.0.0 --
|
||||
** Amazon SageMaker Components for Kubeflow Pipelines; version 1.1.0 --
|
||||
https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker
|
||||
Copyright 2019-2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
|
||||
** boto3; version 1.14.12 -- https://github.com/boto/boto3/
|
||||
|
|
|
|||
|
|
@ -56,7 +56,7 @@ outputs:
|
|||
- {name: output_location, description: S3 URI of the transform job results.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- batch_transform/src/sagemaker_transform_component.py
|
||||
|
|
|
|||
|
|
@ -2,7 +2,7 @@ version: 0.2
|
|||
|
||||
env:
|
||||
variables:
|
||||
CONTAINER_VARIABLES: "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI EKS_PRIVATE_SUBNETS EKS_PUBLIC_SUBNETS PYTEST_MARKER PYTEST_ADDOPTS S3_DATA_BUCKET EKS_EXISTING_CLUSTER SAGEMAKER_EXECUTION_ROLE_ARN REGION SKIP_FSX_TESTS"
|
||||
CONTAINER_VARIABLES: "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI EKS_PRIVATE_SUBNETS EKS_PUBLIC_SUBNETS PYTEST_MARKER PYTEST_ADDOPTS S3_DATA_BUCKET EKS_EXISTING_CLUSTER SAGEMAKER_EXECUTION_ROLE_ARN REGION SKIP_FSX_TESTS ROBOMAKER_EXECUTION_ROLE_ARN"
|
||||
|
||||
phases:
|
||||
pre_build:
|
||||
|
|
|
|||
|
|
@ -20,6 +20,7 @@ from botocore.credentials import (
|
|||
JSONFileCache,
|
||||
)
|
||||
from botocore.session import Session as BotocoreSession
|
||||
from sagemaker.session import Session as SageMakerSession
|
||||
|
||||
|
||||
class Boto3Manager(object):
|
||||
|
|
@ -45,7 +46,7 @@ class Boto3Manager(object):
|
|||
@staticmethod
|
||||
def _get_boto3_session(
|
||||
region: str, role_arn: str = None, assume_duration: int = 3600
|
||||
):
|
||||
) -> Session:
|
||||
"""Creates a boto3 session, optionally assuming a role.
|
||||
|
||||
Args:
|
||||
|
|
@ -112,6 +113,64 @@ class Boto3Manager(object):
|
|||
)
|
||||
return client
|
||||
|
||||
@staticmethod
|
||||
def get_sagemaker_session(
|
||||
component_version: str,
|
||||
region: str,
|
||||
endpoint_url: str = None,
|
||||
assume_role_arn: str = None,
|
||||
):
|
||||
"""Builds a SageMaker Session which can be used by any Estimator.
|
||||
|
||||
Args:
|
||||
component_version: The version of the component to include in
|
||||
the user agent.
|
||||
region: The AWS region for the SageMaker client and SageMaker Session.
|
||||
endpoint_url: A private link endpoint for SageMaker.
|
||||
assume_role_arn: The ARN of a role for the boto3 client to assume.
|
||||
|
||||
Returns:
|
||||
object: A SageMaker boto3 session.
|
||||
"""
|
||||
return SageMakerSession(
|
||||
boto_session=Boto3Manager._get_boto3_session(region, assume_role_arn),
|
||||
sagemaker_client=Boto3Manager.get_sagemaker_client(
|
||||
component_version, region, endpoint_url, assume_role_arn
|
||||
),
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def get_robomaker_client(
|
||||
component_version: str,
|
||||
region: str,
|
||||
endpoint_url: str = None,
|
||||
assume_role_arn: str = None,
|
||||
):
|
||||
"""Builds a client to the AWS RoboMaker API.
|
||||
|
||||
Args:
|
||||
component_version: The version of the component to include in
|
||||
the user agent.
|
||||
region: The AWS region for the RoboMaker client.
|
||||
endpoint_url: A private link endpoint for RoboMaker.
|
||||
assume_role_arn: The ARN of a role for the boto3 client to assume.
|
||||
|
||||
Returns:
|
||||
object: A RoboMaker boto3 client.
|
||||
"""
|
||||
session = Boto3Manager._get_boto3_session(region, assume_role_arn)
|
||||
session_config = Config(
|
||||
user_agent=f"sagemaker-on-kubeflow-pipelines-v{component_version}",
|
||||
retries={"max_attempts": 10, "mode": "standard"},
|
||||
)
|
||||
client = session.client(
|
||||
"robomaker",
|
||||
region_name=region,
|
||||
endpoint_url=endpoint_url,
|
||||
config=session_config,
|
||||
)
|
||||
return client
|
||||
|
||||
@staticmethod
|
||||
def get_cloudwatch_client(region: str, assume_role_arn: str = None):
|
||||
"""Builds a client to the AWS CloudWatch API.
|
||||
|
|
|
|||
|
|
@ -20,11 +20,16 @@ import common.sagemaker_component as component_module
|
|||
|
||||
COMPONENT_DIRECTORIES = [
|
||||
"batch_transform",
|
||||
"create_simulation_app",
|
||||
"delete_simulation_app",
|
||||
"deploy",
|
||||
"ground_truth",
|
||||
"hyperparameter_tuning",
|
||||
"model",
|
||||
"process",
|
||||
"rlestimator",
|
||||
"simulation_job",
|
||||
"simulation_job_batch",
|
||||
"train",
|
||||
"workteam",
|
||||
]
|
||||
|
|
|
|||
|
|
@ -18,6 +18,7 @@ import signal
|
|||
import string
|
||||
import logging
|
||||
import json
|
||||
from enum import Enum, auto
|
||||
from types import FunctionType
|
||||
import yaml
|
||||
import random
|
||||
|
|
@ -76,6 +77,25 @@ class SageMakerJobStatus(NamedTuple):
|
|||
error_message: Optional[str] = None
|
||||
|
||||
|
||||
class DebugRulesStatus(Enum):
|
||||
COMPLETED = auto()
|
||||
ERRORED = auto()
|
||||
INPROGRESS = auto()
|
||||
|
||||
@classmethod
|
||||
def from_describe(cls, response):
|
||||
has_error = False
|
||||
for debug_rule in response["DebugRuleEvaluationStatuses"]:
|
||||
if debug_rule["RuleEvaluationStatus"] == "Error":
|
||||
has_error = True
|
||||
if debug_rule["RuleEvaluationStatus"] == "InProgress":
|
||||
return DebugRulesStatus.INPROGRESS
|
||||
if has_error:
|
||||
return DebugRulesStatus.ERRORED
|
||||
else:
|
||||
return DebugRulesStatus.COMPLETED
|
||||
|
||||
|
||||
class SageMakerComponent:
|
||||
"""Base class for a KFP SageMaker component.
|
||||
|
||||
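The `DebugRulesStatus.from_describe` reduction added above can be exercised on its own. The sketch below copies the enum from this hunk and feeds it trimmed-down, hypothetical describe responses; only the `DebugRuleEvaluationStatuses` and `RuleEvaluationStatus` keys come from the diff, the rest of the payload is assumed for illustration.

```python
from enum import Enum, auto


class DebugRulesStatus(Enum):
    COMPLETED = auto()
    ERRORED = auto()
    INPROGRESS = auto()

    @classmethod
    def from_describe(cls, response):
        has_error = False
        for debug_rule in response["DebugRuleEvaluationStatuses"]:
            if debug_rule["RuleEvaluationStatus"] == "Error":
                has_error = True
            # Any rule still running wins over errors seen so far.
            if debug_rule["RuleEvaluationStatus"] == "InProgress":
                return cls.INPROGRESS
        return cls.ERRORED if has_error else cls.COMPLETED


def make_response(*statuses):
    """Build a hypothetical, minimal describe response for illustration."""
    return {
        "DebugRuleEvaluationStatuses": [
            {"RuleEvaluationStatus": status} for status in statuses
        ]
    }


still_running = DebugRulesStatus.from_describe(make_response("Error", "InProgress"))
failed = DebugRulesStatus.from_describe(make_response("NoIssuesFound", "Error"))
finished = DebugRulesStatus.from_describe(make_response("NoIssuesFound"))
```

Note the precedence this implies: a response containing both an errored and an in-progress rule is reported as `INPROGRESS`, so a caller polling this status would keep waiting until every rule has settled.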
|
|
|
|||
|
|
@ -44,16 +44,16 @@ class SpecInputParsers:
|
|||
def yaml_or_json_list(value):
|
||||
"""Parses a YAML or JSON list to a Python list."""
|
||||
parsed = SpecInputParsers._yaml_or_json_str(value)
|
||||
if not isinstance(parsed, List):
|
||||
raise ArgumentTypeError(f"{value} is not a list")
|
||||
if parsed is not None and not isinstance(parsed, List):
|
||||
raise ArgumentTypeError(f"{value} (type {type(value)}) is not a list")
|
||||
return parsed
|
||||
|
||||
@staticmethod
|
||||
def yaml_or_json_dict(value):
|
||||
"""Parses a YAML or JSON dictionary to a Python dictionary."""
|
||||
parsed = SpecInputParsers._yaml_or_json_str(value)
|
||||
if not isinstance(parsed, Dict):
|
||||
raise ArgumentTypeError(f"{value} is not a dictionary")
|
||||
if parsed is not None and not isinstance(parsed, Dict):
|
||||
raise ArgumentTypeError(f"{value} (type {type(value)}) is not a dictionary")
|
||||
return parsed
|
||||
|
||||
@staticmethod
|
||||
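The effect of the relaxed check in `yaml_or_json_list` can be illustrated with a small stand-in. The helper `_yaml_or_json_str` is not shown in this hunk, so the sketch below substitutes `json.loads` for it; that substitution and the function name `json_list` are assumptions made purely for illustration.

```python
import json
from argparse import ArgumentTypeError


def json_list(value):
    """Parse a JSON list, letting an explicit null pass through as None."""
    # json.loads stands in for the component's _yaml_or_json_str helper.
    parsed = json.loads(value)
    if parsed is not None and not isinstance(parsed, list):
        raise ArgumentTypeError(f"{value} (type {type(value)}) is not a list")
    return parsed


accepted = json_list('["a", "b"]')  # a real list is returned unchanged
missing = json_list("null")         # previously rejected, now returns None

try:
    json_list('{"a": 1}')
    rejected = False
except ArgumentTypeError:
    rejected = True                 # a dictionary is still not a list
```

Under the old check, a value that parsed to `None` would have failed the `isinstance` test and raised; the new condition only rejects values that are present and of the wrong type.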
|
|
|
|||
|
|
@ -0,0 +1,15 @@
|
|||
name: ''
|
||||
sources:
|
||||
s3Bucket:
|
||||
s3Key:
|
||||
architecture:
|
||||
simulationSoftwareSuite:
|
||||
name: ''
|
||||
version: ''
|
||||
robotSoftwareSuite:
|
||||
name: ''
|
||||
version: ''
|
||||
renderingEngine:
|
||||
name: ''
|
||||
version: ''
|
||||
tags: {}
|
||||
|
|
@ -0,0 +1,2 @@
|
|||
application: ''
|
||||
applicationVersion: ''
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
batchPolicy:
|
||||
timeoutInSeconds:
|
||||
maxConcurrency:
|
||||
createSimulationJobRequests: []
|
||||
tags: {}
|
||||
|
|
@ -0,0 +1,18 @@
|
|||
outputLocation:
|
||||
s3Bucket:
|
||||
s3Prefix:
|
||||
loggingConfig:
|
||||
recordAllRosTopics: False
|
||||
maxJobDurationInSeconds: 28800
|
||||
iamRole: ''
|
||||
failureBehavior: 'Fail'
|
||||
robotApplications: []
|
||||
simulationApplications: []
|
||||
dataSources: []
|
||||
vpcConfig:
|
||||
subnets: []
|
||||
securityGroups: []
|
||||
assignPublicIp: False
|
||||
compute:
|
||||
simulationUnitLimit: 15
|
||||
tags: {}
|
||||
|
|
@ -0,0 +1,47 @@
|
|||
# RoboMaker Create Simulation Application Kubeflow Pipelines component
|
||||
|
||||
## Summary
|
||||
Component to create RoboMaker Simulation Applications from a Kubeflow Pipelines workflow.
|
||||
https://docs.aws.amazon.com/robomaker/latest/dg/create-simulation-application.html
|
||||
|
||||
## Intended Use
|
||||
For running your simulation workloads using AWS RoboMaker.
|
||||
|
||||
## Runtime Arguments
|
||||
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the cluster launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
app_name | The name of the simulation application. Must be unique within the same AWS account and AWS region | Yes | String | | SimulationApplication-[datetime]-[random id]|
role | The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf | No | String | | |
sources | The code sources of the simulation application | No | List | | |
simulation_software_name | The simulation software used by the simulation application | No | String | | |
simulation_software_version | The simulation software version used by the simulation application | No | String | | |
robot_software_name | The robot software (ROS distribution) used by the simulation application | No | String | | |
robot_software_version | The robot software version (ROS distribution) used by the simulation application | No | String | | |
rendering_engine_name | The rendering engine for the simulation application | Yes | String | | |
rendering_engine_version | The rendering engine version for the simulation application | Yes | String | | |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
|
||||
|
||||
Notes:
|
||||
* This component should be run as a precursor to the RoboMaker [`Simulation Job component`](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job/README.md)
|
||||
* The format for the [`sources`](https://docs.aws.amazon.com/robomaker/latest/dg/API_SourceConfig.html) field is:
|
||||
```
[
    {
        "s3Bucket": "string",
        "s3Key": "string",
        "architecture": "string"
    }
]
```
|
||||
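As a rough sketch of how the `sources` argument might be prepared in a pipeline definition, the list can be built as Python data and serialized to a JSON string. The bucket name, key and the helper `check_sources` below are hypothetical; `X86_64` is assumed to be an accepted architecture value.

```python
import json

# Hypothetical bucket and bundle key, for illustration only.
sources = [
    {
        "s3Bucket": "my-robomaker-bucket",
        "s3Key": "bundles/simulation_ws.tar",
        "architecture": "X86_64",
    }
]

REQUIRED_KEYS = {"s3Bucket", "s3Key", "architecture"}


def check_sources(entries):
    """Return True when every entry carries exactly the documented keys."""
    return isinstance(entries, list) and all(
        isinstance(entry, dict) and set(entry) == REQUIRED_KEYS
        for entry in entries
    )


# The component takes the value as a string and parses it into a list.
sources_arg = json.dumps(sources)
```

Serializing with `json.dumps` avoids hand-written JSON mistakes such as a trailing comma after the last key, which a strict JSON parser rejects.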
|
||||
## Output
|
||||
The ARN of the created Simulation Application. This can be passed as an input to other components such as RoboMaker Simulation Job.
|
||||
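The component also logs a console link derived from this ARN by taking everything after the first `/`. A minimal sketch of that derivation follows; the function name and the sample ARN (account ID, application name and ID) are made up for illustration, while the URL pattern is the one used in the component source.

```python
def simulation_app_console_url(region, arn):
    """Build the RoboMaker console link for a simulation application ARN."""
    # Everything after the first "/" identifies the application.
    app_path = str(arn).split("/", 1)[1]
    return (
        "https://{0}.console.aws.amazon.com/robomaker/home"
        "?region={0}#/simulationApplications/{1}"
    ).format(region, app_path)


# Hypothetical ARN, for illustration only.
example_arn = (
    "arn:aws:robomaker:us-west-2:123456789012:"
    "simulation-application/MySimApp/1551203427605"
)
url = simulation_app_console_url("us-west-2", example_arn)
```

An ARN without a `/` would raise `IndexError` here, so a caller that cannot guarantee the format may want to validate it first.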
|
||||
# Example code
|
||||
Example of creating a Sim app, then a Sim job and finally deleting the Sim app : [robomaker_simulation_job_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_app.py)
|
||||
|
||||
# Resources
|
||||
* [Create RoboMaker Simulation Application via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_application)
|
||||
|
|
@ -0,0 +1,69 @@
|
|||
name: RoboMaker - Create Simulation Application
|
||||
description: Creates a simulation application.
|
||||
inputs:
|
||||
- {name: region, type: String, description: The region for the SageMaker resource.}
|
||||
- {name: endpoint_url, type: String, description: The URL to use when communicating
|
||||
with the SageMaker service., default: ''}
|
||||
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
|
||||
when connecting to SageMaker., default: ''}
|
||||
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
|
||||
AWS resources.', default: '{}'}
|
||||
- {name: app_name, type: String, description: The name of the simulation application.,
|
||||
default: ''}
|
||||
- {name: sources, type: JsonArray, description: The code sources of the simulation
|
||||
application., default: '{}'}
|
||||
- {name: simulation_software_name, type: String, description: The simulation software
|
||||
used by the simulation application., default: ''}
|
||||
- {name: simulation_software_version, type: String, description: The simulation software
|
||||
version used by the simulation application., default: ''}
|
||||
- {name: robot_software_name, type: String, description: The robot software used by
|
||||
the simulation application., default: ''}
|
||||
- {name: robot_software_version, type: String, description: The robot software version
|
||||
used by the simulation application., default: ''}
|
||||
- {name: rendering_engine_name, type: String, description: The rendering engine for
|
||||
the simulation application., default: ''}
|
||||
- {name: rendering_engine_version, type: String, description: The rendering engine
|
||||
version for the simulation application., default: ''}
|
||||
outputs:
|
||||
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation application.}
|
||||
- {name: app_name, description: The name of the simulation application.}
|
||||
- {name: version, description: The version of the simulation application.}
|
||||
- {name: revision_id, description: The revision id of the simulation application.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- create_simulation_app/src/robomaker_create_simulation_app_component.py
|
||||
- --region
|
||||
- {inputValue: region}
|
||||
- --endpoint_url
|
||||
- {inputValue: endpoint_url}
|
||||
- --assume_role
|
||||
- {inputValue: assume_role}
|
||||
- --tags
|
||||
- {inputValue: tags}
|
||||
- --app_name
|
||||
- {inputValue: app_name}
|
||||
- --sources
|
||||
- {inputValue: sources}
|
||||
- --simulation_software_name
|
||||
- {inputValue: simulation_software_name}
|
||||
- --simulation_software_version
|
||||
- {inputValue: simulation_software_version}
|
||||
- --robot_software_name
|
||||
- {inputValue: robot_software_name}
|
||||
- --robot_software_version
|
||||
- {inputValue: robot_software_version}
|
||||
- --rendering_engine_name
|
||||
- {inputValue: rendering_engine_name}
|
||||
- --rendering_engine_version
|
||||
- {inputValue: rendering_engine_version}
|
||||
- --arn_output_path
|
||||
- {outputPath: arn}
|
||||
- --app_name_output_path
|
||||
- {outputPath: app_name}
|
||||
- --version_output_path
|
||||
- {outputPath: version}
|
||||
- --revision_id_output_path
|
||||
- {outputPath: revision_id}
|
||||
|
|
@ -0,0 +1,160 @@
|
|||
"""RoboMaker component for creating a simulation application."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import logging
|
||||
from typing import Dict
|
||||
from create_simulation_app.src.robomaker_create_simulation_app_spec import (
|
||||
RoboMakerCreateSimulationAppSpec,
|
||||
RoboMakerCreateSimulationAppInputs,
|
||||
RoboMakerCreateSimulationAppOutputs,
|
||||
)
|
||||
from common.sagemaker_component import (
|
||||
SageMakerComponent,
|
||||
ComponentMetadata,
|
||||
SageMakerJobStatus,
|
||||
)
|
||||
from common.boto3_manager import Boto3Manager
|
||||
from common.common_inputs import SageMakerComponentCommonInputs
|
||||
|
||||
|
||||
@ComponentMetadata(
|
||||
name="RoboMaker - Create Simulation Application",
|
||||
description="Creates a simulation application.",
|
||||
spec=RoboMakerCreateSimulationAppSpec,
|
||||
)
|
||||
class RoboMakerCreateSimulationAppComponent(SageMakerComponent):
|
||||
"""RoboMaker component for creating a simulation application."""
|
||||
|
||||
def Do(self, spec: RoboMakerCreateSimulationAppSpec):
|
||||
self._app_name = (
|
||||
spec.inputs.app_name
|
||||
if spec.inputs.app_name
|
||||
else RoboMakerCreateSimulationAppComponent._generate_unique_timestamped_id(
|
||||
prefix="SimulationApplication"
|
||||
)
|
||||
)
|
||||
super().Do(spec.inputs, spec.outputs, spec.output_paths)
|
||||
|
||||
def _get_job_status(self) -> SageMakerJobStatus:
|
||||
try:
|
||||
response = self._rm_client.describe_simulation_application(
|
||||
application=self._arn
|
||||
)
|
||||
status = response["arn"]
|
||||
|
||||
if status is not None:
|
||||
return SageMakerJobStatus(is_completed=True, raw_status=status)
|
||||
else:
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=True,
|
||||
error_message="No ARN present",
|
||||
raw_status=status,
|
||||
)
|
||||
except Exception as ex:
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=True,
|
||||
error_message=str(ex),
|
||||
raw_status=str(ex),
|
||||
)
|
||||
|
||||
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
|
||||
"""Configures the internal AWS clients for the component.
|
||||
|
||||
Args:
|
||||
inputs: A populated list of user inputs.
|
||||
"""
|
||||
self._rm_client = Boto3Manager.get_robomaker_client(
|
||||
self._get_component_version(),
|
||||
inputs.region,
|
||||
endpoint_url=inputs.endpoint_url,
|
||||
assume_role_arn=inputs.assume_role,
|
||||
)
|
||||
self._cw_client = Boto3Manager.get_cloudwatch_client(
|
||||
inputs.region, assume_role_arn=inputs.assume_role
|
||||
)
|
||||
|
||||
def _after_job_complete(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerCreateSimulationAppInputs,
|
||||
outputs: RoboMakerCreateSimulationAppOutputs,
|
||||
):
|
||||
outputs.app_name = self._app_name
|
||||
outputs.arn = job["arn"]
|
||||
outputs.version = job["version"]
|
||||
outputs.revision_id = job["revisionId"]
|
||||
logging.info(
|
||||
"Simulation Application in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationApplications/{}".format(
|
||||
inputs.region, inputs.region, str(outputs.arn).split("/", 1)[1]
|
||||
)
|
||||
)
|
||||
|
||||
def _on_job_terminated(self):
|
||||
self._rm_client.delete_simulation_application(application=self._arn)
|
||||
|
||||
def _create_job_request(
|
||||
self,
|
||||
inputs: RoboMakerCreateSimulationAppInputs,
|
||||
outputs: RoboMakerCreateSimulationAppOutputs,
|
||||
) -> Dict:
|
||||
"""
|
||||
Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_application
|
||||
"""
|
||||
request = self._get_request_template("robomaker.create.simulation.app")
|
||||
|
||||
request["name"] = self._app_name
|
||||
request["sources"] = inputs.sources
|
||||
request["simulationSoftwareSuite"]["name"] = inputs.simulation_software_name
|
||||
request["simulationSoftwareSuite"][
|
||||
"version"
|
||||
] = inputs.simulation_software_version
|
||||
request["robotSoftwareSuite"]["name"] = inputs.robot_software_name
|
||||
request["robotSoftwareSuite"]["version"] = inputs.robot_software_version
|
||||
|
||||
if inputs.rendering_engine_name:
|
||||
request["renderingEngine"]["name"] = inputs.rendering_engine_name
|
||||
request["renderingEngine"]["version"] = inputs.rendering_engine_version
|
||||
else:
|
||||
request.pop("renderingEngine")
|
||||
|
||||
return request
|
||||
|
||||
def _submit_job_request(self, request: Dict) -> Dict:
|
||||
return self._rm_client.create_simulation_application(**request)
|
||||
|
||||
def _after_submit_job_request(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerCreateSimulationAppInputs,
|
||||
outputs: RoboMakerCreateSimulationAppOutputs,
|
||||
):
|
||||
outputs.arn = self._arn = job["arn"]
|
||||
logging.info(
|
||||
f"Created Robomaker Simulation Application with name: {self._app_name}"
|
||||
)
|
||||
|
||||
def _print_logs_for_job(self):
|
||||
pass
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
|
||||
spec = RoboMakerCreateSimulationAppSpec(sys.argv[1:])
|
||||
|
||||
component = RoboMakerCreateSimulationAppComponent()
|
||||
component.Do(spec)
|
||||
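The request-building step in `_create_job_request` can be sketched in isolation. The template below is a simplified, hand-written stand-in for the YAML template this commit adds, and `build_create_app_request` with its tuple arguments is a hypothetical helper; the suite names and versions are example values only.

```python
import copy

# Simplified stand-in for the YAML request template added in this commit.
REQUEST_TEMPLATE = {
    "name": "",
    "sources": [],
    "simulationSoftwareSuite": {"name": "", "version": ""},
    "robotSoftwareSuite": {"name": "", "version": ""},
    "renderingEngine": {"name": "", "version": ""},
    "tags": {},
}


def build_create_app_request(
    app_name, sources, sim_suite, robot_suite, rendering_engine=None
):
    """Fill a copy of the template; each suite is a (name, version) tuple."""
    request = copy.deepcopy(REQUEST_TEMPLATE)  # never mutate the template
    request["name"] = app_name
    request["sources"] = sources
    request["simulationSoftwareSuite"] = {
        "name": sim_suite[0],
        "version": sim_suite[1],
    }
    request["robotSoftwareSuite"] = {
        "name": robot_suite[0],
        "version": robot_suite[1],
    }
    if rendering_engine:
        request["renderingEngine"] = {
            "name": rendering_engine[0],
            "version": rendering_engine[1],
        }
    else:
        # Drop the optional key rather than sending empty strings.
        request.pop("renderingEngine")
    return request


# Example values only; real inputs come from the component arguments.
request = build_create_app_request(
    "SimulationApplication-example",
    [{"s3Bucket": "my-bucket", "s3Key": "sim.tar", "architecture": "X86_64"}],
    sim_suite=("Gazebo", "9"),
    robot_suite=("ROS", "Melodic"),
)
```

Removing the `renderingEngine` key when no engine is given mirrors the component's behaviour and keeps empty placeholder strings out of the API call.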
|
|
@ -0,0 +1,141 @@
|
|||
"""Specification for the RoboMaker create simulation application component."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
from typing import List
|
||||
from common.sagemaker_component_spec import SageMakerComponentSpec
|
||||
from common.spec_input_parsers import SpecInputParsers
|
||||
from common.common_inputs import (
|
||||
COMMON_INPUTS,
|
||||
SageMakerComponentCommonInputs,
|
||||
SageMakerComponentInput as Input,
|
||||
SageMakerComponentOutput as Output,
|
||||
SageMakerComponentBaseOutputs,
|
||||
SageMakerComponentInputValidator as InputValidator,
|
||||
SageMakerComponentOutputValidator as OutputValidator,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RoboMakerCreateSimulationAppInputs(SageMakerComponentCommonInputs):
|
||||
"""Defines the set of inputs for the create simulation application component."""
|
||||
|
||||
app_name: Input
|
||||
sources: Input
|
||||
simulation_software_name: Input
|
||||
simulation_software_version: Input
|
||||
robot_software_name: Input
|
||||
robot_software_version: Input
|
||||
rendering_engine_name: Input
|
||||
rendering_engine_version: Input
|
||||
|
||||
|
||||
@dataclass
|
||||
class RoboMakerCreateSimulationAppOutputs(SageMakerComponentBaseOutputs):
|
||||
"""Defines the set of outputs for the create simulation application component."""
|
||||
|
||||
arn: Output
|
||||
app_name: Output
|
||||
version: Output
|
||||
revision_id: Output
|
||||
|
||||
|
||||
class RoboMakerCreateSimulationAppSpec(
|
||||
SageMakerComponentSpec[
|
||||
RoboMakerCreateSimulationAppInputs, RoboMakerCreateSimulationAppOutputs
|
||||
]
|
||||
):
|
||||
INPUTS: RoboMakerCreateSimulationAppInputs = RoboMakerCreateSimulationAppInputs(
|
||||
app_name=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The name of the simulation application.",
|
||||
default="",
|
||||
),
|
||||
sources=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=True,
|
||||
description="The code sources of the simulation application.",
|
||||
default={},
|
||||
),
|
||||
simulation_software_name=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The simulation software used by the simulation application.",
|
||||
default="",
|
||||
),
|
||||
simulation_software_version=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The simulation software version used by the simulation application.",
|
||||
default="",
|
||||
),
|
||||
robot_software_name=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The robot software used by the simulation application.",
|
||||
default="",
|
||||
),
|
||||
robot_software_version=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The robot software version used by the simulation application.",
|
||||
default="",
|
||||
),
|
||||
rendering_engine_name=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The rendering engine for the simulation application.",
|
||||
default="",
|
||||
),
|
||||
rendering_engine_version=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The rendering engine version for the simulation application.",
|
||||
default="",
|
||||
),
|
||||
**vars(COMMON_INPUTS),
|
||||
)
|
||||
|
||||
OUTPUTS = RoboMakerCreateSimulationAppOutputs(
|
||||
arn=OutputValidator(
|
||||
description="The Amazon Resource Name (ARN) of the simulation application."
|
||||
),
|
||||
app_name=OutputValidator(description="The name of the simulation application."),
|
||||
version=OutputValidator(
|
||||
description="The version of the simulation application."
|
||||
),
|
||||
revision_id=OutputValidator(
|
||||
description="The revision id of the simulation application."
|
||||
),
|
||||
)
|
||||
|
||||
def __init__(self, arguments: List[str]):
|
||||
super().__init__(
|
||||
arguments,
|
||||
RoboMakerCreateSimulationAppInputs,
|
||||
RoboMakerCreateSimulationAppOutputs,
|
||||
)
|
||||
|
||||
@property
|
||||
def inputs(self) -> RoboMakerCreateSimulationAppInputs:
|
||||
return self._inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> RoboMakerCreateSimulationAppOutputs:
|
||||
return self._outputs
|
||||
|
||||
@property
|
||||
def output_paths(self) -> RoboMakerCreateSimulationAppOutputs:
|
||||
return self._output_paths
|
||||
|
|
@ -0,0 +1,34 @@
|
|||
# RoboMaker Delete Simulation Application Kubeflow Pipelines component
|
||||
|
||||
## Summary
|
||||
Component to delete RoboMaker Simulation Applications from a Kubeflow Pipelines workflow.
|
||||
https://docs.aws.amazon.com/robomaker/latest/dg/API_DeleteSimulationApplication.html
|
||||
|
||||
## Intended Use
|
||||
For running your simulation workloads using AWS RoboMaker.
|
||||
|
||||
## Runtime Arguments
|
||||
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the cluster launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
app_name | The name of the simulation application. Must be unique within the same AWS account and AWS region | Yes | String | | SimulationApplication-[datetime]-[random id]|
role | The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf | No | String | | |
arn | The Amazon Resource Name (ARN) of the simulation application | No | String | | |
version | The version of the simulation application | Yes | String | | |
|
||||
|
||||
Notes:
|
||||
* This component can be used to clean up any simulation apps that were created by other components such as the Create Simulation App component.
|
||||
* This component should be run after the RoboMaker [`Simulation Job component`](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/simulation_job/README.md)
|
||||
* The format for the [`sources`](https://docs.aws.amazon.com/robomaker/latest/dg/API_SourceConfig.html) field is:
|
||||
|
||||
|
||||
## Output
|
||||
The ARN of the deleted Simulation Application.
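
As a rough illustration of how the `arn` and `version` arguments map onto the underlying `DeleteSimulationApplication` call, the sketch below builds the request dictionary. `build_delete_request` and the example ARN are hypothetical and not part of the component. The keys `application` and `applicationVersion` are the parameter names the Boto3 call accepts.

```python
from typing import Dict, Optional


def build_delete_request(arn: str, version: Optional[str] = None) -> Dict[str, str]:
    """Build keyword arguments for RoboMaker's delete_simulation_application.

    Hypothetical helper for illustration only.
    """
    if not arn:
        raise ValueError("arn is required")
    request = {"application": arn}
    # Only send the version when the caller supplied one.
    if version:
        request["applicationVersion"] = version
    return request


# Made-up ARN used purely as sample input.
example_arn = "arn:aws:robomaker:us-west-2:123456789012:simulation-application/demo/1"
print(build_delete_request(example_arn))
print(build_delete_request(example_arn, version="1"))
```

With a Boto3 RoboMaker client, the resulting dictionary could be passed as `client.delete_simulation_application(**request)`.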
|
||||
|
||||
# Example code
|
||||
Example of creating a Sim app, then a Sim job and finally deleting the Sim app : [robomaker_simulation_job_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_app.py)
|
||||
|
||||
# Resources
|
||||
* [Delete RoboMaker Simulation Application via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.delete_simulation_application)
|
||||
|
|
@ -0,0 +1,36 @@
|
|||
name: RoboMaker - Delete Simulation Application
|
||||
description: Delete a simulation application.
|
||||
inputs:
|
||||
- {name: region, type: String, description: The region for the SageMaker resource.}
|
||||
- {name: endpoint_url, type: String, description: The URL to use when communicating
|
||||
with the SageMaker service., default: ''}
|
||||
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
|
||||
when connecting to SageMaker., default: ''}
|
||||
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
|
||||
AWS resources.', default: '{}'}
|
||||
- {name: arn, type: String, description: The Amazon Resource Name (ARN) of the simulation
|
||||
application., default: ''}
|
||||
- {name: version, type: String, description: The version of the simulation application.,
|
||||
default: ''}
|
||||
outputs:
|
||||
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation application.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- delete_simulation_app/src/robomaker_delete_simulation_app_component.py
|
||||
- --region
|
||||
- {inputValue: region}
|
||||
- --endpoint_url
|
||||
- {inputValue: endpoint_url}
|
||||
- --assume_role
|
||||
- {inputValue: assume_role}
|
||||
- --tags
|
||||
- {inputValue: tags}
|
||||
- --arn
|
||||
- {inputValue: arn}
|
||||
- --version
|
||||
- {inputValue: version}
|
||||
- --arn_output_path
|
||||
- {outputPath: arn}
|
||||
|
|
@ -0,0 +1,127 @@
|
|||
"""RoboMaker component for deleting a simulation application."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import logging
|
||||
from typing import Dict
|
||||
from delete_simulation_app.src.robomaker_delete_simulation_app_spec import (
|
||||
RoboMakerDeleteSimulationAppSpec,
|
||||
RoboMakerDeleteSimulationAppInputs,
|
||||
RoboMakerDeleteSimulationAppOutputs,
|
||||
)
|
||||
from common.sagemaker_component import (
|
||||
SageMakerComponent,
|
||||
ComponentMetadata,
|
||||
SageMakerJobStatus,
|
||||
)
|
||||
from common.boto3_manager import Boto3Manager
|
||||
from common.common_inputs import SageMakerComponentCommonInputs
|
||||
|
||||
|
||||
@ComponentMetadata(
|
||||
name="RoboMaker - Delete Simulation Application",
|
||||
description="Delete a simulation application.",
|
||||
spec=RoboMakerDeleteSimulationAppSpec,
|
||||
)
|
||||
class RoboMakerDeleteSimulationAppComponent(SageMakerComponent):
|
||||
"""RoboMaker component for deleting a simulation application."""
|
||||
|
||||
def Do(self, spec: RoboMakerDeleteSimulationAppSpec):
|
||||
self._arn = spec.inputs.arn
|
||||
self._version = spec.inputs.version
|
||||
super().Do(spec.inputs, spec.outputs, spec.output_paths)
|
||||
|
||||
def _get_job_status(self) -> SageMakerJobStatus:
|
||||
try:
|
||||
response = self._rm_client.describe_simulation_application(
|
||||
application=self._arn
|
||||
)
|
||||
status = response["arn"]
|
||||
|
||||
if status is not None:
|
||||
return SageMakerJobStatus(is_completed=False, raw_status=status,)
|
||||
else:
|
||||
return SageMakerJobStatus(is_completed=True, raw_status="Item deleted")
|
||||
except Exception as ex:
|
||||
return SageMakerJobStatus(is_completed=True, raw_status=str(ex))
|
||||
|
||||
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
|
||||
"""Configures the internal AWS clients for the component.
|
||||
|
||||
Args:
|
||||
inputs: A populated list of user inputs.
|
||||
"""
|
||||
self._rm_client = Boto3Manager.get_robomaker_client(
|
||||
self._get_component_version(),
|
||||
inputs.region,
|
||||
endpoint_url=inputs.endpoint_url,
|
||||
assume_role_arn=inputs.assume_role,
|
||||
)
|
||||
self._cw_client = Boto3Manager.get_cloudwatch_client(
|
||||
inputs.region, assume_role_arn=inputs.assume_role
|
||||
)
|
||||
|
||||
def _after_job_complete(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerDeleteSimulationAppInputs,
|
||||
outputs: RoboMakerDeleteSimulationAppOutputs,
|
||||
):
|
||||
outputs.arn = self._arn
|
||||
logging.info("Simulation Application {} has been deleted".format(outputs.arn))
|
||||
|
||||
def _on_job_terminated(self):
|
||||
logging.info("Simulation Application {} failed to delete".format(self._arn))
|
||||
|
||||
def _create_job_request(
|
||||
self,
|
||||
inputs: RoboMakerDeleteSimulationAppInputs,
|
||||
outputs: RoboMakerDeleteSimulationAppOutputs,
|
||||
) -> Dict:
|
||||
"""
|
||||
Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.delete_simulation_application
|
||||
"""
|
||||
request = self._get_request_template("robomaker.delete.simulation.app")
|
||||
request["application"] = self._arn
|
||||
|
||||
# If we have a version then use it, else remove it from request object
|
||||
if inputs.version:
|
||||
request["applicationVersion"] = inputs.version
|
||||
else:
|
||||
request.pop("applicationVersion", None)
|
||||
|
||||
return request
|
||||
|
||||
def _submit_job_request(self, request: Dict) -> Dict:
|
||||
return self._rm_client.delete_simulation_application(**request)
|
||||
|
||||
def _after_submit_job_request(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerDeleteSimulationAppInputs,
|
||||
outputs: RoboMakerDeleteSimulationAppOutputs,
|
||||
):
|
||||
logging.info(f"Deleted Robomaker Simulation Application with arn: {self._arn}")
|
||||
|
||||
def _print_logs_for_job(self):
|
||||
pass
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
|
||||
spec = RoboMakerDeleteSimulationAppSpec(sys.argv[1:])
|
||||
|
||||
component = RoboMakerDeleteSimulationAppComponent()
|
||||
component.Do(spec)
|
||||
|
|
@ -0,0 +1,88 @@
|
|||
"""Specification for the RoboMaker delete simulation application component."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
from typing import List
|
||||
from common.sagemaker_component_spec import SageMakerComponentSpec
|
||||
from common.common_inputs import (
|
||||
COMMON_INPUTS,
|
||||
SageMakerComponentCommonInputs,
|
||||
SageMakerComponentInput as Input,
|
||||
SageMakerComponentOutput as Output,
|
||||
SageMakerComponentBaseOutputs,
|
||||
SageMakerComponentInputValidator as InputValidator,
|
||||
SageMakerComponentOutputValidator as OutputValidator,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RoboMakerDeleteSimulationAppInputs(SageMakerComponentCommonInputs):
|
||||
"""Defines the set of inputs for the delete simulation application component."""
|
||||
|
||||
arn: Input
|
||||
version: Input
|
||||
|
||||
|
||||
@dataclass
|
||||
class RoboMakerDeleteSimulationAppOutputs(SageMakerComponentBaseOutputs):
|
||||
"""Defines the set of outputs for the delete simulation application component."""
|
||||
|
||||
arn: Output
|
||||
|
||||
|
||||
class RoboMakerDeleteSimulationAppSpec(
|
||||
SageMakerComponentSpec[
|
||||
RoboMakerDeleteSimulationAppInputs, RoboMakerDeleteSimulationAppOutputs
|
||||
]
|
||||
):
|
||||
INPUTS: RoboMakerDeleteSimulationAppInputs = RoboMakerDeleteSimulationAppInputs(
|
||||
arn=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The Amazon Resource Name (ARN) of the simulation application.",
|
||||
default="",
|
||||
),
|
||||
version=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The version of the simulation application.",
|
||||
default=None,
|
||||
),
|
||||
**vars(COMMON_INPUTS),
|
||||
)
|
||||
|
||||
OUTPUTS = RoboMakerDeleteSimulationAppOutputs(
|
||||
arn=OutputValidator(
|
||||
description="The Amazon Resource Name (ARN) of the simulation application."
|
||||
),
|
||||
)
|
||||
|
||||
def __init__(self, arguments: List[str]):
|
||||
super().__init__(
|
||||
arguments,
|
||||
RoboMakerDeleteSimulationAppInputs,
|
||||
RoboMakerDeleteSimulationAppOutputs,
|
||||
)
|
||||
|
||||
@property
|
||||
def inputs(self) -> RoboMakerDeleteSimulationAppInputs:
|
||||
return self._inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> RoboMakerDeleteSimulationAppOutputs:
|
||||
return self._outputs
|
||||
|
||||
@property
|
||||
def output_paths(self) -> RoboMakerDeleteSimulationAppOutputs:
|
||||
return self._output_paths
|
||||
|
|
@ -64,7 +64,7 @@ outputs:
|
|||
- {name: endpoint_name, description: The created endpoint name.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- deploy/src/sagemaker_deploy_component.py
|
||||
|
|
|
|||
|
|
@ -79,7 +79,7 @@ outputs:
|
|||
SageMaker model trained as part of automated data labeling.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- ground_truth/src/sagemaker_ground_truth_component.py
|
||||
|
|
|
|||
|
|
@ -98,7 +98,7 @@ outputs:
|
|||
the training algorithm.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- hyperparameter_tuning/src/sagemaker_tuning_component.py
|
||||
|
|
|
|||
|
|
@ -37,7 +37,7 @@ outputs:
|
|||
- {name: model_name, description: The name of the model created by SageMaker.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- model/src/sagemaker_model_component.py
|
||||
|
|
|
|||
|
|
@ -57,7 +57,7 @@ outputs:
|
|||
- {name: output_artifacts, description: A dictionary containing the output S3 artifacts.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- process/src/sagemaker_process_component.py
|
||||
|
|
|
|||
|
|
@ -0,0 +1,93 @@
|
|||
# SageMaker RLEstimator Kubeflow Pipelines component
|
||||
|
||||
## Summary
|
||||
Component to submit SageMaker RLEstimator (Reinforcement Learning) training jobs directly from a Kubeflow Pipelines workflow.
|
||||
https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-rl-workflow.html
|
||||
|
||||
## Intended Use
|
||||
For running your reinforcement learning training workloads using AWS SageMaker.
|
||||
|
||||
## Runtime Arguments
|
||||
Argument | Description | Optional | Data type | Accepted values | Default |
|
||||
:--- | :---------- | :----------| :----------| :---------- | :----------|
|
||||
region | The region where the cluster launches | No | String | | |
|
||||
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
|
||||
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
|
||||
job_name | The name of the training job. Must be unique within the same AWS account and AWS region | Yes | String | | RLEstimatorJob-[datetime]-[random id]|
|
||||
role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | | |
|
||||
image | The registry path of the Docker image that contains your custom image, or you can use a prebuilt AWS RL image | Yes | String | | |
|
||||
entry_point | Path (absolute or relative) to the Python source file which should be executed as the entry point to training | No | String | | |
|
||||
source_dir | Path (S3 URI) to a directory with any other training source code dependencies aside from the entry point file | Yes | String | | |
|
||||
toolkit | RL toolkit you want to use for executing your model training code | Yes | String | | |
|
||||
toolkit_version | RL toolkit version you want to use for executing your model training code | Yes | String | | |
|
||||
framework | Framework (MXNet, TensorFlow or PyTorch) you want to use as the toolkit backend for reinforcement learning training | Yes | String | | |
|
||||
metric_definitions | A dictionary of name-regex pairs that specify the metrics that the algorithm emits | Yes | Dict | | {} |
|
||||
training_input_mode | The input mode that the algorithm supports | No | String | File, Pipe | File |
|
||||
hyperparameters | Hyperparameters for the selected algorithm | No | Dict | [Depends on Algo](https://docs.aws.amazon.com/sagemaker/latest/dg/k-means-api-config.html)| |
|
||||
instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge [and many more](https://aws.amazon.com/sagemaker/pricing/instance-types/) | ml.m4.xlarge |
|
||||
instance_count | The number of ML compute instances to use in each training job | Yes | Int | ≥ 1 | 1 |
|
||||
volume_size | The size of the ML storage volume that you want to provision in GB | Yes | Int | ≥ 1 | 30 |
|
||||
max_run | The maximum run time in seconds per training job | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
|
||||
model_artifact_path | The S3 path where you want Amazon SageMaker to store the model artifacts | No | String | | |
|
||||
output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | String | | |
|
||||
vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
|
||||
vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your RLEstimator job | Yes | String | | |
|
||||
spot_instance | Use managed spot training if true | No | Boolean | False, True | False |
|
||||
max_wait_time | The maximum time in seconds you are willing to wait for a managed spot training job to complete | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
|
||||
checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Dict | | {} |
|
||||
debug_hook_config | Dictionary of configuration information for the debug hook parameters, collection configurations, and storage paths | Yes | Dict | | {} |
|
||||
debug_rule_config | List of configuration information for debugging rules | Yes | List of Dicts | | [] |
|
||||
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
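
To make the `metric_definitions` argument more concrete, here is a minimal sketch of how a name-regex pair picks a value out of a training log line. It assumes the list-of-`Name`/`Regex` form that the SageMaker Python SDK estimators accept. The metric name, the regex, the sample log line and the `extract_metrics` helper are all invented for illustration.

```python
import json
import re

# Hypothetical metric definition; the name and regex are made up.
metric_definitions = [
    {
        "Name": "episode_reward_mean",
        "Regex": r"episode_reward_mean: ([-+]?[0-9]*\.?[0-9]+)",
    },
]

# Invented log line standing in for what a training script might print.
sample_log_line = "iteration 12 episode_reward_mean: 187.5"


def extract_metrics(line, definitions):
    """Apply each regex to the line and collect the first captured group."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Pipeline arguments are passed as strings, so serialise the structure to JSON.
metric_definitions_arg = json.dumps(metric_definitions)

print(extract_metrics(sample_log_line, metric_definitions))
```

Checking a regex locally against a real log line before submitting a job is a cheap way to catch a pattern that never matches.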
|
||||
|
||||
Notes:
|
||||
* There are two ways to use this component: you can build your own Docker image with the code baked in, or pass the code in via the `source_dir` input. You then use the `entry_point` input to provide the filename to use as the code entry point.
|
||||
* The format for the [`debug_hook_config`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DebugHookConfig.html) field is:
|
||||
```
|
||||
{
|
||||
"CollectionConfigurations": [
|
||||
{
|
||||
'CollectionName': 'string',
|
||||
'CollectionParameters': {
|
||||
'string' : 'string'
|
||||
}
|
||||
}
|
||||
],
|
||||
'HookParameters': {
|
||||
'string' : 'string'
|
||||
},
|
||||
'LocalPath': 'string',
|
||||
'S3OutputPath': 'string'
|
||||
}
|
||||
```
|
||||
* The format for the [`debug_rule_config`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DebugRuleConfiguration.html) field is:
|
||||
```
|
||||
[
|
||||
{
|
||||
'InstanceType': 'string',
|
||||
'LocalPath': 'string',
|
||||
'RuleConfigurationName': 'string',
|
||||
'RuleEvaluatorImage': 'string',
|
||||
'RuleParameters': {
|
||||
'string' : 'string'
|
||||
},
|
||||
'S3OutputPath': 'string',
|
||||
'VolumeSizeInGB': number
|
||||
}
|
||||
]
|
||||
```
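
The two formats above can be assembled in Python and serialised before being handed to the component. This is only a sketch: the bucket path, collection name, rule name and image placeholder are illustrative assumptions, and `missing_keys` is a hypothetical helper. The required keys it checks (`S3OutputPath` for the hook, `RuleConfigurationName` and `RuleEvaluatorImage` for each rule) are the ones the SageMaker API reference marks as required.

```python
import json

# Keys the SageMaker API marks as required for each structure.
REQUIRED_HOOK_KEYS = {"S3OutputPath"}
REQUIRED_RULE_KEYS = {"RuleConfigurationName", "RuleEvaluatorImage"}

# Illustrative values only; replace with your own bucket and collections.
debug_hook_config = {
    "S3OutputPath": "s3://example-bucket/debugger/hook",
    "CollectionConfigurations": [
        {"CollectionName": "losses", "CollectionParameters": {"save_interval": "10"}},
    ],
}

# Illustrative rule; the image URI is a placeholder, not a real registry path.
debug_rule_config = [
    {
        "RuleConfigurationName": "LossNotDecreasing",
        "RuleEvaluatorImage": "<debugger-rule-image-uri>",
        "RuleParameters": {"rule_to_invoke": "LossNotDecreasing"},
    },
]


def missing_keys(config, required):
    """Return the required keys that are absent from config, sorted."""
    return sorted(required - set(config))


# Pipeline arguments are strings, so serialise both structures to JSON.
debug_hook_config_arg = json.dumps(debug_hook_config)
debug_rule_config_arg = json.dumps(debug_rule_config)

print(missing_keys(debug_hook_config, REQUIRED_HOOK_KEYS))
print([missing_keys(rule, REQUIRED_RULE_KEYS) for rule in debug_rule_config])
```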
|
||||
|
||||
|
||||
## Output
|
||||
Stores the model in the S3 bucket you specified via `model_artifact_path`.
|
||||
|
||||
# Example code
|
||||
Simple example pipeline that uses a custom image : [rlestimator_pipeline_custom_image](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/rlestimator_pipeline/rlestimator_pipeline_custom_image.py)
|
||||
Sample pipeline for using an image selected for you by the RLEstimator class dependent on the framework and toolkit you provide: [rlestimator_pipeline_toolkit_image](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/rlestimator_pipeline/rlestimator_pipeline_toolkit_image.py)
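
The two samples differ mainly in which arguments select the training image. The sketch below captures that choice under the assumption that either `image` alone, or `toolkit`, `toolkit_version` and `framework` together, must be supplied. `image_selection_mode` is a hypothetical helper, and the argument values are examples rather than recommendations.

```python
def image_selection_mode(image="", toolkit="", toolkit_version="", framework=""):
    """Report which of the two usage modes a set of arguments corresponds to.

    Hypothetical helper for illustration; the component does its own validation.
    """
    if image:
        # A custom image takes the place of the toolkit/framework lookup.
        return "custom_image"
    if toolkit and toolkit_version and framework:
        # Let the RLEstimator class pick a prebuilt image for this combination.
        return "toolkit_image"
    raise ValueError(
        "Provide either image, or toolkit, toolkit_version and framework together"
    )


# Example values only; the registry path below is a placeholder.
print(image_selection_mode(image="<account>.dkr.ecr.<region>.amazonaws.com/my-rl-image:latest"))
print(image_selection_mode(toolkit="ray", toolkit_version="0.8.5", framework="tensorflow"))
```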
|
||||
|
||||
# Resources
|
||||
* [Create RLEstimator Job API documentation](https://sagemaker.readthedocs.io/en/stable/frameworks/rl/sagemaker.rl.html)
|
||||
* [Amazon SageMaker Debugger](https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html)
|
||||
* [Debugger Built-In Rules](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html)
|
||||
* [Debugger Custom Rules](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-custom-rules.html)
|
||||
* [Debugger Registry URLs](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-docker-images-rules.html)
|
||||
* [Debugger API Examples](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-createtrainingjob-api.html)
|
||||
|
|
@ -0,0 +1,148 @@
|
|||
name: SageMaker - RLEstimator Training Job
|
||||
description: Handle end-to-end training and deployment of custom RLEstimator code.
|
||||
inputs:
|
||||
- name: spot_instance
|
||||
type: Bool
|
||||
description: Use managed spot training.
|
||||
default: "False"
|
||||
- {name: max_wait_time, type: Integer, description: The maximum time in seconds you
|
||||
are willing to wait for a managed spot training job to complete., default: '86400'}
|
||||
- {name: max_run_time, type: Integer, description: The maximum run time in seconds
|
||||
for the training job., default: '86400'}
|
||||
- {name: checkpoint_config, type: JsonObject, description: Dictionary of information
|
||||
about the output location for managed spot training checkpoint data., default: '{}'}
|
||||
- {name: region, type: String, description: The region for the SageMaker resource.}
|
||||
- {name: endpoint_url, type: String, description: The URL to use when communicating
|
||||
with the SageMaker service., default: ''}
|
||||
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
|
||||
when connecting to SageMaker., default: ''}
|
||||
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
|
||||
AWS resources.', default: '{}'}
|
||||
- {name: job_name, type: String, description: Training job name., default: ''}
|
||||
- {name: role, type: String, description: The Amazon Resource Name (ARN) that Amazon
|
||||
SageMaker assumes to perform tasks on your behalf.}
|
||||
- {name: image, type: String, description: 'An ECR url. If specified, the estimator
|
||||
will use this image for training and hosting', default: ''}
|
||||
- {name: entry_point, type: String, description: Path (absolute or relative) to the
|
||||
Python source file which should be executed as the entry point to training., default: ''}
|
||||
- {name: source_dir, type: String, description: Path (S3 URI) to a directory with
|
||||
any other training source code dependencies aside from the entry point file.,
|
||||
default: ''}
|
||||
- {name: toolkit, type: String, description: RL toolkit you want to use for executing
|
||||
your model training code., default: ''}
|
||||
- {name: toolkit_version, type: String, description: RL toolkit version you want to
|
||||
use for executing your model training code., default: ''}
|
||||
- {name: framework, type: String, description: 'Framework (MXNet, TensorFlow or PyTorch)
|
||||
you want to use as the toolkit backend for reinforcement learning training.',
|
||||
default: ''}
|
||||
- {name: metric_definitions, type: JsonArray, description: The dictionary of name-regex
|
||||
pairs specifying the metrics that the algorithm emits., default: '[]'}
|
||||
- {name: training_input_mode, type: String, description: The input mode that the algorithm
|
||||
supports. File or Pipe., default: File}
|
||||
- {name: hyperparameters, type: JsonObject, description: Hyperparameters that will
|
||||
be used for training., default: '{}'}
|
||||
- {name: instance_type, type: String, description: The ML compute instance type.,
|
||||
default: ml.m4.xlarge}
|
||||
- {name: instance_count, type: Integer, description: The number of ML compute instances
|
||||
to use in the training job., default: '1'}
|
||||
- {name: volume_size, type: Integer, description: The size of the ML storage volume
|
||||
that you want to provision., default: '30'}
|
||||
- {name: max_run, type: Integer, description: 'Timeout in seconds for training (default:
|
||||
24 * 60 * 60).', default: '86400'}
|
||||
- {name: model_artifact_path, type: String, description: Identifies the S3 path where
|
||||
you want Amazon SageMaker to store the model artifacts.}
|
||||
- {name: vpc_security_group_ids, type: JsonArray, description: 'The VPC security group
|
||||
IDs, in the form sg-xxxxxxxx.', default: '[]'}
|
||||
- {name: vpc_subnets, type: JsonArray, description: The ID of the subnets in the VPC
|
||||
to which you want to connect your training job., default: '[]'}
|
||||
- name: network_isolation
|
||||
type: Bool
|
||||
description: Isolates the training container.
|
||||
default: "False"
|
||||
- name: traffic_encryption
|
||||
type: Bool
|
||||
description: Encrypts all communications between ML compute instances in distributed
|
||||
training.
|
||||
default: "False"
|
||||
- {name: debug_hook_config, type: JsonObject, description: 'Configuration information
|
||||
for the debug hook parameters, collection configuration, and storage paths.',
|
||||
default: '{}'}
|
||||
- {name: debug_rule_config, type: JsonArray, description: Configuration information
|
||||
for debugging rules., default: '[]'}
|
||||
outputs:
|
||||
- {name: model_artifact_url, description: The model artifacts URL.}
|
||||
- {name: job_name, description: The training job name.}
|
||||
- {name: training_image, description: The registry path of the Docker image that contains
|
||||
the training algorithm.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- rlestimator/src/sagemaker_rlestimator_component.py
|
||||
- --spot_instance
|
||||
- {inputValue: spot_instance}
|
||||
- --max_wait_time
|
||||
- {inputValue: max_wait_time}
|
||||
- --max_run_time
|
||||
- {inputValue: max_run_time}
|
||||
- --checkpoint_config
|
||||
- {inputValue: checkpoint_config}
|
||||
- --region
|
||||
- {inputValue: region}
|
||||
- --endpoint_url
|
||||
- {inputValue: endpoint_url}
|
||||
- --assume_role
|
||||
- {inputValue: assume_role}
|
||||
- --tags
|
||||
- {inputValue: tags}
|
||||
- --job_name
|
||||
- {inputValue: job_name}
|
||||
- --role
|
||||
- {inputValue: role}
|
||||
- --image
|
||||
- {inputValue: image}
|
||||
- --entry_point
|
||||
- {inputValue: entry_point}
|
||||
- --source_dir
|
||||
- {inputValue: source_dir}
|
||||
- --toolkit
|
||||
- {inputValue: toolkit}
|
||||
- --toolkit_version
|
||||
- {inputValue: toolkit_version}
|
||||
- --framework
|
||||
- {inputValue: framework}
|
||||
- --metric_definitions
|
||||
- {inputValue: metric_definitions}
|
||||
- --training_input_mode
|
||||
- {inputValue: training_input_mode}
|
||||
- --hyperparameters
|
||||
- {inputValue: hyperparameters}
|
||||
- --instance_type
|
||||
- {inputValue: instance_type}
|
||||
- --instance_count
|
||||
- {inputValue: instance_count}
|
||||
- --volume_size
|
||||
- {inputValue: volume_size}
|
||||
- --max_run
|
||||
- {inputValue: max_run}
|
||||
- --model_artifact_path
|
||||
- {inputValue: model_artifact_path}
|
||||
- --vpc_security_group_ids
|
||||
- {inputValue: vpc_security_group_ids}
|
||||
- --vpc_subnets
|
||||
- {inputValue: vpc_subnets}
|
||||
- --network_isolation
|
||||
- {inputValue: network_isolation}
|
||||
- --traffic_encryption
|
||||
- {inputValue: traffic_encryption}
|
||||
- --debug_hook_config
|
||||
- {inputValue: debug_hook_config}
|
||||
- --debug_rule_config
|
||||
- {inputValue: debug_rule_config}
|
||||
- --model_artifact_url_output_path
|
||||
- {outputPath: model_artifact_url}
|
||||
- --job_name_output_path
|
||||
- {outputPath: job_name}
|
||||
- --training_image_output_path
|
||||
- {outputPath: training_image}
|
||||
|
|
@ -0,0 +1,277 @@
|
|||
"""SageMaker component for RLEstimator."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import logging
|
||||
from typing import Dict
|
||||
import os
|
||||
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework
|
||||
from rlestimator.src.sagemaker_rlestimator_spec import (
|
||||
SageMakerRLEstimatorSpec,
|
||||
SageMakerRLEstimatorInputs,
|
||||
SageMakerRLEstimatorOutputs,
|
||||
)
|
||||
from common.sagemaker_component import (
|
||||
SageMakerComponent,
|
||||
ComponentMetadata,
|
||||
SageMakerJobStatus,
|
||||
DebugRulesStatus,
|
||||
)
|
||||
from common.boto3_manager import Boto3Manager
|
||||
from common.common_inputs import SageMakerComponentCommonInputs
|
||||
from common.spec_input_parsers import SpecInputParsers
|
||||
|
||||
|
||||
@ComponentMetadata(
|
||||
name="SageMaker - RLEstimator Training Job",
|
||||
description="Handle end-to-end training and deployment of custom RLEstimator code.",
|
||||
spec=SageMakerRLEstimatorSpec,
|
||||
)
|
||||
class SageMakerRLEstimatorComponent(SageMakerComponent):
|
||||
"""SageMaker component for RLEstimator."""
|
||||
|
||||
def Do(self, spec: SageMakerRLEstimatorSpec):
|
||||
self._rlestimator_job_name = (
|
||||
spec.inputs.job_name
|
||||
if spec.inputs.job_name
|
||||
else SageMakerComponent._generate_unique_timestamped_id(
|
||||
prefix="RLEstimatorJob"
|
||||
)
|
||||
)
|
||||
super().Do(spec.inputs, spec.outputs, spec.output_paths)
|
||||
|
||||
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
|
||||
"""Configures the internal AWS clients for the component.
|
||||
|
||||
Args:
|
||||
inputs: A populated list of user inputs.
|
||||
"""
|
||||
self._sm_client = Boto3Manager.get_sagemaker_client(
|
||||
self._get_component_version(),
|
||||
inputs.region,
|
||||
endpoint_url=inputs.endpoint_url,
|
||||
assume_role_arn=inputs.assume_role,
|
||||
)
|
||||
self._cw_client = Boto3Manager.get_cloudwatch_client(
|
||||
inputs.region, assume_role_arn=inputs.assume_role
|
||||
)
|
||||
self._sagemaker_session = Boto3Manager.get_sagemaker_session(
|
||||
self._get_component_version(),
|
||||
inputs.region,
|
||||
assume_role_arn=inputs.assume_role,
|
||||
)
|
||||
|
||||
def _get_job_status(self) -> SageMakerJobStatus:
|
||||
response = self._sm_client.describe_training_job(
|
||||
TrainingJobName=self._rlestimator_job_name
|
||||
)
|
||||
status = response["TrainingJobStatus"]
|
||||
|
||||
if status == "Completed" or status == "Stopped":
|
||||
return self._get_debug_rule_status()
|
||||
if status == "Failed":
|
||||
message = response["FailureReason"]
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=True,
|
||||
error_message=message,
|
||||
raw_status=status,
|
||||
)
|
||||
|
||||
return SageMakerJobStatus(is_completed=False, raw_status=status)
|
||||
|
||||
def _get_debug_rule_status(self) -> SageMakerJobStatus:
|
||||
"""Get the job status of the training debugging rules.
|
||||
|
||||
Returns:
|
||||
SageMakerJobStatus: A status object.
|
||||
"""
|
||||
response = self._sm_client.describe_training_job(
|
||||
TrainingJobName=self._rlestimator_job_name
|
||||
)
|
||||
|
||||
# No debugging configured
|
||||
if "DebugRuleEvaluationStatuses" not in response:
|
||||
return SageMakerJobStatus(is_completed=True, has_error=False, raw_status="")
|
||||
|
||||
raw_status = DebugRulesStatus.from_describe(response)
|
||||
if raw_status != DebugRulesStatus.INPROGRESS:
|
||||
logging.info("Rules have ended with status:\n")
|
||||
self._print_debug_rule_status(response, True)
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=(raw_status == DebugRulesStatus.ERRORED),
|
||||
raw_status=raw_status,
|
||||
)
|
||||
|
||||
self._print_debug_rule_status(response)
|
||||
return SageMakerJobStatus(is_completed=False, raw_status=raw_status)
|
||||
|
||||
def _print_debug_rule_status(self, response, last_print=False):
|
||||
"""Prints the status of each debug rule.
|
||||
|
||||
Example of DebugRuleEvaluationStatuses:
|
||||
response['DebugRuleEvaluationStatuses'] =
|
||||
[{
|
||||
"RuleConfigurationName": "VanishingGradient",
|
||||
"RuleEvaluationStatus": "IssuesFound",
|
||||
"StatusDetails": "There was an issue."
|
||||
}]
|
||||
If last_print is False:
|
||||
INFO:root: - LossNotDecreasing: InProgress
|
||||
INFO:root: - Overtraining: NoIssuesFound
|
||||
ERROR:root:- CustomGradientRule: Error
|
||||
If last_print is True:
|
||||
INFO:root: - LossNotDecreasing: IssuesFound
|
||||
INFO:root: - RuleEvaluationConditionMet: Evaluation of the rule LossNotDecreasing at step 10 resulted in the condition being met
|
||||
|
||||
Args:
|
||||
response: A describe training job response.
|
||||
last_print: If true, prints each of the debug rule issues if found.
|
||||
"""
|
||||
for debug_rule in response["DebugRuleEvaluationStatuses"]:
|
||||
line_ending = "\n" if last_print else ""
|
||||
if "StatusDetails" in debug_rule:
|
||||
status_details = (
|
||||
f"- {debug_rule['StatusDetails'].rstrip()}{line_ending}"
|
||||
)
|
||||
line_ending = ""
|
||||
else:
|
||||
status_details = ""
|
||||
rule_status = f"- {debug_rule['RuleConfigurationName']}: {debug_rule['RuleEvaluationStatus']}{line_ending}"
|
||||
if debug_rule["RuleEvaluationStatus"] == "Error":
|
||||
log_fn = logging.error
|
||||
status_padding = 1
|
||||
else:
|
||||
log_fn = logging.info
|
||||
status_padding = 2
|
||||
|
||||
log_fn(f"{status_padding * ' '}{rule_status}")
|
||||
if last_print and status_details:
|
||||
log_fn(f"{(status_padding + 2) * ' '}{status_details}")
|
||||
self._print_log_header(50)
|
||||
|
||||
def _after_job_complete(
|
||||
self,
|
||||
job: object,
|
||||
request: Dict,
|
||||
inputs: SageMakerRLEstimatorInputs,
|
||||
outputs: SageMakerRLEstimatorOutputs,
|
||||
):
|
||||
outputs.job_name = self._rlestimator_job_name
|
||||
outputs.model_artifact_url = self._get_model_artifacts_from_job(
|
||||
self._rlestimator_job_name
|
||||
)
|
||||
outputs.training_image = self._get_image_from_job(self._rlestimator_job_name)
|
||||
|
||||
def _on_job_terminated(self):
|
||||
self._sm_client.stop_training_job(TrainingJobName=self._rlestimator_job_name)
|
||||
|
||||
def _print_logs_for_job(self):
|
||||
self._print_cloudwatch_logs(
|
||||
"/aws/sagemaker/TrainingJobs", self._rlestimator_job_name
|
||||
)
|
||||
|
||||
def _create_job_request(
|
||||
self, inputs: SageMakerRLEstimatorInputs, outputs: SageMakerRLEstimatorOutputs,
|
||||
) -> RLEstimator:
|
||||
# Documentation: https://sagemaker.readthedocs.io/en/stable/frameworks/rl/sagemaker.rl.html
|
||||
# We need to configure region and it is not something we can do via the RLEstimator class.
|
||||
|
||||
# Only use max wait time default value if electing to use spot instances
|
||||
if not inputs.spot_instance:
|
||||
max_wait_time = None
|
||||
else:
|
||||
max_wait_time = inputs.max_wait_time
|
||||
|
||||
estimator = RLEstimator(
|
||||
entry_point=inputs.entry_point,
|
||||
source_dir=inputs.source_dir,
|
||||
image_uri=inputs.image,
|
||||
toolkit=self._get_toolkit(inputs.toolkit),
|
||||
toolkit_version=inputs.toolkit_version,
|
||||
framework=self._get_framework(inputs.framework),
|
||||
role=inputs.role,
|
||||
debugger_hook_config=self._nullable(inputs.debug_hook_config),
|
||||
rules=self._nullable(inputs.debug_rule_config),
|
||||
instance_type=inputs.instance_type,
|
||||
instance_count=inputs.instance_count,
|
||||
output_path=inputs.model_artifact_path,
|
||||
metric_definitions=inputs.metric_definitions,
|
||||
input_mode=inputs.training_input_mode,
|
||||
max_run=inputs.max_run,
|
||||
hyperparameters=self._validate_hyperparameters(inputs.hyperparameters),
|
||||
subnets=self._nullable(inputs.vpc_subnets),
|
||||
security_group_ids=self._nullable(inputs.vpc_security_group_ids),
|
||||
use_spot_instances=inputs.spot_instance,
|
||||
enable_network_isolation=inputs.network_isolation,
|
||||
encrypt_inter_container_traffic=inputs.traffic_encryption,
|
||||
max_wait=max_wait_time,
|
||||
sagemaker_session=self._sagemaker_session,
|
||||
)
|
||||
|
||||
return estimator
|
||||
|
||||
def _submit_job_request(self, estimator: RLEstimator) -> object:
|
||||
# By setting wait to false we don't block the current thread.
|
||||
estimator.fit(job_name=self._rlestimator_job_name, wait=False)
|
||||
job_name = estimator.latest_training_job.job_name
|
||||
self._rlestimator_job_name = job_name
|
||||
response = self._sm_client.describe_training_job(TrainingJobName=job_name)
|
||||
return response
|
||||
|
||||
def _after_submit_job_request(
|
||||
self,
|
||||
job: object,
|
||||
request: Dict,
|
||||
inputs: SageMakerRLEstimatorInputs,
|
||||
outputs: SageMakerRLEstimatorOutputs,
|
||||
):
|
||||
logging.info(f"Created Training Job with name: {self._rlestimator_job_name}")
|
||||
logging.info(
|
||||
"Training job in SageMaker: https://{}.console.aws.amazon.com/sagemaker/home?region={}#/jobs/{}".format(
|
||||
inputs.region, inputs.region, self._rlestimator_job_name,
|
||||
)
|
||||
)
|
||||
logging.info(
|
||||
"CloudWatch logs: https://{}.console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/TrainingJobs;prefix={};streamFilter=typeLogStreamPrefix".format(
|
||||
inputs.region, inputs.region, self._rlestimator_job_name,
|
||||
)
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def _get_toolkit(toolkit_type: str) -> RLToolkit:
|
||||
if toolkit_type == "":
|
||||
return None
|
||||
return RLToolkit[toolkit_type.upper()]
|
||||
|
||||
@staticmethod
|
||||
def _get_framework(framework_type: str) -> RLFramework:
|
||||
if framework_type == "":
|
||||
return None
|
||||
return RLFramework[framework_type.upper()]
|
||||
|
||||
@staticmethod
|
||||
def _nullable(value: str):
|
||||
if value:
|
||||
return value
|
||||
else:
|
||||
return None
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
|
||||
spec = SageMakerRLEstimatorSpec(sys.argv[1:])
|
||||
|
||||
component = SageMakerRLEstimatorComponent()
|
||||
component.Do(spec)
|
||||
|
|
@ -0,0 +1,232 @@
|
|||
"""Specification for the SageMaker RLEstimator component."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
from typing import List
|
||||
from common.sagemaker_component_spec import (
|
||||
SageMakerComponentSpec,
|
||||
SageMakerComponentBaseOutputs,
|
||||
)
|
||||
from common.spec_input_parsers import SpecInputParsers
|
||||
from common.common_inputs import (
|
||||
COMMON_INPUTS,
|
||||
SageMakerComponentCommonInputs,
|
||||
SpotInstanceInputs,
|
||||
SPOT_INSTANCE_INPUTS,
|
||||
SageMakerComponentInput as Input,
|
||||
SageMakerComponentOutput as Output,
|
||||
SageMakerComponentInputValidator as InputValidator,
|
||||
SageMakerComponentOutputValidator as OutputValidator,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SageMakerRLEstimatorInputs(SageMakerComponentCommonInputs, SpotInstanceInputs):
|
||||
"""Defines the set of inputs for the rlestimator component."""
|
||||
|
||||
job_name: Input
|
||||
role: Input
|
||||
image: Input
|
||||
entry_point: Input
|
||||
source_dir: Input
|
||||
toolkit: Input
|
||||
toolkit_version: Input
|
||||
framework: Input
|
||||
metric_definitions: Input
|
||||
training_input_mode: Input
|
||||
hyperparameters: Input
|
||||
instance_type: Input
|
||||
instance_count: Input
|
||||
volume_size: Input
|
||||
max_run: Input
|
||||
model_artifact_path: Input
|
||||
vpc_security_group_ids: Input
|
||||
vpc_subnets: Input
|
||||
network_isolation: Input
|
||||
traffic_encryption: Input
|
||||
debug_hook_config: Input
|
||||
debug_rule_config: Input
|
||||
|
||||
|
||||
@dataclass
|
||||
class SageMakerRLEstimatorOutputs(SageMakerComponentBaseOutputs):
|
||||
"""Defines the set of outputs for the rlestimator component."""
|
||||
|
||||
model_artifact_url: Output
|
||||
job_name: Output
|
||||
training_image: Output
|
||||
|
||||
|
||||
class SageMakerRLEstimatorSpec(
|
||||
SageMakerComponentSpec[SageMakerRLEstimatorInputs, SageMakerRLEstimatorOutputs]
|
||||
):
|
||||
INPUTS: SageMakerRLEstimatorInputs = SageMakerRLEstimatorInputs(
|
||||
job_name=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="Training job name.",
|
||||
default="",
|
||||
),
|
||||
role=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf.",
|
||||
),
|
||||
image=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="An ECR URL. If specified, the estimator uses this image for training and hosting.",
|
||||
default=None,
|
||||
),
|
||||
entry_point=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="Path (absolute or relative) to the Python source file which should be executed as the entry point to training.",
|
||||
default="",
|
||||
),
|
||||
source_dir=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="Path (S3 URI) to a directory with any other training source code dependencies aside from the entry point file.",
|
||||
default="",
|
||||
),
|
||||
toolkit=InputValidator(
|
||||
input_type=str,
|
||||
choices=["coach", "ray", ""],
|
||||
required=False,
|
||||
description="RL toolkit you want to use for executing your model training code.",
|
||||
default="",
|
||||
),
|
||||
toolkit_version=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="RL toolkit version you want to use for executing your model training code.",
|
||||
default=None,
|
||||
),
|
||||
framework=InputValidator(
|
||||
input_type=str,
|
||||
choices=["tensorflow", "mxnet", "pytorch", ""],
|
||||
required=False,
|
||||
description="Framework (MXNet, TensorFlow or PyTorch) you want to use as the toolkit backend for reinforcement learning training.",
|
||||
default="",
|
||||
),
|
||||
metric_definitions=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="A list of name-regex pairs that specify the metrics that the algorithm emits.",
|
||||
default=[],
|
||||
),
|
||||
training_input_mode=InputValidator(
|
||||
choices=["File", "Pipe"],
|
||||
input_type=str,
|
||||
description="The input mode that the algorithm supports. File or Pipe.",
|
||||
default="File",
|
||||
),
|
||||
hyperparameters=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_dict,
|
||||
required=False,
|
||||
description="Hyperparameters that will be used for training.",
|
||||
default={},
|
||||
),
|
||||
instance_type=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The ML compute instance type.",
|
||||
default="ml.m4.xlarge",
|
||||
),
|
||||
instance_count=InputValidator(
|
||||
input_type=int,
|
||||
required=False,
|
||||
description="The number of ML compute instances to use in the training job.",
|
||||
default=1,
|
||||
),
|
||||
volume_size=InputValidator(
|
||||
input_type=int,
|
||||
required=True,
|
||||
description="The size of the ML storage volume that you want to provision.",
|
||||
default=30,
|
||||
),
|
||||
max_run=InputValidator(
|
||||
input_type=int,
|
||||
required=False,
|
||||
description="Timeout in seconds for training (default: 24 * 60 * 60).",
|
||||
default=24 * 60 * 60,
|
||||
),
|
||||
model_artifact_path=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="Identifies the S3 path where you want Amazon SageMaker to store the model artifacts.",
|
||||
),
|
||||
vpc_security_group_ids=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="The VPC security group IDs, in the form sg-xxxxxxxx.",
|
||||
default=[],
|
||||
),
|
||||
vpc_subnets=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="The ID of the subnets in the VPC to which you want to connect your training job.",
|
||||
default=[],
|
||||
),
|
||||
network_isolation=InputValidator(
|
||||
input_type=SpecInputParsers.str_to_bool,
|
||||
description="Isolates the training container.",
|
||||
default=False,
|
||||
),
|
||||
traffic_encryption=InputValidator(
|
||||
input_type=SpecInputParsers.str_to_bool,
|
||||
description="Encrypts all communications between ML compute instances in distributed training.",
|
||||
default=False,
|
||||
),
|
||||
debug_hook_config=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_dict,
|
||||
required=False,
|
||||
description="Configuration information for the debug hook parameters, collection configuration, and storage paths.",
|
||||
default={},
|
||||
),
|
||||
debug_rule_config=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="Configuration information for debugging rules.",
|
||||
default=[],
|
||||
),
|
||||
**vars(COMMON_INPUTS),
|
||||
**vars(SPOT_INSTANCE_INPUTS)
|
||||
)
|
||||
|
||||
OUTPUTS = SageMakerRLEstimatorOutputs(
|
||||
model_artifact_url=OutputValidator(description="The model artifacts URL."),
|
||||
job_name=OutputValidator(description="The training job name."),
|
||||
training_image=OutputValidator(
|
||||
description="The registry path of the Docker image that contains the training algorithm."
|
||||
),
|
||||
)
|
||||
|
||||
def __init__(self, arguments: List[str]):
|
||||
super().__init__(
|
||||
arguments, SageMakerRLEstimatorInputs, SageMakerRLEstimatorOutputs
|
||||
)
|
||||
|
||||
@property
|
||||
def inputs(self) -> SageMakerRLEstimatorInputs:
|
||||
return self._inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> SageMakerRLEstimatorOutputs:
|
||||
return self._outputs
|
||||
|
||||
@property
|
||||
def output_paths(self) -> SageMakerRLEstimatorOutputs:
|
||||
return self._output_paths
|
||||
|
|
@ -0,0 +1,77 @@
|
|||
# RoboMaker Simulation Job Kubeflow Pipelines component
|
||||
|
||||
## Summary
|
||||
Component to run a RoboMaker Simulation Job from a Kubeflow Pipelines workflow.
|
||||
https://docs.aws.amazon.com/robomaker/latest/dg/API_CreateSimulationJob.html
|
||||
|
||||
## Intended Use
|
||||
For running your simulation workloads using AWS RoboMaker.
|
||||
|
||||
## Runtime Arguments
|
||||
Argument | Description | Optional | Data type | Accepted values | Default |
:--- | :---------- | :----------| :----------| :---------- | :----------|
region | The region where the cluster launches | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
app_name | The name of the simulation application. Must be unique within the same AWS account and AWS region | Yes | String | | SimulationApplication-[datetime]-[random id]|
role | The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf | No | String | | |
output_bucket | The bucket to place outputs from the simulation job | No | String | | |
output_path | The S3 key where outputs from the simulation job are placed | No | String | | |
max_run | Timeout in seconds for the simulation job | No | Integer | | 28800 (8 * 60 * 60) |
failure_behavior | The failure behavior of the simulation job | Yes | String | Continue, Fail | Fail |
sim_app_arn | The application ARN for the simulation application | Yes | String | | |
sim_app_version | The application version for the simulation application | Yes | String | | |
sim_app_launch_config | The launch configuration for the simulation application | Yes | Dict | | {} |
sim_app_world_config | A list of world configurations | Yes | List of Dicts | | [] |
robot_app_arn | The application ARN for the robot application | Yes | String | | |
robot_app_version | The application version for the robot application | Yes | String | | |
robot_app_launch_config | The launch configuration for the robot application | Yes | Dict | | {} |
data_sources | Specify data sources to mount read-only files from S3 into your simulation | Yes | List of Dicts | | [] |
vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your simulation job | Yes | String | | |
use_public_ip | A boolean indicating whether to assign a public IP address | Yes | Bool | | False |
sim_unit_limit | The simulation unit limit | Yes | Integer | | 15 |
record_ros_topics | A boolean indicating whether to record all ROS topics (used for logging) | Yes | Bool | | False |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
|
||||
|
||||
Notes:
|
||||
* This component can be run in a pipeline with the Create Simulation App and Delete Simulation App components, or as a standalone component.
|
||||
* One of sim_app_arn or robot_app_arn and any related inputs must be provided.
|
||||
* The format for the [`sim_app_launch_config`](https://docs.aws.amazon.com/robomaker/latest/dg/API_LaunchConfig.html) field is:
|
||||
```
|
||||
{
  "packageName": "string",
  "launchFile": "string",
  "environmentVariables": {
    "string": "string"
  },
  "streamUI": "bool"
}
|
||||
```
|
||||
* The format for the [`sim_app_world_config`](https://docs.aws.amazon.com/robomaker/latest/dg/API_WorldConfig.html) field is:
|
||||
```
|
||||
{
|
||||
"world": "string"
|
||||
}
|
||||
```
|
||||
* The format for the [`robot_app_launch_config`](https://docs.aws.amazon.com/robomaker/latest/dg/API_LaunchConfig.html) field is:
|
||||
```
|
||||
{
  "packageName": "string",
  "launchFile": "string",
  "environmentVariables": {
    "string": "string"
  },
  "streamUI": "bool"
}
|
||||
```
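
The notes above can be sketched in Python. This is a minimal illustration, not part of the component: the helper name `build_simulation_job_arguments` and all placeholder values are assumptions. It mirrors the checks that the component performs before building its request (at least one application ARN, plus a launch config for each application given) and serializes the configs to JSON strings.

```python
import json


def build_simulation_job_arguments(
    sim_app_arn="",
    sim_app_launch_config=None,
    sim_app_world_config=None,
    robot_app_arn="",
    robot_app_launch_config=None,
):
    """Hypothetical helper: validate the application inputs and serialize them."""
    # At least one of the two applications must be provided.
    if not sim_app_arn and not robot_app_arn:
        raise ValueError("Must specify a Simulation App ARN or a Robot App ARN.")
    # Each application that is provided needs its own launch config.
    if sim_app_arn and not sim_app_launch_config:
        raise ValueError("Must specify a Launch Config for your Simulation App")
    if robot_app_arn and not robot_app_launch_config:
        raise ValueError("Must specify a Launch Config for your Robot App")

    return {
        "sim_app_arn": sim_app_arn,
        "sim_app_launch_config": json.dumps(sim_app_launch_config or {}),
        "sim_app_world_config": json.dumps(sim_app_world_config or []),
        "robot_app_arn": robot_app_arn,
        "robot_app_launch_config": json.dumps(robot_app_launch_config or {}),
    }


# Placeholder values for illustration only; substitute your own.
arguments = build_simulation_job_arguments(
    sim_app_arn="<simulation-application-arn>",
    sim_app_launch_config={
        "packageName": "<package-name>",
        "launchFile": "<launch-file>",
        "environmentVariables": {"<NAME>": "<value>"},
        "streamUI": False,
    },
    sim_app_world_config=[{"world": "<world-arn>"}],
)
print(arguments["sim_app_launch_config"])
```

Validating before submission surfaces a missing launch config locally instead of as a failed pipeline step.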
|
||||
|
||||
|
||||
## Output
|
||||
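The output of the simulation job is written to the S3 location given by `output_bucket` and `output_path`. The component returns that location as the `output_artifacts` output, alongside the job `arn` and `job_id`.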
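
As a rough sketch of how the artifact location is formed, the snippet below joins the bucket and prefix from a describe-simulation-job style response into one S3 URI. The function name `output_artifacts_uri` and the sample response are assumptions for illustration; only the `outputLocation`, `s3Bucket` and `s3Prefix` keys are taken from the component code.

```python
def output_artifacts_uri(describe_response: dict) -> str:
    """Join the job's output bucket and prefix into a single S3 URI."""
    location = describe_response["outputLocation"]
    return f"s3://{location['s3Bucket']}/{location['s3Prefix']}"


# Hypothetical, trimmed-down response for illustration only.
example_response = {
    "outputLocation": {"s3Bucket": "my-bucket", "s3Prefix": "sim/output"}
}
print(output_artifacts_uri(example_response))  # s3://my-bucket/sim/output
```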
|
||||
|
||||
# Example code
|
||||
Example of creating a simulation application, running a simulation job, and finally deleting the simulation application: [robomaker_simulation_job_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_app.py)
|
||||
|
||||
# Resources
|
||||
* [Create RoboMaker Simulation Job via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_job)
|
||||
|
|
@ -0,0 +1,109 @@
|
|||
name: RoboMaker - Create Simulation Job
|
||||
description: Creates a simulation job.
|
||||
inputs:
|
||||
- {name: region, type: String, description: The region for the SageMaker resource.}
|
||||
- {name: endpoint_url, type: String, description: The URL to use when communicating
|
||||
with the SageMaker service., default: ''}
|
||||
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
|
||||
when connecting to SageMaker., default: ''}
|
||||
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
|
||||
AWS resources.', default: '{}'}
|
||||
- {name: role, type: String, description: The Amazon Resource Name (ARN) that Amazon
|
||||
RoboMaker assumes to perform tasks on your behalf.}
|
||||
- {name: output_bucket, type: String, description: The bucket to place outputs from
|
||||
the simulation job., default: ''}
|
||||
- {name: output_path, type: String, description: The S3 key where outputs from the
|
||||
simulation job are placed., default: ''}
|
||||
- {name: max_run, type: Integer, description: 'Timeout in seconds for simulation job
|
||||
(default: 8 * 60 * 60).', default: '28800'}
|
||||
- {name: failure_behavior, type: String, description: The failure behavior of the simulation
|
||||
job (Continue|Fail)., default: Fail}
|
||||
- {name: sim_app_arn, type: String, description: The application ARN for the simulation
|
||||
application., default: ''}
|
||||
- {name: sim_app_version, type: String, description: The application version for the
|
||||
simulation application., default: ''}
|
||||
- {name: sim_app_launch_config, type: JsonObject, description: The launch configuration
|
||||
for the simulation application., default: '{}'}
|
||||
- {name: sim_app_world_config, type: JsonArray, description: A list of world configurations.,
|
||||
default: '[]'}
|
||||
- {name: robot_app_arn, type: String, description: The application ARN for the robot
|
||||
application., default: ''}
|
||||
- {name: robot_app_version, type: String, description: The application version for
|
||||
the robot application., default: ''}
|
||||
- {name: robot_app_launch_config, type: JsonObject, description: The launch configuration
|
||||
for the robot application., default: '{}'}
|
||||
- {name: data_sources, type: JsonArray, description: Specify data sources to mount
|
||||
read-only files from S3 into your simulation., default: '[]'}
|
||||
- {name: vpc_security_group_ids, type: JsonArray, description: 'The VPC security group
|
||||
IDs, in the form sg-xxxxxxxx.', default: '[]'}
|
||||
- {name: vpc_subnets, type: JsonArray, description: The ID of the subnets in the VPC
|
||||
to which you want to connect your simulation job., default: '[]'}
|
||||
- name: use_public_ip
|
||||
type: Bool
|
||||
description: A boolean indicating whether to assign a public IP address.
|
||||
default: "False"
|
||||
- {name: sim_unit_limit, type: Integer, description: The simulation unit limit., default: '15'}
|
||||
- name: record_ros_topics
|
||||
type: Bool
|
||||
description: A boolean indicating whether to record all ROS topics. Used for logging.
|
||||
default: "False"
|
||||
outputs:
|
||||
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation job.}
|
||||
- {name: output_artifacts, description: The simulation job artifacts URL.}
|
||||
- {name: job_id, description: The simulation job id.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- simulation_job/src/robomaker_simulation_job_component.py
|
||||
- --region
|
||||
- {inputValue: region}
|
||||
- --endpoint_url
|
||||
- {inputValue: endpoint_url}
|
||||
- --assume_role
|
||||
- {inputValue: assume_role}
|
||||
- --tags
|
||||
- {inputValue: tags}
|
||||
- --role
|
||||
- {inputValue: role}
|
||||
- --output_bucket
|
||||
- {inputValue: output_bucket}
|
||||
- --output_path
|
||||
- {inputValue: output_path}
|
||||
- --max_run
|
||||
- {inputValue: max_run}
|
||||
- --failure_behavior
|
||||
- {inputValue: failure_behavior}
|
||||
- --sim_app_arn
|
||||
- {inputValue: sim_app_arn}
|
||||
- --sim_app_version
|
||||
- {inputValue: sim_app_version}
|
||||
- --sim_app_launch_config
|
||||
- {inputValue: sim_app_launch_config}
|
||||
- --sim_app_world_config
|
||||
- {inputValue: sim_app_world_config}
|
||||
- --robot_app_arn
|
||||
- {inputValue: robot_app_arn}
|
||||
- --robot_app_version
|
||||
- {inputValue: robot_app_version}
|
||||
- --robot_app_launch_config
|
||||
- {inputValue: robot_app_launch_config}
|
||||
- --data_sources
|
||||
- {inputValue: data_sources}
|
||||
- --vpc_security_group_ids
|
||||
- {inputValue: vpc_security_group_ids}
|
||||
- --vpc_subnets
|
||||
- {inputValue: vpc_subnets}
|
||||
- --use_public_ip
|
||||
- {inputValue: use_public_ip}
|
||||
- --sim_unit_limit
|
||||
- {inputValue: sim_unit_limit}
|
||||
- --record_ros_topics
|
||||
- {inputValue: record_ros_topics}
|
||||
- --arn_output_path
|
||||
- {outputPath: arn}
|
||||
- --output_artifacts_output_path
|
||||
- {outputPath: output_artifacts}
|
||||
- --job_id_output_path
|
||||
- {outputPath: job_id}
|
||||
|
|
@ -0,0 +1,252 @@
|
|||
"""RoboMaker component for creating a simulation job."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import logging
|
||||
from typing import Dict
|
||||
from simulation_job.src.robomaker_simulation_job_spec import (
|
||||
RoboMakerSimulationJobSpec,
|
||||
RoboMakerSimulationJobInputs,
|
||||
RoboMakerSimulationJobOutputs,
|
||||
)
|
||||
from common.sagemaker_component import (
|
||||
SageMakerComponent,
|
||||
ComponentMetadata,
|
||||
SageMakerJobStatus,
|
||||
)
|
||||
from common.boto3_manager import Boto3Manager
|
||||
from common.common_inputs import SageMakerComponentCommonInputs
|
||||
|
||||
|
||||
@ComponentMetadata(
|
||||
name="RoboMaker - Create Simulation Job",
|
||||
description="Creates a simulation job.",
|
||||
spec=RoboMakerSimulationJobSpec,
|
||||
)
|
||||
class RoboMakerSimulationJobComponent(SageMakerComponent):
|
||||
"""RoboMaker component for creating a simulation job."""
|
||||
|
||||
def Do(self, spec: RoboMakerSimulationJobSpec):
|
||||
super().Do(spec.inputs, spec.outputs, spec.output_paths)
|
||||
|
||||
def _get_job_status(self) -> SageMakerJobStatus:
|
||||
response = self._rm_client.describe_simulation_job(job=self._arn)
|
||||
status = response["status"]
|
||||
|
||||
if status in ["Completed"]:
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True, has_error=False, raw_status=status
|
||||
)
|
||||
|
||||
if status in ["Terminating", "Terminated", "Canceled"]:
|
||||
if "failureCode" in response:
|
||||
simulation_message = (
|
||||
f"Simulation failed with code:{response['failureCode']}"
|
||||
)
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=True,
|
||||
error_message=simulation_message,
|
||||
raw_status=status,
|
||||
)
|
||||
else:
|
||||
simulation_message = "Exited without error code.\n"
|
||||
if "failureReason" in response:
|
||||
simulation_message += (
|
||||
f"Simulation exited with reason:{response['failureReason']}\n"
|
||||
)
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=False,
|
||||
error_message=simulation_message,
|
||||
raw_status=status,
|
||||
)
|
||||
|
||||
if status in ["Failed", "RunningFailed"]:
|
||||
failure_message = f"Simulation job is in status:{status}\n"
|
||||
if "failureReason" in response:
|
||||
failure_message += (
|
||||
f"Simulation failed with reason:{response['failureReason']}\n"
|
||||
)
|
||||
if "failureCode" in response:
|
||||
failure_message += (
|
||||
f"Simulation failed with errorCode:{response['failureCode']}"
|
||||
)
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=True,
|
||||
error_message=failure_message,
|
||||
raw_status=status,
|
||||
)
|
||||
|
||||
return SageMakerJobStatus(is_completed=False, raw_status=status)
|
||||
|
||||
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
|
||||
"""Configures the internal AWS clients for the component.
|
||||
|
||||
Args:
|
||||
inputs: A populated list of user inputs.
|
||||
"""
|
||||
self._rm_client = Boto3Manager.get_robomaker_client(
|
||||
self._get_component_version(),
|
||||
inputs.region,
|
||||
endpoint_url=inputs.endpoint_url,
|
||||
assume_role_arn=inputs.assume_role,
|
||||
)
|
||||
self._cw_client = Boto3Manager.get_cloudwatch_client(
|
||||
inputs.region, assume_role_arn=inputs.assume_role
|
||||
)
|
||||
|
||||
def _after_job_complete(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerSimulationJobInputs,
|
||||
outputs: RoboMakerSimulationJobOutputs,
|
||||
):
|
||||
outputs.output_artifacts = self._get_job_outputs()
|
||||
logging.info(
|
||||
"Simulation Job in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobs/{}".format(
|
||||
inputs.region, inputs.region, self._job_id
|
||||
)
|
||||
)
|
||||
|
||||
def _on_job_terminated(self):
|
||||
self._rm_client.cancel_simulation_job(job=self._arn)
|
||||
|
||||
def _create_job_request(
|
||||
self,
|
||||
inputs: RoboMakerSimulationJobInputs,
|
||||
outputs: RoboMakerSimulationJobOutputs,
|
||||
) -> Dict:
|
||||
"""
|
||||
Documentation:https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.create_simulation_job
|
||||
"""
|
||||
|
||||
# Need one of sim_app_arn or robot_app_arn to be provided
|
||||
if not inputs.sim_app_arn and not inputs.robot_app_arn:
|
||||
logging.error("Must specify a Simulation App ARN or a Robot App ARN.")
|
||||
raise Exception("Could not create simulation job request")
|
||||
|
||||
request = self._get_request_template("robomaker.simulation.job")
|
||||
|
||||
# Set the required inputs
|
||||
request["outputLocation"]["s3Bucket"] = inputs.output_bucket
|
||||
request["outputLocation"]["s3Prefix"] = inputs.output_path
|
||||
request["maxJobDurationInSeconds"] = inputs.max_run
|
||||
request["iamRole"] = inputs.role
|
||||
|
||||
# Set networking inputs
|
||||
if inputs.vpc_subnets:
|
||||
request["vpcConfig"]["subnets"] = inputs.vpc_subnets
|
||||
if inputs.vpc_security_group_ids:
|
||||
request["vpcConfig"]["securityGroups"] = inputs.vpc_security_group_ids
|
||||
if inputs.use_public_ip:
|
||||
request["vpcConfig"]["assignPublicIp"] = inputs.use_public_ip
|
||||
else:
|
||||
request.pop("vpcConfig")
|
||||
|
||||
# Set simulation application inputs
|
||||
if inputs.sim_app_arn:
|
||||
if not inputs.sim_app_launch_config:
|
||||
logging.error("Must specify a Launch Config for your Simulation App")
|
||||
raise Exception("Could not create simulation job request")
|
||||
sim_app = {
|
||||
"application": inputs.sim_app_arn,
|
||||
"launchConfig": inputs.sim_app_launch_config,
|
||||
}
|
||||
if inputs.sim_app_version:
|
||||
sim_app["version"] = inputs.sim_app_version
|
||||
if inputs.sim_app_world_config:
|
||||
sim_app["worldConfigs"] = inputs.sim_app_world_config
|
||||
request["simulationApplications"].append(sim_app)
|
||||
else:
|
||||
request.pop("simulationApplications")
|
||||
|
||||
# Set robot application inputs
|
||||
if inputs.robot_app_arn:
|
||||
if not inputs.robot_app_launch_config:
|
||||
logging.error("Must specify a Launch Config for your Robot App")
|
||||
raise Exception("Could not create simulation job request")
|
||||
robot_app = {
|
||||
"application": inputs.robot_app_arn,
|
||||
"launchConfig": inputs.robot_app_launch_config,
|
||||
}
|
||||
if inputs.robot_app_version:
|
||||
robot_app["version"] = inputs.robot_app_version
|
||||
request["robotApplications"].append(robot_app)
|
||||
else:
|
||||
request.pop("robotApplications")
|
||||
|
||||
# Set optional inputs
|
||||
if inputs.record_ros_topics:
|
||||
request["loggingConfig"]["recordAllRosTopics"] = inputs.record_ros_topics
|
||||
else:
|
||||
request.pop("loggingConfig")
|
||||
|
||||
if inputs.failure_behavior:
|
||||
request["failureBehavior"] = inputs.failure_behavior
|
||||
else:
|
||||
request.pop("failureBehavior")
|
||||
|
||||
if inputs.data_sources:
|
||||
request["dataSources"] = inputs.data_sources
|
||||
else:
|
||||
request.pop("dataSources")
|
||||
|
||||
if inputs.sim_unit_limit:
|
||||
request["compute"]["simulationUnitLimit"] = inputs.sim_unit_limit
|
||||
|
||||
self._enable_tag_support(request, inputs)
|
||||
|
||||
return request
|
||||
|
||||
def _submit_job_request(self, request: Dict) -> Dict:
|
||||
return self._rm_client.create_simulation_job(**request)
|
||||
|
||||
def _after_submit_job_request(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerSimulationJobInputs,
|
||||
outputs: RoboMakerSimulationJobOutputs,
|
||||
):
|
||||
outputs.arn = self._arn = job["arn"]
|
||||
outputs.job_id = self._job_id = job["arn"].split("/")[-1]
|
||||
logging.info(f"Started RoboMaker Simulation Job with ID: {self._job_id}")
|
||||
logging.info(
|
||||
"Simulation Job in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobs/{}".format(
|
||||
inputs.region, inputs.region, self._job_id
|
||||
)
|
||||
)
|
||||
|
||||
def _print_logs_for_job(self):
|
||||
self._print_cloudwatch_logs("/aws/robomaker/SimulationJobs", self._job_id)
|
||||
|
||||
def _get_job_outputs(self):
|
||||
"""Build the S3 URI of the simulation job's output location.
|
||||
|
||||
Returns:
|
||||
str: The S3 URI under which the job's output artifacts are stored.
|
||||
"""
|
||||
response = self._rm_client.describe_simulation_job(job=self._arn)
|
||||
artifact_uri = f"s3://{response['outputLocation']['s3Bucket']}/{response['outputLocation']['s3Prefix']}"
|
||||
return artifact_uri
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
|
||||
spec = RoboMakerSimulationJobSpec(sys.argv[1:])
|
||||
|
||||
component = RoboMakerSimulationJobComponent()
|
||||
component.Do(spec)
|
||||
|
|
@ -0,0 +1,200 @@
|
|||
"""Specification for the RoboMaker create simulation job component."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
from typing import List
|
||||
from common.sagemaker_component_spec import SageMakerComponentSpec
|
||||
from common.spec_input_parsers import SpecInputParsers
|
||||
from common.common_inputs import (
|
||||
COMMON_INPUTS,
|
||||
SageMakerComponentCommonInputs,
|
||||
SageMakerComponentInput as Input,
|
||||
SageMakerComponentOutput as Output,
|
||||
SageMakerComponentBaseOutputs,
|
||||
SageMakerComponentInputValidator as InputValidator,
|
||||
SageMakerComponentOutputValidator as OutputValidator,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RoboMakerSimulationJobInputs(SageMakerComponentCommonInputs):
|
||||
"""Defines the set of inputs for the create simulation job component."""
|
||||
|
||||
role: Input
|
||||
output_bucket: Input
|
||||
output_path: Input
|
||||
max_run: Input
|
||||
failure_behavior: Input
|
||||
sim_app_arn: Input
|
||||
sim_app_version: Input
|
||||
sim_app_launch_config: Input
|
||||
sim_app_world_config: Input
|
||||
robot_app_arn: Input
|
||||
robot_app_version: Input
|
||||
robot_app_launch_config: Input
|
||||
data_sources: Input
|
||||
vpc_security_group_ids: Input
|
||||
vpc_subnets: Input
|
||||
use_public_ip: Input
|
||||
sim_unit_limit: Input
|
||||
record_ros_topics: Input
|
||||
|
||||
|
||||
@dataclass
|
||||
class RoboMakerSimulationJobOutputs(SageMakerComponentBaseOutputs):
|
||||
"""Defines the set of outputs for the create simulation job component."""
|
||||
|
||||
arn: Output
|
||||
output_artifacts: Output
|
||||
job_id: Output
|
||||
|
||||
|
||||
class RoboMakerSimulationJobSpec(
|
||||
SageMakerComponentSpec[RoboMakerSimulationJobInputs, RoboMakerSimulationJobOutputs]
|
||||
):
|
||||
INPUTS: RoboMakerSimulationJobInputs = RoboMakerSimulationJobInputs(
|
||||
role=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf.",
|
||||
),
|
||||
output_bucket=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The bucket to place outputs from the simulation job.",
|
||||
default="",
|
||||
),
|
||||
output_path=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The S3 key where outputs from the simulation job are placed.",
|
||||
default="",
|
||||
),
|
||||
max_run=InputValidator(
|
||||
input_type=int,
|
||||
required=True,
|
||||
description="Timeout in seconds for simulation job (default: 8 * 60 * 60).",
|
||||
default=8 * 60 * 60,
|
||||
),
|
||||
failure_behavior=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The failure behavior of the simulation job (Continue|Fail).",
|
||||
default="Fail",
|
||||
),
|
||||
sim_app_arn=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The application ARN for the simulation application.",
|
||||
default="",
|
||||
),
|
||||
sim_app_version=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The application version for the simulation application.",
|
||||
default="",
|
||||
),
|
||||
sim_app_launch_config=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_dict,
|
||||
required=False,
|
||||
description="The launch configuration for the simulation application.",
|
||||
default={},
|
||||
),
|
||||
sim_app_world_config=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="A list of world configurations.",
|
||||
default=[],
|
||||
),
|
||||
robot_app_arn=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The application ARN for the robot application.",
|
||||
default="",
|
||||
),
|
||||
robot_app_version=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The application version for the robot application.",
|
||||
default="",
|
||||
),
|
||||
robot_app_launch_config=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_dict,
|
||||
required=False,
|
||||
description="The launch configuration for the robot application.",
|
||||
default={},
|
||||
),
|
||||
data_sources=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="Specify data sources to mount read-only files from S3 into your simulation.",
|
||||
default=[],
|
||||
),
|
||||
vpc_security_group_ids=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="The VPC security group IDs, in the form sg-xxxxxxxx.",
|
||||
default=[],
|
||||
),
|
||||
vpc_subnets=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=False,
|
||||
description="The ID of the subnets in the VPC to which you want to connect your simulation job.",
|
||||
default=[],
|
||||
),
|
||||
use_public_ip=InputValidator(
|
||||
input_type=bool,
|
||||
description="A boolean indicating whether to assign a public IP address.",
|
||||
default=False,
|
||||
),
|
||||
sim_unit_limit=InputValidator(
|
||||
input_type=int,
|
||||
required=False,
|
||||
description="The simulation unit limit.",
|
||||
default=15,
|
||||
),
|
||||
record_ros_topics=InputValidator(
|
||||
input_type=bool,
|
||||
description="A boolean indicating whether to record all ROS topics. Used for logging.",
|
||||
default=False,
|
||||
),
|
||||
**vars(COMMON_INPUTS),
|
||||
)
|
||||
|
||||
OUTPUTS = RoboMakerSimulationJobOutputs(
|
||||
arn=OutputValidator(
|
||||
description="The Amazon Resource Name (ARN) of the simulation job."
|
||||
),
|
||||
output_artifacts=OutputValidator(
|
||||
description="The simulation job artifacts URL."
|
||||
),
|
||||
job_id=OutputValidator(description="The simulation job id."),
|
||||
)
|
||||
|
||||
def __init__(self, arguments: List[str]):
|
||||
super().__init__(
|
||||
arguments, RoboMakerSimulationJobInputs, RoboMakerSimulationJobOutputs,
|
||||
)
|
||||
|
||||
@property
|
||||
def inputs(self) -> RoboMakerSimulationJobInputs:
|
||||
return self._inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> RoboMakerSimulationJobOutputs:
|
||||
return self._outputs
|
||||
|
||||
@property
|
||||
def output_paths(self) -> RoboMakerSimulationJobOutputs:
|
||||
return self._output_paths
|
||||
|
|
@ -0,0 +1,72 @@
|
|||
# RoboMaker Simulation Job Batch Kubeflow Pipelines component
|
||||
|
||||
## Summary
|
||||
Component to run a RoboMaker Simulation Job Batch from a Kubeflow Pipelines workflow.
|
||||
https://docs.aws.amazon.com/robomaker/latest/dg/API_StartSimulationJobBatch.html
|
||||
|
||||
## Intended Use
|
||||
For running your simulation workloads using AWS RoboMaker.
|
||||
|
||||
|
||||
|
||||
## Runtime Arguments
|
||||
Argument | Description | Optional | Data type | Accepted values | Default |
|
||||
:--- | :---------- | :----------| :----------| :---------- | :----------|
|
||||
region | The region where the cluster launches | No | String | | |
|
||||
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
|
||||
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
|
||||
app_name | The name of the simulation application. Must be unique within the same AWS account and AWS region | Yes | String | | SimulationApplication-[datetime]-[random id]|
|
||||
role | The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf | No | String | | |
|
||||
timeout_in_secs | The amount of time, in seconds, to wait for the batch to complete | Yes | Int | | |
|
||||
max_concurrency | The number of simulation jobs created as part of the batch that can be in an active state at the same time | Yes | Int | | |
|
||||
simulation_job_requests | A list of simulation job requests to create in the batch | No | List of Dicts | | [] |
|
||||
sim_app_arn | The application ARN for the simulation application | Yes | String | | |
|
||||
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
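
The following is a minimal sketch of how `timeout_in_secs` and `max_concurrency` appear to be turned into the request's `batchPolicy`, based on the component source in this change: both inputs default to `0`, a value of `0` is treated as "not set", and the policy is dropped entirely when neither is set. The helper name `build_batch_policy` is hypothetical and exists only for illustration.

```python
from typing import Dict


def build_batch_policy(timeout_in_secs: int = 0, max_concurrency: int = 0) -> Dict[str, int]:
    """Return the batchPolicy entries for the given inputs.

    Hypothetical helper for illustration; it is not part of the component.
    An empty dict stands for "omit batchPolicy from the request".
    """
    policy: Dict[str, int] = {}
    if timeout_in_secs:  # 0 is treated as "not set"
        policy["timeoutInSeconds"] = timeout_in_secs
    if max_concurrency:  # 0 is treated as "not set"
        policy["maxConcurrency"] = max_concurrency
    return policy


print(build_batch_policy())                      # {}
print(build_batch_policy(timeout_in_secs=3600))  # {'timeoutInSeconds': 3600}
```
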
|
||||
|
||||
Notes:
|
||||
* This component can be run in a pipeline with the Create Simulation App and Delete Simulation App components or as a standalone.
|
||||
* `sim_app_arn` can be provided as an input, or the ARN can be embedded as the 'application' value in each of the simulation_job_requests. If `sim_app_arn` is provided, it overrides the 'application' value in every request.
|
||||
* The format for the [`simulation_job_requests`](https://docs.aws.amazon.com/robomaker/latest/dg/API_SimulationJobRequest.html) field is:
|
||||
```
|
||||
[
|
||||
{
|
||||
"outputLocation": {
|
||||
"s3Bucket": "string",
|
||||
"s3Prefix": "string",
|
||||
},
|
||||
"loggingConfig": {"recordAllRosTopics": "bool"},
|
||||
"maxJobDurationInSeconds": "int",
|
||||
"iamRole": "string",
|
||||
"failureBehavior": "string",
|
||||
"simulationApplications": [
|
||||
{
|
||||
"application": "string",
|
||||
"launchConfig": {
|
||||
"packageName": "string",
|
||||
"launchFile": "string",
|
||||
"environmentVariables": {
|
||||
"string": "string",
|
||||
},
|
||||
"streamUI": "bool",
|
||||
},
|
||||
}
|
||||
],
|
||||
"vpcConfig": {
|
||||
"subnets": "list",
|
||||
"securityGroups": "list",
|
||||
"assignPublicIp": "bool",
|
||||
},
|
||||
}
|
||||
]
|
||||
```
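
As a rough illustration of the format above, the sketch below builds a single-request list in Python and serializes it to the JSON string that would be passed as the `simulation_job_requests` argument. Every value (bucket, role ARN, application ARN, package name, launch file, environment variable) is a made-up placeholder, not something taken from this repository.

```python
import json

# Placeholder values only: substitute your own bucket, role and application.
simulation_job_requests = [
    {
        "outputLocation": {
            "s3Bucket": "my-output-bucket",
            "s3Prefix": "batch-output",
        },
        "loggingConfig": {"recordAllRosTopics": True},
        "maxJobDurationInSeconds": 3600,
        "iamRole": "arn:aws:iam::123456789012:role/robomaker-role",
        "failureBehavior": "Fail",
        "simulationApplications": [
            {
                # Overridden by the component when sim_app_arn is supplied.
                "application": "arn:aws:robomaker:us-east-1:123456789012:simulation-application/example/1",
                "launchConfig": {
                    "packageName": "example_simulation_package",
                    "launchFile": "example.launch",
                    "environmentVariables": {"EXAMPLE_VAR": "value"},
                    "streamUI": False,
                },
            }
        ],
    }
]

# The component input is a JSON array, so pass the serialized string.
simulation_job_requests_json = json.dumps(simulation_job_requests)
print(simulation_job_requests_json)
```

The optional `vpcConfig` block is left out here; add it only if the jobs need to run inside your VPC.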
|
||||
|
||||
## Output
|
||||
The ARN and ID of the batch job.
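
For orientation, the component source in this change derives the batch ID from the final path segment of the ARN and logs a console link built from it. A small sketch of that relationship follows; the ARN shown is an invented example and the helper names are hypothetical.

```python
def batch_id_from_arn(arn: str) -> str:
    """Return the last '/'-separated segment of the ARN, reported as batch_job_id."""
    return arn.split("/")[-1]


def console_url(region: str, batch_job_id: str) -> str:
    """Build the RoboMaker console link in the same shape the component logs."""
    return (
        f"https://{region}.console.aws.amazon.com/robomaker/home"
        f"?region={region}#/simulationJobBatches/{batch_job_id}"
    )


# Invented ARN, used only to demonstrate the string handling.
example_arn = "arn:aws:robomaker:us-east-1:123456789012:simulation-job-batch/sim-batch-abc123"
batch_job_id = batch_id_from_arn(example_arn)
print(batch_job_id)  # sim-batch-abc123
print(console_url("us-east-1", batch_job_id))
```
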
|
||||
|
||||
# Example code
|
||||
Example of creating a Sim app, then a Sim job batch, and finally deleting the Sim app: [robomaker_simulation_job_batch_app](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/robomaker_simulation/robomaker_simulation_job_batch_app.py)
|
||||
|
||||
# Resources
|
||||
* [Create RoboMaker Simulation Job Batch via Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.start_simulation_job_batch)
|
||||
|
|
@ -0,0 +1,52 @@
|
|||
name: RoboMaker - Create Simulation Job Batch
|
||||
description: Creates a simulation job batch.
|
||||
inputs:
|
||||
- {name: region, type: String, description: The region for the SageMaker resource.}
|
||||
- {name: endpoint_url, type: String, description: The URL to use when communicating
|
||||
with the SageMaker service., default: ''}
|
||||
- {name: assume_role, type: String, description: The ARN of an IAM role to assume
|
||||
when connecting to SageMaker., default: ''}
|
||||
- {name: tags, type: JsonObject, description: 'An array of key-value pairs, to categorize
|
||||
AWS resources.', default: '{}'}
|
||||
- {name: role, type: String, description: The Amazon Resource Name (ARN) that Amazon
|
||||
RoboMaker assumes to perform tasks on your behalf.}
|
||||
- {name: timeout_in_secs, type: Integer, description: 'The amount of time, in seconds,
|
||||
to wait for the batch to complete.', default: '0'}
|
||||
- {name: max_concurrency, type: Integer, description: The number of active simulation
|
||||
jobs created as part of the batch that can be in an active state at the same time.,
|
||||
default: '0'}
|
||||
- {name: simulation_job_requests, type: JsonArray, description: A list of simulation
|
||||
job requests to create in the batch., default: '[]'}
|
||||
- {name: sim_app_arn, type: String, description: The application ARN for the simulation
|
||||
application., default: ''}
|
||||
outputs:
|
||||
- {name: arn, description: The Amazon Resource Name (ARN) of the simulation job batch.}
|
||||
- {name: batch_job_id, description: The simulation job batch id.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- simulation_job_batch/src/robomaker_simulation_job_batch_component.py
|
||||
- --region
|
||||
- {inputValue: region}
|
||||
- --endpoint_url
|
||||
- {inputValue: endpoint_url}
|
||||
- --assume_role
|
||||
- {inputValue: assume_role}
|
||||
- --tags
|
||||
- {inputValue: tags}
|
||||
- --role
|
||||
- {inputValue: role}
|
||||
- --timeout_in_secs
|
||||
- {inputValue: timeout_in_secs}
|
||||
- --max_concurrency
|
||||
- {inputValue: max_concurrency}
|
||||
- --simulation_job_requests
|
||||
- {inputValue: simulation_job_requests}
|
||||
- --sim_app_arn
|
||||
- {inputValue: sim_app_arn}
|
||||
- --arn_output_path
|
||||
- {outputPath: arn}
|
||||
- --batch_job_id_output_path
|
||||
- {outputPath: batch_job_id}
|
||||
|
|
@ -0,0 +1,194 @@
|
|||
"""RoboMaker component for creating a simulation job batch."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import logging
|
||||
from typing import Dict
|
||||
from simulation_job_batch.src.robomaker_simulation_job_batch_spec import (
|
||||
RoboMakerSimulationJobBatchSpec,
|
||||
RoboMakerSimulationJobBatchInputs,
|
||||
RoboMakerSimulationJobBatchOutputs,
|
||||
)
|
||||
from common.sagemaker_component import (
|
||||
SageMakerComponent,
|
||||
ComponentMetadata,
|
||||
SageMakerJobStatus,
|
||||
)
|
||||
from common.boto3_manager import Boto3Manager
|
||||
from common.common_inputs import SageMakerComponentCommonInputs
|
||||
|
||||
|
||||
@ComponentMetadata(
|
||||
name="RoboMaker - Create Simulation Job Batch",
|
||||
description="Creates a simulation job batch.",
|
||||
spec=RoboMakerSimulationJobBatchSpec,
|
||||
)
|
||||
class RoboMakerSimulationJobBatchComponent(SageMakerComponent):
|
||||
"""RoboMaker component for creating a simulation job batch."""
|
||||
|
||||
def Do(self, spec: RoboMakerSimulationJobBatchSpec):
|
||||
super().Do(spec.inputs, spec.outputs, spec.output_paths)
|
||||
|
||||
def _get_job_status(self) -> SageMakerJobStatus:
|
||||
batch_response = self._rm_client.describe_simulation_job_batch(batch=self._arn)
|
||||
batch_status = batch_response["status"]
|
||||
|
||||
if batch_status in ["Completed"]:
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True, has_error=False, raw_status=batch_status
|
||||
)
|
||||
|
||||
if batch_status in ["TimedOut", "Canceled"]:
|
||||
simulation_message = "Simulation jobs are completed\n"
|
||||
has_error = False
|
||||
for completed_request in batch_response["createdRequests"]:
|
||||
self._sim_request_ids.add(completed_request["arn"].split("/")[-1])
|
||||
simulation_response = self._rm_client.describe_simulation_job(
|
||||
job=completed_request["arn"]
|
||||
)
|
||||
if "failureCode" in simulation_response:
|
||||
simulation_message += f"Simulation job: {simulation_response['arn']} failed with errorCode:{simulation_response['failureCode']}\n"
|
||||
has_error = True
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=has_error,
|
||||
error_message=simulation_message,
|
||||
raw_status=batch_status,
|
||||
)
|
||||
|
||||
if batch_status in ["Failed"]:
|
||||
failure_message = f"Simulation batch job is in status:{batch_status}\n"
|
||||
if "failureReason" in batch_response:
|
||||
failure_message += (
|
||||
f"Simulation failed with reason:{batch_response['failureReason']}\n"
|
||||
)
|
||||
if "failureCode" in batch_response:
|
||||
failure_message += (
|
||||
f"Simulation failed with errorCode:{batch_response['failureCode']}"
|
||||
)
|
||||
return SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=True,
|
||||
error_message=failure_message,
|
||||
raw_status=batch_status,
|
||||
)
|
||||
|
||||
return SageMakerJobStatus(is_completed=False, raw_status=batch_status)
|
||||
|
||||
def _configure_aws_clients(self, inputs: SageMakerComponentCommonInputs):
|
||||
"""Configures the internal AWS clients for the component.
|
||||
|
||||
Args:
|
||||
inputs: A populated list of user inputs.
|
||||
"""
|
||||
self._rm_client = Boto3Manager.get_robomaker_client(
|
||||
self._get_component_version(),
|
||||
inputs.region,
|
||||
endpoint_url=inputs.endpoint_url,
|
||||
assume_role_arn=inputs.assume_role,
|
||||
)
|
||||
self._cw_client = Boto3Manager.get_cloudwatch_client(
|
||||
inputs.region, assume_role_arn=inputs.assume_role
|
||||
)
|
||||
|
||||
def _after_job_complete(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerSimulationJobBatchInputs,
|
||||
outputs: RoboMakerSimulationJobBatchOutputs,
|
||||
):
|
||||
for sim_request_id in self._sim_request_ids:
|
||||
logging.info(
|
||||
"Simulation Job in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobs/{}".format(
|
||||
inputs.region, inputs.region, sim_request_id
|
||||
)
|
||||
)
|
||||
|
||||
def _on_job_terminated(self):
|
||||
self._rm_client.cancel_simulation_job_batch(batch=self._arn)
|
||||
|
||||
def _create_job_request(
|
||||
self,
|
||||
inputs: RoboMakerSimulationJobBatchInputs,
|
||||
outputs: RoboMakerSimulationJobBatchOutputs,
|
||||
) -> Dict:
|
||||
"""
|
||||
Documentation:https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/robomaker.html#RoboMaker.Client.start_simulation_job_batch
|
||||
"""
|
||||
request = self._get_request_template("robomaker.simulation.job.batch")
|
||||
|
||||
# Set batch policy inputs
|
||||
if inputs.timeout_in_secs:
|
||||
request["batchPolicy"]["timeoutInSeconds"] = inputs.timeout_in_secs
|
||||
if inputs.max_concurrency:
|
||||
request["batchPolicy"]["maxConcurrency"] = inputs.max_concurrency
|
||||
if not inputs.timeout_in_secs and not inputs.max_concurrency:
|
||||
request.pop("batchPolicy")
|
||||
|
||||
# Set the simulation job inputs
|
||||
request["createSimulationJobRequests"] = inputs.simulation_job_requests
|
||||
|
||||
# Override with ARN of sim application from input. Can be used to pass ARN from create sim app component.
|
||||
if inputs.sim_app_arn:
|
||||
for sim_job_request in request["createSimulationJobRequests"]:
|
||||
for sim_jobs in sim_job_request["simulationApplications"]:
|
||||
sim_jobs["application"] = inputs.sim_app_arn
|
||||
|
||||
return request
|
||||
|
||||
def _submit_job_request(self, request: Dict) -> Dict:
|
||||
return self._rm_client.start_simulation_job_batch(**request)
|
||||
|
||||
def _after_submit_job_request(
|
||||
self,
|
||||
job: Dict,
|
||||
request: Dict,
|
||||
inputs: RoboMakerSimulationJobBatchInputs,
|
||||
outputs: RoboMakerSimulationJobBatchOutputs,
|
||||
):
|
||||
outputs.arn = self._arn = job["arn"]
|
||||
outputs.batch_job_id = self._batch_job_id = job["arn"].split("/")[-1]
|
||||
logging.info(
|
||||
f"Started Robomaker Simulation Job Batch with ID: {self._batch_job_id}"
|
||||
)
|
||||
logging.info(
|
||||
"Simulation Job Batch in RoboMaker: https://{}.console.aws.amazon.com/robomaker/home?region={}#/simulationJobBatches/{}".format(
|
||||
inputs.region, inputs.region, self._batch_job_id
|
||||
)
|
||||
)
|
||||
self._sim_request_ids = set()
|
||||
for created_request in job["createdRequests"]:
|
||||
self._sim_request_ids.add(created_request["arn"].split("/")[-1])
|
||||
logging.info(
|
||||
f"Started Robomaker Simulation Job with ID: {created_request['arn'].split('/')[-1]}"
|
||||
)
|
||||
|
||||
# Inform if we have any pending or failed requests
|
||||
if job["pendingRequests"]:
|
||||
logging.info("Some Simulation Requests are in state Pending")
|
||||
|
||||
if job["failedRequests"]:
|
||||
logging.info("Some Simulation Requests are in state Failed")
|
||||
|
||||
def _print_logs_for_job(self):
|
||||
for sim_request_id in self._sim_request_ids:
|
||||
self._print_cloudwatch_logs("/aws/robomaker/SimulationJobs", sim_request_id)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
|
||||
spec = RoboMakerSimulationJobBatchSpec(sys.argv[1:])
|
||||
|
||||
component = RoboMakerSimulationJobBatchComponent()
|
||||
component.Do(spec)
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
"""Specification for the RoboMaker create simulation job batch component."""
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
from typing import List
|
||||
from common.sagemaker_component_spec import SageMakerComponentSpec
|
||||
from common.spec_input_parsers import SpecInputParsers
|
||||
from common.common_inputs import (
|
||||
COMMON_INPUTS,
|
||||
SageMakerComponentCommonInputs,
|
||||
SageMakerComponentInput as Input,
|
||||
SageMakerComponentOutput as Output,
|
||||
SageMakerComponentBaseOutputs,
|
||||
SageMakerComponentInputValidator as InputValidator,
|
||||
SageMakerComponentOutputValidator as OutputValidator,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RoboMakerSimulationJobBatchInputs(SageMakerComponentCommonInputs):
|
||||
"""Defines the set of inputs for the create simulation job batch component."""
|
||||
|
||||
role: Input
|
||||
timeout_in_secs: Input
|
||||
max_concurrency: Input
|
||||
simulation_job_requests: Input
|
||||
sim_app_arn: Input
|
||||
|
||||
|
||||
@dataclass
|
||||
class RoboMakerSimulationJobBatchOutputs(SageMakerComponentBaseOutputs):
|
||||
"""Defines the set of outputs for the create simulation job batch component."""
|
||||
|
||||
arn: Output
|
||||
batch_job_id: Output
|
||||
|
||||
|
||||
class RoboMakerSimulationJobBatchSpec(
|
||||
SageMakerComponentSpec[
|
||||
RoboMakerSimulationJobBatchInputs, RoboMakerSimulationJobBatchOutputs
|
||||
]
|
||||
):
|
||||
INPUTS: RoboMakerSimulationJobBatchInputs = RoboMakerSimulationJobBatchInputs(
|
||||
role=InputValidator(
|
||||
input_type=str,
|
||||
required=True,
|
||||
description="The Amazon Resource Name (ARN) that Amazon RoboMaker assumes to perform tasks on your behalf.",
|
||||
),
|
||||
timeout_in_secs=InputValidator(
|
||||
input_type=int,
|
||||
required=False,
|
||||
description="The amount of time, in seconds, to wait for the batch to complete.",
|
||||
default=0,
|
||||
),
|
||||
max_concurrency=InputValidator(
|
||||
input_type=int,
|
||||
required=False,
|
||||
description="The number of simulation jobs created as part of the batch that can be in an active state at the same time.",
|
||||
default=0,
|
||||
),
|
||||
simulation_job_requests=InputValidator(
|
||||
input_type=SpecInputParsers.yaml_or_json_list,
|
||||
required=True,
|
||||
description="A list of simulation job requests to create in the batch.",
|
||||
default=[],
|
||||
),
|
||||
sim_app_arn=InputValidator(
|
||||
input_type=str,
|
||||
required=False,
|
||||
description="The application ARN for the simulation application.",
|
||||
default="",
|
||||
),
|
||||
**vars(COMMON_INPUTS),
|
||||
)
|
||||
|
||||
OUTPUTS = RoboMakerSimulationJobBatchOutputs(
|
||||
arn=OutputValidator(
|
||||
description="The Amazon Resource Name (ARN) of the simulation job batch."
|
||||
),
|
||||
batch_job_id=OutputValidator(description="The simulation job batch id."),
|
||||
)
|
||||
|
||||
def __init__(self, arguments: List[str]):
|
||||
super().__init__(
|
||||
arguments,
|
||||
RoboMakerSimulationJobBatchInputs,
|
||||
RoboMakerSimulationJobBatchOutputs,
|
||||
)
|
||||
|
||||
@property
|
||||
def inputs(self) -> RoboMakerSimulationJobBatchInputs:
|
||||
return self._inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> RoboMakerSimulationJobBatchOutputs:
|
||||
return self._outputs
|
||||
|
||||
@property
|
||||
def output_paths(self) -> RoboMakerSimulationJobBatchOutputs:
|
||||
return self._output_paths
|
||||
|
|
@ -6,6 +6,7 @@
|
|||
REGION=us-east-1
|
||||
|
||||
SAGEMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole-Example
|
||||
ROBOMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::123456789012:role/service-role/AmazonRoboMaker-ExecutionRole-Example
|
||||
S3_DATA_BUCKET=my-data-bucket
|
||||
|
||||
# If you hope to use an existing EKS cluster, rather than creating a new one.
|
||||
|
|
|
|||
|
|
@ -34,6 +34,7 @@ ENV PATH "/opt/conda/envs/kfp_test_env/bin":$PATH
|
|||
# Environment variables to be used by tests
|
||||
ENV REGION="us-west-2"
|
||||
ENV SAGEMAKER_EXECUTION_ROLE_ARN="arn:aws:iam::1234567890:role/sagemaker-role"
|
||||
ENV ROBOMAKER_EXECUTION_ROLE_ARN="arn:aws:iam::1234567890:role/robomaker-role"
|
||||
ENV S3_DATA_BUCKET="kfp-test-data"
|
||||
ENV MINIO_LOCAL_PORT=9000
|
||||
ENV KFP_NAMESPACE="kubeflow"
|
||||
|
|
@ -41,4 +42,6 @@ ENV KFP_NAMESPACE="kubeflow"
|
|||
RUN mkdir pipelines
|
||||
COPY ./ ./pipelines/
|
||||
|
||||
ENTRYPOINT [ "/bin/bash", "./pipelines/components/aws/sagemaker/tests/integration_tests/scripts/run_integration_tests" ]
|
||||
WORKDIR /pipelines/components/aws/sagemaker/tests/integration_tests/scripts/
|
||||
|
||||
ENTRYPOINT [ "/bin/bash", "./run_integration_tests" ]
|
||||
|
|
@ -1,16 +1,32 @@
|
|||
## Requirements
|
||||
1. [Docker](https://www.docker.com/)
|
||||
1. [IAM Role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) with a SageMakerFullAccess and AmazonS3FullAccess
|
||||
1. IAM User credentials with SageMakerFullAccess, AWSCloudFormationFullAccess, IAMFullAccess, AmazonEC2FullAccess, AmazonS3FullAccess permissions
|
||||
1. [IAM Role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) with SageMakerFullAccess, RoboMakerFullAccess, and AmazonS3FullAccess permissions
|
||||
1. IAM User credentials with SageMakerFullAccess, RoboMakerFullAccess, AWSCloudFormationFullAccess, IAMFullAccess, AmazonEC2FullAccess, AmazonS3FullAccess permissions
|
||||
2. The SageMaker WorkTeam and GroundTruth Component tests expect that at least one private workteam already exists in the region where you are running these tests.
|
||||
|
||||
|
||||
## Creating S3 buckets with datasets
|
||||
|
||||
1. In the following Python script, change the bucket name and run the [`s3_sample_data_creator.py`](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker#the-sample-dataset) to create an S3 bucket with the sample mnist dataset in the region where you want to run the tests.
|
||||
2. To prepare the dataset for the SageMaker GroundTruth Component test, follow the steps in the `[GroundTruth Sample README](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/ground_truth_pipeline_demo#prep-the-dataset-label-categories-and-ui-template)`.
|
||||
2. To prepare the dataset for the SageMaker GroundTruth Component test, follow the steps in the [GroundTruth Sample README](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/ground_truth_pipeline_demo#prep-the-dataset-label-categories-and-ui-template).
|
||||
3. To prepare the processing script for the SageMaker Processing Component tests, upload the `scripts/kmeans_preprocessing.py` script to your bucket. This can be done by replacing `<my-bucket>` with your bucket name and running `aws s3 cp scripts/kmeans_preprocessing.py s3://<my-bucket>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py`
|
||||
|
||||
4. Prepare RoboMaker Simulation App sources and Robot App sources and place them in the data bucket under the `/robomaker` key. The easiest way to create the files you need is to copy them from the public buckets that are used to store the [RoboMaker Hello World](https://console.aws.amazon.com/robomaker/home?region=us-east-1#sampleSimulationJobs) demos:
|
||||
```bash
|
||||
aws s3 cp s3://aws-robomaker-samples-us-east-1-1fd12c306611/hello-world/melodic/gazebo9/1.4.0.62/1.2.0/simulation_ws.tar .
|
||||
aws s3 cp ./simulation_ws.tar s3://<your_bucket_name>/robomaker/simulation_ws.tar
|
||||
aws s3 cp s3://aws-robomaker-samples-us-east-1-1fd12c306611/hello-world/melodic/gazebo9/1.4.0.62/1.2.0/robot_ws.tar .
|
||||
aws s3 cp ./robot_ws.tar s3://<your_bucket_name>/robomaker/robot_ws.tar
|
||||
```
|
||||
The files in the `/robomaker` directory on S3 should follow this pattern:
|
||||
```
|
||||
/robomaker/simulation_ws.tar
|
||||
/robomaker/robot_ws.tar
|
||||
```
|
||||
5. Prepare RLEstimator sources and place them in the data bucket under the `/rlestimator` key. The easiest way to create the files you need is to follow the notebooks outlined in the [RLEstimator Samples README](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/rlestimator_pipeline/README.md).
|
||||
The files in the `/rlestimator` directory on S3 should follow this pattern:
|
||||
```
|
||||
/rlestimator/sourcedir.tar.gz
|
||||
```
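
Before running the tests, it can help to confirm that the RoboMaker and RLEstimator files described above are in place. The helper below is a hypothetical sketch, not part of the test suite: it takes a list of object keys, however you obtain it (for example from `aws s3 ls --recursive` output or a boto3 `list_objects_v2` call), and reports which expected keys are missing.

```python
from typing import Iterable, List, Tuple

# Keys the RoboMaker and RLEstimator tests expect, per the steps above.
REQUIRED_KEYS: Tuple[str, ...] = (
    "robomaker/simulation_ws.tar",
    "robomaker/robot_ws.tar",
    "rlestimator/sourcedir.tar.gz",
)


def missing_keys(
    existing_keys: Iterable[str], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required keys that are absent from existing_keys.

    Hypothetical helper for illustration. Leading slashes are ignored so that
    "/robomaker/robot_ws.tar" and "robomaker/robot_ws.tar" compare equal.
    """
    existing = {key.lstrip("/") for key in existing_keys}
    return [key for key in required if key not in existing]


# Example with a bucket listing that only contains the simulation workspace.
print(missing_keys(["robomaker/simulation_ws.tar"]))
```
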
|
||||
|
||||
## Steps to run integration tests
|
||||
1. Copy the `.env.example` file to `.env` and in the following steps modify the fields of this new file:
|
||||
|
|
@ -21,5 +37,5 @@
|
|||
1. Build the image by doing the following:
|
||||
1. Navigate to the root of this github directory.
|
||||
1. Run `docker build . -f components/aws/sagemaker/tests/integration_tests/Dockerfile -t amazon/integration_test`
|
||||
1. Run the image, injecting your environment variable files:
|
||||
1. Run `docker run --env-file components/aws/sagemaker/tests/integration_tests/.env amazon/integration_test`
|
||||
1. Run the image, injecting your environment variable files and mounting the repo files into the container:
|
||||
1. Run `docker run -v <path_to_this_repo_on_your_machine>:/pipelines --env-file components/aws/sagemaker/tests/integration_tests/.env amazon/integration_test`
|
||||
|
|
@ -0,0 +1,127 @@
|
|||
import random
|
||||
import string
|
||||
|
||||
import pytest
|
||||
import os
|
||||
import utils
|
||||
from utils import kfp_client_utils
|
||||
from utils import minio_utils
|
||||
from utils import sagemaker_utils
|
||||
from utils import argo_utils
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
    "test_file_dir",
    [
        pytest.param(
            "resources/config/rlestimator-training", marks=pytest.mark.canary_test
        ),
    ],
)
def test_trainingjob(
    kfp_client, experiment_id, region, sagemaker_client, test_file_dir
):

    download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
    test_params = utils.load_params(
        utils.replace_placeholders(
            os.path.join(test_file_dir, "config.yaml"),
            os.path.join(download_dir, "config.yaml"),
        )
    )

    test_params["Arguments"]["job_name"] = input_job_name = (
        utils.generate_random_string(5) + "-" + test_params["Arguments"]["job_name"]
    )
    print(f"running test with job_name: {input_job_name}")

    _, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
        kfp_client,
        experiment_id,
        test_params["PipelineDefinition"],
        test_params["Arguments"],
        download_dir,
        test_params["TestName"],
        test_params["Timeout"],
    )

    outputs = {
        "sagemaker-rlestimator-training-job": [
            "job_name",
            "model_artifact_url",
            "training_image",
        ]
    }
    output_files = minio_utils.artifact_download_iterator(
        workflow_json, outputs, download_dir
    )

    # Verify Training job was successful on SageMaker
    training_job_name = utils.read_from_file_in_tar(
        output_files["sagemaker-rlestimator-training-job"]["job_name"]
    )
    print(f"training job name: {training_job_name}")
    train_response = sagemaker_utils.describe_training_job(
        sagemaker_client, training_job_name
    )
    assert train_response["TrainingJobStatus"] == "Stopped"

    # Verify model artifacts output was generated from this run
    model_artifact_url = utils.read_from_file_in_tar(
        output_files["sagemaker-rlestimator-training-job"]["model_artifact_url"]
    )
    print(f"model_artifact_url: {model_artifact_url}")
    assert model_artifact_url == train_response["ModelArtifacts"]["S3ModelArtifacts"]
    assert training_job_name in model_artifact_url

    # Verify training image output is an ECR image
    training_image = utils.read_from_file_in_tar(
        output_files["sagemaker-rlestimator-training-job"]["training_image"]
    )
    print(f"Training image used: {training_image}")
    if "ExpectedTrainingImage" in test_params.keys():
        assert test_params["ExpectedTrainingImage"] == training_image
    else:
        assert f"dkr.ecr.{region}.amazonaws.com" in training_image

    assert not argo_utils.error_in_cw_logs(
        workflow_json["metadata"]["name"]
    ), "Found the CloudWatch error message in the log output. Check SageMaker to see if the job has failed."

    utils.remove_dir(download_dir)


def test_terminate_trainingjob(kfp_client, experiment_id, sagemaker_client):
    test_file_dir = "resources/config/rlestimator-training"
    download_dir = utils.mkdir(
        os.path.join(test_file_dir + "/generated_test_terminate")
    )
    test_params = utils.load_params(
        utils.replace_placeholders(
            os.path.join(test_file_dir, "config.yaml"),
            os.path.join(download_dir, "config.yaml"),
        )
    )

    input_job_name = test_params["Arguments"]["job_name"] = (
        "".join(random.choice(string.ascii_lowercase) for i in range(10))
        + "-terminate-job"
    )

    run_id, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
        kfp_client,
        experiment_id,
        test_params["PipelineDefinition"],
        test_params["Arguments"],
        download_dir,
        test_params["TestName"],
        60,
        "running",
    )
    print(f"Terminating run: {run_id} where Training job_name: {input_job_name}")
    kfp_client_utils.terminate_run(kfp_client, run_id)

    response = sagemaker_utils.describe_training_job(sagemaker_client, input_job_name)
    assert response["TrainingJobStatus"] in ["Stopping", "Stopped"]

    utils.remove_dir(download_dir)
|
||||
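Both tests above put a random prefix on the job name, presumably so that repeated runs do not reuse a name. The implementation of `utils.generate_random_string` is not part of this diff, so the following is only a sketch of the pattern; `prefixed_job_name` is a hypothetical helper that mirrors the inline expression used in `test_terminate_trainingjob`.

```python
import random
import string


def prefixed_job_name(base_name, prefix_length=5):
    """Prepend a random lowercase prefix so each test run gets its own name."""
    # Hypothetical helper: builds the prefix the same way the terminate
    # test does, one random lowercase letter at a time.
    prefix = "".join(
        random.choice(string.ascii_lowercase) for _ in range(prefix_length)
    )
    return f"{prefix}-{base_name}"
```

For example, `prefixed_job_name("rlestimator-test")` yields five random letters followed by `-rlestimator-test`.
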
|
|
@ -0,0 +1,219 @@
|
|||
import pytest
import os
import utils

from utils import kfp_client_utils
from utils import minio_utils
from utils import robomaker_utils
from utils import get_s3_data_bucket


def create_simulation_app(kfp_client, experiment_id, creat_app_dir, app_name):
    download_dir = utils.mkdir(os.path.join(creat_app_dir + "/generated"))
    test_params = utils.load_params(
        utils.replace_placeholders(
            os.path.join(creat_app_dir, "config.yaml"),
            os.path.join(download_dir, "config.yaml"),
        )
    )

    # Generate random prefix for sim app name
    sim_app_name = test_params["Arguments"]["app_name"] = (
        utils.generate_random_string(5) + "-" + app_name
    )

    _, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
        kfp_client,
        experiment_id,
        test_params["PipelineDefinition"],
        test_params["Arguments"],
        download_dir,
        test_params["TestName"],
        test_params["Timeout"],
    )

    return workflow_json, sim_app_name


def create_robot_app(client):
    robomaker_sources = [
        {
            "s3Bucket": get_s3_data_bucket(),
            "s3Key": "robomaker/robot_ws.tar",
            "architecture": "X86_64",
        }
    ]
    robomaker_suite = {"name": "ROS", "version": "Melodic"}
    app_name = utils.generate_random_string(5) + "-test-robot-app"

    response = robomaker_utils.create_robot_application(
        client, app_name, robomaker_sources, robomaker_suite
    )
    return response["arn"]


@pytest.mark.parametrize(
    "test_file_dir", ["resources/config/robomaker-create-simulation-app"],
)
def test_create_simulation_app(
    kfp_client, experiment_id, robomaker_client, test_file_dir
):

    download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
    test_params = utils.load_params(
        utils.replace_placeholders(
            os.path.join(test_file_dir, "config.yaml"),
            os.path.join(download_dir, "config.yaml"),
        )
    )

    # Create simulation app with random name
    workflow_json, sim_app_name = create_simulation_app(
        kfp_client, experiment_id, test_file_dir, test_params["Arguments"]["app_name"]
    )

    # Initialise the ARN so the cleanup below does not raise a NameError
    # (and mask the real failure) if reading the output fails.
    sim_app_arn = None
    try:
        print(f"running test with simulation application name: {sim_app_name}")

        outputs = {"robomaker-create-simulation-application": ["arn"]}

        output_files = minio_utils.artifact_download_iterator(
            workflow_json, outputs, download_dir
        )

        sim_app_arn = utils.read_from_file_in_tar(
            output_files["robomaker-create-simulation-application"]["arn"]
        )
        print(f"Simulation Application arn: {sim_app_arn}")

        # Verify simulation application exists
        assert (
            robomaker_utils.describe_simulation_application(
                robomaker_client, sim_app_arn
            )["name"]
            == sim_app_name
        )

    finally:
        if sim_app_arn is not None:
            robomaker_utils.delete_simulation_application(
                robomaker_client, sim_app_arn
            )


@pytest.mark.parametrize(
    "test_file_dir", ["resources/config/robomaker-delete-simulation-app"],
)
def test_delete_simulation_app(
    kfp_client, experiment_id, robomaker_client, test_file_dir
):

    download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
    test_params = utils.load_params(
        utils.replace_placeholders(
            os.path.join(test_file_dir, "config.yaml"),
            os.path.join(download_dir, "config.yaml"),
        )
    )

    # Create simulation app with random name
    workflow_json, sim_app_name = create_simulation_app(
        kfp_client,
        experiment_id,
        "resources/config/robomaker-create-simulation-app",
        "fake-app-name",
    )

    print(f"running test with simulation application name: {sim_app_name}")

    create_outputs = {"robomaker-create-simulation-application": ["arn"]}

    create_output_files = minio_utils.artifact_download_iterator(
        workflow_json, create_outputs, download_dir
    )

    sim_app_arn = utils.read_from_file_in_tar(
        create_output_files["robomaker-create-simulation-application"]["arn"]
    )
    print(f"Simulation Application arn: {sim_app_arn}")

    # Here we perform the delete
    test_params["Arguments"]["arn"] = sim_app_arn
    _, _, workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
        kfp_client,
        experiment_id,
        test_params["PipelineDefinition"],
        test_params["Arguments"],
        download_dir,
        test_params["TestName"],
        test_params["Timeout"],
    )

    # Verify simulation application does not exist
    simulation_applications = robomaker_utils.list_simulation_applications(
        robomaker_client, sim_app_name
    )
    assert len(simulation_applications["simulationApplicationSummaries"]) == 0


@pytest.mark.parametrize(
    "test_file_dir", ["resources/config/robomaker-simulation-job"],
)
def test_run_simulation_job(kfp_client, experiment_id, robomaker_client, test_file_dir):

    download_dir = utils.mkdir(os.path.join(test_file_dir + "/generated"))
    test_params = utils.load_params(
        utils.replace_placeholders(
            os.path.join(test_file_dir, "config.yaml"),
            os.path.join(download_dir, "config.yaml"),
        )
    )

    # Create simulation app with random name
    sim_app_workflow_json, sim_app_name = create_simulation_app(
        kfp_client,
        experiment_id,
        "resources/config/robomaker-create-simulation-app",
        "random-app-name",
    )

    print(f"running test with simulation application name: {sim_app_name}")

    sim_app_outputs = {"robomaker-create-simulation-application": ["arn"]}

    sim_app_output_files = minio_utils.artifact_download_iterator(
        sim_app_workflow_json, sim_app_outputs, download_dir
    )

    sim_app_arn = utils.read_from_file_in_tar(
        sim_app_output_files["robomaker-create-simulation-application"]["arn"]
    )
    print(f"Simulation Application arn: {sim_app_arn}")

    # Create Robot App by invoking api directly
    robot_app_arn = create_robot_app(robomaker_client)

    # Here we run the simulation job
    test_params["Arguments"]["sim_app_arn"] = sim_app_arn
    test_params["Arguments"]["robot_app_arn"] = robot_app_arn

    _, _, sim_job_workflow_json = kfp_client_utils.compile_run_monitor_pipeline(
        kfp_client,
        experiment_id,
        test_params["PipelineDefinition"],
        test_params["Arguments"],
        download_dir,
        test_params["TestName"],
        test_params["Timeout"],
    )

    sim_job_outputs = {"robomaker-create-simulation-job": ["arn"]}
    sim_job_output_files = minio_utils.artifact_download_iterator(
        sim_job_workflow_json, sim_job_outputs, download_dir
    )
    sim_job_arn = utils.read_from_file_in_tar(
        sim_job_output_files["robomaker-create-simulation-job"]["arn"]
    )
    print(f"Simulation Job arn: {sim_job_arn}")

    # Verify simulation job ran successfully
    assert robomaker_utils.describe_simulation_job(robomaker_client, sim_job_arn)[
        "status"
    ] not in ["Failed", "RunningFailed"]
|
||||
|
|
@ -16,7 +16,10 @@ def pytest_addoption(parser):
|
|||
help="AWS region where test will run",
|
||||
)
|
||||
parser.addoption(
|
||||
"--role-arn", required=True, help="SageMaker execution IAM role ARN",
|
||||
"--sagemaker-role-arn", required=True, help="SageMaker execution IAM role ARN",
|
||||
)
|
||||
parser.addoption(
|
||||
"--robomaker-role-arn", required=True, help="RoboMaker execution IAM role ARN",
|
||||
)
|
||||
parser.addoption(
|
||||
"--assume-role-arn",
|
||||
|
|
@ -73,9 +76,15 @@ def assume_role_arn(request):
|
|||
|
||||
|
||||
@pytest.fixture(scope="session", autouse=True)
|
||||
def role_arn(request):
|
||||
os.environ["ROLE_ARN"] = request.config.getoption("--role-arn")
|
||||
return request.config.getoption("--role-arn")
|
||||
def sagemaker_role_arn(request):
|
||||
os.environ["SAGEMAKER_ROLE_ARN"] = request.config.getoption("--sagemaker-role-arn")
|
||||
return request.config.getoption("--sagemaker-role-arn")
|
||||
|
||||
|
||||
@pytest.fixture(scope="session", autouse=True)
|
||||
def robomaker_role_arn(request):
|
||||
os.environ["ROBOMAKER_ROLE_ARN"] = request.config.getoption("--robomaker-role-arn")
|
||||
return request.config.getoption("--robomaker-role-arn")
|
||||
|
||||
|
||||
@pytest.fixture(scope="session", autouse=True)
|
||||
|
|
@ -124,6 +133,11 @@ def sagemaker_client(boto3_session):
|
|||
    return boto3_session.client(service_name="sagemaker")


@pytest.fixture(scope="session")
def robomaker_client(boto3_session):
    return boto3_session.client(service_name="robomaker")


@pytest.fixture(scope="session")
def s3_client(boto3_session):
    return boto3_session.client(service_name="s3")
|
||||
|
|
|
|||
|
|
@ -44,4 +44,4 @@ Arguments:
|
|||
instance_count: 1
|
||||
volume_size: 50
|
||||
max_run_time: 1800
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
@ -30,4 +30,4 @@ Arguments:
|
|||
spot_instance: "False"
|
||||
max_wait_time: 3600
|
||||
checkpoint_config: "{}"
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
|
|||
|
|
@ -33,4 +33,4 @@ Arguments:
|
|||
spot_instance: "False"
|
||||
max_wait_time: 3600
|
||||
checkpoint_config: "{}"
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ Timeout: 300
|
|||
StatusToCheck: 'running'
|
||||
Arguments:
|
||||
region: ((REGION))
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
ground_truth_train_job_name: 'image-labeling'
|
||||
ground_truth_label_attribute_name: 'category'
|
||||
ground_truth_train_manifest_location: 's3://((DATA_BUCKET))/mini-image-classification/ground-truth-demo/train.manifest'
|
||||
|
|
|
|||
|
|
@ -43,4 +43,4 @@ Arguments:
|
|||
instance_count: 1
|
||||
volume_size: 50
|
||||
max_run_time: 1800
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
@ -11,7 +11,7 @@ Arguments:
|
|||
instance_type: ml.m4.xlarge
|
||||
instance_count: 1
|
||||
network_isolation: "True"
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
data_input: s3://((DATA_BUCKET))/mnist_kmeans_example/input
|
||||
data_type: S3Prefix
|
||||
content_type: text/csv
|
||||
|
|
|
|||
|
|
@ -17,5 +17,5 @@ Arguments:
|
|||
instance_type_1: ml.m4.xlarge
|
||||
initial_instance_count_1: 1
|
||||
network_isolation: "True"
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
||||
|
|
@ -53,4 +53,4 @@ Arguments:
|
|||
output_location: s3://((DATA_BUCKET))/mnist_kmeans_example/output
|
||||
network_isolation: "True"
|
||||
max_wait_time: 3600
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
|
|||
|
|
@ -7,5 +7,5 @@ Arguments:
|
|||
image: ((KMEANS_REGISTRY)).dkr.ecr.((REGION)).amazonaws.com/kmeans:1
|
||||
model_artifact_url: s3://((DATA_BUCKET))/mnist_kmeans_example/model/kmeans-mnist-model/model.tar.gz
|
||||
network_isolation: "True"
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
||||
|
|
@ -19,6 +19,6 @@ Arguments:
|
|||
instance_type_2: ml.m5.xlarge
|
||||
initial_instance_count_1: 1
|
||||
network_isolation: "True"
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
update_endpoint: "True"
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,19 @@
|
|||
PipelineDefinition: resources/definition/rlestimator_training_job_pipeline.py
TestName: rlestimator-pipeline-training
Timeout: 900
ExpectedTrainingImage: 462105765813.dkr.ecr.((REGION)).amazonaws.com/sagemaker-rl-ray-container:ray-0.8.5-tf-cpu-py36
Arguments:
  region: ((REGION))
  role: ((SAGEMAKER_ROLE_ARN))
  entry_point: "train_news_vendor.py"
  metric_definitions: "[]"
  hyperparameters: "{}"
  source_dir: s3://((DATA_BUCKET))/rlestimator/sourcedir.tar.gz
  max_run: 300
  job_name: rlestimator-test
  model_artifact_path: s3://((DATA_BUCKET))/rlestimator/output/
  instance_count: 1
  instance_type: ml.c5.2xlarge
  framework: tensorflow
  toolkit: ray
  toolkit_version: "0.8.5"
|
||||
|
|
@ -0,0 +1,16 @@
|
|||
PipelineDefinition: resources/definition/robomaker_create_simulation_app_pipeline.py
TestName: robomaker-create-simulation-app-test
Timeout: 300
Arguments:
  region: ((REGION))
  app_name: robomaker-create-simulation-app-test
  sources:
    - s3Bucket: ((DATA_BUCKET))
      s3Key: "robomaker/simulation_ws.tar"
      architecture: "X86_64"
  simulation_software_name: "Gazebo"
  simulation_software_version: "9"
  robot_software_name: "ROS"
  robot_software_version: "Melodic"
  rendering_engine_name: "OGRE"
  rendering_engine_version: "1.x"
|
||||
|
|
@ -0,0 +1,6 @@
|
|||
PipelineDefinition: resources/definition/robomaker_delete_simulation_app_pipeline.py
TestName: robomaker-delete-simulation-app-test
Timeout: 300
Arguments:
  region: ((REGION))
  arn: ""
|
||||
|
|
@ -0,0 +1,22 @@
|
|||
PipelineDefinition: resources/definition/robomaker_simulation_job_pipeline.py
TestName: robomaker-run-simulation-job-test
Timeout: 600
Arguments:
  region: ((REGION))
  output_bucket: ((DATA_BUCKET))
  output_path: "robomaker-output-key"
  max_run: 300
  failure_behavior: "Fail"
  sim_app_arn:
  sim_app_launch_config:
    packageName: "hello_world_simulation"
    launchFile: "empty_world.launch"
    environmentVariables:
      TURTLEBOT3_MODEL: "waffle_pi"
  robot_app_arn:
  robot_app_launch_config:
    packageName: "hello_world_robot"
    launchFile: "rotate.launch"
    environmentVariables:
      TURTLEBOT3_MODEL: "waffle_pi"
  role: ((ROBOMAKER_ROLE_ARN))
|
||||
|
|
@ -29,4 +29,4 @@ Arguments:
|
|||
spot_instance: "False"
|
||||
max_wait_time: 3600
|
||||
checkpoint_config: "{}"
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
|
|||
|
|
@ -22,4 +22,4 @@ Arguments:
|
|||
checkpoint_config:
|
||||
S3Uri: s3://((DATA_BUCKET))/mnist_kmeans_example/train-checkpoints
|
||||
model_artifact_path: s3://((DATA_BUCKET))/mnist_kmeans_example/output
|
||||
role: ((ROLE_ARN))
|
||||
role: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
|
|||
|
|
@ -4,4 +4,4 @@ Timeout: 3600
|
|||
ExpectedTrainingImage: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3
|
||||
Arguments:
|
||||
bucket_name: ((DATA_BUCKET))
|
||||
role_arn: ((ROLE_ARN))
|
||||
role_arn: ((SAGEMAKER_ROLE_ARN))
|
||||
|
|
|
|||
|
|
@ -0,0 +1,51 @@
|
|||
import kfp
from kfp import components
from kfp import dsl

rlestimator_training_job_op = components.load_component_from_file(
    "../../rlestimator/component.yaml"
)


@dsl.pipeline(
    name="RLEstimator Toolkit & Framework Pipeline test",
    description="RLEstimator training job test where the AWS Docker image is auto-selected based on the Toolkit and Framework we define",
)
def rlestimator_training_toolkit_pipeline_test(
    region="",
    entry_point="",
    source_dir="",
    toolkit="",
    toolkit_version="",
    framework="",
    role="",
    instance_type="",
    instance_count="",
    model_artifact_path="",
    job_name="",
    metric_definitions="",
    max_run="",
    hyperparameters="",
):
    rlestimator_training_job_op(
        region=region,
        entry_point=entry_point,
        source_dir=source_dir,
        toolkit=toolkit,
        toolkit_version=toolkit_version,
        framework=framework,
        role=role,
        instance_type=instance_type,
        instance_count=instance_count,
        model_artifact_path=model_artifact_path,
        job_name=job_name,
        metric_definitions=metric_definitions,
        max_run=max_run,
        hyperparameters=hyperparameters,
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(
        rlestimator_training_toolkit_pipeline_test, __file__ + ".zip"
    )
|
||||
|
|
@ -0,0 +1,42 @@
|
|||
import kfp
from kfp import components
from kfp import dsl

robomaker_create_sim_app_op = components.load_component_from_file(
    "../../create_simulation_app/component.yaml"
)


@dsl.pipeline(
    name="RoboMaker Create Simulation App",
    description="RoboMaker Create Simulation App test pipeline",
)
def robomaker_create_simulation_app_test(
    region="",
    app_name="",
    sources="",
    simulation_software_name="",
    simulation_software_version="",
    robot_software_name="",
    robot_software_version="",
    rendering_engine_name="",
    rendering_engine_version="",
):

    robomaker_create_sim_app_op(
        region=region,
        app_name=app_name,
        sources=sources,
        simulation_software_name=simulation_software_name,
        simulation_software_version=simulation_software_version,
        robot_software_name=robot_software_name,
        robot_software_version=robot_software_version,
        rendering_engine_name=rendering_engine_name,
        rendering_engine_version=rendering_engine_version,
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(
        robomaker_create_simulation_app_test, __file__ + ".yaml"
    )
|
||||
|
|
@ -0,0 +1,26 @@
|
|||
import kfp
from kfp import components
from kfp import dsl

robomaker_delete_sim_app_op = components.load_component_from_file(
    "../../delete_simulation_app/component.yaml"
)


@dsl.pipeline(
    name="RoboMaker Delete Simulation App",
    description="RoboMaker Delete Simulation App test pipeline",
)
def robomaker_delete_simulation_app_test(
    region="", arn="",
):

    robomaker_delete_sim_app_op(
        region=region, arn=arn,
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(
        robomaker_delete_simulation_app_test, __file__ + ".yaml"
    )
|
||||
|
|
@ -0,0 +1,42 @@
|
|||
import kfp
from kfp import components
from kfp import dsl

robomaker_sim_job_op = components.load_component_from_file(
    "../../simulation_job/component.yaml"
)


@dsl.pipeline(
    name="Run RoboMaker Simulation Job",
    description="RoboMaker Simulation Job test pipeline",
)
def robomaker_simulation_job_test(
    region="",
    role="",
    output_bucket="",
    output_path="",
    max_run="",
    failure_behavior="",
    sim_app_arn="",
    sim_app_launch_config="",
    robot_app_arn="",
    robot_app_launch_config="",
):

    robomaker_sim_job_op(
        region=region,
        role=role,
        output_bucket=output_bucket,
        output_path=output_path,
        max_run=max_run,
        failure_behavior=failure_behavior,
        sim_app_arn=sim_app_arn,
        sim_app_launch_config=sim_app_launch_config,
        robot_app_arn=robot_app_arn,
        robot_app_launch_config=robot_app_launch_config,
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(robomaker_simulation_job_test, __file__ + ".yaml")
|
||||
|
|
@ -46,6 +46,19 @@ function create_namespaced_iam_role {
|
|||
echo "IAM Role does not exist, creating a new Role for the cluster"
|
||||
aws iam create-role --role-name ${ROLE_NAME} --assume-role-policy-document file://${trust_file_path} --output=text --query "Role.Arn"
|
||||
aws iam attach-role-policy --role-name ${ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
|
||||
aws iam attach-role-policy --role-name ${ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
|
||||
|
||||
printf '{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": "iam:PassRole",
|
||||
"Resource": "*"
|
||||
}
|
||||
]
|
||||
}' > ${assume_role_file}
|
||||
aws iam put-role-policy --role-name ${ROLE_NAME} --policy-name AllowPassRole --policy-document file://${assume_role_file}
|
||||
|
||||
printf '{
|
||||
"Version": "2012-10-17",
|
||||
|
|
|
|||
|
|
@ -34,6 +34,7 @@ PYTEST_MARKER=${PYTEST_MARKER:-""}
|
|||
S3_DATA_BUCKET=${S3_DATA_BUCKET:-""}
|
||||
SAGEMAKER_EXECUTION_ROLE_ARN=${SAGEMAKER_EXECUTION_ROLE_ARN:-""}
|
||||
ASSUMED_ROLE_NAME=${ASSUMED_ROLE_NAME:-""}
|
||||
ROBOMAKER_EXECUTION_ROLE_ARN=${ROBOMAKER_EXECUTION_ROLE_ARN:-""}
|
||||
|
||||
SKIP_FSX_TESTS=${SKIP_FSX_TESTS:-"false"}
|
||||
|
||||
|
|
@ -120,7 +121,7 @@ function delete_eks() {
|
|||
time_unit=m
|
||||
timeout=15
|
||||
retry_interval=5
|
||||
|
||||
|
||||
loop_counter=$timeout
|
||||
while [ "$loop_counter" -gt "0" ]; do
|
||||
eksctl delete cluster --name "$EKS_CLUSTER_NAME" --region "$REGION" --wait
|
||||
|
|
@ -175,7 +176,9 @@ function install_oidc_role() {
|
|||
function delete_oidc_role() {
|
||||
# Delete the role associated with the cluster that's being deleted
|
||||
aws iam detach-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
|
||||
aws iam detach-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
|
||||
aws iam delete-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-name AllowAssumeRole
|
||||
aws iam delete-role-policy --role-name "${OIDC_ROLE_NAME}" --policy-name AllowPassRole
|
||||
aws iam delete-role --role-name "${OIDC_ROLE_NAME}"
|
||||
}
|
||||
|
||||
|
|
@ -202,6 +205,7 @@ function generate_assumed_role() {
|
|||
}' > "${assumed_trust_file}"
|
||||
aws iam create-role --role-name "${ASSUMED_ROLE_NAME}" --assume-role-policy-document file://${assumed_trust_file} --output=text --query "Role.Arn"
|
||||
aws iam attach-role-policy --role-name ${ASSUMED_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
|
||||
aws iam attach-role-policy --role-name ${ASSUMED_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
|
||||
fi
|
||||
|
||||
# Generate the ARN using the role name
|
||||
|
|
@ -213,6 +217,7 @@ function delete_assumed_role() {
|
|||
if [[ ! -z "${ASSUMED_ROLE_NAME}" && "${CREATED_ASSUMED_ROLE:-false}" == "true" ]]; then
|
||||
# Delete the role associated with the cluster that's being deleted
|
||||
aws iam detach-role-policy --role-name "${ASSUMED_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
|
||||
aws iam detach-role-policy --role-name "${ASSUMED_ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AWSRoboMaker_FullAccess
|
||||
aws iam delete-role --role-name "${ASSUMED_ROLE_NAME}"
|
||||
fi
|
||||
}
|
||||
|
|
@ -257,9 +262,10 @@ install_kfp
|
|||
[ "${SKIP_KFP_OIDC_SETUP}" == "false" ] && install_oidc_role
|
||||
generate_assumed_role
|
||||
|
||||
pytest_args=( --region "${REGION}" --role-arn "${SAGEMAKER_EXECUTION_ROLE_ARN}" \
|
||||
pytest_args=( --region "${REGION}" --sagemaker-role-arn "${SAGEMAKER_EXECUTION_ROLE_ARN}" \
|
||||
--s3-data-bucket "${S3_DATA_BUCKET}" --kfp-namespace "${KFP_NAMESPACE}" \
|
||||
--minio-service-port "${MINIO_LOCAL_PORT}" --assume-role-arn "${ASSUMED_ROLE_ARN}")
|
||||
--minio-service-port "${MINIO_LOCAL_PORT}" --assume-role-arn "${ASSUMED_ROLE_ARN}" \
|
||||
--robomaker-role-arn "${ROBOMAKER_EXECUTION_ROLE_ARN}")
|
||||
|
||||
if [[ "${SKIP_FSX_TESTS}" == "true" ]]; then
|
||||
pytest_args+=( -m "not fsx_test" )
|
||||
|
|
|
|||
|
|
@ -1,3 +1,4 @@
|
|||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import pytest
|
||||
|
|
@ -14,8 +15,12 @@ def get_region():
|
|||
return os.environ.get("AWS_REGION")
|
||||
|
||||
|
||||
def get_role_arn():
|
||||
return os.environ.get("ROLE_ARN")
|
||||
def get_sagemaker_role_arn():
|
||||
return os.environ.get("SAGEMAKER_ROLE_ARN")
|
||||
|
||||
|
||||
def get_robomaker_role_arn():
|
||||
return os.environ.get("ROBOMAKER_ROLE_ARN")
|
||||
|
||||
|
||||
def get_s3_data_bucket():
|
||||
|
|
@ -82,7 +87,7 @@ def replace_placeholders(input_filename, output_filename):
|
|||
region = get_region()
|
||||
variables_to_replace = {
|
||||
"((REGION))": region,
|
||||
"((ROLE_ARN))": get_role_arn(),
|
||||
"((SAGEMAKER_ROLE_ARN))": get_sagemaker_role_arn(),
|
||||
"((DATA_BUCKET))": get_s3_data_bucket(),
|
||||
"((KMEANS_REGISTRY))": get_algorithm_image_registry("kmeans", region, "1"),
|
||||
"((XGBOOST_REGISTRY))": get_algorithm_image_registry(
|
||||
|
|
@ -93,6 +98,7 @@ def replace_placeholders(input_filename, output_filename):
|
|||
"((FSX_SUBNET))": get_fsx_subnet(),
|
||||
"((FSX_SECURITY_GROUP))": get_fsx_security_group(),
|
||||
"((ASSUME_ROLE_ARN))": get_assume_role_arn(),
|
||||
"((ROBOMAKER_ROLE_ARN))": get_robomaker_role_arn(),
|
||||
}
|
||||
|
||||
filedata = ""
|
||||
|
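The `variables_to_replace` mapping above pairs each `((NAME))` token used in the test configs with a value, such as the role ARNs read from environment variables. The rest of `replace_placeholders` is not shown in this hunk, so the following is only a sketch of how such a substitution could work; `render_placeholders` and its arguments are hypothetical names, and skipping unset values is an assumption rather than the behaviour of the real function.

```python
def render_placeholders(text, variables):
    """Replace every `((NAME))` token in `text` with its mapped value."""
    for placeholder, value in variables.items():
        if value is None:
            # os.environ.get returns None for unset variables; leave the
            # token in place so the missing value is visible in the output.
            continue
        text = text.replace(placeholder, value)
    return text
```

With this sketch, a config line such as `role: ((ROBOMAKER_ROLE_ARN))` is rewritten only when a value for `((ROBOMAKER_ROLE_ARN))` is available.
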
|
|
|||
|
|
@ -0,0 +1,34 @@
|
|||
def describe_simulation_application(client, sim_app_arn):
    return client.describe_simulation_application(application=sim_app_arn)


def describe_simulation_job(client, sim_job_arn):
    return client.describe_simulation_job(job=sim_job_arn)


def describe_simulation_job_batch(client, batch_job_id):
    return client.describe_simulation_job_batch(batch=batch_job_id)


def delete_simulation_application(client, sim_app_arn):
    return client.delete_simulation_application(application=sim_app_arn)


def cancel_simulation_job(client, sim_job_arn):
    return client.cancel_simulation_job(job=sim_job_arn)


def cancel_simulation_job_batch(client, batch_job_id):
    return client.cancel_simulation_job_batch(batch=batch_job_id)


def list_simulation_applications(client, sim_app_name):
    return client.list_simulation_applications(
        filters=[{"name": "name", "values": [sim_app_name]}]
    )


def create_robot_application(client, app_name, sources, robot_software_suite):
    return client.create_robot_application(
        name=app_name, sources=sources, robotSoftwareSuite=robot_software_suite
    )
|
||||
|
|
@ -1,3 +1,9 @@
|
|||
import logging
|
||||
import re
|
||||
from datetime import datetime
|
||||
from time import sleep
|
||||
|
||||
|
||||
def describe_training_job(client, training_job_name):
|
||||
return client.describe_training_job(TrainingJobName=training_job_name)
|
||||
|
||||
|
|
|
|||
|
|
@ -15,8 +15,9 @@
|
|||
```
|
||||
3. Run all unit tests
|
||||
```
|
||||
docker run -it amazon/unit-test-aws-sagemaker-kfp-components
|
||||
docker run -it -v <path_to_this_repo_on_your_machine>:/app/ amazon/unit-test-aws-sagemaker-kfp-components:latest
|
||||
```
|
||||
This runs the tests against a mounted volume from your host machine. This means you can edit the files and rerun the tests immediately without having to rebuild the docker container.
|
||||
|
||||
--------------
|
||||
|
||||
|
|
@ -37,5 +38,4 @@
|
|||
cd tests/unit_tests/
|
||||
|
||||
./run_unit_tests.sh
|
||||
```
|
||||
|
||||
```
|
||||
|
|
@ -0,0 +1,459 @@
|
|||
from common.sagemaker_component import SageMakerJobStatus
|
||||
from rlestimator.src.sagemaker_rlestimator_spec import SageMakerRLEstimatorSpec
|
||||
from rlestimator.src.sagemaker_rlestimator_component import (
|
||||
SageMakerRLEstimatorComponent,
|
||||
DebugRulesStatus,
|
||||
)
|
||||
from tests.unit_tests.tests.rlestimator.test_rlestimator_spec import (
|
||||
RLEstimatorSpecTestCase,
|
||||
)
|
||||
import unittest
|
||||
|
||||
from unittest.mock import patch, MagicMock
|
||||
|
||||
HAS_ATTR_MESSAGE = "{} should have an attribute {}"
|
||||
HAS_NOT_ATTR_MESSAGE = "{} should not have an attribute {}"
|
||||
ATTR_NOT_NONE_MESSAGE = "{} attribute {} should be None"
|
||||
|
||||
|
||||
class BaseTestCase(unittest.TestCase):
    def assertHasAttr(self, obj, attrname, message=None):
        if not hasattr(obj, attrname):
            if message is not None:
                self.fail(message)
            else:
                self.fail(HAS_ATTR_MESSAGE.format(obj, attrname))

    def assertHasNotAttr(self, obj, attrname, message=None):
        if hasattr(obj, attrname):
            if message is not None:
                self.fail(message)
            else:
                self.fail(HAS_NOT_ATTR_MESSAGE.format(obj, attrname))

    def assertAttrNone(self, obj, attrname, message=None):
        if not hasattr(obj, attrname):
            if message is not None:
                self.fail(message)
            else:
                # The attribute is missing, so report that it should exist.
                self.fail(HAS_ATTR_MESSAGE.format(obj, attrname))
        if getattr(obj, attrname) is not None:
            if message is not None:
                self.fail(message)
            else:
                self.fail(ATTR_NOT_NONE_MESSAGE.format(obj, attrname))
|
||||
|
||||
|
||||
class RLEstimatorComponentTestCase(BaseTestCase):
|
||||
|
||||
CUSTOM_IMAGE_ARGS = RLEstimatorSpecTestCase.CUSTOM_IMAGE_ARGS
|
||||
TOOLKIT_IMAGE_ARGS = RLEstimatorSpecTestCase.TOOLKIT_IMAGE_ARGS
|
||||
|
||||
@classmethod
|
||||
def setUp(cls):
|
||||
cls.component = SageMakerRLEstimatorComponent()
|
||||
# Instantiate without calling Do()
|
||||
cls.component._rlestimator_job_name = "test-job"
|
||||
cls.component._sagemaker_session = MagicMock()
|
||||
|
||||
@patch("rlestimator.src.sagemaker_rlestimator_component.super", MagicMock())
|
||||
def test_do_sets_name(self):
|
||||
named_spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS + ["--job_name", "job-name"]
|
||||
)
|
||||
unnamed_spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
|
||||
|
||||
self.component.Do(named_spec)
|
||||
self.assertEqual("job-name", self.component._rlestimator_job_name)
|
||||
|
||||
with patch(
|
||||
"rlestimator.src.sagemaker_rlestimator_component.SageMakerComponent._generate_unique_timestamped_id",
|
||||
MagicMock(return_value="unique"),
|
||||
):
|
||||
self.component.Do(unnamed_spec)
|
||||
self.assertEqual("unique", self.component._rlestimator_job_name)
|
||||
|
||||
def test_create_rlestimator_custom_job(self):
|
||||
spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
self.assertHasAttr(rlestimator, "image_uri")
|
||||
self.assertHasAttr(rlestimator, "role")
|
||||
self.assertHasAttr(rlestimator, "source_dir")
|
||||
self.assertHasAttr(rlestimator, "entry_point")
|
||||
self.assertHasNotAttr(rlestimator, "toolkit")
|
||||
self.assertHasNotAttr(rlestimator, "toolkit_version")
|
||||
self.assertHasNotAttr(rlestimator, "framework")
|
||||
|
||||
def test_create_rlestimator_toolkit_job(self):
|
||||
spec = SageMakerRLEstimatorSpec(self.TOOLKIT_IMAGE_ARGS)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
self.assertHasAttr(rlestimator, "role")
|
||||
self.assertHasAttr(rlestimator, "source_dir")
|
||||
self.assertHasAttr(rlestimator, "entry_point")
|
||||
self.assertHasAttr(rlestimator, "toolkit")
|
||||
self.assertHasAttr(rlestimator, "toolkit_version")
|
||||
self.assertHasAttr(rlestimator, "framework")
|
||||
self.assertAttrNone(rlestimator, "image_uri")
|
||||
|
||||
def test_get_job_status(self):
|
||||
self.component._sm_client = mock_client = MagicMock()
|
||||
self.component._get_debug_rule_status = MagicMock(
|
||||
return_value=SageMakerJobStatus(
|
||||
is_completed=True, has_error=False, raw_status="Completed"
|
||||
)
|
||||
)
|
||||
|
||||
self.component._sm_client.describe_training_job.return_value = {
|
||||
"TrainingJobStatus": "Starting"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=False, raw_status="Starting"),
|
||||
)
|
||||
|
||||
self.component._sm_client.describe_training_job.return_value = {
|
||||
"TrainingJobStatus": "Downloading"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=False, raw_status="Downloading"),
|
||||
)
|
||||
|
||||
self.component._sm_client.describe_training_job.return_value = {
|
||||
"TrainingJobStatus": "Completed"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=True, raw_status="Completed"),
|
||||
)
|
||||
|
||||
self.component._sm_client.describe_training_job.return_value = {
|
||||
"TrainingJobStatus": "Failed",
|
||||
"FailureReason": "lolidk",
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
raw_status="Failed",
|
||||
has_error=True,
|
||||
error_message="lolidk",
|
||||
),
|
||||
)
|
||||
|
||||
def test_after_job_completed(self):
|
||||
self.component._get_model_artifacts_from_job = MagicMock(return_value="model")
|
||||
self.component._get_image_from_job = MagicMock(return_value="image")
|
||||
|
||||
spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
|
||||
|
||||
self.component._after_job_complete({}, {}, spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(spec.outputs.job_name, "test-job")
|
||||
self.assertEqual(spec.outputs.model_artifact_url, "model")
|
||||
self.assertEqual(spec.outputs.training_image, "image")
|
||||
|
||||
def test_metric_definitions(self):
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS
|
||||
+ [
|
||||
"--metric_definitions",
|
||||
'[ {"Name": "metric1", "Regex": "regexval1"},{"Name": "metric2", "Regex": "regexval2"},]',
|
||||
]
|
||||
)
|
||||
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "metric_definitions"),
|
||||
[
|
||||
{"Name": "metric1", "Regex": "regexval1"},
|
||||
{"Name": "metric2", "Regex": "regexval2"},
|
||||
],
|
||||
)
|
||||
|
||||
def test_no_defined_image(self):
|
||||
# Start from the custom-image args, which satisfy the parser
|
||||
no_image_args = self.CUSTOM_IMAGE_ARGS.copy()
|
||||
image_index = no_image_args.index("--image")
|
||||
# Cut out --image and its associated value
|
||||
no_image_args = no_image_args[:image_index] + no_image_args[image_index + 2 :]
|
||||
|
||||
spec = SageMakerRLEstimatorSpec(no_image_args)
|
||||
|
||||
with self.assertRaises(Exception):
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
def test_valid_hyperparameters(self):
|
||||
hyperparameters_str = '{"hp1": "val1", "hp2": "val2", "hp3": "val3"}'
|
||||
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS + ["--hyperparameters", hyperparameters_str]
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertIn("hp1", getattr(rlestimator, "_hyperparameters"))
|
||||
self.assertIn("hp2", getattr(rlestimator, "_hyperparameters"))
|
||||
self.assertIn("hp3", getattr(rlestimator, "_hyperparameters"))
|
||||
self.assertEqual(getattr(rlestimator, "_hyperparameters")["hp1"], "val1")
|
||||
self.assertEqual(getattr(rlestimator, "_hyperparameters")["hp2"], "val2")
|
||||
self.assertEqual(getattr(rlestimator, "_hyperparameters")["hp3"], "val3")
|
||||
|
||||
def test_empty_hyperparameters(self):
|
||||
hyperparameters_str = "{}"
|
||||
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS + ["--hyperparameters", hyperparameters_str]
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(getattr(rlestimator, "_hyperparameters"), {})
|
||||
|
||||
def test_object_hyperparameters(self):
|
||||
hyperparameters_str = '{"hp1": {"innerkey": "innerval"}}'
|
||||
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS + ["--hyperparameters", hyperparameters_str]
|
||||
)
|
||||
with self.assertRaises(Exception):
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
def test_vpc_configuration(self):
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS
|
||||
+ [
|
||||
"--vpc_security_group_ids",
|
||||
'["sg1", "sg2"]',
|
||||
"--vpc_subnets",
|
||||
'["subnet1", "subnet2"]',
|
||||
]
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertHasAttr(rlestimator, "subnets")
|
||||
self.assertHasAttr(rlestimator, "security_group_ids")
|
||||
self.assertIn("sg1", getattr(rlestimator, "security_group_ids"))
|
||||
self.assertIn("sg2", getattr(rlestimator, "security_group_ids"))
|
||||
self.assertIn("subnet1", getattr(rlestimator, "subnets"))
|
||||
self.assertIn("subnet2", getattr(rlestimator, "subnets"))
|
||||
|
||||
def test_training_mode(self):
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS + ["--training_input_mode", "Pipe"]
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(getattr(rlestimator, "input_mode"), "Pipe")
|
||||
|
||||
def test_wait_for_debug_rules(self):
|
||||
self.component._sm_client = mock_client = MagicMock()
|
||||
mock_client.describe_training_job.side_effect = [
|
||||
{
|
||||
"DebugRuleEvaluationStatuses": [
|
||||
{
|
||||
"RuleConfigurationName": "rule1",
|
||||
"RuleEvaluationStatus": "InProgress",
|
||||
},
|
||||
{
|
||||
"RuleConfigurationName": "rule2",
|
||||
"RuleEvaluationStatus": "InProgress",
|
||||
},
|
||||
]
|
||||
},
|
||||
{
|
||||
"DebugRuleEvaluationStatuses": [
|
||||
{
|
||||
"RuleConfigurationName": "rule1",
|
||||
"RuleEvaluationStatus": "NoIssuesFound",
|
||||
},
|
||||
{
|
||||
"RuleConfigurationName": "rule2",
|
||||
"RuleEvaluationStatus": "InProgress",
|
||||
},
|
||||
]
|
||||
},
|
||||
{
|
||||
"DebugRuleEvaluationStatuses": [
|
||||
{
|
||||
"RuleConfigurationName": "rule1",
|
||||
"RuleEvaluationStatus": "NoIssuesFound",
|
||||
},
|
||||
{
|
||||
"RuleConfigurationName": "rule2",
|
||||
"RuleEvaluationStatus": "IssuesFound",
|
||||
},
|
||||
]
|
||||
},
|
||||
]
|
||||
self.assertEqual(
|
||||
self.component._get_debug_rule_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=False,
|
||||
has_error=False,
|
||||
raw_status=DebugRulesStatus.INPROGRESS,
|
||||
),
|
||||
)
|
||||
self.assertEqual(
|
||||
self.component._get_debug_rule_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=False,
|
||||
has_error=False,
|
||||
raw_status=DebugRulesStatus.INPROGRESS,
|
||||
),
|
||||
)
|
||||
self.assertEqual(
|
||||
self.component._get_debug_rule_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=False,
|
||||
raw_status=DebugRulesStatus.COMPLETED,
|
||||
),
|
||||
)
|
||||
|
||||
def test_wait_for_errored_rule(self):
|
||||
self.component._sm_client = mock_client = MagicMock()
|
||||
mock_client.describe_training_job.side_effect = [
|
||||
{
|
||||
"DebugRuleEvaluationStatuses": [
|
||||
{
|
||||
"RuleConfigurationName": "rule1",
|
||||
"RuleEvaluationStatus": "InProgress",
|
||||
},
|
||||
{
|
||||
"RuleConfigurationName": "rule2",
|
||||
"RuleEvaluationStatus": "InProgress",
|
||||
},
|
||||
]
|
||||
},
|
||||
{
|
||||
"DebugRuleEvaluationStatuses": [
|
||||
{"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
|
||||
{
|
||||
"RuleConfigurationName": "rule2",
|
||||
"RuleEvaluationStatus": "InProgress",
|
||||
},
|
||||
]
|
||||
},
|
||||
{
|
||||
"DebugRuleEvaluationStatuses": [
|
||||
{"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
|
||||
{
|
||||
"RuleConfigurationName": "rule2",
|
||||
"RuleEvaluationStatus": "NoIssuesFound",
|
||||
},
|
||||
]
|
||||
},
|
||||
]
|
||||
self.assertEqual(
|
||||
self.component._get_debug_rule_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=False,
|
||||
has_error=False,
|
||||
raw_status=DebugRulesStatus.INPROGRESS,
|
||||
),
|
||||
)
|
||||
self.assertEqual(
|
||||
self.component._get_debug_rule_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=False,
|
||||
has_error=False,
|
||||
raw_status=DebugRulesStatus.INPROGRESS,
|
||||
),
|
||||
)
|
||||
self.assertEqual(
|
||||
self.component._get_debug_rule_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True, has_error=True, raw_status=DebugRulesStatus.ERRORED
|
||||
),
|
||||
)
|
||||
|
||||
def test_hook_min_args(self):
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS
|
||||
+ ["--debug_hook_config", '{"S3OutputPath": "s3://fake-uri/"}']
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "debugger_hook_config")["S3OutputPath"],
|
||||
"s3://fake-uri/",
|
||||
)
|
||||
|
||||
def test_hook_max_args(self):
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS
|
||||
+ [
|
||||
"--debug_hook_config",
|
||||
'{"S3OutputPath": "s3://fake-uri/", "LocalPath": "/local/path/", "HookParameters": {"key": "value"}, "CollectionConfigurations": [{"CollectionName": "collection1", "CollectionParameters": {"key1": "value1"}}, {"CollectionName": "collection2", "CollectionParameters": {"key2": "value2", "key3": "value3"}}]}',
|
||||
]
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "debugger_hook_config")["S3OutputPath"],
|
||||
"s3://fake-uri/",
|
||||
)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "debugger_hook_config")["LocalPath"], "/local/path/"
|
||||
)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "debugger_hook_config")["HookParameters"],
|
||||
{"key": "value"},
|
||||
)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "debugger_hook_config")["CollectionConfigurations"],
|
||||
[
|
||||
{
|
||||
"CollectionName": "collection1",
|
||||
"CollectionParameters": {"key1": "value1"},
|
||||
},
|
||||
{
|
||||
"CollectionName": "collection2",
|
||||
"CollectionParameters": {"key2": "value2", "key3": "value3"},
|
||||
},
|
||||
],
|
||||
)
|
||||
|
||||
def test_rule_max_args(self):
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS
|
||||
+ [
|
||||
"--debug_rule_config",
|
||||
'[{"InstanceType": "ml.m4.xlarge", "LocalPath": "/local/path/", "RuleConfigurationName": "rule_name", "RuleEvaluatorImage": "test-image", "RuleParameters": {"key1": "value1"}, "S3OutputPath": "s3://fake-uri/", "VolumeSizeInGB": 1}]',
|
||||
]
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
attrs = vars(rlestimator)
|
||||
print(", ".join("%s: %s" % item for item in attrs.items()))
|
||||
print(getattr(rlestimator, "debugger_rule_configs"))
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "rules")[0]["InstanceType"], "ml.m4.xlarge"
|
||||
)
|
||||
self.assertEqual(getattr(rlestimator, "rules")[0]["LocalPath"], "/local/path/")
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "rules")[0]["RuleConfigurationName"], "rule_name"
|
||||
)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "rules")[0]["RuleEvaluatorImage"], "test-image"
|
||||
)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "rules")[0]["RuleParameters"], {"key1": "value1"}
|
||||
)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "rules")[0]["S3OutputPath"], "s3://fake-uri/"
|
||||
)
|
||||
self.assertEqual(getattr(rlestimator, "rules")[0]["VolumeSizeInGB"], 1)
|
||||
|
||||
def test_rule_min_good_args(self):
|
||||
spec = SageMakerRLEstimatorSpec(
|
||||
self.CUSTOM_IMAGE_ARGS
|
||||
+ [
|
||||
"--debug_rule_config",
|
||||
'[{"RuleConfigurationName": "rule_name", "RuleEvaluatorImage": "test-image"}]',
|
||||
]
|
||||
)
|
||||
rlestimator = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "rules")[0]["RuleConfigurationName"], "rule_name"
|
||||
)
|
||||
self.assertEqual(
|
||||
getattr(rlestimator, "rules")[0]["RuleEvaluatorImage"], "test-image"
|
||||
)
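The debug-rule tests above rely on `MagicMock.side_effect` accepting a list, so that each call to `describe_training_job` returns the next canned response. The sketch below shows that polling pattern in isolation. The helper `summarize_rule_statuses` and its string labels are hypothetical: the component's real `_get_debug_rule_status` is not shown in this part of the diff, and this version only mirrors the behaviour the tests expect (any rule in progress wins, then any errored rule, otherwise completed).

```python
from unittest.mock import MagicMock


def summarize_rule_statuses(response):
    """Hypothetical helper: collapse per-rule statuses into one label."""
    statuses = [
        rule["RuleEvaluationStatus"]
        for rule in response.get("DebugRuleEvaluationStatuses", [])
    ]
    if any(status == "InProgress" for status in statuses):
        return "InProgress"
    if any(status == "Error" for status in statuses):
        return "Errored"
    return "Completed"


client = MagicMock()
# With a list as side_effect, each call returns the next item in order.
client.describe_training_job.side_effect = [
    {
        "DebugRuleEvaluationStatuses": [
            {"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "InProgress"},
            {"RuleConfigurationName": "rule2", "RuleEvaluationStatus": "InProgress"},
        ]
    },
    {
        "DebugRuleEvaluationStatuses": [
            {"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
            {"RuleConfigurationName": "rule2", "RuleEvaluationStatus": "InProgress"},
        ]
    },
    {
        "DebugRuleEvaluationStatuses": [
            {"RuleConfigurationName": "rule1", "RuleEvaluationStatus": "Error"},
            {"RuleConfigurationName": "rule2", "RuleEvaluationStatus": "NoIssuesFound"},
        ]
    },
]

# Simulate three consecutive polls of the same training job.
results = [
    summarize_rule_statuses(client.describe_training_job(TrainingJobName="test-job"))
    for _ in range(3)
]
```

Ordering the canned responses this way lets a single test walk through the whole lifecycle of a job without sleeping or touching AWS.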
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
from rlestimator.src.sagemaker_rlestimator_spec import SageMakerRLEstimatorSpec
|
||||
import unittest
|
||||
|
||||
|
||||
class RLEstimatorSpecTestCase(unittest.TestCase):
|
||||
CUSTOM_IMAGE_ARGS = [
|
||||
"--region",
|
||||
"us-east-1",
|
||||
"--entry_point",
|
||||
"train-unity.py",
|
||||
"--source_dir",
|
||||
"s3://input_bucket_name/input_key",
|
||||
"--role",
|
||||
"arn:aws:iam::123456789012:user/Development/product_1234/*",
|
||||
"--image",
|
||||
"test-image",
|
||||
"--instance_type",
|
||||
"ml.m4.xlarge",
|
||||
"--instance_count",
|
||||
"1",
|
||||
"--volume_size",
|
||||
"50",
|
||||
"--max_run",
|
||||
"900",
|
||||
"--model_artifact_path",
|
||||
"test-path",
|
||||
]
|
||||
|
||||
TOOLKIT_IMAGE_ARGS = [
|
||||
"--region",
|
||||
"us-east-1",
|
||||
"--entry_point",
|
||||
"train-unity.py",
|
||||
"--source_dir",
|
||||
"s3://input_bucket_name/input_key",
|
||||
"--role",
|
||||
"arn:aws:iam::123456789012:user/Development/product_1234/*",
|
||||
"--toolkit",
|
||||
"ray",
|
||||
"--toolkit_version",
|
||||
"0.8.5",
|
||||
"--framework",
|
||||
"tensorflow",
|
||||
"--instance_type",
|
||||
"ml.m4.xlarge",
|
||||
"--instance_count",
|
||||
"1",
|
||||
"--volume_size",
|
||||
"50",
|
||||
"--max_run",
|
||||
"900",
|
||||
"--model_artifact_path",
|
||||
"test-path",
|
||||
]
|
||||
|
||||
def test_custom_image_args(self):
|
||||
# Will raise if the inputs are incorrect
|
||||
spec = SageMakerRLEstimatorSpec(self.CUSTOM_IMAGE_ARGS)
|
||||
|
||||
def test_toolkit_image_args(self):
|
||||
# Will raise if the inputs are incorrect
|
||||
spec = SageMakerRLEstimatorSpec(self.TOOLKIT_IMAGE_ARGS)
|
||||
|
|
@ -0,0 +1,131 @@
|
|||
from common.sagemaker_component import SageMakerJobStatus
|
||||
from create_simulation_app.src.robomaker_create_simulation_app_spec import (
|
||||
RoboMakerCreateSimulationAppSpec,
|
||||
)
|
||||
from create_simulation_app.src.robomaker_create_simulation_app_component import (
|
||||
RoboMakerCreateSimulationAppComponent,
|
||||
)
|
||||
from tests.unit_tests.tests.robomaker.test_robomaker_create_sim_app_spec import (
|
||||
RoboMakerCreateSimAppSpecTestCase,
|
||||
)
|
||||
import unittest
|
||||
|
||||
from unittest.mock import patch, MagicMock
|
||||
|
||||
|
||||
class RoboMakerCreateSimAppTestCase(unittest.TestCase):
|
||||
REQUIRED_ARGS = RoboMakerCreateSimAppSpecTestCase.REQUIRED_ARGS
|
||||
|
||||
@classmethod
|
||||
def setUp(cls):
|
||||
cls.component = RoboMakerCreateSimulationAppComponent()
|
||||
# Instantiate without calling Do()
|
||||
cls.component._app_name = "test-app"
|
||||
|
||||
@patch(
|
||||
"create_simulation_app.src.robomaker_create_simulation_app_component.super",
|
||||
MagicMock(),
|
||||
)
|
||||
def test_do_sets_name(self):
|
||||
named_spec = RoboMakerCreateSimulationAppSpec(
|
||||
self.REQUIRED_ARGS + ["--app_name", "my-app-name"]
|
||||
)
|
||||
|
||||
self.component.Do(named_spec)
|
||||
self.assertEqual("my-app-name", self.component._app_name)
|
||||
|
||||
def test_create_simulation_application_request(self):
|
||||
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
request = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(
|
||||
request,
|
||||
{
|
||||
"name": "test-app",
|
||||
"renderingEngine": {
|
||||
"name": "rendering_engine_name",
|
||||
"version": "rendering_engine_version",
|
||||
},
|
||||
"robotSoftwareSuite": {
|
||||
"name": "robot_software_name",
|
||||
"version": "robot_software_version",
|
||||
},
|
||||
"simulationSoftwareSuite": {
|
||||
"name": "simulation_software_name",
|
||||
"version": "simulation_software_version",
|
||||
},
|
||||
"sources": [
|
||||
{
|
||||
"architecture": "X86_64",
|
||||
"s3Bucket": "sources_bucket",
|
||||
"s3Key": "sources_key",
|
||||
}
|
||||
],
|
||||
"tags": {},
|
||||
},
|
||||
)
|
||||
|
||||
def test_missing_required_input(self):
|
||||
missing_input = self.REQUIRED_ARGS.copy()
|
||||
missing_input.remove("--app_name")
|
||||
missing_input.remove("app-name")
|
||||
|
||||
with self.assertRaises(SystemExit):
|
||||
spec = RoboMakerCreateSimulationAppSpec(missing_input)
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
def test_get_job_status(self):
|
||||
self.component._rm_client = MagicMock()
|
||||
self.component._arn = "cool-arn"
|
||||
|
||||
self.component._rm_client.describe_simulation_application.return_value = {
|
||||
"arn": None
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
has_error=True,
|
||||
error_message="No ARN present",
|
||||
raw_status=None,
|
||||
),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_application.return_value = {
|
||||
"arn": "arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
raw_status="arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
|
||||
),
|
||||
)
|
||||
|
||||
@patch(
|
||||
"create_simulation_app.src.robomaker_create_simulation_app_component.logging"
|
||||
)
|
||||
def test_after_submit_job_request(self, mock_logging):
|
||||
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
self.component._after_submit_job_request(
|
||||
{"arn": "cool-arn"}, {}, spec.inputs, spec.outputs
|
||||
)
|
||||
mock_logging.info.assert_called_once()
|
||||
|
||||
def test_after_job_completed(self):
|
||||
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
|
||||
mock_job_response = {
|
||||
"arn": "arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
|
||||
"version": "latest",
|
||||
"revisionId": "ee753e53-519c-4d37-895d-65e79bcd1914",
|
||||
"tags": {},
|
||||
}
|
||||
self.component._after_job_complete(
|
||||
mock_job_response, {}, spec.inputs, spec.outputs
|
||||
)
|
||||
|
||||
self.assertEqual(
|
||||
spec.outputs.arn,
|
||||
"arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
|
||||
)
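`test_missing_required_input` above expects `SystemExit` rather than a regular exception. That is the behaviour of `argparse` when a required argument is absent. The sketch below assumes the spec classes parse their inputs with `argparse`, which this part of the diff does not show; the parser and its two arguments are stand-ins for illustration.

```python
import argparse

# Stand-in parser; the real spec classes define many more inputs.
parser = argparse.ArgumentParser()
parser.add_argument("--region", required=True)
parser.add_argument("--app_name", required=True)

try:
    # --app_name is missing, so argparse prints usage to stderr and exits.
    parser.parse_args(["--region", "us-west-2"])
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code
```

Since `SystemExit` does not derive from `Exception`, a test that wants to catch this case has to name it explicitly, as the test above does with `assertRaises(SystemExit)`.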
|
||||
|
|
@ -0,0 +1,32 @@
|
|||
from create_simulation_app.src.robomaker_create_simulation_app_spec import (
|
||||
RoboMakerCreateSimulationAppSpec,
|
||||
)
|
||||
import unittest
|
||||
|
||||
|
||||
class RoboMakerCreateSimAppSpecTestCase(unittest.TestCase):
|
||||
|
||||
REQUIRED_ARGS = [
|
||||
"--region",
|
||||
"us-west-2",
|
||||
"--app_name",
|
||||
"app-name",
|
||||
"--sources",
|
||||
'[{"s3Bucket": "sources_bucket", "s3Key": "sources_key", "architecture": "X86_64"}]',
|
||||
"--simulation_software_name",
|
||||
"simulation_software_name",
|
||||
"--simulation_software_version",
|
||||
"simulation_software_version",
|
||||
"--robot_software_name",
|
||||
"robot_software_name",
|
||||
"--robot_software_version",
|
||||
"robot_software_version",
|
||||
"--rendering_engine_name",
|
||||
"rendering_engine_name",
|
||||
"--rendering_engine_version",
|
||||
"rendering_engine_version",
|
||||
]
|
||||
|
||||
def test_minimum_required_args(self):
|
||||
# Will raise if the inputs are incorrect
|
||||
spec = RoboMakerCreateSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
|
|
@ -0,0 +1,97 @@
|
|||
from common.sagemaker_component import SageMakerJobStatus
|
||||
from delete_simulation_app.src.robomaker_delete_simulation_app_spec import (
|
||||
RoboMakerDeleteSimulationAppSpec,
|
||||
)
|
||||
from delete_simulation_app.src.robomaker_delete_simulation_app_component import (
|
||||
RoboMakerDeleteSimulationAppComponent,
|
||||
)
|
||||
from tests.unit_tests.tests.robomaker.test_robomaker_delete_sim_app_spec import (
|
||||
RoboMakerDeleteSimAppSpecTestCase,
|
||||
)
|
||||
import unittest
|
||||
import json
|
||||
|
||||
from unittest.mock import patch, MagicMock
|
||||
|
||||
|
||||
class RoboMakerDeleteSimAppTestCase(unittest.TestCase):
|
||||
REQUIRED_ARGS = RoboMakerDeleteSimAppSpecTestCase.REQUIRED_ARGS
|
||||
|
||||
@classmethod
|
||||
def setUp(cls):
|
||||
cls.component = RoboMakerDeleteSimulationAppComponent()
|
||||
# Instantiate without calling Do()
|
||||
cls.component._arn = "cool-arn"
|
||||
|
||||
@patch(
|
||||
"delete_simulation_app.src.robomaker_delete_simulation_app_component.super",
|
||||
MagicMock(),
|
||||
)
|
||||
def test_do_sets_version(self):
|
||||
named_spec = RoboMakerDeleteSimulationAppSpec(
|
||||
self.REQUIRED_ARGS + ["--version", "cool-version"]
|
||||
)
|
||||
|
||||
self.component.Do(named_spec)
|
||||
self.assertEqual("cool-version", self.component._version)
|
||||
|
||||
def test_delete_simulation_application_request(self):
|
||||
spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
request = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(request, {"application": "cool-arn",})
|
||||
|
||||
def test_missing_required_input(self):
|
||||
missing_input = self.REQUIRED_ARGS.copy()
|
||||
missing_input.remove("--arn")
|
||||
missing_input.remove("cool-arn")
|
||||
|
||||
with self.assertRaises(SystemExit):
|
||||
spec = RoboMakerDeleteSimulationAppSpec(missing_input)
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
def test_get_job_status(self):
|
||||
self.component._rm_client = MagicMock()
|
||||
self.component._arn = "cool-arn"
|
||||
|
||||
self.component._rm_client.describe_simulation_application.return_value = {
|
||||
"arn": None
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=True, raw_status="Item deleted"),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_application.return_value = {
|
||||
"arn": "arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=False,
|
||||
raw_status="arn:aws:robomaker:us-west-2:111111111111:simulation-application/MyRobotApplication/1551203301792",
|
||||
),
|
||||
)
|
||||
|
||||
@patch(
|
||||
"delete_simulation_app.src.robomaker_delete_simulation_app_component.logging"
|
||||
)
|
||||
def test_after_submit_job_request(self, mock_logging):
|
||||
spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
self.component._after_submit_job_request(
|
||||
{"arn": "cool-arn"}, {}, spec.inputs, spec.outputs
|
||||
)
|
||||
mock_logging.info.assert_called_once()
|
||||
|
||||
def test_after_job_completed(self):
|
||||
spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
|
||||
mock_job_response = {}
|
||||
self.component._version = "cool-version"
|
||||
self.component._after_job_complete(
|
||||
mock_job_response, {}, spec.inputs, spec.outputs
|
||||
)
|
||||
|
||||
# We expect the output ARN to be the initial value set for arn in REQUIRED_ARGS.
|
||||
# The API response to the delete call is always empty or None.
|
||||
self.assertEqual(spec.outputs.arn, "cool-arn")
|
||||
|
|
@ -0,0 +1,18 @@
|
|||
from delete_simulation_app.src.robomaker_delete_simulation_app_spec import (
    RoboMakerDeleteSimulationAppSpec,
)
import unittest


class RoboMakerDeleteSimAppSpecTestCase(unittest.TestCase):

    REQUIRED_ARGS = [
        "--region",
        "us-east-1",
        "--arn",
        "cool-arn",
    ]

    def test_minimum_required_args(self):
        # Will raise if the inputs are incorrect
        spec = RoboMakerDeleteSimulationAppSpec(self.REQUIRED_ARGS)
|
||||
|
|
@ -0,0 +1,159 @@
|
|||
from yaml.parser import ParserError
|
||||
|
||||
from common.sagemaker_component import SageMakerJobStatus
|
||||
from simulation_job_batch.src.robomaker_simulation_job_batch_spec import (
|
||||
RoboMakerSimulationJobBatchSpec,
|
||||
)
|
||||
from simulation_job_batch.src.robomaker_simulation_job_batch_component import (
|
||||
RoboMakerSimulationJobBatchComponent,
|
||||
)
|
||||
from tests.unit_tests.tests.robomaker.test_robomaker_simulation_job_batch_spec import (
|
||||
RoboMakerSimulationJobBatchSpecTestCase,
|
||||
)
|
||||
import unittest
|
||||
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
|
||||
class RoboMakerSimulationJobTestCase(unittest.TestCase):
|
||||
REQUIRED_ARGS = RoboMakerSimulationJobBatchSpecTestCase.REQUIRED_ARGS
|
||||
|
||||
@classmethod
|
||||
def setUp(cls):
|
||||
cls.component = RoboMakerSimulationJobBatchComponent()
|
||||
cls.component._arn = "fake-arn"
|
||||
cls.component._batch_job_id = "fake-id"
|
||||
cls.component._sim_request_ids = set()
|
||||
|
||||
def test_create_simulation_batch_job(self):
|
||||
spec = RoboMakerSimulationJobBatchSpec(self.REQUIRED_ARGS)
|
||||
request = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(
|
||||
request,
|
||||
{
|
||||
"batchPolicy": {"maxConcurrency": 3, "timeoutInSeconds": 5800},
|
||||
"createSimulationJobRequests": [
|
||||
{
|
||||
"dataSources": {
|
||||
"name": "data-source-name",
|
||||
"s3Bucket": "data-source-bucket",
|
||||
"s3Keys": [{"s3Key": "data-source-key"}],
|
||||
},
|
||||
"failureBehavior": "Fail",
|
||||
"iamRole": "TestRole",
|
||||
"loggingConfig": {"recordAllRosTopics": "True"},
|
||||
"maxJobDurationInSeconds": "123",
|
||||
"outputLocation": {
|
||||
"s3Bucket": "fake-bucket",
|
||||
"s3Prefix": "fake-key",
|
||||
},
|
||||
"simulationApplications": [
|
||||
{
|
||||
"application": "test-arn",
|
||||
"applicationVersion": "1",
|
||||
"launchConfig": {
|
||||
"environmentVariables": {"Env": "var"},
|
||||
"launchFile": "launch-file.py",
|
||||
"packageName": "package-name",
|
||||
"portForwardingConfig": {
|
||||
"portMappings": [
|
||||
{
|
||||
"applicationPort": "123",
|
||||
"enableOnPublicIp": "True",
|
||||
"jobPort": "123",
|
||||
}
|
||||
]
|
||||
},
|
||||
"streamUI": "True",
|
||||
},
|
||||
}
|
||||
],
|
||||
}
|
||||
],
|
||||
"tags": {},
|
||||
},
|
||||
)
|
||||
|
||||
def test_get_job_status(self):
|
||||
self.component._rm_client = MagicMock()
|
||||
|
||||
self.component._rm_client.describe_simulation_job_batch.return_value = {
|
||||
"status": "Starting"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=False, raw_status="Starting"),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_job_batch.return_value = {
|
||||
"status": "Downloading"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=False, raw_status="Downloading"),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_job_batch.return_value = {
|
||||
"status": "Completed",
|
||||
"createdRequests": [{"status": "Completed",}],
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=True, raw_status="Completed"),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_job_batch.return_value = {
|
||||
"status": "Canceled",
|
||||
"createdRequests": [{"status": "Failed", "arn": "fake-arn"}],
|
||||
}
|
||||
self.component._rm_client.describe_simulation_job.return_value = {
|
||||
"status": "Failed",
|
||||
"arn": "fake-arn",
|
||||
"failureCode": "InternalServiceError",
|
||||
"failureReason": "Big Reason",
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
raw_status="Canceled",
|
||||
has_error=True,
|
||||
error_message="Simulation jobs are completed\nSimulation job: fake-arn failed with errorCode:InternalServiceError\n",
|
||||
),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_job_batch.return_value = {
|
||||
"status": "Failed",
|
||||
"failureCode": "InternalServiceError",
|
||||
"failureReason": "Big Reason",
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
raw_status="Failed",
|
||||
has_error=True,
|
||||
error_message="Simulation batch job is in status:Failed\nSimulation failed with reason:Big ReasonSimulation failed with errorCode:InternalServiceError",
|
||||
),
|
||||
)
|
||||
|
||||
def test_no_simulation_job_requests(self):
|
||||
no_job_requests = self.REQUIRED_ARGS.copy()
|
||||
no_job_requests = no_job_requests[
|
||||
: no_job_requests.index("--simulation_job_requests")
|
||||
]
|
||||
|
||||
with self.assertRaises(SystemExit):
|
||||
spec = RoboMakerSimulationJobBatchSpec(no_job_requests)
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
def test_empty_simulation_job_requests(self):
|
||||
empty_job_requests = self.REQUIRED_ARGS.copy()
|
||||
empty_job_requests[-1:] = "[]"
|
||||
|
||||
print(empty_job_requests)
|
||||
|
||||
with self.assertRaises(ParserError):
|
||||
spec = RoboMakerSimulationJobBatchSpec(empty_job_requests)
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
|
@ -0,0 +1,64 @@
|
|||
from simulation_job_batch.src.robomaker_simulation_job_batch_spec import (
|
||||
RoboMakerSimulationJobBatchSpec,
|
||||
)
|
||||
import unittest
|
||||
import json
|
||||
|
||||
|
||||
class RoboMakerSimulationJobBatchSpecTestCase(unittest.TestCase):
|
||||
|
||||
REQUIRED_ARGS = [
|
||||
"--region",
|
||||
"us-west-2",
|
||||
"--role",
|
||||
"role-arn",
|
||||
"--timeout_in_secs",
|
||||
"5800",
|
||||
"--max_concurrency",
|
||||
"3",
|
||||
"--simulation_job_requests",
|
||||
json.dumps(
|
||||
[
|
||||
{
|
||||
"outputLocation": {
|
||||
"s3Bucket": "fake-bucket",
|
||||
"s3Prefix": "fake-key",
|
||||
},
|
||||
"loggingConfig": {"recordAllRosTopics": "True"},
|
||||
"maxJobDurationInSeconds": "123",
|
||||
"iamRole": "TestRole",
|
||||
"failureBehavior": "Fail",
|
||||
"simulationApplications": [
|
||||
{
|
||||
"application": "test-arn",
|
||||
"applicationVersion": "1",
|
||||
"launchConfig": {
|
||||
"packageName": "package-name",
|
||||
"launchFile": "launch-file.py",
|
||||
"environmentVariables": {"Env": "var",},
|
||||
"portForwardingConfig": {
|
||||
"portMappings": [
|
||||
{
|
||||
"jobPort": "123",
|
||||
"applicationPort": "123",
|
||||
"enableOnPublicIp": "True",
|
||||
}
|
||||
]
|
||||
},
|
||||
"streamUI": "True",
|
||||
},
|
||||
}
|
||||
],
|
||||
"dataSources": {
|
||||
"name": "data-source-name",
|
||||
"s3Bucket": "data-source-bucket",
|
||||
"s3Keys": [{"s3Key": "data-source-key",}],
|
||||
},
|
||||
}
|
||||
]
|
||||
),
|
||||
]
|
||||
|
||||
def test_minimum_required_args(self):
|
||||
# Will raise if the inputs are incorrect
|
||||
spec = RoboMakerSimulationJobBatchSpec(self.REQUIRED_ARGS)
|
||||
|
|
@ -0,0 +1,148 @@
|
|||
from common.sagemaker_component import SageMakerJobStatus
|
||||
from simulation_job.src.robomaker_simulation_job_spec import RoboMakerSimulationJobSpec
|
||||
from simulation_job.src.robomaker_simulation_job_component import (
|
||||
RoboMakerSimulationJobComponent,
|
||||
)
|
||||
from tests.unit_tests.tests.robomaker.test_robomaker_simulation_job_spec import (
|
||||
RoboMakerSimulationJobSpecTestCase,
|
||||
)
|
||||
import unittest
|
||||
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
|
||||
class RoboMakerSimulationJobTestCase(unittest.TestCase):
|
||||
REQUIRED_ARGS = RoboMakerSimulationJobSpecTestCase.REQUIRED_ARGS
|
||||
|
||||
@classmethod
|
||||
def setUp(cls):
|
||||
cls.component = RoboMakerSimulationJobComponent()
|
||||
cls.component._arn = "fake-arn"
|
||||
cls.component._job_id = "fake-id"
|
||||
|
||||
def test_create_simulation_job(self):
|
||||
spec = RoboMakerSimulationJobSpec(self.REQUIRED_ARGS)
|
||||
request = self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(
|
||||
request,
|
||||
{
|
||||
"outputLocation": {
|
||||
"s3Bucket": "output-bucket-name",
|
||||
"s3Prefix": "output-bucket-key",
|
||||
},
|
||||
"maxJobDurationInSeconds": 900,
|
||||
"iamRole": "role-arn",
|
||||
"failureBehavior": "Fail",
|
||||
"simulationApplications": [
|
||||
{
|
||||
"application": "simulation_app_arn",
|
||||
"launchConfig": {
|
||||
"environmentVariables": {"Env": "var"},
|
||||
"launchFile": "launch-file.py",
|
||||
"packageName": "package-name",
|
||||
"portForwardingConfig": {
|
||||
"portMappings": [
|
||||
{
|
||||
"applicationPort": "123",
|
||||
"enableOnPublicIp": "True",
|
||||
"jobPort": "123",
|
||||
}
|
||||
]
|
||||
},
|
||||
"streamUI": "True",
|
||||
},
|
||||
}
|
||||
],
|
||||
"dataSources": [
|
||||
{
|
||||
"name": "data-source-name",
|
||||
"s3Bucket": "data-source-bucket",
|
||||
"s3Keys": [{"s3Key": "data-source-key"}],
|
||||
}
|
||||
],
|
||||
"compute": {"simulationUnitLimit": 15},
|
||||
"tags": {},
|
||||
},
|
||||
)
|
||||
|
||||
def test_get_job_status(self):
|
||||
self.component._rm_client = MagicMock()
|
||||
|
||||
self.component._rm_client.describe_simulation_job.return_value = {
|
||||
"status": "Starting"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=False, raw_status="Starting"),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_job.return_value = {
|
||||
"status": "Downloading"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=False, raw_status="Downloading"),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_job.return_value = {
|
||||
"status": "Completed"
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(is_completed=True, raw_status="Completed"),
|
||||
)
|
||||
|
||||
self.component._rm_client.describe_simulation_job.return_value = {
|
||||
"status": "Failed",
|
||||
"failureCode": "InternalServiceError",
|
||||
"failureReason": "Big Reason",
|
||||
}
|
||||
self.assertEqual(
|
||||
self.component._get_job_status(),
|
||||
SageMakerJobStatus(
|
||||
is_completed=True,
|
||||
raw_status="Failed",
|
||||
has_error=True,
|
||||
error_message="Simulation job is in status:Failed\nSimulation failed with reason:Big ReasonSimulation failed with errorCode:InternalServiceError",
|
||||
),
|
||||
)
|
||||
|
||||
def test_after_job_completed(self):
|
||||
spec = RoboMakerSimulationJobSpec(self.REQUIRED_ARGS)
|
||||
|
||||
mock_out = "s3://cool-bucket/fake-key"
|
||||
self.component._get_job_outputs = MagicMock(return_value=mock_out)
|
||||
|
||||
self.component._after_job_complete({}, {}, spec.inputs, spec.outputs)
|
||||
|
||||
self.assertEqual(spec.outputs.output_artifacts, mock_out)
|
||||
|
||||
def test_get_job_outputs(self):
|
||||
self.component._rm_client = mock_client = MagicMock()
|
||||
mock_client.describe_simulation_job.return_value = {
|
||||
"outputLocation": {"s3Bucket": "cool-bucket", "s3Prefix": "fake-key",}
|
||||
}
|
||||
|
||||
self.assertEqual(
|
||||
self.component._get_job_outputs(), "s3://cool-bucket/fake-key",
|
||||
)
|
||||
|
||||
def test_no_simulation_app_defined(self):
|
||||
no_sim_app = self.REQUIRED_ARGS.copy()
|
||||
no_sim_app.remove("--sim_app_arn")
|
||||
no_sim_app.remove("simulation_app_arn")
|
||||
|
||||
with self.assertRaises(Exception):
|
||||
spec = RoboMakerSimulationJobSpec(no_sim_app)
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
||||
def test_no_launch_config_defined(self):
|
||||
no_launch_config = self.REQUIRED_ARGS.copy()
|
||||
no_launch_config = no_launch_config[
|
||||
: no_launch_config.index("--sim_app_launch_config")
|
||||
]
|
||||
|
||||
with self.assertRaises(Exception):
|
||||
spec = RoboMakerSimulationJobSpec(no_launch_config)
|
||||
self.component._create_job_request(spec.inputs, spec.outputs)
|
||||
|
|
@ -0,0 +1,55 @@
|
|||
from simulation_job.src.robomaker_simulation_job_spec import RoboMakerSimulationJobSpec
|
||||
import unittest
|
||||
import json
|
||||
|
||||
|
||||
class RoboMakerSimulationJobSpecTestCase(unittest.TestCase):
|
||||
|
||||
REQUIRED_ARGS = [
|
||||
"--region",
|
||||
"us-west-2",
|
||||
"--role",
|
||||
"role-arn",
|
||||
"--output_bucket",
|
||||
"output-bucket-name",
|
||||
"--output_path",
|
||||
"output-bucket-key",
|
||||
"--max_run",
|
||||
"900",
|
||||
"--data_sources",
|
||||
json.dumps(
|
||||
[
|
||||
{
|
||||
"name": "data-source-name",
|
||||
"s3Bucket": "data-source-bucket",
|
||||
"s3Keys": [{"s3Key": "data-source-key",}],
|
||||
}
|
||||
]
|
||||
),
|
||||
"--sim_app_arn",
|
||||
"simulation_app_arn",
|
||||
"--sim_app_version",
|
||||
"1",
|
||||
"--sim_app_launch_config",
|
||||
json.dumps(
|
||||
{
|
||||
"packageName": "package-name",
|
||||
"launchFile": "launch-file.py",
|
||||
"environmentVariables": {"Env": "var",},
|
||||
"portForwardingConfig": {
|
||||
"portMappings": [
|
||||
{
|
||||
"jobPort": "123",
|
||||
"applicationPort": "123",
|
||||
"enableOnPublicIp": "True",
|
||||
}
|
||||
]
|
||||
},
|
||||
"streamUI": "True",
|
||||
}
|
||||
),
|
||||
]
|
||||
|
||||
def test_minimum_required_args(self):
|
||||
# Will raise if the inputs are incorrect
|
||||
spec = RoboMakerSimulationJobSpec(self.REQUIRED_ARGS)
|
||||
|
|
@ -36,8 +36,8 @@ inputs:
|
|||
channels. Must have at least one.}
|
||||
- {name: instance_type, type: String, description: The ML compute instance type.,
|
||||
default: ml.m4.xlarge}
|
||||
- {name: instance_count, type: Integer, description: The registry path of the Docker
|
||||
image that contains the training algorithm., default: '1'}
|
||||
- {name: instance_count, type: Integer, description: The number of ML compute instances
|
||||
to use in the training job., default: '1'}
|
||||
- {name: volume_size, type: Integer, description: The size of the ML storage volume
|
||||
that you want to provision., default: '30'}
|
||||
- {name: resource_encryption_key, type: String, description: The AWS KMS key that
|
||||
|
|
@ -72,7 +72,7 @@ outputs:
|
|||
the training algorithm.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- train/src/sagemaker_training_component.py
|
||||
|
|
|
|||
|
|
@ -26,28 +26,10 @@ from common.sagemaker_component import (
|
|||
SageMakerComponent,
|
||||
ComponentMetadata,
|
||||
SageMakerJobStatus,
|
||||
DebugRulesStatus,
|
||||
)
|
||||
|
||||
|
||||
class DebugRulesStatus(Enum):
|
||||
COMPLETED = auto()
|
||||
ERRORED = auto()
|
||||
INPROGRESS = auto()
|
||||
|
||||
@classmethod
|
||||
def from_describe(cls, response):
|
||||
has_error = False
|
||||
for debug_rule in response["DebugRuleEvaluationStatuses"]:
|
||||
if debug_rule["RuleEvaluationStatus"] == "Error":
|
||||
has_error = True
|
||||
if debug_rule["RuleEvaluationStatus"] == "InProgress":
|
||||
return DebugRulesStatus.INPROGRESS
|
||||
if has_error:
|
||||
return DebugRulesStatus.ERRORED
|
||||
else:
|
||||
return DebugRulesStatus.COMPLETED
|
||||
|
||||
|
||||
@ComponentMetadata(
|
||||
name="SageMaker - Training Job",
|
||||
description="Train Machine Learning and Deep Learning Models using SageMaker",
|
||||
|
|
|
|||
|
|
@ -118,7 +118,7 @@ class SageMakerTrainingSpec(
|
|||
instance_count=InputValidator(
|
||||
required=True,
|
||||
input_type=int,
|
||||
description="The registry path of the Docker image that contains the training algorithm.",
|
||||
description="The number of ML compute instances to use in the training job.",
|
||||
default=1,
|
||||
),
|
||||
volume_size=InputValidator(
|
||||
|
|
|
|||
|
|
@ -22,7 +22,7 @@ outputs:
|
|||
- {name: workteam_arn, description: The ARN of the workteam.}
|
||||
implementation:
|
||||
container:
|
||||
image: amazon/aws-sagemaker-kfp-components:1.0.0
|
||||
image: amazon/aws-sagemaker-kfp-components:1.1.0
|
||||
command: [python3]
|
||||
args:
|
||||
- workteam/src/sagemaker_workteam_component.py
|
||||
|
|
|
|||
|
|
@ -0,0 +1,72 @@
|
|||
The two examples in this directory each run a different type of RLEstimator Reinforcement Learning job as a SageMaker training job.
|
||||
|
||||
## Examples
|
||||
|
||||
Each example is based on a notebook from the [AWS SageMaker Examples](https://github.com/aws/amazon-sagemaker-examples) repo.
|
||||
(Note that all of these examples are available by default on all SageMaker Notebook Instances.)
|
||||
|
||||
The `rlestimator_pipeline_custom_image` pipeline example is based on the
|
||||
[`rl_unity_ray`](https://github.com/aws/amazon-sagemaker-examples/blob/master/reinforcement_learning/rl_unity_ray/rl_unity_ray.ipynb) notebook.
|
||||
|
||||
The `rlestimator_pipeline_toolkit_image` pipeline example is based on the
|
||||
[`rl_news_vendor_ray_custom`](https://github.com/aws/amazon-sagemaker-examples/blob/master/reinforcement_learning/rl_resource_allocation_ray_customEnv/rl_news_vendor_ray_custom.ipynb) notebook.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
To run these examples you will need to create a number of resources that will then be used as inputs for the pipeline component.
|
||||
|
||||
rlestimator_pipeline_custom_image required inputs:
|
||||
```
|
||||
output_bucket_name = <bucket used for outputs from the training job>
|
||||
input_bucket_name = <bucket used for inputs, in this case custom code via a tar.gz>
|
||||
input_key = <the path and file name of the source code tar.gz>
|
||||
job_name_prefix = <not required, but can be useful to identify these training jobs>
|
||||
image_uri = <docker image uri, can be docker.io if you have internet access, but might be easier to use ECR>
|
||||
assume_role = <sagemaker execution role, this is created for you automatically when you launch a notebook instance>
|
||||
```
|
||||
|
||||
rlestimator_pipeline_toolkit_image required inputs:
|
||||
```
|
||||
output_bucket_name = <bucket used for outputs from the training job>
|
||||
input_bucket_name = <bucket used for inputs, in this case custom code via a tar.gz>
|
||||
input_key = <the path and file name of the source code tar.gz>
|
||||
job_name_prefix = <not required, but can be useful to identify these training jobs>
|
||||
role = <sagemaker execution role, this is created for you automatically when you launch a notebook instance>
|
||||
```
|
||||
|
||||
You could go to the bother of creating all of these resources individually, but it might be easier to run each of the notebooks
|
||||
mentioned above, and then use the resources that are created by the notebooks. For the input bucket and output bucket they
|
||||
will be created under a name like 'sagemaker-us-east-1-520713654638' depending on your region and account number. Within
|
||||
these buckets a key will be created for each of your training job runs. After you have executed all cells in each of the notebooks
|
||||
a key for each training job that has completed will be made and any custom code required for the training job will be placed
|
||||
there as a .tar.gz file. The full S3 URI of that .tar.gz file can be used as the `source_dir` input for these pipeline components.
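
As a minimal sketch of how that URI is put together, the snippet below joins a bucket name and an object key in the same way the sample pipelines do. The helper name `build_s3_uri` is hypothetical and is not part of the components; the bucket and key values are the placeholders used in the toolkit sample.

```python
def build_s3_uri(bucket: str, key: str) -> str:
    """Join a bucket name and an object key into an s3:// URI (hypothetical helper)."""
    # Strip stray slashes so there is exactly one separator between bucket and key.
    return "s3://{}/{}".format(bucket.strip("/"), key.lstrip("/"))


# Placeholder values; substitute the bucket and key created by the notebook.
input_bucket_name = "your_sagemaker_bucket_name"
input_key = "rl-newsvendor-2020-11-11-10-43-30-556/source/sourcedir.tar.gz"

source_dir = build_s3_uri(input_bucket_name, input_key)
print(source_dir)
```

The resulting string is what the samples pass as the `source_dir` pipeline parameter.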
|
||||
|
||||
|
||||
## Compiling the pipeline template
|
||||
|
||||
Follow the guide to [building a pipeline](https://www.kubeflow.org/docs/guides/pipelines/build-pipeline/) to install the Kubeflow Pipelines SDK, then run the following command to compile the sample Python into a workflow specification. The specification takes the form of a YAML file compressed into a `.tar.gz` file.
|
||||
|
||||
```bash
|
||||
dsl-compile --py rlestimator_pipeline_custom_image.py --output rlestimator_pipeline_custom_image.tar.gz
|
||||
dsl-compile --py rlestimator_pipeline_toolkit_image.py --output rlestimator_pipeline_toolkit_image.tar.gz
|
||||
```
|
||||
|
||||
## Deploying the pipeline
|
||||
|
||||
Open the Kubeflow pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
|
||||
|
||||
Once the pipeline is done, you can go to the S3 path specified in `output` to check your prediction results. There are three columns: `PassengerId`, `prediction`, and `Survived` (the ground-truth value).
|
||||
|
||||
```
|
||||
...
|
||||
4,1,1
|
||||
5,0,0
|
||||
6,0,0
|
||||
7,0,0
|
||||
...
|
||||
```
|
||||
|
||||
## Components source
|
||||
|
||||
RLEstimator Training Job:
|
||||
[source code](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/rlestimator/src)
|
||||
|
|
@ -0,0 +1,76 @@
|
|||
#!/usr/bin/env python3
|
||||
|
||||
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
|
||||
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
|
||||
|
||||
import kfp
|
||||
import os
|
||||
from kfp import components
|
||||
from kfp import dsl
|
||||
from kfp.aws import use_aws_secret
|
||||
from sagemaker.rl import RLEstimator, RLToolkit
|
||||
|
||||
|
||||
cur_file_dir = os.path.dirname(__file__)
|
||||
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
|
||||
|
||||
sagemaker_rlestimator_op = components.load_component_from_file(
|
||||
components_dir + "/rlestimator/component.yaml"
|
||||
)
|
||||
|
||||
output_bucket_name = "kf-pipelines-rlestimator-output"
|
||||
input_bucket_name = "kf-pipelines-rlestimator-input"
|
||||
input_key = "sourcedir.tar.gz"
|
||||
job_name_prefix = "rlestimator-pipeline-custom-image"
|
||||
image_uri = "your_sagemaker_image_name"
|
||||
role = "your_sagemaker_role_name"
|
||||
security_groups = ["sg-0490601e83f220e82"]
|
||||
subnets = [
|
||||
"subnet-0efc73526db16a4a4",
|
||||
"subnet-0b8af626f39e7d462",
|
||||
]
|
||||
|
||||
# You need to specify your own metric_definitions if using a custom image_uri
|
||||
metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)
|
||||
|
||||
|
||||
@dsl.pipeline(
|
||||
name="RLEstimator Custom Docker Image",
|
||||
description="RLEstimator training job where we provide a reference to a Docker image containing our training code",
|
||||
)
|
||||
def rlestimator_training_custom_pipeline(
|
||||
region="us-east-1",
|
||||
entry_point="train-unity.py",
|
||||
source_dir="s3://{}/{}".format(input_bucket_name, input_key),
|
||||
image_uri=image_uri,
|
||||
assume_role=role,
|
||||
instance_type="ml.c5.2xlarge",
|
||||
instance_count=1,
|
||||
output_path="s3://{}/".format(output_bucket_name),
|
||||
base_job_name=job_name_prefix,
|
||||
metric_definitions=metric_definitions,
|
||||
hyperparameters={},
|
||||
vpc_security_group_ids=security_groups,
|
||||
vpc_subnets=subnets,
|
||||
):
|
||||
rlestimator_training_custom = sagemaker_rlestimator_op(
|
||||
region=region,
|
||||
entry_point=entry_point,
|
||||
source_dir=source_dir,
|
||||
image=image_uri,
|
||||
role=assume_role,
|
||||
model_artifact_path=output_path,
|
||||
job_name=base_job_name,
|
||||
metric_definitions=metric_definitions,
|
||||
instance_type=instance_type,
|
||||
instance_count=instance_count,
|
||||
hyperparameters=hyperparameters,
|
||||
vpc_security_group_ids=vpc_security_group_ids,
|
||||
vpc_subnets=vpc_subnets,
|
||||
) # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
kfp.compiler.Compiler().compile(
|
||||
rlestimator_training_custom_pipeline, __file__ + ".zip"
|
||||
)
|
||||
|
|
@ -0,0 +1,93 @@
|
|||
#!/usr/bin/env python3
|
||||
|
||||
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
|
||||
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
|
||||
|
||||
import kfp
|
||||
import os
|
||||
from kfp import components
|
||||
from kfp import dsl
|
||||
from kfp.aws import use_aws_secret
|
||||
|
||||
|
||||
cur_file_dir = os.path.dirname(__file__)
|
||||
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
|
||||
|
||||
sagemaker_rlestimator_op = components.load_component_from_file(
|
||||
components_dir + "/rlestimator/component.yaml"
|
||||
)
|
||||
|
||||
metric_definitions = [
|
||||
{
|
||||
"Name": "episode_reward_mean",
|
||||
"Regex": "episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
|
||||
},
|
||||
{
|
||||
"Name": "episode_reward_max",
|
||||
"Regex": "episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
|
||||
},
|
||||
{
|
||||
"Name": "episode_len_mean",
|
||||
"Regex": "episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
|
||||
},
|
||||
{"Name": "entropy", "Regex": "entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"},
|
||||
{
|
||||
"Name": "episode_reward_min",
|
||||
"Regex": "episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
|
||||
},
|
||||
{"Name": "vf_loss", "Regex": "vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"},
|
||||
{
|
||||
"Name": "policy_loss",
|
||||
"Regex": "policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
|
||||
},
|
||||
]
|
||||
|
||||
output_bucket_name = "your_sagemaker_bucket_name"
|
||||
input_bucket_name = "your_sagemaker_bucket_name"
|
||||
input_key = "rl-newsvendor-2020-11-11-10-43-30-556/source/sourcedir.tar.gz"
|
||||
job_name_prefix = "rlestimator-pipeline-toolkit-image"
|
||||
role = "your_sagemaker_role_name"
|
||||
|
||||
|
||||
@dsl.pipeline(
|
||||
name="RLEstimator Toolkit & Framework Pipeline",
|
||||
description="RLEstimator training job where the AWS Docker image is auto-selected based on the Toolkit and Framework we define",
|
||||
)
|
||||
def rlestimator_training_toolkit_pipeline(
|
||||
region="us-east-1",
|
||||
entry_point="train_news_vendor.py",
|
||||
source_dir="s3://{}/{}".format(input_bucket_name, input_key),
|
||||
toolkit="ray",
|
||||
toolkit_version="0.8.5",
|
||||
framework="tensorflow",
|
||||
assume_role=role,
|
||||
instance_type="ml.c5.2xlarge",
|
||||
instance_count=1,
|
||||
output_path="s3://{}/".format(output_bucket_name),
|
||||
base_job_name=job_name_prefix,
|
||||
metric_definitions=metric_definitions,
|
||||
max_run=300,
|
||||
hyperparameters={},
|
||||
):
|
||||
rlestimator_training_toolkit = sagemaker_rlestimator_op(
|
||||
region=region,
|
||||
entry_point=entry_point,
|
||||
source_dir=source_dir,
|
||||
toolkit=toolkit,
|
||||
toolkit_version=toolkit_version,
|
||||
framework=framework,
|
||||
role=assume_role,
|
||||
instance_type=instance_type,
|
||||
instance_count=instance_count,
|
||||
model_artifact_path=output_path,
|
||||
job_name=base_job_name,
|
||||
metric_definitions=metric_definitions,
|
||||
max_run=max_run,
|
||||
hyperparameters=hyperparameters,
|
||||
) # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
kfp.compiler.Compiler().compile(
|
||||
rlestimator_training_toolkit_pipeline, __file__ + ".zip"
|
||||
)
|
||||
|
|
@ -0,0 +1,87 @@
|
|||
Examples for creating a simulation application, running a simulation job, running a simulation job batch, and deleting a simulation application.
|
||||
|
||||
## Examples
|
||||
|
||||
The examples are based on a notebook from the [AWS SageMaker Examples](https://github.com/aws/amazon-sagemaker-examples) repo.
|
||||
|
||||
The simulation jobs that are launched by these examples are based on the
|
||||
[`rl_objecttracker_robomaker_coach_gazebo`](https://github.com/aws/amazon-sagemaker-examples/tree/3de42334720a7197ea1f15395b66c44cf5ef7fd4/reinforcement_learning/rl_objecttracker_robomaker_coach_gazebo) notebook.
|
||||
This is an older notebook example, but you can still download it from GitHub and upload it directly to JupyterLab in SageMaker.
|
||||
|
||||
|
||||
## Prerequisites
|
||||
|
||||
To run these examples you will need to create a number of resources that will then be used as inputs for the pipeline component.
|
||||
Some of the inputs are used to create the RoboMaker Simulation Application and some are used as inputs for the RoboMaker
|
||||
Simulation Job.
|
||||
|
||||
required inputs for simulation job example:
|
||||
```
|
||||
role = <robomaker execution role, this is created for you automatically when you launch a notebook instance>
|
||||
region = <region in which to deploy the robomaker resources>
|
||||
app_name = <name to be given to the simulation application>
|
||||
sources = <source code files for the simulation application>
|
||||
simulation_software_suite = <select the simulation application software suite to use>
|
||||
robot_software_suite = <select the simulation application robot software suite to use>
|
||||
rendering_engine = <select the simulation application rendering engine suite to use>
|
||||
output_bucket = <bucket used for outputs from the simulation job>
|
||||
output_path = <key within the output bucket to use for output artifacts>
|
||||
max_run = <the maximum time to run the simulation job for>
|
||||
failure_behavior = <"Fail" or "Continue">
|
||||
sim_app_arn = <used as input to simulation job component, comes as an output from simulation application component>
|
||||
sim_app_launch_config = <dictionary containing launch configurations>
|
||||
vpc_subnets = <subnets to launch the simulation job into>
|
||||
vpc_security_group_ids = <security groups to use if launching in a VPC>
|
||||
use_public_ip = <whether or not to use a public ip to access the simulation job>
|
||||
```
|
||||
|
||||
required inputs for simulation job batch example:
|
||||
```
|
||||
role = <robomaker execution role, this is created for you automatically when you launch a notebook instance>
|
||||
region = <region in which to deploy the robomaker resources>
|
||||
app_name = <name to be given to the simulation application>
|
||||
sources = <source code files for the simulation application>
|
||||
simulation_software_suite = <select the simulation application software suite to use>
|
||||
robot_software_suite = <select the simulation application robot software suite to use>
|
||||
rendering_engine = <select the simulation application rendering engine suite to use>
|
||||
timeout_in_secs = <maximum timeout to wait for simulation jobs in batch to launch>
|
||||
max_concurrency = <maximum concurrency for simulation jobs in batch>
|
||||
simulation_job_requests = <the definitions for the simulation jobs, things like launch configs and vpc configs are placed in here>
|
||||
sim_app_arn = <used as input to simulation job component, comes as an output from simulation application component, e.g. robomaker_create_sim_app.outputs["arn"]>
|
||||
```
|
||||
|
||||
You could go to the bother of creating all of these resources individually, but it might be easier to run the notebook
|
||||
mentioned above, and then use the resources that are created by that notebook. The notebook should create the output_bucket,
|
||||
output_key, VPC configs, launch config, etc., and you can use those as the inputs for this example.
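
As a rough illustration of the `sim_app_launch_config` input, the sketch below builds a launch configuration dictionary with the same keys as the sample pipeline and serialises it to JSON, which is how the unit tests pass it on the command line. The `check_launch_config` helper is hypothetical, and treating `packageName` and `launchFile` as mandatory is an assumption made for this example only.

```python
import json

# Keys taken from the sample launch configuration; values are placeholders.
launch_config = {
    "packageName": "object_tracker_simulation",
    "launchFile": "evaluation.launch",
    "environmentVariables": {"ROS_AWS_REGION": "us-east-1"},
    "streamUI": True,
}


def check_launch_config(config: dict) -> str:
    """Return the config as a JSON string after a basic key check (hypothetical helper)."""
    # Assumption: these two keys are always set, as they are in the sample.
    missing = [k for k in ("packageName", "launchFile") if k not in config]
    if missing:
        raise ValueError("launch config is missing: " + ", ".join(missing))
    return json.dumps(config, sort_keys=True)


print(check_launch_config(launch_config))
```

Checking the dictionary before compiling the pipeline surfaces a missing key locally rather than after the simulation job has been submitted.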
|
||||
|
||||
## Compiling the pipeline template
|
||||
|
||||
Follow the guide to [building a pipeline](https://www.kubeflow.org/docs/guides/pipelines/build-pipeline/) to install the Kubeflow Pipelines SDK, then run the following command to compile the sample Python into a workflow specification. The specification takes the form of a YAML file compressed into a `.tar.gz` file.
|
||||
|
||||
```bash
|
||||
dsl-compile --py rlestimator_pipeline_custom_image.py --output rlestimator_pipeline_custom_image.tar.gz
|
||||
dsl-compile --py rlestimator_pipeline_toolkit_image.py --output rlestimator_pipeline_toolkit_image.tar.gz
|
||||
dsl-compile --py sagemaker_robomaker_rl_job.py --output sagemaker_robomaker_rl_job.tar.gz
|
||||
```
|
||||
|
||||
## Deploying the pipeline
|
||||
|
||||
Open the Kubeflow pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
|
||||
|
||||
Once the pipeline is done, you can go to the S3 path specified in `output` to check your prediction results. There are three columns: `PassengerId`, `prediction`, and `Survived` (the ground-truth value).
|
||||
|
||||
```
|
||||
...
|
||||
4,1,1
|
||||
5,0,0
|
||||
6,0,0
|
||||
7,0,0
|
||||
...
|
||||
```
|
||||
|
||||
## Components source
|
||||
|
||||
RoboMaker Create Simulation Application:
|
||||
[source code](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/create_simulation_application/src)
|
||||
|
|
@ -0,0 +1,128 @@
|
|||
#!/usr/bin/env python3
|
||||
|
||||
# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
|
||||
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md
|
||||
|
||||
import kfp
|
||||
import os
|
||||
from kfp import components
|
||||
from kfp import dsl
|
||||
import random
|
||||
import string
|
||||
from kfp.aws import use_aws_secret
|
||||
|
||||
|
||||
cur_file_dir = os.path.dirname(__file__)
|
||||
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")
|
||||
|
||||
robomaker_create_sim_app_op = components.load_component_from_file(
|
||||
components_dir + "/create_simulation_app/component.yaml"
|
||||
)
|
||||
|
||||
robomaker_sim_job_op = components.load_component_from_file(
|
||||
components_dir + "/simulation_job/component.yaml"
|
||||
)
|
||||
|
||||
robomaker_delete_sim_app_op = components.load_component_from_file(
|
||||
components_dir + "/delete_simulation_app/component.yaml"
|
||||
)
|
||||
|
||||
launch_config = {
|
||||
"packageName": "object_tracker_simulation",
|
||||
"launchFile": "evaluation.launch",
|
||||
"environmentVariables": {
|
||||
"MODEL_S3_BUCKET": "your_sagemaker_bucket_name",
|
        "MODEL_S3_PREFIX": "rl-object-tracker-sagemaker-201116-051751",
        "ROS_AWS_REGION": "us-east-1",
        "MARKOV_PRESET_FILE": "object_tracker.py",
        "NUMBER_OF_ROLLOUT_WORKERS": "1",
    },
    "streamUI": True,
}

simulation_app_name = "robomaker-pipeline-simulation-application"
sources_bucket = "your_sagemaker_bucket_name"
sources_key = "object-tracker/simulation_ws.tar.gz"
sources_architecture = "X86_64"
simulation_software_name = "Gazebo"
simulation_software_version = "7"
robot_software_name = "ROS"
robot_software_version = "Kinetic"
rendering_engine_name = "OGRE"
rendering_engine_version = "1.x"
role = "your_sagemaker_role_name"
output_bucket = "kf-pipelines-robomaker-output"
output_key = "test-output-key"
security_groups = ["sg-0490601e83f220e82"]
subnets = [
    "subnet-0efc73526db16a4a4",
    "subnet-0b8af626f39e7d462",
]


@dsl.pipeline(
    name="RoboMaker Simulation Job Pipeline",
    description="RoboMaker simulation job and simulation application created via pipeline components",
)
def robomaker_simulation_job_app_pipeline(
    region="us-east-1",
    role=role,
    name=simulation_app_name
    + "".join(random.choice(string.ascii_lowercase) for i in range(10)),
    sources=[
        {
            "s3Bucket": sources_bucket,
            "s3Key": sources_key,
            "architecture": sources_architecture,
        }
    ],
    simulation_software_name=simulation_software_name,
    simulation_software_version=simulation_software_version,
    robot_software_name=robot_software_name,
    robot_software_version=robot_software_version,
    rendering_engine_name=rendering_engine_name,
    rendering_engine_version=rendering_engine_version,
    output_bucket=output_bucket,
    output_path=output_key,
    sim_app_launch_config=launch_config,
    vpc_security_group_ids=security_groups,
    vpc_subnets=subnets,
):
    robomaker_create_sim_app = robomaker_create_sim_app_op(
        region=region,
        app_name=name,
        sources=sources,
        simulation_software_name=simulation_software_name,
        simulation_software_version=simulation_software_version,
        robot_software_name=robot_software_name,
        robot_software_version=robot_software_version,
        rendering_engine_name=rendering_engine_name,
        rendering_engine_version=rendering_engine_version,
    )
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))

    robomaker_simulation_job = robomaker_sim_job_op(
        region=region,
        role=role,
        output_bucket=output_bucket,
        output_path=output_path,
        max_run=300,
        failure_behavior="Fail",
        sim_app_arn=robomaker_create_sim_app.outputs["arn"],
        sim_app_launch_config=sim_app_launch_config,
        vpc_security_group_ids=vpc_security_group_ids,
        vpc_subnets=vpc_subnets,
        use_public_ip="True",
    ).after(robomaker_create_sim_app)
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))

    robomaker_delete_sim_app = robomaker_delete_sim_app_op(
        region=region, arn=robomaker_create_sim_app.outputs["arn"],
    ).after(robomaker_simulation_job, robomaker_create_sim_app)
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(
        robomaker_simulation_job_app_pipeline, __file__ + ".zip"
    )
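The pipeline above builds its default application name by appending ten random lowercase letters to a fixed prefix. Python evaluates default arguments once, when the function is defined, so the suffix is chosen each time the script is executed rather than on every pipeline run. The sketch below isolates that naming step in a hypothetical helper, `with_random_suffix`, which is not part of the sample or of `kfp`; the optional `rng` parameter is an assumption added only to make the behaviour reproducible in tests.

```python
import random
import string


def with_random_suffix(prefix, length=10, rng=random):
    """Return `prefix` followed by `length` random lowercase ASCII letters.

    Hypothetical helper for illustration; `rng` may be the `random` module
    or a seeded `random.Random` instance.
    """
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return prefix + suffix


if __name__ == "__main__":
    # The prefix is the one used by the sample; the suffix differs per execution.
    print(with_random_suffix("robomaker-pipeline-simulation-application"))
```

Passing a seeded `random.Random` makes the generated name deterministic, which can be convenient when a test needs to predict the application name.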
@ -0,0 +1,136 @@
#!/usr/bin/env python3

# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md

import kfp
import os
from kfp import components
from kfp import dsl
import random
import string
from kfp.aws import use_aws_secret


cur_file_dir = os.path.dirname(__file__)
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")

robomaker_create_sim_app_op = components.load_component_from_file(
    components_dir + "/create_simulation_app/component.yaml"
)

robomaker_sim_job_batch_op = components.load_component_from_file(
    components_dir + "/simulation_job_batch/component.yaml"
)

robomaker_delete_sim_app_op = components.load_component_from_file(
    components_dir + "/delete_simulation_app/component.yaml"
)

simulation_app_name = "robomaker-pipeline-simulation-batch-application"
sources_bucket = "your_sagemaker_bucket_name"
sources_key = "object-tracker/simulation_ws.tar.gz"
sources_architecture = "X86_64"
simulation_software_name = "Gazebo"
simulation_software_version = "7"
robot_software_name = "ROS"
robot_software_version = "Kinetic"
rendering_engine_name = "OGRE"
rendering_engine_version = "1.x"
role = "your_sagemaker_role_name"

job_requests = [
    {
        "outputLocation": {
            "s3Bucket": "kf-pipelines-robomaker-output",
            "s3Prefix": "test-output-key",
        },
        "loggingConfig": {"recordAllRosTopics": True},
        "maxJobDurationInSeconds": 900,
        "iamRole": "your_sagemaker_role_name",
        "failureBehavior": "Fail",
        "simulationApplications": [
            {
                "application": "test-arn",
                "launchConfig": {
                    "packageName": "object_tracker_simulation",
                    "launchFile": "evaluation.launch",
                    "environmentVariables": {
                        "MODEL_S3_BUCKET": "your_sagemaker_bucket_name",
                        "MODEL_S3_PREFIX": "rl-object-tracker-sagemaker-201116-051751",
                        "ROS_AWS_REGION": "us-east-1",
                        "MARKOV_PRESET_FILE": "object_tracker.py",
                        "NUMBER_OF_ROLLOUT_WORKERS": "1",
                    },
                    "streamUI": True,
                },
            }
        ],
        "vpcConfig": {
            "subnets": ["subnet-0efc73526db16a4a4", "subnet-0b8af626f39e7d462",],
            "securityGroups": ["sg-0490601e83f220e82"],
            "assignPublicIp": True,
        },
    }
]


@dsl.pipeline(
    name="RoboMaker Job Batch Pipeline",
    description="RoboMaker simulation job batch is launched via a pipeline component",
)
def robomaker_simulation_job_batch_app_pipeline(
    region="us-east-1",
    role=role,
    name=simulation_app_name
    + "".join(random.choice(string.ascii_lowercase) for i in range(10)),
    sources=[
        {
            "s3Bucket": sources_bucket,
            "s3Key": sources_key,
            "architecture": sources_architecture,
        }
    ],
    simulation_software_name=simulation_software_name,
    simulation_software_version=simulation_software_version,
    robot_software_name=robot_software_name,
    robot_software_version=robot_software_version,
    rendering_engine_name=rendering_engine_name,
    rendering_engine_version=rendering_engine_version,
    timeout_in_secs="900",
    max_concurrency="3",
    simulation_job_requests=job_requests,
):
    robomaker_create_sim_app = robomaker_create_sim_app_op(
        region=region,
        app_name=name,
        sources=sources,
        simulation_software_name=simulation_software_name,
        simulation_software_version=simulation_software_version,
        robot_software_name=robot_software_name,
        robot_software_version=robot_software_version,
        rendering_engine_name=rendering_engine_name,
        rendering_engine_version=rendering_engine_version,
    )
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))

    robomaker_simulation_batch_job = robomaker_sim_job_batch_op(
        region=region,
        role=role,
        timeout_in_secs=timeout_in_secs,
        max_concurrency=max_concurrency,
        simulation_job_requests=simulation_job_requests,
        sim_app_arn=robomaker_create_sim_app.outputs["arn"],
    ).after(robomaker_create_sim_app)
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))

    robomaker_delete_sim_app = robomaker_delete_sim_app_op(
        region=region, arn=robomaker_create_sim_app.outputs["arn"],
    ).after(robomaker_simulation_batch_job, robomaker_create_sim_app)
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(
        robomaker_simulation_job_batch_app_pipeline, __file__ + ".zip"
    )
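The `job_requests` list in the batch sample nests several dictionaries by hand, which makes it easy to misplace a key. As a rough sketch, the same structure could be assembled by a small builder. The function `build_job_request` and its parameter names are hypothetical; the dictionary keys and the fixed values (`"Fail"`, `recordAllRosTopics`, `assignPublicIp`) are copied from the sample rather than taken from the RoboMaker API reference.

```python
import json


def build_job_request(iam_role, output_bucket, output_prefix, launch_config,
                      subnets, security_groups, application="test-arn",
                      max_duration_secs=900):
    """Assemble one simulation job request shaped like the sample's entry.

    Hypothetical helper: it only mirrors the keys used in the sample.
    """
    return {
        "outputLocation": {"s3Bucket": output_bucket, "s3Prefix": output_prefix},
        "loggingConfig": {"recordAllRosTopics": True},
        "maxJobDurationInSeconds": max_duration_secs,
        "iamRole": iam_role,
        "failureBehavior": "Fail",
        "simulationApplications": [
            {"application": application, "launchConfig": launch_config}
        ],
        "vpcConfig": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            "assignPublicIp": True,
        },
    }


if __name__ == "__main__":
    request = build_job_request(
        iam_role="your_sagemaker_role_name",
        output_bucket="kf-pipelines-robomaker-output",
        output_prefix="test-output-key",
        launch_config={
            "packageName": "object_tracker_simulation",
            "launchFile": "evaluation.launch",
            "streamUI": True,
        },
        subnets=["subnet-0efc73526db16a4a4"],
        security_groups=["sg-0490601e83f220e82"],
    )
    # The list of requests must be JSON-serializable to be passed as a parameter.
    print(json.dumps([request], indent=2))
```

Keeping the request as plain dictionaries and lists means it can be round-tripped through `json` without custom encoders.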
@ -0,0 +1,196 @@
#!/usr/bin/env python3

# Uncomment the apply(use_aws_secret()) below if you are not using OIDC
# more info : https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/README.md

import kfp
import os
from kfp import components
from kfp import dsl
import random
import string
from kfp.aws import use_aws_secret

cur_file_dir = os.path.dirname(__file__)
components_dir = os.path.join(cur_file_dir, "../../../../components/aws/sagemaker/")

robomaker_create_sim_app_op = components.load_component_from_file(
    components_dir + "/create_simulation_app/component.yaml"
)

robomaker_sim_job_op = components.load_component_from_file(
    components_dir + "/simulation_job/component.yaml"
)

robomaker_delete_sim_app_op = components.load_component_from_file(
    components_dir + "/delete_simulation_app/component.yaml"
)

sagemaker_rlestimator_op = components.load_component_from_file(
    components_dir + "/rlestimator/component.yaml"
)

metric_definitions = [
    {"Name": "reward-training", "Regex": "^Training>.*Total reward=(.*?),"},
    {"Name": "ppo-surrogate-loss", "Regex": "^Policy training>.*Surrogate loss=(.*?),"},
    {"Name": "ppo-entropy", "Regex": "^Policy training>.*Entropy=(.*?),"},
    {"Name": "reward-testing", "Regex": "^Testing>.*Total reward=(.*?),"},
]

# Simulation Application Inputs
region = "us-east-1"
simulation_software_name = "Gazebo"
simulation_software_version = "7"
robot_software_name = "ROS"
robot_software_version = "Kinetic"
rendering_engine_name = "OGRE"
rendering_engine_version = "1.x"
simulation_app_name = "robomaker-pipeline-objecttracker-sim-app" + "".join(
    random.choice(string.ascii_lowercase) for i in range(10)
)
sources_bucket = "your_sagemaker_bucket_name"
sources_key = "object-tracker/simulation_ws.tar.gz"
sources_architecture = "X86_64"
sources = [
    {
        "s3Bucket": sources_bucket,
        "s3Key": sources_key,
        "architecture": sources_architecture,
    }
]

# RLEstimator Inputs
entry_point = "training_worker.py"
rl_sources_key = "rl-object-tracker-sagemaker-201123-042019/source/sourcedir.tar.gz"
source_dir = "s3://{}/{}".format(sources_bucket, rl_sources_key)
rl_output_path = "s3://{}/".format(sources_bucket)
train_instance_type = "ml.c5.2xlarge"
train_instance_count = 1
toolkit = "coach"
toolkit_version = "0.11"
framework = "tensorflow"
job_name = "rl-kf-pipeline-objecttracker" + "".join(
    random.choice(string.ascii_lowercase) for i in range(10)
)
max_run = 300
s3_prefix = "rl-object-tracker-sagemaker-201123-042019"
hyperparameters = {
    "s3_bucket": sources_bucket,
    "s3_prefix": s3_prefix,
    "aws_region": "us-east-1",
    "RLCOACH_PRESET": "object_tracker",
}
role = "your_sagemaker_role_name"
security_groups = ["sg-0490601e83f220e82"]
subnets = [
    "subnet-0efc73526db16a4a4",
    "subnet-0b8af626f39e7d462",
]

# Simulation Job Inputs
output_bucket = "kf-pipelines-robomaker-output"
output_key = "test-output-key"


@dsl.pipeline(
    name="SageMaker & RoboMaker pipeline",
    description="SageMaker & RoboMaker Reinforcement Learning job where the jobs work together to train an RL model",
)
def sagemaker_robomaker_rl_job(
    region=region,
    role=role,
    name=simulation_app_name,
    sources=sources,
    simulation_software_name=simulation_software_name,
    simulation_software_version=simulation_software_version,
    robot_software_name=robot_software_name,
    robot_software_version=robot_software_version,
    rendering_engine_name=rendering_engine_name,
    rendering_engine_version=rendering_engine_version,
    output_bucket=output_bucket,
    robomaker_output_path=output_key,
    vpc_security_group_ids=security_groups,
    vpc_subnets=subnets,
    entry_point=entry_point,
    source_dir=source_dir,
    toolkit=toolkit,
    toolkit_version=toolkit_version,
    framework=framework,
    assume_role=role,
    instance_type=train_instance_type,
    instance_count=train_instance_count,
    output_path=rl_output_path,
    job_name=job_name,
    metric_definitions=metric_definitions,
    max_run=max_run,
    hyperparameters=hyperparameters,
    sources_bucket=sources_bucket,
    s3_prefix=s3_prefix,
):
    robomaker_create_sim_app = robomaker_create_sim_app_op(
        region=region,
        app_name=name,
        sources=sources,
        simulation_software_name=simulation_software_name,
        simulation_software_version=simulation_software_version,
        robot_software_name=robot_software_name,
        robot_software_version=robot_software_version,
        rendering_engine_name=rendering_engine_name,
        rendering_engine_version=rendering_engine_version,
    )
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))

    rlestimator_training_toolkit_coach = sagemaker_rlestimator_op(
        region=region,
        entry_point=entry_point,
        source_dir=source_dir,
        toolkit=toolkit,
        toolkit_version=toolkit_version,
        framework=framework,
        role=assume_role,
        instance_type=instance_type,
        instance_count=instance_count,
        model_artifact_path=output_path,
        job_name=job_name,
        max_run=max_run,
        hyperparameters=hyperparameters,
        metric_definitions=metric_definitions,
        vpc_subnets=vpc_subnets,
        vpc_security_group_ids=vpc_security_group_ids,
    )
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))

    robomaker_simulation_job = robomaker_sim_job_op(
        region=region,
        role=role,
        output_bucket=output_bucket,
        output_path=robomaker_output_path,
        max_run=3800,
        failure_behavior="Continue",
        sim_app_arn=robomaker_create_sim_app.outputs["arn"],
        sim_app_launch_config={
            "packageName": "object_tracker_simulation",
            "launchFile": "evaluation.launch",
            "environmentVariables": {
                "MODEL_S3_BUCKET": sources_bucket,
                "MODEL_S3_PREFIX": s3_prefix,
                "ROS_AWS_REGION": region,
                "NUMBER_OF_ROLLOUT_WORKERS": "1",
                "MARKOV_PRESET_FILE": "object_tracker.py",
            },
            "streamUI": True,
        },
        vpc_security_group_ids=vpc_security_group_ids,
        vpc_subnets=vpc_subnets,
        use_public_ip="True",
    )
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))

    robomaker_delete_sim_app = robomaker_delete_sim_app_op(
        region=region, arn=robomaker_create_sim_app.outputs["arn"],
    ).after(robomaker_simulation_job, robomaker_create_sim_app)
    # .apply(use_aws_secret('aws-secret', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(sagemaker_robomaker_rl_job, __file__ + ".zip")
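The `metric_definitions` in this sample pair a metric name with a regular expression whose single capture group holds the metric value. To check what each pattern would capture, the expressions can be tried locally with Python's `re` module. This is only a sketch: the helper `extract_metrics` and the log line are hypothetical (the line is made up to fit the first pattern, not copied from real training output), and using `re.search` is an assumption about how the patterns are applied.

```python
import re

# Copied from the sample above.
metric_definitions = [
    {"Name": "reward-training", "Regex": "^Training>.*Total reward=(.*?),"},
    {"Name": "ppo-surrogate-loss", "Regex": "^Policy training>.*Surrogate loss=(.*?),"},
    {"Name": "ppo-entropy", "Regex": "^Policy training>.*Entropy=(.*?),"},
    {"Name": "reward-testing", "Regex": "^Testing>.*Total reward=(.*?),"},
]


def extract_metrics(line, definitions=metric_definitions):
    """Return {metric name: captured text} for every pattern matching `line`."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = match.group(1)
    return found


# Hypothetical log line, invented for illustration only.
sample_line = "Training> Name=main_level/agent, Worker=0, Episode=1, Total reward=12.5, Steps=20"
print(extract_metrics(sample_line))
```

Because every pattern ends with a literal comma after the lazy group, a value that appears last on a line with no trailing comma would not be captured.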