# SageMaker RLEstimator Kubeflow Pipelines component

## Summary

Component to submit SageMaker RLEstimator (Reinforcement Learning) training jobs directly from a Kubeflow Pipelines workflow. See https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-rl-workflow.html for background.

## Intended Use

For running reinforcement learning training jobs on AWS SageMaker from a Kubeflow Pipelines workflow.
## Runtime Arguments

Argument | Description | Optional | Data type | Accepted values | Default
---|---|---|---|---|---
region | The region where the cluster launches | No | String | ||
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | ||
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | ||
job_name | The name of the training job. Must be unique within the same AWS account and AWS region | Yes | String | | TrainingJob-[datetime]-[random id]
role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | ||
image | The registry path of the Docker image that contains your custom image, or you can use a prebuilt AWS RL image | Yes | String | ||
entry_point | Path (absolute or relative) to the Python source file which should be executed as the entry point to training | No | String | ||
source_dir | Path (S3 URI) to a directory with any other training source code dependencies aside from the entry point file | Yes | String | ||
toolkit | The RL toolkit you want to use for executing your model training code | Yes | String | ||
toolkit_version | The version of the RL toolkit you want to use for executing your model training code | Yes | String | ||
framework | The framework (MXNet, TensorFlow, or PyTorch) you want to use as the toolkit backend for reinforcement learning training | Yes | String | ||
metric_definitions | A dictionary of name-regex pairs that specifies the metrics the algorithm emits | Yes | Dict | | {}
training_input_mode | The input mode that the algorithm supports | No | String | File, Pipe | File |
hyperparameters | Hyperparameters for the selected algorithm | No | Dict | Depends on Algo | |
instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge and many more | ml.m4.xlarge |
instance_count | The number of ML compute instances to use in each training job | Yes | Int | ≥ 1 | 1 |
volume_size | The size of the ML storage volume that you want to provision in GB | Yes | Int | ≥ 1 | 30 |
max_run | The maximum run time in seconds per training job | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
model_artifact_path | The S3 path where the trained model artifacts are stored | No | String | ||
output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | String | ||
vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | ||
vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your RLEstimator job | Yes | String | ||
spot_instance | Use managed spot training if true | No | Boolean | False, True | False |
max_wait_time | The maximum time in seconds you are willing to wait for a managed spot training job to complete | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Dict | | {}
debug_hook_config | Dictionary of configuration information for the debug hook parameters, collection configurations, and storage paths | Yes | Dict | | {}
debug_rule_config | List of configuration information for debugging rules | Yes | List of Dicts | | []
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {}
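The dict-typed inputs above (`hyperparameters`, `metric_definitions`, `tags`, and the debug configs) are plain JSON-serializable dictionaries. A minimal sketch, using hypothetical keys that depend entirely on your algorithm and training script:

```python
import json

# Hypothetical hyperparameters; the actual keys depend on your algorithm.
hyperparameters = {"discount_factor": "0.99", "num_iterations": "50"}

# Hypothetical metric definitions: metric name -> regex applied to the
# training logs to extract the metric value.
metric_definitions = {
    "episode_reward_mean": r"episode_reward_mean: ([-+]?[0-9]*\.?[0-9]+)"
}

# Dict inputs must round-trip cleanly through JSON when passed to the component.
print(json.dumps(hyperparameters))
```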
Notes:
- There are two ways to use this component: build your own Docker image with your code baked in, or pass code in via the `source_dir` input. In either case, use `entry_point` to provide the filename of the code entry point.
- The format for the `debug_hook_config` field is:

```
{
    'CollectionConfigurations': [
        {
            'CollectionName': 'string',
            'CollectionParameters': {
                'string': 'string'
            }
        }
    ],
    'HookParameters': {
        'string': 'string'
    },
    'LocalPath': 'string',
    'S3OutputPath': 'string'
}
```
- The format for the `debug_rule_config` field is:

```
[
    {
        'InstanceType': 'string',
        'LocalPath': 'string',
        'RuleConfigurationName': 'string',
        'RuleEvaluatorImage': 'string',
        'RuleParameters': {
            'string': 'string'
        },
        'S3OutputPath': 'string',
        'VolumeSizeInGB': number
    }
]
```
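As a concrete illustration of the two formats above, here is a hypothetical pair of values. The S3 paths, collection name, rule name, and rule evaluator image URI are assumptions for illustration, not values tied to this component:

```python
# Hypothetical debug_hook_config following the format shown above.
debug_hook_config = {
    "S3OutputPath": "s3://my-bucket/debug-output",  # assumed bucket
    "LocalPath": "/opt/ml/output/tensors",
    "HookParameters": {"save_interval": "100"},
    "CollectionConfigurations": [
        {"CollectionName": "losses", "CollectionParameters": {"save_interval": "50"}}
    ],
}

# Hypothetical debug_rule_config following the format shown above.
debug_rule_config = [
    {
        "RuleConfigurationName": "LossNotDecreasing",
        # Rule evaluator image URIs are region- and account-specific;
        # this value is illustrative only.
        "RuleEvaluatorImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/debugger-rules:latest",
        "RuleParameters": {"rule_to_invoke": "LossNotDecreasing"},
        "S3OutputPath": "s3://my-bucket/debug-rules",
        "VolumeSizeInGB": 30,
    }
]
print(len(debug_rule_config))
```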
## Output

Stores the trained model in the S3 bucket you specified via `model_artifact_path`.
## Example code

- Simple example pipeline that uses a custom image: rlestimator_pipeline_custom_image
- Sample pipeline using an image selected for you by the RLEstimator class based on the framework and toolkit you provide: rlestimator_pipeline_toolkit_image
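To show how the runtime arguments fit together, here is a minimal sketch of the keyword arguments a pipeline might pass to this component. All S3 paths, the role ARN, the entry point filename, and the toolkit/version values below are placeholders, not values from this repository; in a real pipeline you would load `component.yaml` (for example with `kfp.components.load_component_from_file`) and pass these as keyword arguments to the resulting op:

```python
# Placeholder argument set for the RLEstimator component; every value
# marked below is an assumption for illustration.
rlestimator_args = {
    "region": "us-east-1",
    "role": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder ARN
    "entry_point": "train.py",                               # hypothetical script
    "source_dir": "s3://my-bucket/rl-sources/sources.tar.gz",  # hypothetical archive
    "toolkit": "ray",
    "toolkit_version": "0.8.5",
    "framework": "tensorflow",
    "instance_type": "ml.m4.xlarge",
    "instance_count": 1,
    "model_artifact_path": "s3://my-bucket/model-output/",   # hypothetical bucket
    "spot_instance": False,
}
print(sorted(rlestimator_args))
```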