
# SageMaker RLEstimator Kubeflow Pipelines component

## Summary

Component to submit SageMaker RLEstimator (Reinforcement Learning) training jobs directly from a Kubeflow Pipelines workflow. See the [SageMaker RL documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-rl-workflow.html).

## Intended Use

For running SageMaker RLEstimator (Reinforcement Learning) training jobs directly from a Kubeflow Pipelines workflow.

## Runtime Arguments

| Argument | Description | Optional | Data type | Accepted values | Default |
|----------|-------------|----------|-----------|-----------------|---------|
| region | The region where the cluster launches | No | String | | |
| endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
| assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
| job_name | The name of the training job. Must be unique within the same AWS account and AWS region | Yes | String | | TrainingJob-[datetime]-[random id] |
| role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | | |
| image | The registry path of a Docker image that contains your custom code, or a prebuilt AWS RL image | Yes | String | | |
| entry_point | Path (absolute or relative) to the Python source file which should be executed as the entry point to training | No | String | | |
| source_dir | Path (S3 URI) to a directory with any other training source code dependencies aside from the entry point file | Yes | String | | |
| toolkit | RL toolkit you want to use for executing your model training code | Yes | String | | |
| toolkit_version | RL toolkit version you want to use for executing your model training code | Yes | String | | |
| framework | Framework (MXNet, TensorFlow or PyTorch) you want to use as the toolkit backend for reinforcement learning training | Yes | String | | |
| metric_definitions | Dictionary of name-regex pairs that specify the metrics the algorithm emits | Yes | Dict | | {} |
| training_input_mode | The input mode that the algorithm supports | No | String | File, Pipe | File |
| hyperparameters | Hyperparameters for the selected algorithm | No | Dict | | Depends on algorithm |
| instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge and many more | ml.m4.xlarge |
| instance_count | The number of ML compute instances to use in each training job | Yes | Int | ≥ 1 | 1 |
| volume_size | The size of the ML storage volume that you want to provision, in GB | Yes | Int | ≥ 1 | 30 |
| max_run | The maximum run time in seconds per training job | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| model_artifact_path | The S3 URI where the trained model artifacts are stored | No | String | | |
| output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | String | | |
| vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
| vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your RLEstimator job | Yes | String | | |
| spot_instance | Use managed spot training if true | No | Boolean | False, True | False |
| max_wait_time | The maximum time in seconds you are willing to wait for a managed spot training job to complete | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Dict | | {} |
| debug_hook_config | Dictionary of configuration information for the debug hook parameters, collection configurations, and storage paths | Yes | Dict | | {} |
| debug_rule_config | List of configuration information for debugging rules | Yes | List of Dicts | | [] |
| tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
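As an illustration of the `metric_definitions` argument above, each entry maps a metric name to a regular expression that SageMaker applies to the training logs to extract metric values. This is a minimal sketch; the metric name, regex, and log line are hypothetical:

```python
import re

# Hypothetical metric_definitions entry: each key is a metric name, each
# value a regex whose first capture group extracts the metric's value
# from the training log output.
metric_definitions = {
    "episode_reward_mean": r"episode_reward_mean: ([-+]?[0-9]*\.?[0-9]+)",
}

# A sample log line the regex above would capture from.
log_line = "episode_reward_mean: 42.7"
match = re.search(metric_definitions["episode_reward_mean"], log_line)
print(match.group(1))  # prints 42.7
```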

Notes:

- There are two ways to use this component: build your own Docker image with the code baked in, or pass code in via the `source_dir` input. In either case, use `entry_point` to provide the filename to use as the code entry point.
- The format for the `debug_hook_config` field is:

  ```python
  {
      'CollectionConfigurations': [
          {
              'CollectionName': 'string',
              'CollectionParameters': {
                  'string': 'string'
              }
          }
      ],
      'HookParameters': {
          'string': 'string'
      },
      'LocalPath': 'string',
      'S3OutputPath': 'string'
  }
  ```
- The format for the `debug_rule_config` field is:

  ```python
  [
      {
          'InstanceType': 'string',
          'LocalPath': 'string',
          'RuleConfigurationName': 'string',
          'RuleEvaluatorImage': 'string',
          'RuleParameters': {
              'string': 'string'
          },
          'S3OutputPath': 'string',
          'VolumeSizeInGB': number
      }
  ]
  ```
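As a sketch, the two debugging arguments above can be assembled as plain Python dicts before being passed to the component. The collection settings, S3 paths, and rule evaluator image URI below are placeholder values, not real ones:

```python
import json

# Minimal sketch of debug_hook_config following the format above;
# the S3 path and collection settings are placeholders.
debug_hook_config = {
    "S3OutputPath": "s3://my-bucket/debug-output",
    "CollectionConfigurations": [
        {
            "CollectionName": "losses",
            "CollectionParameters": {"save_interval": "50"},
        }
    ],
}

# Minimal sketch of debug_rule_config; the evaluator image URI is a
# placeholder (SageMaker publishes per-region rule evaluator images).
debug_rule_config = [
    {
        "RuleConfigurationName": "LossNotDecreasing",
        "RuleEvaluatorImage": "<account>.dkr.ecr.<region>.amazonaws.com/<rules-image>:latest",
        "RuleParameters": {"rule_to_invoke": "LossNotDecreasing"},
    }
]

# Both arguments must be JSON-serializable to be passed to the component.
assert json.loads(json.dumps(debug_hook_config)) == debug_hook_config
assert json.loads(json.dumps(debug_rule_config)) == debug_rule_config
```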

## Output

Stores the trained model in the S3 bucket you specified via `model_artifact_path`.

## Example code

- A simple example pipeline that uses a custom image: `rlestimator_pipeline_custom_image`
- A sample pipeline that uses an image selected for you by the RLEstimator class, based on the framework and toolkit you provide: `rlestimator_pipeline_toolkit_image`

## Resources