SageMaker Training Kubeflow Pipelines component
Summary
Component to submit SageMaker Training jobs directly from a Kubeflow Pipelines workflow. For more information on SageMaker training, see https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html
Details
Intended Use
For model training using AWS SageMaker.
Runtime Arguments
Argument | Description | Optional | Data type | Accepted values | Default |
---|---|---|---|---|---|
region | The AWS Region where the training job is launched | No | String | | |
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
job_name | The name of the training job. Must be unique within the same AWS account and AWS Region | Yes | String | | TrainingJob-[datetime]-[random id] |
role | The Amazon Resource Name (ARN) of the IAM role that Amazon SageMaker assumes to perform tasks on your behalf | No | String | | |
image | The registry path of the Docker image that contains the training algorithm | Yes | String | | |
algorithm_name | The name of the algorithm resource to use for the training job; specify this parameter only if `image` is not specified | Yes | String | | |
metric_definitions | A dictionary of name-regex pairs that specify the metrics the algorithm emits | Yes | Dict | | {} |
training_input_mode | The input mode that the algorithm supports | No | String | File, Pipe | File |
hyperparameters | Hyperparameters for the selected algorithm | No | Dict | Depends on the algorithm | |
channels | A list of dicts specifying the input channels (at least one); refer to the documentation for the parameters | No | List of Dicts | | |
instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge, and other SageMaker training instance types | ml.m4.xlarge |
instance_count | The number of ML compute instances to use for the training job | Yes | Int | ≥ 1 | 1 |
volume_size | The size, in GB, of the ML storage volume to provision | Yes | Int | ≥ 1 | 30 |
resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | String | | |
max_run_time | The maximum run time in seconds for the training job | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
model_artifact_path | The S3 path where Amazon SageMaker stores the model artifacts | No | String | | |
output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | String | | |
vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your training job | Yes | String | | |
network_isolation | Isolates the training container if true | No | Boolean | False, True | True |
traffic_encryption | Encrypts all communications between ML compute instances in distributed training if true | No | Boolean | False, True | False |
spot_instance | Use managed spot training if true | No | Boolean | False, True | False |
max_wait_time | The maximum time in seconds to wait for a managed spot training job to complete | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
checkpoint_config | A dictionary describing the output location for managed spot training checkpoint data | Yes | Dict | | {} |
debug_hook_config | A dictionary of configuration information for the debug hook parameters, collection configurations, and storage paths | Yes | Dict | | {} |
debug_rule_config | A list of configuration information for debugging rules | Yes | List of Dicts | | [] |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
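Because the component submits a SageMaker training job, most of the arguments above correspond to fields of the SageMaker `CreateTrainingJob` API. The following is a minimal sketch of that correspondence for a subset of the arguments, written with hypothetical helpers (`build_channel`, `build_training_request`); it is not the component's own code, and the channel settings (`S3Prefix`, `FullyReplicated`, no compression) are assumptions chosen for illustration. The validation checks mirror the "Accepted values" column of the table.

```python
from typing import Any, Dict, List, Optional

# Upper bound on max_run_time documented in the table (5 days).
MAX_RUN_TIME_LIMIT = 432000


def build_channel(name: str, s3_uri: str) -> Dict[str, Any]:
    """Hypothetical helper: one `channels` entry reading from an S3 prefix."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # assumption: read every object under the prefix
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "CompressionType": "None",
    }


def build_training_request(
    job_name: str,
    role: str,
    image: str,
    channels: List[Dict[str, Any]],
    model_artifact_path: str,
    hyperparameters: Optional[Dict[str, Any]] = None,
    training_input_mode: str = "File",
    instance_type: str = "ml.m4.xlarge",
    instance_count: int = 1,
    volume_size: int = 30,
    max_run_time: int = 86400,
    tags: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map some component arguments onto CreateTrainingJob fields."""
    # Reject values outside the documented accepted ranges.
    if training_input_mode not in ("File", "Pipe"):
        raise ValueError("training_input_mode must be 'File' or 'Pipe'")
    if instance_count < 1 or volume_size < 1:
        raise ValueError("instance_count and volume_size must be >= 1")
    if not 1 <= max_run_time <= MAX_RUN_TIME_LIMIT:
        raise ValueError("max_run_time must be between 1 and 432000 seconds")
    if not channels:
        raise ValueError("at least one input channel is required")
    return {
        "TrainingJobName": job_name,
        "RoleArn": role,
        "AlgorithmSpecification": {
            "TrainingImage": image,
            "TrainingInputMode": training_input_mode,
        },
        # SageMaker expects hyperparameter values as strings.
        "HyperParameters": {k: str(v) for k, v in (hyperparameters or {}).items()},
        "InputDataConfig": channels,
        "OutputDataConfig": {"S3OutputPath": model_artifact_path},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": volume_size,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_run_time},
        # CreateTrainingJob takes tags as a list of Key/Value pairs, not a dict.
        "Tags": [{"Key": k, "Value": v} for k, v in (tags or {}).items()],
    }
```

A request built this way could be submitted with `boto3.client("sagemaker", region_name=...).create_training_job(**request)`. The VPC, spot, encryption, and debugger arguments would map onto further request fields (`VpcConfig`, `EnableManagedSpotTraining`, `CheckpointConfig`, `DebugHookConfig`, `DebugRuleConfigurations`) that this sketch leaves out.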
Notes:
- Please use the links in the Resources section for detailed information on each input parameter and SageMaker APIs used in this component.
- The value of `RuleEvaluatorImage` depends on two things: the region, and whether the rule is a built-in or a custom rule. The Debugger Registry URLs in the Resources section lead to the documentation that lists the `RuleEvaluatorImage` value to use.
- The format for the `debug_hook_config` field is:
  ```
  {
      'CollectionConfigurations': [
          {
              'CollectionName': 'string',
              'CollectionParameters': {
                  'string': 'string'
              }
          }
      ],
      'HookParameters': {
          'string': 'string'
      },
      'LocalPath': 'string',
      'S3OutputPath': 'string'
  }
  ```
- The format for the `debug_rule_config` field is:
  ```
  [
      {
          'InstanceType': 'string',
          'LocalPath': 'string',
          'RuleConfigurationName': 'string',
          'RuleEvaluatorImage': 'string',
          'RuleParameters': {
              'string': 'string'
          },
          'S3OutputPath': 'string',
          'VolumeSizeInGB': number
      }
  ]
  ```
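Both formats are plain dictionaries, so they can be assembled in Python before being passed to the component. The sketch below uses hypothetical helpers (`make_debug_hook_config`, `make_debug_rule`, `unknown_keys`) to build values in the documented shapes and flag keys that are not part of them. The collection (`losses` with `save_interval`) and the built-in rule (`LossNotDecreasing`, selected through the `rule_to_invoke` parameter) are illustrative choices; the evaluator image is a placeholder, because its real value depends on the region as described above.

```python
from typing import Any, Dict, List

# Keys permitted by the two documented formats.
HOOK_KEYS = {"CollectionConfigurations", "HookParameters", "LocalPath", "S3OutputPath"}
RULE_KEYS = {
    "InstanceType", "LocalPath", "RuleConfigurationName", "RuleEvaluatorImage",
    "RuleParameters", "S3OutputPath", "VolumeSizeInGB",
}


def make_debug_hook_config(s3_output_path: str, collections: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Hypothetical helper: one CollectionConfiguration per (name, parameters) entry."""
    return {
        "S3OutputPath": s3_output_path,
        "CollectionConfigurations": [
            {"CollectionName": name, "CollectionParameters": params}
            for name, params in collections.items()
        ],
    }


def make_debug_rule(name: str, evaluator_image: str, parameters: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: one element of the debug_rule_config list."""
    return {
        "RuleConfigurationName": name,
        "RuleEvaluatorImage": evaluator_image,
        "RuleParameters": parameters,
    }


def unknown_keys(config: Dict[str, Any], allowed: set) -> List[str]:
    """Return, sorted, the keys that are not part of the documented format."""
    return sorted(set(config) - allowed)


debug_hook_config = make_debug_hook_config(
    "s3://my-bucket/debug/", {"losses": {"save_interval": "10"}}
)
debug_rule_config = [
    make_debug_rule(
        "LossNotDecreasing",
        "<rule-evaluator-image-for-your-region>",  # placeholder, not a real image URI
        {"rule_to_invoke": "LossNotDecreasing"},
    )
]
```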
Output
Stores the trained model artifacts in the S3 location you specified in `model_artifact_path`.
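To locate the packaged model afterwards, it helps to know where SageMaker places it by default: under `<S3OutputPath>/<TrainingJobName>/output/model.tar.gz`. The sketch below assumes that default layout and uses hypothetical helper names (`model_artifact_uri`, `split_s3_uri`); the authoritative location for a given job is the `ModelArtifacts.S3ModelArtifacts` field returned by `DescribeTrainingJob`.

```python
from typing import Tuple


def model_artifact_uri(model_artifact_path: str, job_name: str) -> str:
    """Expected S3 URI of the packaged model, assuming SageMaker's default layout."""
    # Assumed layout: <S3OutputPath>/<TrainingJobName>/output/model.tar.gz
    return f"{model_artifact_path.rstrip('/')}/{job_name}/output/model.tar.gz"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```

For example, `model_artifact_uri("s3://my-bucket/models/", "my-training-job")` gives `s3://my-bucket/models/my-training-job/output/model.tar.gz`, whose bucket and key can then be passed to any S3 client.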
Example code
- Simple example pipeline with only the Training component: simple_train_pipeline
- Sample pipeline for the Training component with Debugger: sagemaker_debugger_demo