pipelines/components/aws/sagemaker/train
Gautam Kumar 6e2a55cf84
Changing the default volume size to 30 (#3792)
2020-05-20 12:36:20 -07:00

SageMaker Training Kubeflow Pipelines component

Summary

A component to submit SageMaker training jobs directly from a Kubeflow Pipelines workflow. For background on how SageMaker training works, see https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html

Details

Intended Use

For model training using AWS SageMaker.

Runtime Arguments

| Argument | Description | Optional | Data type | Accepted values | Default |
|---|---|---|---|---|---|
| region | The AWS region in which the training job is launched | No | String | | |
| endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
| job_name | The name of the training job. Must be unique within the same AWS account and AWS region | Yes | String | | TrainingJob-[datetime]-[random id] |
| role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | | |
| image | The registry path of the Docker image that contains the training algorithm | Yes | String | | |
| algorithm_name | The name of the algorithm resource to use for the training job; only specify this parameter if the training image is not specified | Yes | String | | |
| metric_definitions | A dictionary of name-regex pairs that specify the metrics the algorithm emits | Yes | Dict | | {} |
| training_input_mode | The input mode that the algorithm supports | No | String | File, Pipe | File |
| hyperparameters | Hyperparameters for the selected algorithm | No | Dict | Depends on the algorithm | |
| channels | A list of dicts specifying the input channels (at least one); refer to the documentation for parameters | No | List of Dicts | | |
| instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge | |
| instance_count | The number of ML compute instances to use in the training job | Yes | Int | ≥ 1 | 1 |
| volume_size | The size of the ML storage volume to provision, in GB | Yes | Int | ≥ 1 | 30 |
| resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | String | | |
| max_run_time | The maximum run time of the training job, in seconds | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| model_artifact_path | The S3 path where Amazon SageMaker stores the trained model artifacts | No | String | | |
| output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | String | | |
| vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
| vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your training job | Yes | String | | |
| network_isolation | Isolates the training container if true | No | Boolean | False, True | True |
| traffic_encryption | Encrypts all communications between ML compute instances in distributed training if true | No | Boolean | False, True | False |
| spot_instance | Uses managed spot training if true | No | Boolean | False, True | False |
| max_wait_time | The maximum time, in seconds, you are willing to wait for a managed spot training job to complete | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Dict | | {} |
| tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
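The dict and list arguments in the table are the least obvious ones to fill in. The sketch below builds a single S3 input channel using the field names of the SageMaker CreateTrainingJob `InputDataConfig` structure, together with hyperparameters and metric definitions. The role ARN, image URI, bucket paths and the `make_s3_channel` helper are hypothetical, and passing the dict/list values as JSON strings is an assumption about how the component consumes them; check the component's component.yaml for the exact input types.

```python
import json

# Hypothetical values for illustration only; replace with your own resources.
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerRole"
TRAINING_IMAGE = "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-algorithm:latest"
TRAIN_DATA_URI = "s3://my-bucket/my-dataset/train"
OUTPUT_URI = "s3://my-bucket/my-dataset/output"


def make_s3_channel(name, s3_uri, input_mode="File"):
    """Hypothetical helper: build one input channel dict using the field
    names of the SageMaker CreateTrainingJob InputDataConfig structure."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3Uri": s3_uri,
                "S3DataType": "S3Prefix",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "CompressionType": "None",
        "RecordWrapperType": "None",
        "InputMode": input_mode,
    }


# At least one channel is required.
channels = [make_s3_channel("train", TRAIN_DATA_URI)]

# SageMaker hands hyperparameter values to the algorithm as strings.
hyperparameters = {"k": "10", "feature_dim": "784"}

# Each regex is matched against the training logs; the captured group is the value.
metric_definitions = {"train:loss": r"loss = ([0-9\.]+)"}

# Assumption: dict/list arguments are passed to the component as JSON strings.
train_args = {
    "region": "us-west-2",
    "role": ROLE_ARN,
    "image": TRAINING_IMAGE,
    "training_input_mode": "File",
    "channels": json.dumps(channels),
    "hyperparameters": json.dumps(hyperparameters),
    "metric_definitions": json.dumps(metric_definitions),
    "instance_type": "ml.m5.xlarge",
    "instance_count": 1,
    "volume_size": 30,
    "model_artifact_path": OUTPUT_URI,
}

if __name__ == "__main__":
    print(json.dumps(train_args, indent=2))
```

Keeping the channel construction in one function makes it easy to add a second channel (for example a validation set) with the same structure.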
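Several rows in the table carry numeric limits. A minimal client-side check, written as a hypothetical helper `check_training_limits`, can catch bad values before a job is submitted. The bounds and defaults mirror the table; the additional rule that `max_wait_time` must be at least `max_run_time` comes from the SageMaker `StoppingCondition` documentation for managed spot training. SageMaker validates these values itself, so this is only a convenience sketch.

```python
# Upper bound from the argument table: 432000 seconds = 5 days.
MAX_SECONDS = 432000


def check_training_limits(instance_count=1, volume_size=30, max_run_time=86400,
                          spot_instance=False, max_wait_time=86400):
    """Hypothetical helper: return a list of problems (empty if all values pass).
    Defaults match the defaults listed in the argument table."""
    problems = []
    if instance_count < 1:
        problems.append("instance_count must be >= 1")
    if volume_size < 1:
        problems.append("volume_size must be >= 1 GB")
    if max_run_time > MAX_SECONDS:
        problems.append("max_run_time must be <= 432000 seconds (5 days)")
    # max_wait_time only applies to managed spot training.
    if spot_instance:
        if max_wait_time > MAX_SECONDS:
            problems.append("max_wait_time must be <= 432000 seconds (5 days)")
        if max_wait_time < max_run_time:
            problems.append("max_wait_time must be >= max_run_time")
    return problems


if __name__ == "__main__":
    print(check_training_limits())  # []
    print(check_training_limits(spot_instance=True, max_run_time=7200,
                                max_wait_time=3600))  # ['max_wait_time must be >= max_run_time']
```

Returning a list rather than raising on the first failure lets a pipeline author see every invalid value at once.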

Output

Stores the trained model artifacts in the S3 location specified by model_artifact_path.
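SageMaker writes the packaged model to `<output path>/<training job name>/output/model.tar.gz`. The sketch below computes that URI from `model_artifact_path` and the job name, then splits it into bucket and key, for example to download the model later with an S3 client. Both helper names are hypothetical; the authoritative location is the `ModelArtifacts.S3ModelArtifacts` field returned by `DescribeTrainingJob`.

```python
def expected_model_artifact_uri(model_artifact_path, job_name):
    """Hypothetical helper: where SageMaker places model.tar.gz for a job."""
    base = model_artifact_path.rstrip("/")
    return f"{base}/{job_name}/output/model.tar.gz"


def split_s3_uri(uri):
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


if __name__ == "__main__":
    uri = expected_model_artifact_uri("s3://my-bucket/output/", "TrainingJob-example")
    print(uri)  # s3://my-bucket/output/TrainingJob-example/output/model.tar.gz
    print(split_s3_uri(uri))  # ('my-bucket', 'output/TrainingJob-example/output/model.tar.gz')
```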

Example code

A simple example pipeline with only the Train component: simple_train_pipeline

Resources