SageMaker hyperparameter optimization Kubeflow Pipeline component

Summary

Component to submit hyperparameter tuning jobs to SageMaker directly from a Kubeflow Pipelines workflow.

Details

Intended Use

For hyperparameter tuning jobs using AWS SageMaker.

Runtime Arguments

| Argument | Description | Optional (in pipeline definition) | Optional (in UI) | Data type | Accepted values | Default |
| --- | --- | --- | --- | --- | --- | --- |
| region | The region where the cluster launches | No | No | String | | |
| endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | | String | | |
| assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | | String | | |
| job_name | The name of the tuning job. Must be unique within the same AWS account and AWS region | Yes | Yes | String | | HPOJob-[datetime]-[random id] |
| role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | No | String | | |
| image | The registry path of the Docker image that contains the training algorithm | Yes | Yes | String | | |
| algorithm_name | The name of the algorithm resource to use for the hyperparameter tuning job; only specify this parameter if training image is not specified | Yes | Yes | String | | |
| training_input_mode | The input mode that the algorithm supports | Yes | No | String | File, Pipe | File |
| metric_definitions | A dictionary of name-regex pairs that specify the metrics the algorithm emits | Yes | Yes | Dict | | {} |
| strategy | How hyperparameter tuning chooses the combinations of hyperparameter values to use for the training jobs it launches | Yes | No | String | Bayesian, Random | Bayesian |
| metric_name | The name of the metric to use for the objective metric | No | No | String | | |
| metric_type | Whether to minimize or maximize the objective metric | No | No | String | Maximize, Minimize | |
| early_stopping_type | Whether to use early stopping for training jobs launched by the hyperparameter tuning job | Yes | No | String | Off, Auto | Off |
| static_parameters | The values of hyperparameters that do not change for the tuning job | Yes | Yes | Dict | | {} |
| integer_parameters | The array of IntegerParameterRange objects that specify ranges of integer hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
| continuous_parameters | The array of ContinuousParameterRange objects that specify ranges of continuous hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
| categorical_parameters | The array of CategoricalParameterRange objects that specify ranges of categorical hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
| channels | A list of dicts specifying the input channels (at least one); refer to documentation for parameters | No | No | List of Dicts | | |
| output_location | The Amazon S3 path where you want Amazon SageMaker to store the output of the training jobs | No | No | String | | |
| output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | Yes | String | | |
| instance_type | The ML compute instance type | Yes | No | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge and many more | ml.m4.xlarge |
| instance_count | The number of ML compute instances to use in each training job | Yes | Yes | Int | ≥ 1 | 1 |
| volume_size | The size of the ML storage volume that you want to provision, in GB | Yes | Yes | Int | ≥ 1 | 30 |
| max_num_jobs | The maximum number of training jobs that a hyperparameter tuning job can launch | No | No | Int | [1, 500] | |
| max_parallel_jobs | The maximum number of concurrent training jobs that a hyperparameter tuning job can launch | No | No | Int | [1, 10] | |
| max_run_time | The maximum run time in seconds per training job | Yes | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | Yes | String | | |
| vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | Yes | String | | |
| vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your HPO job | Yes | Yes | String | | |
| network_isolation | Isolates the training container if true | Yes | No | Boolean | False, True | True |
| traffic_encryption | Encrypts all communications between ML compute instances in distributed training if true | Yes | No | Boolean | False, True | False |
| spot_instance | Use managed spot training if true | Yes | No | Boolean | False, True | False |
| max_wait_time | The maximum time in seconds you are willing to wait for a managed spot training job to complete | Yes | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Yes | Dict | | {} |
| warm_start_type | Specifies the type of warm start used | Yes | No | String | IdenticalDataAndAlgorithm, TransferLearning | |
| parent_hpo_jobs | List of previously completed or stopped hyperparameter tuning jobs to be used as a starting point | Yes | Yes | String | Yes | |
| tags | Key-value pairs to categorize AWS resources | Yes | Yes | Dict | | {} |
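
The list and dict arguments above are easiest to understand from concrete values. The sketch below shows plausible shapes, assuming the component forwards them to the SageMaker API unchanged (so the keys follow the IntegerParameterRange, ContinuousParameterRange, CategoricalParameterRange and Channel structures of that API). The hyperparameter names, ranges, metric regex and bucket path are illustrative placeholders, not values taken from this component.

```python
import json

# Ranges to search. SageMaker expects MinValue and MaxValue as strings.
integer_parameters = [
    {"Name": "mini_batch_size", "MinValue": "500", "MaxValue": "600"},
    {"Name": "extra_center_factor", "MinValue": "10", "MaxValue": "20"},
]
continuous_parameters = [
    # Hypothetical continuous hyperparameter, shown only for its shape.
    {"Name": "learning_rate", "MinValue": "0.001", "MaxValue": "0.1"},
]
categorical_parameters = [
    {"Name": "init_method", "Values": ["random", "kmeans++"]},
]

# Hyperparameters held constant across all training jobs.
static_parameters = {"k": "10", "feature_dim": "784"}

# Name-regex pairs; typically only needed for custom training images.
metric_definitions = {"validation:accuracy": "val_acc=([0-9\\.]+)"}

# At least one input channel is required. The bucket is a placeholder.
channels = [
    {
        "ChannelName": "train",
        "DataSource": {
            "S3DataSource": {
                "S3Uri": "s3://example-bucket/train_data",
                "S3DataType": "S3Prefix",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "InputMode": "File",
    }
]


def ranges_are_ordered(ranges):
    """Return True if every range has MinValue <= MaxValue."""
    return all(float(r["MinValue"]) <= float(r["MaxValue"]) for r in ranges)


if __name__ == "__main__":
    assert ranges_are_ordered(integer_parameters + continuous_parameters)
    print(json.dumps({"integer_parameters": integer_parameters,
                      "channels": channels}, indent=2))
```

All of these values are JSON-serializable, so they can be passed as pipeline parameters or pasted into the run form as JSON text.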

Notes:

  • Specify training image OR algorithm name. Use the image parameter for Bring Your Own Container (BYOC) algorithms, and algorithm name for Amazon built-in algorithms, custom algorithm resources in SageMaker, and algorithms subscribed to from the AWS Marketplace.
  • Specify VPC security group IDs AND VPC subnets to specify the VPC that you want the training jobs to connect to.
  • Specify warm start type AND 1 to 5 parent HPO jobs to launch the hyperparameter tuning job with previous jobs as a starting point.
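
The three rules above can be checked before submitting a run. The following is a minimal sketch, not part of the component: `validate_hpo_args` is a hypothetical helper, it reads "image OR algorithm name" as "exactly one of the two", and it assumes `parent_hpo_jobs` is a comma-delimited string (the table only states that its type is String).

```python
def validate_hpo_args(args):
    """Return a list of problems found in a dict of component arguments.

    Hypothetical helper that mirrors the notes in this README.
    """
    problems = []

    # Rule 1: training image OR algorithm name (read here as exactly one).
    has_image = bool(args.get("image"))
    has_algorithm = bool(args.get("algorithm_name"))
    if has_image == has_algorithm:
        problems.append("specify exactly one of image or algorithm_name")

    # Rule 2: VPC security group IDs AND VPC subnets go together.
    has_groups = bool(args.get("vpc_security_group_ids"))
    has_subnets = bool(args.get("vpc_subnets"))
    if has_groups != has_subnets:
        problems.append("vpc_security_group_ids and vpc_subnets must be set together")

    # Rule 3: warm start type AND 1 to 5 parent HPO jobs.
    # Assumption: parent_hpo_jobs is a comma-delimited string of job names.
    warm_start_type = args.get("warm_start_type")
    raw_parents = args.get("parent_hpo_jobs") or ""
    parents = [p.strip() for p in raw_parents.split(",") if p.strip()]
    if warm_start_type or parents:
        if not warm_start_type:
            problems.append("parent_hpo_jobs requires warm_start_type")
        if not 1 <= len(parents) <= 5:
            problems.append("warm start needs 1 to 5 parent_hpo_jobs")

    return problems


if __name__ == "__main__":
    example = {
        "algorithm_name": "my-algorithm",          # placeholder name
        "warm_start_type": "TransferLearning",
        "parent_hpo_jobs": "hpo-job-a, hpo-job-b",  # placeholder job names
    }
    print(validate_hpo_args(example))
```

An empty list means the combination is consistent with the notes; it says nothing about whether SageMaker will accept the individual values.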

Outputs

| Name | Description |
| --- | --- |
| hpo_job_name | The name of the hyperparameter tuning job |
| model_artifact_url | The URL where the model artifacts were stored |
| best_job_name | The name of the best training job found by the tuning job |
| best_hyperparameters | The tuned hyperparameter values of the best training job |
| training_image | The registry path of the Docker image that contains the training algorithm |

Requirements

Samples

On its own

K-Means algorithm tuning on MNIST dataset: pipeline

Follow the steps in the README, with some modifications:

  1. Get and store data in S3 buckets
  2. Prepare an IAM role with permissions to run SageMaker jobs
  3. Add 'aws-secret' to your kubeflow namespace
  4. Compile the pipeline:
     dsl-compile --py kmeans-hpo-pipeline.py --output kmeans-hpo-pipeline.tar.gz
  5. In the Kubeflow UI, upload the compiled pipeline specification (the .tar.gz file) and create a new run. Update the role_arn and the data paths, and optionally any other run parameters.
  6. Once the pipeline completes, you can see the outputs under 'Output parameters' in the HPO component's Input/Output section.
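
The output parameters shown in the last step can also feed a later training step. The sketch below assumes that `best_hyperparameters` is a JSON-encoded object (check the value shown in the UI before relying on this) and that a follow-up training job wants the tuned values merged with the static ones. `merge_best_hyperparameters` is a hypothetical helper, and the sample values are made up.

```python
import json


def merge_best_hyperparameters(best_hyperparameters_json, static_parameters):
    """Combine tuned and static hyperparameters into one string-valued dict.

    Tuned values take precedence when a name appears in both inputs.
    """
    tuned = json.loads(best_hyperparameters_json)
    merged = dict(static_parameters)
    merged.update(tuned)
    # SageMaker hyperparameters are string-to-string maps.
    return {name: str(value) for name, value in merged.items()}


if __name__ == "__main__":
    # Made-up output value, copied by hand from the run's output parameters.
    best = '{"mini_batch_size": "512", "init_method": "kmeans++"}'
    static = {"k": "10", "feature_dim": "784"}
    print(merge_best_hyperparameters(best, static))
```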

Integrated into a pipeline

MNIST Classification using K-Means pipeline: Pipeline | Steps

Resources