pipelines/components/aws/sagemaker/hyperparameter_tuning

README.md

SageMaker hyperparameter optimization Kubeflow Pipeline component

Summary

Component to submit hyperparameter tuning jobs to SageMaker directly from a Kubeflow Pipelines workflow.

Details

Intended Use

For hyperparameter tuning jobs using AWS SageMaker.

Runtime Arguments

| Argument | Description | Optional (in pipeline definition) | Optional (in UI) | Data type | Accepted values | Default |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| region | The region where the cluster launches | No | No | String | | |
| endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | | String | | |
| assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | | String | | |
| job_name | The name of the tuning job. Must be unique within the same AWS account and AWS region | Yes | Yes | String | | HPOJob-[datetime]-[random id] |
| role | The Amazon Resource Name (ARN) of the role that Amazon SageMaker assumes to perform tasks on your behalf | No | No | String | | |
| image | The registry path of the Docker image that contains the training algorithm | Yes | Yes | String | | |
| algorithm_name | The name of the algorithm resource to use for the hyperparameter tuning job; only specify this parameter if the training image is not specified | Yes | Yes | String | | |
| training_input_mode | The input mode that the algorithm supports | Yes | No | String | File, Pipe | File |
| metric_definitions | A dictionary of name-regex pairs that specifies the metrics that the algorithm emits | Yes | Yes | Dict | | {} |
| strategy | How hyperparameter tuning chooses the combinations of hyperparameter values to use for the training jobs it launches | Yes | No | String | Bayesian, Random | Bayesian |
| metric_name | The name of the metric to use as the objective metric | No | No | String | | |
| metric_type | Whether to minimize or maximize the objective metric | No | No | String | Maximize, Minimize | |
| early_stopping_type | Whether to use early stopping for training jobs launched by the hyperparameter tuning job | Yes | No | String | Off, Auto | Off |
| static_parameters | The values of hyperparameters that do not change for the tuning job | Yes | Yes | Dict | | {} |
| integer_parameters | The array of IntegerParameterRange objects that specify ranges of integer hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
| continuous_parameters | The array of ContinuousParameterRange objects that specify ranges of continuous hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
| categorical_parameters | The array of CategoricalParameterRange objects that specify ranges of categorical hyperparameters that you want to search | Yes | Yes | List of Dicts | | [] |
| channels | A list of dicts specifying the input channels (at least one); refer to documentation for parameters | No | No | List of Dicts | | |
| output_location | The Amazon S3 path where you want Amazon SageMaker to store the results of the training jobs | No | No | String | | |
| output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the model artifacts | Yes | Yes | String | | |
| instance_type | The ML compute instance type | Yes | No | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge and many more | ml.m4.xlarge |
| instance_count | The number of ML compute instances to use in each training job | Yes | Yes | Int | ≥ 1 | 1 |
| volume_size | The size, in GB, of the ML storage volume that you want to provision | Yes | Yes | Int | ≥ 1 | 30 |
| max_num_jobs | The maximum number of training jobs that a hyperparameter tuning job can launch | No | No | Int | [1, 500] | |
| max_parallel_jobs | The maximum number of concurrent training jobs that a hyperparameter tuning job can launch | No | No | Int | [1, 10] | |
| max_run_time | The maximum run time in seconds per training job | Yes | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | Yes | String | | |
| vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | Yes | String | | |
| vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your HPO job | Yes | Yes | String | | |
| network_isolation | Isolates the training container if true | Yes | No | Boolean | False, True | True |
| traffic_encryption | Encrypts all communications between ML compute instances in distributed training if true | Yes | No | Boolean | False, True | False |
| spot_instance | Use managed spot training if true | Yes | No | Boolean | False, True | False |
| max_wait_time | The maximum time in seconds you are willing to wait for a managed spot training job to complete | Yes | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| checkpoint_config | Dictionary of information about the output location for managed spot training checkpoint data | Yes | Yes | Dict | | {} |
| warm_start_type | Specifies the type of warm start used | Yes | No | String | IdenticalDataAndAlgorithm, TransferLearning | |
| parent_hpo_jobs | List of previously completed or stopped hyperparameter tuning jobs to be used as a starting point | Yes | Yes | String | | |
| tags | Key-value pairs to categorize AWS resources | Yes | Yes | Dict | | {} |
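
The Dict and List of Dicts arguments take structured values. The sketch below builds a few of them in Python, assuming they follow the shapes of the SageMaker CreateHyperParameterTuningJob API (parameter ranges with string bounds, channels with an S3 data source). The hyperparameter names, metric name, regex, log line, and S3 URI are placeholders for illustration only, and whether you pass Python objects or JSON strings depends on how your pipeline is wired.

```python
import json
import re

# Placeholder search ranges. In the SageMaker API, numeric bounds are strings.
integer_parameters = [
    {"Name": "mini_batch_size", "MinValue": "500", "MaxValue": "600"},
]
continuous_parameters = [
    {"Name": "learning_rate", "MinValue": "0.001", "MaxValue": "0.1"},
]
categorical_parameters = [
    {"Name": "init_method", "Values": ["random", "kmeans++"]},
]

# At least one input channel is required. The bucket and prefix are hypothetical.
channels = [
    {
        "ChannelName": "train",
        "DataSource": {
            "S3DataSource": {
                "S3Uri": "s3://example-bucket/mnist/train",
                "S3DataType": "S3Prefix",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "InputMode": "File",
    }
]

# metric_definitions maps a metric name to a regex; the first capture group
# is the metric value extracted from the training logs.
metric_definitions = {"validation:accuracy": r"val_acc=([0-9.]+)"}

# Check the regex locally against a made-up log line before submitting a job.
sample_log_line = "epoch 3 val_acc=0.9132"
match = re.search(metric_definitions["validation:accuracy"], sample_log_line)
captured_value = float(match.group(1))

# Serialize to JSON in case the pipeline parameter expects a string.
channels_json = json.dumps(channels)
integer_parameters_json = json.dumps(integer_parameters)
```

Testing the metric regex against a real log line from your own container first is cheap, and a regex that never matches leaves the tuning job without an objective value to optimize.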

Notes:

  • Specify training image OR algorithm name. Use the image parameter for Bring Your Own Container (BYOC) algorithms, and algorithm name for Amazon built-in algorithms, custom algorithm resources in SageMaker, and algorithms subscribed to from the AWS Marketplace.
  • Specify VPC security group IDs AND VPC subnets to specify the VPC that you want the training jobs to connect to.
  • Specify warm start type AND 1 to 5 parent HPO jobs to launch the hyperparameter tuning job with previous jobs as a starting point.
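
The notes above, together with the numeric limits in the arguments table, can be checked before a run is submitted. The following is a minimal sketch of such a check. `check_hpo_arguments` is a hypothetical helper, not part of the component, and it assumes that `parent_hpo_jobs` is a comma-delimited string and that image and algorithm name are mutually exclusive.

```python
def check_hpo_arguments(args):
    """Return a list of problems found in a dict of component arguments."""
    problems = []

    # Exactly one of image / algorithm_name should be set.
    if bool(args.get("image")) == bool(args.get("algorithm_name")):
        problems.append("specify exactly one of image or algorithm_name")

    # Security group IDs and subnets must be given together.
    if bool(args.get("vpc_security_group_ids")) != bool(args.get("vpc_subnets")):
        problems.append("set vpc_security_group_ids and vpc_subnets together")

    # Warm start needs a type plus 1 to 5 parent jobs.
    # Assumption: parent jobs are passed as a comma-delimited string.
    raw_parents = args.get("parent_hpo_jobs") or ""
    parents = [p.strip() for p in raw_parents.split(",") if p.strip()]
    if bool(args.get("warm_start_type")) != bool(parents):
        problems.append("set warm_start_type and parent_hpo_jobs together")
    elif len(parents) > 5:
        problems.append("at most 5 parent HPO jobs are allowed")

    # Numeric limits taken from the arguments table.
    if not 1 <= args.get("max_num_jobs", 0) <= 500:
        problems.append("max_num_jobs must be in [1, 500]")
    if not 1 <= args.get("max_parallel_jobs", 0) <= 10:
        problems.append("max_parallel_jobs must be in [1, 10]")
    if args.get("max_run_time", 86400) > 432000:
        problems.append("max_run_time must be at most 432000 seconds")

    return problems


# Example usage with placeholder values.
valid_args = {"image": "example-registry/kmeans:1", "max_num_jobs": 9, "max_parallel_jobs": 3}
invalid_args = dict(valid_args, algorithm_name="my-algo", vpc_subnets="subnet-1", max_parallel_jobs=11)

valid_problems = check_hpo_arguments(valid_args)
invalid_problems = check_hpo_arguments(invalid_args)
```

Returning a list of problems rather than raising on the first one lets you report every misconfiguration in a single pass.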

Outputs

| Name | Description |
| :--- | :--- |
| hpo_job_name | The name of the hyperparameter tuning job |
| model_artifact_url | URL where model artifacts were stored |
| best_job_name | Best hyperparameter tuning training job name |
| best_hyperparameters | Tuned hyperparameters |
| training_image | The registry path of the Docker image that contains the training algorithm |

Requirements

Samples

On its own

K-Means algorithm tuning on MNIST dataset: pipeline

Follow the steps in the README, with some modifications:

  1. Get and store data in S3 buckets
  2. Prepare an IAM role with permissions to run SageMaker jobs
  3. Add 'aws-secret' to your Kubeflow namespace
  4. Compile the pipeline:

         dsl-compile --py kmeans-hpo-pipeline.py --output kmeans-hpo-pipeline.tar.gz

  5. In the Kubeflow UI, upload the compiled pipeline specification (the .tar.gz file) and create a new run. Update the role_arn and the data paths, and optionally any other run parameters.
  6. Once the pipeline completes, you can see the outputs under 'Output parameters' in the HPO component's Input/Output section.
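
If you prefer to script the compile step, the same command can be driven from Python. This is a minimal sketch that assumes `dsl-compile` (installed with the Kubeflow Pipelines SDK) is on your PATH. `build_compile_command` and `compile_pipeline` are hypothetical helper names, not part of any SDK.

```python
import shutil
import subprocess


def build_compile_command(pipeline_file, package_file):
    """Return the dsl-compile invocation from the compile step as an argument list."""
    return ["dsl-compile", "--py", pipeline_file, "--output", package_file]


def compile_pipeline(pipeline_file, package_file):
    """Run dsl-compile if it is installed; return True when the command ran."""
    if shutil.which("dsl-compile") is None:
        # The SDK command-line tool is not available in this environment.
        return False
    # check=True raises CalledProcessError if compilation fails.
    subprocess.run(build_compile_command(pipeline_file, package_file), check=True)
    return True


command = build_compile_command("kmeans-hpo-pipeline.py", "kmeans-hpo-pipeline.tar.gz")
```

Call `compile_pipeline("kmeans-hpo-pipeline.py", "kmeans-hpo-pipeline.tar.gz")` from the directory that contains the pipeline file, then upload the resulting package as described in the steps above.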

Integrated into a pipeline

MNIST Classification using K-Means pipeline: Pipeline | Steps

Resources