SageMaker Processing Kubeflow Pipelines component

Summary

Component to submit SageMaker Processing jobs directly from a Kubeflow Pipelines workflow. https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html

Intended Use

Use this component to run data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation, on Amazon SageMaker.

Runtime Arguments

| Argument | Description | Optional | Data type | Accepted values | Default |
| :--- | :--- | :--- | :--- | :--- | :--- |
| region | The region where the cluster launches | No | String | | |
| endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
| assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
| job_name | The name of the Processing job. Must be unique within the same AWS account and AWS region | Yes | String | | ProcessingJob-[datetime]-[random id] |
| role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | | |
| image | The registry path of the Docker image that contains the processing script | Yes | String | | |
| instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge and many more | ml.m4.xlarge |
| instance_count | The number of ML compute instances to use in each processing job | Yes | Int | ≥ 1 | 1 |
| volume_size | The size of the ML storage volume to provision, in GB | Yes | Int | ≥ 1 | 30 |
| resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | String | | |
| output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the processing job output | Yes | String | | |
| max_run_time | The maximum run time in seconds per processing job | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
| environment | The environment variables to set in the Docker container | Yes | Dict | Maximum length of 1024. Key pattern: `[a-zA-Z_][a-zA-Z0-9_]*`. Value pattern: `[\S\s]*`. Up to 16 key-value entries in the map | |
| container_entrypoint | The entrypoint for the processing job, given as a list of strings that together form a command | Yes | List of Strings | | |
| container_arguments | A list of string arguments to be passed to a processing job | Yes | List of Strings | | |
| input_config | Parameters that specify Amazon S3 inputs for a processing job | No | List of Dicts | | [] |
| output_config | Parameters that specify Amazon S3 outputs for a processing job | No | List of Dicts | | [] |
| vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | | |
| vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your processing job | Yes | String | | |
| network_isolation | Isolates the processing container if true | No | Boolean | False, True | True |
| traffic_encryption | Encrypts all communications between ML compute instances in distributed processing if true | No | Boolean | False, True | False |
| tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
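
The limits listed for the environment argument can be checked locally before a pipeline run is submitted. The following is a minimal sketch of such a check, not part of the component: the function name `validate_environment` and the constants are hypothetical, and the rules are taken only from the table above (at most 16 entries, keys matching `[a-zA-Z_][a-zA-Z0-9_]*`, keys and values of at most 1024 characters). The value pattern `[\S\s]*` matches any string, so values are only checked for length.

```python
import re

# Limits copied from the "environment" row of the arguments table.
MAX_ENTRIES = 16
MAX_LENGTH = 1024
KEY_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def validate_environment(environment):
    """Return a list of problems found in the map; an empty list means it passed."""
    problems = []
    if len(environment) > MAX_ENTRIES:
        problems.append(f"too many entries: {len(environment)} > {MAX_ENTRIES}")
    for key, value in environment.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"key and value must both be strings: {key!r}")
            continue
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"key does not match the key pattern: {key!r}")
        if len(key) > MAX_LENGTH:
            problems.append(f"key longer than {MAX_LENGTH} characters: {key[:20]!r}...")
        if len(value) > MAX_LENGTH:
            problems.append(f"value for {key!r} longer than {MAX_LENGTH} characters")
    return problems


if __name__ == "__main__":
    # Hypothetical variables, for illustration only.
    for problem in validate_environment({"MODE": "train", "1BAD": "x"}):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets a caller report every violation at once.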

Notes:

  • For more information about how the container entrypoint and arguments are used, see the Build Your Own Processing Container documentation.
  • Each key and each value in the environment string-to-string map can be up to 1024 characters long. SageMaker supports up to 16 entries in the map.
  • The format for the input_config field is:
[
  {
    'InputName': 'string',
    'S3Input': {
      'S3Uri': 'string',
      'LocalPath': 'string',
      'S3DataType': 'ManifestFile'|'S3Prefix',
      'S3InputMode': 'Pipe'|'File',
      'S3DataDistributionType': 'FullyReplicated'|'ShardedByS3Key',
      'S3CompressionType': 'None'|'Gzip'
    }
  },
]
  • The format for the output_config field is:
[
  {
    'OutputName': 'string',
    'S3Output': {
      'S3Uri': 'string',
      'LocalPath': 'string',
      'S3UploadMode': 'Continuous'|'EndOfJob'
    }
  },
]
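
As an illustration of the two formats above, here is a minimal sketch that builds one input_config entry and one output_config entry in Python. The helper names `processing_input` and `processing_output`, the bucket name, and the container paths are all hypothetical. The default enum values are choices made for this sketch and are not documented defaults of the component.

```python
# Allowed values, copied from the input_config and output_config formats.
S3_DATA_TYPES = {"ManifestFile", "S3Prefix"}
S3_INPUT_MODES = {"Pipe", "File"}
S3_DISTRIBUTION_TYPES = {"FullyReplicated", "ShardedByS3Key"}
S3_COMPRESSION_TYPES = {"None", "Gzip"}
S3_UPLOAD_MODES = {"Continuous", "EndOfJob"}


def _check(field, value, allowed):
    """Raise ValueError if value is not one of the allowed strings."""
    if value not in allowed:
        raise ValueError(f"{field} must be one of {sorted(allowed)}, got {value!r}")


def processing_input(name, s3_uri, local_path, data_type="S3Prefix",
                     input_mode="File", distribution="FullyReplicated",
                     compression="None"):
    """Build one input_config entry (hypothetical helper)."""
    _check("S3DataType", data_type, S3_DATA_TYPES)
    _check("S3InputMode", input_mode, S3_INPUT_MODES)
    _check("S3DataDistributionType", distribution, S3_DISTRIBUTION_TYPES)
    _check("S3CompressionType", compression, S3_COMPRESSION_TYPES)
    return {
        "InputName": name,
        "S3Input": {
            "S3Uri": s3_uri,
            "LocalPath": local_path,
            "S3DataType": data_type,
            "S3InputMode": input_mode,
            "S3DataDistributionType": distribution,
            "S3CompressionType": compression,
        },
    }


def processing_output(name, s3_uri, local_path, upload_mode="EndOfJob"):
    """Build one output_config entry (hypothetical helper)."""
    _check("S3UploadMode", upload_mode, S3_UPLOAD_MODES)
    return {
        "OutputName": name,
        "S3Output": {
            "S3Uri": s3_uri,
            "LocalPath": local_path,
            "S3UploadMode": upload_mode,
        },
    }


# Placeholder bucket and paths, for illustration only.
input_config = [
    processing_input("raw-data", "s3://example-bucket/raw/", "/opt/ml/processing/input"),
]
output_config = [
    processing_output("features", "s3://example-bucket/features/", "/opt/ml/processing/output"),
]
```

Checking the enum-like fields when the entry is built surfaces a typo such as 'Endofjob' before a job is submitted.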

Outputs

| Name | Description |
| :--- | :--- |
| job_name | Processing job name |
| output_artifacts | A dictionary that maps each output_config OutputName to its S3Uri |
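
To make the shape of output_artifacts concrete, the sketch below derives the described mapping from an output_config list. It illustrates the documented shape only and is not the component's implementation. The function name `expected_output_artifacts` and the bucket name are hypothetical.

```python
def expected_output_artifacts(output_config):
    """Map each OutputName to its S3Uri, as described for output_artifacts."""
    return {
        entry["OutputName"]: entry["S3Output"]["S3Uri"]
        for entry in output_config
    }


# Placeholder configuration, for illustration only.
example_output_config = [
    {
        "OutputName": "features",
        "S3Output": {
            "S3Uri": "s3://example-bucket/features/",
            "LocalPath": "/opt/ml/processing/output",
            "S3UploadMode": "EndOfJob",
        },
    },
]

artifacts = expected_output_artifacts(example_output_config)
```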

Requirements

Resources