pipelines/components/aws/sagemaker/ground_truth
Leonard O' Sullivan 4aa11c3c7f
feat(components) Adds RoboMaker and SageMaker RLEstimator components (#4813)
Co-authored-by: Nicholas Thomson <nithomso@amazon.com>
2020-12-11 13:27:27 -08:00
..
src refactor(components): AWS SageMaker - Full component refactoring (#4336) 2020-10-27 14:17:57 -07:00
README.md feat(components): AWS SageMaker - Support for assuming a role (#4212) 2020-08-03 10:53:43 -07:00
component.yaml feat(components) Adds RoboMaker and SageMaker RLEstimator components (#4813) 2020-12-11 13:27:27 -08:00

README.md

SageMaker Ground Truth Kubeflow Pipelines component

Summary

Component to submit SageMaker Ground Truth labeling jobs directly from a Kubeflow Pipelines workflow.

Details

Intended Use

For Ground Truth jobs using AWS SageMaker.

Runtime Arguments

| Argument | Description | Optional | Data type | Accepted values | Default |
| --- | --- | --- | --- | --- | --- |
| region | The region where the cluster launches | No | String | | |
| endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | | |
| assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | | |
| role | The Amazon Resource Name (ARN) of the role that Amazon SageMaker assumes to perform tasks on your behalf | No | String | | |
| job_name | The name of the Ground Truth job. Must be unique within the same AWS account and AWS region | Yes | String | | LabelingJob-[datetime]-[random id] |
| label_attribute_name | The attribute name to use for the label in the output manifest file | Yes | String | | job_name |
| manifest_location | The Amazon S3 location of the manifest file that describes the input data objects | No | String | | |
| output_location | The Amazon S3 path where you want Amazon SageMaker to store the results of the labeling job | No | String | | |
| output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the labeling job output | Yes | String | | |
| task_type | The labeling task type: built-in image classification, bounding box, text classification, or semantic segmentation, or custom. For custom, provide pre- and post-labeling task Lambda functions | No | String | Image Classification, Bounding Box, Text Classification, Semantic Segmentation, Custom | |
| worker_type | The type of workforce used for data labeling | No | String | Public, Private, Vendor | |
| workteam_arn | The ARN of the work team assigned to complete the tasks; specify if worker type is private or vendor | Yes | String | | |
| no_adult_content | Whether the data is free of adult content; specify if worker type is public | Yes | Boolean | False, True | False |
| no_ppi | Whether the data is free of personally identifiable information; specify if worker type is public | Yes | Boolean | False, True | False |
| label_category_config | The S3 URL of the JSON file that defines the categories used to label the data objects | Yes | String | | |
| max_human_labeled_objects | The maximum number of objects that can be labeled by human workers | Yes | Int | ≥ 1 | all objects |
| max_percent_objects | The maximum percentage of input data objects that should be labeled | Yes | Int | [1, 100] | 100 |
| enable_auto_labeling | Enables auto-labeling; only for bounding box, text classification, and image classification | Yes | Boolean | False, True | False |
| initial_model_arn | The ARN of the final model used for a previous auto-labeling job | Yes | String | | |
| resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | String | | |
| ui_template | The Amazon S3 bucket location of the UI template | No | String | | |
| pre_human_task_function | The ARN of a Lambda function that is run before a data object is sent to a human worker | Yes | String | | |
| post_human_task_function | The ARN of a Lambda function that implements the logic for annotation consolidation | Yes | String | | |
| task_keywords | Keywords used to describe the task so that workers on Amazon Mechanical Turk can discover the task | Yes | String | | |
| title | A title for the task for your human workers | No | String | | |
| description | A description of the task for your human workers | No | String | | |
| num_workers_per_object | The number of human workers that will label an object | No | Int | [1, 9] | |
| time_limit | The maximum time in seconds that a worker has to complete a task | No | Int | [30, 28800] | |
| task_availibility | The length of time that a task remains available for labeling by human workers | Yes | Int | Public workforce: [1, 43200], other: [1, 864000] | |
| max_concurrent_tasks | The maximum number of data objects that can be labeled by human workers at the same time | Yes | Int | [1, 1000] | |
| workforce_task_price | The price in USD that you pay for each task performed by a public worker; specify to tenths of a cent, formatted as "0.000" | Yes | Float | | 0.000 |
| tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
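
The ranges and conditional requirements in the Runtime Arguments table can be checked before a job is submitted. The following is a minimal sketch of such a check, not part of the component itself: the helper names `validate_ground_truth_args` and `format_task_price` are hypothetical, and the rules are limited to those stated in the table (the argument `task_availibility` is spelled as the table spells it).

```python
from decimal import Decimal

# Inclusive integer ranges copied from the Runtime Arguments table.
INT_RANGES = {
    "max_percent_objects": (1, 100),
    "num_workers_per_object": (1, 9),
    "time_limit": (30, 28800),
    "max_concurrent_tasks": (1, 1000),
}


def validate_ground_truth_args(args):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []

    # Simple range checks; arguments that are absent are skipped.
    for name, (low, high) in INT_RANGES.items():
        value = args.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be in [{low}, {high}], got {value}")

    # workteam_arn is needed for private or vendor workforces.
    worker_type = str(args.get("worker_type", "")).lower()
    if worker_type in ("private", "vendor") and not args.get("workteam_arn"):
        problems.append("workteam_arn is required for private or vendor workers")

    # The public workforce has a lower upper bound on task availability.
    availability = args.get("task_availibility")
    if availability is not None:
        high = 43200 if worker_type == "public" else 864000
        if not 1 <= availability <= high:
            problems.append(f"task_availibility must be in [1, {high}], got {availability}")

    # Custom task types need both Lambda functions.
    if str(args.get("task_type", "")).lower() == "custom":
        for name in ("pre_human_task_function", "post_human_task_function"):
            if not args.get(name):
                problems.append(f"{name} is required when task_type is custom")

    return problems


def format_task_price(usd):
    """Hypothetical helper: format a USD price to tenths of a cent ("0.000")."""
    return str(Decimal(str(usd)).quantize(Decimal("0.000")))
```

Returning a list of problems rather than raising on the first one lets a pipeline author see every invalid argument in a single pass.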

Outputs

| Name | Description |
| --- | --- |
| output_manifest_location | URL where labeling results were stored |
| active_learning_model_arn | ARN of the resulting active learning model |
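
A downstream step may need the bucket and key of `output_manifest_location` separately. The sketch below assumes the value is an `s3://bucket/key` style URL, which this README does not state explicitly; `split_s3_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Hypothetical helper: split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    # Reject anything that is not an s3:// URL with a bucket name.
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```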

Requirements

Samples

Used in a pipeline with workteam creation and training

Mini image classification demo: Demo

References