SageMaker Ground Truth Kubeflow Pipelines component

Summary

Component to submit SageMaker Ground Truth labeling jobs directly from a Kubeflow Pipelines workflow.

Details

Intended Use

For Ground Truth jobs using AWS SageMaker.

Runtime Arguments

Argument	Description	Optional	Data type	Accepted values	Default
region	The region where the cluster launches	No	String
endpoint_url	The endpoint URL for the private link VPC endpoint	Yes	String
assume_role	The ARN of an IAM role to assume when connecting to SageMaker	Yes	String
role	The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf	No	String
job_name	The name of the Ground Truth job. Must be unique within the same AWS account and AWS region	Yes	String		LabelingJob-[datetime]-[random id]
label_attribute_name	The attribute name to use for the label in the output manifest file	Yes	String		job_name
manifest_location	The Amazon S3 location of the manifest file that describes the input data objects	No	String
output_location	The Amazon S3 path where you want Amazon SageMaker to store the results of the labeling job	No	String
output_encryption_key	The AWS KMS key that Amazon SageMaker uses to encrypt the labeling job output data	Yes	String
task_type	The labeling task type: a built-in image classification, bounding box, text classification, or semantic segmentation task, or a custom task. For a custom task, provide pre- and post-labeling Lambda functions	No	String	Image Classification, Bounding Box, Text Classification, Semantic Segmentation, Custom
worker_type	The type of workforce used for data labeling	No	String	Public, Private, Vendor
workteam_arn	The ARN of the work team assigned to complete the tasks; specify if worker type is private or vendor	Yes	String
no_adult_content	Whether the data is free of adult content; specify if worker type is public	Yes	Boolean	False, True	False
no_ppi	Whether the data is free of personally identifiable information; specify if worker type is public	Yes	Boolean	False, True	False
label_category_config	The S3 URL of the JSON structured file that defines the categories used to label the data objects	Yes	String
max_human_labeled_objects	The maximum number of objects that can be labeled by human workers	Yes	Int	≥ 1	all objects
max_percent_objects	The maximum percentage of input data objects that should be labeled	Yes	Int	[1, 100]	100
enable_auto_labeling	Enables auto-labeling; only for bounding box, text classification, and image classification	Yes	Boolean	False, True	False
initial_model_arn	The ARN of the final model used for a previous auto-labeling job	Yes	String
resource_encryption_key	The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s)	Yes	String
ui_template	The Amazon S3 bucket location of the UI template	No	String
pre_human_task_function	The ARN of a Lambda function that is run before a data object is sent to a human worker	Yes	String
post_human_task_function	The ARN of a Lambda function that implements the logic for annotation consolidation	Yes	String
task_keywords	Keywords used to describe the task so that workers on Amazon Mechanical Turk can discover the task	Yes	String
title	A title for the task for your human workers	No	String
description	A description of the task for your human workers	No	String
num_workers_per_object	The number of human workers that will label an object	No	Int	[1, 9]
time_limit	The maximum amount of time, in seconds, that a worker has to complete a task	No	Int	[30, 28800]
task_availibility	The length of time, in seconds, that a task remains available for labeling by human workers	Yes	Int	Public workforce: [1, 43200], other: [1, 864000]
max_concurrent_tasks	The maximum number of data objects that can be labeled by human workers at the same time	Yes	Int	[1, 1000]
workforce_task_price	The price in USD that you pay for each task performed by a public worker; specify to a tenth of a cent, formatted as "0.000"	Yes	Float		0.000
tags	Key-value pairs to categorize AWS resources	Yes	Dict		{}
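
The constraints in the table above can be checked before a pipeline run is submitted. The following is a minimal sketch of such a check, not part of the component itself: the function name `validate_ground_truth_args` and the `example_args` values are hypothetical, and matching accepted values case-insensitively is an assumption. Only the required arguments, accepted values, and ranges come from the table.

```python
from typing import Any, Dict, List

# Arguments marked "Optional: No" in the Runtime Arguments table.
REQUIRED_ARGS = (
    "region", "role", "manifest_location", "output_location", "task_type",
    "worker_type", "ui_template", "title", "description",
    "num_workers_per_object", "time_limit",
)
# Accepted values from the table, lower-cased (case-insensitivity is an assumption).
TASK_TYPES = {
    "image classification", "bounding box", "text classification",
    "semantic segmentation", "custom",
}
WORKER_TYPES = {"public", "private", "vendor"}
# Inclusive integer ranges from the table.
INT_RANGES = {
    "num_workers_per_object": (1, 9),
    "time_limit": (30, 28800),
    "max_percent_objects": (1, 100),
    "max_concurrent_tasks": (1, 1000),
}


def validate_ground_truth_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in args; an empty list means none found."""
    errors = []
    for name in REQUIRED_ARGS:
        if args.get(name) in (None, ""):
            errors.append(f"missing required argument: {name}")

    task_type = str(args.get("task_type", "")).lower()
    if task_type and task_type not in TASK_TYPES:
        errors.append(f"task_type has an unsupported value: {args['task_type']}")
    if task_type == "custom":
        # Custom tasks need both Lambda functions.
        for fn in ("pre_human_task_function", "post_human_task_function"):
            if not args.get(fn):
                errors.append(f"{fn} is required when task_type is Custom")

    worker_type = str(args.get("worker_type", "")).lower()
    if worker_type and worker_type not in WORKER_TYPES:
        errors.append(f"worker_type has an unsupported value: {args['worker_type']}")
    if worker_type in ("private", "vendor") and not args.get("workteam_arn"):
        errors.append("workteam_arn is required for a private or vendor worker_type")

    # The task availability range depends on the workforce type.
    ranges = dict(INT_RANGES)
    ranges["task_availibility"] = (1, 43200) if worker_type == "public" else (1, 864000)
    for name, (low, high) in ranges.items():
        value = args.get(name)
        if value is not None and not low <= value <= high:
            errors.append(f"{name} must be in [{low}, {high}], got {value}")

    max_objects = args.get("max_human_labeled_objects")
    if max_objects is not None and max_objects < 1:
        errors.append("max_human_labeled_objects must be at least 1")
    return errors


# Hypothetical argument values, for illustration only.
example_args = {
    "region": "us-west-2",
    "role": "arn:aws:iam::123456789012:role/example-sagemaker-role",
    "manifest_location": "s3://example-bucket/input.manifest",
    "output_location": "s3://example-bucket/output",
    "task_type": "Image Classification",
    "worker_type": "Public",
    "ui_template": "s3://example-bucket/template.liquid",
    "title": "Classify images",
    "description": "Pick the category that best matches each image",
    "num_workers_per_object": 3,
    "time_limit": 300,
}
print(validate_ground_truth_args(example_args))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.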

Outputs

Name	Description
output_manifest_location	URL where labeling results were stored
active_learning_model_arn	ARN of the resulting active learning model
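
A downstream step often needs the bucket and key of the stored results separately. The sketch below assumes that `output_manifest_location` is an `s3://` URI; the helper name `split_s3_uri` and the sample URI are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    # The path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical value of the output_manifest_location output.
bucket, key = split_s3_uri("s3://example-bucket/output/manifests/output.manifest")
print(bucket, key)
```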
