SageMaker Ground Truth Kubeflow Pipelines component
Summary
Component to submit SageMaker Ground Truth labeling jobs directly from a Kubeflow Pipelines workflow.
Details
Intended Use
For Ground Truth jobs using AWS SageMaker.
Runtime Arguments
| Argument | Description | Optional | Data type | Accepted values | Default |
|---|---|---|---|---|---|
| region | The AWS region where the labeling job launches | No | String | ||
| endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | ||
| assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | ||
| role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | ||
| job_name | The name of the Ground Truth job. Must be unique within the same AWS account and AWS region | Yes | String | | LabelingJob-[datetime]-[random id] |
| label_attribute_name | The attribute name to use for the label in the output manifest file | Yes | String | | job_name |
| manifest_location | The Amazon S3 location of the manifest file that describes the input data objects | No | String | ||
| output_location | The Amazon S3 path where you want Amazon SageMaker to store the results of the labeling job | No | String | ||
| output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the output data | Yes | String | ||
| task_type | The labeling task type: built-in image classification, bounding box, text classification, or semantic segmentation, or custom. If custom, provide the pre- and post-labeling task Lambda functions | No | String | Image Classification, Bounding Box, Text Classification, Semantic Segmentation, Custom | |
| worker_type | The workteam for data labeling | No | String | Public, Private, Vendor | |
| workteam_arn | The ARN of the work team assigned to complete the tasks; specify if worker type is private or vendor | Yes | String | ||
| no_adult_content | Whether the data is free of adult content; specify if worker type is public | Yes | Boolean | False, True | False |
| no_ppi | Whether the data is free of personally identifiable information; specify if worker type is public | Yes | Boolean | False, True | False |
| label_category_config | The S3 URL of the JSON structured file that defines the categories used to label the data objects | Yes | String | ||
| max_human_labeled_objects | The maximum number of objects that can be labeled by human workers | Yes | Int | ≥ 1 | all objects |
| max_percent_objects | The maximum percentage of input data objects that should be labeled | Yes | Int | [1, 100] | 100 |
| enable_auto_labeling | Enables auto-labeling; only for bounding box, text classification, and image classification | Yes | Boolean | False, True | False |
| initial_model_arn | The ARN of the final model used for a previous auto-labeling job | Yes | String | ||
| resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | String | ||
| ui_template | The Amazon S3 bucket location of the UI template | No | String | ||
| pre_human_task_function | The ARN of a Lambda function that is run before a data object is sent to a human worker | Yes | String | ||
| post_human_task_function | The ARN of a Lambda function that implements the logic for annotation consolidation | Yes | String | ||
| task_keywords | Keywords used to describe the task so that workers on Amazon Mechanical Turk can discover the task | Yes | String | ||
| title | A title for the task for your human workers | No | String | ||
| description | A description of the task for your human workers | No | String | ||
| num_workers_per_object | The number of human workers that will label an object | No | Int | [1, 9] | |
| time_limit | The maximum time in seconds that a worker has to complete a task | No | Int | [30, 28800] | |
| task_availibility | The length of time in seconds that a task remains available for labeling by human workers | Yes | Int | Public workforce: [1, 43200], other: [1, 864000] | |
| max_concurrent_tasks | The maximum number of data objects that can be labeled by human workers at the same time | Yes | Int | [1, 1000] | |
| workforce_task_price | The price in USD that you pay for each task performed by a public worker; specify to a tenth of a cent, formatted as "0.000" | Yes | Float | | 0.000 |
| tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
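The constraints in the table above can be checked before a pipeline is submitted. The following is a minimal sketch only: `validate_ground_truth_args` is a hypothetical helper that is not part of this component, and it assumes the conditional notes in the table ("specify if worker type is private or vendor", "if custom, provide ... Lambda functions") are meant as hard requirements. The ARNs and S3 paths in the example are placeholders.

```python
from typing import Any, Dict, List

# Accepted values copied from the Runtime Arguments table (compared case-insensitively).
TASK_TYPES = {"image classification", "bounding box", "text classification",
              "semantic segmentation", "custom"}
WORKER_TYPES = {"public", "private", "vendor"}

# Arguments the table marks as not optional.
REQUIRED = ("region", "role", "manifest_location", "output_location", "task_type",
            "worker_type", "ui_template", "title", "description",
            "num_workers_per_object", "time_limit")

# Inclusive integer ranges from the "Accepted values" column.
INT_RANGES = {
    "max_percent_objects": (1, 100),
    "num_workers_per_object": (1, 9),
    "time_limit": (30, 28800),
    "max_concurrent_tasks": (1, 1000),
}


def validate_ground_truth_args(args: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    errors: List[str] = []

    for name in REQUIRED:
        if args.get(name) in (None, ""):
            errors.append(f"missing required argument: {name}")

    task_type = str(args.get("task_type") or "").lower()
    worker_type = str(args.get("worker_type") or "").lower()
    if task_type and task_type not in TASK_TYPES:
        errors.append(f"unknown task_type: {args['task_type']}")
    if worker_type and worker_type not in WORKER_TYPES:
        errors.append(f"unknown worker_type: {args['worker_type']}")

    # Assumption: the conditional notes in the table are treated as requirements.
    if task_type == "custom":
        for name in ("pre_human_task_function", "post_human_task_function"):
            if not args.get(name):
                errors.append(f"{name} is needed for a custom task type")
    if worker_type in ("private", "vendor") and not args.get("workteam_arn"):
        errors.append("workteam_arn is needed for a private or vendor workforce")

    # The task_availibility range depends on the workforce type.
    ranges = dict(INT_RANGES)
    ranges["task_availibility"] = (1, 43200) if worker_type == "public" else (1, 864000)
    for name, (low, high) in ranges.items():
        value = args.get(name)
        if value in (None, ""):
            continue
        if not low <= int(value) <= high:
            errors.append(f"{name}={value} is outside [{low}, {high}]")

    max_objects = args.get("max_human_labeled_objects")
    if max_objects not in (None, "") and int(max_objects) < 1:
        errors.append("max_human_labeled_objects must be at least 1")

    return errors


# Example arguments for a private-workforce image classification job (placeholder values).
example_args = {
    "region": "us-west-2",
    "role": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "manifest_location": "s3://example-bucket/input.manifest",
    "output_location": "s3://example-bucket/output",
    "task_type": "Image Classification",
    "worker_type": "Private",
    "workteam_arn": "arn:aws:sagemaker:us-west-2:123456789012:workteam/private-crowd/example",
    "ui_template": "s3://example-bucket/template.liquid",
    "title": "Classify the image",
    "description": "Pick the category that best matches the image",
    "num_workers_per_object": 1,
    "time_limit": 300,
}
print(validate_ground_truth_args(example_args))
```

The argument names, including the spelling of `task_availibility`, follow the table so that the same dictionary could be passed on to the component unchanged.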
Outputs
| Name | Description |
|---|---|
| output_manifest_location | URL where labeling results were stored |
| active_learning_model_arn | ARN of the resulting active learning model |
Requirements
Samples
Used in a pipeline with workteam creation and training
Mini image classification demo: Demo