Commit Graph

70 Commits

Author SHA1 Message Date
ananth102 664deaf933
test(components): Reduce sagemaker component test flakiness (#10225)
Signed-off-by: ananth102 <abashyam@amazon.com>
2024-02-14 19:29:10 +00:00
rd-pong fdb25f6e6d
test(components): fix k8s_client 401 unauthorized error (#9749)
* Initiate a new k8s client when calling _get_resource

* Remove k8s_client for methods that use _get_resource

* Initiate a new k8s client when calling _delete_resource
2023-07-18 18:37:22 +00:00
rd-pong 5cfaec6ebb
chore(test): Increase time out for git fetching and test run (#9462)
Increase timeout for rlestimator-training
2023-06-05 18:38:23 +00:00
ananth102 7de50a5839
test(components): Check the Sagemaker component output rather than Controller (#9402)
* update tests to check output

bugfix

fix another bug

* address pr comments

* bug fix

* test fix

* namefix
2023-05-19 00:27:46 +00:00
rd-pong 1fa0893800
test(component): Update integration test for Model Monitor component (#9384)
* Add test that checks component outputs

* Remove sagemaker check

* Extract get_output_ack_resource_metadata to a function
Extract "Scheduled" to constant FINAL_STATUS

* Extract to function: verify_monitoring_schedule_component_outputs
2023-05-12 22:29:29 +00:00
rd-pong 07e67bb0ca
feat(components): SageMaker V2 model monitor component and testing (#9253)
* Add model monitor component and integration tests

* Generate model monitor using updated generator

* Add sleep for monitoring schedule

* Update requirements v2

* Change model monitor image url

* minor fix

* minor fix

* minor fix

* Add unit testing for MonitoringSchedule

* Delete assume-role.json

* Add doc and sample pipeline for Monitoring Schedule

* Regenerate using the latest code generator.
Make parameter description 1 sentence long.

* Revert "Add doc and sample pipeline for Monitoring Schedule"

This reverts commit 6b3b7cc6f5.

* Delete print statements

* Update component with new generator

* address comments

* Add retry for _delete_resource

* Add try catch protection for _get_resource and _delete_resource

* Add integration tests for monitoring job definition components

* Update is_endpoint_deleted for new _get_resource

* Add integration test for updating monitoring schedule

* Remove update from canary

* Add doc and sample pipeline for Monitoring Schedule

* Add doc for monitoring job definition
Update doc for monitoring schedule

* Remove sample for monitoring schedule

* Address comments

* Address comments

* Address comment for unit test

* Address doc comments

* Address test comments
2023-05-09 19:42:33 +00:00
ananth102 aae9cb74cb
fix(components): Changing path for model (#9331)
* changing path for model

* corrective changes

* modified changelog

* change image location

* updated changelog

* changed changelog

* updated message
2023-05-05 00:45:23 +00:00
ananth102 57b52d29ea
test(components): fix Sagemaker component shallow canary (#9314)
2023-05-03 23:56:44 +00:00
ananth102 4818e849f8
feat(components): Sagemaker V2 Hosting components and tests (#9243)
* Hosting Components and test

* update dependency

* Regenerating with spec trimming

* handle None case

* address pr comments

* another way of handling update not supported

* test changes

* removing unused logic

* Staging pr

* Added READMEs

* Main doc changes

* minor edit
2023-05-03 17:56:15 +00:00
rd-pong bec2834cdc
test: fix "ValueError: Unsupported region". Only populate necessary values for shallow_canary (#9236)
2023-04-25 19:46:45 +00:00
rd-pong 9c7aa16176
test: Upgrade package versions and remove dependency on "sagemaker-sample-data-<region>" bucket (#9204)
* Upgrade package version and change default instance type

Upgrade sagemaker version

Upgrade boto3 version

Upgrade pyyaml version

Change training and endpoint instance type

* Remove dependency on "sagemaker-sample-data-<region>"
2023-04-25 02:31:44 +00:00
rd-pong 3e7354e483
test(components): Add shallow_canary marker (#9180)
* Add shallow_canary marker

* Delete trainingV1 from shallow canary
2023-04-20 17:30:40 +00:00
jsitu777 fe9fc4de79
test(Component): push test stats to cloudwatch (#9130)
* push stats to cloudwatch

* remove h.py

* grab project name from env
2023-04-11 18:29:55 +00:00
Suraj Kota 56f1c80c7e
chore(components): add test image cache (#9111)
2023-04-06 19:26:41 +00:00
jsitu777 b4a203586e
test(Component): update argo-workflow version (#9040)
* update argo

* update aws-iam-authenticator path
2023-03-28 17:39:40 +00:00
jsitu777 b1c8d5111f
test(components): Changed miniconda to pip (#9030)
* change from conda to pip in DockerFile

* missing an equal sign

* missing a slash in eksctl path

* use ECR instead of docker

* revert ecr changes

* use ubuntu:18.04 as base image

* add copy requirement file

* replace docker image with ecr image to avoid rate limiting issue

* login to public ecr to use image
2023-03-23 22:41:22 +00:00
ananth102 943fb6bdff
test(components): disable imdsv1 (#8630)
* test: disable imdsv1

* move autokubeconfig

* update timeout

* Fix bad line

* test: removed redundancies
2022-12-28 20:46:19 +00:00
ananth102 6a6cfdbafb
feat(components): New sagemaker training job parameters (#8538)
* unit tests

* feature: generated new sagemaker features

* update unit test

* remove unit tests

* Release: Staging component for release

* reformatted files
2022-12-12 21:32:28 +00:00
ananth102 d159475f1a
Test: Update pytest xdist dependency in Sagemaker KFP integration test (#8389)
* fix: stable version

* fix: update python

* fix: update xdist instead of pytest
2022-10-27 20:54:19 +00:00
Thang Minh Vu 328edd8117
fix(components): make inputs.model_artifact_url optional in sagemaker model component (#8336)
* fix(components): make inputs.model_artifact_url optional in sagemaker model component

* chore: run black

* Fixed Stop bug

commit f2092382ee941c2f33935db3e886093a15f103f7
Author: ananth102 <abashyam@amazon.com>
Date:   Fri Oct 7 19:51:55 2022 +0000

    replaced image

commit 2f0e2daa54fe80a3dfc471d393be62d612217b84
Merge: bf2389a66 7ce165432
Author: ananth102 <abashyam@amazon.com>
Date:   Fri Oct 7 19:50:28 2022 +0000

    Merge remote-tracking branch 'stopfix/handle_stopped' into kfpv1fixes2

commit 7ce165432e
Author: Kartik Kalamadi <kalamadi@amazon.com>
Date:   Thu Mar 3 09:58:16 2022 -0800

    Run black

commit 32d6e1388a
Author: Kartik Kalamadi <kalamadi@amazon.com>
Date:   Tue Mar 1 15:25:32 2022 -0800

    Change image for testing

commit 7875d9aa27
Author: Kartik Kalamadi <kalamadi@amazon.com>
Date:   Mon Jan 31 09:29:50 2022 -0800

    Handle Stopped state for all components and fix bug in robomaker simulation function

* chore(docs): Update model README.md

Update README

* updated image and license

* chore: pop ModelDataUrl if not exist

* fix: make field as option in aws batch_transform component

chore: run black

chore: revert docker version pump up

chore(docs): update valid instance types

Remove key if not use

Pop KmsKeyId

* update changelog

* chore: pop DataProcessing if no value supplied

* test(components): Update test

* fix(batch_transform): only pop input and output

* fixed log bug

Co-authored-by: ananth102 <abashyam@amazon.com>
2022-10-14 22:12:49 +00:00
ananth102 14ce09d506
test: Make ephemeral sagemaker component tests more stable (#8346)
* increase timeout

Increase timeout to make canaries less flakey

* Increase minio timeout

Make canaries less flakey

* Update run_integration_tests

* correct sleep

* remove unnecessary wait
2022-10-11 01:28:25 +00:00
ananth102 bf2389a66c
test: Upgrade kfp version in sagemaker component test (#8331)
* upgrade kfp

* update eksctl

* upgrade kfp

* downgrade cluster

* upgrade node count

* updated cert-manager
2022-10-06 00:05:50 +00:00
ananth102 d4aaa03035
refactor(components): Open sourcing v2 AWS TrainingJob component. (#8258)
* added training source code

* added commonv2

* added v2 dockerfiles

* took out redundant installations

* fixed image

* fixed dockerignore

* updated docker description

* updated file name

* updated license

* removed unused dependencies

* removed redundant code

* updated dockerignore and licenses

* added v1 license to dockerignore

* temp change

* applied black

* decrease parallel tests

* updated documentation

* updated changelog and image

* increase parallel test runs

* updated dockerignore
2022-09-16 22:07:45 +00:00
ryansteakley 0368fc6174
chore(components): Update scripts to use public ecr instead of docker (#8264)
* Update scripts to use public ecr instead of docker

* other codebuild specs

* run black on non-formatted files

* login to general ecr

* change default image for generate_components

* use public ecr amazon linux

* use :2 tag

* add arg for kfp v1 or v2 build version

* change whitespace and add docker login back for integration tests

* enable buildkit

* use v2 license file if in v2 build-mode

* make build_version mandatory
2022-09-15 02:16:40 +00:00
ananth102 916777e62f
test(components): Added integration test for Sagemaker TrainingJob component v2. (#8247)
* Integration tests for sagemaker training v2

* removed redundant check

* removed redundant print

* added safety check

* pr changes

* updated python and kubernetes

* reverting dependency versions

* Revert "updated python and kubernetes"

This reverts commit e92034d5f9.

* added linting
2022-09-14 17:42:04 -07:00
Meghna Baijal e3678e1aad
chore: update the eksctl version required for latest eks version (#7976)
2022-07-01 06:07:08 +00:00
Meghna Baijal 8a4b06754a
chore: update EKS version and increase the EKS cluster creation timeout (#7975)
2022-07-01 01:00:09 +00:00
Mauricio Scheffer bb9fbc80c9
chore: fix link to pytest-xdist docs (#7760)
2022-06-30 20:05:17 +00:00
Meghna Baijal e867d1c31d
chore: update eks version to 1.19 for aws sagemaker integration tests (#7256)
2022-02-05 00:20:15 +00:00
ryansteakley 373dfe3792
chore: update eks version to 1.18 for aws sagemaker integration tests (#6847)
2021-11-01 17:04:59 -07:00
ryansteakley 40d8242bb0
chore: update aws sagemaker components tests to kfp 1.7.0 (#6805)
* Update to support kfp 1.7.0

* use variable for s3 bucket name
2021-10-27 11:08:25 -07:00
Meghna Baijal fd52629bde
fix(Dockerfile): fixes the dockerfile by allowing `apt-get update` to ignore releaseinfo changes. (#6356)
2021-08-16 12:44:42 -07:00
Joe Liedtke ade34542e0
chore: Updates argoproj/argo URLs to argoproj/argo-workflows (#5969)
* Updt argoproj/argo URLs to argoproj/argo-workflows

* Update link to workflows.ts

* Update license.txt to reduce # of changed lines

* Revert changes to backend Dockerfile & license.txt

* Update license.txt, keep line endings
2021-07-06 21:52:20 -07:00
Suraj Kota b50a5cfc4e
chore(components): AWS SageMaker update eks cluster version for tests (#5735)
2021-05-25 19:53:40 -07:00
Suraj Kota 52b40ed1ac
chore(components): AWS SageMaker tests enable ssm on eks worker nodes (#5437)
2021-04-06 18:31:01 -07:00
Kartik Kalamadi 079eea369a
fix(components): Print logs for AWS SageMaker components (#4879)
* Print logs for Processing and Batch Transform

* Change image in yamls

* Add unit tests for cw calls

* update version in license file to 1.1.1

* generate yaml for the new version

* update changelog
2021-02-23 19:19:14 -08:00
Suraj Kota 2dd8de3f6f
chore(components): SageMaker fix flaky groundtruth test (#5044)
2021-01-27 20:52:01 -08:00
Leonard O' Sullivan 4aa11c3c7f
feat(components) Adds RoboMaker and SageMaker RLEstimator components (#4813)
* Adds RoboMaker and SageMaker RLEstimator components

* Genericise samples

* Genericise samples

* Adds better logging and updates shim component in samples

* Adds fixes for PR comments. Updates tests accordingly

* Adds docker image reference for integration tests. Allows for setting job_name for RLEstimator training jobs

* Separate RM and SM execution roles

* Remove README reference to VPC config items

* Adds more reliable integration test for RoboMaker Simulation Job

* Simplifies integration tests

* Reverted test container entrypoints

* Update black formatting

* Update components for redbackthomson repo

* Prefix RLEstimator job name

* Add RoboMakerFullAccess to generated roles

* Update version to official 1.1.0

* Formatting int test file

* Add PassRole IAM permission to OIDC

* Adds ROBOMAKER_EXECUTION_ROLE_ARN to build vars

Co-authored-by: Nicholas Thomson <nithomso@amazon.com>
2020-12-11 13:27:27 -08:00
Kartik Kalamadi 008985a576
fix(components): AWS SageMaker - Retry delete EKS Cluster after Integ test failure (#4662)
* fix(components): AWS SageMaker - Retry delete EKS Cluster after Integ test failure

* decrease delete eks timeout to 15 min
2020-11-06 12:12:29 -08:00
Nicholas Thomson d81c8095d0
refactor(components): AWS SageMaker - Full component refactoring (#4336)
* Temporary rebase commit

* Add yaml compiler

* Add compiler CLI

* Update Dockerfile to copy all files

* Add validate input list vs dict

* Add unit test for new train

* Add minor bug fixes

* Override tag when generating specs

* Update pydocs with formatter

* Add contributing doc

* Add formatters to CONTRIBUTING

* Add working generic logic applied to train

* Update component input and output to inherit

* Downgrade to Python 3.7

* Update add outputValue to arg list

* Updated outputValue to outputPath

* Add empty string default to not-required inputs

* Update path to component relative to root

* Update faulty False-y condition

* Update outputs to write to file

* Update doc formatting

* Update docstrings to match structure

* Add unit tests for component and compiler

* Add unit tests for component

* Add spec unit tests

* Add training unit tests

* Update unit test automation

* Add sample formatting checks

* Remove extra flake8 check in integ tests

* Add unit test black check

* Update black formatting for all files

* Update include black formatting

* Add batch component

* Remove old transform components

* Update region input description

* Add all component specs

* Add deploy component

* Add ground truth component

* Add HPO component

* Add create model component

* Add processing component

* Add workteam component

* Add spec unit tests

* Add deploy unit tests

* Add ground truth unit tests

* Add tuning component unit tests

* Add create model component unit test

* Add process component unit tests

* Add workteam component unit tests

* Remove output_path from required_args

* Remove old component implementations

* Update black formatting

* Add assume role feature

* Compiled all components

* Update doc formatting

* Fix process terminate syntax error

* Update compiler to use kfp structures

* Update nits

* Update unified requirements

* Rebase on debugging commit

* Add debugger unit tests

* Update formatting

* Update component YAML

* Fix unit test Dockerfile relative directory

* Update unit test context to root

* Update Batch to auto-generate name

* Update minor docs and formatting changes

* Update deploy name to common autogenerated

* Add f-strings to logs

* Add update support

* Add Amazon license header

* Update autogen and autoformat

* Rename SpecValidator to SpecInputParser

* Split requirements by dev and prod

* Support for checking generated specs

* Update minor changes

* Update deploy component output description

* Update components to beta repository

* Update fix unit test requirements

* Update unit test build spec for new results path

* Update deploy wait for endpoint complete

* Update component configure AWS clients in new method

* Update boto3 retry method

* Update license version

* Update component YAML versions

* Add new version to Changelog

* Update component spec types

* Update deploy config ignore overwrite

* Update component for debugging

* Update images back to 1.0.0

* Remove coverage from components
2020-10-27 14:17:57 -07:00
Suraj Kota 466147a2d8
chore(components): SageMaker integ test changes (#4603)
- fix clean up in groundtruth
2020-10-09 16:06:47 -07:00
Suraj Kota e87d74f036
chore(components): SageMaker integ tests, fix for unbound variable (#4595)
2020-10-08 17:17:06 -07:00
Meghna Baijal 237795539f
chore(components): AWS SageMaker - Fix leaking Workteam(GroundTruth) resources (#4536)
2020-09-30 13:08:54 -07:00
jkuruba 7c349f3f82
feat(components): AWS SageMaker - Changes for updating an existing endpoint (#4424)
* Changes for updating existing endpoint

* Review comments addressed

* Review comments addressed

* Review comments addressed

* Changed awscli and boto3 version. Ran black to format integration tests

* Removing temporarily to debug integration failures

* Adding back integration tests

* Control the number of parallel integration tests to 10

* Third Party License updated

* Version changed to 0.9.0

* Fixed a typo in Changelog
2020-09-18 16:00:28 -07:00
Nicholas Thomson 2f7a5e5a2b
chore(components): AWS SageMaker - Fix leaking resources (#4457)
* Add try/catch cleanup for integ resources

* Update pytest-xdist to 2.1

* Fix groundtruth workteam invocation
2020-09-02 16:11:40 -07:00
Kartik Kalamadi 05398cf475
[AWS SageMaker] Fix small bugs (#4161)
* fix small bugs

* add SKIP_OIDC_SETUP config

* address comments

* address comments and add KFP_VERSION to .env

* typo
2020-09-01 23:17:06 -07:00
Dustin Luong 3ebd075212
feat(components): AWS SageMaker - Add optional parameter to allow training component to accept parameters related to Debugger (#4283)
* Implemented debugger for training component with sample pipeline, unit tests, and integration test

* Implemented changes from PR, refactored utils.py, made sample pipeline more succinct, removed hardcoding from integration tests

* Added default parameter for sample pipeline and fixed grammar for sample README, refactored _utils.py for fstrings and fixed offset for errors

* Removed aws secret lines

* Terminate debug rules when terminating training job, Terminate debug rules if terminate is pressed after training job has completed, added integration tests for stop_debug_rules, updated READMEs for train and sample, renamed sample pipeline, removed tensorboard, updated sagemaker version to sagemaker 2.1.0.

* Terminate debug rules when terminating training job, Terminate debug rules if terminate is pressed after training job has completed, added integration tests for stop_debug_rules, updated READMEs for train and sample, renamed sample pipeline, removed tensorboard, updated sagemaker version to sagemaker 2.1.0.

* Removed extra files, cleaned integration test

* Changed integration test to use sample debugger pipeline

* Processing jobs created from debug rules will not terminate, fixing other small issues

* Removed debug from pipeline definition, removed extra line, removed unused function

* Changelog and image tag updates
2020-08-19 15:41:22 -07:00
Nicholas Thomson 8014a44229
feat(components): AWS SageMaker - Support for assuming a role (#4212)
* Add client assume role functionality

* Add assume_role to component.yaml files

* Update image to personal

* Update input to force NoneType on empty

* Update integration test setup with assumed role

* Add assume role integration test

* Update boto session to use refreshing credentials

* Update assume role relax trust relationship

* Add check for defined assumed role name

* Add processing assume integ test

* Add assume role unit test for main methods

* Add assume_role to all READMEs

* Update session to use AssumeRoleProvider

* Remove region from child calls to session

* Fix extra region_name in test

* Update assume role processing integ test name

* Add processing integ test to list

* Update assumed role to remain if not generated

* Update license version

* Update image tag to new version

* Add new version to Changelog
2020-08-03 10:53:43 -07:00
Suraj Kota 900eeaec16
feat(components): AWS SageMaker - Add functionality to stop SageMaker jobs on run termination (#4167)
* Add functionality to stop SM jobs
	- Unit and Integration tests for the functionality

* unit test update and customer message update

* Changelog and image tag updates

* update version for deploy component and merge conflicts

* Update version in License file

* fix conflicting paths for download, add test for batch
2020-07-17 16:34:50 -07:00
Kartik Kalamadi 799db4714f
[AWS SageMaker] Integ test to check CloudWatch logs print feature (#4056)
* Integ test for cw logs

* Update license file version to 0.5.3

* update version in yaml

* add changelog
2020-07-09 15:14:33 -07:00