Sample Pipeline for Training Component with Debugger

The debugger-training-pipeline.py sample creates a pipeline consisting of a single training component. That component trains a model with the XGBoost algorithm but deliberately poor hyperparameter choices; with Debugger rules and hooks enabled, we can quickly see that the resulting model has issues.

Prerequisites

This pipeline uses the same setup as the simple_training_pipeline sample. For the purposes of this demonstration, all resources will be created in the us-east-1 region.

Steps

  1. Compile the pipeline: dsl-compile --py debugger-training-pipeline.py --output debugger-training-pipeline.tar.gz
  2. In the Kubeflow UI, upload the compiled pipeline specification (the .tar.gz file), fill in the necessary run parameters, and click 'Create run'.
  3. Once the pipeline has finished running, you can view the results of each Debugger rule under 'Logs' (or query them through the SageMaker API, as sketched below).
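
The Debugger rule results are also available outside the Kubeflow UI through the SageMaker API. Below is a minimal sketch using boto3, assuming AWS credentials are configured and that you substitute the training job name created by your run (the placeholder name is illustrative):

import boto3

# us-east-1 matches the region used throughout this demo.
client = boto3.client("sagemaker", region_name="us-east-1")

# Replace with the training job name created by your pipeline run.
job = client.describe_training_job(TrainingJobName="<your-training-job-name>")

# One entry per Debugger rule, e.g. "IssuesFound" when VanishingGradient fires.
for rule in job.get("DebugRuleEvaluationStatuses", []):
    print(rule["RuleConfigurationName"], "->", rule["RuleEvaluationStatus"])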

Input format for debug_hook_config and debug_rule_config:

debug_hook_config = {
    "S3OutputPath": "s3://<your_bucket_name>/path/for/data/emission/",
    "LocalPath": "/local/path/for/data/emission/",
    "CollectionConfigurations": [
        {
            "CollectionName": "losses",
            "CollectionParameters": {
                "start_step": "25",
                "end_step": "150"
            }
        },
        {
            "CollectionName": "gradient",
            "CollectionParameters": {
                "start_step": "5",
                "end_step": "100"
            }
        }
    ],
    "HookParameters": {
        "save_interval": "10"
    }
}

debug_rule_config = {
    "RuleConfigurationName": "rule_name",
    "RuleEvaluatorImage": "503895931360.dkr.ecr.us-east-1.amazonaws.com/sagemaker-debugger-rules:latest",
    "RuleParameters": {
        "rule_to_invoke": "VanishingGradient",
        "threshold": "0.01"
    }
}

Resources