SageMaker Training Kubeflow Pipelines component v2

Component to create SageMaker Training jobs in a Kubeflow Pipelines workflow.

Overview

The Amazon SageMaker components for Kubeflow Pipelines version 1 (v1.1.x or below) use Boto3 (the AWS SDK for Python) as the backend to create and manage resources on SageMaker. Version 2 of the components (v2.0.0-alpha2 or above) instead uses the ACK Service Controller for SageMaker. AWS introduced ACK to facilitate a Kubernetes-native way of managing AWS Cloud resources. ACK includes a set of AWS service-specific controllers, one of which is the SageMaker controller. The SageMaker controller makes it easier for machine learning developers and data scientists who use Kubernetes as their control plane to train, tune, and deploy machine learning models in Amazon SageMaker.

Creating SageMaker resources through the controller allows you to create and monitor the resources as part of a Kubeflow Pipelines workflow (as with version 1 of the components). In addition, it gives you a flexible and consistent way to manage SageMaker resources from other environments, such as the Kubernetes command line tool (kubectl) or other Kubeflow applications such as Notebooks.
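
Because each training job is backed by a TrainingJob custom resource, you can inspect jobs from any environment with cluster access. Below is a minimal sketch using the official Kubernetes Python client; the API group, version, and status field name are assumptions based on the ACK SageMaker controller's CRDs, so verify them against the CRDs installed in your cluster.

# Minimal sketch: list TrainingJob custom resources with the Kubernetes
# Python client (pip install kubernetes). The group, version, and status
# field below are assumptions -- verify against your installed ACK CRDs.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

jobs = api.list_namespaced_custom_object(
    group="sagemaker.services.k8s.aws",  # assumed ACK SageMaker API group
    version="v1alpha1",                  # assumed CRD version
    namespace="kubeflow-user-example-com",
    plural="trainingjobs",
)
for job in jobs["items"]:
    status = job.get("status", {}).get("trainingJobStatus", "Unknown")
    print(job["metadata"]["name"], status)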

Kubeflow Pipelines backend compatibility

SageMaker components are currently supported only with the Kubeflow Pipelines v1 backend. This means you will have to use the KFP SDK 1.8.x to create your pipelines.
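
For illustration, here is a minimal sketch of a pipeline that loads this component with the KFP 1.8.x SDK (pip install "kfp>=1.8,<2") and compiles it. The component path and the input names shown are illustrative assumptions; consult component.yaml for the exact input specification.

# Minimal sketch, KFP SDK 1.8.x. The component path and the input names
# (region, resource_config) are illustrative assumptions -- check
# component.yaml for the actual specification.
import kfp
from kfp import components, dsl

sagemaker_train_op = components.load_component_from_file("component.yaml")

@dsl.pipeline(name="sagemaker-trainingjob-demo")
def training_pipeline():
    sagemaker_train_op(
        region="us-east-1",  # assumed input name
        resource_config={    # JsonObject-typed inputs are passed as dicts
            "instanceCount": 1,
            "instanceType": "ml.m4.xlarge",
            "volumeSizeInGB": 5,
        },
    )

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")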

Getting Started

Follow this guide to get started with version 2 of the SageMaker Training Job pipeline component.

Prerequisites

  1. An existing Kubeflow deployment. This guide assumes you have already installed Kubeflow; if you do not have an existing Kubeflow deployment, choose one of the deployment options from the Kubeflow on AWS Deployment guide.

    Note: If you are using the Kubeflow Pipelines standalone deployment, you can continue to use it.

  2. Install the ACK Service Controller for SageMaker version 0.4.2+. Follow the ML with ACK SageMaker Controller tutorial to install the SageMaker controller.

    Note: You only need to install the controller, so you do NOT need to run the Train an XGBoost Model section.

  3. This guide assumes you have already installed the following tools on your local machine or an EC2 instance:
    • kubectl - A command line tool for working with Kubernetes clusters.
    • eksctl - A command line tool for working with Amazon EKS clusters.

Setup

  1. Configure RBAC permissions for the service account used by the Kubeflow Pipelines pods in the user/profile namespace. Pipeline runs are executed in user namespaces using the default-editor Kubernetes service account.

    Note: In a Kubeflow Pipelines standalone deployment, pipeline runs are executed in the kubeflow namespace using the pipeline-runner service account.

    • Set the PROFILE_NAMESPACE (e.g. kubeflow-user-example-com) and KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT environment variables according to your installation:
      # For full Kubeflow installation use your profile namespace
      # For Standalone installation use kubeflow
      export PROFILE_NAMESPACE=kubeflow-user-example-com
      
      # For full Kubeflow installation use default-editor
      # For Standalone installation use pipeline-runner
      export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=default-editor
      
    • Create a RoleBinding that grants the service account access to manage SageMaker custom resources:
      cat > manage_sagemaker_cr.yaml <<EOF
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: manage-sagemaker-cr
        namespace: ${PROFILE_NAMESPACE}
      subjects:
      - kind: ServiceAccount
        name: ${KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT}
        namespace: ${PROFILE_NAMESPACE}
      roleRef:
        kind: ClusterRole
        name: ack-sagemaker-controller
        apiGroup: rbac.authorization.k8s.io
      EOF
      
      kubectl apply -f manage_sagemaker_cr.yaml
      
    • Verify that the RoleBinding was created by running kubectl get rolebinding manage-sagemaker-cr -n ${PROFILE_NAMESPACE} -oyaml
  2. (Optional) If you are also using version 1 of the SageMaker components, grant SageMaker access to the service account used by the Kubeflow Pipelines pods.
    • Export your cluster name and cluster region
      export CLUSTER_NAME=
      export CLUSTER_REGION=
      
    • Create an IAM role for the service account and attach the AmazonSageMakerFullAccess policy:
      eksctl create iamserviceaccount --name ${KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT} --namespace ${PROFILE_NAMESPACE} --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} --attach-policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess --override-existing-serviceaccounts --approve
      

Samples

Head over to the samples directory and follow the README to create jobs on SageMaker.

Input Parameters

Find the high-level component input parameters and their descriptions in the component's input specification. Parameters with JsonObject or JsonArray type inputs have nested fields; you will have to refer to the TrainingJob CRD specification for the respective structure and pass the input in JSON format.

A quick way to see the converted JSON-style input is to copy the sample TrainingJob spec and convert it to JSON using a YAML-to-JSON converter like this website.
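
If you prefer converting locally instead, the short script below does the same thing, assuming PyYAML is installed (pip install pyyaml):

# Convert a YAML spec snippet to JSON locally. Assumes PyYAML is installed.
# Usage: python yaml2json.py trainingjob_spec.yaml
import json
import sys

import yaml

with open(sys.argv[1]) as f:
    spec = yaml.safe_load(f)
print(json.dumps(spec, indent=2))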

For example, the resourceConfig in the TrainingJob CRD looks like:

resourceConfig: 
  instanceCount: integer
  instanceType: string
  volumeKMSKeyID: string
  volumeSizeInGB: integer

the resource_config input for the component would be:

resource_config = {
  "instanceCount": 1,
  "instanceType": "ml.m4.xlarge",
  "volumeSizeInGB": 5,
}
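
In the pipeline definition, this value is then passed directly as the component argument; with the KFP 1.8.x SDK, dict arguments for JsonObject-typed inputs should be serialized to JSON automatically. The sagemaker_train_op below refers to the component op loaded from component.yaml in the earlier sketch:

# resource_config is the dict shown above; sagemaker_train_op is the
# component op loaded from component.yaml in the earlier sketch.
train_step = sagemaker_train_op(resource_config=resource_config)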

You might also want to look at the TrainingJob API reference for a detailed explanation of the parameters.
