SageMaker Training Kubeflow Pipelines component v2
Component to create SageMaker Training jobs in a Kubeflow Pipelines workflow.
Overview
The Amazon SageMaker components for Kubeflow Pipelines version 1 (v1.1.x or below) use Boto3 (the AWS SDK for Python) as the backend to create and manage resources on SageMaker. SageMaker components version 2 (v2.0.0-alpha2 or above) use the ACK Service Controller for SageMaker to do the same. AWS introduced ACK to facilitate a Kubernetes-native way of managing AWS Cloud resources. ACK includes a set of AWS service-specific controllers, one of which is the SageMaker controller. The SageMaker controller makes it easier for machine learning developers and data scientists who use Kubernetes as their control plane to train, tune, and deploy machine learning models in Amazon SageMaker.
Creating SageMaker resources using the controller allows you to create and monitor the resources as part of a Kubeflow Pipelines workflow (as in version 1 of the components), and additionally gives you a flexible and consistent way to manage the SageMaker resources from other environments, such as the Kubernetes command line tool (kubectl) or other Kubeflow applications such as Notebooks.
Kubeflow Pipelines backend compatibility
SageMaker components are currently supported with the Kubeflow Pipelines backend v1. This means you will have to use the KFP SDK 1.8.x to create your pipelines.
Getting Started
Follow this guide to get started with using the SageMaker Training Job pipeline component version 2.
Prerequisites
- An existing Kubeflow deployment. This guide assumes you have already installed Kubeflow; if you do not have an existing Kubeflow deployment, choose one of the deployment options from the Kubeflow on AWS Deployment guide.
Note: If you are using the Kubeflow Pipelines standalone deployment, you can continue to use it.
- Install the ACK Service Controller for SageMaker, version 0.4.2 or above. Follow the ML with ACK SageMaker Controller tutorial to install the SageMaker controller.
Note: You only have to install the controller; you do NOT have to run the Train an XGBoost Model section.
- This guide assumes you have already installed the following tools on your local machine or an EC2 instance:
Setup
- Configure RBAC permissions for the service account used by Kubeflow pipeline pods in the user/profile namespace. The pipeline runs are executed in user namespaces using the default-editor Kubernetes service account.

Note: In the Kubeflow Pipelines standalone deployment, the pipeline runs are executed in the kubeflow namespace using the pipeline-runner service account.

- Set the environment variable value for PROFILE_NAMESPACE (e.g. kubeflow-user-example-com) according to your profile, and the KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT value according to your installation:

# For full Kubeflow installation use your profile namespace
# For Standalone installation use kubeflow
export PROFILE_NAMESPACE=kubeflow-user-example-com

# For full Kubeflow installation use default-editor
# For Standalone installation use pipeline-runner
export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=default-editor
- Create a RoleBinding that grants the service account access to manage SageMaker custom resources:

cat > manage_sagemaker_cr.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: manage-sagemaker-cr
  namespace: ${PROFILE_NAMESPACE}
subjects:
  - kind: ServiceAccount
    name: ${KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT}
    namespace: ${PROFILE_NAMESPACE}
roleRef:
  kind: ClusterRole
  name: ack-sagemaker-controller
  apiGroup: rbac.authorization.k8s.io
EOF

kubectl apply -f manage_sagemaker_cr.yaml
- Check that the RoleBinding was created by running:
kubectl get rolebinding manage-sagemaker-cr -n ${PROFILE_NAMESPACE} -oyaml
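Since kubectl apply also accepts JSON manifests, the same RoleBinding can be generated programmatically if that fits your tooling better. A minimal sketch using only the standard library; the namespace and service account values below are assumed examples, so substitute your own:

```python
import json

# Assumed values -- substitute your PROFILE_NAMESPACE and service account.
namespace = "kubeflow-user-example-com"
service_account = "default-editor"

# The same RoleBinding as the heredoc above, expressed as a dict.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "manage-sagemaker-cr", "namespace": namespace},
    "subjects": [
        {
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }
    ],
    "roleRef": {
        "kind": "ClusterRole",
        "name": "ack-sagemaker-controller",
        "apiGroup": "rbac.authorization.k8s.io",
    },
}

with open("manage_sagemaker_cr.json", "w") as f:
    json.dump(role_binding, f, indent=2)
```

Apply it the same way: kubectl apply -f manage_sagemaker_cr.json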
- (Optional) If you are also using SageMaker components version 1, grant SageMaker access to the service account used by Kubeflow pipeline pods.
- Export your cluster name and cluster region:

export CLUSTER_NAME=
export CLUSTER_REGION=

- Create an IAM role for the service account with SageMaker access:

eksctl create iamserviceaccount \
    --name ${KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT} \
    --namespace ${PROFILE_NAMESPACE} \
    --cluster ${CLUSTER_NAME} \
    --region ${CLUSTER_REGION} \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess \
    --override-existing-serviceaccounts \
    --approve
Samples
Head over to the samples directory and follow the README to create jobs on SageMaker.
Input Parameters
Find the high-level component input parameters and their descriptions in the component's input specification. Parameters with JsonObject or JsonArray type inputs have nested fields; you will have to refer to the TrainingJob CRD specification for the respective structure and pass the input in JSON format.
A quick way to see the converted JSON-style input is to copy the sample TrainingJob spec and convert it to JSON using a YAML-to-JSON converter.
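The same conversion can be done locally; a short sketch, assuming the third-party PyYAML package is installed (pip install pyyaml):

```python
import json

import yaml  # third-party: pip install pyyaml

# A sample fragment of a TrainingJob spec, as it appears in the CRD YAML.
spec_yaml = """
resourceConfig:
  instanceCount: 1
  instanceType: ml.m4.xlarge
  volumeSizeInGB: 5
"""

# Parse the YAML and re-serialize the fragment as JSON.
spec = yaml.safe_load(spec_yaml)
print(json.dumps(spec["resourceConfig"], indent=2))
```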
For example, the resourceConfig section in the TrainingJob CRD looks like:
resourceConfig:
instanceCount: integer
instanceType: string
volumeKMSKeyID: string
volumeSizeInGB: integer
the resource_config input for the component would be:
resourceConfig = {
"instanceCount": 1,
"instanceType": "ml.m4.xlarge",
"volumeSizeInGB": 5,
}
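When the pipeline is authored in Python, a JsonObject input such as this is typically serialized to a JSON string before being handed to the component. A minimal standard-library sketch; the parameter name resource_config comes from the component's input specification:

```python
import json

# The nested JsonObject input, mirroring the CRD field names.
resource_config = {
    "instanceCount": 1,
    "instanceType": "ml.m4.xlarge",
    "volumeSizeInGB": 5,
}

# Serialize to a JSON string for the component's resource_config input.
resource_config_json = json.dumps(resource_config)
print(resource_config_json)
```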
You might also want to look at the TrainingJob API reference for a detailed explanation of the parameters.