Streamline custom environment provisioning for Amazon SageMaker Studio: An automated CI/CD pipeline approach

Attaching a custom Docker image to an Amazon SageMaker Studio domain involves several steps. First, you need to build and push the image to Amazon Elastic Container Registry (Amazon ECR). You also need to make sure that the Amazon SageMaker domain execution role has the necessary permissions to pull the image from Amazon ECR. After the image is pushed to Amazon ECR, you create a SageMaker custom image on the AWS Management Console. Lastly, you update the SageMaker domain configuration to specify the custom image Amazon Resource Name (ARN). This multi-step process needs to be followed manually every time end users create new custom Docker images to make them available in SageMaker Studio.
In this post, we explain how to automate this process. This approach allows you to update the SageMaker configuration without writing additional infrastructure code, provision custom images, and attach them to SageMaker domains. By adopting this automation, you can deploy consistent and standardized analytics environments across your organization, leading to increased team productivity and mitigating security risks associated with using one-off images.
The solution described in this post is aimed at machine learning (ML) developers and platform teams who are often responsible for managing and standardizing custom environments at scale across an organization. For individual data scientists seeking a self-service experience, we recommend using the native Docker support in SageMaker Studio, as described in Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support. This feature allows data scientists to build, test, and deploy custom Docker containers directly within the SageMaker Studio integrated development environment (IDE), so you can iteratively experiment with your analytics environments seamlessly within the familiar SageMaker Studio interface.
Solution overview
The following diagram illustrates the solution architecture.
We deploy the pipeline using AWS CodePipeline, which automates building a custom Docker image and attaching the image to the SageMaker domain. The pipeline starts by checking out the code base from the GitHub repo and builds custom Docker images based on the configuration declared in the config files. After successfully building and pushing the Docker images to Amazon ECR, the pipeline validates the images by scanning them for security vulnerabilities. If no critical or high security vulnerabilities are found, the pipeline proceeds to the manual approval stage before deployment. After manual approval is complete, the pipeline deploys the SageMaker domain and automatically attaches the custom images to the domain.
Prerequisites
To implement the solution described in this post, you need the following prerequisites:
Deploy the solution
Complete the following steps to deploy the solution:
- Sign in to your AWS account using the AWS CLI in a shell terminal (for more details, see Authenticating with the AWS CLI).
- Run the following command to make sure you are successfully signed in to your AWS account:
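The standard AWS CLI identity check works here; it returns the account ID and ARN of the signed-in principal:

```bash
aws sts get-caller-identity
```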
- Fork the GitHub repo to your GitHub account.
- Clone the forked repo to your local workstation using the following command:
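The URL below is a placeholder for your fork:

```bash
git clone https://github.com/<your-github-username>/<repo-name>.git
```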
- Log in to the console and create an AWS CodeStar connection to the GitHub repo that you forked in the previous step. For instructions, see Create a connection to GitHub (console).
- Copy the ARN for the connection you created.
- Go to the terminal and run the following command to cd to the repository directory:
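The directory name depends on the repo you cloned; <repo-name> below is a placeholder:

```bash
cd <repo-name>
```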
- Run the following command to install all libraries from npm:
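From the repository root:

```bash
npm install
```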
- Run the following commands to run a shell script in the terminal. This script takes your AWS account number and AWS Region as input parameters and deploys an AWS CDK stack, which provisions components such as CodePipeline, AWS CodeBuild, the ECR repository, and so on. Use an existing VPC to set the VPC_ID export variable. If you don't have a VPC, create one with at least two subnets and use it.
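The deployment script's exact name and location come from the repo; the following sketch only illustrates the shape of the invocation, with the hypothetical path ./deploy.sh standing in for the real script:

```bash
# Inputs the script expects; VPC_ID must reference an existing VPC
# with at least two subnets
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=us-east-1              # replace with your Region
export VPC_ID=vpc-0123456789abcdef0     # replace with your VPC ID

# Hypothetical script path -- substitute the shell script shipped in the repo
bash ./deploy.sh
```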
- Run the following command to deploy the AWS infrastructure using AWS CDK v2, and wait for the template to succeed:
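AWS CDK v2 deployments are run with the cdk CLI; given the stack name PipelineStack used later in this post, the command would look like the following (the flags are illustrative):

```bash
npx cdk deploy PipelineStack --require-approval never
```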
- On the CodePipeline console, choose Pipelines in the navigation pane.
- Choose the link for the pipeline named sagemaker-custom-image-pipeline.
- You can follow the progress of the pipeline on the console and provide approval in the manual approval stage to deploy the SageMaker infrastructure. The pipeline takes about 5-8 minutes to build the image and move to the manual approval stage.
- Wait for the pipeline to complete the deployment stage.
The pipeline creates infrastructure resources in your AWS account, including a SageMaker domain and a SageMaker custom image. It also attaches the custom image to the SageMaker domain.
- On the SageMaker console, choose Domains under Admin configurations in the navigation pane.
- Open the domain named team-ds, and navigate to the Environment tab.
You should be able to see one custom image that is attached.
How custom images are built and attached
CodePipeline has a stage called BuildCustomImages that contains the automated steps to create a custom SageMaker image using the SageMaker Custom Image CLI and push it to the ECR repository created in the AWS account. The AWS CDK stack at the deployment stage contains the steps required to create a SageMaker domain and attach a custom image to the domain. The parameters to create the SageMaker domain, custom image, and so on are configured in JSON format and used in the SageMaker stack under the lib directory. Refer to the sagemakerConfig section in environments/config.json for the declarative parameters.
Add your own custom images
You can now add your own custom Docker image to attach to the SageMaker domain created by the pipeline. For the specifications that custom images must follow, see the Dockerfile specifications for SageMaker images.
- In the terminal, cd to the images directory:
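From the repository root:

```bash
cd images
```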
- Create a new directory (for example, custom) under the images directory:
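For example:

```bash
mkdir custom
```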
- Add your Dockerfile to this directory. To test, you can use the following Dockerfile config:
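The repo's sample Dockerfile isn't reproduced here; the following is a minimal illustrative sketch for a Python JupyterLab image. The base image and package choices are assumptions; the sagemaker-user/UID 1000/GID 100 convention follows the SageMaker Studio custom image requirements:

```dockerfile
# Minimal illustrative test image for a SageMaker Studio JupyterLab app
FROM public.ecr.aws/docker/library/python:3.10-slim

# SageMaker Studio runs custom images as sagemaker-user (UID 1000, GID 100)
ARG NB_USER="sagemaker-user"
ARG NB_UID="1000"
ARG NB_GID="100"

RUN useradd --create-home --shell /bin/bash --uid ${NB_UID} --gid ${NB_GID} ${NB_USER} \
    && pip install --no-cache-dir jupyterlab

USER ${NB_UID}
ENTRYPOINT ["jupyter", "lab", "--ip", "0.0.0.0", "--port", "8888"]
```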
- Update the images section in the config.json file under the environments directory to add the name of the image directory you created:
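The exact schema is defined by the repo's config files; as an illustrative shape, assuming images is a list of directory names under images/:

```json
{
  "images": [
    "custom"
  ]
}
```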
- Reference the same image name in the customImages section under the configuration of the SageMaker domain:
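Again as an illustrative shape, referencing the same directory name in the sagemakerConfig section:

```json
{
  "sagemakerConfig": {
    "customImages": [
      "custom"
    ]
  }
}
```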
- Commit and push changes to the GitHub repository.
- You should see CodePipeline start upon the push. Follow the pipeline's progress and provide manual approval for deployment.
After the deployment is completed successfully, you should be able to see that the custom image you added has been attached to the domain configuration (as shown in the following screenshot).
Clean up
To clean up your resources, open the AWS CloudFormation console and delete the stacks SagemakerImageStack and PipelineStack, in that order. If you encounter errors such as "The S3 bucket is not empty" or "The ECR repository has images," you can manually delete the S3 bucket and ECR repository that were created. Then you can retry deleting the CloudFormation stacks.
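If you prefer the AWS CLI, the equivalent deletions (using the stack names this solution creates) look like this:

```bash
aws cloudformation delete-stack --stack-name SagemakerImageStack
aws cloudformation wait stack-delete-complete --stack-name SagemakerImageStack
aws cloudformation delete-stack --stack-name PipelineStack
```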
Conclusion
In this post, we showed how to create an automated continuous integration and delivery (CI/CD) pipeline solution to build, scan, and deploy custom Docker images to SageMaker Studio domains. You can use this solution to promote consistency of the analytical environments for data science teams across your enterprise. This approach helps you achieve ML governance, scalability, and standardization.
About the Authors
Muni Annachi, a Senior DevOps Consultant at AWS, boasts over a decade of expertise in architecting and implementing software systems and cloud platforms. He specializes in guiding non-profit organizations to adopt DevOps CI/CD architectures that adhere to AWS best practices and the AWS Well-Architected Framework. Apart from his professional endeavors, Muni is an avid sports fan and tries his luck in the kitchen.
Ajay Raghunathan is a Machine Learning Engineer at AWS. His current work focuses on designing and implementing large-scale ML solutions. He is a technology enthusiast and architect with a primary area of interest in AI/ML, data analytics, serverless, and DevOps. Outside of work, he enjoys spending time with his family, traveling, and playing soccer.
Arun Dyasani is a Senior Cloud Application Architect at AWS. His current work focuses on designing and implementing innovative software solutions. His role centers on building robust architectures for complex applications, using his deep knowledge and experience in developing large-scale systems.
Shweta Singh is a Senior Product Manager on the Amazon SageMaker Machine Learning platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She holds a Bachelor of Science in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.
Jenna Eun is a Principal Practice Manager for the Health and Advanced Compute team at AWS Professional Services. Her team focuses on designing and delivering data, ML, and advanced compute solutions for the public sector, including federal, state, and local governments, academic medical centers, nonprofit healthcare organizations, and research institutions.
Meenakshi Ponn Shankaran is a Principal Domain Architect at AWS in the Data & ML Professional Services organization. He has extensive experience in designing and building large-scale data lakes, handling petabytes of data. Currently, he focuses on providing technical leadership to AWS US Public Sector customers, guiding them in using new AWS services to meet their strategic objectives and unlock the full potential of their data.