Reduce container cold start times using the SOCI index in DLAMI and DLC

AWS Deep Learning AMIs (DLAMI) and AWS Deep Learning Containers (DLC) now support the SOCI snapshotter and SOCI index. Seekable OCI (SOCI) is a technology that makes large container images cheaper to use by downloading only the files a container needs instead of the whole image. It builds a per-layer index that maps file locations within the image, so a container can start with only the necessary files loaded (lazy loading). This approach reduces network bandwidth usage and improves container startup times, which is particularly useful for organizations managing large container images in cloud environments.
In this post, we look at how to use SOCI in publicly available AMIs and Deep Learning Containers, when to use each of the SOCI pull methods, and how you can start using it in your operations today.
Background
As organizations deploy artificial intelligence (AI) and machine learning (ML) workloads at scale, container startup time has become a bottleneck in production environments. Whether you're running training jobs, serving inference results, or automatically scaling GPU clusters, the time spent downloading multi-gigabyte container images directly affects cost, user experience, and performance. Conventional container deployment methods force teams to download the entire image before work can begin, which can take several minutes for the images commonly used in production. During development, a few minutes of waiting is tolerable. In production, those same minutes add up quickly.
Organizations deploying deep learning infrastructure at scale often encounter several key challenges:
- Long cold start times. A standard Docker pull of a 15–20 GB image can take 4–6 minutes per instance, delaying training jobs and inference endpoints during scaling events.
- Wasted compute resources. GPU instances sit idle during image pulls, burning valuable compute hours while waiting for container initialization to complete.
- Slow scaling. When demand spikes trigger auto scaling, slow startup times prevent a quick response, resulting in degraded performance or dropped requests.
- Bandwidth limits. High-volume deployments that pull large images at the same time can saturate network bandwidth, increasing latency across the infrastructure.
- Engineer productivity. Data scientists and ML engineers spend significant time waiting for containers to launch during iterative development and testing cycles.
Container pull methods
When pulling a container for your workloads, AWS Deep Learning AMIs (DLAMI) and Deep Learning Containers offer three options: standard Docker pull, SOCI parallel pull, and SOCI lazy loading with a SOCI index. Think of these as a sliding scale of trade-offs. Standard Docker pulls are sequential and slow. SOCI parallel pull shortens pull times by parallelizing downloads, at the cost of compute resources. SOCI lazy loading provides near-instant container startup, but files that are not needed at startup are downloaded on demand when they are first accessed. You can use the following guide to choose the right method for your workload:
- The choice between lazy loading and parallel pull depends on the image, the instance specification, and the storage configuration. Lazy loading requires the image to have a SOCI index. Without one, the system falls back to a standard pull.
- Lower-spec instances should use lazy loading to conserve resources, while higher-spec instances with many vCPUs and high network bandwidth benefit from parallel pull mode. Storage performance varies: EBS volumes are bound by their provisioned IOPS and volume type, which may create bottlenecks during unpacking, while NVMe instance store delivers high I/O performance at the expense of data persistence across instance stop/start cycles.
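The guidance above can be written down as a small decision helper. This is only a sketch of the rules stated in this post, not part of any AWS tool: the `Workload` fields, the returned labels, and the `MIN_VCPUS_FOR_PARALLEL` cutoff are all hypothetical and should be tuned for your own fleet.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    has_soci_index: bool    # image has a SOCI index stored in the registry
    needs_full_image: bool  # full image must be local at startup, or workload is I/O-intensive
    vcpus: int              # vCPU count of the instance
    high_bandwidth: bool    # instance has high network bandwidth


# Hypothetical cutoff for "many vCPUs"; not a value taken from AWS guidance.
MIN_VCPUS_FOR_PARALLEL = 16


def choose_pull_method(w: Workload) -> str:
    """Return 'parallel-pull', 'lazy-loading', or 'standard-pull'."""
    if w.needs_full_image:
        # Parallel pull downloads the whole image, only faster.
        return "parallel-pull"
    if w.has_soci_index:
        # Lazy loading only works when a SOCI index exists.
        return "lazy-loading"
    if w.vcpus >= MIN_VCPUS_FOR_PARALLEL and w.high_bandwidth:
        # No index, but enough CPU and network to parallelize the pull.
        return "parallel-pull"
    # No index and a small instance: lazy loading would fall back to this anyway.
    return "standard-pull"


if __name__ == "__main__":
    print(choose_pull_method(Workload(True, False, 8, False)))
    print(choose_pull_method(Workload(False, False, 16, True)))
```

The order of the checks encodes one possible priority (a full-image requirement first, then index availability, then instance size); a different workload mix may justify a different order.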
The following example shows the different pull methods using the vLLM Deep Learning Container:
[Figure: pull methods for Deep Learning Containers]
Solution architecture
The following diagram shows the SOCI implementation architecture with DLAMI and Deep Learning Containers.

Container startup time comparison with the SOCI snapshotter
The following benchmarks compare standard Docker pulls against the SOCI snapshotter in both lazy loading and parallel pull modes.
Lazy loading mode
Lazy loading mode starts containers quickly by fetching only the data required at startup. The remaining layers are loaded in the background or on demand.
Prerequisites
SOCI index required
Important: Lazy loading mode requires the container image to have a SOCI index stored in the registry. Without a SOCI index, the snapshotter falls back to standard pull behavior, and you won't see any performance improvement. AWS Deep Learning Containers (DLCs) with the -soci tag suffix come with SOCI indexes pre-built and pushed to the registry, enabling lazy loading out of the box. For custom images, you must create and push the SOCI indexes yourself.
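As a quick pre-flight check, you can test whether an image reference follows the -soci tag convention described above. The sketch below is a naming heuristic only, with hypothetical function names: it does not query the registry, so it cannot prove that a SOCI index actually exists, and custom images with their own indexes may use any tag.

```python
def image_tag(image_ref: str) -> str:
    """Return the tag of an image reference, or '' if it has none."""
    ref = image_ref.split("@", 1)[0]  # drop any @sha256:... digest
    last = ref.rsplit("/", 1)[-1]     # ignore registry host[:port] and repository path
    return last.split(":", 1)[1] if ":" in last else ""


def looks_soci_indexed(image_ref: str) -> bool:
    """True if the tag ends with the '-soci' suffix used by SOCI-indexed DLCs."""
    return image_tag(image_ref).endswith("-soci")


if __name__ == "__main__":
    ref = "public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci"
    print(image_tag(ref), looks_soci_indexed(ref))
```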
Environment
- Instance type: g5.2xlarge
- EBS: 500 GiB, 3,000 IOPS, 125 MB/s throughput
- AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 24.04) 20260413 (ami-06abbbf2049359343)
- Docker image: public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci
- Image size: 9.72 GB (compressed), 32.7 GB (disk usage)
- Network: Corp
Start a container with Docker (non-SOCI)
We use Docker to start the inference server directly. Since no image exists locally, Docker downloads and extracts the entire image before starting the container.
Total time: 6m59.099s.
Start a container with the SOCI snapshotter (lazy loading)
We use nerdctl with the SOCI snapshotter to start the inference container. Although no image exists locally, the SOCI-indexed image allows nerdctl to pull only the index and the layers needed to start the container, with the remaining layers loaded lazily.
Total time: 21.125s.
Lazy loading summary
Using the SOCI snapshotter with lazy loading, the container starts in 21.125 seconds, compared to 6 minutes 59.099 seconds with standard Docker. This improvement is possible because SOCI pulls only the layers needed to start the container, and the remaining layers are loaded on demand.
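To make the size of that gap concrete, the two reported timings can be converted to seconds and compared. The snippet below is a small illustrative calculation; the `parse_duration` helper is our own and simply handles the `XmY.YYYs` format used in this post.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a duration such as '6m59.099s' or '21.125s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


docker_s = parse_duration("6m59.099s")   # standard Docker, from the benchmark above
soci_lazy_s = parse_duration("21.125s")  # SOCI lazy loading, from the benchmark above

speedup = docker_s / soci_lazy_s
saved_s = docker_s - soci_lazy_s

print(f"{speedup:.1f}x faster, {saved_s:.1f}s saved per cold start")
```

With these inputs the ratio works out to about 19.8x, or roughly 398 seconds saved on each cold start in this environment.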
Parallel pull mode
While lazy loading mode starts containers quickly by fetching required data on demand, parallel pull mode downloads the entire image before the container starts, but does so faster than a standard Docker pull. This mode is ideal when you need the full image available at startup or when running I/O-intensive workloads.
Environment
- Instance type: g5.4xlarge
- EBS: 500 GiB gp3, 16,000 IOPS, 1,000 MB/s throughput
- AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 24.04) 20260413 (ami-06abbbf2049359343)
- Docker image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker
- Image size: 19.32 GB (compressed), 60.4 GB (disk usage)
- Network: Corp
Note: We use a private Amazon ECR image for this benchmark because Amazon ECR Public is fronted by Amazon CloudFront, which limits network bandwidth and affects parallel pull mode performance. Private ECR is served directly from Amazon Simple Storage Service (Amazon S3), which provides higher throughput.
Enable parallel pull mode
The SOCI snapshotter in the Deep Learning AMI defaults to lazy loading mode. To enable parallel pull mode, edit the configuration file at /etc/soci-snapshotter-grpc/config.toml:
Apply the configuration by restarting the service:
Tip: You can tune max_concurrent_downloads_per_image and max_concurrent_unpacks_per_image based on your instance type and network bandwidth. For detailed tuning instructions, see Introducing Seekable OCI Parallel Pull mode for Amazon EKS.
Confirm that parallel mode is active
Monitor the SOCI snapshotter logs during the pull to confirm that parallel mode is enabled:
Look for log entries that show parallel pull/unpack operations:
Pull an image with Docker (non-SOCI)
A standard Docker pull downloads and unpacks layers with limited parallelism.
Total time: 4m 44.163s
Pull an image in SOCI parallel pull mode
Using nerdctl with SOCI parallel pull mode increases parallelism for both download and unpack operations.
Total time: 2m 12.846s
Parallel pull summary
Using SOCI parallel pull mode reduced the time to pull the image from 4 minutes 44 seconds to 2 minutes 12 seconds, a roughly 2.1x improvement in the pull operation.
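Another way to read these numbers is as an effective end-to-end pull rate. The calculation below is a rough sketch: it assumes the reported 19.32 GB compressed size is in decimal gigabytes, and because the timings include unpacking, the result is not a pure network throughput figure.

```python
COMPRESSED_GB = 19.32  # compressed image size from the environment list (decimal GB assumed)


def effective_rate_mb_s(size_gb: float, seconds: float) -> float:
    """Compressed megabytes delivered per second of total pull time."""
    return size_gb * 1000 / seconds


docker_s = 4 * 60 + 44.163    # standard Docker pull: 4m 44.163s
parallel_s = 2 * 60 + 12.846  # SOCI parallel pull: 2m 12.846s

docker_rate = effective_rate_mb_s(COMPRESSED_GB, docker_s)
parallel_rate = effective_rate_mb_s(COMPRESSED_GB, parallel_s)

print(f"Docker pull:   {docker_rate:.1f} MB/s")
print(f"Parallel pull: {parallel_rate:.1f} MB/s")
print(f"Ratio:         {parallel_rate / docker_rate:.2f}x")
```

Comparing the resulting rates against your instance's network and EBS throughput limits can hint at whether the pull is bound by bandwidth, storage, or unpacking.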
Conclusion
The SOCI snapshotter improves both container initialization and image pull operations:
- Lazy loading mode – Achieved a roughly 20x improvement in container startup time (from 6 minutes 59 seconds to 21 seconds)
- Parallel pull mode – Achieved a roughly 2.1x improvement in image pull time (from 4 minutes 44 seconds to 2 minutes 12 seconds)
Choose lazy loading mode when you need the fastest possible container startup, or parallel pull mode when you need the full image available before your workload starts.
Clean up
If you launched EC2 instances to test the SOCI snapshotter, terminate them to avoid incurring ongoing costs. Delete any container images you pushed to Amazon Elastic Container Registry (Amazon ECR) during testing, and delete any SOCI indexes you no longer need.
Getting started with SOCI
DLAMI and Deep Learning Containers are publicly available today with the SOCI snapshotter and SOCI index. To select images that support SOCI, see the SOCI Index DLAMI page, and for supported SOCI-indexed images, see the Deep Learning Containers Repository.
For detailed configuration guidance and best practices, see the SOCI documentation and the SOCI Deep Learning Containers documentation.