Reducing container cold start times using the SOCI index in DLAMI and DLC

AWS Deep Learning AMIs (DLAMI) and AWS Deep Learning Containers (DLC) now support the SOCI snapshotter and SOCI indexes. Seekable OCI (SOCI) is a technology that manages container images efficiently by downloading only selected files. It uses a layer-based index to map file locations within container images, allowing containers to start with only the necessary files loaded (lazy loading). This approach reduces network bandwidth usage and improves container startup times, making it particularly useful for organizations managing large container images in cloud environments.

In this post, we look at how to use SOCI in publicly available DLAMIs and Deep Learning Containers, when to use each of the SOCI pull methods, and how you can quickly and effectively start using SOCI in your operations today.

Background

As organizations deploy artificial intelligence (AI) and machine learning (ML) workloads at scale, container startup time has become a bottleneck in production environments. Whether you are running training jobs, serving inference results, or automatically scaling GPU clusters, the time spent downloading multi-gigabyte container images directly affects cost, user experience, and performance. Conventional container deployment methods force teams to download an entire image before work can begin. This process can take several minutes for images that are commonly used in production. During development, a few minutes of waiting may go unnoticed. In production, those same minutes add up quickly.

Organizations deploying deep learning infrastructure at scale often encounter several key challenges:

Long cold start times. A standard Docker pull of a 15–20 GB image can take 4–6 minutes per instance, delaying training jobs and endpoints during scaling events.
Wasted compute resources. GPU instances remain idle during image pulls, burning valuable compute hours while waiting for container initialization to complete.
Scaling challenges. When demand spikes trigger auto scaling, slow startup times prevent a quick response, resulting in degraded performance or dropped requests.
Bandwidth limits. High-volume deployments that pull large images at the same time can saturate network bandwidth, degrading performance across the infrastructure.
Engineer productivity. Data scientists and ML engineers spend significant time waiting for containers to launch during iterative development and testing cycles.

Container pull methods

When pulling a container for your workloads, AWS Deep Learning AMIs (DLAMI) and Deep Learning Containers offer three options: standard Docker pull, SOCI parallel pull, and SOCI lazy loading with the SOCI index. Think of these as a sliding scale of trade-offs. Standard Docker pulls are sequential and slow. SOCI parallel pull provides faster startup times by parallelizing downloads at the cost of compute resources. SOCI lazy loading provides near-instant container startup but requires files to be downloaded when they are first needed. You can use the following guide to choose the right method for your workload:

The choice between lazy loading and parallel pull depends on the image, instance specification, and storage configuration. Lazy loading requires the image to have a SOCI index. Without one, the snapshotter falls back to a standard pull.
Lower-spec instances should use lazy loading to save resources, while higher-spec instances with many vCPUs and high network bandwidth benefit from parallel pull mode. Storage performance also varies: EBS volumes are bound by their provisioned IOPS and volume type, which may create bottlenecks during unpacking, while NVMe instance store delivers high I/O performance at the expense of data persistence across instance stop/start cycles.
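The guidance above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name choose_pull_mode and the 16-vCPU cutoff are hypothetical and are not AWS recommendations, and the sketch ignores storage and workload I/O patterns.

```shell
#!/bin/bash
# Hypothetical helper: suggest a pull mode from the factors discussed above.
# The default vCPU threshold (16) is an illustrative assumption only.
choose_pull_mode() {
    local has_soci_index="$1"   # "yes" if the image has a SOCI index in the registry
    local vcpus="$2"            # number of vCPUs on the instance
    local threshold="${3:-16}"  # assumed cutoff for a "higher-spec" instance

    if [ "$vcpus" -ge "$threshold" ]; then
        # Many vCPUs: parallel download and unpack can be put to use.
        echo "parallel-pull"
    elif [ "$has_soci_index" = "yes" ]; then
        # Fewer vCPUs and an index is available: fetch files on demand.
        echo "lazy-loading"
    else
        # Without an index, lazy loading falls back to a standard pull anyway.
        echo "standard-pull"
    fi
}

# Example for the current machine, assuming the image is SOCI-indexed:
# choose_pull_mode yes "$(nproc)"
```

Treat the result as a starting point and confirm it by measuring pull and startup times on your own instances.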

The following example shows various methods based on the vLLM Deep Learning Container:

Deep Learning Techniques

Solution architecture

The following diagram shows the SOCI implementation architecture with DLAMI and Deep Learning Containers.

Solution architecture showing the integration of the SOCI snapshotter with DLAMI and Deep Learning Containers on Amazon EC2

Container startup time comparison with the SOCI snapshotter

The following benchmarks compare standard Docker pulls against the SOCI snapshotter in both lazy loading and parallel pull modes.

Lazy loading mode

Lazy loading mode starts containers quickly by fetching only the data required at startup. The remaining layers are loaded in the background or on demand as they are needed.

Prerequisites

SOCI index required

Important: Lazy loading mode requires the container image to have a SOCI index stored in the registry. Without a SOCI index, the snapshotter falls back to standard pull behavior, and you won't see any performance improvement. AWS Deep Learning Containers (DLCs) with the -soci tag suffix come with SOCI indexes pre-built and pushed to the registry, enabling lazy loading out of the box. For custom images, you must create and push the SOCI indexes yourself.
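As a quick pre-flight check before relying on lazy loading, a script can test whether an image reference follows the -soci tag convention described above. This is a minimal sketch: the function name is_soci_tagged is hypothetical, and the check only inspects the tag name. It does not verify that an index actually exists in the registry.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the image tag ends with the "-soci" suffix.
# This checks the naming convention only, not the registry contents.
is_soci_tagged() {
    local image="$1"
    local tag="${image##*:}"   # text after the last colon
    case "$tag" in
        *-soci) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example:
# if is_soci_tagged "public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci"; then
#     echo "tag follows the -soci convention"
# fi
```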

Environment

Instance type: g5.2xlarge
EBS: 500 GiB, 3000 IOPS, 125 MB/s throughput
AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 24.04) 20260413 (ami-06abbbf2049359343)
Docker image: public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci
Image size: 9.72 GB (compressed), 32.7 GB (disk usage)
Network: Corp

Start a container with Docker (non-SOCI)

We use Docker to start the inference server directly. Since no image exists locally, Docker downloads and extracts the entire image before starting the container.

Total time: 6m59.099s.

#!/bin/bash
time docker run \
    --gpus all \
    -d \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" \
    -p 8000:8000 \
    --ipc=host \
    public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci \
    --model mistralai/Mistral-7B-v0.1
# output
Unable to find image 'public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci' locally
0.19.0-gpu-py312-ec2-soci: Pulling from deep-learning-containers/vllm
340d44d2921c: Pull complete
....2001a2421bf1: Pull complete
Digest: sha256:a6344c96a33ef98a32a27f89b41b8c0529d4fbbba248eb57f811725d415f68fc
Status: Downloaded newer image for public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci
e12d969eb71517d9a6a23b9b11cfa22ddda26a95f6a0f0d8df00cd5c4fdfe912

real    6m59.099s
user    0m0.391s
sys     0m0.452s

Start container with SOCI snapshotter (lazy loading)

We use nerdctl with the SOCI snapshotter to start the inference container. Although no image exists locally, the SOCI-indexed image allows nerdctl to pull only the index and the layers needed to start the container, enabling lazy loading of the remaining layers.

Total time: 21.125s.

#!/bin/bash
time sudo nerdctl run \
    --snapshotter soci \
    --gpus all \
    -d \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" \
    -p 8000:8000 \
    --ipc=host \
    public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci \
    --model mistralai/Mistral-7B-v0.1
# output
public.ecr.aws/deep-learning-containers/vllm:0.19.0-gpu-py312-ec2-soci:           resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:a6344c96a33ef98a32a27f89b41b8c0529d4fbbba248eb57f811725d415f68fc:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:d91ad3b46204eace6de2fb27c46d9600337fa9c124b4c82fe0f335d391017daa: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:886ed36d57c44081a74a0ab052f57366d96ab2c0fe39bb3e2f8a46cc20db8ec2:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 10.5s                                                                    total:  48.1 K (4.6 KiB/s)
189307b7899438415f3df4288b3fbb26bcc4cd43678e88ec3b062bc6330e3e3b

real    0m21.125s
user    0m0.004s
sys     0m0.011s

Lazy loading summary

Using the SOCI snapshotter with lazy loading, the container started in 21.125 seconds, compared to 6 minutes 59.099 seconds with standard Docker. This improvement is achieved because SOCI pulls only the layers needed to start the container, and the remaining layers are loaded on demand.
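To put the two measurements on a common scale, the "real" values reported by time can be converted to seconds and divided. The following is a small sketch. The helper names to_seconds and speedup are hypothetical, and the input is assumed to use the XmY.YYYs format printed by the bash time keyword.

```shell
#!/bin/bash
# Convert a bash `time` value such as "6m59.099s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times faster the second timing is than the first.
speedup() {
    awk -v base="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", base / fast }'
}

# Example with the timings measured above:
# speedup 6m59.099s 0m21.125s    # prints 19.8
```

The same helper can be applied to any pair of pull or startup timings you collect on your own instances.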

Parallel pull mode

While lazy loading mode starts containers quickly by fetching required data on demand, parallel pull mode downloads the entire image before the container starts, but does so faster than a standard Docker pull. This mode is ideal if you need the full image available at startup or when running I/O-intensive workloads.

Environment

Instance type: g5.4xlarge
EBS: 500 GiB gp3, 16000 IOPS, 1000 MB/s throughput
AMI: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 24.04) 20260413 (ami-06abbbf2049359343)
Docker image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker
Image size: 19.32 GB (compressed), 60.4 GB (disk usage)
Network: Corp

Note: We use a private Amazon ECR image for this benchmark because Amazon ECR Public is served through Amazon CloudFront, which limits network bandwidth and affects parallel pull performance. Private Amazon ECR serves image layers directly from Amazon Simple Storage Service (Amazon S3), which provides high performance.
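Pulling from a private Amazon ECR registry requires authentication first. The following sketch derives the registry host and Region from an image URI and logs nerdctl in with a token from the AWS CLI. The helper names are hypothetical, the sketch assumes the AWS CLI is installed with credentials that can read the repository, and whether the snapshotter picks up the stored credentials can depend on how your host is configured.

```shell
#!/bin/bash
# Extract the registry host (everything before the first "/") from an image URI.
ecr_registry() {
    echo "${1%%/*}"
}

# Extract the Region, assuming the host has the form
# <account>.dkr.ecr.<region>.amazonaws.com
ecr_region() {
    ecr_registry "$1" | cut -d. -f4
}

# Log nerdctl in to the private registry that hosts the given image.
ecr_login() {
    local image="$1"
    aws ecr get-login-password --region "$(ecr_region "$image")" \
        | sudo nerdctl login --username AWS --password-stdin "$(ecr_registry "$image")"
}

# Example:
# ecr_login "763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker"
```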

Enable parallel pull mode

The SOCI snapshotter in the Deep Learning AMI defaults to lazy loading mode. To enable parallel pull mode, edit the configuration file at /etc/soci-snapshotter-grpc/config.toml:

# Parallel Pull Mode - significantly improves image pull times for large AI/ML images
# These are conservative defaults recommended by AWS for ECR
[pull_modes.parallel_pull_unpack]
enable = true # false(default): lazy loading/true: parallel mode
max_concurrent_downloads = -1 # unlimited global cap across all images
max_concurrent_downloads_per_image = 20 # per-image download connections
concurrent_download_chunk_size = "16mb"
max_concurrent_unpacks = -1 # unlimited global cap across all images
max_concurrent_unpacks_per_image = 10 # per-image parallel unpack threads
discard_unpacked_layers = true

Apply the configuration by restarting the service:

sudo systemctl restart soci-snapshotter.service
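Before restarting, it can help to confirm that the edit landed in the right section of the file. The following sketch scans the configuration for enable = true under [pull_modes.parallel_pull_unpack]. The function name parallel_pull_enabled is hypothetical, and the parsing is a simple line-based check that assumes the layout shown above rather than a full TOML parser.

```shell
#!/bin/bash
# Succeed if "enable = true" appears inside [pull_modes.parallel_pull_unpack].
# Line-based check only; assumes one key per line as in the sample config.
parallel_pull_enabled() {
    local config="${1:-/etc/soci-snapshotter-grpc/config.toml}"
    awk '
        /^\[/ { in_section = ($0 ~ /^\[pull_modes\.parallel_pull_unpack\]/) }
        in_section && /^[ \t]*enable[ \t]*=[ \t]*true/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$config"
}

# Example:
# if parallel_pull_enabled; then
#     echo "parallel pull mode is enabled in the config"
# else
#     echo "lazy loading mode (default) is configured"
# fi
```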

Tip: You can tune max_concurrent_downloads_per_image and max_concurrent_unpacks_per_image based on your instance type and network bandwidth. For detailed tuning instructions, see Introducing Seekable OCI Pull Mode for Amazon EKS.

Confirm that parallel mode is active

Monitor the SOCI snapshotter logs during the download to confirm that parallel mode is enabled:

journalctl -u soci-snapshotter -f

Look for log entries that indicate parallel pull/unpack:

Apr 16 23:59:08 ip-172-31-86-91 soci-snapshotter-grpc[3108]:
  {"layerDigest":"sha256:e87500e698966458d9dfc34df84602985c9821f39666619792fe6282aa6df5d4",
   "level":"info",
   "msg":"preparing snapshot with parallel pull/unpack",
   "time":"2026-04-16T23:59:08.654819383Z"}
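For a non-interactive check, for example in a provisioning script, the same log message can be counted instead of watched. This is a minimal sketch: the function name count_parallel_pull_events is hypothetical, and it assumes the message text matches the sample entry above, which could change between snapshotter versions.

```shell
#!/bin/bash
# Count log lines on stdin that report a parallel pull/unpack snapshot.
# The message text is taken from the sample log entry and may vary by version.
count_parallel_pull_events() {
    grep -c 'preparing snapshot with parallel pull/unpack'
}

# Example: count matching entries from the last 10 minutes.
# journalctl -u soci-snapshotter --no-pager --since "10 minutes ago" \
#     | count_parallel_pull_events
```

A count of 0 after a pull suggests that the snapshotter is still running in lazy loading mode or that the service was not restarted after the configuration change.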

Pull image with Docker (non-SOCI)

A standard Docker pull downloads and unpacks layers with limited concurrency.

Total time: 4m 44.163s

time docker pull \
  763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker

Digest: sha256:fd0cf60bbb34a5d30f22595215a633e5d4a7260fc0868aabe3f04b1174b7365d
Status: Downloaded newer image for
  763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker
763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker

real    4m44.163s
user    0m0.339s
sys     0m0.423s

Pull image with SOCI parallel pull mode

Using nerdctl with SOCI parallel pull mode increases parallelism for both download and unpack operations.

Total time: 2m 12.846s

time sudo nerdctl pull --snapshotter soci \
  763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker

763104351884.dkr.ecr.us-east-1.amazonaws.com/sglang:0.5.10-gpu-py312-cu129-ubuntu24.04-sagemaker:
  resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:fd0cf60bbb34a5d30f22595215a633e5d4a7260fc0868aabe3f04b1174b7365d:
  done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:5e6a53b7478b0631dd3c4222ab6619dae3a3dd32a565921f10b0b03fdc316d46:
  done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 132.8s    total:  89.3 K (688.0 B/s)

real    2m12.846s
user    0m0.018s
sys     0m0.075s

Parallel pull summary

Using SOCI parallel pull mode reduced the time to pull the image from 4 minutes 44 seconds to 2 minutes 12 seconds, representing an improvement of about 2.1x in the pull operation (284.163 s / 132.846 s).

Conclusion

The SOCI snapshotter provides improvements for both container initialization and image pull operations:

Lazy loading mode – Achieved a roughly 20x improvement in container startup time (from 6 minutes 59 seconds to 21 seconds)
Parallel pull mode – Achieved an improvement of about 2.1x in image pull time (from 4 minutes 44 seconds to 2 minutes 12 seconds)

Choose lazy loading mode when you need the fastest possible container startup, or parallel pull mode when you need the full image available before your workload starts.

Clean up

If you launched EC2 instances to test the SOCI snapshotter, terminate them to avoid incurring ongoing costs. Delete any container images you pushed to Amazon Elastic Container Registry (Amazon ECR) during testing, and delete any SOCI indexes you no longer need.

Getting started with SOCI

DLAMI and Deep Learning Containers are publicly available today with the SOCI snapshotter and SOCI indexes. To select images that support SOCI, see SOCI Index DLAMI, and see the Deep Learning Containers repository for more information on supported SOCI-indexed images.

For detailed configuration guidance and best practices, see the SOCI documentation and the SOCI Deep Learning Containers documentation.