AWS and NVIDIA deepen strategic collaboration to accelerate AI from pilot to production

AI is moving fast, and for many of our customers, the real opportunity isn't in testing it—it's in using AI in production where it drives meaningful business results. This means building systems that work reliably, perform at scale, and meet your organization's security and compliance requirements.

Today at NVIDIA GTC 2026, AWS and NVIDIA announced an expanded collaboration and new technology integrations to support the growing demand for AI computing and help you build and deploy production-ready AI solutions. These integrations span accelerated computing, networking, model fine-tuning, and inference. They include:

Big announcements at NVIDIA GTC 2026

Scale AI infrastructure with expanded GPU options and improved connectivity

Accelerating computing capacity in the age of agentic AI

Starting in 2026, AWS will add more than 1 million NVIDIA GPUs including Blackwell and Rubin GPU architectures across our global cloud regions. AWS offers the broadest set of NVIDIA GPU-based instances of any cloud provider to power a diverse set of AI/ML workloads. AWS and NVIDIA also collaborate on Spectrum networking and other infrastructure areas, adding to more than 15 years of joint innovation between our two companies.

AWS' advanced cloud and AI infrastructure gives businesses, startups, and researchers the infrastructure they need to build and scale agentic AI systems that can reason, code, and operate autonomously across complex tasks.

New Amazon EC2 instances with NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs

Today, we announced that Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs are coming soon. AWS is the first major cloud provider to announce support for RTX PRO 4500 Blackwell Server Edition GPUs. These instances are well suited to a wide variety of workloads, including data analytics, conversational AI, content generation, recommendation systems, video streaming, video rendering, and other graphics workloads.

Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs will be built on the AWS Nitro System, a combination of dedicated hardware and a lightweight hypervisor that delivers practically all of the compute and memory resources of the host hardware to your instances for better resource utilization and performance. The Nitro System's specialized hardware, software, and firmware are designed to enforce restrictions so that no one, including anyone at AWS, can access your sensitive AI workloads and data. The Nitro System also supports firmware updates, bug fixes, and maintenance while the system is running. Together, these capabilities provide the resource efficiency, security, and stability that AI, analytics, and graphics-intensive workloads demand.

Accelerate distributed LLM inference with NVIDIA NIXL on AWS EFA and Trainium

As model sizes increase, communication overhead between GPUs or Trainium accelerators can become a bottleneck. Today, we announced support for the NVIDIA Inference Xfer Library (NIXL) with AWS Elastic Fabric Adapter (EFA) to accelerate distributed large language model (LLM) inference on Amazon EC2, across NVIDIA GPUs and AWS Trainium. Accelerating distributed inference is critical to modern AI serving because it enables efficient overlap of communication and computation while reducing communication latency and maximizing GPU utilization. This integration enables high-speed, low-latency KV-cache data movement between GPU compute nodes that perform token generation and disaggregated memory resources that maintain KV-cache state. It also provides the flexibility to build disaggregated inference clusters using any combination of EFA-enabled GPU and Trainium EC2 instances. NIXL and EFA integrate natively with popular open source frameworks such as NVIDIA Dynamo, vLLM, and SGLang, delivering improved token latency and more efficient KV-cache memory usage.
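To make the prefill/decode split concrete, here is a minimal, purely illustrative Python sketch of disaggregated serving: one worker processes the prompt and produces the KV cache, the cache is handed off between nodes (the step NIXL over EFA accelerates in a real cluster), and a second worker generates tokens against it. None of these names are NIXL or Dynamo APIs; they are stand-ins for the pattern.

```python
# Illustrative sketch of disaggregated LLM serving: a prefill worker builds
# the KV cache for a prompt, the cache is handed off (over NIXL/EFA in a
# real deployment), and a decode worker generates tokens against it.
# All names here are hypothetical; this is not the NIXL API.

from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Per-request key/value attention state, one entry per processed token."""
    entries: list = field(default_factory=list)

def prefill(prompt_tokens):
    """Prefill phase: process the whole prompt once, producing the KV cache."""
    cache = KVCache()
    for tok in prompt_tokens:
        cache.entries.append(("kv", tok))  # stand-in for real attention state
    return cache

def transfer(cache):
    """Stand-in for the KV-cache move between nodes; in a real system this is
    the high-bandwidth, low-latency transfer NIXL performs over EFA."""
    return cache

def decode(cache, steps):
    """Decode phase: generate tokens one at a time, extending the cache."""
    out = []
    for i in range(steps):
        tok = f"tok{i}"                    # stand-in for sampling the model
        cache.entries.append(("kv", tok))
        out.append(tok)
    return out

prompt = ["the", "quick", "fox"]
cache = transfer(prefill(prompt))   # prefill node -> decode node handoff
generated = decode(cache, steps=2)
print(generated)                    # ['tok0', 'tok1']
print(len(cache.entries))           # 5 = 3 prompt tokens + 2 generated
```

The point of the split is that prefill is compute-bound while decode is memory-bound, so separating them onto different pools, and making the cache handoff cheap, lets each pool be sized and utilized independently.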

Accelerate data analysis with Amazon EMR and NVIDIA GPUs

Running Apache Spark 3x faster using Amazon EMR on Amazon EKS with G7e instances

Data engineers and data scientists often wait hours on data processing pipelines, which slows AI/ML model iteration and business intelligence. For these workloads, AWS and NVIDIA now deliver 3x faster performance for Apache Spark with Amazon EMR on EKS on G7e instances. This capability results from an AWS and NVIDIA engineering collaboration that brings GPU-accelerated analytics to Amazon EMR on EKS through integration with NVIDIA's RTX PRO 6000 architecture. With Amazon EMR and G7e instances, data engineers and data scientists can accelerate time to insight for AI/ML feature engineering, complex ETL transformations, and real-time analytics at scale. Customers running large data processing pipelines can cut compute time while maintaining full compatibility with existing Spark applications.
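Enabling GPU acceleration for Spark is typically a configuration change rather than a code change. As a sketch, using property names from the open source NVIDIA RAPIDS Accelerator for Apache Spark (verify the exact supported set and versions against your EMR on EKS release notes), a job submission might look like:

```shell
# Illustrative spark-submit configuration for GPU-accelerated Spark.
# Property names follow the open source RAPIDS Accelerator for Apache Spark;
# check your Amazon EMR on EKS release documentation for the supported set.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  my_etl_job.py
```

Because the accelerator plugs in at the SQL/DataFrame layer, existing Spark applications generally run unchanged, and operators the GPU plugin cannot handle fall back to the CPU automatically.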

Extending NVIDIA Nemotron model support to Amazon Bedrock

Fine-tune Nemotron models on Amazon Bedrock with Reinforcement Fine-Tuning (Coming Soon)

Developers will soon be able to fine-tune NVIDIA Nemotron models directly on Amazon Bedrock using Reinforcement Fine-Tuning (RFT). This matters for teams that need to align model behavior with a specific domain, whether that's legal, healthcare, finance, or another specialized field. Fine-tuning lets you shape how the model reasons and responds, not just what it knows. And because this runs natively on Amazon Bedrock, there's zero infrastructure overhead: you define the reward function, provide the feedback signal, and Bedrock handles the rest. Learn more about fine-tuning on Amazon Bedrock.
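As an illustration of what "defining the function" means, an RFT reward is just a scorer that maps a model completion to a scalar feedback signal. The shape below is hypothetical, since Bedrock's RFT interface details haven't been published; it sketches a domain-specific grader that rewards correct answers and required terminology:

```python
# Hypothetical reward function for reinforcement fine-tuning: scores a
# completion in [0, 1]. The signature is illustrative, not a Bedrock API.

def reward(completion: str, reference: dict) -> float:
    text = completion.lower()
    score = 0.0
    # Weight 0.7: the completion contains the correct final answer.
    if reference["answer"].lower() in text:
        score += 0.7
    # Weight 0.3: required domain terms (e.g., citation phrasing) appear.
    terms = reference["required_terms"]
    hits = sum(1 for t in terms if t.lower() in text)
    score += 0.3 * hits / max(len(terms), 1)
    return round(min(score, 1.0), 3)

ref = {"answer": "42", "required_terms": ["per section 4.2", "therefore"]}
print(reward("Therefore, per section 4.2, the answer is 42.", ref))  # 1.0
print(reward("The answer is 41.", ref))                              # 0.0
```

During RFT, signals like this steer the policy update: completions that score higher are reinforced, which is how the model's reasoning style gets pulled toward the domain's conventions rather than just its facts.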

Nemotron 3 Super on Amazon Bedrock (Coming Soon)

NVIDIA Nemotron 3 Super, a hybrid MoE model built for multi-agent workflows and extended reasoning, is coming soon to Amazon Bedrock. Designed so AI agents maintain accuracy throughout complex, multi-step processes, it powers use cases across cybersecurity, finance, marketing, and software development, delivering fast, cost-effective insights through a fully managed API.
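When the model lands, invoking it should look like any other Bedrock model through the Converse API. A sketch under assumptions: the model ID below is a placeholder (the real Nemotron 3 Super identifier isn't published yet), and the live call requires AWS credentials with Bedrock access.

```python
# Sketch of calling a Bedrock-hosted model via the Converse API.
# "nvidia.nemotron-3-super-v1:0" is a placeholder model ID, not the real one.

def build_request(model_id: str, user_text: str) -> dict:
    """Assemble the keyword arguments for bedrock-runtime's converse()."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

def invoke(req: dict) -> str:
    """Live call: needs boto3 installed and AWS credentials configured."""
    import boto3
    client = boto3.client("bedrock-runtime")
    resp = client.converse(**req)
    return resp["output"]["message"]["content"][0]["text"]

req = build_request("nvidia.nemotron-3-super-v1:0",
                    "Triage these alerts and list likely root causes.")
# text = invoke(req)  # uncomment to call the API once the model is live
```

Because Bedrock is fully managed, swapping this model in is a one-line change to the model ID; the request and response shapes stay the same across Converse-compatible models.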

Improving energy efficiency and sustainability

As AI workloads scale, performance per watt isn't just a sustainability metric; it's a competitive advantage. In this NVIDIA GTC session, Amazon Chief Sustainability Officer Kara Hurst will join sustainability leaders from Equinix and PepsiCo to discuss how AI is transforming power and infrastructure at scale, from data centers as active grid participants to AI as a business efficiency engine, and how AWS infrastructure, with 4.1x better data center energy efficiency, can help you meet your energy efficiency goals.

Built to run, together

What makes these announcements exciting isn't any single capability; it's what they represent together. The fifteen-year partnership between AWS and NVIDIA has produced deeply integrated, end-to-end AI infrastructure, from the GPU to the network to the managed services layer. You don't have to assemble it yourself. It's built to run.

If you're at GTC this week, come find us at the AWS booth. Check out live demos, find our theater sessions, and pick up custom swag with the AWS Swag Factory.

Visit AWS at NVIDIA GTC 2026 to see everything AWS is doing at the conference.


About the author

David Brown

David Brown is Vice President of AWS Compute and Machine Learning (ML) Services. In this role, he is responsible for building all AWS Compute and ML services, including Amazon EC2, Amazon container services, AWS Lambda, Amazon Bedrock, and Amazon SageMaker. These services are used by all AWS customers and also support most of Amazon's internal applications running on AWS. He also leads new solutions, such as AWS Outposts, that bring AWS services into customers' private data centers.
