NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart

Today, we are happy to announce the availability of day-zero NVIDIA Nemotron 3 Ultra in Amazon SageMaker JumpStart.
With this launch, you can now use the Nemotron 3 Ultra model using a one-click experience. Nemotron 3 Ultra is an open model built for edge computing and orchestration for long-running agents, delivering 5x faster computing and up to 30% lower costs for agent workloads. The Nemotron 3 Ultra is designed for the NVFP4 format, which makes the model much faster and less expensive to handle.
Overview of NVIDIA Nemotron 3 Ultra
NVIDIA Nemotron 3 Ultra is a large open language model with 550 billion parameters and 55 billion active parameters. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE), designed to deliver borderline intelligence at a fraction of the computational cost of dense models of similar quality.
| Clarification | Details |
|---|---|
| Buildings | Hybrid Transformer-Mamba MoE |
| Parameters | 550B total / 55B active |
| The length of the thread | Up to 1M tokens |
| Input / Output | Text, text out |
| Accuracy | NVFP4 |
| Pointing speed | 5x faster long-lasting agent workflow |
| Costs | Up to 30% less on complex agent jobs |
Why agent AI needs purpose-built models
Agents don't respond just once. They plan, call the tools, send the work to sub-agents, check the results, and proceed to turn hundreds. Every step adds tokens and counts, so the key metrics are task completion with useful accuracy, completion time, and cost per task.
The Nemotron 3 Ultra addresses this directly. Its MoE architecture uses only 55B of 550B parameters per forward pass, which keeps throughput high even with a context length of one million tokens. This means agents can support scheduling, tooling, and maintenance loops that take hundreds of hours while helping to maintain compliance and manage costs.
Business use cases
The Nemotron 3 Ultra excels in workloads that require multi-step thinking:
- Agent orchestrators – coordinate multiple sub-agents, control the country throughout the long chain of calling tools
- Coding agents – generate, test, debug, and iterate on code across large clusters
- Advanced Search – integrate information from multiple sources, maintain coherent thinking over a broader context
- Complex business workflows – change multi-step business processes with decision integration and error detection
Getting started with SageMaker JumpStart
You can deploy Nemotron 3 Ultra with Amazon SageMaker JumpStart with one click, eliminating the need to manage infrastructure or configure deployment parameters.
What is required
Before you begin, make sure you have:
- AWS account
- Properly designed permissions for SageMaker JumpStart
- Sufficient service allocation for GPU instances (for example, ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
Important: Deploying this model creates a SageMaker endpoint that incurs costs while running. GPU instances like ml.p5en.48xlarge can cost a few dollars an hour. See Amazon SageMaker AI pricing for details. Remember to remove your finish when you're done to avoid ongoing costs.
Implement using SageMaker Studio
- Open Amazon SageMaker Studio
- In the left navigation pane, select SageMaker JumpStart
- Search for Nemotron 3 Ultra
- Select the card model
- Select Apply
- Choose your instance type (supported instance types are ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
- Update application settings (default is sufficient for most use cases)
- Select Apply to create the repository
- Wait for the endpoint status to show InService before proceeding

Implement using the SageMaker Python SDK
Run the inference
Clean up
To avoid incurring unnecessary costs, remove the SageMaker repository when you're done:predictor.delete_endpoint()
The conclusion
NVIDIA Nemotron 3 Ultra brings boundary layer inference to Amazon SageMaker JumpStart with 5x faster inference and up to 30% lower cost of agent workload. The hybrid Transformer-Mamba MoE architecture and one million token context window make it purpose-built for the continuous, multi-step reasons production agents demand.
Whether you're building agent orchestrators, coding agents, deep research systems, or complex business automation, Nemotron 3 Ultra is ready to use today from SageMaker JumpStart.
Get started now by searching for Nemotron 3 Ultra on Amazon SageMaker JumpStart.
About the writers
Dan Ferguson Solutions Architect at AWS, based in New York, USA. As a machine learning services specialist, Dan works to support clients on their journey to integrate ML workflows effectively, efficiently, and sustainably.
Malav Shastri is a Software Development Engineer at AWS, where he works on the Amazon SageMaker JumpStart and Amazon Bedrock teams. His role focuses on enabling clients to take advantage of modern open source and proprietary models. Malav holds a Master's degree in Computer Science.
Vivek Gangasani is the Global Leader of Solutions Architecture, SageMaker Inference. He leads the Solution Architecture, Technical Go-to-Market (GTM) and outsourced product strategy for SageMaker Inference. He also helps enterprises and startups deploy and optimize GenAI models and build AI workflows with SageMaker and GPUs. Currently, he is focused on developing strategies and content to improve inference performance and use cases such as Agentic workflow, RAG etc. In his free time, Vivek likes to walk, watch movies, and try different foods.



