NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart

Today, we are happy to announce the day-zero availability of NVIDIA Nemotron 3 Ultra in Amazon SageMaker JumpStart.

With this launch, you can deploy the Nemotron 3 Ultra model through a one-click experience. Nemotron 3 Ultra is an open model built for frontier reasoning and orchestration of long-running agents, delivering 5x faster inference and up to 30% lower cost for agentic workloads. Nemotron 3 Ultra is optimized for the NVFP4 format, which makes the model significantly faster and less expensive to serve.

Overview of NVIDIA Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is a large open language model with 550 billion total parameters, of which 55 billion are active per token. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, designed to deliver frontier-level intelligence at a fraction of the computational cost of dense models of similar quality.

Specification	Details
Architecture	Hybrid Transformer-Mamba MoE
Parameters	550B total / 55B active
Context length	Up to 1M tokens
Input / Output	Text in, text out
Precision	NVFP4
Inference speed	5x faster on long-running agent workflows
Cost	Up to 30% lower on complex agent tasks
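
As a rough back-of-the-envelope reading of the precision row, the sketch below estimates the storage needed for the weights alone. It assumes 4 bits per weight plus one 8-bit scale shared by each block of 16 weights, which is our understanding of NVFP4 block scaling and should be treated as an assumption. It ignores per-tensor scales, any layers kept in higher precision, activations, and the KV cache, so real deployments will differ.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float,
                        block_size: int = 0, scale_bits: int = 0) -> float:
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    effective_bits = bits_per_weight
    if block_size:
        # Amortize the per-block scale factor over the weights in the block.
        effective_bits += scale_bits / block_size
    return num_params * effective_bits / 8 / 1e9


TOTAL_PARAMS = 550e9  # 550B total parameters, from the table above

# Assumed NVFP4 layout: 4-bit values, one 8-bit scale per 16 weights.
nvfp4_gb = weight_footprint_gb(TOTAL_PARAMS, 4, block_size=16, scale_bits=8)
# 16-bit baseline for comparison.
bf16_gb = weight_footprint_gb(TOTAL_PARAMS, 16)

print(f"NVFP4 weights (approx.): {nvfp4_gb:.1f} GB")
print(f"16-bit weights (approx.): {bf16_gb:.1f} GB")
```

Under these assumptions the 4-bit format needs a bit more than a quarter of the memory of a 16-bit copy, which is one reason a lower-precision format can be cheaper to serve.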

Why agentic AI needs purpose-built models

Agents don't respond just once. They plan, call tools, delegate work to sub-agents, check the results, and continue for hundreds of turns. Every step adds tokens and compute, so the metrics that matter are task completion at useful accuracy, time to completion, and cost per task.
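
To make the "every step adds tokens" point concrete, here is a minimal sketch of how the number of tokens a model has to read grows over a multi-turn agent run. It assumes the full history is re-read on every turn (no prefix caching) and that each turn adds a fixed number of tokens; both are simplifications, and the numbers are hypothetical.

```python
def total_tokens_processed(turns: int, new_tokens_per_turn: int,
                           system_tokens: int = 0) -> int:
    """Sum of context sizes read across all turns.

    Assumes the whole conversation so far is re-sent each turn
    (no prefix caching), so the total grows roughly quadratically.
    """
    total = 0
    context = system_tokens
    for _ in range(turns):
        context += new_tokens_per_turn  # message or tool output added this turn
        total += context                # the model reads the whole context again
    return total


# Hypothetical workload: 2,000 new tokens per turn.
for turns in (10, 100):
    tokens = total_tokens_processed(turns, new_tokens_per_turn=2000)
    print(f"{turns:>3} turns -> {tokens:,} tokens read in total")
```

In this toy model, ten times as many turns means roughly a hundred times as many tokens read, which is why per-token efficiency dominates cost per task for long-running agents.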

Nemotron 3 Ultra addresses this directly. Its MoE architecture activates only 55B of its 550B parameters per forward pass, which keeps throughput high even at a context length of one million tokens. This means agents can sustain planning, tool-use, and verification loops that span hundreds of turns while helping to maintain coherence and control costs.
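
The arithmetic behind sparse activation can be sketched as follows. The parameter counts come from this post; the "2 FLOPs per active parameter per token" figure is a common rule of thumb, not a measured number for this model, and it ignores attention, Mamba state updates, and routing overhead.

```python
TOTAL_PARAMS = 550e9   # total parameters
ACTIVE_PARAMS = 55e9   # parameters used per forward pass

# Fraction of the model that participates in each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS


def approx_forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


sparse_flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
# Hypothetical dense model with the same total parameter count.
dense_flops = approx_forward_flops_per_token(TOTAL_PARAMS)

print(f"Active fraction per token: {active_fraction:.0%}")
print(f"Approx. matmul compute vs. same-size dense model: "
      f"{sparse_flops / dense_flops:.0%}")
```

This estimate covers compute only. All 550B parameters still have to be resident in accelerator memory, which is why the instance types listed later are large multi-GPU instances.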

Business use cases

Nemotron 3 Ultra excels at workloads that require multi-step reasoning:

Agent orchestrators – coordinate multiple sub-agents and manage state across long tool-calling chains
Coding agents – generate, test, debug, and iterate on code across large codebases
Deep research – synthesize information from multiple sources and maintain coherent reasoning over extended context
Complex business workflows – automate multi-step business processes that involve decision-making and error detection

Getting started with SageMaker JumpStart

You can deploy Nemotron 3 Ultra through Amazon SageMaker JumpStart with one click, without having to manage infrastructure or configure deployment parameters.

Prerequisites

Before you begin, make sure you have:

An AWS account
Properly configured permissions for SageMaker JumpStart
Sufficient service quota for GPU instances (for example, ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
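
One way to check the quota prerequisite programmatically is through the Service Quotas API. The sketch below is a hedged example: it assumes SageMaker endpoint quotas are named "<instance type> for endpoint usage", which you should confirm in the Service Quotas console, and the sample records in the demo are made up for illustration.

```python
from typing import Iterable

SUPPORTED_INSTANCE_TYPES = ("ml.p5en.48xlarge", "ml.p5.48xlarge", "ml.g7e.48xlarge")


def endpoint_quotas(quotas: Iterable[dict],
                    instance_types=SUPPORTED_INSTANCE_TYPES) -> dict:
    """Map each instance type to its endpoint-usage quota (0.0 if not found)."""
    result = {itype: 0.0 for itype in instance_types}
    for quota in quotas:
        name = quota.get("QuotaName", "")
        for itype in instance_types:
            # Assumed naming pattern; verify against your account.
            if name == f"{itype} for endpoint usage":
                result[itype] = float(quota.get("Value", 0.0))
    return result


def fetch_sagemaker_quotas(region_name=None) -> list:
    """Fetch all SageMaker quotas (requires AWS credentials and boto3)."""
    import boto3  # imported lazily so endpoint_quotas() has no AWS dependency
    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


# Demo with made-up records; in an AWS environment, replace `sample`
# with fetch_sagemaker_quotas().
sample = [
    {"QuotaName": "ml.p5.48xlarge for endpoint usage", "Value": 1.0},
    {"QuotaName": "ml.p5.48xlarge for training job usage", "Value": 4.0},
]
for itype, value in endpoint_quotas(sample).items():
    print(f"{itype}: {value:g} instance(s) allowed for endpoints")
```

A quota of 0 for every supported instance type means the deployment will fail until a quota increase is granted.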

Important: Deploying this model creates a SageMaker endpoint that incurs costs for as long as it is running. GPU instances such as ml.p5en.48xlarge carry a substantial hourly cost. See Amazon SageMaker AI pricing for details. Remember to delete your endpoint when you're done to avoid ongoing charges.

Deploy using SageMaker Studio

Open Amazon SageMaker Studio
In the left navigation pane, select SageMaker JumpStart
Search for Nemotron 3 Ultra
Select the model card
Choose Deploy
Choose your instance type (supported instance types are ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
Update the deployment settings (the defaults are sufficient for most use cases)
Choose Deploy to create the endpoint
Wait for the endpoint status to show InService before proceeding
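
If you prefer to wait for InService from a script rather than watching the console, the last step above can be sketched as a polling loop around the SageMaker `describe_endpoint` API. The function name, the polling interval, and the endpoint name in the usage comment are illustrative choices, not part of the original walkthrough.

```python
import time


def wait_until_in_service(sm_client, endpoint_name: str,
                          poll_seconds: float = 30.0,
                          timeout_seconds: float = 3600.0) -> str:
    """Poll describe_endpoint until the endpoint is InService.

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError
    if it does not become InService within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sm_client.describe_endpoint(
            EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage (requires AWS credentials; the endpoint name is hypothetical):
#   import boto3
#   wait_until_in_service(boto3.client("sagemaker"), "my-nemotron-endpoint")
```

boto3 also ships a built-in `endpoint_in_service` waiter on the SageMaker client, which may be simpler if you don't need custom timeout handling.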

Deploy using the SageMaker Python SDK

import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

# Reference the JumpStart model by ID (verify the ID on the SageMaker JumpStart model card)
model = JumpStartModel(
    model_id="huggingface-reasoning-nvidia-nemotron-3-ultra-550b-a55b-nvfp4",
    role=sagemaker.get_execution_role(),  # Resolves your SageMaker execution role ARN
)

# Create the endpoint; accept_eula=True confirms that you accept the model's license terms
predictor = model.deploy(accept_eula=True)
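
If you want to pin the instance type and endpoint name instead of relying on the defaults, a small helper like the one below can build the arguments for `model.deploy`. The helper name and the endpoint name are hypothetical. The list of instance types is the one given in this post, and `instance_type` and `endpoint_name` are standard `deploy` arguments in the SageMaker Python SDK.

```python
SUPPORTED_INSTANCE_TYPES = ("ml.p5en.48xlarge", "ml.p5.48xlarge", "ml.g7e.48xlarge")


def deploy_kwargs(instance_type: str, endpoint_name: str) -> dict:
    """Build keyword arguments for model.deploy(), rejecting unsupported instances."""
    if instance_type not in SUPPORTED_INSTANCE_TYPES:
        raise ValueError(
            f"{instance_type} is not one of {', '.join(SUPPORTED_INSTANCE_TYPES)}")
    return {
        "instance_type": instance_type,
        "endpoint_name": endpoint_name,
        "accept_eula": True,  # confirms acceptance of the model's license terms
    }


print(deploy_kwargs("ml.p5en.48xlarge", "nemotron-3-ultra-demo"))

# Usage with the `model` object created above (requires AWS credentials):
#   predictor = model.deploy(**deploy_kwargs("ml.p5en.48xlarge", "nemotron-3-ultra-demo"))
```

Naming the endpoint explicitly makes it easier to find and delete later.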

Run inference

payload = {
    "messages": [{
        "role": "user",
        "content": "Break this task into subtasks, identify which tools are needed, and run them in sequence."
    }],
    "max_tokens": 20480,
    "temperature": 0.6,
    "top_p": 0.95,
}
response = predictor.predict(payload)
print(response["choices"][0]["message"]["content"])
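
Applications that don't hold the `predictor` object (for example, a service calling an endpoint that was deployed from Studio) can call the endpoint through the SageMaker runtime API instead. This is a minimal sketch: it assumes the endpoint accepts and returns JSON in the same shape as the example above, and the function and endpoint names are illustrative.

```python
import json


def invoke_chat(runtime_client, endpoint_name: str, payload: dict) -> str:
    """Send a chat payload to a SageMaker endpoint and return the reply text.

    Assumes a JSON response shaped like the example above:
    {"choices": [{"message": {"content": ...}}]}
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    body = json.loads(response["Body"].read())
    return body["choices"][0]["message"]["content"]


# Usage (requires AWS credentials; the endpoint name is hypothetical):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   text = invoke_chat(runtime, "my-nemotron-endpoint", {
#       "messages": [{"role": "user", "content": "Summarize the plan so far."}],
#       "max_tokens": 1024,
#   })
```

Passing the client in as an argument keeps the function easy to test with a stub and lets callers control region and retry configuration.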

Clean up

To avoid incurring unnecessary costs, delete the SageMaker endpoint when you're done:

predictor.delete_endpoint()
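
If you deployed from SageMaker Studio, or no longer have the `predictor` object in scope, the same cleanup can be sketched with the boto3 SageMaker client. The function name and the endpoint name in the usage comment are hypothetical, and the model resource itself is not removed by this sketch.

```python
def delete_endpoint_and_config(sm_client, endpoint_name: str) -> str:
    """Delete an endpoint and its endpoint configuration.

    Returns the name of the endpoint configuration that was deleted.
    """
    # Look up the configuration before the endpoint disappears.
    config_name = sm_client.describe_endpoint(
        EndpointName=endpoint_name)["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name


# Usage (requires AWS credentials; the endpoint name is hypothetical):
#   import boto3
#   delete_endpoint_and_config(boto3.client("sagemaker"), "my-nemotron-endpoint")
```

Afterwards, confirm in the SageMaker console that no endpoints remain in the InService state.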

Conclusion

NVIDIA Nemotron 3 Ultra brings frontier-level reasoning to Amazon SageMaker JumpStart with 5x faster inference and up to 30% lower cost for agentic workloads. The hybrid Transformer-Mamba MoE architecture and one-million-token context window make it purpose-built for the sustained, multi-step reasoning that production agents demand.

Whether you're building agent orchestrators, coding agents, deep research systems, or complex business automation, Nemotron 3 Ultra is ready to use today from SageMaker JumpStart.

Get started now by searching for Nemotron 3 Ultra on Amazon SageMaker JumpStart.

About the authors

Dan Ferguson is a Solutions Architect at AWS, based in New York, USA. As a machine learning services specialist, Dan works to support customers on their journey to integrate ML workflows effectively, efficiently, and sustainably.

Malav Shastri is a Software Development Engineer at AWS, where he works on the Amazon SageMaker JumpStart and Amazon Bedrock teams. His role focuses on enabling customers to take advantage of modern open source and proprietary models. Malav holds a Master's degree in Computer Science.

Vivek Gangasani is the Worldwide Lead for Solutions Architecture, SageMaker Inference. He leads solutions architecture, technical go-to-market (GTM), and outbound product strategy for SageMaker Inference. He also helps enterprises and startups deploy and optimize generative AI models and build AI workflows with SageMaker and GPUs. Currently, he is focused on developing strategies and content for improving inference performance and for use cases such as agentic workflows and RAG. In his free time, Vivek likes to walk, watch movies, and try different foods.