Run NVIDIA Nemotron 3 Super on Amazon Bedrock

Nemotron 3 Super is now available as a fully managed, serverless model on Amazon Bedrock, joining the Nemotron Nano models already available on Amazon Bedrock.

With NVIDIA Nemotron open models on Amazon Bedrock, you can accelerate innovation and deliver tangible business value without managing infrastructure complexity. You can power your generative AI applications with Nemotron through Amazon Bedrock's fully managed API, taking advantage of its many features and tools.

This post examines the technical features of the Nemotron 3 Super model and discusses possible application scenarios. It also provides technical guidance to help you get started using this model for your generative AI applications on Amazon Bedrock.

About Nemotron 3 Super

Nemotron 3 Super is a hybrid Mixture of Experts (MoE) model that delivers advanced compute efficiency and accuracy for multi-agent applications and specialized AI systems. The model is released with open weights, datasets, and recipes so that developers can customize, improve, and deploy the model on their own infrastructure for greater privacy and security.

Model overview:

  • Architecture:
    • MoE with Hybrid Transformer-Mamba architecture.
    • It supports token budgeting, delivering improved accuracy while generating fewer tokens.
  • Accuracy and efficiency:
    • The highest throughput in its size category, up to 5x higher than the previous Nemotron Super model.
    • Leading reasoning and agentic accuracy among leading open models, up to 2x higher than the previous version.
    • High accuracy on leading benchmarks, including AIME 2025, Terminal-Bench, SWE-Bench Verified, multilingual benchmarks, and RULER.
    • Multi-environment RL training with NVIDIA NeMo gave the model leading accuracy across 10+ domains.
  • Model size: 120B total parameters, 12B active
  • Context length: up to 256K tokens
  • Model input: Text
  • Model output: Text
  • Languages: English, French, German, Italian, Japanese, Spanish and Chinese

Latent MoE

Nemotron 3 Super uses a latent MoE, where experts operate on a shared latent representation before the output is projected back into token space. This approach allows the model to activate 4x more experts for the same inference cost, enabling better specialization around subtle semantic structures, domain abstractions, and multi-hop reasoning patterns.
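The expert-routing idea can be sketched in miniature. The following toy Python is illustrative only and is not the actual Nemotron 3 architecture: the gating function, expert functions, and top-k mixing rule here are all made-up stand-ins that only show how a gate selects a few experts operating on a shared vector and mixes their outputs.

```python
import math

def gate_scores(latent, num_experts):
    # Hypothetical gating: score each expert with a fixed per-expert direction.
    return [sum(x * ((e + i) % 3 - 1) for i, x in enumerate(latent))
            for e in range(num_experts)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def latent_moe(latent, experts, top_k=2):
    # Route: keep only the top-k experts and renormalize their weights.
    weights = softmax(gate_scores(latent, len(experts)))
    ranked = sorted(range(len(experts)), key=lambda e: -weights[e])[:top_k]
    norm = sum(weights[e] for e in ranked)
    mixed = [0.0] * len(latent)
    for e in ranked:
        out = experts[e](latent)  # each expert works on the shared vector
        for i, v in enumerate(out):
            mixed[i] += (weights[e] / norm) * v
    return mixed

# Three toy "experts": each is just a function on the shared vector.
experts = [
    lambda v: [2 * x for x in v],   # expert 0: doubles
    lambda v: [x + 1 for x in v],   # expert 1: shifts
    lambda v: [-x for x in v],      # expert 2: negates
]
out = latent_moe([0.5, -0.2, 0.1], experts, top_k=2)
print(len(out))  # output keeps the dimensionality of the shared vector
```

Because routing happens in this shared space rather than per token, activating more experts does not require re-embedding the input for each one, which is the intuition behind the "4x more experts for the same cost" claim.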

Multi-Token Prediction (MTP)

MTP enables the model to predict several future tokens in a single pass, which significantly increases throughput for long reasoning sequences and structured outputs. Whether for planning, trajectory generation, extended chains of reasoning, or code generation, MTP reduces latency and improves agent responsiveness.
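The throughput benefit is easy to see with a toy decoding loop. The `stub_model` below is a hypothetical stand-in for a real model (it just emits placeholder tokens); the sketch only demonstrates the counting argument that emitting k tokens per forward pass cuts the number of decode steps roughly k-fold.

```python
def stub_model(prefix, k):
    # Hypothetical model call: deterministically returns the next k tokens.
    start = len(prefix)
    return [f"tok{start + j}" for j in range(k)]

def decode(n_tokens, tokens_per_pass):
    # Count how many forward passes are needed to produce n_tokens.
    prefix, passes = [], 0
    while len(prefix) < n_tokens:
        prefix += stub_model(prefix, tokens_per_pass)
        passes += 1
    return prefix[:n_tokens], passes

_, single = decode(64, 1)   # classic one-token-at-a-time decoding
_, multi = decode(64, 4)    # MTP-style: 4 tokens per pass
print(single, multi)        # prints: 64 16
```

Fewer passes over the model is what translates into lower latency for long reasoning chains and generated code.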

To learn more about Nemotron 3 Super's architecture and how it is trained, see Introducing Nemotron 3 Super: Open Hybrid Mamba Transformer MoE for Agentic Reasoning.

NVIDIA Nemotron 3 Super use cases

Nemotron 3 Super helps power a variety of use cases across industries, including:

  • Software development: Assist with tasks such as code summarization.
  • Finance: Speed up loan processing by extracting data, analyzing revenue patterns, and detecting fraudulent activity, which can help reduce cycle times and risk.
  • Cybersecurity: Diagnose problems, perform in-depth malware analysis, and proactively hunt for security threats.
  • Search: Interpret user intent to invoke the relevant agents.
  • Retail: Improve inventory management and deliver real-time in-store service, personalized product recommendations, and support.
  • Multi-agent workflows: Orchestrate task-specific agents (scheduling, tool use, authentication, and domain execution) to automate complex, end-to-end business processes.

Get started with NVIDIA Nemotron 3 Super on Amazon Bedrock

Complete the following steps to test NVIDIA Nemotron 3 Super on the Amazon Bedrock console:

  1. Navigate to the Amazon Bedrock console and choose Chat/Text playground in the left navigation pane (under Playgrounds).
  2. Choose Select a model in the upper-left corner of the playground.
  3. Choose NVIDIA in the category list, then choose NVIDIA Nemotron 3 Super.
  4. Choose Apply to load the model.

After completing the previous steps, you can test the model immediately. To really show Nemotron 3 Super's power, we will go beyond simple prompts and pose a complex engineering challenge. Advanced reasoning models excel at system-level thinking, where they must balance architecture, concurrency, and distributed state management.

Let's use the following prompt to design a globally distributed service:

"Design a distributed rate-limiting service in Python that must support 100,000 requests per second across multiple geographic regions.

1. Provide a high-level architectural strategy (e.g., Token Bucket vs. Fixed Window) and justify your choice for a global scale.
2. Write a thread-safe implementation using Redis as the backing store.
3. Address the 'race condition' problem when multiple instances update the same counter.
4. Include a pytest suite that simulates network latency between the app and Redis."

This prompt requires the model to act as a senior distributed-systems engineer: reasoning about trade-offs, generating thread-safe code, anticipating failure modes, and validating everything through practical tests, all in one coherent answer.
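To make the race-condition part of the prompt concrete, here is a minimal single-process token-bucket sketch, a simplified local analogue of what the prompt asks the model to produce. It is not the distributed solution itself: in the Redis-backed design, the refill-and-consume logic inside `allow` would typically be moved into an atomic Redis Lua script so that multiple instances cannot race on the same counter; the `threading.Lock` here plays that atomicity role locally.

```python
import threading
import time

class TokenBucket:
    """Minimal thread-safe token bucket (single-process sketch)."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self, cost=1.0):
        # The lock makes refill + consume atomic; in a distributed
        # version this critical section would be a Redis Lua script.
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.updated
            self.updated = now
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_per_sec)
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

bucket = TokenBucket(capacity=5, refill_per_sec=1)
results = [bucket.allow() for _ in range(10)]
print(results.count(True))  # prints: 5 (burst of 5 passes, the rest are limited)
```

The token bucket is a common choice for global scale because it tolerates short bursts while enforcing a steady average rate, which is one of the trade-offs the prompt asks the model to justify.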

Using the AWS CLI and SDKs

You can access the model programmatically using the model ID nvidia.nemotron-super-3-120b. The model supports both the InvokeModel and Converse APIs through the AWS Command Line Interface (AWS CLI) and the AWS SDKs. In addition, it supports the Amazon Bedrock OpenAI-compatible ChatCompletions API.

Run the following command to invoke the model directly from your terminal using the AWS CLI and the InvokeModel API:

aws bedrock-runtime invoke-model \
  --model-id nvidia.nemotron-super-3-120b \
  --region us-west-2 \
  --body '{"messages": [{"role": "user", "content": "Type_Your_Prompt_Here"}], "max_tokens": 512, "temperature": 0.5, "top_p": 0.9}' \
  --cli-binary-format raw-in-base64-out \
  invoke-model-output.txt

If you want to invoke the model with the AWS SDK for Python (Boto3), use the following script to send a prompt to the model, this time using the Converse API:

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID.
model_id = "nvidia.nemotron-super-3-120b"

# Start a conversation with the user message.
user_message = "Type_Your_Prompt_Here"
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

To invoke the model with the Amazon Bedrock OpenAI-compatible ChatCompletions endpoint, you can use the OpenAI SDK as follows:

# Import the OpenAI SDK
import os
from openai import OpenAI

# Set environment variables
os.environ["OPENAI_API_KEY"] = ""
os.environ["OPENAI_BASE_URL"] = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

# Create the client
client = OpenAI()

# Set the model ID
model_id = "nvidia.nemotron-super-3-120b"

# Set prompts
system_prompt = "Type_Your_System_Prompt_Here"
user_message = "Type_Your_User_Prompt_Here"

# Use the ChatCompletions API
response = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user",   "content": user_message}
    ],
    temperature=0,
    max_completion_tokens=1000
)

# Extract and print the response text
print(response.choices[0].message.content)

Conclusion

In this post, we showed you how to get started with NVIDIA Nemotron 3 Super on Amazon Bedrock to build the next generation of AI applications. By combining the model's advanced hybrid Transformer-Mamba architecture and latent MoE with Amazon Bedrock's fully managed, serverless infrastructure, organizations can now deploy capable, efficient reasoning applications at scale without the heavy lifting of backend management. Ready to see what this model can do for your specific workflow?

  • Try it now: Head over to the Amazon Bedrock console to try out NVIDIA Nemotron 3 Super in the playground.
  • Build: Check out the AWS SDKs to integrate Nemotron 3 Super into your existing generative AI pipelines.

About the authors

Aris Tsakpinis

Aris Tsakpinis is a Senior Generative AI Solutions Architect specializing in open-weight models on Amazon Bedrock and the broader open source generative AI space. Alongside his professional role, he is pursuing a PhD in Machine Learning Engineering at the University of Regensburg, where his research focuses on the use of generative AI in scientific fields.

Abdullahi Olaoye

Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open source tools to improve AI model deployment, inference, and generative AI workflows. He partners with cloud providers to help improve AI workload performance and drive adoption of NVIDIA's AI and generative AI solutions.
