Nemotron 3 Super Launches on Together AI

Post LinkedIn

🤝Read original on Together AI Blog

#multi-agent #long-context #managed-inferencenvidia-nemotron-3-super

💡NVIDIA's 1M-context LLM now on Together AI for easy multi-agent deployment.

⚡ 30-Second TL;DR

What Changed

Nemotron 3 Super now accessible via Together AI Dedicated Inference

Why It Matters

This launch simplifies access to advanced NVIDIA LLMs for developers, reducing infra overhead. It boosts multi-agent AI apps with long-context handling, potentially speeding up production workflows.

What To Do Next

Who should care:Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 5 cited sources.

🔑 Enhanced Key Takeaways

•Nemotron 3 Super features approximately 100B total parameters with 10B active per token, positioning it between the smaller Nano (30B/3B) and larger Ultra (500B/50B) variants.
•It employs a hybrid Mamba-Transformer MoE architecture combined with Latent MoE for enhanced expert specialization and 4x more experts at the same inference cost.
•Trained using NVIDIA's 4-bit NVFP4 precision on Blackwell architecture, enabling reduced memory usage and faster training without accuracy loss.
•Outperforms models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507 on benchmarks, with the Nano variant showing 3.3x higher throughput than Qwen3-30B-A3B.

🛠️ Technical Deep Dive

•Hybrid Mamba-Transformer MoE architecture: Integrates Mamba layers for efficient long-range sequence modeling, Transformer layers for precise reasoning, and MoE routing for scalable compute.
•Latent MoE: Experts process shared latent representations before token projection, supporting 4x more experts for better specialization in multi-hop reasoning.
•Multi-Token Prediction (MTP) layers: Improve long-form text generation efficiency and model quality.
•NVFP4 4-bit floating-point format: Used for pretraining on 25T token dataset, optimizing cost-accuracy for training and inference on Blackwell GPUs.
•Positioned for multi-agent workloads like IT ticket automation, with up to 4x higher token throughput and reduced reasoning tokens compared to Nemotron 2.

🔮 Future ImplicationsAI analysis grounded in cited sources

Nemotron 3 Super will accelerate adoption in enterprise multi-agent systems by early 2026.

Early adopters including Accenture, CrowdStrike, Oracle, Palantir, Perplexity, ServiceNow, Siemens, and Zoom are integrating the Nemotron 3 family into manufacturing, cybersecurity, and automation workflows.

Latent MoE and NVFP4 will become standard for efficient large-scale MoE training.

These innovations enable larger models like Super and Ultra with lower memory and compute needs while maintaining accuracy, as demonstrated on Blackwell hardware.

⏳ Timeline

2025-12

NVIDIA releases Nemotron 3 Nano, the first model in the family with 30B parameters and 1M-token context.

2026-01

NVIDIA announces full Nemotron 3 family including Super and Ultra, with technical reports and open tools like NeMo Gym.

2026-03

Nemotron 3 Super launches on Together AI's Dedicated Inference platform for production deployment.

📎 Sources (5)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.

🤝Read original article on Together AI Blog

📰

Weekly AI Recap

Read this week's curated digest of top AI events →

👉Related Updates

Same topic

Explore #multi-agent

Same product