๐ŸคStalecollected in 16h

Nemotron 3 Super Launches on Together AI

Nemotron 3 Super Launches on Together AI
PostLinkedIn
๐ŸคRead original on Together AI Blog

๐Ÿ’กNVIDIA's 1M-context LLM now on Together AI for easy multi-agent deployment.

โšก 30-Second TL;DR

What Changed

Nemotron 3 Super now accessible via Together AI Dedicated Inference

Why It Matters

This launch simplifies access to advanced NVIDIA LLMs for developers, reducing infra overhead. It boosts multi-agent AI apps with long-context handling, potentially speeding up production workflows.

What To Do Next

Sign up on Together AI and deploy Nemotron 3 Super for multi-agent inference testing.

Who should care:Developers & AI Engineers

๐Ÿง  Deep Insight

Web-grounded analysis with 5 cited sources.

๐Ÿ”‘ Enhanced Key Takeaways

  • โ€ขNemotron 3 Super features approximately 100B total parameters with 10B active per token, positioning it between the smaller Nano (30B/3B) and larger Ultra (500B/50B) variants.
  • โ€ขIt employs a hybrid Mamba-Transformer MoE architecture combined with Latent MoE for enhanced expert specialization and 4x more experts at the same inference cost.
  • โ€ขTrained using NVIDIA's 4-bit NVFP4 precision on Blackwell architecture, enabling reduced memory usage and faster training without accuracy loss.
  • โ€ขOutperforms models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507 on benchmarks, with the Nano variant showing 3.3x higher throughput than Qwen3-30B-A3B.

๐Ÿ› ๏ธ Technical Deep Dive

  • โ€ขHybrid Mamba-Transformer MoE architecture: Integrates Mamba layers for efficient long-range sequence modeling, Transformer layers for precise reasoning, and MoE routing for scalable compute.
  • โ€ขLatent MoE: Experts process shared latent representations before token projection, supporting 4x more experts for better specialization in multi-hop reasoning.
  • โ€ขMulti-Token Prediction (MTP) layers: Improve long-form text generation efficiency and model quality.
  • โ€ขNVFP4 4-bit floating-point format: Used for pretraining on 25T token dataset, optimizing cost-accuracy for training and inference on Blackwell GPUs.
  • โ€ขPositioned for multi-agent workloads like IT ticket automation, with up to 4x higher token throughput and reduced reasoning tokens compared to Nemotron 2.

๐Ÿ”ฎ Future ImplicationsAI analysis grounded in cited sources

Nemotron 3 Super will accelerate adoption in enterprise multi-agent systems by early 2026.
Early adopters including Accenture, CrowdStrike, Oracle, Palantir, Perplexity, ServiceNow, Siemens, and Zoom are integrating the Nemotron 3 family into manufacturing, cybersecurity, and automation workflows.
Latent MoE and NVFP4 will become standard for efficient large-scale MoE training.
These innovations enable larger models like Super and Ultra with lower memory and compute needs while maintaining accuracy, as demonstrated on Blackwell hardware.

โณ Timeline

2025-12
NVIDIA releases Nemotron 3 Nano, the first model in the family with 30B parameters and 1M-token context.
2026-01
NVIDIA announces full Nemotron 3 family including Super and Ultra, with technical reports and open tools like NeMo Gym.
2026-03
Nemotron 3 Super launches on Together AI's Dedicated Inference platform for production deployment.
๐Ÿ“ฐ

Weekly AI Recap

Read this week's curated digest of top AI events โ†’

๐Ÿ‘‰Related Updates

AI-curated news aggregator. All content rights belong to original publishers.
Original source: Together AI Blog โ†—