Nemotron 3 Super Launches on Together AI

๐กNVIDIA's 1M-context LLM now on Together AI for easy multi-agent deployment.
โก 30-Second TL;DR
What Changed
Nemotron 3 Super now accessible via Together AI Dedicated Inference
Why It Matters
This launch simplifies access to advanced NVIDIA LLMs for developers, reducing infra overhead. It boosts multi-agent AI apps with long-context handling, potentially speeding up production workflows.
What To Do Next
Sign up on Together AI and deploy Nemotron 3 Super for multi-agent inference testing.
๐ง Deep Insight
Web-grounded analysis with 5 cited sources.
๐ Enhanced Key Takeaways
- โขNemotron 3 Super features approximately 100B total parameters with 10B active per token, positioning it between the smaller Nano (30B/3B) and larger Ultra (500B/50B) variants.
- โขIt employs a hybrid Mamba-Transformer MoE architecture combined with Latent MoE for enhanced expert specialization and 4x more experts at the same inference cost.
- โขTrained using NVIDIA's 4-bit NVFP4 precision on Blackwell architecture, enabling reduced memory usage and faster training without accuracy loss.
- โขOutperforms models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507 on benchmarks, with the Nano variant showing 3.3x higher throughput than Qwen3-30B-A3B.
๐ ๏ธ Technical Deep Dive
- โขHybrid Mamba-Transformer MoE architecture: Integrates Mamba layers for efficient long-range sequence modeling, Transformer layers for precise reasoning, and MoE routing for scalable compute.
- โขLatent MoE: Experts process shared latent representations before token projection, supporting 4x more experts for better specialization in multi-hop reasoning.
- โขMulti-Token Prediction (MTP) layers: Improve long-form text generation efficiency and model quality.
- โขNVFP4 4-bit floating-point format: Used for pretraining on 25T token dataset, optimizing cost-accuracy for training and inference on Blackwell GPUs.
- โขPositioned for multi-agent workloads like IT ticket automation, with up to 4x higher token throughput and reduced reasoning tokens compared to Nemotron 2.
๐ฎ Future ImplicationsAI analysis grounded in cited sources
โณ Timeline
๐ Sources (5)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- datacamp.com โ Nvidia Nemotron 3
- research.nvidia.com โ Nemotron 3
- developer.nvidia.com โ Inside Nvidia Nemotron 3 Techniques Tools and Data That Make It Efficient and Accurate
- hyperframeresearch.com โ Nvidia Releases Nemotron 3 a New Family of Open Models
- deepinfra.com โ Nemotron 3 Nano Nvidia Efficient Small LLM
Weekly AI Recap
Read this week's curated digest of top AI events โ
๐Related Updates
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Together AI Blog โ