NVIDIA Blackwell Ultra delivers 50x higher token throughput per megawatt than Hopper in DeepSeek-R1 tests, cutting the cost per million tokens to 1/35th of Hopper's. NVIDIA also previewed the Rubin platform, which it says will bring a further 10x gain over Blackwell. Key enablers include 130 TB/s of NVLink bandwidth and the NVFP4 precision format.
Key Points
- 50x throughput per MW vs Hopper, measured with DeepSeek-R1
- Cost per million tokens reduced to 1/35th of Hopper
- 130 TB/s NVLink interconnects 72 GPUs
- 1.5x better long-context efficiency vs GB200
- Rubin platform teased with 10x gains over Blackwell
Impact Analysis
These efficiency gains lower AI inference costs, making large-scale deployments of coding agents and MoE models more practical. Enterprises planning Hopper-to-Blackwell migrations can target the reported 35x reduction in cost per million tokens.
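As a rough planning aid, the two headline ratios can be applied to an existing Hopper baseline. The sketch below is illustrative only: the function names and the baseline numbers are hypothetical, and the 50x and 35x factors are the vendor-reported figures from this summary, not independent measurements.

```python
# Ratios quoted in this summary (vendor-reported, DeepSeek-R1 tests).
THROUGHPUT_PER_MW_GAIN = 50   # Blackwell Ultra vs Hopper, tokens per MW
COST_REDUCTION_FACTOR = 35    # cost per million tokens falls to 1/35th


def projected_cost_per_million(hopper_cost_usd: float) -> float:
    """Scale a Hopper cost per million tokens by the quoted 1/35 ratio."""
    return hopper_cost_usd / COST_REDUCTION_FACTOR


def projected_tokens_per_second(hopper_tokens_per_s_per_mw: float,
                                power_budget_mw: float) -> float:
    """Estimate throughput at a fixed power budget using the quoted 50x ratio."""
    return hopper_tokens_per_s_per_mw * THROUGHPUT_PER_MW_GAIN * power_budget_mw


if __name__ == "__main__":
    # Hypothetical baseline values, chosen only to show the arithmetic.
    hopper_cost = 7.00            # USD per million tokens on Hopper
    hopper_rate_per_mw = 1000.0   # tokens per second per MW on Hopper
    print(f"Projected cost: ${projected_cost_per_million(hopper_cost):.2f} per million tokens")
    print(f"Projected rate: {projected_tokens_per_second(hopper_rate_per_mw, 2.0):,.0f} tokens/s at 2 MW")
```

Real savings depend on workload, batch size, and latency targets, so these projections are best treated as an upper bound.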
Technical Details
A 72-GPU NVLink cluster provides 130 TB/s of interconnect bandwidth, and the NVFP4 format boosts MoE inference. TensorRT-LLM optimizations have delivered a 5x gain in low-latency performance on GB200 within months.
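To put the interconnect figure in per-GPU terms, the 130 TB/s can be divided evenly across the 72 GPUs. This is a back-of-the-envelope sketch: it assumes 130 TB/s is an aggregate figure shared equally, which the summary does not state explicitly, and the function name is hypothetical.

```python
# Figures quoted in this summary.
AGGREGATE_NVLINK_TB_PER_S = 130.0
GPUS_IN_CLUSTER = 72


def per_gpu_share_tb_per_s(aggregate: float = AGGREGATE_NVLINK_TB_PER_S,
                           gpu_count: int = GPUS_IN_CLUSTER) -> float:
    """Even split of aggregate bandwidth across GPUs (an assumption, not a spec)."""
    return aggregate / gpu_count


if __name__ == "__main__":
    print(f"{per_gpu_share_tb_per_s():.2f} TB/s per GPU")  # prints "1.81 TB/s per GPU"
```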



