
Nvidia Custom Chip for OpenAI Inference

#inference #chip #gtc #nvidia-ai-processor

💡 Nvidia's Groq-integrated inference chip for OpenAI; GTC launch soon.

⚡ 30-Second TL;DR

What Changed

Nvidia is building a custom chip to serve OpenAI inference requests, based on LPU technology licensed from Groq.

Why It Matters

Improves inference efficiency and deepens the Nvidia-OpenAI partnership; may reshape competition in inference hardware.

What To Do Next

Register now for Nvidia GTC to demo the inference platform.

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 3 cited sources.

🔑 Enhanced Key Takeaways

  • Nvidia's agreement with Groq is valued at $20 billion and includes an acqui-hire of founder Jonathan Ross, formerly of Google's TPU team, along with other key engineers.[1][3]
  • The deal is a non-exclusive license to Groq's LPU inference technology, and Groq continues to operate independently under new CEO Simon Edwards.[2]
  • Groq's LPUs feature a 144-way VLIW tensor-streaming processor with only on-chip SRAM (230 MB per chip), optimized for low-latency single-user inference at batch size one (a capacity sketch follows this list).[1][3]
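
To make the 230 MB figure concrete, here is a minimal back-of-envelope sketch of how many SRAM-only chips it takes just to hold a model's weights. The per-chip capacity comes from the article; the bytes-per-parameter choices are our assumptions.

```python
import math

# 230 MB of on-chip SRAM per LPU, as cited above; no external DDR/HBM to spill to.
SRAM_PER_CHIP_GB = 0.23

def chips_for_weights(params_billions: float, bytes_per_param: float = 1.0) -> int:
    """Minimum chip count so all weights stay resident in on-chip SRAM.

    Ignores KV cache, activations, and scratch space, so real deployments
    need more chips than this floor suggests.
    """
    weights_gb = params_billions * bytes_per_param  # 1e9 params * B/param / 1e9 B/GB
    return math.ceil(weights_gb / SRAM_PER_CHIP_GB)

print(chips_for_weights(70))       # Llama 70B at 8-bit weights: ~305 chips minimum
print(chips_for_weights(70, 2.0))  # at FP16/BF16: ~609 chips
```

A floor of roughly 300 chips at 8-bit weights is consistent with the multi-rack Llama 70B deployment described in the deep dive below.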

🛠️ Technical Deep Dive

  • The Groq LPU uses a 144-way VLIW design fabricated at GlobalFoundries, in contrast to the 8-way VLIW systolic arrays in Google's TPUs; this allows inexpensive scaling but caps each chip at 230 MB of SRAM with no external DDR or HBM.[3]
  • Static scheduling and tensor streaming avoid external-memory bottlenecks by keeping all weights in ultra-fast on-chip SRAM, which excels at sequential, low-latency workloads such as real-time chatbots.[1]
  • Running Llama 70B takes 10 racks and over 100 kW because of the SRAM constraint; a second-generation chip on Samsung 4nm was planned to drive 2025 revenue but has not yet appeared in the market (a back-of-envelope latency comparison follows this list).[3]
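
A rough sketch of why an SRAM-only design wins at batch-size-one decode: each generated token must stream every (sharded) weight byte past the compute units, so memory bandwidth sets a hard floor on per-token latency. The bandwidth figures below are illustrative assumptions, not numbers from the article: roughly 80 TB/s of on-chip SRAM bandwidth per LPU (a vendor-claimed ballpark) versus about 3.35 TB/s of HBM3 on a single high-end GPU.

```python
WEIGHTS_GB = 70.0  # Llama 70B at 1 byte/param (8-bit), as in the sketch above

def floor_ms_per_token(weights_gb: float, bw_tb_per_s: float, shards: int = 1) -> float:
    """Memory-bandwidth floor on decode latency, weights split evenly across shards.

    Ignores compute, interconnect hops, and scheduling overhead, so this is
    a lower bound, not a predicted latency.
    """
    shard_gb = weights_gb / shards
    return shard_gb / (bw_tb_per_s * 1e3) * 1e3  # GB / (GB/s) -> seconds -> ms

print(floor_ms_per_token(WEIGHTS_GB, 3.35))              # one HBM3 GPU: ~20.9 ms/token
print(floor_ms_per_token(WEIGHTS_GB, 80.0, shards=305))  # 305-LPU cluster: ~0.003 ms/token
```

The LPU figure is so low precisely because it is only the memory floor; in practice, interconnect hops across a 10-rack deployment dominate. The comparison nonetheless shows why the architecture targets single-user latency rather than throughput per watt.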

🔮 Future Implications

AI analysis grounded in cited sources.

  • Nvidia will launch a dedicated inference accelerator card by Q3 2026. Podcast analysis details Nvidia's roadmap to release this hardware integrating Groq LPU tech post-deal.[1]
  • Integrating Groq's LPU into the Nvidia Blackwell architecture will enable hybrid GPU-LPU systems. The strategic agreement aims to blend high-throughput GPUs with low-latency LPUs into unified AI compute platforms (a toy routing sketch follows this list).[1]
  • Antitrust scrutiny of Nvidia's AI dominance will intensify. The $20B deal, absorbing Groq talent and technology, raises concerns as Nvidia consolidates inference leadership over rivals like AMD.[1]
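
The "hybrid GPU-LPU" prediction is easiest to picture as a routing decision at the serving layer. The sketch below is entirely hypothetical: the class and pool names are invented for illustration and describe no real Nvidia API, only the trade-off the prediction implies (GPUs amortize weight reads over large batches; LPUs minimize single-request latency).

```python
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    prompt: str
    interactive: bool  # e.g. a live chatbot turn vs. an offline batch job
    max_wait_ms: int   # batching delay the caller will tolerate

def pick_backend(req: InferenceRequest) -> str:
    """Toy routing policy for a hypothetical hybrid GPU-LPU fleet."""
    if req.interactive and req.max_wait_ms < 50:
        return "lpu-pool"  # SRAM-resident, statically scheduled, batch-size-one latency
    return "gpu-pool"      # HBM-backed, batched for throughput

print(pick_backend(InferenceRequest("hi", interactive=True, max_wait_ms=10)))           # lpu-pool
print(pick_backend(InferenceRequest("summarize these docs", False, max_wait_ms=5000)))  # gpu-pool
```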

Timeline

2019
Groq releases first LPU chip with 144-way VLIW architecture.
2024
Groq announces second-gen chip plans on Samsung 4nm for 2025 revenue.
2025-02
Groq secures $1.5B commitment from Saudi Arabia for LPU expansion.
2025-12
Nvidia and Groq announce $20B non-exclusive licensing deal and acqui-hire of key team.


AI-curated news aggregator. All content rights belong to original publishers.
Original source: 36氪