Keller's Anti-GPU AI Chip Revealed

💡 Jim Keller's RISC-V-based AI chip takes on GPUs at the inner-loop level and is pitched as a developer-friendly alternative.
⚡ 30-Second TL;DR
What Changed
A grid of Tensix cores linked by a network-on-chip (NoC); each core has 5 RISC-V processors, 1.5 MB of SRAM, and vector and matrix units.
Why It Matters
Challenges GPU monopoly with efficient, cost-effective AI hardware scalable to racks. Appeals to devs seeking alternatives to NVIDIA for custom accelerators.
What To Do Next
Download tt-Metalium SDK and prototype a 32x32 tiled matrix multiply on Tensix.
Who should care: Developers & AI Engineers
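Before touching hardware, the 32x32 tiled matrix multiply suggested above can be tried as a plain-Python reference. This is a minimal sketch of the tiling idea only: it does not use tt-Metalium, and the names `tiled_matmul`, `get_tile`, and `matmul_tile` are hypothetical helpers invented for illustration.

```python
TILE = 32  # tile edge length, taken from the 32x32 suggestion above


def get_tile(mat, row0, col0, tile):
    """Copy one tile x tile block out of a matrix stored as a list of rows."""
    return [row[col0:col0 + tile] for row in mat[row0:row0 + tile]]


def matmul_tile(a_tile, b_tile, acc):
    """Accumulate a_tile @ b_tile into acc (all square, same size)."""
    n = len(acc)
    for r in range(n):
        a_row, acc_row = a_tile[r], acc[r]
        for k in range(n):
            a_rk, b_row = a_row[k], b_tile[k]
            for c in range(n):
                acc_row[c] += a_rk * b_row[c]


def tiled_matmul(a, b, tile=TILE):
    """Multiply a (M x K) by b (K x N), one tile x tile block at a time."""
    m, k, n = len(a), len(b), len(b[0])
    if len(a[0]) != k:
        raise ValueError("inner dimensions do not match")
    if m % tile or k % tile or n % tile:
        raise ValueError("all dimensions must be multiples of the tile size")
    out = [[0.0] * n for _ in range(m)]
    for i in range(0, m, tile):          # output tile row
        for j in range(0, n, tile):      # output tile column
            acc = [[0.0] * tile for _ in range(tile)]
            for p in range(0, k, tile):  # walk the shared K dimension
                matmul_tile(get_tile(a, i, p, tile),
                            get_tile(b, p, j, tile), acc)
            for r in range(tile):
                out[i + r][j:j + tile] = acc[r]
    return out
```

Each pass of the innermost `p` loop touches only three tiles (one from each input plus the accumulator), which is the property that makes the pattern a natural fit for a core with a small local memory.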
🧠 Deep Insight
AI-generated analysis for this event.
🔑 Enhanced Key Takeaways
- Tenstorrent's architecture uses a dataflow execution model. Unlike the instruction-driven SIMT model of GPUs, it lets data move directly from core to core instead of making a round trip through shared memory at every step.
- The Wormhole chip uses a packet-based interconnect carried over Ethernet links, intended to scale across clusters and to reduce latency in large-scale distributed training.
- Tenstorrent has expanded its business model to license its RISC-V and Tensix core IP to third-party silicon vendors, aiming to build an ecosystem beyond its own hardware.
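The dataflow idea in the first takeaway can be illustrated on an ordinary CPU. The sketch below is an analogy, not Tenstorrent's API: three threads stand in for a reader, a compute stage, and a writer, and small bounded queues stand in for on-chip buffers. All names (`run_pipeline`, `reader`, `compute`, `writer`) are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def reader(tiles, out_q):
    """Feed tiles downstream; put() blocks while the buffer is full."""
    for t in tiles:
        out_q.put(t)
    out_q.put(SENTINEL)


def compute(in_q, out_q, fn):
    """Apply fn to each tile as it arrives and pass the result on."""
    while True:
        t = in_q.get()
        if t is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put(fn(t))


def writer(in_q, results):
    """Collect finished tiles in arrival order."""
    while True:
        t = in_q.get()
        if t is SENTINEL:
            return
        results.append(t)


def run_pipeline(tiles, fn, depth=2):
    """Run reader -> compute -> writer with buffers of `depth` slots (depth >= 1)."""
    q1, q2 = queue.Queue(maxsize=depth), queue.Queue(maxsize=depth)
    results = []
    stages = [
        threading.Thread(target=reader, args=(tiles, q1)),
        threading.Thread(target=compute, args=(q1, q2, fn)),
        threading.Thread(target=writer, args=(q2, results)),
    ]
    for s in stages:
        s.start()
    for s in stages:
        s.join()
    return results
```

Calling `run_pipeline([1, 2, 3, 4], lambda x: x * x)` passes each item straight from one stage to the next through a two-slot buffer; no stage ever sees the full dataset, which is the contrast with a load-compute-store round trip that the takeaway draws.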
📊 Competitor Analysis
| Feature | Tenstorrent Wormhole | NVIDIA Blackwell (B200) | Groq LPU |
|---|---|---|---|
| Architecture | Dataflow / RISC-V | SIMT (Blackwell) | Tensor Streaming Processor |
| Interconnect | Ethernet-based (Scale-out) | NVLink (Scale-up) | Proprietary Low-Latency |
| Memory Model | Explicit SRAM (Managed) | HBM3e (Cache-based) | SRAM-centric (Deterministic) |
| Primary Focus | Efficiency/Flexibility | Raw Throughput/Ecosystem | Inference Latency |
🛠️ Technical Deep Dive
- Tensix Core: Comprises five small RISC-V cores (two that drive data movement and three that drive compute) paired with a matrix math engine and a vector engine.
- Memory Hierarchy: Employs a distributed, software-managed SRAM architecture rather than traditional hardware-managed L1/L2 caches to ensure deterministic execution timing.
- Dataflow Engine: Designed to execute graphs of operations where data movement is scheduled at compile-time, minimizing the overhead of instruction fetching and decoding.
- tt-Metalium SDK: Provides a low-level abstraction layer that allows developers to manage data movement and compute scheduling directly on the Tensix cores, bypassing standard driver overhead.
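To make the software-managed SRAM and compile-time scheduling points concrete, here is a rough back-of-the-envelope sketch. It assumes the 1.5 MB per-core figure from the summary means binary megabytes and that tiles are 32x32; `tiles_that_fit` and `static_schedule` are hypothetical names, and the double-buffered schedule is a generic technique, not a description of Tenstorrent's compiler.

```python
SRAM_BYTES = int(1.5 * 1024 * 1024)  # 1.5 MB per core (binary MB is an assumption)
TILE_ELEMS = 32 * 32                 # elements in one 32x32 tile


def tiles_that_fit(bytes_per_elem, sram_bytes=SRAM_BYTES, reserved_bytes=0):
    """Upper bound on resident tiles after setting aside reserved_bytes."""
    return (sram_bytes - reserved_bytes) // (TILE_ELEMS * bytes_per_elem)


def static_schedule(k_tiles):
    """Fixed step list for one output tile using two input buffer slots.

    Each step is (op, k_index, slot). The load of step k+1 is issued
    before the compute of step k so the two could overlap on hardware.
    """
    if k_tiles < 1:
        raise ValueError("need at least one inner tile")
    steps = [("load", 0, 0)]
    for p in range(k_tiles):
        if p + 1 < k_tiles:
            steps.append(("load", p + 1, (p + 1) % 2))
        steps.append(("compute", p, p % 2))
    steps.append(("store", None, None))
    return steps
```

With 2-byte elements, `tiles_that_fit(2)` gives 768 tiles as a ceiling; a real program would reserve part of that space for code and buffers, which is what `reserved_bytes` is for. Because `static_schedule` depends only on the problem shape, the whole step list is known before any data arrives.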
🔮 Future Implications
AI analysis grounded in cited sources
Tenstorrent will prioritize IP licensing revenue over direct hardware sales by 2027.
The company's strategic pivot toward licensing RISC-V and Tensix IP suggests a long-term goal of becoming a foundational silicon provider rather than a pure-play hardware vendor.
Wormhole-based clusters will achieve lower TCO than GPU clusters for specific inference workloads.
The use of standard Ethernet for scaling and the elimination of expensive, proprietary interconnects like NVLink significantly reduces infrastructure capital expenditure.
⏳ Timeline
2016-05
Tenstorrent founded by Ljubisa Bajic, Ivan Hamer, and Milos Trajkovic.
2021-01
Jim Keller joins Tenstorrent as President and CTO.
2023-08
Tenstorrent secures $100 million in strategic funding led by Hyundai and Samsung Catalyst Fund.
2024-02
Tenstorrent announces the availability of its Wormhole development kits for enterprise partners.
Original source: 虎嗅 (Huxiu)
