Keller's Anti-GPU AI Chip Revealed

🐯Read original on 虎嗅
#ai-chip #noc #explicit-sram #tenstorrent-wormhole

💡Jim Keller's RISC-V AI chip beats GPU inner loops – dev-friendly alternative.

⚡ 30-Second TL;DR

What Changed

Tensix cores are arranged in a grid and linked by a network-on-chip (NoC). Each core has 5 RISC-V processors, 1.5 MB of SRAM, and vector and matrix units.

Why It Matters

Challenges GPU monopoly with efficient, cost-effective AI hardware scalable to racks. Appeals to devs seeking alternatives to NVIDIA for custom accelerators.

What To Do Next

Download tt-Metalium SDK and prototype a 32x32 tiled matrix multiply on Tensix.
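
As a warm-up before touching the SDK, the tiling idea itself can be tried on the host. The sketch below is plain Python, not tt-Metalium code: the function names `matmul_tiled` and `matmul_naive` are made up for illustration, and only the 32x32 tile size comes from the text. It walks the output one tile at a time and accumulates over tiles of the inner dimension, which is the loop structure a tiled kernel would follow.

```python
TILE = 32  # tile edge taken from the text; everything else here is illustrative


def matmul_tiled(a, b, tile=TILE):
    """Multiply a (M x K) by b (K x N), given as lists of lists, one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    assert m % tile == 0 and k % tile == 0 and n % tile == 0, "dims must be tile multiples"
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # tile row of C
        for j0 in range(0, n, tile):          # tile column of C
            for k0 in range(0, k, tile):      # accumulate over tiles of K
                for i in range(i0, i0 + tile):
                    row_a, row_c = a[i], c[i]
                    for kk in range(k0, k0 + tile):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, j0 + tile):
                            row_c[j] += a_ik * row_b[j]
    return c


def matmul_naive(a, b):
    """Reference result used only to check the tiled version."""
    return [[sum(a[i][x] * b[x][j] for x in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

On real hardware each (i0, j0) output tile would typically be assigned to a core and the k0 loop would stream input tiles through local SRAM; that mapping is left out here on purpose.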

Who should care: Developers & AI Engineers

🧠 Deep Insight

AI-generated analysis for this event.

🔑 Enhanced Key Takeaways

  • Tenstorrent's architecture utilizes a 'dataflow' execution model, which fundamentally differs from the instruction-driven SIMT model of GPUs by allowing data to flow directly between cores without constant memory round-trips.
  • The Wormhole chip incorporates a proprietary 'Packetized' interconnect protocol that allows for seamless scaling across heterogeneous clusters, specifically targeting the reduction of latency in large-scale distributed training.
  • Tenstorrent has shifted its business model to include IP licensing of its RISC-V and Tensix core technology to third-party silicon vendors, aiming to create a broader ecosystem beyond its own hardware.
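
The dataflow idea in the first takeaway can be illustrated with a toy model. This is a minimal host-side sketch in plain Python, not Tenstorrent's API: `Stage` and `run_pipeline` are hypothetical names, each `Stage` stands in for one core, and the bounded `inbox` stands in for a small buffer in that core's local memory. Tiles move from one stage straight into the next stage's inbox, and a stage stalls when its downstream neighbour is full instead of spilling intermediates to a shared memory.

```python
from collections import deque


class Stage:
    """One hypothetical core: takes a tile from its inbox, applies fn, forwards the result."""

    def __init__(self, fn, capacity=2):
        self.fn = fn
        self.inbox = deque()
        self.capacity = capacity   # bounded buffer, an assumption of this sketch
        self.downstream = None

    def can_accept(self):
        return len(self.inbox) < self.capacity

    def step(self, sink):
        """Process at most one tile; return True if progress was made."""
        if not self.inbox:
            return False
        if self.downstream is not None and not self.downstream.can_accept():
            return False           # back-pressure: wait for the next core to drain
        out = self.fn(self.inbox.popleft())
        if self.downstream is None:
            sink.append(out)       # last stage writes the final result
        else:
            self.downstream.inbox.append(out)
        return True


def run_pipeline(stages, tiles):
    """Stream tiles through the chained stages and return the outputs in order."""
    for up, down in zip(stages, stages[1:]):
        up.downstream = down
    pending, sink = deque(tiles), []
    while True:
        progressed = False
        for stage in reversed(stages):   # drain downstream first to free slots
            progressed |= stage.step(sink)
        if pending and stages[0].can_accept():
            stages[0].inbox.append(pending.popleft())
            progressed = True
        if not progressed:
            return sink


# Usage: a two-stage pipeline that doubles each element, then adds one.
stages = [Stage(lambda t: [2 * x for x in t]), Stage(lambda t: [x + 1 for x in t])]
result = run_pipeline(stages, [[1, 2], [3, 4], [5, 6]])
```

The point of the model is the hand-off: the intermediate tile produced by the first stage is never stored anywhere except the second stage's inbox.
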
📊 Competitor Analysis

| Feature | Tenstorrent Wormhole | NVIDIA Blackwell (B200) | Groq LPU |
| --- | --- | --- | --- |
| Architecture | Dataflow / RISC-V | SIMT / Hopper-Blackwell | Tensor Streaming Processor |
| Interconnect | Ethernet-based (Scale-out) | NVLink (Scale-up) | Proprietary Low-Latency |
| Memory Model | Explicit SRAM (Managed) | HBM3e (Cache-based) | SRAM-centric (Deterministic) |
| Primary Focus | Efficiency/Flexibility | Raw Throughput/Ecosystem | Inference Latency |

🛠️ Technical Deep Dive

  • Tensix Core: Comprises five small RISC-V cores (in Tenstorrent's public documentation, two handle data movement and three drive the compute pipeline) paired with a high-performance matrix math engine and a vector engine.
  • Memory Hierarchy: Employs a distributed, software-managed SRAM architecture rather than traditional hardware-managed L1/L2 caches to ensure deterministic execution timing.
  • Dataflow Engine: Designed to execute graphs of operations where data movement is scheduled at compile-time, minimizing the overhead of instruction fetching and decoding.
  • tt-Metalium SDK: Provides a low-level abstraction layer that allows developers to manage data movement and compute scheduling directly on the Tensix cores, bypassing standard driver overhead.
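
Two of the points above lend themselves to a small worked example: how far a software-managed SRAM goes, and what it means to fix data movement before execution. The sketch below is plain Python under stated assumptions, not tt-Metalium code. The 1.5 MB figure and the 32x32 tile come from the text; reading "MB" as binary megabytes, the 2-byte and 4-byte element sizes, and the helper names `tiles_that_fit` and `build_schedule` are assumptions made for illustration.

```python
SRAM_BYTES = int(1.5 * 1024 * 1024)  # 1.5 MB per core from the text; binary MB is assumed


def tile_bytes(edge=32, bytes_per_elem=2):
    """Size of one square tile; 2 bytes per element models a 16-bit format (assumption)."""
    return edge * edge * bytes_per_elem


def tiles_that_fit(sram_bytes=SRAM_BYTES, edge=32, bytes_per_elem=2):
    """Upper bound on resident tiles, ignoring code, stack, and other reserved space."""
    return sram_bytes // tile_bytes(edge, bytes_per_elem)


def build_schedule(num_tiles, num_buffers=2):
    """Plan every load and compute ahead of time, alternating between SRAM buffers.

    The load of tile t+1 is listed before the compute on tile t so that, on
    hardware, the two could overlap; here the plan is just an ordered list.
    """
    assert num_buffers >= 2, "double buffering needs at least two buffers"
    schedule = []
    if num_tiles == 0:
        return schedule
    schedule.append(("load", 0, 0))
    for t in range(num_tiles):
        if t + 1 < num_tiles:
            schedule.append(("load", t + 1, (t + 1) % num_buffers))
        schedule.append(("compute", t, t % num_buffers))
    return schedule


# Usage: capacity for 16-bit and 32-bit tiles, and a plan for three tiles.
fit_16bit = tiles_that_fit(bytes_per_elem=2)   # 1572864 // 2048
fit_32bit = tiles_that_fit(bytes_per_elem=4)   # 1572864 // 4096
plan = build_schedule(3)
```

Because the plan is a fixed list, an executor only has to walk it in order. That is the sense in which the timing of a software-managed memory can be reasoned about in advance, in contrast to a cache whose hits and misses depend on runtime state.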

🔮 Future Implications

AI analysis grounded in cited sources

Tenstorrent will prioritize IP licensing revenue over direct hardware sales by 2027.
The company's strategic pivot toward licensing RISC-V and Tensix IP suggests a long-term goal of becoming a foundational silicon provider rather than a pure-play hardware vendor.
Wormhole-based clusters will achieve lower TCO than GPU clusters for specific inference workloads.
The use of standard Ethernet for scaling and the elimination of expensive, proprietary interconnects like NVLink significantly reduces infrastructure capital expenditure.

Timeline

2016-05
Tenstorrent founded by Ljubisa Bajic, Ivan Hamer, and Milos Trajkovic.
2021-01
Jim Keller joins Tenstorrent as President and CTO.
2023-08
Tenstorrent secures $100 million in strategic funding led by Hyundai and Samsung Catalyst Fund.
2024-02
Tenstorrent announces the availability of its Wormhole development kits for enterprise partners.
Original source: 虎嗅