24-Person Team Launches 17K Tokens/Sec Chip

⚛️Read original on 量子位

💡17K tokens/sec per user at a reported 1/10 the power and ~1/20 the build cost of Nvidia hardware: a potential inference revolution for AI developers.

⚡ 30-Second TL;DR

What Changed

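A 24-person team led by former AMD executives launched the Taalas HC1, a chip reported to reach 17,000 tokens/sec per user on Llama 3.1 8B.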

Why It Matters

This low-cost, high-speed chip could challenge Nvidia's dominance in AI hardware, enabling cheaper large-scale LLM deployments for startups and enterprises.

What To Do Next

If your workload runs Llama 3.1 8B, the model the HC1 is hardwired for, benchmark the chip against your current Nvidia GPUs (the cited comparison uses the H200) to assess cost savings.
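One way to run such a comparison is to time the per-user token stream, since the headline figure is a per-user decode rate. The sketch below is a minimal, hardware-agnostic harness and not a vendor tool: `measure_tokens_per_sec` is a hypothetical helper name, and `stream` stands for whatever token iterator your serving stack exposes.

```python
import time
from typing import Callable, Iterable


def measure_tokens_per_sec(
    stream: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the decode rate (tokens/sec) of a single user's token stream.

    Timing starts when the first token arrives, so prompt processing and
    time-to-first-token are excluded; only the decode rate is measured.
    """
    count = 0
    start = None
    end = None
    for _ in stream:
        now = clock()  # timestamp each token as it arrives
        if start is None:
            start = now
        end = now
        count += 1
    if count < 2 or end == start:
        raise ValueError("need at least two tokens with distinct timestamps")
    # count - 1 intervals elapsed between the first and the last token
    return (count - 1) / (end - start)
```

Running the same prompt set through this function on each device gives directly comparable per-user rates. The `clock` parameter is injectable so the harness itself can be checked with a fake clock.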

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 6 cited sources.

🔑 Enhanced Key Takeaways

  • Taalas, a 24-person team led by former AMD executives, unveiled the HC1 chip, which achieves 17,000 tokens per second per user on Llama 3.1 8B inference[1][2][3]
  • HC1 delivers ~73x higher throughput than Nvidia H200 and multiples above Cerebras (~2,000 tokens/sec) and Groq (~600 tokens/sec) on the same model[1][2][3]
  • Chip draws about 1/10th the power of Nvidia equivalents, costs ~20x less to build, and ships in an air-cooled PCIe form factor[1][2][6]
  • Taalas raised $169 million in funding to develop model-specific AI chips challenging Nvidia dominance[1][5]
  • HC1 hardwires the entire model including weights onto the chip using mask ROM recall fabric, eliminating HBM and memory-compute bottlenecks[2][5]
📊 Competitor Analysis

| Feature | Taalas HC1 | Nvidia H200 | Cerebras | Groq |
|---|---|---|---|---|
| Tokens/sec (Llama 3.1 8B, per user) | 17,000 [1][2][3] | ~230 (17,000 / 73) [1] | ~2,000 [2] | ~600 [2] |
| Power consumption | 1/10th of Nvidia [1][5] | Baseline [1] | Not specified [2] | Not specified [2] |
| Cost to build | 20x less than SOTA [6] | Baseline [6] | Not specified | Not specified |
| Form factor | PCIe card, ~250W air-cooled [2] | GPU with HBM [5] | Not specified | Not specified |
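The ratios behind this comparison follow from simple division of the reported figures. The short sketch below reproduces them; the H200 rate is not a measurement but is derived from the reported 73x speedup, and the variable names are illustrative only.

```python
# Per-user decode rates on Llama 3.1 8B, as reported in the comparison above.
HC1_TOKENS_PER_SEC = 17_000
REPORTED_RATES = {"Cerebras": 2_000, "Groq": 600}
H200_SPEEDUP = 73  # reported ratio of HC1 to H200 throughput

# The H200 figure is implied by the ratio, not measured: 17,000 / 73.
h200_tokens_per_sec = HC1_TOKENS_PER_SEC / H200_SPEEDUP

speedups = {name: HC1_TOKENS_PER_SEC / rate for name, rate in REPORTED_RATES.items()}
speedups["Nvidia H200"] = float(H200_SPEEDUP)

for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"HC1 vs {name}: ~{ratio:.1f}x")
print(f"Implied H200 rate: ~{h200_tokens_per_sec:.0f} tokens/sec")
```

On these numbers the HC1 is about 8.5x faster than Cerebras and about 28x faster than Groq per user, which is what "multiples above" amounts to.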

🛠️ Technical Deep Dive

  • Process/Fab: TSMC N6 (6nm)[2]
  • Die size: 815 mm²[2]
  • Power: ~250W per card; 10-card server ~2.5kW, air-cooled[2]
  • Architecture: Hardwires entire model (weights via mask ROM recall fabric), SRAM for KV cache and fine-tuned weights; single transistor per 4-bit module for matrix multiplications[2][5]
  • Memory: Eliminates HBM by merging storage and computation, no high-speed I/O or advanced packaging needed[2][5]
  • Form factor: PCIe card optimized for Llama 3.1 8B[2][5]
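The power and throughput figures above allow a rough energy-per-token estimate. The sketch below assumes, as a simplification the sources do not confirm, that a card draws its full ~250W while serving a single 17,000 tokens/sec stream; aggregate multi-user throughput is not given, so real energy per token could be lower. Function and constant names are illustrative.

```python
CARD_POWER_W = 250.0        # ~250W per card, as listed above
TOKENS_PER_SEC = 17_000.0   # reported per-user decode rate on Llama 3.1 8B
CARDS_PER_SERVER = 10


def joules_per_token(power_w: float, tokens_per_sec: float) -> float:
    """Energy per generated token: watts are joules per second."""
    return power_w / tokens_per_sec


def kwh_per_million_tokens(power_w: float, tokens_per_sec: float) -> float:
    """Energy to generate one million tokens (1 kWh = 3.6e6 J)."""
    return joules_per_token(power_w, tokens_per_sec) * 1_000_000 / 3_600_000


server_power_kw = CARD_POWER_W * CARDS_PER_SERVER / 1000  # 2.5 kW, matching the spec

print(f"Energy per token: {joules_per_token(CARD_POWER_W, TOKENS_PER_SEC) * 1000:.1f} mJ")
print(f"Energy per 1M tokens: {kwh_per_million_tokens(CARD_POWER_W, TOKENS_PER_SEC):.4f} kWh")
print(f"10-card server power: {server_power_kw} kW")
```

Under that single-stream assumption the estimate is roughly 14.7 mJ per token.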

🔮 Future Implications

AI analysis grounded in cited sources

The cited sources suggest that HC1-class hardware could enable interactive frontier models with agentic behavior, cutting task times from hours to minutes at lower cost. The extra speed also opens new use cases such as real-time reasoning with larger token budgets, where accuracy improves through multiple sampling or longer reasoning traces[2]. Model-specific chips challenge Nvidia by gaining efficiency through specialization, potentially accelerating ubiquitous AI deployment[5][6].
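To make the "hours to minutes" claim concrete, the sketch below converts a token budget into decode time at two rates. The 1,000,000-token budget is a hypothetical figure chosen for illustration, the H200 rate is the implied 17,000 / 73, and the estimate ignores prompt processing, tool calls, and network latency.

```python
def decode_seconds(total_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock time to generate total_tokens sequentially at a fixed rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return total_tokens / tokens_per_sec


TRACE_TOKENS = 1_000_000  # hypothetical token budget for one agentic task

RATES = {
    "HC1 (reported)": 17_000.0,
    "H200 (implied, 17,000 / 73)": 17_000.0 / 73,
}

for name, rate in RATES.items():
    minutes = decode_seconds(TRACE_TOKENS, rate) / 60
    print(f"{name}: {minutes:.1f} min")
```

At these rates the same budget takes about 72 minutes on the implied H200 figure and about one minute on the HC1.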

Timeline

2026-02
Taalas unveils HC1 chip with 17K tokens/sec on Llama 3.1 8B and raises $169M funding

AI-curated news aggregator. All content rights belong to original publishers.
Original source: 量子位