24-Person Team Launches 17K Tokens/Sec Chip

💡 17k tokens/sec per user at a fraction of Nvidia's power and build cost: a potential inference shift for AI developers.
⚡ 30-Second TL;DR
What Changed
A 24-person team led by former AMD executives launched a chip that runs LLM inference at 17,000 tokens per second per user.
Why It Matters
This low-cost, high-speed chip could challenge Nvidia's dominance in AI hardware and make large-scale LLM deployments cheaper for startups and enterprises.
What To Do Next
Benchmark this chip against Nvidia H100 for your LLM inference workloads to assess cost savings.
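One way to run such a comparison is to time the same generation workload on each backend and convert the result into cost per token. The Python sketch below assumes no vendor API: `generate` is a hypothetical callable you supply that takes a prompt and returns the number of tokens it produced, and `hourly_cost` is whatever hourly price you assign to the hardware.

```python
import time
from statistics import median
from typing import Callable


def tokens_per_second(generate: Callable[[str], int], prompt: str, runs: int = 5) -> float:
    """Median generation rate over `runs` calls.

    `generate` is a hypothetical wrapper around your backend; it must
    return the number of tokens produced for `prompt`.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)
    return median(rates)


def cost_per_million_tokens(hourly_cost: float, rate: float) -> float:
    """Cost of one million tokens for a single stream running at `rate` tokens/sec."""
    tokens_per_hour = rate * 3600
    return hourly_cost / tokens_per_hour * 1_000_000
```

Running `tokens_per_second` with the same prompt set against each backend, then feeding the results into `cost_per_million_tokens`, gives a like-for-like single-stream comparison. Batched throughput would need a separate measurement.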
🧠 Deep Insight
Web-grounded analysis with 6 cited sources.
🔑 Enhanced Key Takeaways
- Taalas, a 24-person team led by former AMD executives, unveiled the HC1 chip, which reaches 17,000 tokens per second per user on Llama 3.1 8B inference[1][2][3]
- HC1 delivers a ~73x higher per-user token rate than the Nvidia H200, and several times that of Cerebras (~2,000 tokens/sec) and Groq (~600 tokens/sec), on the same model[1][2][3]
- The chip draws about 1/10th the power of Nvidia equivalents and costs ~20x less to build, in an air-cooled PCIe form factor[1][2][6]
- Taalas raised $169 million in funding to develop model-specific AI chips that challenge Nvidia's dominance[1][5]
- HC1 hardwires the entire model, including its weights, onto the chip using a mask ROM recall fabric, eliminating HBM and the memory-compute bottleneck[2][5]
📊 Competitor Analysis
| Feature | Taalas HC1 | Nvidia H200 | Cerebras | Groq |
|---|---|---|---|---|
| Tokens/sec per user (Llama 3.1 8B) | 17,000 [1][2][3] | ~230 (derived: 17,000 / 73) [1] | ~2,000 [2] | ~600 [2] |
| Power Consumption | 1/10th of Nvidia [1][5] | Baseline [1] | Not specified [2] | Not specified [2] |
| Cost to Build | 20x less than SOTA [6] | Baseline [6] | Not specified | Not specified |
| Form Factor | PCIe card, ~250W air-cooled [2] | GPU with HBM [5] | Not specified | Not specified |
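The ratios in the table can be checked with a few lines of arithmetic. The Python sketch below uses only the per-user figures quoted above. The H200 value is derived from the ~73x claim rather than measured, and the function and variable names are illustrative.

```python
# Per-user token rates on Llama 3.1 8B, as quoted in the table above.
HC1_TOKENS_PER_SEC = 17_000
QUOTED_RATES = {"Cerebras": 2_000, "Groq": 600}
H200_SPEEDUP_CLAIM = 73  # the "~73x" claim; the H200 rate below is derived, not measured


def implied_baseline(rate: float, speedup: float) -> float:
    """Rate the slower system must have for `rate` to be `speedup` times faster."""
    return rate / speedup


def speedup_over(rate: float, other: float) -> float:
    """How many times faster `rate` is than `other`."""
    return rate / other


h200_rate = implied_baseline(HC1_TOKENS_PER_SEC, H200_SPEEDUP_CLAIM)
print(f"Implied H200 rate: ~{h200_rate:.0f} tokens/sec")
for name, rate in QUOTED_RATES.items():
    print(f"HC1 vs {name}: ~{speedup_over(HC1_TOKENS_PER_SEC, rate):.1f}x")
```

This gives about 233 tokens/sec for the H200 (the table rounds to ~230), 8.5x over the quoted Cerebras figure, and roughly 28x over the quoted Groq figure.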
🛠️ Technical Deep Dive
- Process/Fab: TSMC N6 (6nm)[2]
- Die size: 815 mm²[2]
- Power: ~250W per card; 10-card server ~2.5kW, air-cooled[2]
- Architecture: hardwires the entire model, with weights stored in a mask ROM recall fabric; SRAM holds the KV cache and fine-tuned weights; a single transistor per 4-bit module handles matrix multiplication[2][5]
- Memory: eliminates HBM by merging storage and computation, so no high-speed I/O or advanced packaging is needed[2][5]
- Form factor: PCIe card optimized for Llama 3.1 8B[2][5]
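The power figures above imply a rough energy budget per generated token. The Python sketch below works it through under one loud assumption: a card serves a single user stream at the quoted 17,000 tokens/sec while drawing its full ~250 W. Real deployments may serve several users per card or draw less power, so treat the result as an order-of-magnitude estimate. All names are illustrative.

```python
CARD_POWER_W = 250        # ~250 W per card, as quoted above
CARDS_PER_SERVER = 10     # 10-card server, as quoted above
TOKENS_PER_SEC = 17_000   # per-user rate on Llama 3.1 8B, as quoted above


def server_power_kw(card_power_w: float, cards: int) -> float:
    """Total card power for one server, in kilowatts."""
    return card_power_w * cards / 1000


def joules_per_token(power_w: float, tokens_per_sec: float) -> float:
    """Energy per token; watts are joules per second, so W / (tokens/s) = J/token."""
    return power_w / tokens_per_sec


server_kw = server_power_kw(CARD_POWER_W, CARDS_PER_SERVER)
# ASSUMPTION: one stream per card at full card power.
millijoules = joules_per_token(CARD_POWER_W, TOKENS_PER_SEC) * 1000

print(f"Server power: {server_kw} kW")
print(f"Energy per token: ~{millijoules:.1f} mJ")
```

The server total matches the ~2.5 kW quoted above, and the single-stream estimate comes to roughly 14.7 mJ per token.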
🔮 Future Implications
AI analysis grounded in cited sources
The HC1 approach could make frontier models with agentic behavior interactive, cutting task times from hours to minutes at lower cost. It could also open up use cases such as real-time reasoning, where a larger token budget buys higher accuracy through multiple sampling or longer reasoning traces[2]. Model-specific chips challenge Nvidia by gaining efficiency through specialization, which could accelerate ubiquitous AI deployment[5][6].
📎 Sources (6)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- [1] cxodigitalpulse.com — Taalas Raises $169 Million to Build Custom AI Chips Challenging Nvidia
- [2] kaitchup.substack.com — Taalas HC1 Absurdly Fast Per User
- [3] eetimes.com — Taalas Specializes to Extremes for Extraordinary Token Speed
- [4] youtube.com — Watch
- [5] siliconangle.com — Taalas Raises $169M in Funding to Develop Model-Specific AI Chips
- [6] taalas.com — The Path to Ubiquitous AI
AI-curated news aggregator. All content rights belong to original publishers.
Original source: 量子位 (QbitAI)
