Suanmiao 3D TokenPU chip officially enters tape-out phase

💡 A new Chinese domestic high-performance AI chip enters tape-out, with potential implications for cloud computing infrastructure.
⚡ 30-Second TL;DR
What Changed
Suanmiao's 3D TokenPU chip has officially entered the tape-out stage.
Why It Matters
The tape-out of this chip suggests a potential shift in the domestic AI supply chain, offering more alternatives for high-compute cloud infrastructure.
What To Do Next
Monitor the performance benchmarks of 3D TokenPU once engineering samples become available for cloud integration testing.
🧠 Deep Insight
🔑 Enhanced Key Takeaways
- Suanmiao Technology has secured nearly RMB 1 billion in Pre-A and Pre-A1 funding rounds, earmarked for the research, development, and mass production of its 3D AI inference chips.
- The company was established in November 2022 by founder Wang Fuquan, a former researcher at the Chinese Academy of Sciences and a key contributor to the Loongson (Godson) CPU project.
- Pre-silicon simulation data for Suanmiao's A4 chip, which utilizes the 3D TokenPU architecture, indicates an inference throughput 1.26x to 2.19x higher than Nvidia H200 on Llama and Mixtral models.
- Suanmiao Technology's core objective is to overcome the 'memory wall' bottleneck in AI computing through a combination of innovative architectural design and the cultivation of a domestic 3D IC supply chain.
- The leadership team includes CTO Liu Ming, who brings over six years of 3D IC experience from his time at Loongson, and Chief Scientist Lou Jianguang, who joined in September 2025 and is a former Microsoft Research principal who collaborated with OpenAI on Excel NLP features.
🛠️ Technical Deep Dive
- The chip employs a "3D TokenPU architecture" and is referred to as the "A4" chip.
- Its design specifically targets large model inference, aiming to mitigate the "memory wall" bottleneck.
- The underlying concept of a "Token Processor" involves a hardware-native intelligent accelerator with a minimal software layer, designed for a "Token in → Token out" process with a fixed compute fabric and reconfigurable model parameters/topology.
- One technical path for Token Processors, "Token Streaming" (the Groq route), uses a single-core architecture with hundreds of megabytes of on-chip SRAM, so that model weights reside directly on the chip and DRAM access is no longer the bottleneck.
- This approach also involves a compiler that statically orchestrates data paths to achieve deterministic execution, reducing scheduling jitter common in GPUs.
- The 3D aspect of the architecture focuses on vertical integration to shorten data travel distances and enhance bandwidth between computing units and memory.
- The chip is part of an initiative to produce "100% domestically produced 3D AI inference chips" and leverage a "domestic 3D IC supply chain."
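The "memory wall" mentioned above can be made concrete with a rough, roofline-style estimate. The sketch below assumes a dense model at batch size 1 where every weight is read once per generated token, and it ignores KV-cache traffic and compute limits. The function names and every number are hypothetical illustrations, not Suanmiao or NVIDIA specifications.

```python
# Illustrative estimate of bandwidth-bound decode speed.
# All figures below are assumptions chosen for round arithmetic,
# not published specifications of any real chip.

def decode_steps_per_second(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode steps per second when each step must
    stream every model weight from memory exactly once."""
    weight_bytes = param_count * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


def bandwidth_speedup(baseline_bw, improved_bw):
    """In the bandwidth-bound regime, throughput scales linearly with bandwidth."""
    return improved_bw / baseline_bw


if __name__ == "__main__":
    PARAMS = 70e9            # hypothetical 70B-parameter dense model
    BYTES_PER_PARAM = 2      # assumes 16-bit weights
    OFF_CHIP_BW = 1.4e12     # hypothetical off-chip memory bandwidth, bytes/s
    STACKED_BW = 2.8e12      # hypothetical bandwidth with 3D-stacked memory, bytes/s

    for label, bw in [("off-chip", OFF_CHIP_BW), ("3D-stacked", STACKED_BW)]:
        rate = decode_steps_per_second(PARAMS, BYTES_PER_PARAM, bw)
        print(f"{label}: at most {rate:.1f} tokens/s per sequence")

    print(f"speedup: {bandwidth_speedup(OFF_CHIP_BW, STACKED_BW):.2f}x")
```

Under these assumptions, token rate is capped by memory bandwidth rather than arithmetic throughput, which is why designs that keep weights in on-chip SRAM or stack memory vertically target bandwidth first.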
Original source: 量子位 ↗
