ByteDance shifts to domestic chips for AI workloads

🔑 Enhanced Key Takeaways

• ByteDance is reportedly developing its own custom AI CPUs, codenamed 'SeedChip', primarily for inference tasks, and is evaluating both Arm and RISC-V designs. The initiative relies on external partners for design and manufacturing, targets between 100,000 and 350,000 units in 2026, and includes production discussions with Samsung and TSMC.
• The company plans to raise its AI infrastructure spending in China to over 200 billion yuan (approximately US$30 billion) for 2026, a 25% increase from its late 2025 projections, with a larger share allocated to domestic Chinese AI chips.
• ByteDance is in active discussions to procure at least 50,000 AI chips from Shanghai-based Iluvatar CoreX in 2026, primarily for inference tasks supporting its AI chatbot, Doubao. This would make Iluvatar CoreX ByteDance's third major domestic GPU supplier, alongside Huawei and Cambricon. The company is also considering Baidu's Kunlunxin chips.
• The pivot to domestic chips is largely driven by the rapid growth of ByteDance's AI applications, particularly Doubao, which now processes 50 trillion tokens daily, a twelve-fold increase from late 2024. Inference at this scale requires a reliable, high-volume chip supply, which domestic suppliers are increasingly able to provide.
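
The scale figures in these takeaways can be unpacked with some quick arithmetic. The sketch below is illustrative only: the inputs are the numbers quoted above, the function names are hypothetical, and the results are averages and implied values (for example, the 200 billion yuan figure is treated as a floor, and real traffic is unlikely to be flat across the day).

```python
# Back-of-the-envelope arithmetic on the figures quoted in the takeaways.
# All inputs come from the article; everything derived is an implied value.

DAILY_TOKENS = 50e12       # Doubao tokens processed per day
GROWTH_FACTOR = 12         # "twelve-fold increase from late 2024"
CAPEX_2026_YUAN = 200e9    # "over 200 billion yuan", treated here as a floor
CAPEX_INCREASE = 0.25      # "25% increase from its late 2025 projections"
SECONDS_PER_DAY = 24 * 60 * 60


def average_tokens_per_second(daily_tokens: float) -> float:
    """Average token rate, assuming load is spread evenly over the day."""
    return daily_tokens / SECONDS_PER_DAY


def implied_baseline(current: float, growth_factor: float) -> float:
    """Value before an N-fold increase."""
    return current / growth_factor


def implied_prior_projection(new_budget: float, increase: float) -> float:
    """Earlier projection, given the new budget and the fractional increase."""
    return new_budget / (1 + increase)


if __name__ == "__main__":
    rate = average_tokens_per_second(DAILY_TOKENS)
    baseline = implied_baseline(DAILY_TOKENS, GROWTH_FACTOR)
    prior = implied_prior_projection(CAPEX_2026_YUAN, CAPEX_INCREASE)
    print(f"Average rate: {rate / 1e6:.0f} million tokens/s")
    print(f"Implied late-2024 volume: {baseline / 1e12:.1f} trillion tokens/day")
    print(f"Implied late-2025 projection: {prior / 1e9:.0f} billion yuan")
```

On these inputs, the average works out to roughly 579 million tokens per second, the implied late-2024 volume to about 4.2 trillion tokens per day, and the implied earlier spending projection to about 160 billion yuan.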

📊 Competitor Analysis

Feature/Chip	Huawei Ascend (e.g., 910B, 950PR)	Biren BR100	Moore Threads MTT S4000	Nvidia (Reference: A100/H20)
Primary Use Case	Training & Inference (910B), Inference (950PR)	Training & Inference	Training & Inference	Training & Inference
Process Node	(Generally Chinese domestic)	TSMC 7nm	12nm	(Varies, e.g., TSMC 7nm for A100)
FP16/BF16 Performance	910B: Close to A100; 950PR: 1.56 PFLOPS (FP4)	Up to 2000 TFLOPS (FP16)	100 TFLOPS (FP16/BF16)	A100: 624 TFLOPS (FP16 Tensor Core, with sparsity; 312 TFLOPS dense)
INT8 Performance	(Not specified for 910B/950PR)	2048 TOPS	200 TOPS	A100: 1248 TOPS (INT8 Tensor Core, with sparsity; 624 TOPS dense)
Memory	950PR: 11GB HBM	64GB HBM2e	48GB GDDR6	A100: 40GB HBM2 / 80GB HBM2e
Memory Bandwidth	950PR: 1.4 TB/s	2.3 TB/s	768 GB/s	A100: 1.55 TB/s (40GB), about 2.0 TB/s (80GB)
Software Ecosystem	MindSpore AI framework, CANN	(Proprietary, but aims for broad compatibility)	MUSA architecture, CUDA/PyTorch compatible	CUDA
Relative Performance Claim	910B close to A100; 950PR 2.87x over Nvidia H20	Comparable to A100; 2.6x speedup over A100 in specific benchmarks	Behind Ampere/Ada Lovelace (Nvidia)	Market leader, H200 still lacks Chinese import approval

🛠️ Technical Deep Dive

ByteDance's Custom AI CPUs ('SeedChip'):
- Designed primarily for AI inference tasks, inspired by Groq's "language processing units."
- Evaluating both Arm and RISC-V instruction set architectures for design.
- Partnering with Chinese startup InnoStar Semiconductor for memory technology, potentially reducing reliance on HBM chips.
- Aims for initial production of at least 100,000 units in 2026, with potential ramp-up to 350,000 units.
Huawei Ascend Series (e.g., 910B, 950PR):
- Ascend 910B is considered comparable to Nvidia's A100 from 2020.
- Ascend 950PR is a new-generation AI accelerator chip optimized for prefill inference and recommendation workloads.
- Huawei claims 2.87x the compute performance of Nvidia's H20 for the 950PR, which also supports FP4 low-precision inference.
- Features approximately 11GB of HBM and 1.4TB/s memory bandwidth, with a TDP of 600W.
- Huawei's MindSpore AI framework and CANN software stack serve as alternatives to Nvidia's CUDA.
- The Ascend 950 aims for single-chip parity with Nvidia's Hopper, and the 960 targets Blackwell-level performance.
- Huawei's CloudMatrix 384 system, utilizing Ascend 910C (combining two 910B-class processors), can achieve system-level performance that outperforms Nvidia's GB200 NVL72 in some metrics, despite higher power consumption.
Biren BR100 GPU:
- Built on TSMC's 7nm process technology, utilizing a chiplet design.
- Integrates 77 billion transistors.
- Delivers up to 2000 TFLOPS in FP16 tensor performance and 2048 TOPS in INT8, positioning it comparably to Nvidia's A100.
- Features 64GB of HBM2e memory with a bandwidth of 2.3 TB/s.
- Demonstrated 2.6x speedups over the A100 in specific domestic benchmarks for natural language processing (NLP) and computer vision.
Moore Threads MTT S4000 GPU:
- Features the third-generation MUSA (Moore Threads Unified System Architecture).
- Equipped with 128 Tensor Cores and 48GB of GDDR6 memory, providing 768 GB/s memory bandwidth.
- Offers 25 TFLOPS of FP32, 100 TFLOPS of FP16/BF16, and 200 TOPS of INT8 performance.
- Supports PCIe Gen5 x16 and MTLink 1.0 for multi-GPU interconnectivity, enabling clusters with thousands of cards.
- Its training platform is compatible with CUDA and PyTorch, supporting distributed training frameworks like Megatron-LM and DeepSpeed.
- Built on a 12nm process, based on the Chunxiao graphics processor, with a TDP of 450W.
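
Peak compute and memory bandwidth only become comparable once they are related to each other. One common way to do that is the roofline model, sketched below using the FP16 and bandwidth figures quoted above for the BR100 and MTT S4000. This is a rough illustration, not a benchmark: the inputs are vendor peak numbers as reported here, the `ChipSpec` class and the 2 FLOPs-per-byte workload are hypothetical, and the Ascend 950PR is left out because its quoted figure is FP4 rather than FP16.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    name: str
    peak_tflops: float     # peak FP16 tensor throughput as quoted, in TFLOPS
    bandwidth_tb_s: float  # memory bandwidth as quoted, in TB/s

    def ridge_point(self) -> float:
        """FLOPs per byte a workload needs before peak compute is the limit."""
        return self.peak_tflops / self.bandwidth_tb_s

    def attainable_tflops(self, flops_per_byte: float) -> float:
        """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
        return min(self.peak_tflops, self.bandwidth_tb_s * flops_per_byte)


# Figures as listed in the text above (vendor claims, not measurements).
SPECS = [
    ChipSpec("Biren BR100", peak_tflops=2000.0, bandwidth_tb_s=2.3),
    ChipSpec("Moore Threads MTT S4000", peak_tflops=100.0, bandwidth_tb_s=0.768),
]

if __name__ == "__main__":
    # Hypothetical low-intensity workload, e.g. small-batch token generation.
    intensity = 2.0
    for chip in SPECS:
        print(
            f"{chip.name}: ridge point {chip.ridge_point():.0f} FLOPs/byte, "
            f"bound at {intensity:.0f} FLOPs/byte: "
            f"{chip.attainable_tflops(intensity):.2f} TFLOPS"
        )
```

Under these assumptions, a low-intensity workload sits far below either chip's ridge point, so the bound is set by memory bandwidth rather than peak TFLOPS. That is one reason bandwidth and memory technology matter so much for inference-oriented parts.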

🔮 Future Implications (AI analysis grounded in cited sources)

China's domestic AI chip ecosystem will rapidly mature, especially for inference workloads.

ByteDance's substantial investment and procurement from multiple domestic suppliers will provide significant volume, engineering feedback, and software ecosystem development, accelerating the capabilities of Chinese chipmakers for inference tasks.

The US export controls will inadvertently strengthen China's long-term technological independence in AI.

By limiting access to advanced foreign chips, the controls compel Chinese tech giants to invest heavily in domestic alternatives, fostering a self-sufficient supply chain and reducing reliance on foreign technology.

ByteDance will achieve greater control over its AI performance and scalability.

Developing custom chips and diversifying domestic suppliers allows ByteDance to tailor hardware to its specific AI workloads and mitigate supply chain risks, ensuring consistent performance and expansion for its rapidly growing applications.

⏳ Timeline

2019

Biren Technology founded.

2020-10

Moore Threads founded.

2022-08

Biren Technology releases BR100 GPU.

2022-10

US implements sweeping export controls on advanced computing chips and manufacturing equipment to China.

2023-12

Moore Threads launches MTT S4000 AI accelerator.

2024

ByteDance begins designing its own 'SeedChip' AI accelerator with TSMC.

2025-09

Huawei announces Ascend 950PR chip as part of its three-year roadmap.

2026-01

Iluvatar CoreX listed on Hong Kong Stock Exchange.

2026-03

Huawei unveils Atlas 350 AI accelerator card powered by Ascend 950PR.

2026-05

ByteDance is reportedly developing its own custom AI CPUs and plans to spend over 200 billion yuan (approximately US$30 billion) on AI infrastructure in China for 2026.

2026-06

ByteDance in talks to purchase at least 50,000 AI chips from Iluvatar CoreX, and considering Baidu's Kunlunxin chips.
