
ByteDance shifts to domestic chips for AI workloads

Source: SCMP Technology

💡 ByteDance's move to domestic chips highlights a major shift in AI infrastructure strategy amid global supply constraints.

⚡ 30-Second TL;DR

What Changed

ByteDance is pivoting away from reliance on Nvidia due to export controls.

Why It Matters

This shift signals a broader trend of Chinese tech giants localizing their AI supply chains. It creates significant opportunities for emerging domestic chip designers to gain market share in the high-demand AI sector.

What To Do Next

Monitor the performance benchmarks of emerging Chinese AI accelerators to assess their viability for your own cross-region deployment strategies.

Who should care: Founders & Product Leaders

🧠 Deep Insight

Web-grounded analysis with 31 cited sources.

🔑 Enhanced Key Takeaways

  • ByteDance is reportedly developing its own custom AI CPUs, codenamed 'SeedChip', primarily for inference tasks, and is evaluating both Arm and RISC-V designs. The initiative relies on external partners for design and manufacturing, targets between 100,000 and 350,000 units in 2026, and includes production discussions with Samsung and TSMC.
  • The company plans to raise its AI infrastructure spending in China to over 200 billion yuan (approximately US$30 billion) for 2026, a 25% increase from its late 2025 projections, with a larger share allocated to domestic Chinese AI chips.
  • ByteDance is in active discussions to procure at least 50,000 AI chips from Shanghai-based Iluvatar CoreX in 2026, primarily for inference tasks supporting its AI chatbot, Doubao. The deal would make Iluvatar CoreX ByteDance's third major domestic GPU supplier, alongside Huawei and Cambricon. The company is also considering Baidu's Kunlunxin chips.
  • The pivot to domestic chips is largely driven by the rapid growth of ByteDance's AI applications, particularly Doubao, which now processes 50 trillion tokens daily, a twelve-fold increase from late 2024. Inference workloads at this scale require a reliable, high-volume chip supply that domestic suppliers are increasingly able to provide.
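
The spending and token figures in these takeaways imply a few numbers that the text does not state directly. The sketch below back-calculates them in Python. It treats the quoted values as exact, although the source says "over" 200 billion yuan and "approximately" US$30 billion, so the results are rough estimates. The function and constant names are illustrative and do not come from the source.

```python
# Figures quoted in the key takeaways (treated as exact for this estimate).
SPEND_2026_YUAN = 200e9   # "over 200 billion yuan"
SPEND_2026_USD = 30e9     # "approximately US$30 billion"
SPEND_INCREASE = 0.25     # "25% increase from its late 2025 projections"
TOKENS_PER_DAY = 50e12    # "50 trillion tokens daily"
TOKEN_GROWTH = 12         # "twelve-fold increase from late 2024"

SECONDS_PER_DAY = 24 * 60 * 60


def prior_projection(current: float, increase: float) -> float:
    """Projection that, raised by `increase`, yields `current`."""
    return current / (1.0 + increase)


def implied_exchange_rate(yuan: float, usd: float) -> float:
    """Yuan per US dollar implied by the two quoted spending figures."""
    return yuan / usd


def tokens_per_second(tokens_per_day: float) -> float:
    """Average token throughput, assuming load is spread evenly over a day."""
    return tokens_per_day / SECONDS_PER_DAY


def baseline_tokens_per_day(current: float, growth_factor: float) -> float:
    """Daily token volume before the quoted growth factor was applied."""
    return current / growth_factor


if __name__ == "__main__":
    print(f"Implied late-2025 projection: {prior_projection(SPEND_2026_YUAN, SPEND_INCREASE) / 1e9:.0f} billion yuan")
    print(f"Implied exchange rate: {implied_exchange_rate(SPEND_2026_YUAN, SPEND_2026_USD):.2f} yuan per USD")
    print(f"Average throughput: {tokens_per_second(TOKENS_PER_DAY) / 1e6:.0f} million tokens per second")
    print(f"Implied late-2024 volume: {baseline_tokens_per_day(TOKENS_PER_DAY, TOKEN_GROWTH) / 1e12:.2f} trillion tokens per day")
```

On these assumptions, the late 2025 projection works out to about 160 billion yuan. The throughput figure is a daily average, so real peak load would be higher.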
📊 Competitor Analysis

| Feature/Chip | Huawei Ascend (e.g., 910B, 950PR) | Biren BR100 | Moore Threads MTT S4000 | Nvidia (Reference: A100/H20) |
| --- | --- | --- | --- | --- |
| Primary Use Case | Training & Inference (910B), Inference (950PR) | Training & Inference | Training & Inference | Training & Inference |
| Process Node | (Generally Chinese domestic) | TSMC 7nm | 12nm | (Varies, e.g., TSMC 7nm for A100) |
| FP16/BF16 Performance | 910B: Close to A100; 950PR: 1.56 PFLOPS (FP4) | Up to 2000 TFLOPS (FP16) | 100 TFLOPS (FP16/BF16) | A100: 624 TFLOPS (FP16 Tensor Core, with sparsity) |
| INT8 Performance | (Not specified for 910B/950PR) | 2048 TOPS | 200 TOPS | A100: 1248 TOPS (INT8 Tensor Core, with sparsity) |
| Memory | 950PR: 11GB HBM | 64GB HBM2e | 48GB GDDR6 | A100: 40GB/80GB HBM2 |
| Memory Bandwidth | 950PR: 1.4 TB/s | 2.3 TB/s | 768 GB/s | A100: 1.55 TB/s (40GB) |
| Software Ecosystem | MindSpore AI framework, CANN | (Proprietary, but aims for broad compatibility) | MUSA architecture, CUDA/PyTorch compatible | CUDA |
| Relative Performance Claim | 910B close to A100; 950PR 2.87x over Nvidia H20 | Comparable to A100; 2.6x speedup over A100 in specific benchmarks | Behind Ampere/Ada Lovelace (Nvidia) | Market leader, H200 still lacks Chinese import approval |
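
One way to read the comparison above is to relate each chip's peak FP16 throughput to its memory bandwidth, since inference is often limited by how fast weights can be read from memory rather than by raw compute. The sketch below does this in Python for the three chips with a numeric FP16 figure in the comparison. The numbers are vendor peak claims copied as listed, not measurements, and the dictionary layout and function names are illustrative assumptions.

```python
# Peak figures as listed in the comparison (vendor claims, not measured results).
CHIPS = {
    "Biren BR100": {"fp16_tflops": 2000.0, "mem_bw_tb_s": 2.3},
    "Moore Threads MTT S4000": {"fp16_tflops": 100.0, "mem_bw_tb_s": 0.768},
    "Nvidia A100": {"fp16_tflops": 624.0, "mem_bw_tb_s": 1.55},
}


def flops_per_byte(fp16_tflops: float, mem_bw_tb_s: float) -> float:
    """Peak FP16 operations available per byte of memory traffic.

    TFLOPS divided by TB/s: the 1e12 factors cancel, leaving FLOPs per byte.
    """
    return fp16_tflops / mem_bw_tb_s


def rank_by_bandwidth(chips: dict) -> list:
    """Chip names ordered from highest to lowest listed memory bandwidth."""
    return sorted(chips, key=lambda name: chips[name]["mem_bw_tb_s"], reverse=True)


if __name__ == "__main__":
    for name in rank_by_bandwidth(CHIPS):
        spec = CHIPS[name]
        ratio = flops_per_byte(spec["fp16_tflops"], spec["mem_bw_tb_s"])
        print(f"{name}: {spec['mem_bw_tb_s']} TB/s, about {ratio:.0f} peak FLOPs per byte")
```

A higher ratio means a workload must perform more arithmetic per byte fetched before compute, rather than memory, becomes the bottleneck. Treat the output as a rough screening aid only, because peak figures from different vendors are not measured under the same conditions.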

๐Ÿ› ๏ธ Technical Deep Dive

  • ByteDance's Custom AI CPUs ('SeedChip'):
    • Designed primarily for AI inference tasks, inspired by Groq's "language processing units."
    • Evaluating both Arm and RISC-V instruction set architectures for design.
    • Partnering with Chinese startup InnoStar Semiconductor for memory technology, potentially reducing reliance on HBM chips.
    • Aims for initial production of at least 100,000 units in 2026, with potential ramp-up to 350,000 units.
  • Huawei Ascend Series (e.g., 910B, 950PR):
    • Ascend 910B is considered comparable to Nvidia's A100 from 2020.
    • Ascend 950PR is a new-generation AI accelerator chip optimized for prefill inference and recommendation workloads.
    • Huawei claims 2.87x the compute performance of Nvidia's H20 for the 950PR, which supports FP4 low-precision inference.
    • The 950PR features approximately 11 GB of HBM and 1.4 TB/s of memory bandwidth, with a TDP of 600 W.
    • Huawei's MindSpore AI framework and CANN software stack serve as alternatives to Nvidia's CUDA.
    • The Ascend 950 aims for single-chip parity with Nvidia's Hopper, and the 960 targets Blackwell-level performance.
    • Huawei's CloudMatrix 384 system, utilizing Ascend 910C (combining two 910B-class processors), can achieve system-level performance that outperforms Nvidia's GB200 NVL72 in some metrics, despite higher power consumption.
  • Biren BR100 GPU:
    • Built on TSMC's 7nm process technology, utilizing a chiplet design.
    • Integrates 77 billion transistors.
    • Delivers up to 2000 TFLOPS in FP16 tensor performance and 2048 TOPS in INT8, positioning it comparably to Nvidia's A100.
    • Features 64GB of HBM2e memory with a bandwidth of 2.3 TB/s.
    • Demonstrated 2.6x speedups over the A100 in specific domestic benchmarks for natural language processing (NLP) and computer vision.
  • Moore Threads MTT S4000 GPU:
    • Built on the third-generation MUSA (Moore Threads Unified System Architecture).
    • Equipped with 128 Tensor Cores and 48 GB of GDDR6 memory, providing 768 GB/s of memory bandwidth.
    • Offers 25 TFLOPS of FP32, 100 TFLOPS of FP16/BF16, and 200 TOPS of INT8 performance.
    • Supports PCIe Gen5 x16 and MTLink 1.0 for multi-GPU interconnectivity, enabling clusters with thousands of cards.
    • Its training platform is compatible with CUDA and PyTorch, supporting distributed training frameworks like Megatron-LM and DeepSpeed.
    • Built on a 12nm process, based on the Chunxiao graphics processor, with a TDP of 450W.

🔮 Future Implications
AI analysis grounded in cited sources

  • China's domestic AI chip ecosystem will rapidly mature, especially for inference workloads.
    • ByteDance's substantial investment and procurement from multiple domestic suppliers will provide significant volume, engineering feedback, and software ecosystem development, accelerating the capabilities of Chinese chipmakers for inference tasks.
  • US export controls will inadvertently strengthen China's long-term technological independence in AI.
    • By limiting access to advanced foreign chips, the controls compel Chinese tech giants to invest heavily in domestic alternatives, fostering a self-sufficient supply chain and reducing reliance on foreign technology.
  • ByteDance will achieve greater control over its AI performance and scalability.
    • Developing custom chips and diversifying domestic suppliers allows ByteDance to tailor hardware to its specific AI workloads and mitigate supply chain risks, ensuring consistent performance and expansion for its rapidly growing applications.

โณ Timeline

  • 2019: Biren Technology founded.
  • 2020-10: Moore Threads founded.
  • 2022-08: Biren Technology releases BR100 GPU.
  • 2022-10: US implements sweeping export controls on advanced computing chips and manufacturing equipment to China.
  • 2023-12: Moore Threads launches MTT S4000 AI accelerator.
  • 2024: ByteDance begins designing its own 'SeedChip' AI accelerator with TSMC.
  • 2025-09: Huawei announces Ascend 950PR chip as part of its three-year roadmap.
  • 2026-01: Iluvatar CoreX listed on Hong Kong Stock Exchange.
  • 2026-03: Huawei unveils Atlas 350 AI accelerator card powered by Ascend 950PR.
  • 2026-05: ByteDance reportedly developing its own custom AI CPUs and plans to spend over $30 billion on AI infrastructure in China for 2026.
  • 2026-06: ByteDance in talks to purchase at least 50,000 AI chips from Iluvatar CoreX, and considering Baidu's Kunlunxin chips.