⚛️量子位•Freshcollected in 74m
AI Infra Shifts from GPU to Token

💡AI infra wars pivot to tokens—SenseTime's 3yr device shows the future
⚡ 30-Second TL;DR
What Changed
Competition logic reconstructs around tokens over GPUs
Why It Matters
This paradigm shift may reduce reliance on scarce GPUs, enabling more efficient AI scaling via token optimization. AI practitioners could pivot strategies toward token-efficient architectures for cost savings.
What To Do Next
Review SenseTime 大装置 papers for token-optimized training techniques.
Who should care:Founders & Product Leaders
🧠 Deep Insight
AI-generated analysis for this event.
🔑 Enhanced Key Takeaways
- •The shift toward 'token-centric' infrastructure emphasizes optimizing the entire pipeline—from data ingestion and preprocessing to inference throughput—rather than just raw GPU TFLOPS, aiming to reduce the cost-per-token for large-scale model training.
- •SenseTime's 'SenseCore' (AI大装置) leverages a proprietary heterogeneous computing architecture that integrates thousands of GPUs with high-speed interconnects, specifically designed to handle the massive data throughput required for training trillion-parameter models.
- •Industry trends indicate that major AI infrastructure providers are moving toward 'Token-as-a-Service' (TaaS) business models, where pricing and performance guarantees are tied to token generation efficiency rather than leased hardware capacity.
📊 Competitor Analysis▸ Show
| Feature | SenseTime (SenseCore) | NVIDIA (DGX Cloud) | Huawei (Ascend/Atlas) |
|---|---|---|---|
| Core Focus | Full-stack model training | Hardware/Software ecosystem | Domestic supply chain security |
| Architecture | Heterogeneous/Proprietary | CUDA-optimized | NPU-based (Ascend) |
| Market Position | Enterprise/Gov/Regional | Global Standard | China-domestic dominant |
🛠️ Technical Deep Dive
- SenseCore Architecture: Utilizes a distributed, multi-level storage system to minimize I/O bottlenecks during massive model training.
- Token Optimization: Implements custom kernel-level optimizations for Transformer-based architectures to accelerate attention mechanism calculations.
- Scalability: Supports elastic scheduling across heterogeneous GPU clusters, allowing for dynamic resource allocation based on token-processing demand.
- Interconnects: Employs high-bandwidth, low-latency networking fabrics to maintain high GPU utilization rates during distributed training sessions.
🔮 Future ImplicationsAI analysis grounded in cited sources
Hardware-agnostic software layers will become the primary competitive moat.
As the industry shifts focus to token efficiency, the ability to abstract away hardware differences will be more valuable than proprietary hardware ownership.
The cost of training a 1T parameter model will drop by 50% by 2027.
Optimizations in token-centric infrastructure are currently outpacing the raw performance gains of new GPU generations.
⏳ Timeline
2021-07
SenseTime officially launches SenseCore (AI大装置) to provide large-scale AI infrastructure.
2023-04
SenseTime unveils 'SenseNova' foundation model set, powered by the SenseCore infrastructure.
2024-07
SenseTime upgrades SenseCore to support multi-modal training at scale, focusing on token efficiency.
📰
Weekly AI Recap
Read this week's curated digest of top AI events →
👉Related Updates
AI-curated news aggregator. All content rights belong to original publishers.
Original source: 量子位 ↗