Zhipu Apologizes for GLM-5 Rollout Woes

💡Zhipu GLM-5 pricing fixes + refunds: optimize your China LLM costs now
⚡ 30-Second TL;DR
What Changed
GLM-5 tokens cost 2x (off-peak) to 3x (peak) as much as GLM-4.7 tokens, because the larger model targets Claude Opus-level performance.
Why It Matters
The apology addresses user backlash over pricing at one of China's leading LLM providers and aims to stabilize trust amid competition. Compensation may retain users, but the episode highlights the scaling pains of frontier models.
What To Do Next
Check the Zhipu dashboard and apply for a GLM-5 Pro/Lite refund if your usage surged unexpectedly.
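To gauge how much the new multipliers affect a given workload, here is a minimal sketch. It assumes the 2x off-peak and 3x peak figures apply directly to token counts relative to GLM-4.7; the function names and the sample usage split are hypothetical and are not part of any Zhipu API.

```python
# Multipliers quoted in the article: GLM-5 tokens cost 2x (off-peak)
# or 3x (peak) relative to GLM-4.7. All names below are illustrative.
OFF_PEAK_MULTIPLIER = 2.0
PEAK_MULTIPLIER = 3.0


def glm47_equivalent_tokens(off_peak_tokens: int, peak_tokens: int) -> float:
    """Return GLM-5 usage expressed in GLM-4.7-equivalent tokens."""
    if off_peak_tokens < 0 or peak_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return off_peak_tokens * OFF_PEAK_MULTIPLIER + peak_tokens * PEAK_MULTIPLIER


def effective_multiplier(off_peak_tokens: int, peak_tokens: int) -> float:
    """Blended cost multiplier for a usage mix (between 2.0 and 3.0)."""
    total = off_peak_tokens + peak_tokens
    if total == 0:
        raise ValueError("no usage to average over")
    return glm47_equivalent_tokens(off_peak_tokens, peak_tokens) / total


if __name__ == "__main__":
    # Hypothetical usage: 600k tokens off-peak, 400k tokens at peak.
    print(glm47_equivalent_tokens(600_000, 400_000))
    print(effective_multiplier(600_000, 400_000))
```

Under these assumptions, moving work from peak to off-peak hours pulls the blended multiplier toward 2x, which is one way to read the "optimize your costs" advice.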
🧠 Deep Insight
Web-grounded analysis with 7 cited sources.
🔑 Enhanced Key Takeaways
- Zhipu AI apologized on February 21, 2026, for problems with the GLM Coding Plan, including a lack of transparency, a slow GLM-5 rollout caused by a traffic surge, and flawed upgrade mechanisms for existing users[1][2].
- The GLM-5 rollout is phased: the Max tier is fully open, the Pro tier has peak-hour limits because of high cluster load, and the Lite tier enters a post-holiday grayscale release; refunds are offered to affected Lite/Pro users since Jan 1[1][2].
- GLM-5 is roughly twice the size of GLM-4.7, with 744B total parameters (40B active) in an MoE architecture that uses DeepSeek Sparse Attention. It was trained on 28.5T tokens and targets Claude Opus-level coding and agentic performance[3][5][6].
- Token costs rose 2-3x because of the model's scale. Dashboard improvements cut the refresh interval from 1 hour to 10 minutes, the rules now appear on the purchase page, and a one-click rollback is available for mis-upgrades made Feb 12-16[1].
- The model is optimized for domestic chips such as Huawei Ascend and Moore Threads; compute constraints caused serving delays and pricing hikes amid a 10x traffic increase[2][4][6].
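The scale figures above can be sanity-checked with simple arithmetic. The sketch below uses only numbers quoted in this article (including the 355B figure for GLM-4.7 from the technical section); the variable names are illustrative.

```python
# Figures as quoted in the article; nothing here is measured independently.
GLM5_TOTAL_PARAMS = 744e9    # total parameters
GLM5_ACTIVE_PARAMS = 40e9    # parameters active per token (MoE)
GLM47_TOTAL_PARAMS = 355e9   # GLM-4.7 total parameters
TRAINING_TOKENS = 28.5e12    # training corpus size in tokens

scale_ratio = GLM5_TOTAL_PARAMS / GLM47_TOTAL_PARAMS
active_fraction = GLM5_ACTIVE_PARAMS / GLM5_TOTAL_PARAMS
tokens_per_param = TRAINING_TOKENS / GLM5_TOTAL_PARAMS

print(f"GLM-5 vs GLM-4.7 total parameters: {scale_ratio:.2f}x")
print(f"Share of parameters active per token: {active_fraction:.1%}")
print(f"Training tokens per total parameter: {tokens_per_param:.1f}")
```

The ratios come out to about 2.1x the total parameter count of GLM-4.7, about 5.4% of parameters active per token, and about 38 training tokens per total parameter, consistent with the "2x larger" description.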
📊 Competitor Analysis
| Model | Parameters | Key Benchmarks | Pricing Notes |
|---|---|---|---|
| Zhipu GLM-5 | 744B total (40B active, MoE) | Leads open models in coding/agentic; surpasses Gemini 3 Pro, lags Claude Opus | Token cost 2-3x GLM-4.7; 30% coding plan price hike [3][5] |
| DeepSeek (recent) | N/A | Sparse Attention pioneer; 10x context expansion | Efficiency-focused [4] |
| Anthropic Claude Opus | Proprietary | Top coding benchmark | N/A [3] |
| Kimi K2.5 | N/A | Below GLM-5 on GDPVal-AA | Cheap metering [2][5] |
🛠️ Technical Deep Dive
- GLM-5 has 744 billion total parameters and 40 billion active parameters in a Mixture-of-Experts (MoE) architecture, roughly double GLM-4.7's 355B[3][5][6].
- It was trained on 28.5 trillion tokens and adopts DeepSeek Sparse Attention for computational efficiency[3][4].
- It supports deployment on non-NVIDIA chips (Huawei Ascend, Moore Threads, Cambricon, Kunlunxin, MetaX) via kernel optimization and quantization[4][6].
- Serving challenges: MLA (Multi-head Latent Attention) models have a single KV head, so tensor parallelism wastes KV cache. Mitigations include SGLang's DP Attention (DPA), which gives zero KV redundancy and +92% throughput[2].
- Zhipu is pivoting from 'vibe coding' to 'agentic engineering' for AI-automated coding at scale[3].
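The KV cache waste mentioned above can be illustrated with a simplified counting model. This sketch assumes that tensor parallelism shards the cache by KV head and replicates any head that cannot be split further, and that data-parallel attention keeps each request's cache on exactly one rank. The function names and the group size of 8 are hypothetical; this is not SGLang's implementation.

```python
def kv_copies_tensor_parallel(num_kv_heads: int, tp_size: int) -> int:
    """Copies of each KV head held across a tensor-parallel group.

    Simplified model: heads are sharded across ranks when there are
    enough of them; otherwise each head is replicated so every rank
    has one to attend over.
    """
    if num_kv_heads < 1 or tp_size < 1:
        raise ValueError("num_kv_heads and tp_size must be positive")
    if num_kv_heads >= tp_size:
        if num_kv_heads % tp_size:
            raise ValueError("num_kv_heads must be divisible by tp_size")
        return 1
    if tp_size % num_kv_heads:
        raise ValueError("tp_size must be divisible by num_kv_heads")
    return tp_size // num_kv_heads


def redundant_fraction(copies: int) -> float:
    """Fraction of total KV cache memory that stores duplicate data."""
    if copies < 1:
        raise ValueError("copies must be positive")
    return 1.0 - 1.0 / copies


if __name__ == "__main__":
    tp_size = 8  # hypothetical tensor-parallel group size
    # MLA behaves like a single KV head, so every rank keeps a full copy.
    tp_copies = kv_copies_tensor_parallel(num_kv_heads=1, tp_size=tp_size)
    # Data-parallel attention (assumed): one copy per request.
    dp_copies = 1
    print(tp_copies, redundant_fraction(tp_copies))
    print(dp_copies, redundant_fraction(dp_copies))
```

In this toy model, an 8-way tensor-parallel group serving a single-KV-head model spends 7/8 of its KV cache memory on duplicates, while the data-parallel layout stores none. The reported throughput gain depends on many other factors that this sketch does not capture.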
🔮 Future Implications
AI analysis grounded in cited sources
Zhipu's GLM-5 launch and apology highlight compute bottlenecks in China's AI race and signal a shift toward agentic and coding models amid GPU shortages. The pricing hikes buck the prevailing price wars, while optimization for domestic chips reduces reliance on NVIDIA, potentially accelerating open-weight SOTA competition with global leaders like Claude Opus[2][3][4][5].
📎 Sources (7)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- [1] kucoin.com: Zhipu AI Apologizes for GLM Coding Plan Issues and Announces Compensation
- [2] latent.space: AINews Zai GLM 5 New SOTA Open Weights
- [3] scmp.com: China's Zhipu AI Launches New Major Model GLM 5 Challenge Its Rivals
- [4] trendforce.com: News DeepSeek Expands Context Tenfold as Zhipu Rolls Out New Model in China's AI Race
- [5] chinatalk.media: Chinese AI Rings in the Year of the
- [6] jessleao.substack.com: Something Big Is Definitely Happening
- [7] businesstimes.com.sg: China's Zhipu Unveils New AI Model Jolting Race DeepSeek
Original source: IT之家



