🏠Stalecollected in 3h

Zhipu Apologizes for GLM-5 Rollout Woes

Zhipu Apologizes for GLM-5 Rollout Woes
PostLinkedIn
🏠Read original on IT之家

💡Zhipu GLM-5 pricing fixes + refunds: optimize your China LLM costs now

⚡ 30-Second TL;DR

What Changed

GLM-5 token costs 2x off-peak/3x peak vs GLM-4.7 due to larger scale targeting Claude Opus level

Why It Matters

Addresses user backlash on China's leading LLM pricing, stabilizing trust amid competition. Compensation may retain users, but highlights scaling pains for frontier models.

What To Do Next

Check Zhipu dashboard and apply for GLM-5 Pro/Lite refund if usage surged unexpectedly.

Who should care:Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 7 cited sources.

🔑 Enhanced Key Takeaways

  • Zhipu AI apologized on February 21, 2026, for GLM Coding Plan issues including lack of transparency, slow GLM-5 rollout due to traffic surge, and flawed upgrade mechanisms for old users[1][2].
  • GLM-5 rollout is phased: Max tier fully open, Pro tier with peak-hour limits due to high cluster load, Lite tier post-holiday grayscale; refunds offered to affected Lite/Pro users since Jan 1[1][2].
  • GLM-5 is 2x larger than GLM-4.7 with 744B total parameters (40B active) in MoE architecture using DeepSeek Sparse Attention, trained on 28.5T tokens, targeting Claude Opus-level coding and agentic performance[3][5][6].
  • Token costs increased 2-3x due to model scale; dashboard improvements reduced refresh from 1hr to 10min with rules now on purchase page; one-click rollback for Feb 12-16 mis-upgrades[1].
  • Optimized for domestic chips like Huawei Ascend, Moore Threads; compute constraints caused serving delays and pricing hikes amid 10x traffic increase[2][4][6].
📊 Competitor Analysis▸ Show
ModelParametersKey BenchmarksPricing Notes
Zhipu GLM-5744B total (40B active, MoE)Leads open models in coding/agentic; surpasses Gemini 3 Pro, lags Claude Opus2-3x GLM-4.7 tokens; 30% coding plan hike [3][5]
DeepSeek (recent)N/ASparse Attention pioneer; 10x context expansionEfficiency-focused [4]
Anthropic Claude OpusProprietaryTop coding benchmarkN/A [3]
Kimi K2.5N/ABelow GLM-5 on GDPVal-AACheap metering [2][5]

🛠️ Technical Deep Dive

• GLM-5: 744 billion total parameters, 40 billion active parameters in Mixture-of-Experts (MoE) architecture; doubled from GLM-4.7's 355B[3][5][6]. • Trained on 28.5 trillion tokens; adopts DeepSeek Sparse Attention for computational efficiency[3][4]. • Supports deployment on non-NVIDIA chips: Huawei Ascend, Moore Threads, Cambricon, Kunlunxin, MetaX via kernel optimization and quantization[4][6]. • Serving challenges: MLA models with one KV head cause tensor parallelism KV cache waste; mitigations like SGLang's DP Attention (DPA) for zero KV redundancy and +92% throughput[2]. • Pivot to 'agentic engineering' from 'vibe coding' for scaled AI-automated coding[3].

🔮 Future ImplicationsAI analysis grounded in cited sources

Zhipu's GLM-5 launch and apology highlight compute bottlenecks in China's AI race, signaling shift to agentic/coding models amid GPU shortages; pricing hikes buck price wars, while domestic chip optimization reduces NVIDIA reliance, potentially accelerating open-weight SOTA competition with global leaders like Claude Opus[2][3][4][5].

Timeline

2025-12
Zhipu releases GLM-4.7, marketed as coding partner with subscription pivot to coding plans
2026-02-12
GLM-5 launched alongside 30% coding plan price hike; rollout begins with upgrade issues
2026-02-21
Zhipu issues public apology for GLM Coding Plan transparency, rollout delays, and upgrades; announces refunds and improvements
📰

Weekly AI Recap

Read this week's curated digest of top AI events →

👉Related Updates

AI-curated news aggregator. All content rights belong to original publishers.
Original source: IT之家