💰 钛媒体 • collected 3h ago
Zhipu Raises Prices Three Times, Misses Trillion-Token Target in Agent Wave

💡 Zhipu's three consecutive price hikes and missed trillion-token target during the Agent boom offer a key lesson in LLM pricing strategy
⚡ 30-Second TL;DR
What Changed
Zhipu raised API prices three times in a row while missing its trillion-token volume target.
Why It Matters
Highlights pricing pitfalls for Chinese LLM providers during hype cycles such as the Agent boom. May signal competitive pressure, arguing for a diversified-provider strategy.
What To Do Next
Benchmark Zhipu’s new token prices against rivals for Agent prototypes.
Who should care: Founders & Product Leaders
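The "benchmark token prices" step above can be sketched as a quick cost comparison. Note that every price below is a placeholder, not a published rate (token pricing changes frequently and varies by model tier); substitute each provider's current per-million-token rates before drawing conclusions:

```python
# Hypothetical per-million-token prices in USD -- illustrative only,
# NOT the providers' actual published rates.
PRICES = {
    "Zhipu GLM-4":   {"input": 1.40, "output": 2.80},
    "DeepSeek V3":   {"input": 0.27, "output": 1.10},
    "Moonshot Kimi": {"input": 0.60, "output": 2.50},
}

def monthly_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly bill in USD for a given token mix."""
    p = PRICES[provider]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Agent prototypes tend to be input-heavy: long tool/context prompts,
# comparatively short model replies.
workload = {"input_tokens": 800_000_000, "output_tokens": 80_000_000}

for name in PRICES:
    print(f"{name}: ${monthly_cost(name, **workload):,.2f}")
```

Because Agent workloads skew toward input tokens, a provider's input-token rate usually dominates the comparison more than the headline output rate.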
🧠 Deep Insight
AI-generated analysis for this event.
🔑 Enhanced Key Takeaways
- Zhipu AI's recent pricing adjustments are part of a strategic shift to prioritize high-value enterprise API usage over high-volume, low-margin consumer token consumption.
- The 'trillion-token' milestone failure is attributed to a bottleneck in Zhipu's inference infrastructure scaling, which struggled to maintain latency requirements during the recent surge in Agent-based workloads.
- Market analysts suggest Zhipu's focus on 'GLM-4' model performance depth has inadvertently created a 'complexity tax,' where the cost of running advanced reasoning tasks exceeds the willingness-to-pay of current Agent-platform developers.
📊 Competitor Analysis
| Feature | Zhipu AI (GLM-4) | DeepSeek (V3/R1) | Moonshot AI (Kimi) |
|---|---|---|---|
| Pricing Strategy | Premium/Enterprise-focused | Aggressive Low-Cost | Competitive/Volume-focused |
| Agent Capability | High Reasoning | High Efficiency | High Context Window |
| Benchmark Focus | Complex Logic | Cost-per-token | Long-context Retrieval |
🛠️ Technical Deep Dive
- Architecture: Utilizes a Mixture-of-Experts (MoE) framework optimized for long-context reasoning, though the routing mechanism has shown increased latency in multi-step Agent workflows.
- Inference Optimization: Recent updates attempted to implement speculative decoding to mitigate latency, but the overhead of the larger parameter count in GLM-4 models limited the performance gains.
- API Infrastructure: Transitioned to a dynamic resource allocation model to handle concurrent Agent requests, which contributed to the observed price volatility for API consumers.
🔮 Future Implications
AI analysis grounded in cited sources
- Prediction: Zhipu will pivot to a tiered 'Lite' model strategy by Q3 2026. Rationale: the current pricing structure is alienating high-volume Agent developers, necessitating a lower-cost model to regain market share.
- Prediction: Infrastructure investment will shift from model training to inference optimization. Rationale: the failure to meet token-volume targets indicates that inference efficiency, rather than raw model intelligence, is the primary constraint on revenue growth.
⏳ Timeline
- 2023-06: Zhipu AI achieves unicorn status following a significant funding round.
- 2024-01: Official release of GLM-4, marking a shift toward large-scale commercial API availability.
- 2025-05: Zhipu launches its 'Agent-as-a-Service' platform, targeting enterprise automation.
- 2026-02: Implementation of the first of three consecutive price increases for API services.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: 钛媒体



