Indian AI lab Sarvam has released two mixture-of-experts (MoE) LLMs built from scratch: 30B-A1B for low-latency applications and 105B-A9B for more demanding tasks. The models will be published as open weights on Hugging Face, with API access to follow soon. The 105B model beats Gemini 2.5 Flash on Indic benchmarks and DeepSeek R1 on many others.
Key Points
- 30B-A1B: 16T pretraining tokens, 32K context, aimed at real-time apps
- 105B-A9B: 128K context, outperforms Gemini 2.5 Flash on Indic benchmarks
- Both built from scratch; open weights to land on Hugging Face soon (see the loading sketch after this list)
- API and dashboard access to follow
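Once the weights are up on Hugging Face, loading them should follow the standard `transformers` flow. The sketch below is illustrative only: the repository id is a placeholder, since Sarvam has not yet published the actual repo names.

```python
# Illustrative sketch of loading the weights once they are published on
# Hugging Face. The repo id below is a placeholder, not a confirmed name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "sarvamai/<model-repo>"  # placeholder: replace with the real repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# device_map="auto" spreads the model across available GPUs (needs `accelerate`)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "Summarise the benefits of open-weight Indic LLMs."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```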
Impact Analysis
This release advances open-source LLMs for Indic languages and challenges Western models in regional markets. The small active-parameter counts enable low-cost deployment for Indian developers, potentially accelerating AI adoption in non-English regions.
Technical Details
MoE architecture: 30B-A1B (30B total parameters, 1B active per token) and 105B-A9B (105B total, 9B active). The 30B model was pretrained on 16T tokens with a 32K context window; the 105B model supports a 128K context. Both are optimized for low latency and long-context workloads.
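To make the total-vs-active parameter distinction concrete, here is a minimal sketch of top-k MoE routing. The layer sizes, expert count, and top-k value are arbitrary assumptions, not Sarvam's actual configuration; the point is that each token only runs through the few experts the router selects, which is why a 105B-parameter model can have per-token compute closer to a 9B dense model.

```python
# Minimal sketch of top-k MoE routing (illustrative only; dimensions,
# expert count, and top_k are assumptions, not Sarvam's real config).
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x       : (tokens, d_model) activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of (w_in, w_out) per-expert MLP weights
    """
    logits = x @ gate_w                                  # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # softmax over only the selected experts' logits
    gates = np.exp(top_logits - top_logits.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            w_in, w_out = experts[top_idx[t, slot]]
            h = np.maximum(x[t] @ w_in, 0.0)             # expert MLP (ReLU for brevity)
            out[t] += gates[t, slot] * (h @ w_out)
    return out

# Toy dimensions: only the routed experts' weights touch each token,
# which is how "105B total, 9B active" keeps per-token compute small.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts, tokens = 64, 256, 8, 4
x = rng.standard_normal((tokens, d_model))
gate_w = rng.standard_normal((d_model, n_experts))
experts = [(rng.standard_normal((d_model, d_ff)),
            rng.standard_normal((d_ff, d_model))) for _ in range(n_experts)]
print(moe_forward(x, gate_w, experts).shape)   # (4, 64)
```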




