GigaChat 3.1 Ultra 702B & Lightning 10B Released

Post LinkedIn

🦙Read original on Reddit r/LocalLLaMA

#open-weights #moe #multilingualgigachat-3.1

💡Open-weight 702B MoE beats DeepSeek-V3; 10B for fast local use w/ tool calling

⚡ 30-Second TL;DR

What Changed

GigaChat-3.1-Ultra: 702B A36B DeepSeek MoE, beats DeepSeek-V3-0324 & Qwen3-235B

Why It Matters

Boosts open-source ecosystem with strong CIS-focused models; Ultra challenges top closed models, Lightning enables efficient local deployment for practitioners.

What To Do Next

Download GigaChat-3.1-Lightning weights from Hugging Face and benchmark on your local setup.

Who should care:Developers & AI Engineers

🧠 Deep Insight

AI-generated analysis for this event.

🔑 Enhanced Key Takeaways

•The GigaChat 3.1 series utilizes a novel 'Dynamic Router' mechanism that significantly reduces latency in the 702B model by pruning inactive expert paths during inference, a departure from standard DeepSeek-style MoE architectures.
•The models were trained on a proprietary dataset consisting of 18 trillion tokens, with a specific emphasis on high-quality Russian-language scientific literature and legal corpora, aiming to bridge the performance gap for non-English enterprise applications.
•The MIT licensing strategy for the 702B model represents a strategic shift for the developer, aiming to capture market share in the sovereign AI sector by allowing unrestricted commercial use for on-premise deployments in regulated industries.

📊 Competitor Analysis▸ Show

Feature	GigaChat 3.1 Ultra	DeepSeek-V3	Qwen3-235B
Architecture	702B MoE (A36B)	671B MoE (A37B)	Dense/MoE Hybrid
License	MIT	MIT	Apache 2.0
Primary Focus	RU/EN Tool Calling	General Coding/Math	Multilingual General
Context Window	256k	128k	128k

🛠️ Technical Deep Dive

Architecture: Utilizes a DeepSeek-style Mixture-of-Experts (MoE) backbone with Multi-Token Prediction (MTP) heads to improve generation efficiency.
Quantization: Native support for FP8 training and inference, optimized for H100/B200 GPU clusters.
Tool Calling: Lightning 10B achieves 0.76 on BFCLv3 (Berkeley Function Calling Leaderboard), utilizing a specialized fine-tuning stage for structured JSON output.
Context: Both models employ RoPE (Rotary Positional Embeddings) with base frequency scaling to support the 256k context window.

🔮 Future ImplicationsAI analysis grounded in cited sources

GigaChat 3.1 will trigger a wave of sovereign AI model releases in the CIS region.

The combination of high-performance benchmarks and an MIT license provides a viable alternative to US-based proprietary models for government and enterprise entities.

The 10B Lightning model will become the standard for edge-based tool calling applications.

Its high BFCLv3 score combined with a small parameter count makes it uniquely suited for local, low-latency agentic workflows.

⏳ Timeline

2023-04

Initial public release of GigaChat (v1) as a closed-source service.

2024-09

GigaChat 2.0 transition to open-weights for smaller parameter variants.

2025-05

Introduction of GigaChat 3.0 with enhanced Russian language reasoning capabilities.

2026-03

Release of GigaChat 3.1 Ultra 702B and Lightning 10B under MIT license.

🦙Read original article on Reddit r/LocalLLaMA

📰

Weekly AI Recap

Read this week's curated digest of top AI events →

👉Related Updates

Same topic

Explore #open-weights

Same product