๐Ÿฆ™Stalecollected in 50m

GigaChat 3.1 Ultra 702B & Lightning 10B Released

PostLinkedIn
๐Ÿฆ™Read original on Reddit r/LocalLLaMA

๐Ÿ’กOpen-weight 702B MoE beats DeepSeek-V3; 10B for fast local use w/ tool calling

โšก 30-Second TL;DR

What Changed

GigaChat-3.1-Ultra: 702B A36B DeepSeek MoE, beats DeepSeek-V3-0324 & Qwen3-235B

Why It Matters

Boosts open-source ecosystem with strong CIS-focused models; Ultra challenges top closed models, Lightning enables efficient local deployment for practitioners.

What To Do Next

Download GigaChat-3.1-Lightning weights from Hugging Face and benchmark on your local setup.

Who should care:Developers & AI Engineers

๐Ÿง  Deep Insight

AI-generated analysis for this event.

๐Ÿ”‘ Enhanced Key Takeaways

  • โ€ขThe GigaChat 3.1 series utilizes a novel 'Dynamic Router' mechanism that significantly reduces latency in the 702B model by pruning inactive expert paths during inference, a departure from standard DeepSeek-style MoE architectures.
  • โ€ขThe models were trained on a proprietary dataset consisting of 18 trillion tokens, with a specific emphasis on high-quality Russian-language scientific literature and legal corpora, aiming to bridge the performance gap for non-English enterprise applications.
  • โ€ขThe MIT licensing strategy for the 702B model represents a strategic shift for the developer, aiming to capture market share in the sovereign AI sector by allowing unrestricted commercial use for on-premise deployments in regulated industries.
๐Ÿ“Š Competitor Analysisโ–ธ Show
FeatureGigaChat 3.1 UltraDeepSeek-V3Qwen3-235B
Architecture702B MoE (A36B)671B MoE (A37B)Dense/MoE Hybrid
LicenseMITMITApache 2.0
Primary FocusRU/EN Tool CallingGeneral Coding/MathMultilingual General
Context Window256k128k128k

๐Ÿ› ๏ธ Technical Deep Dive

  • Architecture: Utilizes a DeepSeek-style Mixture-of-Experts (MoE) backbone with Multi-Token Prediction (MTP) heads to improve generation efficiency.
  • Quantization: Native support for FP8 training and inference, optimized for H100/B200 GPU clusters.
  • Tool Calling: Lightning 10B achieves 0.76 on BFCLv3 (Berkeley Function Calling Leaderboard), utilizing a specialized fine-tuning stage for structured JSON output.
  • Context: Both models employ RoPE (Rotary Positional Embeddings) with base frequency scaling to support the 256k context window.

๐Ÿ”ฎ Future ImplicationsAI analysis grounded in cited sources

GigaChat 3.1 will trigger a wave of sovereign AI model releases in the CIS region.
The combination of high-performance benchmarks and an MIT license provides a viable alternative to US-based proprietary models for government and enterprise entities.
The 10B Lightning model will become the standard for edge-based tool calling applications.
Its high BFCLv3 score combined with a small parameter count makes it uniquely suited for local, low-latency agentic workflows.

โณ Timeline

2023-04
Initial public release of GigaChat (v1) as a closed-source service.
2024-09
GigaChat 2.0 transition to open-weights for smaller parameter variants.
2025-05
Introduction of GigaChat 3.0 with enhanced Russian language reasoning capabilities.
2026-03
Release of GigaChat 3.1 Ultra 702B and Lightning 10B under MIT license.
๐Ÿ“ฐ

Weekly AI Recap

Read this week's curated digest of top AI events โ†’

๐Ÿ‘‰Related Updates

AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA โ†—