🤖 Reddit r/MachineLearning • collected 2h ago
Qwen3.5 MoE: Breakthrough or Incremental?
💡 397B MoE with ultra-low active params: open-source game-changer?
⚡ 30-Second TL;DR
What Changed
397B total parameters with only 17B active per token in an MoE setup
Why It Matters
If it is a genuine breakthrough, it could make training and inference of massive open-source models far more efficient, democratizing high-performance AI.
What To Do Next
Download and benchmark Qwen3.5-397B-A17B on your MoE routing tasks.
Who should care: Researchers & Academics
🧠 Deep Insight
Web-grounded analysis with 8 cited sources.
📌 Enhanced Key Takeaways
- Qwen3.5 introduces a Shared Expert in its MoE architecture: a dedicated dense MLP that processes every token alongside the top-8 routed experts (out of 64) for enhanced stability[1].
- It employs a hybrid attention mechanism with Gated Delta Networks in 75% of layers for linear complexity, enabling native support for up to 262k-token contexts and reduced KV-cache memory[2][4][5].
- The model supports native multimodality as a visual agent via DeepStack, 3D convolutions, and mRoPE, optimized for AMD Instinct and NVIDIA GPUs[1][6].
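The shared-expert routing described above can be sketched in a few lines. This is a minimal, illustrative NumPy implementation assuming the common formulation (dense shared MLP on every token, plus a softmax router that selects the top-k of N experts and renormalizes their weights); dimensions, initialization, and the ReLU activation are assumptions for readability, not the released Qwen3.5 configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(d_model, d_ff):
    # Two-layer MLP weights (expert or shared); scaled random init.
    return (rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model),
            rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff))

def mlp(params, x):
    w1, w2 = params
    return np.maximum(x @ w1, 0.0) @ w2   # ReLU stand-in for the real activation

def shared_expert_moe(x, shared, experts, router_w, top_k=8):
    # Shared expert: dense MLP applied to every token.
    out = mlp(shared, x)
    # Router: softmax over all experts, per token.
    logits = x @ router_w
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    for t in range(x.shape[0]):
        idx = np.argsort(probs[t])[-top_k:]       # top-k experts for token t
        w = probs[t, idx] / probs[t, idx].sum()   # renormalize over the top-k
        for wi, ei in zip(w, idx):                # only k of N experts run
            out[t] += wi * mlp(experts[ei], x[t])
    return out

d_model, d_ff, n_experts, top_k = 16, 32, 64, 8
shared = make_mlp(d_model, d_ff)
experts = [make_mlp(d_model, d_ff) for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts))
x = rng.standard_normal((4, d_model))
y = shared_expert_moe(x, shared, experts, router_w, top_k)
print(y.shape)  # (4, 16)
```

The "17B active of 397B total" figure follows from this pattern: every token pays for the shared expert plus only 8 of the 64 routed experts, so most parameters sit idle on any given token.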
🛠️ Technical Deep Dive
- MoE setup: 397B total parameters, 17B active per token; includes a Shared Expert (universal dense MLP) plus routed experts (top-8 of 64 via a Top-K Router)[1][5].
- Attention: hybrid Gated DeltaNet (linear attention) + full attention, with 75% of layers linear, achieving linear scaling for long contexts up to 262k tokens[1][2][4][5].
- Multimodal: native VLM with DeepStack, 3D convolutions, and mRoPE positional embeddings; supports UI navigation and visual reasoning[1][6].
- Optimization: hipBLASLt for Shared Expert GEMM, AITER FusedMoE for routed experts (AMD); MIOpen/PyTorch for vision; runs on a single AMD Instinct GPU[1].
- Variants: smaller models such as Qwen3.5-35B-A3B (3B active, outperforms the prior 235B-A22B) and Qwen3.5-122B-A10B, with dual-mode thinking/non-thinking[2][3][4].
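The attention item above is the key to the constant-memory long-context claim: a delta-rule linear-attention layer keeps a fixed-size state matrix instead of a growing KV cache. Below is a hedged sketch of one common gated delta-rule recurrence (decay the state, then correct it toward the new value); the exact gating and normalization in Qwen3.5's Gated DeltaNet layers may differ, and all shapes and gate values here are illustrative assumptions.

```python
import numpy as np

def gated_delta_attention(q, k, v, alpha, beta):
    """Recurrent gated delta rule (illustrative sketch, not Qwen3.5's
    exact parameterization). State S is a fixed (d_v, d_k) matrix, so
    memory stays constant with sequence length rather than growing
    like a softmax-attention KV cache."""
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))
    out = np.zeros((T, d_v))
    for t in range(T):
        kt = k[t] / np.linalg.norm(k[t])       # unit-norm key
        decayed = alpha[t] * S                 # gate: forget old associations
        pred = decayed @ kt                    # what the state predicts for kt
        # Delta update: write only the error between v_t and the prediction.
        S = decayed + beta[t] * np.outer(v[t] - pred, kt)
        out[t] = S @ q[t]                      # read with the query
    return out

rng = np.random.default_rng(0)
T, d = 6, 8
o = gated_delta_attention(rng.standard_normal((T, d)),
                          rng.standard_normal((T, d)),
                          rng.standard_normal((T, d)),
                          alpha=np.full(T, 0.9), beta=np.full(T, 0.5))
print(o.shape)  # (6, 8)
```

Because each step touches only the fixed state `S`, time is linear in sequence length and memory is O(d_k·d_v) regardless of context size, which is what makes 262k-token contexts tractable in the 75% of layers that use this form.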
🔮 Future Implications
AI analysis grounded in cited sources.
Qwen3.5's hybrid MoE could cut inference costs by 3-6x for long-context agents.
Linear attention and sparse activation enable processing of contexts approaching 1M tokens with minimal compute growth, as benchmarks against dense models suggest[2].
⏳ Timeline
2026-02
Qwen3 team releases Qwen3-Coder-Next 80B (3B active) with early hybrid attention
2026-02
Qwen3-235B-A22B released as prior flagship MoE model
2026-02-16
Qwen3.5 first release: 397B-A17B MoE on GitHub and blog
2026-02
Qwen3.5 medium models (35B-A3B, 122B-A10B, 27B) announced post-397B
📚 Sources (8)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- [1] amd.com → Day 0 Support for Qwen 3.5 on AMD Instinct GPUs
- [2] digitalapplied.com → Qwen 3.5 Medium Model Series: Benchmarks & Pricing Guide
- [3] siliconflow.com → The Best Qwen3 Models in 2025
- [4] kaitchup.substack.com → Qwen3.5 Medium Models: Dense vs MoE
- [5] magazine.sebastianraschka.com → A Dream of Spring for Open Weight
- [6] developer.nvidia.com → Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints
- [7] qwen.ai → Blog
- [8] GitHub → Qwen3
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/MachineLearning →

