Alibaba launched Qwen3.5-397B-A17B, an open-weight MoE model with 397B total parameters but only 17B active per token, which outperforms its trillion-parameter Qwen3-Max on benchmarks. It offers 19x faster decoding at 256K context, 60% lower running costs, and native multimodal capability, the result of from-scratch training on text, images, and video. The hosted version supports up to 1M tokens of context.
Key Points
- 397B total params with 17B active per token via 512 MoE experts (see the arithmetic sketch after this list)
- 19x faster than Qwen3-Max at 256K context, 60% cheaper to run
- Native multimodal training on text, images, and video
- 1/18th the cost of Gemini 3 Pro; handles 8x the concurrent workload
- Multi-token prediction and optimized attention for speed
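To see why the sparse design matters, here is a back-of-envelope check. Only the two parameter counts come from the announcement; the rest is the standard rule of thumb that per-token transformer FLOPs scale roughly with active, not total, parameters.

```python
# Back-of-envelope: per-token compute scales roughly with *active* params,
# so a 397B-total / 17B-active MoE does ~4% of the work of a dense 397B model.
total_params = 397e9   # from the announcement
active_params = 17e9   # from the announcement

ratio = active_params / total_params
print(f"activation ratio: {ratio:.1%}")                        # ~4.3%
print(f"per-token compute ~= a dense {active_params/1e9:.0f}B model")
```

That roughly 4% activation ratio is what makes the cost claims plausible: decoding cost tracks active parameters (plus attention and memory traffic), not the size of the full checkpoint.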
Impact Analysis
This launch reshapes the enterprise AI procurement calculus: flagship-level performance in an open-weight model that teams can deploy and own reduces reliance on rented trillion-parameter giants. It also accelerates adoption of long-context, multimodal AI at scale by cutting inference bills for production workloads.
Technical Details
Built on the Qwen3-Next architecture with 512 MoE experts (up from 128), multi-token prediction to speed up training, and the long-context attention inherited from its predecessor. Sparse activation keeps per-token compute close to that of a dense 17B model while the router can draw on the full expert pool. Context is 256K for the open weights and 1M for the hosted version.
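For intuition, here is a minimal top-k MoE layer in PyTorch. This is a sketch under assumptions: only the 512-expert count comes from the article; the gating scheme (softmax over the top-k logits, as in Mixtral-style routers), the expert width, and top_k=8 are illustrative, not Qwen's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sketch of a sparse MoE feed-forward layer with top-k routing."""

    def __init__(self, d_model=64, d_ff=128, num_experts=512, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.gate(x)                           # (num_tokens, num_experts)
        top_w, top_i = logits.topk(self.top_k, dim=-1)  # pick k experts per token
        top_w = F.softmax(top_w, dim=-1)                # renormalize over chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in top_i[:, slot].unique():           # run each chosen expert once
                mask = top_i[:, slot] == e
                w = top_w[mask, slot].unsqueeze(-1)     # (n_matched, 1)
                out[mask] += w * self.experts[int(e)](x[mask])
        return out

# Toy dims so 512 experts fit in memory; the real model's dims are not public here.
moe = TopKMoE(d_model=64, d_ff=128, num_experts=512, top_k=8)
x = torch.randn(4, 64)
print(moe(x).shape)  # torch.Size([4, 64]); at most 32 of 512 experts actually ran
```

Each token touches only top_k of the 512 expert MLPs per forward pass, which is how total capacity can grow (512 experts, up from 128) without growing per-token compute.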