FlagOS has adapted Alibaba's Qwen3.5-397B-A17B, the largest open-source multimodal MoE model with 397B total parameters and 17B active, for seamless deployment on Metax, Zhenwu, and NVIDIA chips. The release includes vLLM-plugin-FL for unified multi-chip inference, with BF16 deployment verified on dual-machine, 16-card setups. Ready-to-use models are available on HuggingFace and ModelScope.
Key Points
1. Adapted Qwen3.5-397B for Metax, Zhenwu, and NVIDIA chips with full precision alignment
2. vLLM-plugin-FL enables zero-code-change multi-chip inference
3. BF16 versions support dual-machine, 16-card deployment
4. Direct downloads from FlagRelease on HuggingFace and ModelScope
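Since vLLM-plugin-FL keeps the standard vLLM serving interface, a deployed model can be queried through the usual OpenAI-compatible endpoint. The sketch below builds such a request; the base URL and the `FlagRelease/Qwen3.5-397B-A17B-FlagOS` model id are assumptions for illustration, not confirmed repository names:

```python
import json
import urllib.request

# Assumed values -- substitute your own server address and the actual
# FlagRelease model id from HuggingFace or ModelScope.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "FlagRelease/Qwen3.5-397B-A17B-FlagOS"  # hypothetical id

def build_chat_request(prompt, max_tokens=256, temperature=0.7):
    """Build an OpenAI-compatible /chat/completions payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def send_chat_request(payload):
    """POST the payload to a running vLLM server (needs a live deployment)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Building the payload requires no server; sending it does.
payload = build_chat_request("Describe this image in one sentence.")
print(json.dumps(payload, indent=2))
```

Because the plugin targets zero-code-change inference, the same client code should work unmodified whether the server runs on Metax, Zhenwu, or NVIDIA hardware.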
Impact Analysis
This release addresses key pain points in multi-chip adaptation, letting developers run a leading open-source MoE vision-language model across diverse hardware without code changes or performance loss. It promotes hardware diversity in AI deployments, particularly within the Chinese chip ecosystem.
Technical Details
Qwen3.5-397B-A17B is a native vision-language MoE model that excels at general tasks, code generation, long-context processing, agentic reasoning, tool calling, and multimodal understanding. FlagOS enables cross-chip migration through its unified software stack, with high-efficiency inference verified on the supported hardware.
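A minimal launch sketch for the verified dual-machine, 16-card BF16 setup. This is a deployment fragment, not an official command: the model id and the 8-way tensor / 2-way pipeline split across two 8-card nodes are assumptions, and only stock vLLM flags are used (the FlagOS plugin is assumed to be installed on both machines):

```shell
# Assumed: vLLM-plugin-FL installed on both nodes, with a distributed
# runtime (e.g. a Ray cluster) spanning the two 8-card machines.
# Model id and parallelism sizes are illustrative.
vllm serve FlagRelease/Qwen3.5-397B-A17B-FlagOS \
  --dtype bfloat16 \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 2 \
  --port 8000
```

With total parallelism of 8 × 2 = 16, the 397B parameters in BF16 (roughly 800 GB of weights) are sharded across all 16 cards, which is why a single 8-card machine is not sufficient.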

