FlagOS Launches Qwen3.5 397B Multi-Chip Versions

💡 The largest open MoE VLM (397B params) now runs out of the box on NVIDIA and Chinese chips.
⚡ 30-Second TL;DR
What Changed
Adapted Qwen3.5-397B for Metax, Zhenwu, and NVIDIA chips with full precision alignment
Why It Matters
This release solves key pain points in multi-chip adaptation, enabling developers to run the top open MoE VLM across diverse hardware without code changes or performance loss. It promotes hardware diversity in AI deployments, especially for Chinese ecosystems.
What To Do Next
Download FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS from HuggingFace and deploy it with vLLM-plugin-FL, following the plugin's README.
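A minimal deployment sketch is below. The exact plugin invocation lives in the vLLM-plugin-FL README; the commands here assume standard vLLM CLI conventions, and the parallelism sizes are illustrative (one possible way to span a dual-machine, 16-card BF16 setup), not values prescribed by the release.

```shell
# Sketch only — consult the vLLM-plugin-FL README for the plugin's own flags.
# Fetch the NVIDIA build of the model from HuggingFace.
huggingface-cli download FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS

# Serve with vLLM in BF16. The 8x tensor-parallel / 2x pipeline-parallel
# split shown here is an illustrative way to cover 16 cards across 2 nodes.
vllm serve FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS \
  --dtype bfloat16 \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 2
```

On non-NVIDIA hardware, the same command line is the point of the release: the plugin routes the model to Metax or Zhenwu backends without changes to the serving code.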
🧠 Deep Insight
Web-grounded analysis with 5 cited sources.
🔑 Enhanced Key Takeaways
- FlagOS adapted Alibaba's Qwen3.5-397B-A17B, the largest open-source multimodal MoE model with 397B total parameters and 17B active parameters, for deployment on Metax, Zhenwu, and NVIDIA chips.[article]
- vLLM-plugin-FL provides unified multi-chip inference with zero code changes, leveraging vLLM's high-throughput features such as paged attention and continuous batching.[article][3]
- Verified BF16 dual-machine, 16-card setups enable seamless deployment, aligning full precision across diverse hardware.[article]
- Ready-to-use models are available on HuggingFace and ModelScope via FlagRelease.[article]
- Qwen3.5-397B-A17B was added to LMSYS Arena for Text, Vision, and Code benchmarks alongside Claude Sonnet 4.6.[5]
📊 Competitor Analysis
| Feature | FlagOS Qwen3.5-397B-A17B | AirLLM | vLLM |
|---|---|---|---|
| Model Size | 397B total / 17B active MoE | Up to 405B | N/A (Inference Engine) |
| Hardware | Metax, Zhenwu, NVIDIA multi-chip | Low-memory GPUs (4-8GB) | Multi-GPU/node |
| Key Tech | vLLM-plugin-FL, BF16 16-card | Layer-by-layer loading | Paged attention, continuous batching |
| Benchmarks | LMSYS Arena (Text/Vision/Code) | Up to 3x speed w/ compression | High-throughput serving |
| Pricing | Open-source, free download | Open-source | Open-source |
🛠️ Technical Deep Dive
- Qwen3.5-397B-A17B is a multimodal Mixture-of-Experts (MoE) model with 397 billion total parameters but only 17 billion active per inference, optimizing compute efficiency.[article][5]
- Supports BF16 precision with full numerical alignment across Metax and Zhenwu (Chinese AI accelerators) and NVIDIA GPUs, enabling dual-machine, 16-card deployments.[article]
- vLLM-plugin-FL integrates with vLLM for zero-code-change inference, building on vLLM's paged attention, continuous batching, prefix caching, and multi-GPU support.[article][3]
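The MoE efficiency point above can be sketched in a few lines. This toy example (not the Qwen3.5 implementation — the expert count and router are hypothetical) shows the core idea: a softmax router picks only the top-k experts per token, so per-token compute tracks the 17B active parameters rather than the 397B total.

```python
# Toy illustration of top-k MoE routing; expert count and scores are made up.
import math

TOTAL_PARAMS = 397e9   # Qwen3.5-397B-A17B total parameters
ACTIVE_PARAMS = 17e9   # parameters active per token

def router_top_k(scores, k):
    """Return indices of the k highest-probability experts (softmax router)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

# One token's router scores over 8 toy experts; only the top 2 actually run.
chosen = router_top_k([0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4], k=2)
print(chosen)  # → [1, 3]: the two activated experts
print(f"{ACTIVE_PARAMS / TOTAL_PARAMS:.1%} of weights active per token")  # → 4.3%
```

Only the selected experts' weights participate in the forward pass, which is why a 397B-parameter model can serve with roughly the per-token compute of a 17B dense model.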
🔮 Future Implications
AI analysis grounded in cited sources.
FlagOS's multi-chip adaptation of Qwen3.5-397B-A17B democratizes access to massive open-source multimodal models across diverse hardware, reducing reliance on single-vendor ecosystems like NVIDIA and potentially accelerating adoption in cost-sensitive regions with chips like Metax and Zhenwu.
📎 Sources (5)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: 机器之心