FlagOS Launches Qwen3.5 397B Multi-Chip Versions
🧠 #moe-model #multi-chip #bf16

🧠 Read original on 机器之心

💡 The largest open MoE VLM (397B parameters) now runs out of the box on NVIDIA and Chinese chips.

⚡ 30-Second TL;DR

What changed

Adapted Qwen3.5-397B for Metax, Zhenwu, and NVIDIA chips with full precision alignment

Why it matters

This release solves key pain points in multi-chip adaptation, enabling developers to run the top open MoE VLM across diverse hardware without code changes or performance loss. It promotes hardware diversity in AI deployments, especially for Chinese ecosystems.

What to do next

Download FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS from HuggingFace and deploy it following the vLLM-plugin-FL README.
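
A minimal download sketch, assuming the huggingface_hub Python package; the repo id is the one named above, while the local directory is an arbitrary example. The actual serving steps live in the vLLM-plugin-FL README.

```python
# Sketch only: fetch the FlagOS build of the model from HuggingFace.
# Assumes `pip install huggingface_hub`; local_dir is a hypothetical path.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS",
    local_dir="./qwen3.5-397b-flagos",
)
print("Weights downloaded to:", local_path)
```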

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 5 cited sources.

🔑 Key Takeaways

  • FlagOS adapted Alibaba's Qwen3.5-397B-A17B, the largest open-source multimodal MoE model with 397B total parameters and 17B active parameters, for deployment on Metax, Zhenwu, and NVIDIA chips.[article]
  • vLLM-plugin-FL provides unified multi-chip inference with zero code changes, leveraging vLLM's high-throughput features like paged attention and continuous batching (a client-side sketch follows this list).[article][3]
  • Verified BF16 dual-machine 16-card setups enable seamless deployment, aligning full precision across diverse hardware.[article]
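
To make the zero-code-change point above concrete: vLLM exposes an OpenAI-compatible HTTP API, so a client like the sketch below would stay identical whether the server runs on NVIDIA, Metax, or Zhenwu hardware. The endpoint URL and served model name are placeholders, not values from the article.

```python
# Hypothetical client for a locally running vLLM-compatible server
# (started per the vLLM-plugin-FL README); base_url and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS",
    messages=[{"role": "user", "content": "Summarize this release in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```
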
📊 Competitor Analysis

Feature | FlagOS Qwen3.5-397B-A17B | AirLLM | vLLM
Model size | 397B total / 17B active MoE | Up to 405B | N/A (inference engine)
Hardware | Metax, Zhenwu, NVIDIA multi-chip | Low-memory GPUs (4-8 GB) | Multi-GPU / multi-node
Key tech | vLLM-plugin-FL, BF16 16-card | Layer-by-layer loading | Paged attention, continuous batching
Benchmarks | LMSYS Arena (text/vision/code) | Up to 3x speed with compression | High-throughput serving
Pricing | Open source, free download | Open source | Open source

🛠️ Technical Deep Dive

  • Qwen3.5-397B-A17B is a multimodal Mixture-of-Experts (MoE) model with 397 billion total parameters but only 17 billion active per inference, optimizing compute efficiency (see the toy routing sketch after this list).[article][5]
  • Supports BF16 precision with full alignment for Metax (likely a Chinese AI chip), Zhenwu, and NVIDIA GPUs, enabling dual-machine 16-card deployments.[article]
  • vLLM-plugin-FL integrates with vLLM for zero-code-change inference, building on vLLM's paged attention, continuous batching, prefix caching, and multi-GPU support.[article][3]
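
As a rough illustration of the active-parameter arithmetic in the first bullet (generic MoE routing, not FlagOS or Qwen code): each token is routed to a small top-k subset of experts, so only a fraction of the total weights run per forward pass. All sizes below are made up for readability.

```python
import torch

# Toy MoE routing: many experts exist, but each token only activates top-k of them,
# which is why a 397B-total model can run with roughly 17B active parameters per token.
num_experts, top_k, d_model, d_ff = 64, 4, 512, 2048

gate = torch.nn.Linear(d_model, num_experts, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Sequential(
        torch.nn.Linear(d_model, d_ff), torch.nn.GELU(), torch.nn.Linear(d_ff, d_model)
    )
    for _ in range(num_experts)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
    weights, idx = gate(x).softmax(dim=-1).topk(top_k, dim=-1)
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):            # naive per-token loop, for clarity only
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])
    return out

print(moe_forward(torch.randn(3, d_model)).shape)  # torch.Size([3, 512])
# Active fraction per token here: top_k / num_experts = 4 / 64, analogous to 17B of 397B.
```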

🔮 Future Implications

AI analysis grounded in cited sources.

FlagOS's multi-chip adaptation of Qwen3.5-397B-A17B democratizes access to massive open-source multimodal models across diverse hardware, reducing reliance on single-vendor ecosystems like NVIDIA and potentially accelerating adoption in cost-sensitive regions with chips like Metax and Zhenwu.

📎 Sources (5)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.

  1. westurner.github.io
  2. news.ycombinator.com
  3. ludwigabap.com
  4. t.me
  5. xagi.in

FlagOS has adapted Alibaba's Qwen3.5-397B-A17B, the largest open-source multimodal MoE model with 397B total parameters and 17B active, for seamless deployment on Metax, Zhenwu, and NVIDIA chips. It provides vLLM-plugin-FL for unified multi-chip inference and verified BF16 dual-machine 16-card setups. Ready-to-use models are available on HuggingFace and ModelScope.

Key Points

  1. Adapted Qwen3.5-397B for Metax, Zhenwu, and NVIDIA chips with full precision alignment
  2. vLLM-plugin-FL enables zero-code-change multi-chip inference
  3. BF16 versions support dual-machine 16-card deployment
  4. Direct downloads from FlagRelease on HuggingFace and ModelScope

Impact Analysis

This release solves key pain points in multi-chip adaptation, enabling developers to run the top open MoE VLM across diverse hardware without code changes or performance loss. It promotes hardware diversity in AI deployments, especially for Chinese ecosystems.

Technical Details

Qwen3.5-397B-A17B is a native Vision-Language MoE model that excels at general capabilities, code generation, long context, agent reasoning, tool calling, and multimodal understanding. FlagOS enables cross-chip migration via a unified software stack, with verified high-efficiency inference.
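
A hedged sketch of what a dual-machine, 16-card BF16 layout could look like with vLLM's offline Python API: 8-way tensor parallel within a node and 2-way pipeline parallel across nodes. The parallel sizes are one illustrative mapping of 'dual-machine 16-card', not the configuration FlagOS verified, and a real multi-node run additionally needs a Ray cluster per the vLLM docs.

```python
# Illustrative only: 2 machines x 8 cards mapped to 8-way TP x 2-way PP, BF16 weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS",
    dtype="bfloat16",
    tensor_parallel_size=8,      # cards per machine (assumed)
    pipeline_parallel_size=2,    # machines (assumed)
    trust_remote_code=True,
)
outputs = llm.generate(["Describe the FlagOS release."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```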

AI-curated news aggregator. All content rights belong to original publishers.
Original source: 机器之心