
FlagOS Launches Qwen3.5 397B Multi-Chip Versions

🧠Read original on 机器之心

💡 The largest open MoE VLM (397B params) now runs out of the box on NVIDIA and Chinese chips.

⚡ 30-Second TL;DR

What Changed

Adapted Qwen3.5-397B for Metax, Zhenwu, and NVIDIA chips with full precision alignment

Why It Matters

This release solves key pain points in multi-chip adaptation, enabling developers to run the top open MoE VLM across diverse hardware without code changes or performance loss. It promotes hardware diversity in AI deployments, especially for Chinese ecosystems.

What To Do Next

Download FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS from HuggingFace and deploy by following the vLLM-plugin-FL README.
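As a minimal sketch of the download step (the repo id comes from this release note; the local directory is a placeholder assumption, not a path mandated by the FlagOS docs), the helper below just assembles a `huggingface-cli download` invocation:

```python
# Sketch: build the huggingface-cli command for fetching the released
# checkpoint. REPO_ID is from the announcement; local_dir is a placeholder.
REPO_ID = "FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS"

def hf_download_cmd(repo_id: str, local_dir: str) -> str:
    """Build a `huggingface-cli download` invocation for a model repo."""
    return f"huggingface-cli download {repo_id} --local-dir {local_dir}"

print(hf_download_cmd(REPO_ID, "./qwen3.5-397b"))
```

Serving details (plugin flags, parallelism settings) are in the vLLM-plugin-FL README and are not reproduced here.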

Who should care: Developers & AI engineers

🧠 Deep Insight

Web-grounded analysis with 5 cited sources.

🔑 Enhanced Key Takeaways

  • FlagOS adapted Alibaba's Qwen3.5-397B-A17B, the largest open-source multimodal MoE model with 397B total parameters and 17B active parameters, for deployment on Metax, Zhenwu, and NVIDIA chips.[article]
  • vLLM-plugin-FL provides unified multi-chip inference with zero code changes, leveraging vLLM's high-throughput features like paged attention and continuous batching.[article][3]
  • Verified BF16 dual-machine 16-card setups enable seamless deployment, aligning full precision across diverse hardware.[article]
  • Ready-to-use models available on HuggingFace and ModelScope via FlagRelease.[article]
  • Qwen3.5-397B-A17B added to LMSYS Arena for Text, Vision, and Code benchmarks alongside Claude Sonnet 4.6.[5]
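A back-of-envelope memory estimate (my assumptions: weights only at 2 bytes per BF16 parameter, ignoring KV cache, activations, and framework overhead) shows why the verified setup spans roughly 16 cards:

```python
# Rough sketch: BF16 weight footprint of a 397B-parameter model split over
# a dual-machine, 16-card deployment. Weights only -- KV cache, activations,
# and runtime overhead are deliberately ignored.
TOTAL_PARAMS = 397e9
BYTES_PER_PARAM_BF16 = 2
NUM_CARDS = 16  # 2 machines x 8 cards
GIB = 1024**3

weights_gib = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / GIB
per_card_gib = weights_gib / NUM_CARDS

print(f"total weights: {weights_gib:.0f} GiB, per card: {per_card_gib:.1f} GiB")
```

Roughly 740 GiB of weights, or about 46 GiB per card before any cache or activation memory, which is why a single node of typical accelerators is not enough.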
📊 Competitor Analysis

| Feature | FlagOS Qwen3.5-397B-A17B | AirLLM | vLLM |
| --- | --- | --- | --- |
| Model size | 397B total / 17B active MoE | Up to 405B | N/A (inference engine) |
| Hardware | Metax, Zhenwu, NVIDIA multi-chip | Low-memory GPUs (4-8 GB) | Multi-GPU / multi-node |
| Key tech | vLLM-plugin-FL, BF16 16-card | Layer-by-layer loading | Paged attention, continuous batching |
| Benchmarks | LMSYS Arena (Text/Vision/Code) | Up to 3x speed w/ compression | High-throughput serving |
| Pricing | Open-source, free download | Open-source | Open-source |

🛠️ Technical Deep Dive

  • Qwen3.5-397B-A17B is a multimodal Mixture-of-Experts (MoE) model with 397 billion total parameters but only 17 billion active per inference, optimizing compute efficiency.[article][5]
  • Supports BF16 precision with full numerical alignment across Metax and Zhenwu (Chinese AI accelerators) and NVIDIA GPUs, enabling dual-machine 16-card deployments.[article]
  • vLLM-plugin-FL integrates with vLLM for zero-code-change inference, building on vLLM's paged attention, continuous batching, prefix caching, and multi-GPU support.[article][3]
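The compute advantage of the 17B-active MoE design can be sketched with the common rule of thumb that decoding costs roughly 2 FLOPs per active parameter per token (an approximation for illustration, not a figure from the article):

```python
# Sketch: per-token decode FLOPs using the ~2 * active_params rule of thumb.
# Only 17B of the 397B total parameters are active per token.
TOTAL_PARAMS = 397e9
ACTIVE_PARAMS = 17e9

def decode_flops_per_token(active_params: float) -> float:
    """Approximate decode cost: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params

dense_equivalent = decode_flops_per_token(TOTAL_PARAMS)
moe_actual = decode_flops_per_token(ACTIVE_PARAMS)
ratio = dense_equivalent / moe_actual  # ~23x fewer FLOPs than a dense 397B
```

Under this approximation, each token costs about 23x less compute than a hypothetical dense 397B model, which is the core efficiency argument for the MoE architecture.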

🔮 Future Implications

AI analysis grounded in cited sources.

FlagOS's multi-chip adaptation of Qwen3.5-397B-A17B democratizes access to massive open-source multimodal models across diverse hardware, reducing reliance on single-vendor ecosystems like NVIDIA and potentially accelerating adoption in cost-sensitive regions with chips like Metax and Zhenwu.

📎 Sources (5)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.

  1. westurner.github.io — Hnlog
  2. news.ycombinator.com — Item
  3. ludwigabap.com — Bookmarks
  4. t.me — Githubtrending
  5. xagi.in — AI News

AI-curated news aggregator. All content rights belong to original publishers.
Original source: 机器之心