
Qwen3.5-397B Uncensored NVFP4 Released

🦙 Read original on Reddit r/LocalLLaMA
#uncensored-model #quantization #nvfp4 #qwen #qwen3.5-397b-uncensored-nvfp4

💡 Uncensored 397B Qwen in NVFP4: run a massive uncensored LLM locally

⚡ 30-Second TL;DR

What Changed

An uncensored version of the Qwen3.5-397B model, quantized to NVFP4, has been released.

Why It Matters

An uncensored quant of the 397B Qwen model broadens access to top-tier local AI without alignment restrictions, which could boost experimentation for researchers running large models on their own hardware.

What To Do Next

Download the Qwen3.5-397B Uncensored NVFP4 weights via the Reddit post and run inference benchmarks locally.
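To make the benchmarking step concrete, here is a minimal, backend-agnostic timing sketch. It assumes you wrap your local inference stack in a function that takes a prompt and returns the generated token IDs. `benchmark_throughput` and `fake_generate` are hypothetical names, and the dummy generator only stands in for a real model. The measured rate covers prompt processing plus decoding, so treat it as an end-to-end figure rather than pure decode speed.

```python
import statistics
import time
from typing import Callable, List


def benchmark_throughput(generate: Callable[[str], List[int]],
                         prompt: str,
                         runs: int = 3) -> dict:
    """Time several generate() calls and report tokens/second statistics.

    `generate` is a hypothetical wrapper around your local backend; it must
    return the list of generated token IDs for the given prompt.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)  # end-to-end tokens per second
    return {
        "runs": runs,
        "median_tok_per_s": statistics.median(rates),
        "min_tok_per_s": min(rates),
        "max_tok_per_s": max(rates),
    }


# Dummy generator standing in for a real model: 64 tokens after ~0.1 s.
def fake_generate(prompt: str) -> List[int]:
    time.sleep(0.1)
    return list(range(64))


print(benchmark_throughput(fake_generate, "Hello", runs=2))
```

Reporting the median over several runs keeps one slow warm-up call from skewing the result.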

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 9 cited sources.

🔑 Enhanced Key Takeaways

  • Qwen3.5-397B-A17B is a native multimodal model from Alibaba's Qwen team that accepts text, image, and video inputs through early-fusion training; it scores 87.8% on MMLU-Pro and 88.6% on MathVision[1][2][3].
  • Uses a hybrid MoE architecture with 397B total parameters but only 17B active per token, enabling decoding 8.6x to 19x faster than Qwen3-Max at long contexts[1][2][4].
  • Released as open weights by QwenLM on GitHub on 2026-02-16 as the first model in the Qwen3.5 series; it ranks #3 on the Artificial Analysis Intelligence Index with a score of 45[3][9].
  • Available through NVIDIA NIM and Together AI APIs with an FP4 quantization option; Together prices it at $0.60 per million input tokens and $3.60 per million output tokens[1][2].
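The Together AI prices quoted above translate into a simple cost formula. Here is a minimal sketch, assuming those per-million-token rates and no other fees. The example workload (10,000 requests of 2,000 input and 500 output tokens each) is hypothetical.

```python
# Together AI prices quoted above, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.60
OUTPUT_PRICE_PER_M = 3.60


def api_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload at the quoted per-million-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload: 10,000 requests, 2,000 input + 500 output tokens each.
requests = 10_000
cost = api_cost(requests * 2_000, requests * 500)
print(f"${cost:.2f}")  # $30.00 (20M input -> $12.00, 5M output -> $18.00)
```

Output tokens cost six times as much as input tokens at these rates, so generation-heavy workloads dominate the bill even when prompts are long.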
📊 Competitor Analysis

| Model | Total / Active Params | Context Length | Key Benchmarks | Pricing (Together AI, per 1M tokens) |
|---|---|---|---|---|
| Qwen3.5-397B-A17B | 397B / 17B | 262K (extendable to 1M) | 87.8% MMLU-Pro, 88.6% MathVision | $0.60 in / $3.60 out |
| GLM-5 | 744B / 40B | N/A | Intelligence Index 50 | N/A |
| Kimi K2.5 | 1T / 32B | N/A | Intelligence Index 47 | N/A |
| DeepSeek V3.2 | 671B / 37B | N/A | N/A | N/A |

Sources: [3][4]
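One way to read the parameter column is as the fraction of weights each model uses per token. The sketch below computes that fraction from the table's figures, treating Kimi K2.5's 1T as 1,000B (an approximation). Per-token compute scales roughly with active parameters, but every weight still has to be held in memory.

```python
# (total, active) parameter counts in billions, from the competitor table.
MODELS = {
    "Qwen3.5-397B-A17B": (397, 17),
    "GLM-5": (744, 40),
    "Kimi K2.5": (1000, 32),   # 1T approximated as 1,000B
    "DeepSeek V3.2": (671, 37),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters a sparse MoE model uses for each token."""
    return active_b / total_b


for name, (total, active) in MODELS.items():
    print(f"{name:<20} {active_fraction(total, active):6.2%} active per token")
```

By this measure Qwen3.5-397B-A17B uses about 4.3% of its weights per token, between Kimi K2.5 (3.2%) and GLM-5 or DeepSeek V3.2 (about 5.4% to 5.5%).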

๐Ÿ› ๏ธ Technical Deep Dive

  • Architecture: causal Transformer LM with a vision encoder; hybrid MoE design combining Gated DeltaNet linear attention (64 heads for V, 16 for QK, head dimension 128) with Gated Attention; 60 layers, hidden size 4,096, vocabulary of 248,320 tokens[1].
  • Efficiency: sparse MoE that activates 10 routed experts plus 1 shared expert out of 512; multi-token prediction (MTP); 256K (262,144-token) native context, extendable to 1M tokens with YaRN RoPE scaling[1][2].
  • Inputs: unified multimodal input (text, RGB images, MP4/WebM video); a ViT encoder is fused early with the language model; supports 1D/2D/3D parameters[1].
  • Performance: decoding is 8.6x faster than Qwen3-Max at 32K context and 19x faster at 256K; 73.3% on PolyMATH (multilingual math); 76.5% on IFBench (instruction following)[2][4].
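The expert counts above (10 routed plus 1 shared, out of 512) describe top-k MoE routing. What follows is a generic sketch of that technique, not Qwen's implementation. The softmax-then-top-k gate, the renormalization of the selected weights, and the toy scalar "experts" are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

NUM_EXPERTS = 512  # routed experts per MoE layer, as stated above
TOP_K = 10         # routed experts activated per token, as stated above


def route(gate_logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the k highest-scoring experts.

    Generic softmax top-k gating; the selected weights are renormalized
    to sum to 1. Qwen's exact gating may differ.
    """
    m = max(gate_logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


def moe_forward(x: float,
                experts: List[Callable[[float], float]],
                shared_expert: Callable[[float], float],
                gate_logits: List[float]) -> float:
    """The shared expert always runs; only the k routed experts are evaluated."""
    out = shared_expert(x)
    for idx, weight in route(gate_logits):
        out += weight * experts[idx](x)
    return out


# Toy demo: scalar functions stand in for the real feed-forward experts.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, s=i: s * x for i in range(NUM_EXPERTS)]
chosen = route(logits)
print(len(chosen), round(sum(w for _, w in chosen), 6))  # 10 1.0
print(moe_forward(1.0, experts, lambda x: x, logits))
```

Only 11 of 513 expert blocks run per token, which is the main reason the active parameter count is a small fraction of the total.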
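Gated DeltaNet, named in the architecture line, is a linear-attention layer whose per-head state matrix is updated with a gated delta rule. Below is a minimal single-head sketch, assuming the recurrence from the Gated DeltaNet literature, S_t = α_t·S_{t-1}(I − β_t·k_t·k_tᵀ) + β_t·v_t·k_tᵀ, with output o_t = S_t·q_t. The tiny dimensions and hand-picked gates α (decay) and β (write strength) are illustrative. Production kernels compute the gates from the input and process tokens in parallel chunks.

```python
from typing import List

Matrix = List[List[float]]  # state S, shape d_v x d_k
Vector = List[float]


def gated_delta_step(S: Matrix, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Matrix:
    """S_t = alpha * (S - beta * (S k) k^T) + beta * v k^T."""
    # What the current state returns for key k (one entry per value dim).
    Sk = [sum(row[j] * k[j] for j in range(len(k))) for row in S]
    return [[alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
             for j in range(len(k))]
            for i in range(len(v))]


def read(S: Matrix, q: Vector) -> Vector:
    """Output o_t = S_t q_t."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


d_k, d_v = 3, 2
S = [[0.0] * d_k for _ in range(d_v)]
key = [1.0, 0.0, 0.0]  # unit-norm key
S = gated_delta_step(S, key, [1.0, 2.0], alpha=1.0, beta=1.0)
print(read(S, key))  # [1.0, 2.0]
S = gated_delta_step(S, key, [5.0, -1.0], alpha=1.0, beta=1.0)
print(read(S, key))  # [5.0, -1.0]
```

With plain additive linear attention the second read would return [6.0, 1.0]. The delta rule first subtracts what the state already stores under that key, so the new value replaces the old one, while α < 1 lets older memories decay.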

🔮 Future Implications
AI analysis grounded in cited sources

The uncensored NVFP4 quantization will lower hardware barriers to local deployment
Packing a 397B-parameter MoE model (only 17B active per token) into the compact NVFP4 format targets enthusiasts who want to run models of this size on their own NVIDIA GPUs, which was previously infeasible[1][2].
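To make the memory claim concrete, here is a simplified sketch of NVFP4-style block quantization plus a weights-only footprint estimate. It assumes the commonly documented NVFP4 layout: 4-bit E2M1 values (magnitudes 0 to 6), with one 8-bit scale shared by each block of 16 values, or about 4.5 bits per weight. For simplicity the block scale is kept in full precision rather than rounded to FP8 E4M3, and the per-tensor FP32 scale is omitted. The estimate also assumes every parameter is quantized and ignores the KV cache and activations.

```python
from typing import List, Tuple

# Non-negative magnitudes of a 4-bit E2M1 float (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16       # values sharing one scale factor in NVFP4
SCALE_BITS = 8   # one FP8 (E4M3) scale per block


def quantize_block(xs: List[float]) -> Tuple[float, List[float]]:
    """Map the block's largest |x| to 6.0 and snap each value to the E2M1 grid.

    Simplification: the scale stays a Python float instead of being rounded to E4M3.
    """
    amax = max(abs(x) for x in xs)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in xs:
        # Nearest grid magnitude; ties resolve to the smaller magnitude.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def dequantize_block(scale: float, codes: List[float]) -> List[float]:
    return [scale * c for c in codes]


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


weights = [0.12, -0.5, 0.03, 0.9, -0.27, 0.0, 0.61, -0.08,
           0.33, 0.45, -0.9, 0.2, 0.05, -0.15, 0.7, -0.6]
scale, codes = quantize_block(weights)
approx = dequantize_block(scale, codes)
print("max abs error:", max(abs(a - w) for a, w in zip(approx, weights)))

nvfp4_bits = 4 + SCALE_BITS / BLOCK  # 4.5 bits per weight including scales
print(f"NVFP4: {weight_gb(397e9, nvfp4_bits):.1f} GB")  # NVFP4: 223.3 GB
print(f"BF16:  {weight_gb(397e9, 16):.1f} GB")          # BF16:  794.0 GB
```

Under these assumptions the weights alone shrink from about 794 GB in BF16 to about 223 GB in NVFP4, roughly a 3.6x reduction. That is still far more than a single consumer GPU holds, so local deployment at this size means multi-GPU or large-memory setups.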
Accelerates open-source multimodal agent development
Local, uncensored access to a native vision-language model with strong agent benchmarks enables rapid experimentation with RAG, GUI interaction, and video tasks without API limits[1][3][6].
Challenges proprietary VLMs in efficiency
Decoding 8.6x to 19x faster than Qwen3-Max, with performance comparable to larger dense models, positions quantized Qwen3.5 as a viable alternative for production workflows[2][4].

โณ Timeline

2026-02-16
Qwen3.5 series launched with the open-weight release of the 397B-A17B MoE model on GitHub[9]
2026-02
Model deployed on NVIDIA NIM, Together AI, and Alibaba Cloud APIs with FP4 quantization support[1][2][4]
2026-03-02
Uncensored NVFP4 quantization of Qwen3.5-397B shared on Reddit r/LocalLLaMA for local inference

AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA