Qwen3.5-397B Uncensored NVFP4 Released

๐กUncensored 397B Qwen in NVFP4โrun massive uncensored LLM locally now
โก 30-Second TL;DR
What Changed
Uncensored version of Qwen3.5-397B model
Why It Matters
Uncensored 397B Qwen quant democratizes access to top-tier local AI without alignment limits. Boosts experimentation for researchers on consumer hardware.
What To Do Next
Download the Qwen3.5-397B Uncensored NVFP4 from the Reddit link and run inference benchmarks locally.
๐ง Deep Insight
Web-grounded analysis with 9 cited sources.
๐ Enhanced Key Takeaways
- โขQwen3.5-397B-A17B is a native multimodal model from Alibaba's Qwen team, supporting text, image, and video inputs via early fusion training, achieving top benchmarks like 87.8% MMLU-Pro and 88.6% MathVision[1][2][3].
- โขFeatures Hybrid MoE architecture with 397B total parameters but only 17B active per token, enabling 8.6-19x faster decoding than Qwen3-Max at long contexts[1][2][4].
- โขReleased openly by QwenLM on GitHub on 2026-02-16 as the first in Qwen3.5 series, ranking #3 on Artificial Analysis Intelligence Index with score of 45[3][9].
- โขAvailable via NVIDIA NIM and Together AI APIs with FP4 quantization option, priced at $0.60 input / $3.60 output per million tokens on Together[1][2].
๐ Competitor Analysisโธ Show
| Model | Total/Active Params | Context Length | Key Benchmarks | Pricing (Together AI) |
|---|---|---|---|---|
| Qwen3.5-397B-A17B | 397B/17B | 262K (ext. 1M) | 87.8% MMLU-Pro, 88.6% MathVision | $0.60 in / $3.60 out |
| GLM-5 | 744B/40B | N/A | Intelligence Index 50 | N/A |
| Kimi K2.5 | 1T/32B | N/A | Intelligence Index 47 | N/A |
| DeepSeek V3.2 | 671B/37B | N/A | N/A | N/A [3][4] |
๐ ๏ธ Technical Deep Dive
- โขArchitecture: Transformer Causal LM with Vision Encoder; Hybrid MoE using Gated DeltaNet (64 linear attention heads for V, 16 for QK, head dim 128) + Gated Attention MoE; 60 layers, hidden dim 4,096, vocab 248,320[1].
- โขEfficiency: Sparse MoE with 10 routed + 1 shared expert out of 512; multi-token prediction (MTP); YaRN RoPE scaling to 1M tokens; 256K native context[1][2].
- โขInputs: Unified multimodal (text, RGB images, MP4/WebM video); ViT encoder fused early with LM; supports 1D/2D/3D parameters[1].
- โขPerformance: 8.6x faster at 32K, 19x at 256K vs Qwen3-Max; multilingual math 73.3% PolyMATH; instruction 76.5% IFBench[2][4].
๐ฎ Future ImplicationsAI analysis grounded in cited sources
โณ Timeline
๐ Sources (9)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- build.nvidia.com โ Modelcard
- together.ai โ Qwen3 5 397b A17b
- artificialanalysis.ai โ Qwen3 5 397b A17b Everything You Need to Know
- alibabacloud.com โ 602894
- latent.space โ Ainews Qwen35 397b A17b the Smallest
- naga.ac โ Uptime
- docs.api.nvidia.com โ Qwen Qwen3 5 397b A17b
- qwen.ai โ Blog
- GitHub โ Qwen3
Weekly AI Recap
Read this week's curated digest of top AI events โ
๐Related Updates
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA โ

