
🦙 Reddit r/LocalLLaMA • Mar 1, 2026

Qwen3.5-397B Uncensored NVFP4 Released


🦙Read original on Reddit r/LocalLLaMA

#uncensored-model #quantization #nvfp4 #qwen #qwen3.5-397b-uncensored-nvfp4

💡 Uncensored 397B Qwen in NVFP4: run a massive uncensored LLM locally

⚡ 30-Second TL;DR

What Changed

An uncensored version of Qwen3.5-397B-A17B, quantized to NVIDIA's 4-bit NVFP4 format, was released.

Why It Matters

An uncensored 4-bit quant of the 397B Qwen model widens access to top-tier local AI without alignment limits and gives researchers more room to experiment on their own hardware.

What To Do Next

Download the Qwen3.5-397B Uncensored NVFP4 weights from the link in the Reddit post and run inference benchmarks locally.

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 9 cited sources.

🔑 Enhanced Key Takeaways

•Qwen3.5-397B-A17B is a natively multimodal model from Alibaba's Qwen team that accepts text, image, and video inputs via early-fusion training, with reported scores of 87.8% on MMLU-Pro and 88.6% on MathVision[1][2][3].
•It uses a hybrid MoE architecture with 397B total parameters but only 17B active per token, enabling decoding that is 8.6x (32K context) to 19x (256K context) faster than Qwen3-Max[1][2][4].
•Released as open weights by QwenLM on GitHub on 2026-02-16 as the first model in the Qwen3.5 series; it ranks #3 on the Artificial Analysis Intelligence Index with a score of 45[3][9].
•Available through the NVIDIA NIM and Together AI APIs with an FP4 quantization option; Together prices it at $0.60 per million input tokens and $3.60 per million output tokens[1][2].
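
As a rough illustration of the Together AI pricing, the sketch below estimates the bill for a hypothetical batch job. Only the two per-million-token prices come from the text; the request count, prompt length, and answer length are made-up assumptions, and `api_cost_usd` is a hypothetical helper.

```python
# Together AI list prices quoted above, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = 0.60
OUTPUT_PRICE_PER_M = 3.60


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for the given input and output token counts."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload: 1,000 requests, each with a 4,000-token prompt and a 500-token answer.
requests = 1_000
cost = api_cost_usd(requests * 4_000, requests * 500)
print(f"${cost:.2f}")
```

With these assumed numbers the job costs about $4.20, and output tokens account for $1.80 of it despite being one-eighth as many tokens as the input, because they are priced 6x higher.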

📊 Competitor Analysis

Model	Total/Active Params	Context Length	Key Benchmarks	Pricing (Together AI, per 1M tokens)
Qwen3.5-397B-A17B	397B/17B	262K (ext. 1M)	87.8% MMLU-Pro, 88.6% MathVision, Intelligence Index 45	$0.60 in / $3.60 out
GLM-5	744B/40B	N/A	Intelligence Index 50	N/A
Kimi K2.5	1T/32B	N/A	Intelligence Index 47	N/A
DeepSeek V3.2	671B/37B	N/A	N/A	N/A

Table sources: [3][4]
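
Because per-token compute in an MoE model scales with the active parameters while memory must still hold every expert, the active share is a useful way to read the table. The sketch below computes it from the table's figures, treating Kimi K2.5's "1T" as 1,000B (an approximation); `active_fraction` is a hypothetical helper.

```python
# Total and active parameter counts (in billions) from the comparison table.
models = {
    "Qwen3.5-397B-A17B": (397, 17),
    "GLM-5": (744, 40),
    "Kimi K2.5": (1000, 32),  # "1T" approximated as 1,000B
    "DeepSeek V3.2": (671, 37),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters that take part in each token's forward pass."""
    return active_b / total_b


for name, (total_b, active_b) in models.items():
    print(f"{name:<20} {active_fraction(total_b, active_b):6.2%} active")
```

Qwen3.5-397B-A17B activates about 4.3% of its weights per token, versus 3.2% for Kimi K2.5 and roughly 5.4% to 5.5% for GLM-5 and DeepSeek V3.2.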

🛠️ Technical Deep Dive

•Architecture: causal Transformer LM with a vision encoder; hybrid layout combining Gated DeltaNet linear-attention layers (64 heads for V, 16 for QK, head dim 128) with Gated Attention layers, paired with sparse MoE blocks; 60 layers, hidden dim 4,096, vocabulary of 248,320 tokens[1].
•Efficiency: sparse MoE activating 10 routed experts plus 1 shared expert per token, out of 512; multi-token prediction (MTP); 262,144-token (256K) native context, extendable to about 1M tokens with YaRN RoPE scaling[1][2].
•Inputs: unified multimodal input (text, RGB images, MP4/WebM video); a ViT encoder is fused early with the LM; input dimensionality is listed as 1D (text), 2D (images), and 3D (video)[1].
•Performance: decoding is 8.6x faster than Qwen3-Max at 32K context and 19x faster at 256K; 73.3% on PolyMATH (multilingual math) and 76.5% on IFBench (instruction following)[2][4].
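
The expert routing described above (10 routed experts plus 1 shared expert per token, out of 512) can be sketched as follows. This is a toy pure-Python illustration, not Qwen's implementation: the softmax-then-top-k gating with renormalized weights, the unweighted shared expert, and the scalar-multiply "experts" are simplifying assumptions, the hidden size is shrunk from 4,096 to 8, and `route`, `moe_layer`, and `make_expert` are hypothetical names.

```python
import math
import random

NUM_EXPERTS = 512  # routed experts, per the text
TOP_K = 10         # routed experts selected per token, per the text
HIDDEN = 8         # toy hidden size (the text gives 4,096)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the top-k experts and renormalize their gate weights to sum to 1 (assumed scheme)."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


def moe_layer(x, router_logits, experts, shared_expert):
    """Always-on shared expert plus the gate-weighted sum of the selected routed experts."""
    out = shared_expert(x)
    for idx, weight in route(router_logits):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale):
    # Toy expert: elementwise scaling stands in for a gated feed-forward network.
    return lambda x: [scale * v for v in x]


rng = random.Random(0)
experts = [make_expert(rng.uniform(-1, 1)) for _ in range(NUM_EXPERTS)]
shared = make_expert(0.5)
token = [rng.gauss(0, 1) for _ in range(HIDDEN)]
logits = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route(logits)
out = moe_layer(token, logits, experts, shared)
print(len(selected), round(sum(w for _, w in selected), 6), len(out))
```

Only 11 expert functions run for each token, which is how a 397B-parameter model can keep its per-token compute close to that of a 17B model.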
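
The Gated DeltaNet layers replace softmax attention with a fixed-size recurrent state per head. The sketch below implements the gated delta rule from the Gated DeltaNet paper, S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t, for a single head at toy size. It is a sequential reference loop, not the chunked parallel kernel a real implementation would use, and how Qwen3.5 computes α_t and β_t (plus any normalization or output gating) is not covered; `gated_delta_step` and `read` are hypothetical names.

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of the gated delta rule.

    S is a d_v x d_k state (list of rows); q and k have length d_k; v has length d_v.
    alpha in (0, 1] decays the old state; beta in (0, 1] sets the write strength.
    Computes S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T and returns (S_t, S_t q).
    """
    Sk = [sum(row[j] * k[j] for j in range(len(k))) for row in S]  # what S currently returns for k
    new_S = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]
    return new_S, read(new_S, q)


def read(S, q):
    """Query the state: returns S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


d_k = d_v = 4  # toy sizes; the text gives a head dim of 128
S = [[0.0] * d_k for _ in range(d_v)]
k1, v1 = [1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]
k2, v2 = [0.0, 1.0, 0.0, 0.0], [5.0, 6.0, 7.0, 8.0]

S, _ = gated_delta_step(S, k1, k1, v1, alpha=1.0, beta=1.0)    # store (k1, v1)
S, out = gated_delta_step(S, k2, k2, v2, alpha=0.5, beta=1.0)  # decay by half, store (k2, v2)
print(out)          # [5.0, 6.0, 7.0, 8.0]
print(read(S, k1))  # [0.5, 1.0, 1.5, 2.0]
```

The −β S k kᵀ term erases whatever the state currently returns for k before writing v, while α fades older associations. Because the state size does not grow with sequence length, this is one plausible reason hybrid models of this kind decode faster at long context.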
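
The context figures above fit together as follows: 262,144 tokens is exactly 256 × 1,024, and stretching that to about 1M tokens needs a YaRN scaling factor of roughly 4. The sketch computes that factor and builds a `rope_scaling` dictionary in the style earlier Qwen model cards have used with Hugging Face transformers. The key names for Qwen3.5 and the choice to round up to a whole-number factor are assumptions, not confirmed settings for this model.

```python
import math

NATIVE_CONTEXT = 262_144    # 256 x 1,024 tokens, the native window quoted above
TARGET_CONTEXT = 1_000_000  # the "1M" extended window quoted above

# Round up so the scaled window covers the target length (an assumption of this sketch).
factor = math.ceil(TARGET_CONTEXT / NATIVE_CONTEXT)

# Hypothetical config fragment; key names follow the rope_scaling convention of
# earlier Qwen model cards and are an assumption for this model.
rope_scaling = {
    "rope_type": "yarn",
    "factor": float(factor),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}
print(rope_scaling, NATIVE_CONTEXT * factor)
```

A factor of 4 yields 1,048,576 positions, slightly above the 1M figure quoted in the text.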

🔮 Future Implications: AI analysis grounded in cited sources

NVFP4 uncensored quantization will lower hardware barriers for local deployment

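The 397B MoE activates only 17B parameters per token, and NVFP4 stores weights at roughly 4.5 bits each, about 3.5x smaller than BF16[1][2]. That brings a model of this size within reach of enthusiasts running local NVIDIA hardware, which was previously infeasible, although the weights alone still exceed any single consumer GPU's memory, so multi-GPU setups (ideally Blackwell-generation, where NVFP4 is hardware-accelerated) remain necessary.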
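
The footprint arithmetic behind this claim can be checked with a small simulation of NVFP4's block format, in which each 16-value block stores 4-bit E2M1 codes plus one shared FP8 (E4M3) scale. This is a simplified sketch: the block scale stays a Python float instead of being rounded to E4M3, NVFP4's per-tensor FP32 scale is omitted, and the sample weights are made up. The footprint figures count weights only and assume every parameter is quantized, so a real checkpoint that keeps some tensors in higher precision will be larger, and KV cache and activations come on top.

```python
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # values an FP4 (E2M1) code can take
BLOCK = 16        # NVFP4 shares one scale per 16 values
SCALE_BITS = 8    # per-block FP8 (E4M3) scale
ELEMENT_BITS = 4  # per-value FP4 code


def quantize_block(values):
    """Simulated NVFP4 quantization of one block: returns (scale, signed E2M1 values).

    Simplification: the scale is not rounded to FP8 E4M3, and no per-tensor scale is used.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block's largest magnitude to 6
    codes = []
    for v in values:
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))  # round to nearest
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def dequantize_block(scale, codes):
    return [scale * c for c in codes]


def bits_per_weight():
    return ELEMENT_BITS + SCALE_BITS / BLOCK  # 4 + 8/16 = 4.5


def weight_gb(params, bits):
    return params * bits / 8 / 1e9


block = [0.12, -0.5, 0.33, 0.9, -1.2, 0.05, 0.0, 2.4, -0.7, 0.6, 1.1, -2.0, 0.25, -0.05, 1.8, 0.4]
scale, codes = quantize_block(block)
approx = dequantize_block(scale, codes)
print("max abs error:", max(abs(a - b) for a, b in zip(block, approx)))

TOTAL_PARAMS = 397e9
print(f"BF16:  {weight_gb(TOTAL_PARAMS, 16):.0f} GB")
print(f"NVFP4: {weight_gb(TOTAL_PARAMS, bits_per_weight()):.0f} GB")
```

At about 4.5 bits per weight, 397B parameters come to roughly 223 GB versus about 794 GB at BF16, which still exceeds the memory of any single consumer GPU.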

Accelerates open-source multimodal agent development

Uncensored local access to a native vision-language model with strong agent benchmarks enables rapid experimentation with RAG, GUI interaction, and video tasks without API limits[1][3][6].

Challenges proprietary VLMs in efficiency

The reported 8.6x to 19x decoding speedup over Qwen3-Max, combined with performance comparable to larger dense models, positions a quantized Qwen3.5 as a viable alternative for production workflows[2][4].

⏳ Timeline

2026-02-16

Qwen3.5 series launched with open-weight release of 397B-A17B MoE model on GitHub[9]

2026-02

Model deployed on NVIDIA NIM, Together AI, and Alibaba Cloud APIs with FP4 quantization support[1][2][4]

2026-03-02

Uncensored NVFP4 quantization of Qwen3.5-397B shared on Reddit r/LocalLLaMA for local inference

📎 Sources (9)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.



AI-curated news aggregator. All content rights belong to original publishers.