๐ฆReddit r/LocalLLaMAโขStalecollected in 3h
Qwen3.5-122B Uncensored GGUF Released
๐กUncensored 122B MoE GGUF with breakthrough K_P quants beats standard quality.
โก 30-Second TL;DR
What Changed
0/465 refusals in testing, fully uncensored original Qwen release
Why It Matters
This release enables local deployment of a high-capability uncensored 122B MoE model, ideal for practitioners needing refusal-free inference without cloud dependency. K_P quants lower hardware barriers for quality performance.
What To Do Next
Download Q4_K_P GGUF from huggingface.co/HauhauCS and test with llama.cpp --jinja flag.
Who should care:Developers & AI Engineers
๐ง Deep Insight
AI-generated analysis for this event.
๐ Enhanced Key Takeaways
- โขThe 'K_P' quantization method represents a specialized optimization for Mixture-of-Experts (MoE) models, specifically targeting the routing weights and expert layers to maintain perplexity closer to FP16 than standard K-quants.
- โขThe Qwen3.5 series architecture utilizes a refined 'DeepSeek-style' MoE routing mechanism, which significantly reduces the compute-to-parameter ratio compared to dense models of similar total parameter counts.
- โขThe 'Uncensored' designation for this specific release involves a fine-tuning process that systematically removes RLHF-based safety alignment layers, which often inadvertently degrades reasoning capabilities in standard Qwen releases.
๐ Competitor Analysisโธ Show
| Feature | Qwen3.5-122B-A10B | DeepSeek-V3 | Llama-3.3-70B-Instruct |
|---|---|---|---|
| Architecture | MoE (122B/10B active) | MoE (671B/37B active) | Dense (70B) |
| Context Window | 262K | 128K | 128K |
| Multimodal | Text/Image/Video | Text/Code | Text |
| Licensing | Apache 2.0 | MIT | Llama 3.3 Community License |
๐ ๏ธ Technical Deep Dive
- Architecture: Mixture-of-Experts (MoE) with 122B total parameters and 10B active parameters per token inference.
- Quantization: Implements K_P (K-Quants Plus) which utilizes a higher bit-depth for expert routing weights to mitigate the 'quantization noise' typically found in MoE models.
- Multimodal Integration: Uses a dedicated vision-language projector (mmproj) compatible with llama.cpp's multimodal architecture, supporting interleaved video frame processing.
- Context Handling: Employs RoPE (Rotary Positional Embeddings) with base frequency scaling to support the 262K context window without requiring full fine-tuning for every length increment.
๐ฎ Future ImplicationsAI analysis grounded in cited sources
K_P quantization will become the standard for local MoE deployment.
The demonstrated balance between memory footprint and perplexity retention makes it highly efficient for consumer-grade hardware with limited VRAM.
Uncensored MoE models will drive a shift toward decentralized fine-tuning.
The ability to strip alignment layers from large MoE models without losing reasoning capability encourages community-led model customization over centralized API-based models.
โณ Timeline
2025-09
Alibaba Cloud releases the foundational Qwen3.5 architecture.
2025-12
Qwen3.5-122B-A10B model weights officially published.
2026-03
HauhauCS releases the uncensored GGUF version with K_P quantization.
๐ฐ
Weekly AI Recap
Read this week's curated digest of top AI events โ
๐Related Updates
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA โ