Qwen3.5-35B-A3B RTX 5080 Benchmarks Update
💡 Proves KV q8_0 is a free speed boost for Qwen MoE on the RTX 5080: test it now for 74 tok/s.
⚡ 30-Second TL;DR
What Changed
KV cache quantization to q8_0 confirmed as a 'free lunch', with <0.4% perplexity (PPL) change
Why It Matters
Optimizes local inference for MoE models on consumer GPUs, enabling faster speeds without quality trade-offs for AI builders running large models.
What To Do Next
Run llama.cpp with -ctk q8_0 -ctv q8_0 on your Qwen3.5-35B-A3B to boost throughput by 12-38%.
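A minimal command sketch of the flags above. The model path, quantization level, context size, and prompt are placeholders; this assumes a recent llama.cpp build with the llama-cli binary (quantized V-cache typically also requires flash attention, enabled here with -fa):

```shell
# Quantize both K and V caches to q8_0 (-ctk / -ctv), enable flash
# attention (-fa, usually required for a quantized V cache), and
# offload all layers to the GPU (-ngl 99).
# Model filename is illustrative; point -m at your local GGUF file.
./llama-cli \
  -m ./models/Qwen3.5-35B-A3B-Q4_K_M.gguf \
  -ctk q8_0 -ctv q8_0 \
  -fa \
  -ngl 99 \
  -c 16384 \
  -p "Explain KV cache quantization in one paragraph."
```

Halving the KV cache footprint versus f16 is also what frees VRAM for longer contexts on a 16 GB card like the RTX 5080.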
🧠 Deep Insight
Web-grounded analysis with 7 cited sources.
📌 Enhanced Key Takeaways
- Qwen3.5-35B-A3B released on February 24, 2026, as part of Alibaba's Qwen3.5 series, which emphasizes 'more intelligence, less compute'; its MoE architecture outperforms larger predecessors[1][2][6].
- Supports a 262,144-token context length and native multimodal inputs (text, image, video), with benchmark scores of 84.5% on GPQA and 89.2% on Tau-Bench[1][2][3].
- Features Gated Delta Networks and sparse MoE (256 experts; 8 routed + 1 shared active) for efficient inference, comparable to the dense Qwen3.5-27B model[1][2].
- Available via APIs at $0.25/1M input tokens and $2.00/1M output tokens; supports a reasoning mode with step-by-step thinking[2][5].
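At the quoted rates, per-request cost is straightforward arithmetic. A small helper, assuming the $0.25/$2.00 per-million-token prices above (the function name and example token counts are illustrative):

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_price: float = 0.25, out_price: float = 2.00) -> float:
    """Estimate request cost at the quoted Qwen3.5-35B-A3B rates:
    $0.25 per 1M input tokens, $2.00 per 1M output tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# e.g. a 100k-token prompt producing a 4k-token answer:
print(round(api_cost_usd(100_000, 4_000), 4))  # 0.033
```

Note the 8x input/output price asymmetry: long-context prompts stay cheap, while generation-heavy workloads dominate the bill.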
🛠️ Technical Deep Dive
- 35B total parameters, 3B activated; hybrid architecture with linear attention, sparse Mixture-of-Experts (256 total experts, 8 routed + 1 shared active), RoPE positional embeddings, SwiGLU activations, and RMSNorm[1][2][5].
- Unified vision-language foundation via early-fusion training on multimodal tokens, covering reasoning, coding, agents, and visual understanding[1][7].
- Scalable RL training across million-agent environments; supports 201 languages/dialects and tool use[1].
- 64-layer transformer stack[5].
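The "3B of 35B activated" figure comes from top-k expert routing: per token, a router scores all 256 experts and only the 8 highest-scoring ones (plus 1 always-on shared expert) run. A minimal NumPy sketch of that routing step, with illustrative shapes and names not taken from the model's actual code:

```python
import numpy as np

def route_tokens(hidden: np.ndarray, router_w: np.ndarray,
                 n_routed: int = 8) -> np.ndarray:
    """Top-k expert selection as in a sparse MoE layer.

    hidden:   (n_tokens, d_model) token representations
    router_w: (d_model, n_experts) router projection
    Returns indices of the n_routed experts chosen per token;
    a shared expert (not shown) would additionally run for every token.
    """
    logits = hidden @ router_w                         # (n_tokens, n_experts)
    return np.argsort(logits, axis=-1)[:, -n_routed:]  # top-8 expert ids

rng = np.random.default_rng(0)
d_model, n_experts = 64, 256
hidden = rng.standard_normal((4, d_model))
router_w = rng.standard_normal((d_model, n_experts))
routed = route_tokens(hidden, router_w)
print(routed.shape)  # (4, 8): each token activates 8 of 256 experts
```

Because only 9 of 256 experts fire per token, compute per forward pass scales with the ~3B active parameters, not the full 35B, which is why the model decodes like a small model while storing a large one.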
🔮 Future Implications
AI analysis grounded in cited sources.
⏳ Timeline
📚 Sources (7)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: Reddit r/LocalLLaMA →