Reddit r/LocalLLaMA
More MoE experts: meaningful gains?
Does scaling MoE experts beyond A3B pay off? An easy llama.cpp test revives an old debate.
30-Second TL;DR
What Changed
Debate over whether running Qwen3-30B-A3B with more active experts (the A6B configuration) yields meaningful gains.
Why It Matters
If tests confirm gains, this could revive interest in tuning MoE settings for local LLMs, since the number of active experts can be changed at inference time without retraining the model.
What To Do Next
Benchmark Qwen3-30B-A6B in llama.cpp on your own tasks to see whether the extra experts help.
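One way to run that comparison without downloading a separate model is to override the active-expert count of a stock Qwen3-30B-A3B GGUF at load time. The sketch below only builds and prints `llama-cli` command lines for k = 8 and k = 16. It assumes llama.cpp's `--override-kv KEY=TYPE:VALUE` option and assumes the metadata key is `qwen3moe.expert_used_count`; the model path, the `llama_cli_command` helper, and the prompt are hypothetical, and flags can differ between llama.cpp versions, so check `llama-cli --help` first.

```python
import shlex

# Hypothetical path; point this at your own Qwen3-30B-A3B GGUF file.
MODEL_PATH = "models/Qwen3-30B-A3B-Q4_K_M.gguf"
# Assumed GGUF metadata key for the Qwen3 MoE active-expert count;
# confirm it against your file's metadata before relying on it.
EXPERT_KEY = "qwen3moe.expert_used_count"


def llama_cli_command(model_path, experts, prompt, n_predict=256):
    """Build a llama-cli argument list that overrides the active-expert count."""
    return [
        "llama-cli",
        "-m", model_path,
        # --override-kv takes KEY=TYPE:VALUE and rewrites model metadata at load time.
        "--override-kv", f"{EXPERT_KEY}=int:{experts}",
        "-p", prompt,
        "-n", str(n_predict),
    ]


prompt = "Explain why the sky is blue in two sentences."
for k in (8, 16):  # default A3B routing vs. the A6B-style setting
    # Print a shell-ready command; run it yourself and compare output and tokens/s.
    print(shlex.join(llama_cli_command(MODEL_PATH, k, prompt)))
```

Keeping everything else fixed (model file, quantization, prompt, sampling) isolates the effect of the expert count on both quality and speed.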
Who should care: Researchers & Academics
Deep Insight
Web-grounded analysis with 8 cited sources.
Enhanced Key Takeaways
- Qwen3-30B-A3B has 30.5 billion total parameters, of which 3.3 billion are active per token. It has 48 layers and 128 experts, of which 8 are activated per token, and supports a 131K-token context window (32K natively, extended with YaRN)[8][1].
- Community-modified versions such as DavidAU's Qwen3-30B-A6B-16-Extreme raise the number of active experts to 16 (about 6B active parameters), trading inference speed for potentially deeper reasoning on nuanced tasks; on GPU they run at speeds comparable to 6B dense models[2].
- Qwen3-30B-A3B-Thinking-2507 is a variant refined over three further months to improve reasoning quality and depth[6]. Separately, the Qwen3 Coder 30B A3B Instruct variant outputs 25.6 tokens/second on Alibaba's API, which ranks low in speed among comparable open-weight models[3].
- NVIDIA's Nemotron-3-Nano-30B-A3B matches Qwen3-30B-A3B in local coding benchmarks but is slower at code generation[5].
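The parameter counts above can be turned into a rough estimate of what 16 active experts costs. This back-of-envelope sketch assumes active parameters grow linearly with the expert count k (active = shared + k × per-expert weights, where "shared" is everything outside the expert FFNs, always active) and solves for both terms from the 30.5B-total / 3.3B-active / 128-expert / k = 8 figures. The helpers `split_params` and `active_params` are hypothetical names, and because the inputs are rounded the result is approximate.

```python
def split_params(total_b, active_b, n_experts, k_default):
    """Solve total = shared + n*E and active = shared + k*E for shared and E."""
    per_expert = (total_b - active_b) / (n_experts - k_default)
    shared = active_b - k_default * per_expert
    return shared, per_expert


def active_params(shared, per_expert, k):
    """Estimated active parameters (billions) when k experts are used per token."""
    return shared + k * per_expert


# Inputs from the takeaways: 30.5B total, 3.3B active, 128 experts, k = 8.
shared, per_expert = split_params(30.5, 3.3, 128, 8)
print(f"shared ~{shared:.2f}B, per expert (all layers) ~{per_expert:.3f}B")
for k in (8, 16):
    print(f"k={k:2d}: ~{active_params(shared, per_expert, k):.2f}B active")
```

Under this linear assumption the k = 16 estimate comes out near 5.1B rather than 6B, so "A6B" is best read as a rounded label for the configuration.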
Competitor Analysis
| Model | Total Params | Active Params | Key Benchmarks | Speed (tokens/s) |
|---|---|---|---|---|
| Qwen3-30B-A3B | 30B | 3B | ArenaHard: 91.0, AIME'24/25: 80.4 | 25.6 (API, Coder 30B A3B Instruct variant) [3] |
| Nemotron-3-Nano-30B-A3B | 30B | 3B | Similar accuracy to GPT OSS 20B in coding evals | Comparable, slightly slower [5] |
| Qwen3-235B-A22B | 235B | 22B | Outperforms DeepSeek R1, GPT-4o in coding/math | Faster inference than giants [4] |
Technical Deep Dive
- Architecture: 30.5B total parameters, 3.3B active; 48 layers; 128 MoE experts, with 8 activated per token by default[8].
- Context: up to 131K input tokens; the modified A6B variant supports 32K input plus 8K output (40K total)[1][2].
- Inference: base A3B runs at reading speed locally; A6B-16-Extreme roughly halves tokens/s but activates ~6B parameters for complex tasks; GPU inference is 4x-8x faster than CPU[2][4].
- Variants: Qwen3 Coder 30B A3B Instruct scores 20/100 on the Artificial Analysis Intelligence Index (above average) and is verbose, generating 13M output tokens versus a median of 5.6M[3].
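To make "8 of 128 experts per token" concrete, here is a generic top-k routing sketch for a single token. It assumes softmax-then-top-k gating with the selected weights renormalized to sum to 1, a common MoE design; it is not Qwen's or llama.cpp's actual code, and `route_token` and the random logits are hypothetical stand-ins for a real router's output.

```python
import math
import random


def route_token(router_logits, k):
    """Pick the top-k experts for one token and return (expert_id, weight) pairs."""
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected experts' weights sum to 1.
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(128)]  # one token, 128 experts
for k in (8, 16):
    chosen = route_token(logits, k)
    top8_share = sum(w for _, w in chosen[:8])  # weight held by the 8 best experts
    print(f"k={k:2d}: {len(chosen)} experts, top-8 weight share {top8_share:.3f}")
```

Raising k from 8 to 16 keeps the same eight top-ranked experts and adds eight lower-ranked ones; because the weights are renormalized, the original experts' share of the output shrinks. Whether that dilution helps or hurts a model whose default configuration uses k = 8 is the empirical question behind the A6B debate.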
Future Implications (AI analysis grounded in cited sources)
MoE expert scaling beyond A3B will remain niche due to speed trade-offs
Modified A6B configs halve inference speed without proportional benchmark gains, limiting adoption to specialized deep-reasoning use cases[2].
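A rough way to see why more experts cost speed: single-stream decoding on local hardware is usually memory-bandwidth bound, so tokens/s scales roughly with 1 / (bytes of weights read per token). This sketch assumes every active weight is read exactly once per token and ignores attention, KV-cache traffic, and routing overhead; the 0.5625 bytes/param (about 4.5-bit quantization), the 1000 GB/s bandwidth, and the `decode_tokens_per_sec` helper are hypothetical. It is an idealized bound, not a measurement.

```python
def decode_tokens_per_sec(active_params_b, bytes_per_param, bandwidth_gb_s):
    """Upper-bound decode speed if every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical hardware and quantization; only the ratio between configs matters.
BYTES_PER_PARAM = 0.5625
BANDWIDTH_GB_S = 1000.0

# Active-parameter figures quoted in this article: 3.3B (A3B) and ~6B (A6B).
base = decode_tokens_per_sec(3.3, BYTES_PER_PARAM, BANDWIDTH_GB_S)
wide = decode_tokens_per_sec(6.0, BYTES_PER_PARAM, BANDWIDTH_GB_S)
print(f"A3B ~{base:.0f} tok/s, A6B ~{wide:.0f} tok/s, ratio {wide / base:.2f}")
```

With these inputs the predicted speed ratio is 3.3 / 6 = 0.55, in the same range as the halving reported in [2]; the absolute tokens/s values depend entirely on the assumed hardware.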
Qwen3-30B-A3B variants will dominate efficient local coding
Post-release updates like Thinking-2507 will drive iterative MoE improvements
Three months of scaling enhanced reasoning depth, showing viability of targeted post-release optimizations[6].
Timeline
2025-07
Qwen3-30B-A3B-Thinking-2507 released after three months of further reasoning scaling
2025-12
DavidAU released the Qwen3-30B-A6B-16-Extreme finetune, increasing active experts to 16
Sources (8)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- [1] datacamp.com: Qwen3
- [2] Hugging Face: Qwen3 30b A6b 16 Extreme
- [3] artificialanalysis.ai: Qwen3 Coder 30b A3b Instruct
- [4] dev.to: Qwen 3 vs Deep Seek R1 Evaluation Notes 1bi1
- [5] grigio.org: Opencode Local LLM Test with Nemotron 3 Nano 30b A3b vs Qwen3 Coder 30b A3b vs Gpt Oss 20b Mxfp4
- [6] modelscope.cn: Qwen3 30b A3b Thinking 2507
- [7] youtube.com: Watch
- [8] openrouter.ai: Qwen3 Coder 30b A3b Instruct
Original source: Reddit r/LocalLLaMA