Qwen3.5-35B-A3B Nears Claude Opus on SWE-bench Hard


🦙Read original on Reddit r/LocalLLaMA

#moe-model #agent-verification #qwen3.5-35b-a3b

💡35B MoE with 3B active parameters hits 37.8% on SWE-bench Hard, vs. a 22.2% baseline, using a simple verify-on-edit trick

⚡ 30-Second TL;DR

What Changed

37.8% on SWE-bench Hard (45 tasks) with verify-on-edit vs 22.2% baseline
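Because the hard set has only 45 tasks, each percentage corresponds to a whole number of solved tasks. A quick check, assuming both scores are computed over the same 45 tasks (`solved` is an illustrative helper):

```python
# Convert reported pass rates on a 45-task set back to solved-task counts.
TASKS = 45  # size of the SWE-bench Hard set reported in the post


def solved(rate_pct: float, total: int = TASKS) -> int:
    """Round a reported percentage to the nearest whole number of tasks."""
    return round(rate_pct / 100 * total)


verify = solved(37.8)    # run with verify-on-edit
baseline = solved(22.2)  # baseline run

print(verify, baseline, verify - baseline)
print(f"{verify / TASKS:.1%} vs {baseline / TASKS:.1%}")
```

This gives 17 versus 10 solved tasks, a gain of 7. Each task is worth about 2.2 percentage points (1/45), so on a set this small, differences of a task or two should be read with caution.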

Why It Matters

Suggests that a cheap verification step can lift a MoE model with few active parameters close to top-tier coding-agent performance, making self-hosted SWE agents a cost-effective option for practitioners.

What To Do Next

Implement verify-on-edit in your vLLM agent loop for SWE-bench testing.
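This digest does not describe the harness behind the result, so the sketch below rests on an assumption: verify-on-edit means that after every file edit, the agent loop runs a verification command (here, a one-line test) and feeds the pass/fail result back to the model before its next step. `Action`, `run_agent`, `run_verification`, and the scripted stand-in model are hypothetical names; in a real setup, `propose` would call a model served through vLLM's OpenAI-compatible endpoint and parse its tool calls.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One step proposed by the model (hypothetical action schema)."""
    kind: str          # "edit" or "finish"
    path: str = ""     # file to overwrite when kind == "edit"
    content: str = ""  # new file contents for the edit


def run_verification(cmd: List[str]) -> Tuple[bool, str]:
    """Run a check such as a test command; return (passed, combined output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def run_agent(
    propose: Callable[[List[str]], Action],
    verify: Callable[[], Tuple[bool, str]],
    max_steps: int = 10,
) -> Tuple[bool, List[str]]:
    """Agent loop that runs verification immediately after every edit."""
    history: List[str] = []
    last_ok = False
    for _ in range(max_steps):
        action = propose(history)
        if action.kind == "finish":
            break
        if action.kind == "edit":
            with open(action.path, "w", encoding="utf-8") as f:
                f.write(action.content)
            # Verify-on-edit: check this change now instead of only at the end.
            last_ok, output = verify()
            status = "PASS" if last_ok else "FAIL"
            history.append(f"verification {status} after editing {action.path}\n{output}")
    return last_ok, history


# --- Demo with a scripted stand-in for the model (no LLM required) ---
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "mathutil.py")
check_cmd = [
    sys.executable,
    "-c",
    f"import runpy; assert runpy.run_path({target!r})['add'](2, 3) == 5",
]


def scripted_model(history: List[str]) -> Action:
    # First a buggy edit, then a fix after seeing a failed check, then finish.
    if not history:
        return Action("edit", target, "def add(a, b):\n    return a - b\n")
    if history[-1].startswith("verification FAIL"):
        return Action("edit", target, "def add(a, b):\n    return a + b\n")
    return Action("finish")


ok, log = run_agent(scripted_model, lambda: run_verification(check_cmd))
print(ok, [entry.split(" after")[0] for entry in log])
```

Checking after each edit, rather than once at the end, lets the model see a failure while the faulty change is still its most recent action, which is presumably why the trick helps. The exact prompts and verification commands used in the reported run are not given here.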

Who should care: Researchers & Academics

🧠 Deep Insight

Web-grounded analysis with 6 cited sources.

🔑 Enhanced Key Takeaways

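• Qwen3.5-35B-A3B has a 262k-token context window, larger than Claude Opus 4.6's standard 200k tokens (Opus offers up to 1M input tokens in some configurations), so it can hold more of a large codebase in context[1][2].
• As an open-weight model that runs locally, it generates over 50 tokens per second on a single NVIDIA RTX 4090 GPU and avoids API rate limits[5]. (Fitting it on the 24 GB card requires quantized or partially offloaded weights, since the full BF16 weights alone take about 70 GB.)
• Qwen3 is roughly 83x cheaper per token than Claude Opus for coding tasks, which makes it attractive for high-volume prototyping[6].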
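The context-window comparison in the first takeaway can be turned into a quick pre-flight check on whether a codebase fits in one prompt. This is a rough sketch: it assumes about 4 characters per token (real tokenizers vary by language and model), uses the rounded 262k and 200k figures quoted above, and reserves a hypothetical 20,000 tokens for instructions and the model's reply. `estimate_tokens`, `read_sources`, and `fits` are illustrative names.

```python
import math
import os
from typing import Dict, Iterable, List, Tuple

# Rounded context windows as quoted in the takeaways (not exact tokenizer limits).
CONTEXT_WINDOWS: Dict[str, int] = {
    "Qwen3.5-35B-A3B": 262_000,
    "Claude Opus 4.6 (standard API)": 200_000,
}

# Assumption: ~4 characters per token, a common rule of thumb; real tokenizers differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(texts: Iterable[str]) -> int:
    """Crude token estimate for a collection of source files."""
    total_chars = sum(len(t) for t in texts)
    return math.ceil(total_chars / CHARS_PER_TOKEN)


def read_sources(root: str, exts: Tuple[str, ...] = (".py",)) -> List[str]:
    """Read every file under `root` whose name ends with one of `exts`."""
    texts = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    texts.append(f.read())
    return texts


def fits(tokens: int, reserve: int = 20_000) -> Dict[str, bool]:
    """Which models can hold `tokens` of code plus `reserve` tokens of headroom."""
    return {model: tokens + reserve <= window for model, window in CONTEXT_WINDOWS.items()}


# Example: a hypothetical codebase of 900,000 characters.
tokens = estimate_tokens(["x" * 900_000])
print(tokens, fits(tokens))
```

For this hypothetical codebase the estimate is 225,000 tokens, which fits Qwen3.5-35B-A3B's window but not Opus's standard 200k window; Opus's 1M-token input option would change that result. For real measurements, count tokens with each model's own tokenizer.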

📊 Competitor Analysis

Metric	Qwen3.5-35B-A3B	Claude Opus 4.6
Creator	Alibaba (Qwen team)	Anthropic
Context window	262k tokens	200k tokens (API), up to 1M input
Pricing	Free to run locally (open weights); ~$0.06 per M tokens, blended equivalent	$5 input / $25 output per M tokens
Speed	50+ tokens/s locally on an RTX 4090	Lower via API; optimized for reasoning
SWE-bench Hard	37.8% (with verify-on-edit)	40%
Open weights	Yes (runs locally)	No (proprietary)

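The 83x figure can be reproduced from the table's own numbers: it equals Opus's $5 input price divided by the ~$0.06 blended Qwen estimate. Agent workloads also generate output tokens, which Opus bills at $25 per M, so the effective ratio depends on the input/output mix. A sketch, assuming the ~$0.06 figure is per million tokens and both listed prices apply unchanged (`opus_blended` is an illustrative helper):

```python
OPUS_INPUT = 5.00    # USD per million input tokens (from the table)
OPUS_OUTPUT = 25.00  # USD per million output tokens (from the table)
QWEN_BLENDED = 0.06  # USD per million tokens, blended estimate (assumed per-M unit)


def opus_blended(input_share: float) -> float:
    """Opus price per million tokens for a workload with the given input fraction."""
    return input_share * OPUS_INPUT + (1 - input_share) * OPUS_OUTPUT


for share in (1.0, 0.75, 0.5):
    ratio = opus_blended(share) / QWEN_BLENDED
    print(f"input share {share:.0%}: Opus ${opus_blended(share):.2f}/M, {ratio:.0f}x Qwen")
```

At a 100% input share the ratio is the quoted 83x; once output tokens are included the gap widens, to about 167x at 75% input and 250x at 50% input. Local hardware and electricity costs are not included.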
🔮 Future Implications

AI analysis grounded in cited sources

Open-weight MoE models under 40B parameters will surpass proprietary models on coding benchmarks by mid-2026

Qwen3.5-35B-A3B's 37.8% on SWE-bench Hard with a simple verify-on-edit loop nearly closes the gap to Claude Opus's 40%, at roughly 83x lower cost and with local inference speed[1][3][6].

Local inference on consumer GPUs will dominate agentic coding workflows

Generating 50+ tokens per second on an RTX 4090, with no rate limits, enables near-real-time editing of large codebases, unlike API-dependent proprietary models[5].

⏳ Timeline

2026-02

Qwen3.5 series released, including 35B-A3B MoE model

2026-03

Qwen3.5-35B-A3B achieves 37.8% on SWE-bench Verified Hard with verify-on-edit

📎 Sources (6)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
