Qwen3.5-35B-A3B Nears Claude Opus on SWE-bench Hard

Read original on Reddit r/LocalLLaMA

A 35B MoE with 3B active parameters hits 37.8% on SWE-bench Hard, beating its 22.2% baseline with a simple verify-on-edit trick

30-Second TL;DR

What Changed

37.8% on SWE-bench Hard (45 tasks) with verify-on-edit vs 22.2% baseline
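
As a quick sanity check on these figures, the sketch below converts the two percentages back into task counts. It assumes both scores are measured over the same 45 tasks; the resolved-task counts it prints are inferred from the percentages, not stated in the source.

```python
# Assumption: both scores are fractions of the same 45-task subset.
TOTAL_TASKS = 45


def resolved_count(percent: float, total: int = TOTAL_TASKS) -> int:
    """Return the whole number of tasks closest to a reported percentage."""
    return round(percent / 100 * total)


verified = resolved_count(37.8)  # verify-on-edit run
baseline = resolved_count(22.2)  # baseline run

print(f"verify-on-edit: {verified}/{TOTAL_TASKS} = {verified / TOTAL_TASKS:.1%}")
print(f"baseline:       {baseline}/{TOTAL_TASKS} = {baseline / TOTAL_TASKS:.1%}")
print(f"extra tasks resolved: {verified - baseline}")
```

With only 45 tasks, each resolved task moves the score by roughly 2.2 percentage points, so small differences between runs should be read with caution.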

Why It Matters

Suggests that a cheap verification step can lift a small MoE model close to top-tier coding-agent performance, which would make cost-effective, self-hosted SWE agents practical for practitioners.

What To Do Next

Implement verify-on-edit in your vLLM agent loop for SWE-bench testing.
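
The source does not spell out how verify-on-edit works, so the following is only a minimal sketch of one plausible reading: every edit the agent proposes is checked before it is written, and the result is returned to the model as tool feedback. The check used here (parsing the file as Python) and the names `EditResult`, `verify_python_source`, and `apply_edit_with_verify` are all hypothetical; a real setup might run linters or tests instead.

```python
import ast
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class EditResult:
    ok: bool
    feedback: str  # message handed back to the model as the tool result


def verify_python_source(source: str) -> Optional[str]:
    """Hypothetical cheap check: return an error message if the source fails to parse."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"SyntaxError at line {exc.lineno}: {exc.msg}"
    return None


def apply_edit_with_verify(
    path: Path,
    new_source: str,
    verify: Callable[[str], Optional[str]] = verify_python_source,
) -> EditResult:
    """Write the proposed edit only if verification passes; otherwise keep the old file."""
    error = verify(new_source)
    if error is not None:
        # The file on disk is left untouched so the agent can retry.
        return EditResult(False, f"Edit to {path.name} rejected: {error}")
    path.write_text(new_source)
    return EditResult(True, f"Edit to {path.name} applied and verified.")
```

In an agent loop, the `feedback` string would be appended to the conversation after each edit tool call, so the model sees a rejection immediately rather than discovering the breakage several steps later.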

Who should care: Researchers & Academics

Deep Insight

Web-grounded analysis with 6 cited sources.

Enhanced Key Takeaways

  • Qwen3.5-35B-A3B has a 262k-token context window, larger than Claude Opus 4.6's 200k tokens, which lets it keep more of a large codebase in context[1][2].
  • As an open-weight model that runs locally, it reaches over 50 tokens per second on a single NVIDIA RTX 4090 GPU and is not subject to API rate limits[5].
  • Qwen3 is priced about 83x lower per token than Claude Opus for coding tasks, which suits high-volume prototyping[6].
Competitor Analysis

| Metric | Qwen3.5-35B-A3B | Claude Opus 4.6 |
| Creator | Alibaba (inferred open-weight) | Anthropic |
| Context Window | 262k tokens | 200k tokens (API), up to 1M input |
| Pricing | Free/open-weight (local), ~$0.06 equiv. blended | $5 input / $25 output per M tokens |
| Speed | 50+ t/s on 4090 local | Lower on API, optimized for reasoning |
| SWE-bench Hard | 37.8% (verify-on-edit) | 40% |
| Open Source | Yes (local runnable) | Proprietary |
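
The 83x price gap quoted in this piece can be checked against the pricing figures in the comparison. The sketch below assumes the "~$0.06 equiv. blended" Qwen figure is in USD per million tokens, which the source does not state. Under that assumption, dividing the Opus input price by the Qwen figure reproduces roughly 83x; whether that is how the cited source derived its number is a guess.

```python
# Prices taken from the comparison, in USD per million tokens.
# Assumption: the Qwen "~$0.06 equiv. blended" figure uses the same unit.
QWEN_BLENDED = 0.06
OPUS_INPUT = 5.00
OPUS_OUTPUT = 25.00

input_ratio = OPUS_INPUT / QWEN_BLENDED
output_ratio = OPUS_OUTPUT / QWEN_BLENDED

print(f"Opus input vs Qwen blended:  {input_ratio:.1f}x")
print(f"Opus output vs Qwen blended: {output_ratio:.1f}x")
```

Because output tokens are priced five times higher than input tokens on the Opus side, the effective gap for generation-heavy agent workloads would be wider than the input-only ratio.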

Future Implications
AI analysis grounded in cited sources

Open-weight MoE models under 40B will surpass proprietary models on coding benchmarks by mid-2026
Qwen3.5-35B-A3B's 37.8% on SWE-bench Hard with a simple verify-on-edit step narrows the gap to Claude Opus (40%), at 83x lower cost and with local inference speed[1][3][6].
Local inference on consumer GPUs will dominate agentic coding workflows
Running at 50+ t/s on a 4090 with no rate limits enables real-time editing on large codebases, unlike proprietary models that depend on an API[5].

โณ Timeline

2026-02
Qwen3.5 series released, including 35B-A3B MoE model
2026-03
Qwen3.5-35B-A3B achieves 37.8% on SWE-bench Verified Hard with verify-on-edit