Open-Source 30B Model Rivals Gemini & Claude

💡Open 30B model beats Gemini/Claude on reasoning loop – benchmark it free today!

⚡ 30-Second TL;DR

What Changed

Open-source 30B model from research AI launched

Why It Matters

Democratizes high-performance reasoning models, enabling researchers to rival proprietary giants without high costs.

What To Do Next

Download the 30B model and test its hypothesis-verification loop on arXiv reasoning benchmarks.

Who should care:Researchers & Academics

Web-grounded analysis with 10 cited sources.

•Qwen3 30B A3B, a likely candidate for the model, was released on April 28, 2025, by the Qwen team with a 41K token context window and very low pricing of $0.08/M input and $0.28/M output tokens.[5]
•DeepSeek V3 is highlighted as a leading open-source model in 2026 comparisons for its performance-to-price ratio, free/low-cost API access, and strengths in code and reasoning tasks.[1][6]
•No specific 30B model from 'Research AI' with hypothesis-evidence-verification loop appears in major 2026 AI model leaderboards or comparisons, suggesting limited adoption or recognition beyond initial announcement.[1][2][4]

📊 Competitor Analysis▸ Show

Model	Parameters	Context Window	Pricing (input/output per M tokens)	Key Benchmarks
Qwen3 30B A3B	30B	41K tokens	$0.08 / $0.28	Advanced reasoning, structured data generation[5]
Claude Sonnet 4.6	Undisclosed	1M tokens	$3.00 / $15.00	Leads GDPval-AA (1,606 Elo), ARC-AGI-2 68.8%[4][5]
Gemini 3.1 Pro	Undisclosed	Undisclosed	$2.00 / $12.00	77.1% ARC-AGI-2, 94.3% GPQA Diamond[4]
DeepSeek V3	Undisclosed	128K tokens	Free/low-cost API	Strong in code, reasoning[1][6]

Open-source 30B models like Qwen3 will capture >20% more developer market share by end-2026

Their drastically lower costs (37.5x cheaper input than Claude Sonnet 4.6) enable widespread custom deployments in coding and agentic tasks.[5]

Hypothesis-evidence-verification loops in open models will standardize in scientific benchmarks by Q1 2027

Emerging open-source reasoning models already outperform closed ones like Gemini 2.5 Flash on math tasks such as AIME 2025.[9]

2025-04

Qwen releases Qwen3 30B A3B, an open-source 30B model with advanced reasoning capabilities.[5]

2026-02

Anthropic launches Claude Sonnet 4.6 and Opus 4.6, setting new benchmarks in reasoning and code.[4][5]

2026-02

Google releases Gemini 3.1 Pro, leading 13/16 benchmarks including ARC-AGI-2 at 77.1%.[4]

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.

Weekly AI Recap

Read this week's curated digest of top AI events →

Same topic

Explore #open-source

Same product

AI-curated news aggregator. All content rights belong to original publishers.
Original source: 量子位 ↗