DeepMind Aletheia Sets FirstProof Math Record


🧠 Read original on 机器之心 (Synced)

#math-ai #reasoning-agent #theorem-proving #benchmark #aletheia

💡 DeepMind's AI agent solves 6 of 10 unpublished research-level math problems, a significant step for automated theorem proving

⚡ 30-Second TL;DR

What Changed

Aletheia autonomously solves 6 of the 10 unpublished research-level problems in the FirstProof challenge.

Why It Matters

Suggests AI agents can handle research-level mathematics, not just contest problems, moving from competition solving toward discovery. Could accelerate autonomous theorem proving, and highlights DeepMind's lead in superhuman reasoning.

What To Do Next

Try the published Aletheia prompts (on GitHub) with your own math agent on the FirstProof problems.

Who should care: Researchers & Academics

🧠 Deep Insight

Web-grounded analysis with 6 cited sources.

🔑 Enhanced Key Takeaways

•Aletheia solved FirstProof problems 2, 5, 7, 8, 9, and 10. Experts disagreed only over problem 8[1][2].
•Raw prompts and outputs for Aletheia are publicly available on GitHub at google-deepmind/superhuman/tree/main/aletheia[3].
•Aletheia uses Google Search and web browsing as tools to guard against hallucinated citations and to synthesize the relevant mathematical literature[5][6].
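Below is a minimal, purely illustrative sketch of how search-grounded citation checking might work. It flags any citation whose title does not appear among titles returned by a search tool. The `Citation` structure, the `flag_unverified` and `normalize` helpers, and the exact-title matching rule are hypothetical assumptions. They are not taken from Aletheia's prompts or code, and the example titles are made up.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical structure for a reference cited in a generated proof.
    authors: str
    title: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't block a match."""
    return " ".join(text.lower().split())


def flag_unverified(citations: list, retrieved_titles: list) -> list:
    """Return the citations whose title matches none of the retrieved titles.

    `retrieved_titles` stands in for titles returned by a search/browsing tool;
    exact matching on normalized titles is an illustrative assumption.
    """
    found = {normalize(t) for t in retrieved_titles}
    return [c for c in citations if normalize(c.title) not in found]


if __name__ == "__main__":
    cited = [
        Citation("Author A", "An Example Paper on Sumsets"),
        Citation("Author B", "A Lemma That Does Not Exist"),
    ]
    # Pretend these (made-up) titles came back from a web search.
    retrieved = ["an example paper  on sumsets", "Another Unrelated Result"]
    for c in flag_unverified(cited, retrieved):
        print("unverified citation:", c.title)  # prints only the second title
```

A real system would need fuzzier matching (authors, venues, arXiv IDs). Exact normalized-title matching just keeps the idea visible.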

🛠️ Technical Deep Dive

•Aletheia uses an agentic scaffold that iterates through generation, verification, and revision. A natural-language verifier identifies flaws in candidate proofs[1][6].
•Two variants (Aletheia A and B) were run, with best-of-2 submissions per problem. Accuracy improved over the December 2025 version thanks to upgrades to both the scaffolding and the base model[2].
•Builds on Gemini 3 Deep Think with inference-time scaling, reaching higher reasoning quality at lower compute (a reported 100x reduction relative to prior versions)[4][5].
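The generate, verify, and revise loop and the best-of-2 scoring described above can be sketched roughly as follows. This is a toy model under stated assumptions, not DeepMind's implementation. The `generate`/`verify` interfaces, the `solve` and `solved_best_of_2` helpers, and the toy stand-ins are all hypothetical. "Best-of-2" is modelled here as crediting a problem if either of two submissions is correct. In the real challenge, correctness was judged by human experts, not by the agent's own verifier.

```python
from typing import Callable, Optional

# Hypothetical interfaces (not DeepMind's API):
#   generate(problem, previous_proof, flaws) -> new proof text
#   verify(problem, proof) -> list of flaws found (empty list means "no flaws found")
Generator = Callable[[str, Optional[str], list], str]
Verifier = Callable[[str, str], list]


def solve(problem: str, generate: Generator, verify: Verifier, max_rounds: int = 3):
    """Draft a proof, then alternate verification and revision.

    Stops early once the verifier reports no flaws; returns (proof, remaining_flaws).
    """
    proof = generate(problem, None, [])
    flaws = verify(problem, proof)
    for _ in range(max_rounds - 1):
        if not flaws:
            break
        proof = generate(problem, proof, flaws)  # revise using the verifier's critique
        flaws = verify(problem, proof)
    return proof, flaws


def solved_best_of_2(candidates: list, is_correct: Callable[[str], bool]) -> bool:
    """Credit the problem if either of (at most) two submitted candidates is correct."""
    return any(is_correct(c) for c in candidates[:2])


# --- Toy stand-ins so the sketch runs end to end (purely illustrative) ---
def toy_generate(problem, previous, flaws):
    # Drafts an incomplete proof, then appends a fix for each reported flaw.
    if previous is None:
        return "inductive step"
    return previous + "; " + "; ".join(f"fix: {f}" for f in flaws)


def stubborn_generate(problem, previous, flaws):
    # A weaker hypothetical variant that never incorporates feedback.
    return "inductive step"


def toy_verify(problem, proof):
    # Flags an induction proof that never mentions its base case.
    return [] if "base case" in proof else ["missing base case"]


if __name__ == "__main__":
    variant_a = solve("toy induction problem", toy_generate, toy_verify)
    variant_b = solve("toy induction problem", stubborn_generate, toy_verify)
    print(variant_a)  # ('inductive step; fix: missing base case', [])
    print(variant_b)  # ('inductive step', ['missing base case'])
    # The toy verifier doubles as the grader here; real grading was done by experts.
    submissions = [variant_a[0], variant_b[0]]
    print(solved_best_of_2(submissions, lambda p: not toy_verify("", p)))  # True
```

In this sketch, `max_rounds` is one simple knob for spending more inference-time compute on a problem. It is an illustrative assumption, not a description of how Deep Think scales its reasoning.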

🔮 Future Implications

AI analysis grounded in cited sources

Aletheia could accelerate PhD-level math research by enabling autonomous paper generation.

It produced the Feng26 paper on eigenweights in arithmetic geometry fully autonomously, without human mathematical intervention[4][5].

AI may resolve select open math problems at scale.

Across 700 Erdős problems, Aletheia produced 63 technically correct solutions and solved 4 problems autonomously[5].

⏳ Timeline

2025-07

Gemini Deep Think achieves IMO gold-medal standard and 65.7% on IMO-ProofBench Advanced

2025-12

An early Aletheia version is used for semi-autonomous work on Erdős problems

2026-01

A new Gemini Deep Think version achieves 95.1% on IMO-ProofBench Advanced with a 100x compute reduction. Aletheia sets the state of the art on FutureMath Basic

2026-02-13

Aletheia submits best-of-2 solutions to FirstProof challenge

2026-02-24

arXiv paper released, reporting that Aletheia solved 6 of 10 FirstProof problems

📎 Sources (6)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.

🧠 Read original article on 机器之心 (Synced)
