
Claude Fails Elden Ring: No AGI Yet

🦙 Read original on Reddit r/LocalLLaMA

💡 Debunks AGI hype with a real Claude gaming failure; key reading for benchmark realists

⚡ 30-Second TL;DR

What Changed

A Claude agent's failure to navigate a single Elden Ring room is used to critique AGI claims by Jensen Huang and Marc Andreessen.

Why It Matters

Sparks debate on AGI benchmarks, urging practitioners to test LLMs on novel tasks beyond standard evals.

What To Do Next

Test your LLM on zero-shot gaming tasks like Elden Ring navigation.
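One way to act on this suggestion is a minimal screenshot-in, keypress-out evaluation loop. Everything here is a hypothetical sketch: `capture_frame()`, `send_key()`, `room_exited()`, and the `llm.complete()` call stand in for whatever screen-capture, input-injection, and model-API stack you actually use.

```python
# Minimal sketch of a zero-shot game-navigation eval harness.
# All environment and LLM interfaces below are hypothetical placeholders.
import base64

VALID_KEYS = {"w", "a", "s", "d", "space"}

def choose_action(llm, frame_png: bytes) -> str:
    """Ask the model for exactly one keypress given a game screenshot."""
    prompt = (
        "You are playing Elden Ring. Goal: exit the current room. "
        "Reply with exactly one key from: w, a, s, d, space."
    )
    image_b64 = base64.b64encode(frame_png).decode()
    reply = llm.complete(prompt=prompt, image_b64=image_b64)  # hypothetical API
    action = reply.strip().lower()
    return action if action in VALID_KEYS else "w"  # fall back on invalid output

def run_episode(llm, env, max_steps: int = 50) -> bool:
    """Return True if the model escapes the room within max_steps."""
    for _ in range(max_steps):
        frame = env.capture_frame()        # hypothetical screen grab
        env.send_key(choose_action(llm, frame))
        if env.room_exited():              # hypothetical success check
            return True
    return False
```

The step cap matters: an open-loop model that repeatedly walks into a wall will otherwise loop forever, which is exactly the failure mode the post describes.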

Who should care: Researchers & Academics

🧠 Deep Insight

AI-generated analysis for this event.

🔑 Enhanced Key Takeaways

  • The failure of LLMs in complex, real-time environments like Elden Ring highlights the 'embodiment gap': models struggle with high-latency, non-deterministic visual feedback loops compared to static text-based reasoning.
  • Industry researchers distinguish between 'System 1' (fast, intuitive) and 'System 2' (slow, deliberative) reasoning; current architectures like Claude's struggle to maintain long-horizon planning in dynamic game environments without explicit neuro-symbolic integration.
  • The Reddit discourse reflects a broader shift in the AI community toward 'benchmarking by frustration,' where users test models against complex, multimodal tasks to expose the limitations of current scaling laws.
📊 Competitor Analysis
| Feature | Claude 3 Opus | GPT-4o | Gemini 1.5 Pro |
|---|---|---|---|
| Reasoning architecture | Transformer-based (CoT) | Multimodal Transformer | Mixture-of-Experts |
| Context window | 200k tokens | 128k tokens | 2M tokens |
| Game/real-time task capability | Low (text-heavy) | Low (vision-limited) | Moderate (long-context) |
| Pricing | $15/million input tokens | $5/million input tokens | $3.50/million input tokens |

๐Ÿ› ๏ธ Technical Deep Dive

  • Current LLM architectures lack a persistent 'world model' state, preventing them from maintaining spatial awareness in 3D environments like Elden Ring.
  • The failure to exit the room is attributed to the lack of a closed-loop feedback mechanism: the model receives a frame, but cannot predict the consequences of its actions (e.g., 'press W') on the game state.
  • Claude Opus uses a standard Transformer decoder architecture optimized for text and code, which lacks the temporal memory required for continuous, real-time decision-making in game engines.
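The closed-loop point above can be made concrete with a toy example: the agent predicts where 'w' should move it, acts, observes, and compares. The 2D "room" and its wall are invented purely for illustration; the mismatch signal (`surprised`) is what an open-loop LLM never receives.

```python
# Toy illustration of the closed-loop feedback gap: prediction vs. observation.
# The grid world and wall layout are invented for this sketch.

WALLS = {(1, 0)}  # the cell directly "north" of the start is a wall
MOVES = {"w": (0, -1), "s": (0, 1), "a": (-1, 0), "d": (1, 0)}

def predict_next(pos, action):
    """Naive open-loop prediction: assume the move always succeeds."""
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)

def step(pos, action):
    """Actual environment dynamics: moves into walls do nothing."""
    nxt = predict_next(pos, action)
    return pos if nxt in WALLS else nxt

def closed_loop_step(pos, action):
    """Act, then compare prediction to observation to detect failure."""
    predicted = predict_next(pos, action)
    observed = step(pos, action)
    surprised = predicted != observed  # the signal an open-loop agent never sees
    return observed, surprised
```

Pressing 'w' from (1, 1) predicts (1, 0) but observes no movement, so `surprised` is True; without that comparison, the agent keeps pressing W into the wall indefinitely.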

🔮 Future Implications
AI analysis grounded in cited sources

  • LLM-based agents will require dedicated 'World Model' layers to succeed in interactive gaming. Without internal representations of physics and spatial constraints, models cannot perform the multi-step planning required for complex game navigation.
  • AGI definitions will shift from 'passing benchmarks' to 'demonstrating autonomous task completion in open-world environments.' The failure in Elden Ring serves as a public litmus test, exposing the gap between high-scoring benchmarks and real-world utility.
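To make the 'World Model' idea less abstract, here is a minimal sketch of the kind of persistent spatial state such a layer would maintain across frames, instead of re-deriving everything from each screenshot. The data structure and its fields are invented for illustration, not any vendor's actual design.

```python
# Minimal sketch of persistent spatial memory for a game-playing agent.
# Structure and field names are illustrative assumptions.
from dataclasses import dataclass, field

MOVES = {"w": (0, -1), "s": (0, 1), "a": (-1, 0), "d": (1, 0)}

@dataclass
class WorldModel:
    position: tuple = (0, 0)
    walls: set = field(default_factory=set)     # cells known to block movement
    visited: set = field(default_factory=set)   # cells already explored

    def update(self, action: str, moved: bool) -> None:
        """Fold one action/observation pair into persistent memory."""
        dx, dy = MOVES[action]
        target = (self.position[0] + dx, self.position[1] + dy)
        if moved:
            self.position = target
            self.visited.add(target)
        else:
            self.walls.add(target)  # remember the wall; stop pressing into it

    def unexplored_moves(self) -> list:
        """Actions whose target cell is neither a known wall nor visited."""
        return [a for a, (dx, dy) in MOVES.items()
                if (self.position[0] + dx, self.position[1] + dy)
                not in self.walls | self.visited]
```

Even this trivial memory changes behavior: after one failed 'w', the agent stops proposing it, which is the multi-step planning capability the frame-by-frame setup lacks.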

โณ Timeline

2024-03
Anthropic releases Claude 3 Opus, setting new industry benchmarks for reasoning.
2024-06
Anthropic releases Claude 3.5 Sonnet, focusing on improved agentic capabilities.
2024-10
Anthropic introduces 'Computer Use' capabilities, allowing models to interact with desktop interfaces.

