
MiRA Supercharges Open LLM Agents Past GPT-4


💡 Open model hits 43% SR on WebArena-Lite, beating GPT-4o 3x: a new SOTA for agents!

⚡ 30-Second TL;DR

What Changed

Subgoal decomposition enables adaptive online planning with proprietary LLMs like Gemini (+10% SR).

Why It Matters

Open models now rival or exceed proprietary ones on agent benchmarks, democratizing advanced autonomy. This enables scalable RL for real-world digital environments such as browsers and operating systems.

What To Do Next

Reproduce MiRA on Gemma3-12B using WebArena-Lite to fine-tune your web agents.

Who should care: Researchers & Academics

🧠 Deep Insight

AI-generated analysis for this event.

🔑 Enhanced Key Takeaways

  • MiRA utilizes a novel 'Reward-Weighted Regression' (RWR) variant that specifically addresses the high variance typically associated with RL fine-tuning in web-based environments.
  • The framework incorporates a 'Dynamic Subgoal Re-planning' mechanism that triggers automatically when the agent detects a deviation from the expected DOM (Document Object Model) state, reducing cumulative error.
  • Unlike previous WebRL approaches that rely heavily on offline trajectory datasets, MiRA demonstrates significant sample efficiency by leveraging a hybrid training loop that combines synthetic trajectory generation with real-time environment feedback.
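The digest names the RWR variant but gives no formula. A minimal sketch of classic Reward-Weighted Regression is shown below, assuming exponential reward weights with a temperature `beta`; the function names, the normalization, and the max-subtraction trick are illustrative assumptions, not the paper's implementation:

```python
import math

def rwr_weights(rewards, beta=1.0):
    """Classic RWR: weight each trajectory by exp(reward / beta),
    normalized to sum to 1. Subtracting the max reward before exp()
    keeps the computation numerically stable, one standard way to
    tame high-variance RL fine-tuning signals."""
    m = max(rewards)
    unnorm = [math.exp((r - m) / beta) for r in rewards]
    z = sum(unnorm)
    return [w / z for w in unnorm]

def rwr_loss(log_probs, rewards, beta=1.0):
    """Weighted negative log-likelihood over trajectories:
    high-reward (e.g. milestone-completing) trajectories dominate
    the regression target."""
    weights = rwr_weights(rewards, beta)
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

For example, with rewards `[0.2, 0.9]` the second trajectory receives roughly twice the weight of the first, so the policy is regressed mainly toward its actions.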
📊 Competitor Analysis

| Feature | MiRA (Gemma3-12B) | WebRL (SOTA) | GPT-4o (Agentic) |
| --- | --- | --- | --- |
| Planning Strategy | Dynamic Subgoal Decomposition | Static/Heuristic | Prompt-based (CoT) |
| RL Approach | Milestone-based RWR | PPO-based | None (In-context) |
| WebArena-Lite SR | 43% | 38.4% | 13.9% |
| Training Cost | Moderate (Fine-tuning) | High (Full RL) | N/A (Proprietary) |

๐Ÿ› ๏ธ Technical Deep Dive

  • Architecture: Employs a dual-tower structure where a lightweight 'Planner' module generates subgoals, and a 'Policy' module (Gemma3-12B) executes actions.
  • Reward Function: Uses a dense reward signal derived from DOM-tree distance metrics and successful completion of intermediate HTML-element interactions.
  • Execution Drift Mitigation: Implements a 'State-Consistency Check' that compares the current browser state against the predicted state from the subgoal planner; if the divergence exceeds a threshold, the agent forces a re-plan.
  • Training Methodology: Utilizes a two-stage process: (1) supervised fine-tuning on successful trajectories, followed by (2) Reward-Weighted Regression (RWR) to optimize for milestone completion.
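The State-Consistency Check described above can be pictured with a toy sketch. This is a hypothetical illustration, not the paper's code: DOM states are reduced to sets of `(tag, id)` signatures, divergence is measured as Jaccard distance, and the signature scheme, metric, and `threshold` value are all assumptions:

```python
def dom_signatures(dom):
    """Reduce a DOM state to a set of (tag, element_id) signatures.
    `dom` is a list of (tag, element_id) pairs standing in for a
    parsed DOM tree."""
    return set(dom)

def divergence(predicted, observed):
    """Jaccard distance between predicted and observed DOM states:
    0.0 = identical element sets, 1.0 = completely disjoint."""
    p, o = dom_signatures(predicted), dom_signatures(observed)
    union = p | o
    if not union:
        return 0.0
    return 1.0 - len(p & o) / len(union)

def should_replan(predicted, observed, threshold=0.3):
    """Force a re-plan when execution drift exceeds the threshold."""
    return divergence(predicted, observed) > threshold
```

For instance, if the planner predicted a checkout page but the browser shows an error modal, the signature sets barely overlap, divergence approaches 1.0, and the agent re-plans instead of compounding the error.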

🔮 Future Implications
AI analysis grounded in cited sources.

Small-scale LLMs will become the standard for autonomous web agents.
The success of MiRA demonstrates that specialized RL fine-tuning on 12B-parameter models can outperform massive, general-purpose frontier models in specific task-oriented domains.
Web-based automation will shift from prompt-engineering to RL-based agent training.
The performance gap between MiRA and GPT-4o suggests that architectural planning and RL-based reward optimization are more effective for long-horizon web tasks than in-context learning alone.

โณ Timeline

2025-11
Initial release of WebArena-Lite benchmark for evaluating long-horizon web agents.
2026-01
Development of the MiRA subgoal-driven framework begins, focusing on sparse reward signals.
2026-03
MiRA research paper published on ArXiv, demonstrating 43% success rate on WebArena-Lite.


AI-curated news aggregator. All content rights belong to original publishers.
Original source: ArXiv AI ↗