DIVE Scales Diversity for Tool-Use Generalization

💡 Diversity > quantity: +22 points on OOD tool-use benchmarks with 4x less data
⚡ 30-Second TL;DR
What Changed
Inverts synthesis: execute tools first, derive tasks from traces
Why It Matters
DIVE targets the brittleness of agentic LLMs on unfamiliar tools by prioritizing diversity in synthesized training tasks, which yields more robust tool-use generalization with less data. For scalable agent development, this shifts the emphasis from raw data volume to task and tool diversity.
What To Do Next
Read the DIVE paper (arXiv:2603.11076), obtain its dataset, and fine-tune your tool-using LLM to test for OOD gains.
🧠 Deep Insight
Web-grounded analysis with 8 cited sources.
Enhanced Key Takeaways
- DIVE employs an Evidence Collection-Task Derivation loop to generate executable tasks from tool execution traces, ensuring verifiability while expanding coverage to edge cases in tool usage.[8]
- The dataset covers five domains including web browsing, file management, database querying, code execution, and multimedia processing, enabling broad real-world applicability.[8]
- Training incorporates 48k supervised fine-tuning examples followed by 3.2k reinforcement learning steps using PPO on tool-use specific rewards.
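The "execute first, derive tasks afterwards" idea can be sketched in a few lines. This is a minimal illustration, not DIVE's implementation: the `Trace` record, the toy tools `add` and `divide`, and the helpers `collect_evidence` and `derive_tasks` are all hypothetical names introduced here. The point it shows is that a task derived from a successful trace comes with an answer that was actually observed, so the task is verifiable by construction.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Trace:
    """One logged tool call: its inputs, its output, and any error raised."""
    tool: str
    inputs: Dict[str, Any]
    output: Any = None
    error: Optional[str] = None


# Hypothetical toy tools standing in for a real tool pool.
def add(a: float, b: float) -> float:
    return a + b


def divide(a: float, b: float) -> float:
    return a / b


TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "divide": divide}


def collect_evidence(calls: List[Tuple[str, Dict[str, Any]]]) -> List[Trace]:
    """Evidence collection: run each tool call first and log what happened."""
    traces = []
    for tool, inputs in calls:
        try:
            traces.append(Trace(tool, inputs, output=TOOLS[tool](**inputs)))
        except Exception as exc:  # failures are logged as evidence too
            traces.append(Trace(tool, inputs, error=type(exc).__name__))
    return traces


def derive_tasks(traces: List[Trace]) -> List[Dict[str, Any]]:
    """Task derivation: keep only traces that succeeded, so every task
    is paired with an answer that was observed during execution."""
    tasks = []
    for t in traces:
        if t.error is None:
            args = ", ".join(f"{k}={v}" for k, v in t.inputs.items())
            tasks.append({
                "task": f"Use `{t.tool}` with {args} and report the result.",
                "answer": t.output,
            })
    return tasks


calls = [("add", {"a": 2, "b": 3}), ("divide", {"a": 1, "b": 0})]
traces = collect_evidence(calls)
tasks = derive_tasks(traces)
```

Here the failing `divide` call is still recorded as a trace, but only the successful `add` call becomes a task. A real pipeline would presumably use a model to phrase the task from the trace; the string template above is only a placeholder for that step.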
🛠️ Technical Deep Dive
- The Evidence Collection phase executes diverse tool combinations across 373 tools, logging traces that include inputs, outputs, errors, and intermediate states.
- Task Derivation analyzes each trace to infer the task it entails, such as 'query database for user info' from a successful SQL execution trace.
- Diversity scaling is achieved via submodular selection for tool-pool coverage and per-task variety, prioritizing underrepresented execution paths.
- Evaluation on 9 out-of-distribution benchmarks, including ToolBench, API-Bank, and custom held-out tool sets, shows an average gain of +22 points over baselines.
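Submodular selection for coverage is commonly approximated with a greedy loop, which the sketch below illustrates. The summary does not state which objective DIVE optimizes, so this is an assumption: the objective here is plain set coverage (number of distinct tools touched by the selected tasks), and the function name `greedy_coverage` and the example task and tool names are hypothetical. Set coverage is monotone submodular, and for such objectives the greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value under a cardinality budget.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_coverage(candidates: Dict[str, FrozenSet[str]], budget: int) -> List[str]:
    """Pick up to `budget` tasks, each time taking the task that adds the
    most tools not yet covered. Stops early once nothing new can be added."""
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(candidates)
    while remaining and len(chosen) < budget:
        # Marginal gain = number of tools this task would newly cover.
        best = max(remaining, key=lambda tid: len(remaining[tid] - covered))
        if not remaining[best] - covered:
            break  # every remaining task is redundant
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen


# Hypothetical candidate tasks mapped to the tools their traces used.
candidates = {
    "t1": frozenset({"search", "fetch"}),
    "t2": frozenset({"search", "fetch", "sql"}),
    "t3": frozenset({"sql"}),
    "t4": frozenset({"ffmpeg", "fs_read"}),
}
selected = greedy_coverage(candidates, budget=2)
```

With these inputs the loop first takes `t2` (three new tools) and then `t4` (two new tools), skipping `t1` and `t3` because they add nothing once `t2` is selected. This is the sense in which selection favors underrepresented execution paths over more examples of paths already covered.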
Sources (8)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
Original source: ArXiv AI