Ohio State and Amazon release MMDR-Bench, a verifiable benchmark for multimodal deep-research agents. It focuses on process traceability, evidence alignment, and claim verification, moving beyond superficially plausible reports. Open resources include the paper, GitHub code, and Hugging Face datasets.
Key Points
1. Process, evidence, and claim verifiability
2. Handles charts, screenshots, and diagrams
3. Public evaluation framework available
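To make "claim verifiability" concrete, here is a minimal sketch of a claim-level support check in the spirit of the benchmark's process/evidence/claim criteria. The data structures, field names, and scoring rule are illustrative assumptions, not MMDR-Bench's actual API.

```python
# Hypothetical sketch: score what fraction of a report's claims cite
# evidence that actually appears in the agent's collected trace.
# Field names ("cited") and evidence IDs are assumptions for illustration.

def claim_support_rate(claims, evidence_ids):
    """Fraction of claims whose cited evidence all appears in the trace."""
    if not claims:
        return 0.0
    trace = set(evidence_ids)
    supported = sum(
        1 for c in claims
        if c["cited"] and set(c["cited"]) <= trace
    )
    return supported / len(claims)

claims = [
    {"text": "Revenue grew 12% YoY", "cited": ["chart_3"]},
    {"text": "Leads the sector in share", "cited": ["report_7", "chart_3"]},
    {"text": "Unsupported rumor", "cited": []},  # cites no evidence
]
trace = ["chart_3", "report_7", "screenshot_1"]
rate = claim_support_rate(claims, trace)  # 2 of 3 claims supported
```

Scoring at the claim level, rather than grading the whole report, is what lets a benchmark like this distinguish a report that merely reads well from one whose statements are individually traceable to evidence.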
Impact Analysis
Standardizes agent evaluation; shifts research-task assessment from "looks good" judgments to rigorous, verifiable metrics.
