Reddit r/MachineLearning
LoCoMo Audit: 6.4% of Answer Key Is Wrong, Judge Accepts 63% of Wrong Answers
LoCoMo is flawed: 6.4% of its answer key contains errors, and its judge accepts 63% of wrong answers. Time to rethink memory benchmarks.
30-Second TL;DR
What Changed
The audit found 99 errors in the answer key, spanning hallucinated facts, temporal reasoning mistakes, and speaker misattribution.
Why It Matters
Exposes flaws in popular long-context memory benchmarks, urging caution in leaderboard comparisons and pushing for better alternatives.
What To Do Next
Download the locomo-audit repo and validate your long-term memory model's scores against the documented fixes.
Who should care: Researchers & Academics
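
As a rough illustration of the validation step above, the sketch below compares a model's accuracy on all questions with its accuracy once questions with a flagged answer key are excluded. The question IDs, the `results` mapping, and the `flagged_ids` set are hypothetical placeholders, not the actual format of the locomo-audit repo; adapt the loading step to whatever files the repo provides.

```python
def rescore(results: dict, flagged_ids: set) -> dict:
    """Report accuracy with and without questions whose answer key was flagged.

    results maps a question ID to True/False (was the model judged correct).
    flagged_ids holds the IDs of questions whose answer key is known to be wrong.
    Both structures are assumptions made for this sketch.
    """
    if not results:
        raise ValueError("results is empty")
    # Keep only questions whose answer key was not flagged by the audit.
    kept = {qid: ok for qid, ok in results.items() if qid not in flagged_ids}
    if not kept:
        raise ValueError("every question was flagged")
    return {
        "raw_accuracy": sum(results.values()) / len(results),
        "clean_accuracy": sum(kept.values()) / len(kept),
        "n_excluded": len(results) - len(kept),
    }


# Toy data only: five hypothetical questions, one with a flagged answer key.
results = {"q1": True, "q2": False, "q3": True, "q4": False, "q5": True}
flagged_ids = {"q2"}

report = rescore(results, flagged_ids)
print(report)
```

A large gap between the two accuracies suggests that a leaderboard difference may be driven by the flawed keys rather than by the model. Excluding flagged questions is only one option; re-judging against corrected answers, where the audit supplies them, keeps the full question set.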
Original source: Reddit r/MachineLearning