The study compares reasoning and non-reasoning LLMs on Theory of Mind (ToM) benchmarks and finds that reasoning models bring no consistent gains and sometimes perform worse. Three insights emerge: slow thinking collapses as responses get longer, reasoning needs to be adaptive, and models lean on option-matching shortcuts. Two interventions, S2F and T2M, mitigate these issues.
Key Points
- Reasoning hurts accuracy on longer responses
- Adaptive reasoning improves accuracy
- Models rely on option-matching shortcuts rather than genuine deduction
Impact Analysis
The results highlight the limits of large reasoning models (LRMs) on social reasoning, in contrast to formal tasks. The authors call for ToM-specific capabilities. The proposed interventions improve performance.
Technical Details
The study analyzes 9 LLMs on 3 ToM benchmarks. It varies the reasoning budget and removes answer options to test for shortcut use. It proposes two methods: S2F for adaptive reasoning and T2M for shortcut prevention.
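
The option-removal test can be illustrated with a minimal sketch. This is not the study's code: the `Item` schema, the `build_prompt`, `accuracy`, and `option_gap` helpers, and the substring-based scoring are all assumptions made for illustration, and `model` stands for any function that maps a prompt string to a response string. The idea is to score the same items with and without their answer options; a large gap would suggest the model is matching options rather than deducing the answer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    """One multiple-choice ToM question (hypothetical schema)."""
    story: str
    question: str
    options: Sequence[str]
    answer: str  # gold answer text, expected to be one of `options`


def build_prompt(item: Item, with_options: bool) -> str:
    """Render an item, optionally listing its lettered answer options."""
    parts = [item.story, item.question]
    if with_options:
        labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        parts += [f"{labels[i]}. {opt}" for i, opt in enumerate(item.options)]
    return "\n".join(parts)


def accuracy(items: Sequence[Item],
             model: Callable[[str], str],
             with_options: bool) -> float:
    """Fraction of items whose gold answer text appears in the response.

    Substring matching is a crude stand-in for a real answer grader.
    """
    if not items:
        return 0.0
    correct = sum(
        item.answer.lower() in model(build_prompt(item, with_options)).lower()
        for item in items
    )
    return correct / len(items)


def option_gap(items: Sequence[Item], model: Callable[[str], str]) -> float:
    """Accuracy with options minus accuracy without them."""
    return accuracy(items, model, True) - accuracy(items, model, False)
```

A gap near zero would indicate that the options contribute little to the model's answers, while a large positive gap would be consistent with option matching. A real evaluation would need a more careful grader for the open-ended responses than substring matching.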