Apple researchers demonstrate that reasoning trace length serves as a simple, effective confidence estimator in large reasoning models. It performs comparably to verbalized confidence across models, datasets, and prompts, and carries complementary signal. The work also shows that reasoning post-training alters the relationship between trace length and confidence.
Key Points
- Trace length provides a low-cost uncertainty estimate in large reasoning models (see the sketch after this list)
- Complements zero-shot verbalized confidence methods
- Validated across multiple models, datasets, and prompts
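The core idea lends itself to a very small implementation. Below is a minimal sketch, assuming access to the model's chain-of-thought text; the scoring rule (shorter trace, higher confidence) follows the paper's finding, but the whitespace tokenization and the `max_tokens` normalization budget are illustrative assumptions, not the authors' method.

```python
# Minimal sketch: reasoning trace length as a near-zero-cost confidence signal.
# Assumption: longer traces indicate more uncertainty, so confidence is the
# inverted trace length relative to an assumed token budget.

def trace_length_confidence(trace: str, max_tokens: int = 4096) -> float:
    """Map a reasoning trace to a [0, 1] confidence score."""
    n_tokens = len(trace.split())  # crude whitespace proxy for tokenizer length
    return max(0.0, 1.0 - n_tokens / max_tokens)

# A short, decisive trace scores higher than a long, meandering one.
short = "The answer is 42 because 6 * 7 = 42."
long = "Let me reconsider that step... " * 200
assert trace_length_confidence(short) > trace_length_confidence(long)
```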
Impact Analysis
Enhances LLM reliability by providing a low-cost hallucination detector, enabling safer deployment in real-world applications.
Technical Details
The estimator was evaluated on multiple reasoning models, datasets, and prompts. The experiments reveal that reasoning post-training shifts the correlation between trace length and confidence.
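One way to run this kind of evaluation is to score each answer by negated trace length and measure how well that score separates correct from incorrect answers. The sketch below uses a synthetic toy set of (token count, correctness) pairs for illustration; the paper's actual models, datasets, and metrics are not reproduced here.

```python
# Sketch: evaluate trace length as a correctness predictor via AUROC.
from sklearn.metrics import roc_auc_score

# (trace_token_count, answer_was_correct) pairs -- hypothetical results.
results = [(120, 1), (340, 1), (980, 0), (150, 1), (1700, 0), (620, 0)]

lengths = [n for n, _ in results]
correct = [c for _, c in results]

# Shorter traces should predict correctness, so negate length as the score.
auroc = roc_auc_score(correct, [-n for n in lengths])
print(f"AUROC of trace-length confidence: {auroc:.2f}")  # 1.00 on this toy set
```

Repeating this measurement before and after reasoning post-training is one way to observe the correlation shift the authors report.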
