TRACER is a trajectory-level uncertainty metric for tool-using agents. It combines surprisal, repetition, and coherence signals with tail-focused aggregation. For failure prediction on tau^2-bench, it improves AUROC by 37% and AUARC by 55% over baseline uncertainty metrics. Code and benchmark are available on GitHub.
Key Points
- Trajectory-level uncertainty metric for tool-using agents
- Combines surprisal, repetition, and coherence signals with tail-focused aggregation
- Improves AUROC by 37% and AUARC by 55% for failure prediction on tau^2-bench
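The following minimal Python sketch shows how AUROC and AUARC are commonly computed for failure prediction. It assumes each trajectory has a scalar uncertainty score, where higher means more likely to fail, and a binary failed/succeeded label. The function names are hypothetical. AUARC conventions also vary across papers; this version averages accuracy over every coverage level. It is therefore not necessarily the exact protocol behind the reported numbers.

```python
from typing import Sequence

# Hypothetical helpers illustrating the two metrics; not TRACER's evaluation code.


def auroc(uncertainty: Sequence[float], failed: Sequence[bool]) -> float:
    """Probability that a failed trajectory gets higher uncertainty than a
    successful one (ties count as 0.5); computed by pairwise comparison."""
    pos = [u for u, f in zip(uncertainty, failed) if f]
    neg = [u for u, f in zip(uncertainty, failed) if not f]
    if not pos or not neg:
        raise ValueError("need at least one failed and one successful trajectory")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auarc(uncertainty: Sequence[float], failed: Sequence[bool]) -> float:
    """Area under the accuracy-rejection curve: keep the k most confident
    trajectories for k = 1..n and average the resulting success rates."""
    if not uncertainty:
        raise ValueError("no trajectories")
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    successes = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        successes += 0 if failed[i] else 1
        total += successes / k  # accuracy when only the k most confident are kept
    return total / len(order)


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.7, 0.1, 0.4]
    failed = [True, False, True, False, False]
    print(auroc(scores, failed))  # 1.0: every failure outranks every success
    print(auarc(scores, failed))
```

A perfect ranking gives an AUROC of 1.0, and random scores give about 0.5 in expectation. AUARC rewards a metric when the trajectories it is most confident about are the ones that actually succeed.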
Impact Analysis
Developers of tool-using AI agents could use TRACER's more accurate failure prediction to flag risky trajectories and intervene before they fail. The result matters because TRACER substantially outperforms baseline uncertainty metrics on this benchmark, which could make it a reference point for trajectory-level uncertainty quantification. Possible next steps include integration into agent frameworks and benchmarks, supporting safer autonomous systems.
Technical Details
TRACER fuses three signals computed along an agent's trajectory: surprisal (how unexpected the generated tokens are), repetition (loop detection), and coherence (logical consistency). Tail-focused aggregation then gives priority to the most extreme risk events instead of averaging them away, which improves failure forecasting. The method is evaluated on tau^2-bench, and the code and data are available on GitHub.
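The pipeline described above can be illustrated with a small Python sketch. This is not TRACER's actual formulation. In this version, each step's risk is a hypothetical weighted sum of three terms: mean token surprisal, a repeated-action flag serving as a crude loop detector, and one minus a coherence score that is assumed to come from some external checker. The tail-focused aggregation is represented by a CVaR-style mean over the top `alpha` fraction of step risks. All names, weights, and the `alpha` value are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical sketch of trajectory-level uncertainty scoring;
# not TRACER's published formula.


@dataclass
class Step:
    token_logprobs: Sequence[float]  # log-probs of the tokens generated at this step
    action: str                      # canonical string for the tool call, e.g. "search(q)"
    coherence: float                 # assumed external score in [0, 1]; 1 = fully consistent


def step_risks(steps: Sequence[Step], w_s: float = 1.0,
               w_r: float = 1.0, w_c: float = 1.0) -> List[float]:
    """Per-step risk as a weighted sum of three signals (weights are illustrative)."""
    seen = set()
    risks = []
    for step in steps:
        # Surprisal: mean negative log-probability of the step's tokens.
        if step.token_logprobs:
            surprisal = -sum(step.token_logprobs) / len(step.token_logprobs)
        else:
            surprisal = 0.0
        # Repetition: flag an action that was already taken (a crude loop detector).
        repetition = 1.0 if step.action in seen else 0.0
        seen.add(step.action)
        # Incoherence: low coherence means high risk.
        incoherence = 1.0 - step.coherence
        risks.append(w_s * surprisal + w_r * repetition + w_c * incoherence)
    return risks


def tail_mean(values: Sequence[float], alpha: float = 0.2) -> float:
    """Mean of the largest ceil(alpha * n) values (a CVaR-style tail statistic)."""
    if not values:
        raise ValueError("empty trajectory")
    k = max(1, math.ceil(alpha * len(values)))
    return sum(sorted(values, reverse=True)[:k]) / k


def trajectory_uncertainty(steps: Sequence[Step], alpha: float = 0.2) -> float:
    """Higher values mean the trajectory is more likely to fail (under this sketch)."""
    return tail_mean(step_risks(steps), alpha)


if __name__ == "__main__":
    calm = [-0.5, -0.5]  # mean surprisal 0.5
    steps = [
        Step(calm, "search(flight)", 1.0),
        Step(calm, "lookup(user)", 1.0),
        Step(calm, "search(flight)", 0.5),  # repeated call, partly incoherent
        Step(calm, "book(flight)", 1.0),
        Step(calm, "confirm()", 1.0),
    ]
    risks = step_risks(steps)
    print(sum(risks) / len(risks))        # 0.8
    print(trajectory_uncertainty(steps))  # 2.0
```

In the example, one looping and partly incoherent step is diluted by the plain mean (0.8) but dominates the tail statistic (2.0). This is the kind of extreme-risk event that tail-focused aggregation is meant to surface.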
