AudioRouter applies reinforcement learning (RL) to teach large audio language models (LALMs) when to use external audio tools, improving fine-grained perception without heavy training. It optimizes a lightweight routing policy while keeping the base model frozen. It achieves substantial gains on benchmarks with 600x less data than traditional methods.
Key Points
- RL for tool-use decisions in audio tasks
- Data-efficient alternative to full retraining
- Keeps reasoning model frozen
Impact Analysis
Offers a scalable path for enhancing the perceptual abilities of LALMs. Dramatically reduces data requirements. Paves the way for modular audio AI systems.
Technical Details
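Formulates tool use as a decision problem: a routing policy learns when to call an external audio tool while the reasoning model stays frozen. Tested on audio understanding benchmarks, where it shows a substantial performance uplift with minimal data.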
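To make the decision-problem framing concrete, here is a minimal toy sketch of a routing policy trained with RL. It is not AudioRouter's actual method: the summary does not name the RL algorithm, features, or reward, so everything here is an assumption for illustration. The sketch uses REINFORCE with a running baseline on a logistic policy, a single hypothetical "difficulty" feature, and a simulated reward standing in for the frozen model's answer quality. All names (`RoutingPolicy`, `simulated_reward`, `train`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RoutingPolicy:
    """Tiny logistic policy: P(call tool | features). Hypothetical, for illustration."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def prob_use_tool(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def sample(self, x, rng):
        # Action 1 = call the external audio tool, 0 = answer directly.
        return 1 if rng.random() < self.prob_use_tool(x) else 0

    def update(self, x, action, advantage, lr):
        # REINFORCE step. For a Bernoulli-logistic policy the gradient of
        # log pi(action | x) with respect to the logit is (action - p).
        p = self.prob_use_tool(x)
        g = (action - p) * advantage
        self.w = [wi + lr * g * xi for wi, xi in zip(self.w, x)]
        self.b += lr * g


def simulated_reward(x, action):
    # ASSUMPTION: a toy stand-in for "did the frozen model answer correctly?".
    # Inputs with difficulty x[0] > 0.5 are only answered correctly with the
    # tool; easy inputs are always correct, but a tool call costs 0.2.
    if x[0] > 0.5:
        return 1.0 if action == 1 else 0.0
    return 1.0 - 0.2 * action


def train(steps=5000, lr=0.3, seed=0):
    rng = random.Random(seed)
    policy = RoutingPolicy(n_features=1)
    baseline = 0.0  # running mean reward, used to reduce gradient variance
    for _ in range(steps):
        x = [rng.random()]  # hypothetical difficulty feature in [0, 1)
        action = policy.sample(x, rng)
        reward = simulated_reward(x, action)
        policy.update(x, action, reward - baseline, lr)
        baseline += 0.05 * (reward - baseline)
    return policy


if __name__ == "__main__":
    policy = train()
    for difficulty in (0.1, 0.5, 0.9):
        print(difficulty, round(policy.prob_use_tool([difficulty]), 3))
```

Only the small policy's parameters (`w`, `b`) are updated; the model that actually answers is never touched, which mirrors the frozen-base-model idea described above. In a real system the reward would come from evaluating the frozen LALM's output with and without the tool, not from a hand-written rule.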