AMOR is a hybrid SSM-attention model inspired by dual-process theories of cognition: it activates sparse attention only at positions where the SSM's next-token predictions show high entropy. Keys and values ("Ghost KV") are projected from SSM states to preserve O(n) efficiency, and on retrieval tasks it outperforms both SSM-only and Transformer baselines, reaching perfect accuracy while attending at just 22% of positions. Prediction entropy reliably flags retrieval needs, with a 1.09-nat gap over local positions.
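For context, the gating signal is the standard Shannon entropy of the SSM's next-token distribution $p$ over the vocabulary, measured in nats because the logarithm is natural; the reported 1.09-nat figure is the gap in this quantity between retrieval-critical and local positions:

$$H(p) = -\sum_i p_i \ln p_i$$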
Key Points
1. Dynamically routes to attention only at high-entropy SSM positions (a minimal gating sketch follows this list)
2. Achieves perfect retrieval accuracy while attending at just 22% of positions
3. Uses Ghost KV projections from the SSM for O(n) compute efficiency
4. Entropy signals retrieval need, with a 1.09-nat gap over local positions
5. Provides interpretable, adaptive computation grounded in information theory
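A minimal sketch of the entropy gate in point 1, assuming the SSM path emits next-token logits; the function name and the fixed threshold are illustrative choices, not values from the paper:

```python
import torch
import torch.nn.functional as F

def route_to_attention(ssm_logits: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    """Flag positions whose SSM prediction entropy (in nats) exceeds a threshold.

    ssm_logits: (batch, seq_len, vocab_size) next-token logits from the SSM path.
    Returns a bool mask of shape (batch, seq_len); True means route to attention.
    """
    log_p = F.log_softmax(ssm_logits, dim=-1)      # natural log, so entropy is in nats
    entropy = -(log_p.exp() * log_p).sum(dim=-1)   # H = -sum_i p_i ln p_i
    return entropy > threshold                     # hypothetical fixed cutoff
```

Positions where the mask is True would then be gathered and sent through the sparse attention path; everything else stays on the O(n) SSM path.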
Impact Analysis
AMOR enables efficient hybrid architectures that adapt compute to task difficulty, potentially slashing inference costs for long-context LLMs. It bridges SSM speed with Transformer precision, advancing scalable AI models. Researchers can build more interpretable systems with metacognitive routing.
Technical Details
AMOR measures SSM uncertainty via the entropy of its next-token predictions and uses it to gate sparse attention. Keys and values are projected directly from SSM hidden states (Ghost KV), reusing the O(n) computation instead of paying O(n²) per layer. The approach was validated on synthetic retrieval tasks, showing that the entropy signal is reliable and that the hybrid outperforms both baselines; a simplified sketch of the Ghost KV attention path follows.
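A simplified PyTorch sketch of the Ghost KV idea, assuming a standard multi-head layout; the module name, projection layout, and residual pass-through are assumptions for illustration, and computing full attention before masking is a clarity shortcut rather than the paper's sparse implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GhostKVAttention(nn.Module):
    """Illustrative Ghost KV layer: keys and values are linear projections of
    SSM hidden states, so no separate attention stack re-encodes the sequence."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)  # "ghost" keys from SSM states
        self.v_proj = nn.Linear(d_model, d_model)  # "ghost" values from SSM states

    def forward(self, ssm_states: torch.Tensor, route_mask: torch.Tensor) -> torch.Tensor:
        # ssm_states: (B, T, d_model); route_mask: (B, T) bool from the entropy gate.
        B, T, D = ssm_states.shape

        def split(x):  # (B, T, D) -> (B, n_heads, T, d_head)
            return x.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        q = split(self.q_proj(ssm_states))
        k = split(self.k_proj(ssm_states))
        v = split(self.v_proj(ssm_states))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, D)
        # Simplification: attend everywhere, then keep the update only at routed
        # positions. A real sparse implementation would gather just the ~22% of
        # flagged queries before the attention call to realize the compute savings.
        return torch.where(route_mask.unsqueeze(-1), out, ssm_states)
```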