New arXiv paper quantifies the information optimal policies encode about their environments. Proves the mutual information between environment and optimal policy is exactly n log m bits in Controlled Markov Processes (CMPs) with n states and m actions. The bound holds for finite-horizon, discounted, and average-reward maximization.
Key Points
- Optimal policy reveals n log m bits about the transition dynamics (see the worked example after this list)
- A uniform prior over possible environments is assumed
- This gives a lower bound on the implicit world model required for optimality
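To make the bound concrete: with n = 10 states and m = 4 actions there are m^n = 4^10 ≈ 10^6 deterministic stationary policies, and n log2 m = 10 × 2 = 20 bits is exactly the entropy of a uniform distribution over them, i.e., the number of bits needed to pin down one such policy.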
Impact Analysis
Establishes a theoretical minimum for the world model an optimal RL agent must implicitly encode. Informs AI safety by quantifying representational requirements. Aids model compression and interpretability research.
Technical Details
Analyzes CMPs with n states and m actions. The mutual information I(environment; policy) equals n log m bits (log base 2, since the quantity is stated in bits) whenever the reward is non-constant. The equality is proven across the broad range of objectives listed above.
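A minimal sketch of the counting intuition behind the bound, not a reproduction of the paper's proof. It assumes a toy 2-state, 2-action CMP with deterministic dynamics, a fixed non-constant state reward, a discounted objective, and greedy tie-breaking; all of these choices are illustrative assumptions. Because the computed policy is a deterministic function of the environment, I(environment; policy) equals the entropy of the policy distribution, which can never exceed n log2 m; ties in this toy setup can keep the empirical value strictly below that cap, whereas the paper proves equality under its own assumptions.

```python
import itertools
import math
from collections import Counter

# Toy deterministic CMP (illustrative sizes, not from the paper).
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9
REWARD = [0.0, 1.0]  # non-constant reward over states (assumed)

def optimal_policy(T):
    """Greedy policy from value iteration; T[s][a] = deterministic next state."""
    V = [0.0] * N_STATES
    for _ in range(500):  # plenty of sweeps for gamma = 0.9 to converge
        V = [max(REWARD[T[s][a]] + GAMMA * V[T[s][a]] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    # Ties break toward the lower action index (an arbitrary choice).
    return tuple(max(range(N_ACTIONS),
                     key=lambda a: REWARD[T[s][a]] + GAMMA * V[T[s][a]])
                 for s in range(N_STATES))

# Uniform prior over all n^(n*m) = 16 deterministic transition functions.
rows = list(itertools.product(range(N_STATES), repeat=N_ACTIONS))
envs = list(itertools.product(rows, repeat=N_STATES))
counts = Counter(optimal_policy(T) for T in envs)

# The policy is a deterministic function of the environment, so
# I(environment; policy) = H(policy), and H(policy) <= n * log2(m).
total = len(envs)
info = -sum(c / total * math.log2(c / total) for c in counts.values())
print(f"I(env; policy) ~= {info:.3f} bits; cap n*log2(m) = "
      f"{N_STATES * math.log2(N_ACTIONS):.1f} bits")
```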