OmniSteer addresses cross-modality vulnerabilities in OLLMs using the AdvBench-Omni dataset and modality-semantics decoupling. It uncovers mid-layer dissolution and extracts a golden refusal vector via SVD. The method raises the refusal rate from 69.9% to 91.2% while preserving capabilities.
Key Points
- 1. Handles cross-modal safety risks
- 2. Raises the refusal rate from 69.9% to 91.2%
- 3. Uses lightweight adapters for adaptive intervention
Impact Analysis
Strengthens OLLM safety without degrading multimodal performance.
Technical Details
SVD is used to isolate a pure refusal direction (the golden refusal vector). Lightweight adapters then modulate the intervention intensity dynamically.
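The two steps in Technical Details could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the OmniSteer implementation: the activation differences are synthetic, the function names `extract_refusal_direction` and `steer` are hypothetical, and the "adapter" is reduced to a single sigmoid gate that picks a per-sample steering strength.

```python
import numpy as np


def extract_refusal_direction(diffs: np.ndarray) -> np.ndarray:
    """Return the unit vector that dominates the rows of `diffs`.

    diffs: (n_samples, hidden_dim) array of harmful-minus-harmless
    activation differences. Pooling them across modalities is an
    assumption made for this sketch.
    """
    # Rows of vt are right singular vectors; the first one spans the
    # direction carrying the most energy across all difference vectors.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    # The sign returned by SVD is arbitrary, so orient the vector
    # along the mean difference.
    if direction @ diffs.mean(axis=0) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)


def steer(hidden, direction, gate_w, gate_b, max_alpha=4.0):
    """Add the refusal direction with an input-dependent strength.

    Hypothetical stand-in for a lightweight adapter: a linear gate
    followed by a sigmoid, scaled to the range (0, max_alpha).
    """
    alpha = max_alpha / (1.0 + np.exp(-(hidden @ gate_w + gate_b)))
    return hidden + alpha[..., None] * direction, alpha


# Synthetic demo data (not from the paper): a planted direction plus noise.
rng = np.random.default_rng(0)
hidden_dim = 16
true_dir = rng.normal(size=hidden_dim)
true_dir /= np.linalg.norm(true_dir)
diffs = 3.0 * true_dir + 0.1 * rng.normal(size=(30, hidden_dim))

direction = extract_refusal_direction(diffs)
cosine = float(direction @ true_dir)

# With a zero gate the sigmoid sits at 0.5, so alpha is max_alpha / 2.
hidden = np.zeros((2, hidden_dim))
steered, alpha = steer(hidden, direction, np.zeros(hidden_dim), 0.0)
```

In this sketch the gate weights are fixed at zero for simplicity; in a trained setup they would presumably be learned so that benign inputs receive a small `alpha` and harmful ones a large one.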