FIRE mitigates backdoors in deployed neural networks by reversing trigger-induced directions in latent space. At inference time it manipulates features along the backdoor's path through the network, neutralizing triggers without retraining, and it outperforms baseline defenses with low overhead on image tasks.
Key Points
- Inference-time repair via latent-space directions (see the sketch after this list)
- No training data or model modifications needed
- Outperforms baselines across a range of attacks and architectures
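The first key point can be made concrete with a small sketch. The summary does not spell out how FIRE estimates or reverses the trigger-induced direction, so the functions below (`estimate_trigger_direction`, `reverse_trigger_direction`) are illustrative assumptions: the direction is taken as the mean shift between suspected triggered and clean activations in a chosen latent space, and repair simply projects that component out.

```python
import numpy as np

def estimate_trigger_direction(triggered_feats: np.ndarray,
                               clean_feats: np.ndarray) -> np.ndarray:
    """Estimate a unit vector pointing from clean activations toward
    trigger-shifted activations (difference of means)."""
    direction = triggered_feats.mean(axis=0) - clean_feats.mean(axis=0)
    return direction / (np.linalg.norm(direction) + 1e-12)

def reverse_trigger_direction(feats: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove each feature vector's component along the estimated trigger
    direction, leaving the rest of the representation untouched."""
    return feats - np.outer(feats @ direction, direction)

# Toy usage: 2-D feature vectors shifted by a fixed trigger offset.
clean = np.random.randn(100, 2)
triggered = clean + np.array([3.0, 0.0])
d = estimate_trigger_direction(triggered, clean)
repaired = reverse_trigger_direction(triggered, d)
```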
Impact Analysis
Enables secure use of vulnerable deployed models without retraining; the low compute cost suits real-time applications.
Technical Details
FIRE exploits the structured shifts that triggers induce in inter-layer latent spaces, turning the backdoor against itself by transporting features back along the trigger-induced direction.
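As one way to picture this feature transport at inference time, the hedged sketch below applies the correction inside an intermediate layer with a PyTorch forward hook. `LatentDirectionRepair`, the choice of hooked layer, and the projection rule are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class LatentDirectionRepair:
    """Hypothetical inference-time wrapper (not FIRE's exact procedure):
    a forward hook on a chosen intermediate layer removes the component
    of its activations lying along an estimated trigger-induced direction,
    pushing triggered inputs back toward the clean feature distribution."""

    def __init__(self, layer: nn.Module, direction: torch.Tensor):
        # Assumes the hooked layer emits (batch, features)-shaped activations.
        self.direction = direction / direction.norm()
        self.handle = layer.register_forward_hook(self._repair)

    def _repair(self, module, inputs, output):
        d = self.direction.to(device=output.device, dtype=output.dtype)
        coeff = output @ d                      # per-sample projection onto the trigger direction
        return output - coeff.unsqueeze(1) * d  # transport features back along that direction

    def remove(self):
        self.handle.remove()

# Usage sketch on a toy classifier; the hooked layer and direction are placeholders.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
trigger_dir = torch.randn(128)              # in practice, estimated as in the previous sketch
repair = LatentDirectionRepair(layer=model[1], direction=trigger_dir)
logits = model(torch.randn(4, 1, 28, 28))   # repair applied transparently at inference
repair.remove()
```

The hook-based design keeps the deployed model untouched: no weights change and the correction can be attached or detached per request, which matches the summary's claim of repair without retraining.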