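Feature Activation Coverage (FAC) measures the diversity of a dataset in an LLM's internal feature space. It uses sparse autoencoders (SAEs) to determine which interpretable features the data activates. FAC Synthesis then generates new samples that target features missing from the seed data. This improves both data diversity and downstream performance on instruction-following, toxicity, reward-modeling, and steering tasks.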
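As a rough illustration of the coverage idea, the sketch below assumes FAC is the fraction of task-relevant SAE features that at least one sample activates above a threshold. The plain ReLU encoder, the threshold, the toy weights, and the `task_features` set are all illustrative assumptions. They are not the method's exact definition or a real trained SAE.

```python
from typing import Iterable, List, Sequence, Set

Vector = Sequence[float]

def sae_encode(x: Vector, W_enc: Sequence[Vector], b_enc: Vector) -> List[float]:
    """Minimal ReLU SAE encoder: a_j = max(0, w_j . x + b_j) for each feature j."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_enc, b_enc)
    ]

def activated_features(embeddings: Iterable[Vector], W_enc, b_enc,
                       threshold: float = 0.0) -> Set[int]:
    """Union over all samples of feature indices whose activation exceeds threshold."""
    active: Set[int] = set()
    for x in embeddings:
        acts = sae_encode(x, W_enc, b_enc)
        active.update(j for j, a in enumerate(acts) if a > threshold)
    return active

def feature_activation_coverage(embeddings, W_enc, b_enc, task_features,
                                threshold: float = 0.0) -> float:
    """Fraction of task-relevant features activated by at least one sample."""
    task = set(task_features)
    if not task:
        raise ValueError("task_features must be non-empty")
    active = activated_features(embeddings, W_enc, b_enc, threshold)
    return len(active & task) / len(task)

# Toy SAE with 4 features over 2-dim embeddings (hypothetical weights).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
seed = [[1.0, 0.5], [0.2, 1.0]]

print(activated_features(seed, W_enc, b_enc))                         # {0, 1}
print(feature_activation_coverage(seed, W_enc, b_enc, {0, 1, 2, 3}))  # 0.5
```

Both seed embeddings point into the positive quadrant, so features 2 and 3 never fire and coverage stays at one half; those two features are the gap a synthesis step would target.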
Key Points
- Quantifies task-relevant diversity that surface-level text metrics miss
- Enables cross-model transfer across the LLaMA, Mistral, and Qwen model families
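To see why a feature-space measure can disagree with a text metric, the toy comparison below scores two small datasets with distinct-1 (unique tokens over total tokens) and with feature coverage. The per-sample feature sets are made up for illustration; in practice they would come from an SAE, and the feature indices carry no real meaning here.

```python
from typing import List, Sequence, Set

def distinct_n(texts: Sequence[str], n: int = 1) -> float:
    """Unique n-grams / total n-grams: a common surface-level diversity metric."""
    grams = []
    for t in texts:
        toks = t.split()
        grams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0

def coverage(sample_features: List[Set[int]], task_features: Set[int]) -> float:
    """Fraction of task features activated by at least one sample."""
    covered = set().union(*sample_features) & task_features
    return len(covered) / len(task_features)

task_features = {0, 1, 2, 3}
# Hypothetical per-sample active feature sets (an SAE would supply these).
paraphrases = ["explain photosynthesis simply", "describe plant energy conversion"]
paraphrase_feats = [{0}, {0}]
varied = ["explain photosynthesis simply", "refuse harmful requests politely"]
varied_feats = [{0}, {2}]

print(distinct_n(paraphrases), distinct_n(varied))  # 1.0 1.0
print(coverage(paraphrase_feats, task_features),
      coverage(varied_feats, task_features))       # 0.25 0.5
```

Both datasets look maximally diverse to distinct-1, yet the paraphrases activate only one task feature while the varied set activates two.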
Impact Analysis
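FAC can be used to optimize post-training data for better downstream LLM performance. Its interpretable feature space also gives data-centric AI work a concrete target: specific features to cover rather than a single aggregate diversity score.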
Technical Details
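A sparse autoencoder identifies coverage gaps, i.e., task-relevant features that the seed data does not activate. The synthesis step then generates samples that explicitly reflect those missing features. The interpretable feature space is shared across model families.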
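The gap-then-synthesize loop can be sketched as follows. This sketch assumes each SAE feature has a human-readable description and that a separate generator LLM (not shown) turns prompts into samples. `missing_features`, `synthesis_prompts`, `reflects_feature`, the prompt template, and the acceptance threshold are hypothetical names and choices. The activation filter is one plausible way to enforce "feature-reflective" samples, not necessarily the authors' procedure.

```python
from typing import Dict, List, Sequence, Set

def missing_features(task_features: Set[int], seed_active: Set[int]) -> List[int]:
    """Task-relevant SAE features that no seed sample activates (the coverage gap)."""
    return sorted(task_features - seed_active)

def synthesis_prompts(missing: Sequence[int], descriptions: Dict[int, str],
                      per_feature: int = 2) -> List[str]:
    """Build per_feature generation prompts for each missing feature.

    The template is a hypothetical illustration, not the FAC Synthesis prompt.
    """
    return [
        f"Write example {i + 1} of {per_feature}: a user request that clearly "
        f"exhibits {descriptions[f]}."
        for f in missing
        for i in range(per_feature)
    ]

def reflects_feature(activations: Sequence[float], feature: int,
                     threshold: float = 0.0) -> bool:
    """Accept a synthesized sample only if the SAE fires on its target feature."""
    return activations[feature] > threshold

# Hypothetical feature labels and seed coverage.
task_features = {0, 1, 2, 3}
seed_active = {0, 1}
descriptions = {2: "a request for medication dosage advice", 3: "a sarcastic tone"}

gaps = missing_features(task_features, seed_active)
prompts = synthesis_prompts(gaps, descriptions)
print(gaps)          # [2, 3]
print(len(prompts))  # 4
# Made-up SAE activations for two generated samples:
print(reflects_feature([0.0, 0.0, 1.3, 0.0], feature=2))  # True
print(reflects_feature([0.0, 0.4, 0.0, 0.0], feature=3))  # False
```

Re-encoding each generated sample and checking its target feature prevents off-target generations from being counted as closing a gap.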