Identifying Interactions at Scale for LLMs

Scalable SPEX unlocks key LLM interactions with minimal ablations, which is vital for interpretability.
30-Second TL;DR
What Changed
The number of possible feature interactions in an LLM grows exponentially with the number of features, which makes exhaustive analysis infeasible.
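To make the scale concrete, here is a minimal sketch that counts candidate interactions as non-empty feature subsets. The function name `num_interactions` and the `max_order` cap are illustrative assumptions, not something taken from the SPEX work.

```python
from math import comb
from typing import Optional


def num_interactions(n_features: int, max_order: Optional[int] = None) -> int:
    """Count non-empty feature subsets of size <= max_order (all sizes if None)."""
    top = n_features if max_order is None else min(max_order, n_features)
    return sum(comb(n_features, k) for k in range(1, top + 1))


# With no cap, the count is 2**n - 1, so it doubles (plus one) with every added feature.
all_orders_10 = num_interactions(10)
# Capping the order at 2 keeps only single features and pairs.
pairs_100 = num_interactions(100, max_order=2)
```

Even a few hundred features put the uncapped count far beyond what one ablation per subset could cover, which is why sparse or proxy-based methods are needed.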
Why It Matters
Advances safer LLMs by revealing hidden interaction patterns, crucial for trust and debugging at production scale. Reduces interpretability compute barriers for researchers and builders.
What To Do Next
Read the Berkeley AI Research blog post and prototype SPEX ablations on your own LLM.
Deep Insight
Web-grounded analysis with 8 cited sources.
Enhanced Key Takeaways
- ProxySPEX fits gradient boosted trees (GBTs) as a proxy model to predict masked LLM outputs, then extracts interactions from the GBTs, exploiting the hierarchical structure of feature interactions.[1][2][3]
- Applied to data attribution, ProxySPEX identifies interactions among CIFAR-10 training samples that influence test predictions; applied to mechanistic interpretability, it uncovers attention head interactions within and across layers on question-answering tasks.[1][2][3]
- ProxySPEX provides a scalable approximation of Shapley values that accounts for interactions. It outperforms LASSO in faithfulness when inferences are limited and achieves higher test accuracy in attention head pruning tasks.[1][4]
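The following toy sketch illustrates why accounting for interactions matters when reading Shapley values. It is not ProxySPEX: it enumerates every subset exactly, which only works for a handful of features. `toy_output` is a hypothetical stand-in for a masked LLM output, and `mobius` computes a standard inclusion-exclusion interaction term; both are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n):
    """Exact Shapley values for value(frozenset) over features 0..n-1."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            # Weight of coalitions of size r that do not contain feature i.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mobius(value, subset):
    """Inclusion-exclusion (Mobius) interaction term for one feature subset."""
    s = tuple(subset)
    total = 0.0
    for r in range(len(s) + 1):
        for t in combinations(s, r):
            total += (-1) ** (len(s) - r) * value(frozenset(t))
    return total


def toy_output(kept):
    # Hypothetical masked output: feature 2 adds 1.0 on its own,
    # while features 0 and 1 add 2.0 only when both are kept.
    out = 1.0 if 2 in kept else 0.0
    if 0 in kept and 1 in kept:
        out += 2.0
    return out


phi = shapley_values(toy_output, 3)      # one score per feature
pair_01 = mobius(toy_output, (0, 1))     # joint effect of features 0 and 1
solo_0 = mobius(toy_output, (0,))        # effect of feature 0 alone
```

In this toy case all three features receive the same Shapley value, so the marginal scores cannot distinguish the standalone feature from the pair that only acts jointly. The interaction terms separate them: the pair term is non-zero while the solo term for feature 0 is zero.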
Technical Deep Dive
- Fits gradient boosted trees (GBTs) to masked LLM outputs as a proxy model, leveraging the observation that LLM feature interactions are hierarchical (higher-order interactions are accompanied by their lower-order subsets).[1][2][3]
- Extracts interactions from the fitted GBTs and converts them to a Fourier representation, from which attributions, including Shapley-based ones, can be defined.[1][5]
- Evaluated on four high-dimensional datasets with hundreds of features; outperforms marginal attributions by 15-25% in faithfulness (R²) and uses 10× fewer inferences than SPEX to match performance.[1][4]
- In attention head pruning, ProxySPEX identifies heads for removal across layer ranges (initial 1-3, middle 14-16, final 30-32), yielding higher test accuracies than LASSO baselines at various sparsity levels.[4]
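The hierarchical property and the Fourier representation mentioned above can be illustrated on a toy function. The sketch below uses one common convention for the Fourier (Walsh-Hadamard) transform over binary masks, computed by brute force; the convention, the helper `fourier_coefficients`, and the `toy_model` function are assumptions for illustration and are not taken from the ProxySPEX code.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Brute-force Fourier transform of f over binary masks of length n.

    F[k] = 2**-n * sum_m f(m) * (-1)**<k, m>, so that
    f(m) = sum_k F[k] * (-1)**<k, m>.
    """
    masks = list(product((0, 1), repeat=n))
    coeffs = {}
    for k in masks:
        total = 0.0
        for m in masks:
            parity = sum(a & b for a, b in zip(k, m)) % 2
            total += (-1.0 if parity else 1.0) * f(m)
        coeffs[k] = total / len(masks)
    return coeffs


def toy_model(mask):
    # Hypothetical masked output: fires only when features 0, 1 and 2
    # are all kept; feature 3 is irrelevant.
    return float(mask[0] and mask[1] and mask[2])


coeffs = fourier_coefficients(toy_model, 4)
# Keep only the coefficients that are non-zero up to rounding.
support = {k for k, v in coeffs.items() if abs(v) > 1e-12}
```

Here the third-order term on features 0, 1 and 2 is non-zero, and so is every coefficient indexed by a subset of those three features, while anything involving feature 3 vanishes. That is the hierarchical pattern in miniature: a high-order interaction arrives together with its lower-order subsets, which is the kind of structure tree-based proxies are suited to pick up.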
Sources (8)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- arXiv - 2505
- neurips.cc - 118659
- openreview.net - Forum
- arXiv - 2505
- youtube.com - Watch
- scribd.com - ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs (Butler, Agarwal, Yu; NeurIPS 2025)
- semanticscholar.org - 70da32f12f7a5f9728d447801e4ed958b2b5b398
- youtube.com - Watch
Original source: Berkeley AI Research