Identifying Interactions at Scale for LLMs

🐻Read original on Berkeley AI Research

#interpretability #ablation #spectral-methods #spex

💡 SPEX identifies the key feature interactions in an LLM from a small number of ablations, which makes interaction-level interpretability practical at scale.

⚡ 30-Second TL;DR

What Changed

The number of possible feature interactions grows exponentially with the number of features, so exhaustively ablating every combination is infeasible for LLMs.

Why It Matters

Revealing hidden interaction patterns supports safer LLMs and is important for trust and debugging at production scale. It also lowers the compute cost of interpretability for researchers and builders.

What To Do Next

Read the Berkeley AI Research blog post and prototype a SPEX-style ablation study on your own LLM.

Who should care: Researchers & Academics

🧠 Deep Insight

Web-grounded analysis with 8 cited sources.

🔑 Enhanced Key Takeaways

•ProxySPEX fits gradient boosted trees (GBTs) as a proxy model to predict masked LLM outputs, then extracts interactions from the GBTs to exploit hierarchical structure in feature interactions.[1][2][3]
•Applied to data attribution, ProxySPEX identifies interactions among CIFAR-10 training samples that influence test predictions; applied to mechanistic interpretability, it uncovers attention head interactions within and across layers on question-answering tasks.[1][2][3]
•ProxySPEX provides a scalable approximation of Shapley values by accounting for interactions and outperforms LASSO in faithfulness with limited inferences, while achieving higher test accuracy in attention head pruning tasks.[1][4]
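The proxy-model idea in the takeaways above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_model` is a hypothetical stand-in for one LLM inference on a masked input, the boosting routine is hand-rolled in pure Python instead of using a GBT library, and reading feature sets off root-to-leaf paths is a simplifying assumption that yields candidate interactions only.

```python
import random

def toy_model(mask):
    # Hypothetical stand-in for one LLM inference on a masked input (1 = feature kept).
    return 2.0 * mask[0] + 3.0 * mask[1] * mask[2]

def sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(X, r, depth):
    """Greedy regression tree over binary features: a leaf float or (feature, off, on)."""
    mean = sum(r) / len(r)
    if depth == 0:
        return mean
    best_err, best_split = sse(r) - 1e-9, None  # only split on a real error reduction
    for j in range(len(X[0])):
        off = [i for i, x in enumerate(X) if x[j] == 0]
        on = [i for i, x in enumerate(X) if x[j] == 1]
        if not off or not on:
            continue
        err = sse([r[i] for i in off]) + sse([r[i] for i in on])
        if err < best_err:
            best_err, best_split = err, (j, off, on)
    if best_split is None:
        return mean
    j, off, on = best_split
    return (j,
            fit_tree([X[i] for i in off], [r[i] for i in off], depth - 1),
            fit_tree([X[i] for i in on], [r[i] for i in on], depth - 1))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, off, on = tree
        tree = on if x[j] == 1 else off
    return tree

def fit_gbt(X, y, rounds=20, depth=2, lr=1.0):
    """Squared-error boosting: each tree fits the residual left by the trees before it."""
    trees, resid = [], list(y)
    for _ in range(rounds):
        tree = fit_tree(X, resid, depth)
        trees.append(tree)
        resid = [r - lr * predict_tree(tree, x) for r, x in zip(resid, X)]
    return trees, lr

def predict(model, x):
    trees, lr = model
    return sum(lr * predict_tree(t, x) for t in trees)

def r_squared(model, X, y):
    """Faithfulness of the proxy: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - predict(model, x)) ** 2 for x, yi in zip(X, y))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def path_feature_sets(tree, path=frozenset()):
    """Feature sets that co-occur on a root-to-leaf path (candidate interactions only)."""
    if not isinstance(tree, tuple):
        return {path} if len(path) > 1 else set()
    j, off, on = tree
    return path_feature_sets(off, path | {j}) | path_feature_sets(on, path | {j})

# Sample random masks, query the (toy) model once per mask, then fit the proxy.
rng = random.Random(0)
n_features = 6
masks = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(200)]
outputs = [toy_model(m) for m in masks]
proxy = fit_gbt(masks[:150], outputs[:150])
print("held-out R^2:", r_squared(proxy, masks[150:], outputs[150:]))
candidates = set().union(*(path_feature_sets(t) for t in proxy[0]))
print("candidate interactions:", sorted(tuple(sorted(s)) for s in candidates))
```

Because the proxy is cheap to evaluate, any further analysis can query it instead of the LLM. Path co-occurrence can include feature pairs that do not truly interact, so a follow-up step is still needed to score each candidate.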

🛠️ Technical Deep Dive

•Fits gradient boosted trees (GBTs) to masked LLM outputs as a proxy model, leveraging the observation that LLM feature interactions are hierarchical (higher-order interactions are accompanied by their lower-order subsets).[1][2][3]
•Extracts interactions from the fitted GBTs and converts them to a Fourier representation, from which attribution scores, including Shapley-based ones, can be computed.[1][5]
•Evaluated on four high-dimensional datasets with hundreds of features; outperforms marginal attributions by 15-25% in faithfulness (R²) and uses 10× fewer inferences than SPEX to match performance.[1][4]
•In attention head pruning, ProxySPEX identifies heads for removal across layer ranges (initial 1-3, middle 14-16, final 30-32), yielding higher test accuracies than LASSO baselines at various sparsity levels.[4]
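To make the Fourier-to-Shapley step concrete, here is a small sketch under stated assumptions. It assumes the sign convention f(m) = Σ_S a_S · (-1)^(Σ_{i∈S} m_i) over binary masks m, under which the Shapley value of feature i equals the sum of -2·a_S/|S| over odd-sized sets S containing i. `toy_proxy` is a hypothetical stand-in for a fitted proxy model, and the coefficients are obtained by exhaustive enumeration, which is only viable for a handful of features.

```python
from itertools import combinations, product
from math import factorial

def toy_proxy(mask):
    # Hypothetical fitted proxy: one main effect plus one pairwise interaction.
    return 2.0 * mask[0] + 3.0 * mask[1] * mask[2]

def fourier_coefficients(f, n):
    """Exact Fourier (Walsh-Hadamard) coefficients of f over all 2^n binary masks."""
    masks = list(product([0, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = sum(f(m) * (-1) ** sum(m[i] for i in S) for m in masks)
            coeffs[S] = total / len(masks)
    return coeffs

def shapley_from_fourier(coeffs, n):
    """Shapley value of i: sum of -2 * a_S / |S| over odd-sized S that contain i."""
    values = [0.0] * n
    for S, a in coeffs.items():
        if len(S) % 2 == 1:
            for i in S:
                values[i] += -2.0 * a / len(S)
    return values

def shapley_brute_force(f, n):
    """Reference Shapley values from the definition, for checking the shortcut."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for T in combinations(others, size):
                without = tuple(1 if j in T else 0 for j in range(n))
                with_i = tuple(1 if (j in T or j == i) else 0 for j in range(n))
                values[i] += weight * (f(with_i) - f(without))
    return values

n = 4
coeffs = fourier_coefficients(toy_proxy, n)
nonzero = {S: a for S, a in coeffs.items() if abs(a) > 1e-12}
print("nonzero Fourier coefficients:", nonzero)
print("Shapley via Fourier:", shapley_from_fourier(coeffs, n))
print("Shapley brute force:", shapley_brute_force(toy_proxy, n))
```

In this toy case the pairwise coefficient on features 1 and 2 appears together with the singleton coefficients on 1 and on 2, which is the hierarchical pattern described above. Even-sized sets drop out of the Shapley sum, so Shapley values alone do not expose the pairwise term; the Fourier coefficients do.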

🔮 Future Implications

AI analysis grounded in cited sources

ProxySPEX reduces LLM inference costs by 10× for interaction attribution

It achieves equivalent approximation quality to SPEX with an order of magnitude fewer model inferences, making interpretability feasible for larger models where latency and monetary costs are high.[1][5]

Enables broader application of feature interaction analysis to tasks like RAG and attention pruning

ProxySPEX has been demonstrated on data attribution for CIFAR-10 and on mechanistic interpretability of attention heads, where it improves pruning accuracy; the same approach may extend to multi-document reasoning and internal component analysis.[1][2][5]

⏳ Timeline

2025-01

SPEX proposed by Kang et al. as an information-theoretic method that scales to 10^3 features by exploiting interaction sparsity.

2025-05

ProxySPEX paper released on arXiv, introducing GBT-based hierarchical interaction discovery.

2025-12

ProxySPEX accepted to NeurIPS 2025, with a poster presentation in San Diego.

📎 Sources (8)

Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
