Identifying Interactions at Scale for LLMs

Read original on Berkeley AI Research

Scalable SPEX unlocks key LLM interactions with minimal ablations, which is vital for interpretability.

30-Second TL;DR

What Changed

The number of possible feature interactions in an LLM grows exponentially with the number of features, so exhaustive interaction analysis is infeasible at scale.
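To make the scale concrete, here is a minimal sketch that counts candidate interactions (non-empty feature subsets). The function name `count_interactions` and the feature counts are illustrative assumptions, not taken from the source.

```python
from math import comb
from typing import Optional


def count_interactions(n_features: int, max_order: Optional[int] = None) -> int:
    """Count non-empty feature subsets of size <= max_order (all sizes if None)."""
    if max_order is None:
        max_order = n_features
    return sum(comb(n_features, k) for k in range(1, max_order + 1))


for n in (10, 100, 1000):
    up_to_pairs = count_interactions(n, max_order=2)
    # The all-orders count is 2**n - 1, so only its digit count is printed.
    digits = len(str(count_interactions(n)))
    print(f"{n} features: {up_to_pairs} interactions up to order 2; "
          f"the all-orders count has {digits} digits")
```

Even restricting attention to pairs grows quadratically, while the full count doubles with every added feature, which is why methods in this space rely on structure such as sparsity rather than enumeration.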

Why It Matters

Advances safer LLMs by revealing hidden interaction patterns, crucial for trust and debugging at production scale. Reduces interpretability compute barriers for researchers and builders.

What To Do Next

Read the Berkeley AI Research blog post and prototype a SPEX-style ablation study on your own LLM.

Who should care: Researchers & Academics

Deep Insight

Web-grounded analysis with 8 cited sources.

Enhanced Key Takeaways

  • ProxySPEX fits gradient boosted trees (GBTs) as a proxy model to predict masked LLM outputs, then extracts interactions from the GBTs to exploit hierarchical structure in feature interactions.[1][2][3]
  • Applied to data attribution, ProxySPEX identifies interactions among CIFAR-10 training samples that influence test predictions; applied to mechanistic interpretability, it uncovers attention head interactions within and across layers on question-answering tasks.[1][2][3]
  • ProxySPEX provides a scalable approximation of Shapley values by accounting for interactions. It outperforms LASSO in faithfulness with limited inferences and achieves higher test accuracy in attention head pruning tasks.[1][4]
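The proxy idea in the takeaways above can be sketched on a toy problem. Everything here is an assumption for illustration: `toy_model` stands in for a masked LLM call, the hand-rolled boosted trees stand in for a real GBT library, the learning rate of 1.0 keeps the toy exact (real libraries usually apply shrinkage), and the Fourier coefficients are computed by brute force over all masks, which only works for a handful of features. It is not the authors' implementation.

```python
import itertools


def toy_model(mask):
    """Hypothetical stand-in for one masked LLM inference (returns a scalar)."""
    x0, x1, x2, x3 = mask
    return 2.0 * x0 + 3.0 * x1 * x2  # features 1 and 2 interact; feature 3 is inert


def mean(v):
    return sum(v) / len(v)


def sse(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v)


def fit_tree(X, r, depth, used=()):
    """Greedy regression tree on binary features, fitted to residuals r."""
    if depth == 0 or len(set(r)) <= 1:
        return ("leaf", mean(r))
    best = None
    for j in range(len(X[0])):
        if j in used:
            continue
        lo = [i for i, x in enumerate(X) if x[j] == 0]
        hi = [i for i, x in enumerate(X) if x[j] == 1]
        if not lo or not hi:
            continue
        cost = sse([r[i] for i in lo]) + sse([r[i] for i in hi])
        if best is None or cost < best[0] - 1e-12:
            best = (cost, j, lo, hi)
    if best is None:
        return ("leaf", mean(r))
    _, j, lo, hi = best

    def sub(idx):
        return fit_tree([X[i] for i in idx], [r[i] for i in idx], depth - 1, used + (j,))

    return ("split", j, sub(lo), sub(hi))


def predict_tree(tree, x):
    while tree[0] == "split":
        tree = tree[3] if x[tree[1]] == 1 else tree[2]
    return tree[1]


def fit_gbt(X, y, n_rounds=5, depth=2, lr=1.0):
    """Squared-loss boosting: each tree is fitted to the current residuals."""
    base = mean(y)
    trees, pred = [], [base] * len(y)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, resid, depth)
        trees.append(tree)
        pred = [p + lr * predict_tree(tree, x) for p, x in zip(pred, X)]
    return lambda x: base + lr * sum(predict_tree(t, x) for t in trees)


def fourier_coeffs(f, n):
    """Brute-force Fourier coefficients of f over {0,1}^n, keyed by feature subset."""
    masks = list(itertools.product((0, 1), repeat=n))
    vals = [f(m) for m in masks]
    coeffs = {}
    for S in masks:
        total = sum(v * (-1) ** sum(s * x for s, x in zip(S, m))
                    for m, v in zip(masks, vals))
        coeffs[tuple(i for i, s in enumerate(S) if s)] = total / len(masks)
    return coeffs


def r2(y_true, y_pred):
    """Faithfulness as the coefficient of determination."""
    m = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return 1.0 - ss_res / sum((a - m) ** 2 for a in y_true)


n = 4
X = list(itertools.product((0, 1), repeat=n))  # toy only: every mask is queried
y = [toy_model(x) for x in X]
proxy = fit_gbt(X, y)
coeffs = fourier_coeffs(proxy, n)
top = max((S for S in coeffs if len(S) >= 2), key=lambda S: abs(coeffs[S]))
print("faithfulness R^2:", r2(y, [proxy(x) for x in X]))
print("strongest higher-order interaction:", top, coeffs[top])
```

The point of the design is that the expensive model is queried only to build the training set; all interaction extraction then runs against the cheap proxy. In a realistic setting one would sample far fewer masks than 2^n and measure faithfulness on held-out masks.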

Technical Deep Dive

  • Fits gradient boosted trees (GBTs) to masked LLM outputs as a proxy model, leveraging the observation that LLM feature interactions are hierarchical (higher-order interactions are accompanied by their lower-order subsets).[1][2][3]
  • Extracts interactions from the fitted GBTs and converts them to a Fourier representation, from which attributions, including Shapley-based ones, can be defined.[1][5]
  • Evaluated on four high-dimensional datasets with hundreds of features; outperforms marginal attributions by 15-25% in faithfulness (R²) and uses 10× fewer inferences than SPEX to match performance.[1][4]
  • In attention head pruning, ProxySPEX identifies heads for removal across layer ranges (initial 1-3, middle 14-16, final 30-32), yielding higher test accuracies than LASSO baselines at various sparsity levels.[4]
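As a rough illustration of how interaction coefficients relate to Shapley-based attributions, the sketch below uses the Möbius (Harsanyi dividend) representation, a swapped-in relative of the Fourier representation mentioned above, chosen because the conversion is a one-line formula: each interaction's coefficient is split equally among its members. The value function `value` and all helper names are hypothetical, and the brute-force check is only feasible for a few features.

```python
import itertools
from math import factorial


def value(S):
    """Hypothetical toy value function over kept-feature subsets (not from the source)."""
    S = set(S)
    return 2.0 * (0 in S) + 3.0 * (1 in S and 2 in S)


def subsets(items):
    items = list(items)
    for k in range(len(items) + 1):
        yield from itertools.combinations(items, k)


def mobius(v, n):
    """m(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    return {S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
            for S in subsets(range(n))}


def shapley_from_interactions(m, n):
    """phi_i = sum over S containing i of m(S) / |S|."""
    return [sum(c / len(S) for S, c in m.items() if i in S) for i in range(n)]


def shapley_brute_force(v, n):
    """Reference Shapley values from the weighted marginal-contribution definition."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for S in subsets(others):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += w * (v(S + (i,)) - v(S))
        phi.append(total)
    return phi


n = 4
m = mobius(value, n)
phi_fast = shapley_from_interactions(m, n)
phi_ref = shapley_brute_force(value, n)
print("nonzero interactions:", {S: c for S, c in m.items() if abs(c) > 1e-12})
print("Shapley from interactions:", phi_fast)
print("Shapley by brute force:   ", phi_ref)
```

When only a few interaction coefficients are nonzero, the first route touches just those terms, whereas the brute-force definition enumerates every subset for every feature.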

Future Implications
AI analysis grounded in cited sources

ProxySPEX reduces LLM inference costs by 10× for interaction attribution
It achieves equivalent approximation quality to SPEX with an order of magnitude fewer model inferences, making interpretability feasible for larger models where latency and monetary costs are high.[1][5]
Enables broader application of feature interaction analysis to tasks like RAG and attention pruning
Its demonstrated success in data attribution on CIFAR-10 and in mechanistic interpretability of attention heads, where it improves pruning accuracy, suggests the approach can extend to multi-document reasoning and internal component analysis.[1][2][5]

Timeline

2025-01
SPEX proposed by Kang et al. as an information-theoretic method that scales to 10^3 features by exploiting interaction sparsity.
2025-05
ProxySPEX paper released on arXiv introducing GBT-based hierarchical interaction discovery.
2025-12
ProxySPEX accepted to NeurIPS 2025 with poster presentation in San Diego.

AI-curated news aggregator. All content rights belong to original publishers.
Original source: Berkeley AI Research