X-MAP is an explainable framework that combines SHAP attributions with non-negative matrix factorization (NMF) to build topic profiles of correctly classified spam/phishing and legitimate messages. It flags likely misclassifications by measuring each message's Jensen-Shannon divergence from these profiles. In experiments, it reaches 0.98 AUROC as a misclassification detector and recovers 97% of false rejections when used as a repair layer.
Key Points
1. Combines SHAP feature attributions with NMF to build interpretable topic profiles
2. Measures how far a message deviates from those profiles using Jensen-Shannon divergence
3. Misclassified messages show 2x larger divergence than correctly classified ones
4. Achieves 0.98 AUROC as a detector; recovers 97% of false rejections
5. Lowers the false-rejection rate to 0.089 at a 95% true-rejection rate
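To make the detector metric concrete: if each message's divergence is used as a score, AUROC is the probability that a randomly chosen misclassified message scores higher than a randomly chosen correctly classified one. The sketch below computes that quantity directly from two lists of scores. The function name and the sample values are illustrative assumptions, not results or code from the X-MAP experiments.

```python
def auroc(scores_misclassified, scores_correct):
    """AUROC as a pairwise ranking probability.

    Counts the fraction of (misclassified, correct) pairs in which the
    misclassified message has the higher score; ties count as half.
    """
    if not scores_misclassified or not scores_correct:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s_bad in scores_misclassified:
        for s_ok in scores_correct:
            if s_bad > s_ok:
                wins += 1.0
            elif s_bad == s_ok:
                wins += 0.5
    return wins / (len(scores_misclassified) * len(scores_correct))


# Made-up divergence scores, for illustration only.
divergence_misclassified = [0.42, 0.37, 0.55]
divergence_correct = [0.12, 0.20, 0.40, 0.09]
print(auroc(divergence_misclassified, divergence_correct))
```

The pairwise form is quadratic in the number of messages, which is fine for a small illustration; a rank-based computation would be the usual choice for large evaluation sets.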
Impact Analysis
Enhances spam/phishing detectors by providing interpretable insights into failures, reducing false negatives that expose users and false positives that erode trust. Serves as a plug-in repair layer for existing models with high recovery rates.
Technical Details
Uses SHAP for local feature importance and NMF to decompose the attributions into non-negative topic factors that form the spam and legitimate profiles. Computes the JS divergence between a message's topic distribution and the class prototypes. Tested on SMS spam and phishing URL datasets.
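The divergence step can be sketched in a few lines. The example below assumes that a message's topic weights and each class prototype are available as non-negative vectors of equal length (for instance, rows of an NMF factor), and normalizes them to probability distributions before comparing. The prototype and message values are hypothetical, and the base-2 logarithm is a choice made here so that the divergence lies in [0, 1]; the source does not specify either.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two non-negative weight vectors."""
    p, q = normalize(p), normalize(q)
    # The mixture m is positive wherever p or q is, so KL to m is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic distributions over three topics.
spam_prototype = [0.6, 0.3, 0.1]
legit_prototype = [0.1, 0.2, 0.7]
message_topics = [0.5, 0.4, 0.1]

print("divergence from spam profile: ", js_divergence(message_topics, spam_prototype))
print("divergence from legit profile:", js_divergence(message_topics, legit_prototype))
```

A message labeled spam whose divergence from the spam prototype is unusually large would be a candidate misclassification under this scheme; the threshold for "unusually large" would have to be tuned on held-out data.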