Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons


Conference paper


Shahaf Bassan, Shlomit Gur, Ron Eliav
ICLR, 2025

View PDF | Paper | Code | Poster
Cite

APA
Bassan, S., Gur, S., & Eliav, R. (2025). Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons. ICLR.


Chicago/Turabian
Bassan, Shahaf, Shlomit Gur, and Ron Eliav. “Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons.” ICLR (2025).


MLA
Bassan, Shahaf, et al. “Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons.” ICLR, 2025.


BibTeX

@inproceedings{bassan2025explain,
  title = {Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons},
  year = {2025},
  booktitle = {ICLR},
  author = {Bassan, Shahaf and Gur, Shlomit and Eliav, Ron}
}

Abstract

Minimal sufficient reasons represent a prevalent form of explanation: the smallest subset of input features which, when held constant at their corresponding values, ensures that the prediction remains unchanged. Previous post-hoc methods attempt to obtain such explanations but face two main limitations: (1) obtaining these subsets poses a computational challenge, leading most scalable methods to converge on suboptimal, less meaningful subsets; (2) these methods rely heavily on sampling out-of-distribution input assignments, which can result in counterintuitive behavior. To address these limitations, we propose a self-supervised training approach, which we term sufficient subset training (SST). Using SST, we train models to generate concise sufficient reasons for their predictions as an integral part of their output. Our results indicate that our framework produces succinct and faithful subsets substantially more efficiently than competing post-hoc methods, while maintaining comparable predictive performance.
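To make the notion of a sufficient reason concrete, below is a minimal sketch (not from the paper, and not the SST procedure itself) of the kind of check a post-hoc method performs: hold a candidate subset of features fixed at their values in the input, resample the remaining features from reference data, and test whether the prediction stays the same. All names here (`is_sufficient_reason`, the reference-data sampler, the `model` callable returning class scores for a batch) are illustrative assumptions.

```python
import numpy as np

def is_sufficient_reason(model, x, subset, reference_data,
                         n_samples=100, threshold=1.0):
    """Monte-Carlo check of whether the features in `subset` form a
    sufficient reason for model(x): fix those features at their values
    in x, resample the remaining features from rows of reference_data,
    and measure how often the predicted class stays unchanged.

    Illustrative sketch only; assumes `model` maps a (batch, features)
    array to class scores.
    """
    x = np.asarray(x)
    original_class = np.argmax(model(x[None, :]), axis=1)[0]

    # Build perturbed inputs: complement features come from random reference rows,
    # while the candidate subset is held fixed at its values in x.
    idx = np.random.randint(0, len(reference_data), size=n_samples)
    perturbed = np.asarray(reference_data)[idx].copy()
    perturbed[:, subset] = x[subset]

    preds = np.argmax(model(perturbed), axis=1)
    agreement = np.mean(preds == original_class)
    return agreement >= threshold, agreement
```

A minimal sufficient reason would be the smallest subset passing such a check; searching for it post hoc is what makes these explanations expensive and dependent on out-of-distribution sampling, which is the cost SST avoids by training the network to emit the subset directly alongside its prediction.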


