Interpretable Machine Learning
Complete Paper List in Reverse Chronological Order
Selected Recent Papers on Interpretable Machine Learning
- A. R. Hsu, Y. Cherapanamjeri, A. Y. Odisho, P. R. Carroll, B. Yu (2024). Mechanistic Interpretation through Contextual Decomposition in Transformers. https://arxiv.org/pdf/2407.00886
- Y. Chen, C. Singh, X. Liu, S. Zuo, B. Yu, H. He, J. Gao (2024). Towards consistent natural-language explanations via explanation-consistent finetuning. https://arxiv.org/abs/2401.13986
- Q. Zhang, C. Singh, L. Liu, X. Liu, B. Yu, J. Gao, T. Zhao (2023). Tell your model where to attend: post-hoc attention steering for LLMs. ICLR 2024. https://arxiv.org/abs/2311.02262
- A. Agarwal, A. M. Kenny, Y. S. Tan, T. M. Tang, B. Yu (2023). MDI+: a flexible random forest-based feature importance framework. https://arxiv.org/abs/2307.01932 (PCS related)
- A. R. Hsu, Y. Cherapanamjeri, B. Park, T. Naumann, A. Odisho, and B. Yu (2023). Diagnosing transformers: illuminating feature space for clinical decision-making. ICLR 2024. https://arxiv.org/abs/2305.17588
- C. Singh, A. R. Hsu, R. Antonello, S. Jain, A. G. Huth, B. Yu and J. Gao (2023). Explaining black box text modules in natural language with language models.
- C. Singh, W. Ha and B. Yu (2021). Interpreting and Improving Deep-Learning Models with Reality Checks. To appear in the book “xxAI - Beyond Explainable AI” (eds. Andreas Holzinger, Randy Goebel, Ruth Fong, Taesup Moon, Klaus-Robert Müller, and Wojciech Samek). https://arxiv.org/abs/2108.06847
- W. Ha, C. Singh, F. Lanusse, S. Upadhyayula, and B. Yu (2021). Adaptive Wavelet Distillation from Neural Networks through Interpretation. Proc. NeurIPS 2021. (code)
- L. Rieger, W. J. Murdoch, C. Singh, B. Yu (2020). Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge. ICML Proceedings. (code)
- C. Singh, W. Ha, F. Lanusse, V. Boehm, J. Liu, B. Yu (2020). Transformation Importance with Applications to Cosmology. ICLR Workshop paper. (code)
- W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu* (2019). Definitions, methods, and applications in interpretable machine learning. PNAS, 116(44), 22071-22080.
- W. J. Murdoch, C. Singh, and B. Yu (2019). Hierarchical interpretations for neural network predictions. ICLR. (code)
- W. J. Murdoch, P. Liu, and B. Yu (2018). Beyond word importance: contextual decomposition to extract interactions from LSTMs. Proc. ICLR 2018. https://arxiv.org/abs/1705.07356 (code)