
Considerations when learning additive explanations for black-box models. (English) Zbl 1518.68323

Summary: Many methods to explain black-box models, whether local or global, are additive. In this paper, we study global additive explanations for non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations adapted to a global setting, distilled additive explanations, and gradient-based explanations. We show that different explanation methods characterize non-additive components in a black-box model’s prediction function in different ways. We use the concepts of main and total effects to anchor additive explanations, and quantitatively evaluate additive and non-additive explanations. Even though distilled explanations are generally the most accurate additive explanations, non-additive explanations such as tree explanations that explicitly model non-additive components tend to be even more accurate. Despite this, our user study showed that machine learning practitioners were better able to leverage additive explanations for various tasks. These considerations should be taken into account when deciding which explanation to trust and use to explain black-box models.
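
Illustrative aside (not part of the original summary): of the four methods, partial dependence (reference [22] in the list below) is the simplest global additive explanation to compute. The sketch that follows assumes scikit-learn and a synthetic regression task; the dataset, model, and the helper name partial_dependence_1d are illustrative choices, not the authors' setup. It estimates one feature's partial dependence curve by averaging the black-box prediction over the data while holding that feature fixed at each grid value.

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in black box trained on a synthetic, non-additive regression task.
X, y = make_friedman1(n_samples=500, random_state=0)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence_1d(model, X, feature, grid_size=20):
    """One-dimensional partial dependence: average prediction with one feature fixed."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    curve = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature] = value                   # hold the feature at this grid value
        curve.append(model.predict(X_fixed).mean())   # average the prediction over the data
    return grid, np.array(curve)

grid, curve = partial_dependence_1d(black_box, X, feature=0)
curve -= curve.mean()  # center so the curve reads as an additive shape function

Repeating this for every feature and summing the centered curves gives an additive approximation of the black-box prediction function; the non-additive (interaction) components discussed in the summary are exactly what such an additive summary leaves out.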

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., & Kim, B. (2019). Sanity checks for saliency maps. In NeurIPS.
[2] Amodio, S.; Aria, M.; D’Ambrosio, A., On concurvity in nonlinear and nonparametric regression models, Statistica (2014) · Zbl 1307.62106
[3] Ancona, M., Ceolini, E., Oztireli, C., & Gross, M. (2018). Towards better understanding of gradient-based attribution methods for Deep Neural Networks. In ICLR.
[4] Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2017). Learning certifiably optimal rule lists. In KDD. · Zbl 1473.68134
[5] Apley, D.W., & Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82, 4. · Zbl 07554784
[6] Atzmueller, M., & Lemmerich, F. (2012). VIKAMINE - Open-source subgroup discovery, pattern mining, and analytics. In ECML PKDD.
[7] Ba, J., & Caruana, R. (2014). Do deep nets really need to be deep?. In NeurIPS.
[8] Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Muller, KR; Samek, W., On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PloS ONE, 10, 7 (2015) · doi:10.1371/journal.pone.0130140
[9] Bastani, O., Kim, C., & Bastani, H. (2017). Interpreting blackbox models via model extraction. In FAT/ML Workshop.
[10] Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. In CVPR.
[11] Bhatt, U., Weller, A., & Moura, J. M. F. (2020). Evaluating and aggregating feature-based model explanations. In IJCAI.
[12] Bien, J.; Tibshirani, R., Prototype selection for interpretable classification, The Annals of Applied Statistics, 5, 4 (2011) · Zbl 1234.62096 · doi:10.1214/11-AOAS495
[13] Breiman, L., Random forests, Machine Learning, 45, 1 (2001) · Zbl 1007.68152
[14] Bucilua, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In KDD.
[15] Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD.
[16] Chang, C.H., Tan, S., Lengerich, B., Goldenberg, A., & Caruana, R. (2021). How interpretable and trustworthy are GAMs? In KDD.
[17] Covert, I., Lundberg, S., & Lee, S.I. (2020). Understanding global feature contributions through additive importance measures. In NeurIPS.
[18] Craven, M. W., & Shavlik, J. W. (1995). Extracting tree-structured representations of trained networks. In NeurIPS.
[19] Doshi-Velez, F., & Kim, B. (2018). Towards A rigorous science of interpretable machine learning. In Explainable and interpretable models in computer vision and machine learning. Springer.
[20] FICO. (2018). FICO explainable machine learning challenge. https://community.fico.com/s/explainable-machine-learning-challenge.
[21] Fisher, A. J., Rudin, C., & Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. JMLR. · Zbl 1436.62019
[22] Friedman, JH, Greedy function approximation: A gradient boosting machine, The Annals of Statistics, 29, 5 (2001) · Zbl 1043.62034 · doi:10.1214/aos/1013203451
[23] Friedman, JH; Popescu, BE, Predictive learning via rule ensembles, The Annals of Applied Statistics, 2, 3 (2008) · Zbl 1149.62051 · doi:10.1214/07-AOAS148
[24] Frosst, N., & Hinton, G. (2018). Distilling a neural network into a soft decision tree. In CEUR-WS.
[25] Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.
[26] Hastie, T.; Tibshirani, R., Generalized additive models, Statistical Science, 1, 3, 297-310 (1986) · Zbl 0645.62068
[27] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer. · Zbl 1273.62005
[28] Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the knowledge in a neural network. In NeurIPS Deep learning and representation learning workshop.
[29] Hooker, G. (2004). Discovering additive structure in black box functions. In KDD.
[30] Ibrahim, M., Louie, M., Modarres, C., & Paisley, J. (2019). Mapping the landscape of predictions: Global explanations of neural networks. In AIES.
[31] Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML.
[32] Jesus, S., Belém, C., Balayan, V., Bento, J., Saleiro, P., Bizarro, P., & Gama, J. (2021). How can I choose an explainer? An application-grounded evaluation of post-hoc explanations. In FAccT.
[33] Rawal, K., & Lakkaraju, H. (2020). Beyond individualized recourse: Interpretable and interactive summaries of actionable recourses. In NeurIPS.
[34] Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., & Wortman Vaughan, J. (2020). Interpreting interpretability: Understanding data scientists’ use of interpretability tools for machine learning. In CHI.
[35] Kim, B., Khanna, R., & Koyejo, O. (2016). Examples are not enough, learn to criticize! criticism for interpretability. In NeurIPS.
[36] Kim, B., Wattenberg, M., Gilmer, J., Cai, C.J., Wexler, J., Viegas, F., & Sayres, R.A. (2018). Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In ICML.
[37] Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. In ICLR.
[38] Lage, I., Chen, E., He, J., Narayanan, M., Kim, B., Gershman, S. J., & Doshi-Velez, F. (2019). Human evaluation of models built for interpretability. In HCOMP.
[39] Lakkaraju, H., Kamar, E., Caruana, R., & Leskovec, J. (2019). Faithful and customizable explanations of black box models. In AIES.
[40] Lending Club. (2011). Lending Club Loan Dataset 2007-2011. https://www.lendingclub.com/info/download-data.action.
[41] Lengerich, B., Tan, S., Chang, C. H., Hooker, G., & Caruana, R. (2020). An efficient algorithm for recovering identifiable additive models: Purifying interaction effects with the functional anova. In AISTATS.
[42] Letham, B.; Rudin, C.; McCormick, TH; Madigan, D., Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model, The Annals of Applied Statistics, 9, 3 (2015) · Zbl 1454.62348 · doi:10.1214/15-AOAS848
[43] Fu, L., Rule generation from neural networks, IEEE Transactions on Systems, Man, and Cybernetics, 24, 8 (1994) · doi:10.1109/21.299696
[44] Lou, Y., Caruana, R., & Gehrke, J. (2012). Intelligible models for classification and regression. In KDD.
[45] Lou, Y., Caruana, R., Gehrke, J., & Hooker, G. (2013). Accurate intelligible models with pairwise interactions. In KDD.
[46] Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In NeurIPS.
[47] Montavon, G.; Samek, W.; Muller, KR, Methods for interpreting and understanding deep neural networks, Digital Signal Processing, 73, 1-5 (2018) · doi:10.1016/j.dsp.2017.10.011
[48] Mu, J., & Andreas, J. (2020). Compositional explanations of neurons. In NeurIPS.
[49] Nori, H., Jenkins, S., Koch, P., & Caruana, R. (2019). InterpretML: A unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223.
[50] Orlenko, A.; Moore, JH, A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions, BioData Mining, 14, 1 (2021) · doi:10.1186/s13040-021-00243-0
[51] Owen, A. B. (2014). Sobol’ indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification. · Zbl 1308.91129
[52] Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Wortman Vaughan, J. W., & Wallach, H. (2021). Manipulating and measuring model interpretability. In CHI.
[53] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you?: Explaining the predictions of any classifier. In KDD.
[54] Rudin, C., Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence, 1, 5, 206-215 (2019) · doi:10.1038/s42256-019-0048-x
[55] Sanchez, I., Rocktaschel, T., Riedel, S., & Singh, S. (2015). Towards extracting faithful and descriptive representations of latent variable models. In AAAI spring symposium on knowledge representation and reasoning: Integrating symbolic and neural approaches.
[56] Setzu, M., Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., & Giannotti, F. (2021). GLocalX - From local to global explanations of black box AI models. Artificial Intelligence, 294.
[57] Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. InICML.
[58] Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR Workshop.
[59] Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.
[60] Sobol’, I. M. (1990). On sensitivity estimation for nonlinear mathematical models. Matematicheskoe Modelirovanie, 2, 1. · Zbl 0974.00506
[61] Slack, D., Hilgard, S., Jia, E., Singh, S., & Lakkaraju, H. (2020). Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In AIES.
[62] Štrumbelj, E.; Kononenko, I., Explaining prediction models and individual predictions with feature contributions, Knowledge and Information Systems, 41, 3 (2014) · doi:10.1007/s10115-013-0679-x
[63] Tan, S. (2018). Interpretable approaches to detect bias in black-box models. In AIES doctoral consortium.
[64] Tan, S., Caruana, R., Hooker, G., & Lou, Y. (2018). Distill-and-compare: Auditing black-box models using transparent model distillation. In AIES.
[65] Tan, S., Soloviev, M., Hooker, G., & Wells, M. T. (2020). Tree space prototypes: Another look at making tree ensembles interpretable. In FODS.
[66] Tsang, M., Cheng, D., & Liu, Y. (2018). Detecting statistical interactions from neural network weights. In ICLR.
[67] van der Linden, I., Haned, H., & Kanoulas, E. (2019). Global aggregations of local explanations for black box models. In SIGIR Fairness, accountability, confidentiality, transparency, and safety workshop.
[68] Williamson, B., & Feng, J. (2020). Efficient nonparametric statistical inference on population feature importance using Shapley values. In ICML.
[69] Wood, S. N. (2006). Generalized additive models: An introduction with R. Chapman and Hall/CRC. · Zbl 1087.62082
[70] Wood, SN, Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, Journal of the Royal Statistical Society: Series B, 73, 1 (2011) · Zbl 1411.62089
[71] Yan, T., & Procaccia, A. D. (2021). If you like Shapley then you’ll love the core. In AAAI.
[72] Zhao, Q.; Hastie, T., Causal interpretations of black-box models, Journal of Business & Economic Statistics, 39, 1 (2021) · Zbl 07925206 · doi:10.1080/07350015.2019.1624293
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.