
The holdout randomization test for feature selection in black box models. (English) Zbl 07546466

Summary: We propose the holdout randomization test (HRT), an approach to feature selection using black box predictive models. The HRT is a specialized version of the conditional randomization test (CRT) that uses data splitting for feasible computation. The HRT works with any predictive model and produces a valid \(p\)-value for each feature. To make the HRT more practical, we propose a set of extensions to maximize power and speed up computation. In simulations, these extensions lead to greater power than a competing knockoffs-based approach, without sacrificing control of the error rate. We apply the HRT to two case studies from the scientific literature where heuristics were originally used to select important features for predictive models. The results illustrate how such heuristics can be misleading relative to principled methods like the HRT. Code is available at https://github.com/tansey/hrt. Supplementary materials for this article are available online.
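The following is a minimal sketch of the HRT procedure as described in the summary: fit the predictive model once on a training split, then, for each feature, repeatedly resample that feature from (an estimate of) its conditional distribution given the other features and compare the holdout empirical risk to the observed one. The function names, the conditional sampler `sample_feature_conditional`, the loss `loss_fn`, and the scikit-learn-style `fit`/`predict` interface are illustrative assumptions, not the authors' implementation (see https://github.com/tansey/hrt for that).

```python
import numpy as np

def holdout_randomization_test(model, X_train, y_train, X_test, y_test,
                               sample_feature_conditional, loss_fn,
                               n_perms=1000):
    """Sketch of a per-feature HRT: returns one p-value per feature.

    sample_feature_conditional(X, j) is assumed to draw feature j from an
    estimate of its distribution given the remaining features; loss_fn(y, y_hat)
    is the holdout empirical risk (e.g. mean squared error).
    """
    model.fit(X_train, y_train)                          # fit once on the training split
    base_loss = loss_fn(y_test, model.predict(X_test))   # observed risk on the holdout split

    n_features = X_test.shape[1]
    p_values = np.zeros(n_features)
    for j in range(n_features):
        null_losses = np.zeros(n_perms)
        for k in range(n_perms):
            X_null = X_test.copy()
            X_null[:, j] = sample_feature_conditional(X_test, j)   # resample feature j | rest
            null_losses[k] = loss_fn(y_test, model.predict(X_null))
        # One-sided, add-one-corrected p-value: fraction of null risks
        # at least as small as the observed holdout risk.
        p_values[j] = (1.0 + np.sum(null_losses <= base_loss)) / (n_perms + 1.0)
    return p_values
```

The resulting p-values can then be passed to a multiple-testing procedure such as Benjamini-Hochberg to control the false discovery rate across features.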

MSC:

62-XX Statistics
