
Marginal false discovery rate control for likelihood-based penalized regression models. (English) Zbl 1429.62573

Summary: The popularity of penalized regression in high-dimensional data analysis has created a demand for new inferential tools for these models. False discovery rate control is widely used in high-dimensional hypothesis testing but has only recently been considered in the context of penalized regression, and almost all of that work has focused on lasso-penalized linear regression. In this paper, we derive a general method for controlling the marginal false discovery rate that can be applied to any penalized likelihood-based model, such as logistic regression and Cox regression. Our approach is fast and flexible, and it can be used with a variety of penalty functions, including the lasso, elastic net, MCP, and MNet. We derive theoretical conditions under which the proposed method is valid and use simulation studies to demonstrate that the approach is reasonably robust, albeit slightly conservative, when these conditions are violated. Despite being conservative, the method often offers more power to select causally important features than existing approaches. Finally, we demonstrate the practical utility of the method on gene expression datasets with binary and time-to-event outcomes.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
