
A direct approach for sparse quadratic discriminant analysis. (English) Zbl 1466.62353

Summary: Quadratic discriminant analysis (QDA) is a standard tool for classification due to its simplicity and flexibility. However, because the number of its parameters scales quadratically with the number of variables, QDA becomes impractical when the dimensionality is large. To address this, we propose a novel procedure, DA-QDA, for QDA with high-dimensional data. Formulated in a simple and coherent framework, DA-QDA directly estimates the key quantities in the Bayes discriminant function, namely the quadratic interactions and a linear index of the variables used for classification. Under appropriate sparsity assumptions, we establish consistency results for estimating the interactions and the linear index, and we further show that the misclassification rate of our procedure converges to the optimal Bayes risk even when the dimensionality grows exponentially with the sample size. An efficient algorithm based on the alternating direction method of multipliers (ADMM) is developed for estimating the interactions; it is much faster than its competitor in the literature. The promising performance of DA-QDA is illustrated via extensive simulation studies and the analysis of four real datasets.
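For orientation, a minimal sketch of the quantities involved, in generic notation that need not match the paper's: with class means \(\mu_1,\mu_2\), covariance matrices \(\Sigma_1,\Sigma_2\) and prior probabilities \(\pi_1,\pi_2\), the two-class QDA Bayes discriminant function can be written as
\[
D(x) = x^{\top}\Omega\, x + \delta^{\top}x + c, \qquad
\Omega = \tfrac{1}{2}\bigl(\Sigma_2^{-1}-\Sigma_1^{-1}\bigr), \qquad
\delta = \Sigma_1^{-1}\mu_1 - \Sigma_2^{-1}\mu_2,
\]
where \(c\) collects the terms not depending on \(x\) (including \(\log(\pi_1/\pi_2)\) and \(\tfrac12\log(|\Sigma_2|/|\Sigma_1|)\)), and \(x\) is assigned to class 1 when \(D(x)>0\). In this notation the "quadratic interactions" correspond to the matrix \(\Omega\) and the "linear index" to the vector \(\delta\); a direct approach estimates \(\Omega\) and \(\delta\) under sparsity rather than plugging in separate estimates of \(\Sigma_1^{-1}\) and \(\Sigma_2^{-1}\).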

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
