Sparse quadratic classification rules via linear dimension reduction. (English) Zbl 1409.62128

The main contribution of this paper is a new quadratic classifier based on discriminant analysis via projections (DAP). Within an efficient convex optimization procedure, the proposed DAP method combines variable selection with projection of the original data onto a lower-dimensional space. Both simulated and real data are used to demonstrate the efficiency of the proposed method.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
62C12 Empirical decision procedures; empirical Bayes procedures
90C25 Convex programming

References:

[1] Bach, F. R., Consistency of the group Lasso and multiple kernel learning, J. Mach. Learn. Res., 9, 1179-1225, (2008) · Zbl 1225.68147
[2] R.F. Barber, M. Drton, Exact block-wise optimization in group lasso and sparse group lasso for linear regression, arXiv.org, 2010.
[3] Boyd, S. P.; Vandenberghe, L., Convex Optimization, (2004), Cambridge Univ Press: Cambridge · Zbl 1058.90049
[4] Breheny, P.; Huang, J., Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors, Stat. Comput., 25, 173-187, (2015) · Zbl 1331.62359
[5] Cai, T. T.; Liu, W., A direct estimation approach to sparse linear discriminant analysis, J. Amer. Statist. Assoc., 106, 1566-1577, (2011) · Zbl 1233.62129
[6] Chen, H.-W.; Huang, H.-C.; Lin, Y.-S.; Chang, K.-J.; Kuo, W.-H.; Hwa, H.-L.; Hsieh, F.-J.; Juan, H.-F., Comparison and identification of estrogen-receptor related gene expression profiles in breast cancer of different ethnic origins, Breast Cancer Basic Clin. Res., 1, 35-49, (2008)
[7] Chin, K.; DeVries, S.; Fridlyand, J.; Spellman, P. T.; Roydasgupta, R.; Kuo, W.-L.; Lapuk, A.; Neve, R. M.; Qian, Z.; Ryder, T.; Chen, F.; Feiler, H.; Tokuyasu, T.; Kingsley, C.; Dairkee, S.; Meng, Z.; Chew, K.; Pinkel, D.; Jain, A.; Ljung, B. M.; Esserman, L.; Albertson, D. G.; Waldman, F. M.; Gray, J. W., Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell, 10, 529-541, (2006)
[8] Chowdary, D.; Lathrop, J.; Skelton, J.; Curtin, K.; Briggs, T.; Zhang, Y.; Yu, J.; Wang, Y.; Mazumder, A., Prognostic gene expression signatures can be measured in tissues collected in RNAlater preservative, J. Mol. Diagnostics : JMD, 8, 31-39, (2006)
[9] Clemmensen, L.; Witten, D. M.; Hastie, T. J.; Ersbøll, B., Sparse discriminant analysis, Technometrics, 53, 406-413, (2011)
[10] P. Danaher, JGL: Performs the Joint Graphical Lasso for sparse inverse covariance estimation on multiple classes, 2013. R
[11] Danaher, P.; Wang, P.; Witten, D. M., The joint graphical lasso for inverse covariance estimation across multiple classes, J. R. Stat. Soc. Ser. B Stat. Methodol., 76, 373-397, (2014) · Zbl 07555455
[12] Dudoit, S.; Fridlyand, J.; Speed, T. P., Comparison of discrimination methods for the classification of tumors using gene expression data, J. Amer. Statist. Assoc., 97, 77-87, (2002) · Zbl 1073.62576
[13] Friedman, J. H., Regularized discriminant analysis, J. Amer. Statist. Assoc., 84, 165-175, (1989)
[14] I. Gaynanova, MGSDA: Multi-Group Sparse Discriminant Analysis, 2016. R
[15] Gaynanova, I.; Booth, J. G.; Wells, M. T., Simultaneous sparse estimation of canonical vectors in the \(p \gg N\) setting, J. Amer. Statist. Assoc., 111, 696-706, (2016)
[16] Gaynanova, I.; Booth, J. G.; Wells, M. T., Penalized versus constrained generalized eigenvalue problems, J. Comput. Graph. Statist., 26, 379-387, (2017)
[17] Gaynanova, I.; Kolar, M., Optimal variable selection in multi-group sparse discriminant analysis, Electron. J. Stat., 9, 2007-2034, (2015) · Zbl 1323.62060
[18] Gravier, E.; Pierron, G.; Vincent-Salomon, A.; Gruel, N.; Raynal, V.; Savignoni, A.; De Rycke, Y.; Pierga, J.-Y.; Lucchesi, C.; Reyal, F.; Fourquet, A.; Roman-Roman, S.; Radvanyi, F.; Sastre-Garau, X.; Asselain, B.; Delattre, O., A prognostic DNA signature for T1T2 node-negative breast cancer patients, Genes Chromosom. Cancer, 49, 1125-1134, (2010)
[19] Guo, J.; Levina, E.; Michailidis, G.; Zhu, J., Joint estimation of multiple graphical models, Biometrika, 98, 1-15, (2011) · Zbl 1214.62058
[20] Holst, F., Estrogen receptor alpha gene amplification in breast cancer: 25 years of debate, World J. Clin. Oncol., 7, 160-173, (2016)
[21] Holst, F.; Stahl, P. R.; Ruiz, C.; Hellwinkel, O.; Jehan, Z.; Wendland, M.; Lebeau, A.; Terracciano, L.; Al-Kuraya, K.; Jänicke, F.; Sauter, G.; Simon, R., Estrogen receptor alpha (ESR1) gene amplification is frequent in breast cancer, Nature Gen., 39, 655-660, (2007)
[22] Hsu, D.; Kakade, S. M.; Zhang, T., A tail inequality for quadratic forms of subgaussian random vectors, Electron. Comm. Probab., 17, 52-56, (2012) · Zbl 1309.60017
[23] Huang, J.; Breheny, P.; Ma, S., A selective review of group selection in high-dimensional models, Statist. Sci., 27, 481-499, (2012) · Zbl 1331.62347
[24] Iwamoto, T.; Booser, D.; Valero, V.; Murray, J. L.; Koenig, K.; Esteva, F. J.; Ueno, N. T.; Zhang, J.; Shi, W.; Qi, Y.; Matsuoka, J.; Yang, E. J.; Hortobagyi, G. N.; Hatzis, C.; Symmans, W. F.; Pusztai, L., Estrogen receptor (ER) mRNA and ER-related gene expression in breast cancers that are 1
[25] Kadota, T.; Shepp, L., On the best finite set of linear observables for discriminating two Gaussian signals, IEEE Trans. Inform. Theory, 13, 278-284, (1967) · Zbl 0153.48704
[26] Kolar, M.; Liu, H., Optimal feature selection in high-dimensional discriminant analysis, IEEE Trans. Inform. Theory, 61, 1063-1083, (2015) · Zbl 1359.62250
[27] Kullback, S., An application of information theory to multivariate analysis, Ann. Math. Stat., 23, 88-102, (1952) · Zbl 0047.13503
[28] Laenkholm, A.-V.; Knoop, A.; Ejlertsen, B.; Rudbeck, T.; Jensen, M.-B.; Müller, S.; Lykkesfeldt, A. E.; Rasmussen, B. B.; Nielsen, K. V., ESR1 gene status correlates with estrogen receptor protein levels measured by ligand binding assay and immunohistochemistry, Mol. Oncol., 6, 428-436, (2012)
[29] Laurent, B.; Massart, P., Adaptive estimation of a quadratic functional by model selection, Ann. Statist., 28, 1302-1338, (2000) · Zbl 1105.62328
[30] Y. Le, T.J. Hastie, Sparse quadratic discriminant analysis and community Bayes, arXiv.org, 2014.
[31] Li, Y.; Ngom, A., Nonnegative least-squares methods for the classification of high-dimensional biological data, IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB), 10, 447-456, (2013)
[32] Li, Q.; Shao, J., Sparse quadratic discriminant analysis for high dimensional data, Statist. Sinica, 25, 457-473, (2015) · Zbl 1534.62079
[33] Lin, J., Divergence measures based on the Shannon entropy, IEEE Trans. Inform. Theory, 37, 145-151, (1991) · Zbl 0712.94004
[34] Mai, Q.; Zou, H., A note on the connection and equivalence of three sparse linear discriminant analysis methods, Technometrics, 55, 243-246, (2013)
[35] Mai, Q.; Zou, H.; Yuan, M., A direct approach to sparse discriminant analysis in ultra-high dimensions, Biometrika, 99, 29-42, (2012) · Zbl 1437.62550
[36] Mardia, K. V.; Kent, J. T.; Bibby, J. M., Multivariate Analysis, (1979), Academic Press: New York · Zbl 0432.62029
[37] O. Mersmann, microbenchmark: Accurate Timing Functions, 2015. R
[38] Muirhead, R. J., Aspects of Multivariate Statistical Theory, (1982), Wiley: New York · Zbl 0556.62028
[39] Niu, Y. S.; Hao, N.; Dong, B., A new reduced-rank linear discriminant analysis method and its applications, Statist. Sinica, 28, 189-202, (2018) · Zbl 1382.62034
[40] Obozinski, G.; Wainwright, M. J.; Jordan, M. I., Support union recovery in high-dimensional multivariate regression, Ann. Statist., 39, 1-47, (2011) · Zbl 1373.62372
[41] B.S. Price, RidgeFusion: RR
[42] Price, B. S.; Geyer, C. J.; Rothman, A. J., Ridge fusion in statistical learning, J. Comput. Graph. Statist., 24, 439-454, (2014)
[43] J.A. Ramey, Datamicroarray: Collection of Data Sets for Classification, 2016. https://github.com/ramhiser/datamicroarray, http://ramhiser.com
[44] J.A. Ramey, C.K. Stein, P.D. Young, D.M. Young, High-Dimensional Regularized Discriminant Analysis, arXiv.org, 2016.
[45] Rukhin, A. L., Generalized Bayes estimators of a normal discriminant function, J. Multivariate Anal., 41, 154-162, (1992) · Zbl 0745.62025
[46] N. Simon, R.J. Tibshirani, Discriminant Analysis with Adaptively Pooled Covariance, arXiv.org, 2011.
[47] Simon, N.; Tibshirani, R. J., Standardization and the group Lasso penalty, Statist. Sinica, 22, 983-1001, (2012) · Zbl 1257.62080
[48] Tibshirani, R. J.; Hastie, T. J.; Narasimhan, B.; Chu, G., Class prediction by nearest shrunken centroids, with applications to DNA microarrays, Statist. Sci., 18, 104-117, (2003) · Zbl 1048.62109
[49] Tseng, P., Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109, 475-494, (2001) · Zbl 1006.65062
[50] Wainwright, M. J., Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso), IEEE Trans. Inform. Theory, 55, 2183-2202, (2009) · Zbl 1367.62220
[51] T. Wang, I. Gaynanova, DAP: Discriminant Analysis via Projections, 2018. R
[52] Wickham, H., ggplot2: Elegant Graphics for Data Analysis, (2016), Springer: New York · Zbl 1397.62006
[53] Witten, D. M.; Tibshirani, R. J., Penalized classification using Fisher’s linear discriminant, J. R. Stat. Soc. Ser. B, 73, 753-772, (2011) · Zbl 1228.62079
[54] Wu, Y.; Qin, Y.; Zhu, M., Quadratic discriminant analysis for high-dimensional data, Statist. Sinica, (2018), (in press)
[55] Wu, M. C.; Zhang, L.; Wang, Z.; Christiani, D. C.; Lin, X., Sparse linear discriminant analysis for simultaneous testing for the significance of a gene set/pathway and gene selection, Bioinformatics, 25, 1145-1151, (2009)
[56] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B, 68, 49-67, (2006) · Zbl 1141.62030