Classification of COVID19 Patients using robust logistic regression. (English) Zbl 07618096

Summary: Coronavirus disease 2019 (COVID-19) has triggered a global pandemic affecting millions of people. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19, is hypothesized to enter the human body via the airway epithelium, where it initiates a host response. The expression levels of upper-airway genes that interact with SARS-CoV-2 could therefore be a telltale sign of infection. However, gene expression data are suspected of containing various contamination errors introduced by the techniques used to extract them, and clinical diagnoses may contain labelling errors due to the limited specificity and sensitivity of diagnostic tests. We propose fitting a regularized logistic regression model as a classifier for COVID-19 diagnosis, which simultaneously identifies genes related to the disease and predicts COVID-19 cases from the expression values of the selected genes. We apply a robust estimation method based on the density power divergence to obtain results that remain stable under contamination or labelling errors in the data, and we compare its performance with that of the classical maximum likelihood estimator under different penalties, including the LASSO and the general adaptive LASSO penalties.
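The idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard density power divergence loss for Bernoulli responses, with tuning parameter `alpha`, plus a plain LASSO penalty with weight `lam`, and it swaps in a basic proximal gradient loop as the optimizer. The function names, the fixed step size, and the omission of an intercept are all assumptions made for illustration. Because the loss is non-convex for `alpha > 0`, the loop can only be expected to reach a local minimizer.

```python
import numpy as np


def sigmoid(eta):
    # Clip the linear predictor so probabilities stay strictly inside (0, 1).
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def dpd_loss(beta, X, y, alpha):
    """Average density power divergence loss for logistic regression.

    alpha = 0 is treated as the limiting case, the negative log-likelihood.
    """
    p = sigmoid(X @ beta)
    if alpha == 0.0:
        eps = 1e-12
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    model_term = p ** (1 + alpha) + (1 - p) ** (1 + alpha)
    data_term = y * p ** alpha + (1 - y) * (1 - p) ** alpha
    return np.mean(model_term - (1 + 1 / alpha) * data_term)


def dpd_grad(beta, X, y, alpha):
    """Gradient of dpd_loss with respect to beta.

    The residual (p - y) is down-weighted by w, which shrinks for
    observations whose label is very unlikely under the model.
    With alpha = 0 the weight is 1 and this is the usual MLE score.
    """
    p = sigmoid(X @ beta)
    w = p ** alpha * (1 - p) + p * (1 - p) ** alpha
    return X.T @ ((1 + alpha) * w * (p - y)) / len(y)


def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1; sets small coefficients exactly to zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_dpd_lasso(X, y, alpha=0.5, lam=0.05, step=0.1, n_iter=500):
    """Proximal gradient sketch for DPD loss + lam * ||beta||_1 (no intercept).

    The fixed step size is an assumption suited to standardized features.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * dpd_grad(beta, X, y, alpha), step * lam)
    return beta


if __name__ == "__main__":
    # Synthetic illustration: 2 informative features out of 10, 5% flipped labels.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    beta_true = np.zeros(10)
    beta_true[:2] = [2.0, -2.0]
    y = (rng.random(200) < sigmoid(X @ beta_true)).astype(float)
    flip = rng.random(200) < 0.05
    y[flip] = 1.0 - y[flip]
    print(np.round(fit_dpd_lasso(X, y), 3))
```

In this sketch, setting `alpha=0.0` gives an ordinary LASSO-penalized maximum likelihood fit, so the same function can serve as the non-robust baseline. An adaptive LASSO variant would replace the single threshold `step * lam` with per-coefficient thresholds.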

MSC:

62Jxx Linear inference, regression
62Fxx Parametric inference
62Gxx Nonparametric inference

References:

[1] Algamal, ZA; Lee, MH, Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer, Expert Syst Appl, 42, 9326-9332 (2015) · doi:10.1016/j.eswa.2015.08.016
[2] Araveeporn, A., The higher-order of adaptive lasso and elastic net methods for classification on high dimensional data, Mathematics, 9, 1091 (2021) · doi:10.3390/math9101091
[3] Avella-Medina, M.; Ronchetti, E., Robust and consistent variable selection in high-dimensional generalized linear models, Biometrika, 105, 31-44 (2018) · Zbl 07072391 · doi:10.1093/biomet/asx070
[4] Bianco, AM; Yohai, VJ, Robust estimation in the logistic regression model, in: Robust statistics, data analysis, and computer intensive methods (1996), New York: Springer · Zbl 0839.62030
[5] Bianco, AM; Boente, G.; Chebi, G., Penalized robust estimators in sparse logistic regression, TEST, 1-32 (2021)
[6] Basu, A.; Harris, R.; Hjort, N.; Jones, MC, Robust and efficient estimation by minimising a density power divergence, Biometrika, 85, 549-559 (1998) · Zbl 0926.62021
[7] Basu, A.; Ghosh, A.; Jaenada, M.; Pardo, L., Robust adaptive Lasso in high-dimensional logistic regression with an application to genomic classification of cancer patients (2021) · arXiv:2109.03028
[8] Cantoni, E.; Ronchetti, E., Robust inference for generalized linear models, J Am Stat Assoc, 96, 1022-1030 (2001) · Zbl 1072.62610 · doi:10.1198/016214501753209004
[9] Cawley, GC; Talbot, NLC, Gene selection in cancer classification using sparse logistic regression with Bayesian regularization, Bioinformatics, 22, 19, 2348-2355 (2006) · doi:10.1093/bioinformatics/btl386
[10] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J Am Stat Assoc, 96, 1348-1360 (2001) · Zbl 1073.62547 · doi:10.1198/016214501753382273
[11] Fokianos, K., Comparing two samples by penalized logistic regression, Electron J Stat, 2, 564-580 (2008) · Zbl 1320.62070 · doi:10.1214/07-EJS078
[12] Ghosh, D.; Chinnaiyan, AM, Classification and selection of biomarkers in genomic data using LASSO, J Biomed Biotechnol, 2005, 2, 147 (2005) · doi:10.1155/JBB.2005.147
[13] Ghosh, A.; Basu, A., Robust estimation in generalized linear models: the density power divergence approach, TEST, 25, 2, 269-290 (2016) · Zbl 1342.62126 · doi:10.1007/s11749-015-0445-3
[14] Ghosh, A.; Majumdar, S., Ultrahigh-dimensional robust and efficient sparse regression using non-concave penalized density power divergence, IEEE Trans Inf Theory, 66, 12, 7812-7827 (2020) · Zbl 1457.62211 · doi:10.1109/TIT.2020.3013015
[15] Ghosh, A.; Jaenada, M.; Pardo, L., Robust adaptive variable selection in ultra-high dimensional linear regression models (2020) · arXiv:2004.05470
[16] Hastie, T.; Tibshirani, R.; Friedman, J., The elements of statistical learning: data mining, inference and prediction (2009), Berlin: Springer · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[17] Huang, J.; Ma, S.; Zhang, CH, The iterated lasso for high-dimensional logistic regression, The University of Iowa, Department of Statistics and Actuarial Sciences, 1-20 (2008)
[18] Jacob, L.; Obozinski, G.; Vert, JP, Group lasso with overlap and graph lasso, Proceedings of the 26th annual international conference on machine learning, 433-440 (2009)
[19] Konishi, S.; Kitagawa, G., Generalized information criteria in model selection, Biometrika, 83, 875-890 (1996) · Zbl 0883.62004 · doi:10.1093/biomet/83.4.875
[20] Mick, E.; Kamm, J.; Pisco, AO; Ratnasiri, K.; Babik, JM; Calfee, CS et al., Upper airway gene expression differentiates COVID-19 from other acute respiratory illnesses and reveals suppression of innate immune responses by SARS-CoV-2, medRxiv (2020)
[21] Park, MY; Hastie, T., Penalized logistic regression for detecting gene interactions, Biostatistics, 9, 30-50 (2008) · Zbl 1274.62853 · doi:10.1093/biostatistics/kxm010
[22] Ramesh, P.; Veerappapillai, S.; Karuppasamy, R., Gene expression profiling of corona virus microarray datasets to identify crucial targets in COVID-19 patients, Gene Rep, 22 (2021) · doi:10.1016/j.genrep.2020.100980
[23] Plan, Y.; Vershynin, R., Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach, IEEE Trans Inf Theory, 59, 1, 482-494 (2013) · Zbl 1364.94153 · doi:10.1109/TIT.2012.2207945
[24] Salahudeen, AA; Choi, SS; Rustagi, A.; Zhu, J.; Sean, M.; Flynn, RA; Kuo, CJ, Progenitor identification and SARS-CoV-2 infection in long-term human distal lung organoid cultures, BioRxiv (2020) · doi:10.1101/2020.07.27.212076
[25] Shevade, SK; Keerthi, SS, A simple and efficient algorithm for gene selection using sparse logistic regression, Bioinformatics, 19, 17, 2246-2253 (2003) · doi:10.1093/bioinformatics/btg308
[26] Sun, H.; Wang, S., Penalized logistic regression for high-dimensional DNA methylation data with case-control studies, Bioinformatics, 28, 1368-1375 (2012) · doi:10.1093/bioinformatics/bts145
[27] Tibshirani, R., Regression shrinkage and selection via the lasso, J R Stat Soc Ser B (Methodol), 58, 1, 267-288 (1996) · Zbl 0850.62538
[28] Wu, TT; Chen, YF; Hastie, T.; Sobel, E.; Lange, K., Genome-wide association analysis by lasso penalized logistic regression, Bioinformatics, 25, 6, 714-721 (2009) · doi:10.1093/bioinformatics/btp041
[29] Zhang, YH; Li, H.; Zeng, T.; Chen, L.; Li, Z.; Huang, T.; Cai, YD, Identifying transcriptomic signatures and rules for SARS-CoV-2 infection, Front Cell Dev Biol, 8, 1763 (2021)
[30] Zhu, J.; Hastie, T., Classification of expressions arrays by penalized logistic regression, Biostatistics, 5, 3, 427-443 (2004) · Zbl 1154.62406 · doi:10.1093/biostatistics/kxg046
[31] Zou, H., The adaptive lasso and its oracle properties, J Am Stat Assoc, 101, 476, 1418-1429 (2006) · Zbl 1171.62326 · doi:10.1198/016214506000000735
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.