Decomposition of variation of mixed variables by a latent mixed Gaussian copula model. (English) Zbl 1522.62186

Summary: Many biomedical studies collect data of mixed types of variables from multiple groups of subjects. Some of these studies aim to find the group-specific and the common variation among all these variables. Even though similar problems have been studied by some previous works, their methods mainly rely on the Pearson correlation, which cannot handle mixed data. To address this issue, we propose a latent mixed Gaussian copula (LMGC) model that can quantify the correlations among binary, ordinal, continuous, and truncated variables in a unified framework. We also provide a tool to decompose the variation into the group-specific and the common variation over multiple groups via solving a regularized \(M\)-estimation problem. We conduct extensive simulation studies to show the advantage of our proposed method over the Pearson correlation-based methods. We also demonstrate that by jointly solving the \(M\)-estimation problem over multiple groups, our method is better than decomposing the variation group by group. We also apply our method to a Chlamydia trachomatis genital tract infection study to demonstrate how it can be used to discover informative biomarkers that differentiate patients.
{© 2022 The International Biometric Society.}
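
The summary's point that Pearson correlation is ill-suited to copula-type data can be illustrated for the simplest case, a pair of continuous variables. The sketch below is not the LMGC estimator from the paper. It assumes the standard Gaussian copula identity that links the latent correlation \(r\) to Kendall's \(\tau\) by \(r = \sin(\pi\tau/2)\), and all function names are hypothetical. Binary, ordinal, and truncated variables would need different bridge functions, which are not shown here.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
            sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
            total += sx * sy
    return total / (n * (n - 1) / 2)


def latent_corr_continuous(x, y):
    """Rank-based latent correlation for two continuous variables
    (assumed Gaussian copula bridge: r = sin(pi/2 * tau))."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


def pearson(x, y):
    """Ordinary sample Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated latent Gaussian pair with correlation rho (illustrative values).
rng = random.Random(0)
rho, n = 0.6, 400
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in z1]

# Observed variables are strictly increasing transforms of the latent ones.
x = [math.exp(a) for a in z1]
y = [b ** 3 for b in z2]

print("Pearson on observed data:", round(pearson(x, y), 3))
print("Rank-based latent estimate:", round(latent_corr_continuous(x, y), 3))
```

Because Kendall's \(\tau\) depends only on ranks, the rank-based estimate is unchanged by the monotone transformations, whereas the Pearson correlation of the observed data generally differs from the latent \(\rho\).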

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
