Model-based clustering with sparse covariance matrices. (English) Zbl 1430.62131

Summary: Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can easily become over-parameterized. For this reason, parsimonious models have been developed via covariance matrix decompositions or by assuming local independence. However, these remedies do not allow direct estimation of sparse covariance matrices, nor do they take into account that the structure of association among the variables can vary from one cluster to another. To this end, we introduce mixtures of Gaussian covariance graph models for model-based clustering with sparse covariance matrices. A penalized likelihood approach is employed for estimation, and a general penalty term on the graph configurations can be used to induce different levels of sparsity and to incorporate prior knowledge. Model estimation is carried out using a structural EM algorithm that jointly estimates the parameters and the graph structure, with two alternative search strategies, a genetic algorithm and an efficient stepwise search, proposed for inference. With this approach, sparse component covariance matrices are obtained directly. The framework results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables. Extensive simulated-data experiments and applications to illustrative datasets show that the method attains good classification performance and model quality. The general methodology for model-based clustering with sparse covariance matrices is implemented in the R package mixggm, available on CRAN.
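The full penalized structural-EM with graph search is involved, but the core idea the summary describes, an EM loop whose M-step produces component covariance matrices containing exact zeros (one sparsity pattern per cluster), can be sketched in a few lines. The toy below uses a crude hard-thresholding rule in place of the paper's penalized graph-selection step, so it is an illustrative stand-in rather than the authors' method; all function and parameter names are hypothetical.

```python
import numpy as np

def em_sparse_gmm(X, K, threshold=0.1, n_iter=50):
    """Toy EM for a K-component Gaussian mixture in which each M-step
    hard-thresholds small off-diagonal covariance entries to exact zero,
    yielding sparse, cluster-specific covariance matrices.  Illustrative
    only: the paper selects the zero pattern (the covariance graph) via
    a penalized structural-EM, not by thresholding."""
    n, p = X.shape
    # farthest-point initialization of hard assignments
    centers = [X[0]]
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[int(d2.argmax())])
    d2 = np.stack([((X - c) ** 2).sum(1) for c in centers], axis=1)
    resp = np.eye(K)[d2.argmin(1)]
    for _ in range(n_iter):
        # M-step: weighted means and covariances, then sparsify
        Nk = resp.sum(0) + 1e-10
        pi = Nk / n
        mu = (resp.T @ X) / Nk[:, None]
        Sigma = np.empty((K, p, p))
        for k in range(K):
            Xc = X - mu[k]
            S = (resp[:, k, None] * Xc).T @ Xc / Nk[k]
            mask = np.abs(S) >= threshold      # keep only large entries
            np.fill_diagonal(mask, True)       # never zero the variances
            S = S * mask
            w = np.linalg.eigvalsh(S)
            if w[0] < 1e-6:                    # restore positive definiteness;
                S += (1e-6 - w[0]) * np.eye(p) # a diagonal shift keeps the zeros
            Sigma[k] = S
        # E-step: responsibilities from Gaussian log-densities
        logdens = np.empty((n, K))
        for k in range(K):
            L = np.linalg.cholesky(Sigma[k])
            z = np.linalg.solve(L, (X - mu[k]).T)
            logdet = 2.0 * np.log(np.diag(L)).sum()
            logdens[:, k] = np.log(pi[k]) - 0.5 * (
                p * np.log(2 * np.pi) + logdet + (z ** 2).sum(0))
        logdens -= logdens.max(1, keepdims=True)
        resp = np.exp(logdens)
        resp /= resp.sum(1, keepdims=True)
    return pi, mu, Sigma, resp.argmax(1)
```

Note the design point the summary emphasizes: sparsity is imposed on the covariance matrix itself (marginal independence, a covariance graph model), not on its inverse as in graphical-lasso approaches, and the diagonal shift used here repairs positive definiteness without disturbing the off-diagonal zeros.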

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
