
Regularized generalized canonical correlation analysis for multiblock or multigroup data analysis. (English) Zbl 1341.62160

Summary: This paper presents an overview of methods for the analysis of data structured in blocks of variables or in groups of individuals. More specifically, regularized generalized canonical correlation analysis (RGCCA), a unifying approach for multiblock data analysis, is extended so that it also serves as a unifying tool for multigroup data analysis. The versatility and usefulness of our approach are illustrated on two real datasets.
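To make concrete what RGCCA optimizes, here is a minimal single-component sketch of the multiblock RGCCA algorithm of Tenenhaus and Tenenhaus [38] under the Horst scheme \(g(x) = x\). It maximizes \(\sum_{j \neq k} c_{jk} \operatorname{cov}(X_j a_j, X_k a_k)\) subject to \(\tau_j \|a_j\|^2 + (1 - \tau_j) \operatorname{var}(X_j a_j) = 1\). Each block weight is updated in turn as the exact maximizer given the others, so the criterion cannot decrease. It assumes centered blocks, a symmetric design matrix \(C\) in which every block is linked to at least one other, and shrinkage constants \(\tau_j\) in \([0, 1]\) (with \(\tau_j = 0\) requiring \(X_j^T X_j\) to be invertible). The function name `rgcca_horst`, its arguments, and the simulated data are hypothetical. This is not the API of the RGCCA software, and it does not cover the multigroup extension introduced in the paper.

```python
import numpy as np


def rgcca_horst(blocks, C, tau, max_iter=500, tol=1e-12, seed=0):
    """Hypothetical single-component RGCCA sketch, Horst scheme g(x) = x.

    blocks : list of (n, p_j) arrays with centered columns
    C      : (J, J) symmetric design matrix; c_jk = 1 links blocks j and k
             (assumed: every block is linked to at least one other block)
    tau    : shrinkage constants tau_j in [0, 1]
    """
    n = blocks[0].shape[0]
    J = len(blocks)
    rng = np.random.default_rng(seed)
    # Constraint metric M_j = tau_j I + (1 - tau_j) X_j^T X_j / n
    M = [t * np.eye(X.shape[1]) + (1 - t) * (X.T @ X) / n
         for X, t in zip(blocks, tau)]
    M_inv = [np.linalg.inv(Mj) for Mj in M]

    # Random start, rescaled so that a_j^T M_j a_j = 1
    a = []
    for Mj in M:
        v = rng.standard_normal(Mj.shape[0])
        a.append(v / np.sqrt(v @ Mj @ v))
    y = [X @ aj for X, aj in zip(blocks, a)]

    def criterion():
        # Sum over j != k of c_jk * cov(y_j, y_k), covariance with 1/n
        return sum(C[j, k] * (y[j] @ y[k]) / n
                   for j in range(J) for k in range(J) if j != k)

    history = [criterion()]
    for _ in range(max_iter):
        for j in range(J):
            # Inner component: Horst scheme weights each neighbour by c_jk
            z = sum(C[j, k] * y[k] for k in range(J) if k != j)
            # Outer weight: a_j proportional to M_j^{-1} X_j^T z_j,
            # rescaled to satisfy the constraint a_j^T M_j a_j = 1
            v = M_inv[j] @ (blocks[j].T @ z)
            a[j] = v / np.sqrt(v @ M[j] @ v)
            y[j] = blocks[j] @ a[j]
        history.append(criterion())
        if abs(history[-1] - history[-2]) < tol:
            break
    return a, y, history


# Usage on simulated data: three blocks driven by one shared latent variable
rng = np.random.default_rng(1)
n = 100
latent = rng.standard_normal(n)
blocks = []
for p in (5, 4, 6):
    X = np.outer(latent, rng.standard_normal(p)) + 0.5 * rng.standard_normal((n, p))
    blocks.append(X - X.mean(axis=0))  # center each column
C = np.ones((3, 3)) - np.eye(3)        # every pair of blocks is linked
taus = [1.0, 0.5, 0.0]
a, y, hist = rgcca_horst(blocks, C, tau=taus)
```

Setting \(\tau_j = 1\) turns the constraint into a unit-norm constraint on \(a_j\), which favours covariance. Setting \(\tau_j = 0\) constrains the variance of the block component, which favours correlation. Intermediate values interpolate between the two.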

MSC:

62H20 Measures of association (correlation, canonical correlation, etc.)
90C25 Convex programming
90C90 Applications of mathematical programming

Software:

XLStat; RGCCA

References:

[2] Carroll, J. D., A generalization of canonical correlation analysis to three or more sets of variables, Proceedings of the 76th Annual Convention of the American Psychological Association, 227-228 (1968)
[4] Chessel, D.; Hanafi, M., Analyses de la co-inertie de \(K\) nuages de points, Revue de Statistique Appliquée, 44, 35-60 (1996)
[5] De Roover, K.; Ceulemans, E.; Timmerman, M. E., How to perform multiblock component analysis in practice, Behavior Research Methods, 44, 41-56 (2012)
[6] De Roover, K.; Ceulemans, E.; Timmerman, M. E.; Onghena, P., A clusterwise simultaneous component method for capturing within-cluster differences in component variances and correlations, British Journal of Mathematical and Statistical Psychology, 66, 81-102 (2013) · Zbl 1406.91304
[7] De Roover, K.; Ceulemans, E.; Timmerman, M. E.; Vansteelandt, K.; Stouten, J.; Onghena, P., Clusterwise simultaneous component analysis for analyzing structural differences in multivariate multiblock data, Psychological Methods, 17, 100-119 (2012)
[8] De Roover, K.; Timmerman, M. E.; Van Mechelen, I.; Ceulemans, E., On the added value of multiset methods for three-way data analysis, Chemometrics and Intelligent Laboratory Systems, 129, 98-107 (2013)
[9] Eslami, E.; Qannari, E. M.; Kohler, A.; Bougeard, S., General overview of methods of analysis of multi-group datasets, Revue des Nouvelles Technologies de l’Information, 25, 108-123 (2013)
[10] Eslami, E.; Qannari, E. M.; Kohler, A.; Bougeard, S., Analyses factorielles de données structurées en groupes d’individus, Journal de la Société Française de Statistique, 154, 44-57 (2013) · Zbl 1316.62004
[11] Flury, B. N., Common principal components in \(k\) groups, Journal of the American Statistical Association, 79, 892-898 (1984)
[12] Flury, B. N., Two generalizations of the common principal component model, Biometrika, 74, 59-69 (1987) · Zbl 0613.62076
[13] Hanafi, M., PLS path modelling: Computation of latent variables with the estimation mode B, Computational Statistics, 22, 275-292 (2007) · Zbl 1196.62103
[14] Hanafi, M.; Kiers, H. A.L., Analysis of \(K\) sets of data, with differential emphasis on agreement between and within sets, Computational Statistics & Data Analysis, 51, 1491-1508 (2006) · Zbl 1157.62422
[15] Hanafi, M.; Kohler, A.; Qannari, E. M., Shedding new light on hierarchical principal component analysis, Journal of Chemometrics, 24, 703-709 (2010)
[16] Hanafi, M.; Kohler, A.; Qannari, E. M., Connections between multiple co-inertia analysis and consensus principal component analysis, Chemometrics and Intelligent Laboratory Systems, 106, 37-40 (2011)
[17] Hanafi, M.; Mazerolles, E.; Dufour, E.; Qannari, E. M., Common components and specific weight analysis and multiple co-inertia analysis applied to the coupling of several measurement techniques, Journal of Chemometrics, 20, 172-183 (2006)
[18] Hassani, S.; Hanafi, M.; Qannari, E. M.; Kohler, A., Deflation strategies for multi-block principal component analysis revisited, Chemometrics and Intelligent Laboratory Systems, 120, 68 (2013)
[19] Horst, P., Relations among m sets of variables, Psychometrika, 26, 126-149 (1961) · Zbl 0099.35801
[20] Hotelling, H., Relations between two sets of variates, Biometrika, 28, 321-377 (1936) · Zbl 0015.40705
[21] Kettenring, J. R., Canonical analysis of several sets of variables, Biometrika, 58, 433-451 (1971) · Zbl 0225.62072
[22] Kiers, H. A.L., Hierarchical relations among three-way methods, Psychometrika, 56, 449-470 (1991) · Zbl 0760.62059
[23] Kiers, H. A.L.; Ten Berge, J. M.F., Alternating least squares algorithms for simultaneous components analysis with equal component weight matrices in two or more populations, Psychometrika, 54, 467-473 (1989)
[24] Kiers, H. A.L.; Ten Berge, J. M.F., Hierarchical relations between methods for simultaneous component analysis and a technique for rotation to a simple simultaneous structure, British Journal of Mathematical and Statistical Psychology, 47, 109-126 (1994) · Zbl 0825.62512
[26] Krzanowski, W. J., Between-groups comparison of principal components, Journal of the American Statistical Association, 74, 703-707 (1979) · Zbl 0459.62042
[27] Krzanowski, W. J., Principal component analysis in the presence of group structure, Applied Statistics, 33, 164-168 (1984)
[28] Levin, J., Simultaneous factor analysis of several Gramian matrices, Psychometrika, 31, 413-419 (1966)
[29] Niesing, J., Simultaneous component and factor analysis methods for two or more groups: a comparative study, M&T series, Vol. 30 (1997), DSWO Press, Leiden · Zbl 0881.62066
[30] Pagès, J.; Asselin, C.; Morlat, R.; Robichet, J., Analyse factorielle multiple dans le traitement de données sensorielles: Application à des vins rouges de la vallée de la Loire, Sciences des aliments, 7, 549-571 (1987)
[31] Pearson, K., On lines and planes of closest fit to systems of points in space, Philosophical Magazine, 2, 559-572 (1901) · JFM 32.0710.04
[32] Smilde, A. K.; Westerhuis, J. A.; de Jong, S., A framework for sequential multiblock component methods, Journal of Chemometrics, 17, 323-337 (2003)
[33] Ten Berge, J. M.F., Generalized approaches to the MAXBET problem and the MAXDIFF problem, with applications to canonical correlations, Psychometrika, 53, 487-494 (1988) · Zbl 0726.62086
[35] Tenenhaus, M.; Esposito Vinzi, V., PLS regression, PLS path modeling and generalized Procrustean analysis: A combined approach for multiblock analysis, Journal of Chemometrics, 19, 145-153 (2005)
[36] Tenenhaus, M.; Esposito Vinzi, V.; Chatelin, Y.-M.; Lauro, C., PLS path modeling, Computational Statistics & Data Analysis, 48, 159-205 (2005) · Zbl 1429.62227
[37] Tenenhaus, M.; Hanafi, M., A bridge between PLS path modelling and multiblock data analysis, (Esposito Vinzi, V.; Henseler, J.; Chin, W.; Wang, H., Handbook of partial least squares (PLS): Concepts, methods and applications (2010), Springer Verlag), 99-123
[38] Tenenhaus, A.; Tenenhaus, M., Regularized generalized canonical correlation analysis, Psychometrika, 76, 257-284 (2011) · Zbl 1284.62753
[39] Timmerman, M. E.; Kiers, H. A.L., Four simultaneous component models for the analysis of multivariate time series from more than one subject to model intra-individual and inter-individual differences, Psychometrika, 68, 105-121 (2003) · Zbl 1306.62507
[40] Tucker, L. R., An inter-battery method of factor analysis, Psychometrika, 23, 111-136 (1958) · Zbl 0097.35102
[41] Van de Geer, J. P., Linear relations among k sets of variables, Psychometrika, 49, 70-94 (1984)
[42] van den Wollenberg, A. L., Redundancy analysis - An alternative to canonical correlation analysis, Psychometrika, 42, 207-219 (1977) · Zbl 0354.92050
[43] Van Deun, K.; Smilde, A. K.; van der Werf, M. J.; Kiers, H. A.L.; Van Mechelen, I., A structured overview of simultaneous component based data integration, BMC Bioinformatics, 10, 246 (2009)
[44] Westerhuis, J. A.; Kourti, T.; MacGregor, J. F., Analysis of multiblock and hierarchical PCA and PLS models, Journal of Chemometrics, 12, 301-321 (1998)
[45] Wold, H., Soft modeling: The basic design and some extensions, (Jöreskog, K. G.; Wold, H., Systems under indirect observation, Part 2 (1982), North-Holland, Amsterdam), 1-54 · Zbl 0517.62065
[46] Wold, H., Partial least squares, (Kotz, S.; Johnson, N. L., Encyclopedia of statistical sciences, vol. 6 (1985), John Wiley & Sons, New York), 581-591 · Zbl 0657.62002
[47] Wold, S.; Hellberg, S.; Lundstedt, T.; Sjöström, M.; Wold, H., PLS modelling with latent variables in two or more dimensions, (Proceedings of the symposium on PLS model building: Theory and application (1987), Frankfurt am Main, Germany), 1-21