
Regression based thresholds in principal loading analysis. (English) Zbl 1520.62077

Summary: Principal loading analysis is a dimension reduction method that discards variables which have only a small distorting effect on the covariance matrix. As a special case, principal loading analysis discards variables that are not correlated with the remaining ones. In multivariate linear regression, on the other hand, predictors that are correlated neither with the remaining predictors nor with the dependent variables have regression coefficients equal to zero. Hence, if the goal is to select a number of predictors, variables that do not correlate are discarded, just as in principal loading analysis. However, the two methods select the same variables not only in this special case of zero correlation. We contribute conditions under which both methods share the same variable selection. Further, we extend those conditions to provide a choice for the threshold in principal loading analysis, which so far has only followed recommendations based on simulation results.
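The following is a minimal illustrative sketch in base R of the single-variable special case described above: a variable that is nearly uncorrelated with the remaining ones yields an approximate unit eigenvector of the covariance matrix, so its row of loadings is dominated by a single entry and the variable barely distorts the covariance structure. The simplified cut-off rule and the threshold value below are assumptions for illustration only, not the paper's algorithm or its regression-based threshold choice; the authors' own implementation is the prinvars package.

    ## Minimal sketch of the single-variable case of principal loading analysis.
    ## Illustration only: the simplified rule below is an assumption, not the
    ## paper's exact algorithm; the authors' implementation is 'prinvars'.
    set.seed(1)
    n  <- 200
    x1 <- rnorm(n)
    x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # correlated with x1
    x3 <- rnorm(n)                        # (nearly) uncorrelated with x1, x2
    X  <- cbind(x1, x2, x3)

    eig <- eigen(cov(X))                  # loadings = eigenvectors of the covariance matrix
    V   <- eig$vectors
    tau <- 0.3                            # illustrative cut-off; the paper derives a regression-based choice

    ## A variable whose row of loadings exceeds tau in only one eigenvector
    ## forms (approximately) a block of its own and is a discard candidate.
    candidate <- apply(abs(V), 1, function(row) sum(row >= tau) == 1)
    colnames(X)[candidate]                # returns "x3"

Here x3 is flagged because its loadings on the eigenvectors spanned by x1 and x2 fall below the cut-off; the paper's contribution is to replace such an ad hoc tau with a threshold justified by the link to regression-based variable selection.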

MSC:

62H25 Factor analysis and principal components; correspondence analysis
15A42 Inequalities involving eigenvalues and eigenvectors
15A18 Eigenvalues, singular values, and eigenvectors
62J05 Linear regression; mixed models

Software:

prinvars

References:

[1] Anderson, T. W., Asymptotic theory for principal component analysis, Ann. Math. Statist., 34, 1, 122-148 (1963) · Zbl 0202.49504
[2] Bauer, J. O., Correlation based principal loading analysis, (4th International Conference on Mathematics and Statistics (2021), Association for Computing Machinery, New York, NY, USA), 27-34
[3] Bauer, J. O.; Drabant, B., Principal loading analysis, J. Multivariate Anal., 184 (2021) · Zbl 1473.62201
[4] Bauer, J. O.; Holzapfel, R., Prinvars: Principal variables (2022), R package version 0.1.1
[5] Dauxois, J.; Pousse, A.; Romain, Y., Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference, J. Multivariate Anal., 12, 1, 136-154 (1982) · Zbl 0539.62064
[6] Davis, C.; Kahan, W. M., The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal., 7, 1, 1-46 (1970) · Zbl 0198.47201
[7] Ferrara, C.; Martella, F.; Vichi, M., Dimensions of well-being and their statistical measurements, (Alleva, G.; Giommi, A., Topics in Theoretical and Applied Statistics (2016), Springer International Publishing, Cham), 85-99 · Zbl 1364.62284
[8] Hawkins, D. M., On the investigation of alternative regressions by principal component analysis, J. R. Statist. Soc., Ser. C (Appl. Statist.), 22, 3, 275-286 (1973)
[9] Hocking, R. R., The analysis and selection of variables in linear regression, Biometrics, 32, 1, 1-49 (1976) · Zbl 0328.62042
[10] Hotelling, H., Analysis of a complex of statistical variables into principal components, J. Educ. Psychol., 24, 6, 417-441 (1933) · JFM 59.1182.04
[11] Hotelling, H., The relations of the newer multivariate statistical methods to factor analysis, Br. J. Statist. Psychol., 10, 69-79 (1957)
[12] Ipsen, I. C. F.; Nadler, B., Refined perturbation bounds for eigenvalues of Hermitian and non-Hermitian matrices, SIAM J. Matrix Anal. Appl., 31, 1, 40-53 (2009) · Zbl 1189.15022
[13] Johnson, R. A.; Wichern, D. W., Applied Multivariate Statistical Analysis (2007), Pearson Prentice Hall, New Jersey · Zbl 1269.62044
[14] Kendall, M., A Course in Multivariate Analysis (1957), Griffin, London
[15] Kollo, T.; Neudecker, H., Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices, J. Multivariate Anal., 47, 2, 283-300 (1993) · Zbl 0790.62055
[16] Mansfield, E. R.; Webster, J. T.; Gunst, R. F., An analytic variable selection technique for principal component regression, J. R. Statist. Soc., Ser. C (Appl. Statist.), 26, 1, 34-40 (1977)
[17] Mardia, K.; Kent, J.; Bibby, J., Multivariate Analysis (1979), Academic Press, San Diego · Zbl 0432.62029
[18] Muller, K. E.; Peterson, B. L., Practical methods for computing power in testing the multivariate general linear hypothesis, Comput. Statist. Data Anal., 2, 2, 143-158 (1984) · Zbl 0571.65119
[19] Neudecker, H.; Wesselman, A., The asymptotic variance matrix of the sample correlation matrix, Linear Algebra Appl., 127, 589-599 (1990) · Zbl 0716.62025
[20] Pearson, K., On lines and planes of closest fit to systems of points in space, Philos. Mag., 2, 559-572 (1901) · JFM 32.0246.07
[21] Ramirez-Figueroa, J. A.; Martin-Barreiro, C.; Nieto-Librero, A. B.; Leiva, V.; Galindo-Villardón, M. P., A new principal component analysis by particle swarm optimization with an environmental application for data science, Stoch. Environ. Res. Risk Assess. (2021)
[22] Stewart, G. W.; Sun, J., Matrix Perturbation Theory (1990), Academic Press, Boston · Zbl 0706.65013
[23] Vichi, M.; Saporta, G., Clustering and disjoint principal component analysis, Comput. Statist. Data Anal., 53, 8, 3194-3208 (2009) · Zbl 1453.62230
[24] Vigneau, E.; Qannari, E. M., Clustering of variables around latent components, Comm. Statist. Simulation Comput., 32, 4, 1131-1150 (2003) · Zbl 1100.62582
[25] Webster, J. T.; Gunst, R. F.; Mason, R. L., Latent root regression analysis, Technometrics, 16, 4, 513-522 (1974) · Zbl 0294.62081
[26] Yu, Y.; Wang, T.; Samworth, R. J., A useful variant of the Davis-Kahan theorem for statisticians, Biometrika, 102, 2, 315-323 (2015) · Zbl 1452.15010
[27] Zou, H.; Hastie, T.; Tibshirani, R., Sparse principal component analysis, J. Comput. Graph. Statist., 15, 2, 265-286 (2006)