
A note on switching eigenvalues under small perturbations. (English) Zbl 07905920

Summary: The sensitivity of eigenvectors and eigenvalues of symmetric matrix estimates to the removal of a single observation has been well documented in the literature. However, a complicating factor can exist in that the ranking of the eigenvalues may change due to the removal of an observation, and with that so too does the perceived importance of the corresponding eigenvectors. We refer to this problem as “switching of eigenvalues”. Since the new eigenvalues, post observation removal, do not carry enough information to indicate that this has happened, how do we know that such switching has occurred? In this article, we show that approximations to the eigenvalues can be used to help determine when switching may have occurred. We then discuss possible actions researchers can take based on this knowledge, for example making better choices when deciding how many principal components should be retained, and adjusting approximate influence diagnostics that perform poorly when switching has occurred. Our results are easily applied to any eigenvalue problem involving symmetric matrix estimators. We highlight our approach with application to two real data examples.
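The switching phenomenon described above can be reproduced with a small numerical sketch. The example below is purely illustrative and constructed for this note (it is not from the paper, and it detects switching by brute-force recomputation of the eigendecomposition rather than by the eigenvalue approximations the paper proposes): a toy dataset has nearly balanced spread along two axes, and a single influential observation decides which sample eigenvalue ranks first, so removing it swaps the order of the leading components.

```python
import numpy as np

# Toy dataset (p = 2), chosen for illustration: spread along the x- and
# y-axes is nearly balanced, and the last row is a single influential
# observation that tips the balance toward the x-axis.
X = np.array([
    [1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0],
    [0.0, 1.1], [0.0, -1.1], [0.0, 1.1], [0.0, -1.1],
    [2.5, 0.0],   # influential observation
])

def sorted_eigen(data):
    """Eigenvalues (descending) and matching eigenvectors of the sample covariance."""
    vals, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

_, vecs_full = sorted_eigen(X)

# Leave-one-out check: flag a possible switch when the leading eigenvector of
# the reduced-data covariance aligns better with the full-data *second*
# eigenvector than with the full-data first eigenvector.
switches = []
for i in range(len(X)):
    _, vecs_i = sorted_eigen(np.delete(X, i, axis=0))
    if abs(vecs_full[:, 1] @ vecs_i[:, 0]) > abs(vecs_full[:, 0] @ vecs_i[:, 0]):
        switches.append(i)

print(switches)  # only removing the influential point swaps the component order
```

Here only removal of observation 8 is flagged: without it, the y-direction has the larger sample variance, so the first and second principal components trade places even though the reduced-data eigenvalues alone give no hint that a switch took place.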

MSC:

62-XX Statistics

Software:

PCAtools

References:

[1] Allec, S. I., Sun, Y., Sun, J., Chang, C.-E. A., and Wong, B. M. 2019. Heterogeneous CPU+GPU-enabled simulations for DFTB molecular dynamics of large chemical and biological systems. Journal of Chemical Theory and Computation 15 (5):2807-15.
[2] Bénasséni, J. 1990. Sensitivity coefficients for the subspaces spanned by principal components. Communications in Statistics - Theory and Methods 19 (6):2021-34.
[3] Bénasséni, J. 2018. A correction of approximations used in sensitivity study of principal component analysis. Computational Statistics 33 (4):1939-55. · Zbl 1417.62162
[4] Blighe, K., and Lun, A. 2022. PCAtools: Everything principal components analysis. R package version 2.10.0. https://github.com/kevinblighe/PCAtools.
[5] Brodnjak-Vončina, D., Kodba, Z. C., and Novič, M. 2005. Multivariate data analysis in classification of vegetable oils characterized by the content of fatty acids. Chemometrics and Intelligent Laboratory Systems 75 (1):31-43.
[6] Cangelosi, R., and Goriely, A. 2007. Component retention in principal component analysis with application to cDNA microarray data. Biology Direct 2:2. PMID: 17229320
[7] Chatfield, C. 2018. Introduction to multivariate analysis. Abingdon-on-Thames: Routledge.
[8] Cosnuau, A. 2014. Computation on GPU of eigenvalues and eigenvectors of a large number of small Hermitian matrices. Procedia Computer Science 29:800-10.
[9] Critchley, F. 1985. Influence in principal components analysis. Biometrika 72 (3):627-36. · Zbl 0608.62068
[10] Croux, C., and Haesbroeck, G. 2000. Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika 87 (3):603-18. · Zbl 0956.62047
[11] Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. 1975. Robust estimation and outlier detection with correlation coefficients. Biometrika 62 (3):531-45. · Zbl 0321.62053
[12] Gabriel, K. R. 1980. Biplot display of multivariate matrices for inspection of data and diagnosis. Technical report, University of Rochester, Rochester, NY.
[13] Hampel, F. R. 1974. The influence curve and its role in robust estimation. Journal of the American Statistical Association 69 (346):383-93. · Zbl 0305.62031
[14] Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. 1986. Robust statistics: The approach based on influence functions. Vol. 196. New York: John Wiley & Sons. · Zbl 0593.62027
[15] Huber, P. J. 1981. Regression. In Robust statistics, ed. P. J. Huber, 153-98. New York: John Wiley & Sons. · Zbl 0536.62025
[16] Jolliffe, I. T. 2002. Principal component analysis. Springer Series in Statistics, 2nd ed. New York: Springer. https://books.google.com.au/books?id=TtVF-ao4fI8C · Zbl 1011.62064
[17] Li, K.-C. 1991. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86 (414):316-27. · Zbl 0742.62044
[18] Li Wai Suen, C. 2012. Influence diagnostics for high-dimensional principal component analysis. PhD diss., La Trobe University, Melbourne, Australia.
[19] North, G. R., Bell, T. L., Cahalan, R. F., and Moeng, F. J. 1982. Sampling errors in the estimation of empirical orthogonal functions. Monthly Weather Review 110 (7):699-706.
[20] Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11):559-72. · JFM 32.0710.04
[21] Prendergast, L. A. 2005. Influence functions for sliced inverse regression. Scandinavian Journal of Statistics 32 (3):385-404. · Zbl 1088.62054
[22] Prendergast, L. A. 2006. Detecting influential observations in sliced inverse regression analysis. Australian & New Zealand Journal of Statistics 48 (3):285-304. · Zbl 1108.62071
[23] Prendergast, L. A. 2007. Implications of influence function analysis for sliced inverse regression and sliced average variance estimation. Biometrika 94 (3):585-601. · Zbl 1135.62047
[24] Prendergast, L. A. 2008. A note on sensitivity of principal component subspaces and the efficient detection of influential observations in high dimensions. Electronic Journal of Statistics 2:454-67. · Zbl 1320.62140
[25] Prendergast, L. A., and Li Wai Suen, C. 2011. A new and practical influence measure for subsets of covariance matrix sample principal components with applications to high dimensional datasets. Computational Statistics & Data Analysis 55 (1):752-64. · Zbl 1247.62150
[26] Prendergast, L. A., and Smith, J. A. 2010. Influence functions for dimension reduction methods: An example influence study of principal Hessian direction analysis. Scandinavian Journal of Statistics 37 (4):588-611. · Zbl 1226.62064
[27] Prendergast, L. A., and Smith, J. A. 2007. Sensitivity of principal Hessian direction analysis. Electronic Journal of Statistics 1:253-67. · Zbl 1320.62061
[28] Shaker, A. 2013. Combining dimension reduction methods. PhD diss., La Trobe University, Melbourne, Australia.
[29] Tanaka, Y. 1988. Sensitivity analysis in principal component analysis: Influence on the subspace spanned by principal components. Communications in Statistics - Theory and Methods 17 (9):3157-75. · Zbl 0696.62251
[30] Tanaka, Y., and Castaño-Tostado, E. 1990. Quadratic perturbation expansions of certain functions of eigenvalues and eigenvectors and their application to sensitivity analysis in multivariate methods. Communications in Statistics - Theory and Methods 19 (8):2943-65. · Zbl 0736.62051
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.