Numerical considerations and a new implementation for invariant coordinate selection. (English) Zbl 07669889

Summary: Invariant coordinate selection (ICS) is a multivariate data transformation and a dimension reduction method that can be useful in many different contexts. It can be used for outlier detection or cluster identification, and can be seen as an independent component or a non-Gaussian component analysis method. The usual implementation of ICS is based on a joint diagonalization of two scatter matrices, and may be numerically unstable in some ill-conditioned situations. We focus on one-step M-scatter matrices and propose a new implementation of ICS based on a pivoted QR factorization of the centered data set. This factorization avoids the direct computation of the scatter matrices and their inverses and brings numerical stability to the algorithm. Furthermore, the row and column pivoting leads to a rank-revealing procedure that allows computation of ICS when the scatter matrices are not full rank. Several artificial and real data sets illustrate the benefits of the new implementation over the original one.
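The contrast drawn in the summary can be illustrated with a minimal Python sketch. It is not the authors' algorithm or the API of the R package ICS. It assumes the scatter pair is the covariance matrix (COV) and the fourth-moment scatter (COV4), uses column pivoting only (the paper also pivots rows), and substitutes the numerical rank for the dimension when the data are rank deficient. The function names `ics_classical` and `ics_qr` and the tolerance `tol` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh, qr


def ics_classical(X):
    """ICS eigenvalues from explicitly formed COV and COV4 (assumed scatter pair)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (n - 1)
    # Squared Mahalanobis distances; requires inverting COV explicitly.
    d2 = np.einsum("ij,ij->i", Xc @ np.linalg.inv(cov), Xc)
    cov4 = (Xc * d2[:, None]).T @ Xc / (n * (p + 2))
    # Joint diagonalization as a generalized eigenproblem: cov4 b = rho cov b.
    rho, B = eigh(cov4, cov)
    return rho[::-1], B[:, ::-1]  # descending order


def ics_qr(X, tol=1e-10):
    """ICS eigenvalues from a column-pivoted QR of the centered data (sketch)."""
    n, _ = X.shape
    Xc = X - X.mean(axis=0)
    Q, R, _ = qr(Xc / np.sqrt(n - 1), mode="economic", pivoting=True)
    # With pivoting, |diag(R)| is non-increasing, which reveals the numerical rank.
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > tol * diag[0]))
    # Whitened data: Y.T @ Y / (n - 1) is the identity, with no inverse computed.
    Y = np.sqrt(n - 1) * Q[:, :rank]
    d2 = np.einsum("ij,ij->i", Y, Y)  # squared Mahalanobis distances
    # COV4 of the whitened data equals W.T @ W, so its eigenvalues are the
    # squared singular values of W; the scatter matrix itself is never formed.
    W = np.sqrt(d2)[:, None] * Y / np.sqrt(n * (rank + 2))
    return np.linalg.svd(W, compute_uv=False) ** 2


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:5] += 6.0  # a few shifted rows acting as outliers
rho_classical, _ = ics_classical(X)
rho_qr = ics_qr(X)
```

On well-conditioned data the two routes should agree up to rounding. Appending a duplicated column makes COV singular, so the explicit inverse in `ics_classical` is no longer meaningful, while `ics_qr` detects the reduced rank and continues in the lower-dimensional subspace.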

MSC:

62-04 Software, source code, etc. for problems pertaining to statistics
62-08 Computational methods for problems pertaining to statistics
62H99 Multivariate analysis
62P99 Applications of statistics
65F15 Numerical computation of eigenvalues and eigenvectors of matrices
65Y20 Complexity and performance of numerical algorithms

Software:

ICS; ICSOutlier; R

References:

[1] Alashwali, F. and Kent, J. T., The use of a common location measure in the invariant coordinate selection and projection pursuit, J. Multivariate Anal., 152 (2016), pp. 145-161, doi:10.1016/j.jmva.2016.08.007. · Zbl 1348.62184
[2] Archimbaud, A., Nordhausen, K., and Ruiz-Gazen, A., ICS for multivariate outlier detection with application to quality control, Comput. Statist. Data Anal., 128 (2018), pp. 184-199. · Zbl 1469.62016
[3] Archimbaud, A., Nordhausen, K., and Ruiz-Gazen, A., ICSOutlier: Unsupervised outlier detection for low-dimensional contamination structure, R J., 10 (2018), pp. 234-250, doi:10.32614/rj-2018-034.
[4] Argyris, J. H., The natural factor formulation of the stiffnesses for the matrix displacement method, Comput. Methods Appl. Mech. Engrg., 5 (1975), pp. 97-119. · Zbl 0291.73051
[5] Businger, P. A. and Golub, G. H., Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269-276. · Zbl 0142.11503
[6] Campbell, N. and Mahon, R., A multivariate study of variation in two species of rock crab of the genus Leptograpsus, Aust. J. Zool., 22 (1974), pp. 417-425.
[7] Cardoso, J.-F., Source separation using higher order moments, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Piscataway, NJ, 1989, pp. 2109-2112.
[8] Caussinus, H., Fekri, M., Hakam, S., and Ruiz-Gazen, A., A monitoring display of multivariate outliers, Comput. Statist. Data Anal., 44 (2003), pp. 237-252. · Zbl 1429.62217
[9] Caussinus, H. and Ruiz, A., Interesting projections of multidimensional data by means of generalized principal component analyses, in Compstat, Momirović, K. and Mildner, V., eds., Physica, Heidelberg, 1990, pp. 121-126.
[10] Caussinus, H. and Ruiz-Gazen, A., Classification and generalized principal component analysis, in Selected Contributions in Data Analysis and Classification, Brito, P., Cucumel, G., Bertrand, P., and de Carvalho, F., eds., Springer, Berlin, 2007, pp. 539-548. · Zbl 1181.68110
[11] Cox, A. J. and Higham, N. J., Stability of Householder QR factorization for weighted least squares problems, in Numerical Analysis 1997, Proceedings of the 17th Dundee Biennial Conference, Griffiths, D. F., Higham, D. J., and Watson, G. A., eds., Addison Wesley Longman, Harlow, UK, 1998, pp. 57-73. · Zbl 0903.65036
[12] Critchley, F., Pires, A., and Amado, C., Principal Axis Analysis, Technical report 06/14, The Open University Milton Keynes, Milton Keynes, England, http://stats-www.open.ac.uk/technicalreports/PAA.pdf (2006).
[13] Drineas, P., Magdon-Ismail, M., Mahoney, M. W., and Woodruff, D. P., Fast approximation of matrix coherence and statistical leverage, J. Mach. Learn. Res., 13 (2012), pp. 3475-3506. · Zbl 1437.65030
[14] Eckart, C. and Young, G., The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218. · JFM 62.1075.02
[15] Golub, G. H. and Van Loan, C. F., Matrix Computations, 4th ed., The Johns Hopkins University Press, Baltimore, MD, 2013. · Zbl 1268.65037
[16] Hammarling, S., Numerical solution of the stable, non-negative definite Lyapunov equation, IMA J. Numer. Anal., 2 (1982), pp. 303-323. · Zbl 0492.65017
[17] Ilmonen, P., Oja, H., and Serfling, R., On invariant coordinate system (ICS) functionals, Int. Stat. Rev., 80 (2012), pp. 93-110. · Zbl 1422.62175
[18] Jolliffe, I., Principal Component Analysis, 2nd ed., Springer, New York, 2002. · Zbl 1011.62064
[19] Kankainen, A., Taskinen, S., and Oja, H., Tests of multinormality based on location vectors and scatter matrices, Stat. Methods Appl., 16 (2007), pp. 357-379. · Zbl 1405.62062
[20] Loperfido, N., Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection, J. Multivariate Anal., 186 (2021), 104809. · Zbl 1476.62140
[21] Miettinen, J., Taskinen, S., Nordhausen, K., and Oja, H., Fourth moments and independent component analysis, Stat. Sci., 30 (2015), pp. 372-390. · Zbl 1332.62196
[22] Mirsky, L., Symmetric gauge functions and unitarily invariant norms, Q. J. Math., 11 (1960), pp. 50-59. · Zbl 0105.01101
[23] Nordhausen, K. and Oja, H., Independent component analysis: A statistical perspective, Wiley Interdiscip. Rev. Comput. Stat., 10 (2018), e1440, doi:10.1002/wics.1440. · Zbl 07910829
[24] Nordhausen, K., Oja, H., and Ollila, E., Robust independent component analysis based on two scatter matrices, Austrian J. Stat., 37 (2008), pp. 91-100.
[25] Nordhausen, K., Oja, H., and Ollila, E., Multivariate models and the first four moments, in Nonparametric Statistics and Mixture Models, Hunter, D., Richards, D., and Rosenberger, J., eds., World Scientific, Singapore, 2011, pp. 267-287. · Zbl 1414.62171
[26] Nordhausen, K., Oja, H., and Tyler, D. E., Tools for exploring multivariate data: The package ICS, J. Stat. Softw., 28 (2008), pp. 1-31, http://www.jstatsoft.org/v28/i06/.
[27] Nordhausen, K., Oja, H., and Tyler, D. E., Asymptotic and bootstrap tests for subspace dimension, J. Multivariate Anal., 188 (2022), 104830, doi:10.1016/j.jmva.2021.104830. · Zbl 1493.62317
[28] Nordhausen, K., Oja, H., Tyler, D. E., and Virta, J., Asymptotic and bootstrap tests for the dimension of the non-Gaussian subspace, IEEE Signal Process. Lett., 24 (2017), pp. 887-891.
[29] Nordhausen, K. and Ruiz-Gazen, A., On the usage of joint diagonalization in multivariate statistics, J. Multivariate Anal., 188 (2022), 104844, doi:10.1016/j.jmva.2021.104844. · Zbl 1493.62318
[30] Nordhausen, K. and Tyler, D. E., A cautionary note on robust covariance plug-in methods, Biometrika, 102 (2015), pp. 573-588, doi:10.1093/biomet/asv022. · Zbl 1452.62416
[31] Nordhausen, K. and Virta, J., An overview of properties and extensions of FOBI, Knowledge-Based Systems, 173 (2019), pp. 113-116.
[32] Oja, H., Sirkiä, S., and Eriksson, J., Scatter matrices and independent component analysis, Austrian J. Stat., 35 (2006), pp. 175-189.
[33] Powell, M. J. D. and Reid, J. K., On applying Householder transformations to linear least squares problems, in Information Processing 68, Proceedings of International Federation of Information Processing Congress, Edinburgh, 1968, North Holland, Amsterdam, 1968, pp. 122-126. · Zbl 0194.47002
[34] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, 2021, https://www.R-project.org/.
[35] Radojicic, U. and Nordhausen, K., Non-Gaussian component analysis: Testing the dimension of the signal subspace, in Analytical Methods in Statistics. AMISTAT 2019, Maciak, M., Pesta, M., and Schindler, M., eds., Springer, Cham, Switzerland, 2020, pp. 101-123. · Zbl 1455.62105
[36] Stewart, G. W., Determining rank in the presence of error, in Linear Algebra for Large Scale and Real-Time Applications, Springer, Cham, Switzerland, 1993, pp. 275-291. · Zbl 0813.65065
[37] Tyler, D. E., Critchley, F., Dümbgen, L., and Oja, H., Invariant coordinate selection, J. R. Stat. Soc. Ser. B Stat. Methodol., 71 (2009), pp. 549-592. · Zbl 1250.62032
[38] Vavasis, S. A., Stable finite elements for problems with wild coefficients, SIAM J. Numer. Anal., 33 (1996), pp. 890-916. · Zbl 0858.65112
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.