
Detecting multivariate outliers using projection pursuit with particle swarm optimization. (English) Zbl 1436.62306

Lechevallier, Yves (ed.) et al., Proceedings of COMPSTAT’2010. 19th international conference on computational statistics, Paris, France, August 22–27, 2010. Keynote, invited and contributed papers. Heidelberg: Physica Verlag. 89-98 (2010).
Summary: Detecting outliers in multivariate data is known to be an important but difficult task, and several detection methods already exist. Most of the proposed methods are based either on the Mahalanobis distance of the observations to the center of the distribution or on a projection pursuit (PP) approach. In the present paper we focus on the one-dimensional PP approach, which may be of particular interest when the data are not elliptically symmetric. We give a survey of the statistical literature on PP for multivariate outlier detection and investigate the pros and cons of the different methods. We also propose the use of a recent heuristic optimization algorithm called Tribes for multivariate outlier detection in the projection pursuit context.
For the entire collection see [Zbl 1202.62001].
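
As a rough illustration of the one-dimensional PP idea mentioned in the summary, the sketch below scores each observation by its largest robustly standardized distance over a set of projection directions, in the spirit of Stahel-Donoho outlyingness. It is not the paper's method: the directions are drawn by plain random search rather than optimized with Tribes or any other particle swarm algorithm, and the function name `projection_outlyingness`, its parameters, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def projection_outlyingness(X, n_directions=500, seed=0):
    """Approximate a projection-based outlyingness score by random search.

    Hypothetical helper: random unit directions stand in for the
    optimized directions a PP algorithm would look for.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Draw random directions and scale each one to unit length.
    A = rng.standard_normal((n_directions, p))
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    # Project the data: column j holds the sample projected on direction j.
    Z = X @ A.T
    med = np.median(Z, axis=0)
    mad = np.median(np.abs(Z - med), axis=0)
    # Guard against a zero spread in degenerate directions.
    mad = np.where(mad > 0, mad, np.finfo(float).eps)
    # Keep, for each observation, its worst standardized distance.
    return np.max(np.abs(Z - med) / mad, axis=1)

# Simulated data (an assumption for illustration): 100 Gaussian points
# in 3 dimensions, the first five shifted far from the bulk.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
X[:5] += 10.0
scores = projection_outlyingness(X)
flagged = np.argsort(scores)[-5:]  # indices of the five largest scores
```

Random search is used here only because it is short. A heuristic optimizer would instead search for the directions that maximize a projection index, which matters more as the dimension grows and random directions become less likely to land near an interesting one.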

MSC:

62H99 Multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis
62-08 Computational methods for problems pertaining to statistics
90C59 Approximation methods and heuristics in mathematical programming
