Robust classification for skewed data. (English) Zbl 1284.62378

Summary: In this paper we propose a robust classification rule for skewed unimodal distributions. For low-dimensional data, the classifier is based on minimizing the adjusted outlyingness with respect to each group. For high-dimensional data, the robustified SIMCA method is adjusted for skewness. The robustness of the methods is investigated through simulations and by applying them to several datasets.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

AS 307; ROBPCA; LIBRA

References:

[1] Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83: 715–726 · Zbl 0885.62062 · doi:10.1093/biomet/83.4.715
[2] Brys G, Hubert M, Rousseeuw PJ (2005) A robustification of independent component analysis. J Chemom 19: 364–375 · doi:10.1002/cem.940
[3] Brys G, Hubert M, Struyf A (2004) A robust measure of skewness. J Comput Graph Stat 13: 996–1017 · Zbl 1088.62135 · doi:10.1198/106186004X12632
[4] Cheng AY, Ouyang M (2001) On algorithms for simplicial depth. In: Proceedings 13th Canadian conference on computational geometry, pp 53–56
[5] Croux C, Dehon C (2001) Robust linear discriminant analysis using S-estimators. Can J Stat 29: 473–492 · Zbl 0987.62044 · doi:10.2307/3316042
[6] Donoho DL (1982) Breakdown properties of multivariate location estimators. PhD thesis, Harvard University
[7] Dutta S, Ghosh AK (2009) On robust classification using projection depth. Indian Statistical Institute, Technical report R11/2009
[8] Ghosh AK, Chaudhuri P (2005) On maximum depth and related classifiers. Scand J Stat Theory Appl 32(2): 327–350 · Zbl 1089.62075 · doi:10.1111/j.1467-9469.2005.00423.x
[9] Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, New York · Zbl 0973.62007
[10] He X, Fung WK (2000) High breakdown estimate for multiple populations with applications to discriminant analysis. J Multivar Anal 72: 151–162 · Zbl 0969.62045 · doi:10.1006/jmva.1999.1857
[11] Hubert M, Engelen S (2004) Robust PCA and classification in biosciences. Bioinformatics 20: 1728–1736 · doi:10.1093/bioinformatics/bth158
[12] Hubert M, Rousseeuw PJ, Vanden Branden K (2005) ROBPCA: a new approach to robust principal component analysis. Technometrics 47: 64–79 · doi:10.1198/004017004000000563
[13] Hubert M, Rousseeuw PJ, Verdonck T (2009) Robust PCA for skewed data. Comput Stat Data Anal 53: 2264–2274 · Zbl 1453.62116 · doi:10.1016/j.csda.2008.05.027
[14] Hubert M, Van der Veeken S (2008) Outlier detection for skewed data. J Chemom 22: 235–246 · doi:10.1002/cem.1123
[15] Hubert M, Van der Veeken S (2010) Fast and robust classifiers adjusted for skewness. In: Proceedings of Compstat 2010. Springer, Berlin · Zbl 1284.62378
[16] Hubert M, Van Driessen K (2004) Fast and robust discriminant analysis. Comput Stat Data Anal 45: 301–320 · Zbl 1429.62247 · doi:10.1016/S0167-9473(02)00299-2
[17] Johnson RA, Wichern DW (1998) Applied multivariate statistical analysis. Prentice Hall Inc., Englewood Cliffs
[18] Liu RY (1990) On a notion of data depth based on random simplices. Ann Stat 18(1): 405–414 · Zbl 0701.62063 · doi:10.1214/aos/1176347507
[19] Rousseeuw PJ, Ruts I (1996) Bivariate location depth. Appl Stat 45: 516–526 · Zbl 0905.62002 · doi:10.2307/2986073
[20] Rousseeuw PJ, Struyf A (1998) Computing location depth and regression depth in higher dimensions. Stat Comput 8: 193–203 · doi:10.1023/A:1008945009397
[21] Stahel WA (1981) Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD thesis, ETH Zürich · Zbl 0531.62036
[22] Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore · Zbl 1017.93004
[23] Tukey JW (1975) Mathematics and picturing of data. In: Proceedings of the international congress of mathematicians, vol 2, pp 523–531
[24] Vanden Branden K, Hubert M (2005) Robust classification in high dimensions based on the SIMCA method. Chemom Intell Lab Syst 79: 10–21 · doi:10.1016/j.chemolab.2005.03.002
[25] Verboven S, Hubert M (2005) LIBRA: a Matlab library for robust analysis. Chemom Intell Lab Syst 75: 127–136 · doi:10.1016/j.chemolab.2004.06.003
[26] Wold S (1976) Pattern recognition by means of disjoint principal component models. Pattern Recognit 8: 127–139 · Zbl 0336.68040 · doi:10.1016/0031-3203(76)90014-5
[27] Zuo Y, Serfling R (2000) General notions of statistical depth function. Ann Stat 28: 461–482 · Zbl 1106.62334 · doi:10.1214/aos/1016218226
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.