
Robust distance measure to detect outliers for categorical data. (English) Zbl 07558787

Summary: Distance-based techniques for detecting outliers appear to be effective tools for both univariate and multivariate data. However, their effectiveness is yet to be firmly established for categorical data, which poses challenges due to the polarization of cell frequencies. The purpose of this paper is to develop a new distance-based measure to detect outliers in two-dimensional contingency tables. The new distance measure, based on a pivotal element, is evaluated by comparing its performance with that of other suitable distance measures from the literature. The consistency of the four distance measures is examined through a simulation study, followed by an application to real datasets.

MSC:

62-XX Statistics

Software:

SAS
Full Text: DOI

References:

[1] Agresti, A., Categorical data analysis (2002), New York: Wiley · Zbl 1018.62002
[2] Aitchison, J., The statistical analysis of compositional data (1986), London: Chapman and Hall · Zbl 0688.62004
[3] Barnett, VD; Lewis, T., Outliers in statistical data (1994), New York: Wiley · Zbl 0801.62001
[4] Bradu, D.; Hawkins, DM, Location of multiple outliers in two-way tables using tetrads, Technometrics, 24, 103-108 (1982)
[5] Brown, BM, Identification of the sources of significance in two-way contingency tables, J R Stat Soc Ser C (Appl Stat), 23, 405-413 (1974)
[6] Correa, JC; Velez, JI, Una nota de cuidado sobre el efecto de datos parcialmente faltantes en la prueba de independencia, Comunicaciones en Estadística, 7, 2, 189-199 (2014)
[7] Cuadras, CM; Cuadras, D.; Greenacre, MJ, A comparison of different methods for representing categorical data, Commun Stat Simul Comput, 35, 2, 447-459 (2006) · Zbl 1093.62061
[8] Friendly, M., Visualizing categorical data (2000), Cary: SAS Institute · Zbl 1429.62015
[9] Fuchs, C.; Kenett, R., A test for detecting outlying cells in the multinomial distribution and two-way contingency tables, J Am Stat Assoc Theory Methods Sect, 75, 370, 395-398 (1980) · Zbl 0462.62041
[10] Gallo, M., Discriminant partial least squares analysis on compositional data, Stat Model, 10, 1, 41-56 (2010) · Zbl 07256814
[11] Gallo, M., Tucker3 model for compositional data, Commun Stat Theory Methods, 44, 21, 4441-4453 (2015) · Zbl 1333.62159
[12] Greenacre, MJ, Clustering the rows and columns of a contingency table, J Classif, 5, 1, 39-51 (1988) · Zbl 0652.62053
[13] Grubbs, FE, Procedures for detecting outlying observations in samples, Technometrics, 11, 1, 1-21 (1969)
[14] Haberman, SJ, The analysis of residuals in cross-classified tables, Biometrics, 29, 205-220 (1973)
[15] Imon, AHMR, Identifying multiple influential observations in linear regression, J Appl Stat, 32, 9, 926-946 (2005) · Zbl 1121.62404
[16] Kateri, M., Contingency table analysis (2014), Berlin: Springer · Zbl 1291.62012
[17] Kotze, TJW; Hawkins, DM, The identification of outliers in two-way contingency tables using \(2 \times 2\) subtables, Appl Stat, 33, 215-223 (1984)
[18] Kuhnt, S., Outlier identification procedures for contingency tables using maximum likelihood and L1 estimates, Scand J Stat, 31, 431-442 (2004) · Zbl 1063.62086
[19] Kuhnt, S.; Rapallo, F.; Rehage, A., Outlier detection in contingency tables based on minimal patterns, Stat Comput, 24, 481-491 (2014) · Zbl 1325.62117
[20] Lee, AH; Yick, JS, A perturbation approach to outlier detection in two-way contingency tables, Aust N Z J Stat, 41, 3, 305-314 (1999) · Zbl 1055.62528
[21] R Core Team, R: a language and environment for statistical computing (2013), Vienna: R Foundation for Statistical Computing
[22] Rapallo, F., Outliers and patterns of outliers in contingency tables with algebraic statistics, Scand J Stat, 39, 4, 784-797 (2012) · Zbl 1253.62043
[23] Sajesh, TA; Srinivasan, MR, Outlier detection for high dimensional data using the Comedian approach, J Stat Comput Simul, 82, 745-757 (2012) · Zbl 1432.62164
[24] Simonoff, JS, Detecting outlying cells in two-way contingency tables via backwards stepping, Technometrics, 30, 3, 339-345 (1988)
[25] Sripriya, TP; Srinivasan, MR, Detection of outlying cells in two-way contingency tables, Stat Appl, 16, 2, 103-113 (2018)
[26] Subbiah, M.; Srinivasan, MR, Classification of \(2\times 2\) sparse data with zero cells, Stat Probab Lett, 78, 3212-3215 (2008) · Zbl 1489.62174
[27] Upton, GJG, Categorical data analysis by example (2017), New York: Wiley
[28] Yick, JS; Lee, AH, Unmasking outliers in two-way contingency tables, Comput Stat Data Anal, 29, 69-79 (1998) · Zbl 1042.62556
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.