
A triplot for multiclass classification visualisation. (English) Zbl 1468.62062

Summary: Quadratic discriminant analysis is used when the assumption of equal covariance matrices, required for linear discrimination, does not hold. The Canonical Variate Analysis biplot provides graphical visualisation to accompany linear discriminant analysis. However, since quadratic discrimination requires class-specific covariance matrix estimates, the canonical transformation cannot be used. An alternative method of visually representing the discrimination and classification process is proposed, which displays the sample points and the classification regions based on quadratic discriminant analysis, together with information on the variables. The methodology is further extended to other forms of multiclass classification and illustrated for support vector machines, classification trees, \(k\)-nearest neighbours and latent class analysis. In all these triplots, three aspects are represented simultaneously, so that the relationships between samples and variables can be read relative to the classification regions.
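
As an illustration of the quadratic rule behind such classification regions, the following is a minimal sketch and not the authors' implementation. It assumes the samples already lie in a two-dimensional display space; how the paper constructs that space and adds the variable information is not reproduced here. The function names (`fit_qda`, `qda_score`, `classify`, `classification_regions`) and the toy data are hypothetical. Each class receives its own covariance matrix, and a point is assigned to the class with the largest quadratic discriminant score.

```python
import math
from collections import defaultdict

def fit_qda(points, labels):
    """Estimate a mean, a 2x2 covariance matrix and a prior for each class."""
    groups = defaultdict(list)
    for p, g in zip(points, labels):
        groups[g].append(p)
    n_total = len(points)
    model = {}
    for g, pts in groups.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        # Class-specific (unpooled) covariance entries; needs n >= 2 per class.
        sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
        syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
        model[g] = ((mx, my), (sxx, sxy, syy), n / n_total)
    return model

def qda_score(x, params):
    """Quadratic score: -0.5*log|S| - 0.5*(x-m)' S^{-1} (x-m) + log(prior)."""
    (mx, my), (sxx, sxy, syy), prior = params
    det = sxx * syy - sxy ** 2  # must be positive (non-degenerate class)
    dx, dy = x[0] - mx, x[1] - my
    # Squared Mahalanobis distance using the explicit inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)

def classify(x, model):
    """Assign x to the class with the largest quadratic score."""
    return max(model, key=lambda g: qda_score(x, model[g]))

def classification_regions(model, x_values, y_values):
    """Label every grid point; rows follow y_values, columns follow x_values."""
    return [[classify((x, y), model) for x in x_values] for y in y_values]

# Toy data (hypothetical): a tight class "A" and a widely spread class "B".
points = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 4), (8, 4), (4, 8), (8, 8)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
model = fit_qda(points, labels)
regions = classification_regions(model, [0.5, 6.0], [0.5, 6.0])
```

Colouring a fine grid by the labels returned from `classification_regions` and overlaying the sample points gives the first two layers of such a display. In this toy example the point `(-3, -3)` is assigned to the widely spread class "B" although the mean of "A" is closer, which a single linear boundary between the two class means could not produce.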

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Agresti, A., Categorical data analysis, (2013), Wiley Hoboken · Zbl 1281.62022
[2] Aitchison, J.; Dunsmore, R., Statistical prediction analysis, (1975), Cambridge University Press Cambridge · Zbl 0327.62043
[3] Aitchison, J.; Greenacre, M. J., Biplots of compositional data, Appl. Stat., 51, 375-392, (2002) · Zbl 1111.62300
[4] Bandeen-Roche, K.; Miglioretti, D. L.; Zeger, S. L.; Rathouz, P. J., Latent variable regression for multiple discrete outcomes, J. Amer. Statist. Assoc., 92, 1375-1386, (1997) · Zbl 0912.62121
[5] Breiman, L., Bagging predictors, Mach. Learn., 24, 123-140, (1996) · Zbl 0858.68080
[6] Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., Classification and regression trees, (1984), Wadsworth · Zbl 0541.62042
[7] Cover, T.; Hart, P., Nearest neighbor pattern classification, IEEE Trans. Inform. Theory, 13, 21-27, (1967) · Zbl 0154.44505
[8] Linzer, D. A.; Lewis, J. B., poLCA: an R package for polytomous variable latent class analysis, J. Stat. Softw., 42, 1-29, (2011)
[9] Flury, B., A first course in multivariate statistics, (1997), Springer-Verlag New York · Zbl 0879.62052
[10] Flury, L.; Boukai, B.; Flury, B. D., The discrimination subspace model, J. Amer. Statist. Assoc., 92, 758-766, (1997) · Zbl 0888.62063
[11] Freund, Y.; Schapire, R.; Abe, N., A short introduction to boosting, Japan. Soc. Artificial Intelligence, 14, 771-780, (1999)
[12] Friedman, J. H., Regularized discriminant analysis, J. Amer. Statist. Assoc., 84, 165-175, (1989)
[13] Gower, J. C.; Hand, D. J., Biplots, (1996), Chapman and Hall London · Zbl 0867.62053
[14] Gower, J. C.; Lubbe, S.; le Roux, N. J., Understanding biplots, (2011), Wiley Chichester
[15] Greenacre, M. J., Biplots in practice, (2010), Fundación BBVA Barcelona
[16] Groenen, P. J.F.; le Roux, N. J.; Gardner-Lubbe, S., Spline-based nonlinear biplots, Adv. Data Anal. Classif., 9, 219-238, (2015) · Zbl 1414.62209
[17] Huber, P. J., Robust estimation of a location parameter, Ann. Math. Stat., 35, 73-101, (1964) · Zbl 0136.39805
[18] Johnson, R. A.; Wichern, D. W., Applied multivariate statistical analysis, (2007), Pearson International Edition New York · Zbl 1269.62044
[19] Kruskal, J. B.; Wish, M., Multidimensional scaling, (1978), Sage Beverly Hills
[20] Lee, Y.; Lin, Y.; Wahba, G., Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data, J. Amer. Statist. Assoc., 99, 67-81, (2004) · Zbl 1089.62511
[21] Lichman, M., UCI machine learning repository, (2013), University of California, School of Information and Computer Science Irvine, CA
[22] Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., 2014. e1071: Misc functions of the department of statistics, TU Wien. R package version 1.6-3. http://CRAN.R-project.org/package=e1071.
[23] Ripley, B. D., Pattern recognition and neural networks, (1996), Cambridge University Press Cambridge · Zbl 0853.62046
[24] Ripley, B.D., 2014. tree: Classification and regression trees. R package version 1.0-35. http://CRAN.R-project.org/package=tree.
[25] Simonoff, J. S., Smoothing methods in statistics, (1996), Springer-Verlag New York · Zbl 0859.62035
[26] Stevens, J. P., Applied multivariate statistics for the social sciences, (2012), Taylor and Francis Group New York
[27] Venables, W. N.; Ripley, B. D., Modern applied statistics with S, (2002), Springer New York · Zbl 1006.62003
[28] Weihs, C.; Ligges, U.; Luebke, K.; Raabe, N., klaR analyzing German business cycles, (Baier, D.; Decker, R.; Schmidt-Thieme, L., Data Analysis and Decision Support, (2005), Springer-Verlag Berlin), 335-343
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.