
A von Mises-Fisher mixture model for clustering numerical and categorical variables. (English) Zbl 07597423

Summary: This work presents a mixture model for clustering variables of different types. All variables are measured on the same \(n\) statistical units, so we first represent every variable by a unit-norm operator in \({\mathbb{R}}^{n\times n}\), endowed with an appropriate inner product. We then propose a von Mises-Fisher mixture model on the unit sphere containing these operators. The parameters of the mixture model are estimated with an EM algorithm, initialized by a K-means procedure to obtain a good starting point. The method is tested on simulated data and finally applied to wine data.
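The pipeline in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: each variable is mapped to the orthogonal projector onto the span of its centered column(s), normalized in Frobenius norm; the inner product is the Frobenius one; all clusters share a single concentration \(\kappa\) (so the von Mises-Fisher normalizing constant cancels in the E-step); \(\kappa\) is updated with the closed-form approximation of Banerjee et al. (2005); and the spherical K-means start uses a deterministic farthest-first seeding. The function names `unit_operator` and `vmf_mixture` are hypothetical.

```python
import numpy as np

def unit_operator(x):
    """Map one variable to a unit-norm n x n operator (assumed representation:
    projector onto the centered column(s), scaled to Frobenius norm 1)."""
    x = np.asarray(x)
    if x.dtype.kind in "fiu":                      # numerical variable
        cols = x.astype(float)[:, None]
    else:                                          # categorical: one indicator per level
        levels = np.unique(x)
        cols = (x[:, None] == levels[None, :]).astype(float)
    cols = cols - cols.mean(axis=0)                # center each column
    q, s, _ = np.linalg.svd(cols, full_matrices=False)
    q = q[:, s > 1e-10 * s.max()]                  # drop null directions
    w = q @ q.T                                    # orthogonal projector
    return w / np.linalg.norm(w)                   # Frobenius normalization

def vmf_mixture(X, k, n_kmeans=10, n_em=30):
    """Shared-kappa vMF mixture fitted by EM, started from spherical K-means."""
    m, d = X.shape
    # Farthest-first seeding (an assumption, chosen for determinism).
    centers = [0]
    for _ in range(k - 1):
        sim = (X @ X[centers].T).max(axis=1)
        centers.append(int(np.argmin(sim)))
    mu = X[centers].copy()
    # Spherical K-means: assign by largest inner product, renormalize means.
    for _ in range(n_kmeans):
        labels = np.argmax(X @ mu.T, axis=1)
        for j in range(k):
            if np.any(labels == j):
                s = X[labels == j].sum(axis=0)
                mu[j] = s / np.linalg.norm(s)
    resp = np.eye(k)[labels]                       # hard start for EM
    for _ in range(n_em):
        # M-step: proportions, mean directions, shared concentration.
        pi = resp.mean(axis=0)
        S = resp.T @ X
        norms = np.maximum(np.linalg.norm(S, axis=1), 1e-12)
        mu = S / norms[:, None]
        rbar = min(norms.sum() / m, 1 - 1e-9)
        kappa = rbar * (d - rbar**2) / (1 - rbar**2)   # Banerjee et al. approximation
        # E-step: posterior probabilities, computed in log space.
        logp = np.log(pi + 1e-12) + kappa * (X @ mu.T)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), mu, kappa, pi

# Simulated example: two latent factors, each driving three numerical
# variables and one categorical variable (a three-level binning).
rng = np.random.default_rng(0)
n = 60
variables = []
for _ in range(2):
    z = rng.normal(size=n)
    for _ in range(3):
        variables.append(z + 0.3 * rng.normal(size=n))
    noisy = z + 0.3 * rng.normal(size=n)
    bins = np.digitize(noisy, np.quantile(noisy, [1 / 3, 2 / 3]))
    variables.append(np.array(["low", "mid", "high"])[bins])

# Vectorize the operators: dot products of rows are Frobenius inner products.
X = np.stack([unit_operator(v).ravel() for v in variables])
labels, mu, kappa, pi = vmf_mixture(X, k=2)
print(labels)
```

Under these assumptions the inner product between two numerical variables reduces to their squared correlation, which is why numerical and categorical variables can be compared on the same sphere. A full implementation would estimate one concentration per cluster, which requires the Bessel-function normalizing constant.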

MSC:

62H11 Directional data; spatial statistics
