Fitting a Gaussian mixture model through the Gini index. (English) Zbl 1504.62082

Summary: A Gaussian mixture model is a convex combination of Gaussian component densities, and it is widely used in data mining and pattern recognition. In this paper, we propose a method for estimating the parameters of the density function given by a Gaussian mixture model. Our proposal is based on the Gini index, a measure of the degree of inequality between two probability distributions, and consists of minimizing the Gini index between an empirical distribution of the data and a Gaussian mixture model. We present several simulated and real-data examples and observe some properties of the proposed method.
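To make the idea of a discrepancy between an empirical distribution and a Gaussian mixture concrete, the following is a minimal one-dimensional sketch. It assumes the classical Gini index of dissimilarity, the integral of |F_n(x) - F_theta(x)| over x, where F_n is the empirical distribution function and F_theta the mixture distribution function. The paper's exact definition, its multivariate treatment, and its optimization procedure are not given in this summary and may differ. The function names `gmm_cdf` and `cdf_discrepancy`, the grid size, and the simulated parameters are all hypothetical choices made for illustration.

```python
import math
import random


def gmm_cdf(x, weights, means, sds):
    """Distribution function of a 1-D Gaussian mixture evaluated at x."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, sds)
    )


def cdf_discrepancy(data, weights, means, sds, grid_size=2000):
    """Midpoint-rule approximation of the integral of |F_n(x) - F_theta(x)| dx.

    Assumption: this stands in for the Gini index between the empirical
    distribution of `data` and the mixture; it is not taken from the paper.
    """
    xs = sorted(data)
    n = len(xs)
    pad = 4.0 * max(sds)  # extend the grid so both tails are covered
    lo = min(xs[0], min(means)) - pad
    hi = max(xs[-1], max(means)) + pad
    step = (hi - lo) / grid_size
    total = 0.0
    idx = 0  # number of observations <= current grid point
    for i in range(grid_size):
        x = lo + (i + 0.5) * step
        while idx < n and xs[idx] <= x:
            idx += 1
        total += abs(idx / n - gmm_cdf(x, weights, means, sds)) * step
    return total


random.seed(0)
# Simulated sample from the mixture 0.4 * N(-2, 1) + 0.6 * N(3, 0.5^2).
data = [
    random.gauss(-2.0, 1.0) if random.random() < 0.4 else random.gauss(3.0, 0.5)
    for _ in range(500)
]

# Discrepancy at the generating parameters versus at a poor candidate.
good = cdf_discrepancy(data, [0.4, 0.6], [-2.0, 3.0], [1.0, 0.5])
bad = cdf_discrepancy(data, [0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
print(good, bad)
```

A fitting procedure in this spirit would pass `cdf_discrepancy` to a numerical optimizer over the weights, means, and standard deviations. The sketch only evaluates the objective, so that the comparison between a good and a poor parameter set is easy to inspect.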

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation

Software:

PRMLT; EMD
