
Finding cluster centers and sizes via multinomial parameterization. (English) Zbl 1329.62284

Summary: The clustering problem consists of dividing a data set into groups of observations that are similar within each group but different across groups. This paper presents a method for estimating cluster centers and sizes by nonlinear least squares optimization with a multinomial parameterization. The method is especially useful for large data sets because it operates on summary statistics only. The approach also applies to the problem of finding cluster centers and sizes from the covariance matrix when the original data are not available. Estimation of the cluster centers and sizes can be followed by the actual clustering. An application to a marketing research problem is discussed.
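The summary does not state the paper's objective function, so the following is only a minimal sketch of the general idea, under assumptions: cluster sizes are written as multinomial-logit shares of unconstrained parameters, which keeps them positive and summing to one, and a squared residual compares the overall mean implied by the shares and centers with an observed mean (a summary statistic). The function names `multinomial_shares`, `mixture_mean`, and `mean_residual_ss` are hypothetical and are not taken from the paper.

```python
import math


def multinomial_shares(a):
    """Map unconstrained reals a_1..a_K to shares p_k = exp(a_k) / sum_j exp(a_j)."""
    m = max(a)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]


def mixture_mean(shares, centers):
    """Overall mean implied by cluster shares and centers: sum_k p_k * c_k."""
    dim = len(centers[0])
    return [sum(p * c[j] for p, c in zip(shares, centers)) for j in range(dim)]


def mean_residual_ss(a, centers, observed_mean):
    """Sum of squared differences between implied and observed overall means.

    Illustrative only: an assumed least squares term, not the paper's objective.
    """
    implied = mixture_mean(multinomial_shares(a), centers)
    return sum((x - y) ** 2 for x, y in zip(implied, observed_mean))


# Two equally sized clusters centered at (0, 0) and (2, 4).
shares = multinomial_shares([0.0, 0.0])
centers = [[0.0, 0.0], [2.0, 4.0]]
print(shares)
print(mixture_mean(shares, centers))
print(mean_residual_ss([0.0, 0.0], centers, [1.0, 2.0]))
```

Because the shares are functions of unconstrained parameters, a general-purpose nonlinear least squares routine could minimize such a residual without explicit constraints on the sizes. A full objective would also have to match second moments, which this sketch omits.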

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
