
Clustering with mixtures of log-concave distributions. (English) Zbl 1445.62141

Summary: The EM algorithm is a popular tool for clustering observations via a parametric mixture model. This approach has two disadvantages: its success depends on the appropriateness of the assumed parametric model, and each model requires a different implementation of the EM algorithm based on model-specific theoretical derivations. We show how the algorithm can be extended to work with the flexible, nonparametric class of log-concave component distributions. The resulting algorithm has two advantages. First, it is not restricted to parametric models, so no such model needs to be specified and the results are no longer sensitive to its misspecification. Second, only one implementation of the algorithm is necessary. Furthermore, simulation studies based on the normal mixture model suggest that the more general nonparametric algorithm incurs no noticeable performance penalty relative to the parametric EM algorithm in the special case where the assumed parametric model is indeed correct.
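For orientation, the parametric baseline mentioned in the summary can be sketched as follows. This is a minimal illustration of EM for a two-component univariate normal mixture, not the algorithm of the paper. The function names `normal_pdf` and `em_two_normals`, the half-split initialisation, the fixed iteration count, and the simulated sample are all assumptions made for this example. The summary does not spell out the nonparametric variant; the sketch only marks the step (the M-step) where the normal-specific update sits and where a log-concave fit would presumably take its place.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=200):
    """EM for a two-component univariate normal mixture (illustrative only)."""
    xs = sorted(data)
    half = len(xs) // 2
    # Crude initialisation (an assumption): split the sorted sample in half.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    overall_mean = sum(xs) / len(xs)
    sd0 = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / len(xs))
    sds = [sd0, sd0]
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in data]

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            if total > 0.0:
                resp.append([p[0] / total, p[1] / total])
            else:
                # Both densities underflowed; fall back to equal membership.
                resp.append([0.5, 0.5])
        # M-step: weighted maximum likelihood update, specific to the normal model.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = math.sqrt(max(var, 1e-12))

    # Cluster assignment: the component with the larger posterior probability.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, sds, labels


# Usage on simulated data (the sample itself is hypothetical).
rng = random.Random(0)
sample = [rng.gauss(-3.0, 1.0) for _ in range(200)]
sample += [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, sds, labels = em_two_normals(sample)
```

The E-step uses only the component densities and mixing weights, which is why a single implementation could serve any component class for which a weighted density fit is available.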

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation
