
Numerical maximisation of likelihood: a neglected alternative to EM? (English) Zbl 1416.62152

Summary: There is by now a long tradition of using the EM algorithm to find maximum-likelihood estimates (MLEs) when the data are incomplete in any of a wide range of ways, even when the observed-data likelihood can easily be evaluated and numerical maximisation of that likelihood is available as a conceptually simple route to the MLEs. It is rare in the literature to see numerical maximisation employed if EM is possible. But with excellent general-purpose numerical optimisers now available free, there is no longer any reason, as a matter of course, to avoid direct numerical maximisation of likelihood. In this tutorial, I present seven examples of models in which numerical maximisation of likelihood appears to have some advantages over the use of EM as a route to MLEs. The mathematical and coding effort is minimal, as there is no need to derive and code the E and M steps, only a likelihood evaluator. In all the examples, the unconstrained optimiser nlm available in R is used, and transformations are used to impose constraints on parameters.
I suggest therefore that the following question be asked of proposed new applications of EM: Can the MLEs be found more simply and directly by using a general-purpose numerical optimiser?
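
As a concrete illustration of the approach described in the summary, the following minimal sketch (not taken from the paper; it uses a simulated two-component Poisson mixture purely as an example) shows how a likelihood can be maximised directly with R's unconstrained optimiser nlm, with log and logit transformations defining working parameters that keep the natural parameters in their valid ranges:

## Illustrative sketch only: direct ML for a two-component Poisson mixture via nlm().
## Working parameters log(lambda1), log(lambda2), logit(delta) impose the constraints
## lambda1, lambda2 > 0 and 0 < delta < 1 without needing a constrained optimiser.
negloglik <- function(wpar, x) {
  lambda  <- exp(wpar[1:2])           # component means, forced positive
  delta   <- plogis(wpar[3])          # mixing probability, forced into (0, 1)
  mixdens <- delta * dpois(x, lambda[1]) + (1 - delta) * dpois(x, lambda[2])
  -sum(log(mixdens))                  # nlm() minimises, so return minus the log-likelihood
}

set.seed(1)
x <- c(rpois(150, 2), rpois(50, 7))   # simulated counts, for illustration only
fit <- nlm(negloglik, p = c(log(1), log(5), 0), x = x, hessian = TRUE)

## Back-transform the working parameters to the natural parameters:
c(lambda1 = exp(fit$estimate[1]),
  lambda2 = exp(fit$estimate[2]),
  delta   = plogis(fit$estimate[3]))

If approximate standard errors are wanted, the Hessian returned by nlm can be used in the usual way, with the delta method applied to the back-transformed parameters.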

MSC:

62F10 Point estimation

References:

[1] Altman, R. M. (2007). Mixed hidden Markov models: An extension of the hidden Markov model to the longitudinal data setting. J. Amer. Statist. Assoc., 102, 201-210. · Zbl 1284.62803
[2] Altman, R. M. & Petkau, A. J. (2005). Application of hidden Markov models to multiple sclerosis lesion count data. Statist. Med., 24, 2335-2344.
[3] Arslan, O., Constable, P. D. L. & Kent, J. T. (1993). Domains of convergence for the EM algorithm—a cautionary tale in a location estimation problem. Statist. Comput., 3, 103-108.
[4] Azzalini, A. & Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. J. R. Stat. Soc. Ser. B Stat. Methodol., 61(3), 579-602. · Zbl 0924.62050
[5] Best, D. J., Rayner, J. C. W. & Thas, O. (2011). Smooth tests of fit for a mixture of two Poisson distributions. In Proceedings of the Fourth Annual ASEARC Conference, 17-18 February 2011, University of Western Sydney, Parramatta, Australia. Available at http://ro.uow.edu.au/asearc/20.
[6] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer. · Zbl 1107.68072
[7] Boyd, S. P. & Vandenberghe, L. (2004). Convex Optimization. New York: Cambridge University Press. · Zbl 1058.90049
[8] Bulla, J. & Berzel, A. (2008). Computational issues in parameter estimation for stationary hidden Markov models. Comput. Statist., 23, 1-18.
[9] Carver, W. A. (1927). A genetic study of certain chlorophyll deficiencies in maize. Genetics, 12, 415-440.
[10] Clarke, C. A., Price Evans, D. A., McConnell, R. B. & Sheppard, P. M. (1959). Secretion of blood group antigens and peptic ulcer. Brit. Med. J., 1(5122), 603-607.
[11] Commenges, D., Prague, M. & Diakite, A. (2012). marqLevAlg: An algorithm for least‐squares curve fitting. R package version 1.0.
[12] Craig, B. A. & Sendi, P. P. (2002). Estimation of the transition matrix of a discrete‐time Markov chain. Health Econ., 11, 33-42.
[13] Davison, A. C. (2003). Statistical Models. Cambridge: Cambridge University Press. · Zbl 1044.62001
[14] Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol., 39, 1-38. · Zbl 0364.62022
[15] Dick, N. P. & Bowden, D. C. (1973). Maximum likelihood estimation for mixtures of two normal distributions. Biometrics, 29, 781-790.
[16] Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall. · Zbl 0835.62038
[17] Everitt, B. S. & Hand, D. J. (1981). Finite Mixture Distributions. London: Chapman & Hall. · Zbl 0466.62018
[18] Forsgren, A., Gill, P. E. & Wright, M. H. (2002). Interior methods for nonlinear optimization. SIAM Rev., 44, 525-597. · Zbl 1028.90060
[19] Hasselblad, V. (1969). Estimation of finite mixtures of distributions from the exponential family. J. Amer. Statist. Assoc., 64, 1459-1471.
[20] Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York: Springer. · Zbl 1273.62005
[21] Jennrich, R. I. & Schluchter, M. D. (1986). Unbalanced repeated‐measures models with structured covariance matrices. Biometrics, 42, 805-820. · Zbl 0625.62052
[22] Lange, K. (1995). A quasi‐Newton acceleration of the EM algorithm. Statist. Sinica, 5, 1-18. · Zbl 0824.62017
[23] Lange, K. (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer. · Zbl 0991.92017
[24] Lange, K. (2004). Optimization. New York: Springer. · Zbl 1140.90004
[25] Lange, K. L., Little, R. J. A. & Taylor, J. M. G. (1989). Robust statistical modeling using the t distribution. J. Amer. Statist. Assoc., 84, 881-896.
[26] Langrock, R. (2012). Flexible latent‐state modelling of Old Faithful’s eruption inter‐arrival times in 2009. Aust. N. Z. J. Stat., 54, 261-279. · Zbl 1334.86011
[27] Langrock, R., MacDonald, I. L. & Zucchini, W. (2012). Some nonstandard stochastic volatility models and their estimation using structured hidden Markov models. J. Empir. Finance, 19, 147-161.
[28] Levinson, S. E., Rabiner, L. R. & Sondhi, M. M. (1983). An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Tech. J., 62, 1035-1074. · Zbl 0507.68058
[29] Lindstrom, M. L. & Bates, D. M. (1988). Newton-Raphson and EM algorithms for linear mixed‐effects models for repeated‐measures data. J. Amer. Statist. Assoc., 83, 1014-1021. · Zbl 0671.65119
[30] MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press. · Zbl 1055.94001
[31] McLachlan, G. J. & Krishnan, T. (2008). The EM Algorithm and Extensions, 2nd ed. Hoboken, NJ: Wiley. · Zbl 1165.62019
[32] McLachlan, G. J. & Peel, D. (2000). Finite Mixture Models. New York: Wiley. · Zbl 0963.62061
[33] Melnykov, V. & Maitra, R. (2010). Finite mixture models and model‐based clustering. Stat. Surv., 4, 80-116. · Zbl 1190.62121
[34] Millar, R. B. (2011). Maximum Likelihood Estimation and Inference: With Examples in R, SAS and ADMB. Chichester, UK: Wiley. · Zbl 1273.62012
[35] Monahan, J. F. (2011). Numerical Methods of Statistics, 2nd ed. New York: Cambridge University Press. · Zbl 1217.65022
[36] Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate β‐distribution, and correlations among proportions. Biometrika, 49, 65-82. · Zbl 0105.12502
[37] Narayanan, A. (1991). Algorithm AS 266: Maximum likelihood estimation of the parameters of the Dirichlet distribution. J. R. Stat. Soc. Ser. C. Appl. Stat., 40, 365-374.
[38] Numerical Algorithms Group. (2013). Email to author, 11 March 2013.
[39] R Development Core Team. (2011). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3‐900051‐07‐0, version 2.14.1.
[40] Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. New York: Wiley. · Zbl 0256.62002
[41] Sahu, S. K. & Roberts, G. O. (1999). On convergence of the EM algorithm and the Gibbs sampler. Statist. Comput., 9, 55-64.
[42] Thisted, R. A. (1988). Elements of Statistical Computing: Numerical Computation. New York: Chapman & Hall. · Zbl 0663.62001
[43] Titterington, D. M., Smith, A. F. M. & Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions. New York: Wiley. · Zbl 0646.62013
[44] Turner, R. (2008). Direct maximization of the likelihood of a hidden Markov model. Comput. Statist. Data Anal., 52, 4147-4160. · Zbl 1452.62606
[45] Venables, W. N. & Ripley, B. D. (2002). Modern Applied Statistics with S, 4th ed. New York: Springer. · Zbl 1006.62003
[46] Welch, L. R. (2003). Hidden Markov models and the Baum-Welch algorithm. IEEE Inform. Th. Soc. Newsl., 53, 1, 10-13.
[47] Whitaker, L. (1914). On the Poisson law of small numbers. Biometrika, 10, 36-71.
[48] Wright, M. H. (2004). The interior‐point revolution in optimization: History, recent developments, and lasting consequences. Bull. Amer. Math. Soc. (N.S.), 42, 39-56. · Zbl 1114.90153
[49] Wu, T. T. & Lange, K. (2010). The MM alternative to EM. Statist. Sci., 25, 492-505. · Zbl 1329.62106
[50] Xu, L. & Jordan, M. I. (1996). On convergence properties of the EM algorithm for Gaussian mixtures. Neural Comput., 8, 129-151.
[51] Zucchini, W. & MacDonald, I. L. (2009). Hidden Markov Models for Time Series: An Introduction Using R. London and Boca Raton, Florida: Chapman & Hall/CRC. · Zbl 1180.62130
[52] Zucchini, W., Raubenheimer, D. & MacDonald, I. L. (2008). Modeling time series of animal behavior by means of a latent‐state model with feedback. Biometrics, 64, 807-815. · Zbl 1170.62408