×

Approximate profile maximum likelihood. (English) Zbl 1441.62041

Summary: We propose an efficient algorithm for approximate computation of the profile maximum likelihood (PML), a variant of maximum likelihood maximizing the probability of observing a sufficient statistic rather than the empirical sample. The PML has appealing theoretical properties, but is difficult to compute exactly. Inspired by observations gleaned from exactly solvable cases, we look for an approximate PML solution, which, intuitively, clumps comparably frequent symbols into one symbol. This amounts to lower-bounding a certain matrix permanent by summing over a subgroup of the symmetric group rather than the whole group during the computation. We extensively experiment with the approximate solution, and the empirical performance of our approach is competitive and sometimes significantly better than state-of-the-art performances for various estimation problems.

MSC:

62B05 Sufficient statistics and fields
62-08 Computational methods for problems pertaining to statistics
90C39 Dynamic programming

References:

[1] Jayadev Acharya, Alon Orlitsky, and Shengjun Pan. The maximum likelihood probability of unique-singleton, ternary, and length-7 patterns. InInformation Theory (ISIT), 2009 IEEE International Symposium on, pages 1135-1139. IEEE, 2009.
[2] Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, and Shengjun Pan. Competitive closeness testing. InProceedings of the 24th Annual Conference on Learning Theory, pages 47-68, 2011.
[3] Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh. A unified maximum likelihood approach for estimating symmetric properties of discrete distributions. InInternational Conference on Machine Learning, pages 11-21, 2017.
[4] Dragi Anevski, Richard D Gill, and Stefan Zohren. Estimating a probability mass function with unknown labels.The Annals of Statistics, 45(6):2708-2735, 2017. · Zbl 1386.62009
[5] FC Auluck. On partitions of bipartite numbers. InMathematical Proceedings of the Cambridge Philosophical Society, volume 49, pages 72-83. Cambridge Univ Press, 1953. · Zbl 0050.04102
[6] I B´ar´any and AM Vershik. On the number of convex lattice polytopes.Geometric and Functional Analysis, 2(4):381-393, 1992. · Zbl 0772.52010
[7] D Basu. On partial sufficiency: A review. InSelected Works of Debabrata Basu, pages 291-303. Springer, 2011.
[8] Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D Smith, and Patrick White. Testing that distributions are close. InFoundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 259-269. IEEE, 2000. · Zbl 1281.68227
[9] Allan Birnbaum. On the foundations of statistical inference.Journal of the American Statistical Association, 57(298):269-306, 1962. · Zbl 0107.36505
[10] David A Blackwell and Meyer A Girshick.Theory of games and statistical decisions. Courier Corporation, 1979. · Zbl 0439.62008
[11] Yuheng Bu, Shaofeng Zou, Yingbin Liang, and Venugopal V Veeravalli. Estimation of kl divergence: Optimal minimax rate.arXiv preprint arXiv:1607.02653, 2016. · Zbl 1390.94614
[12] Chun Lam Chan, Winston Fernandes, Navin Kashyap, and Majunath Krishnapur. Phase transitions for the uniform distribution in the pattern maximum likelihood problem and its Bethe approximation.SIAM J. Discrete Math, 31:597-631, 2015. · Zbl 1370.68221
[13] Bradley Efron. Maximum likelihood and decision theory.The Annals of Statistics, pages 340-356, 1982. · Zbl 0494.62004
[14] Winston Fernandes and Navin Kashyap. A phase transition for the uniform distribution in the pattern maximum likelihood problem. InITW, 2013. · Zbl 1370.68221
[15] Jaroslav H´ajek. On basic concepts of statistics. InProceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probabilities, volume 1, pages 139-162, 1967. · Zbl 0214.45704
[16] Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Minimax rate-optimal estimation of divergences between discrete distributions.arXiv preprint arXiv:1605.09124, 2016a. · Zbl 1359.94097
[17] Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Minimax rate-optimal estimation of KL divergence between discrete distributions. InInformation Theory and Its Applications (ISITA), 2016 International Symposium on, pages 256-260. IEEE, 2016b.
[18] Godfrey H Hardy and Srinivasa Ramanujan. Asymptotic formulaæ in combinatory analysis. Proceedings of the London Mathematical Society, 2(1):75-115, 1918. · JFM 46.0198.04
[19] David A Harville. Maximum likelihood approaches to variance component estimation and to related problems.Journal of the American Statistical Association, 72(358):320-338, 1977. · Zbl 0373.62040
[20] Jiantao Jiao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions.Information Theory, IEEE Transactions on, 61(5): 2835-2885, 2015. · Zbl 1359.62104
[21] Jiantao Jiao, Kartik Venkat, and Tsachy Weissman. Relations between information and estimation in discrete-time L´evy channels.IEEE Transactions on Information Theory, 63(6):3579-3594, 2017. · Zbl 1369.94409
[22] Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of theL1distance. IEEE Transactions on Information Theory, 64(10):6672-6706, 2018. · Zbl 1401.94040
[23] A Kolmogoroff. Sur lestimation statistique des parametres be la loi de gauss.Izvestiya Rossiiskoi Akademii Nauk. Seriya Matematicheskaya, 6(1):3-32, 1942. · Zbl 0063.03293
[24] L Le. Sufficiency and approximate sufficiency.The Annals of Mathematical Statistics, pages 1419-1455, 1964. · Zbl 0129.11202
[25] Lucien Le Cam. Maximum likelihood: an introduction.Statistics Branch, Department of Mathematics, University of Maryland, 1979. · Zbl 0715.62045
[26] Erich Leo Lehmann and George Casella.Theory of point estimation, volume 31. Springer, 1998. · Zbl 0916.62017
[27] Susan A Murphy and Aad W Van der Vaart. On profile likelihood.Journal of the American Statistical Association, 95(450):449-465, 2000. · Zbl 0995.62033
[28] OEIS Foundation Inc.The online encyclopedia of integer sequences (OEIS)-A007716. https://oeis.org/A007716. Accessed: 2017-05-16.
[29] Alon Orlitsky, Sajama, Narayana Santhanam, Krishnamurthy Viswanathan, and Junan Zhang. Algorithms for modeling distributions over large alphabets. InInformation Theory (ISIT), 2004 IEEE International Symposium on, page 304. IEEE, 2004a.
[30] Alon Orlitsky, Narayana P Santhanam, Krishnamurthy Viswanathan, and Junan Zhang. On modeling profiles instead of values. InProceedings of the 20th conference on Uncertainty in artificial intelligence, pages 426-435. AUAI Press, 2004b. · Zbl 1309.94056
[31] Daniel P Palomar and Sergio Verd´u. Lautum information.IEEE transactions on information theory, 54(3):964-975, 2008. · Zbl 1306.94010
[32] Liam Paninski. Estimation of entropy and mutual information.Neural Computation, 15 (6):1191-1253, 2003. · Zbl 1052.62003
[33] H Desmond Patterson and Robin Thompson. Recovery of inter-block information when block sizes are unequal.Biometrika, 58(3):545-554, 1971. · Zbl 0228.62046
[34] Dmitri S. Pavlichin, Jiantao Jiao, and Tsachy Weissman. Approximate profile maximum likelihood (software), 2017. URLhttps://doi.org/10.5281/zenodo.1043617. URL https://github.com/dmitrip/PML. · Zbl 1441.62041
[35] Aditi Raghunathan, Greg Valiant, and James Zou. Estimating the unseen from multiple populations.arXiv preprint arXiv:1707.03854, 2017.
[36] Gregory Valiant and Paul Valiant. The power of linear estimators. InFoundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 403-412. IEEE, 2011a. · Zbl 1292.68159
[37] Gregory Valiant and Paul Valiant. Estimating the unseen: Ann/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. Inthe 43rd ACM Symposium on Theory of Computing (STOC), 2011b. · Zbl 1288.68186
[38] Gregory Valiant and Paul Valiant. Estimating the unseen: improved estimators for entropy and other properties.InAdvances in Neural Information Processing Systems, pages 2157-2165, 2013. · Zbl 1426.62107
[39] L. G. Valiant. The complexity of computing the permanent.Theoretical Computer Science, 8:189-201, 1979. · Zbl 0415.68008
[40] Paul Valiant.Testing Symmetric Properties of Distributions. PhD thesis, Massachusetts Institute of Technology, 2008. · Zbl 1231.68280
[41] SS Vallender. Calculation of the wasserstein distance between probability distributions on the line.Theory of Probability & Its Applications, 18(4):784-786, 1974. · Zbl 0351.60009
[42] Aad W Van der Vaart.Asymptotic statistics, volume 3. Cambridge university press, 2000. · Zbl 1013.62031
[43] Pascal O. Vontobel. The Bethe approximation of the pattern maximum likelihood distribution. InISIT, 2012.
[44] Abraham Wald.Statistical decision functions.Wiley, 1950. · Zbl 0040.36402
[45] Haizhou Wang and Mingzhou Song. Ckmeans.1d.dp: Optimal k-means clustering in one dimension by dynamic programming.The R Journal, 3(2):29-33, 2011.
[46] Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen.arXiv preprint arXiv:1504.01227, 2015. · Zbl 1418.62127
[47] Yihong Wu and Pengkun Yang.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.