Abstract
Nonparametric density estimation aims to determine the sparsest model that explains a given set of empirical data while using as few assumptions as possible. Many existing methods do not provide a sparse solution to the problem and rely on asymptotic approximations. In this paper we describe a framework for density estimation that uses information-theoretic measures of model complexity to construct a sparse density estimator that does not rely on large-sample approximations. The effectiveness of the approach is demonstrated through an application to some well-known density estimation test cases.
Additional information
Supported by the Australian Research Council, under grant number DP0985177.
About this article
Cite this article
Botev, Z.I., Kroese, D.P. The Generalized Cross Entropy Method, with Applications to Probability Density Estimation. Methodol Comput Appl Probab 13, 1–27 (2011). https://doi.org/10.1007/s11009-009-9133-7
DOI: https://doi.org/10.1007/s11009-009-9133-7
Keywords
- Cross entropy
- Information theory
- Monte Carlo simulation
- Statistical modeling
- Kernel smoothing
- Functional optimization
- Bandwidth selection
- Calculus of variations