
Nonparametric density estimation by exact leave-\(p\)-out cross-validation. (English) Zbl 1452.62264

Summary: The problem of density estimation is addressed by minimization of the \(L^{2}\)-risk for both histogram and kernel estimators. This quadratic risk is estimated by leave-\(p\)-out cross-validation (LPO), which, contrary to common belief, is computationally feasible thanks to closed-form formulas. The potential gain of LPO over V-fold cross-validation (V-fold) in terms of the bias-variance trade-off is highlighted. An exact quantification of the extra variability induced by the preliminary random partition of the data in V-fold is proposed. Furthermore, exact expressions are derived for both the bias and the variance of the risk estimator with histograms. Plug-in estimates of these quantities are provided, and their accuracy is assessed by means of concentration inequalities. An adaptive selection procedure for \(p\) in the case of histograms is then presented; it relies on minimization of the mean square error of the LPO risk estimator. Finally, a simulation study is carried out, which first illustrates the higher reliability of LPO compared with V-fold and then assesses the behavior of the selection procedure. For instance, the optimality of leave-one-out (LOO) is shown, at least empirically, in the context of regular histograms.
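
To illustrate why a closed formula makes LPO tractable for histograms, the following is a minimal sketch and not the authors' code. It assumes the usual LPO estimator of the \(L^{2}\)-risk (up to the constant \(\|s\|^{2}\)): the average, over all test sets \(e\) of size \(p\), of \(\|\hat{s}^{(-e)}\|^{2} - \frac{2}{p}\sum_{i \in e}\hat{s}^{(-e)}(X_i)\). Under that assumption, the hypergeometric distribution of the training counts gives \(\hat{R}_p = \frac{2n-p}{(n-1)(n-p)}\sum_{\lambda}\frac{n_\lambda}{n\,\omega_\lambda} - \frac{n(n-p+1)}{(n-1)(n-p)}\sum_{\lambda}\frac{1}{\omega_\lambda}\left(\frac{n_\lambda}{n}\right)^{2}\), where \(n_\lambda\) and \(\omega_\lambda\) are the count and width of bin \(\lambda\). The expression is derived here for illustration and is not quoted from the summary. The function names `bin_counts`, `lpo_risk_closed_form` and `lpo_risk_brute_force` are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def bin_counts(x, edges):
    """Count points per bin: [edges[k], edges[k+1]), last bin closed on the right."""
    counts = [0] * (len(edges) - 1)
    for v in x:
        if edges[0] <= v <= edges[-1]:
            k = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[k] += 1
    return counts


def lpo_risk_closed_form(x, edges, p):
    """LPO risk estimate from the bin counts alone (assumed closed form, 1 <= p < n)."""
    n = len(x)
    counts = bin_counts(x, edges)
    widths = [b - a for a, b in zip(edges, edges[1:])]
    lin = sum(c / (n * w) for c, w in zip(counts, widths))
    quad = sum((c / n) ** 2 / w for c, w in zip(counts, widths))
    return ((2 * n - p) * lin - n * (n - p + 1) * quad) / ((n - 1) * (n - p))


def lpo_risk_brute_force(x, edges, p):
    """Same quantity by enumerating all C(n, p) test sets; only viable for tiny n."""
    n = len(x)
    widths = [b - a for a, b in zip(edges, edges[1:])]
    total, n_splits = 0.0, 0
    for test_idx in combinations(range(n), p):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(x) if i not in held_out]
        test = [x[i] for i in test_idx]
        # Histogram heights fitted on the n - p training points.
        heights = [c / ((n - p) * w)
                   for c, w in zip(bin_counts(train, edges), widths)]
        sq_norm = sum(h * h * w for h, w in zip(heights, widths))
        # Sum of the fitted density evaluated at the p held-out points.
        cross = sum(t * h for t, h in zip(bin_counts(test, edges), heights))
        total += sq_norm - 2.0 * cross / p
        n_splits += 1
    return total / n_splits


if __name__ == "__main__":
    sample = [0.05, 0.12, 0.33, 0.41, 0.48, 0.77, 0.90, 0.95]  # made-up data
    edges = [0.0, 0.25, 0.5, 1.0]                              # irregular bins
    for p in (1, 2, 3):
        print(p, lpo_risk_closed_form(sample, edges, p),
              lpo_risk_brute_force(sample, edges, p))
```

The brute-force version visits every one of the \(\binom{n}{p}\) splits, whereas the closed form needs only one pass over the bin counts, so it stays cheap for any \(p\). Comparing the two on a tiny sample is a quick sanity check of the assumed formula.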

MSC:

62G07 Density estimation
62-08 Computational methods for problems pertaining to statistics
