
A survey of cross-validation procedures for model selection. (English) Zbl 1190.62080

Summary: Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performance of cross-validation procedures. This survey relates these results to the most recent advances in model selection theory, with particular emphasis on distinguishing empirical statements from rigorous theoretical results. In conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
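The summary describes cross-validation as a data-splitting device for estimating the risk of an estimator and for selecting among candidate models. As a purely illustrative aid (not taken from the paper), the following minimal Python/NumPy sketch of V-fold cross-validation selects a polynomial degree by minimizing the averaged held-out squared error; the data, the candidate family, and all names below are assumptions made for the example.

import numpy as np

def cv_risk(x, y, degree, V=5, seed=0):
    # V-fold cross-validation estimate of the prediction risk (squared-error
    # loss) of a least-squares polynomial fit of the given degree.
    rng = np.random.default_rng(seed)
    n = len(x)
    folds = np.array_split(rng.permutation(n), V)              # random partition into V blocks
    fold_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)       # the V-1 remaining blocks
        coefs = np.polyfit(x[train_idx], y[train_idx], degree) # train on V-1 folds
        pred = np.polyval(coefs, x[test_idx])                  # predict the held-out fold
        fold_errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(fold_errors))                         # averaged hold-out risk estimate

# Toy usage: pick the polynomial degree whose cross-validation risk estimate is smallest.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)
degrees = list(range(1, 9))
risks = [cv_risk(x, y, d, V=5) for d in degrees]
selected_degree = degrees[int(np.argmin(risks))]

The same skeleton applies to any finite collection of estimators: the held-out empirical risk plays the role of the criterion to be minimized, and the survey's guidelines concern how the choice of V and of the splitting scheme affects this estimate.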

MSC:

62G08 Nonparametric regression and quantile regression
62G05 Nonparametric estimation
62G99 Nonparametric inference

Software:

ElemStatLearn

References:

[1] Akaike, H. (1970). Statistical predictor identification., Ann. Inst. Statist. Math. , 22:203-217. · Zbl 0259.62076 · doi:10.1007/BF02506337
[2] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In, Second International Symposium on Information Theory (Tsahkadsor, 1971) , pages 267-281. Akadémiai Kiadó, Budapest. · Zbl 0283.62006
[3] Allen, D. M. (1974). The relationship between variable selection and data augmentation and a method for prediction., Technometrics , 16:125-127. JSTOR: · Zbl 0286.62044 · doi:10.2307/1267500
[4] Alpaydin, E. (1999). Combined 5 x 2 cv F test for comparing supervised classification learning algorithms., Neur. Comp. , 11(8):1885-1892.
[5] Anderson, R. L., Allen, D. M., and Cady, F. B. (1972). Selection of predictor variables in linear multiple regression. In Bancroft, T. A., editor, Statistical Papers in Honor of George W. Snedecor . Ames, Iowa: Iowa State University Press. · Zbl 0236.62020
[6] Arlot, S. (2007)., Resampling and Model Selection . PhD thesis, University Paris-Sud 11. http://tel.archives-ouvertes.fr/tel-00198803/en/. · Zbl 1326.62097
[7] Arlot, S. (2008a). Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression.
[8] Arlot, S. (2008b). V-fold cross-validation improved: V-fold penalization.
[9] Arlot, S. (2009). Model selection by resampling penalization., Electron. J. Stat. , 3:557-624 (electronic). · Zbl 1326.62097 · doi:10.1214/08-EJS196
[10] Arlot, S. and Celisse, A. (2009). Segmentation in the mean of heteroscedastic data via cross-validation. · Zbl 1221.62061
[11] Baraud, Y. (2002). Model selection for regression on a random design., ESAIM Probab. Statist. , 6:127-146 (electronic). · Zbl 1059.62038 · doi:10.1051/ps:2002007
[12] Barron, A., Birgé, L., and Massart, P. (1999). Risk bounds for model selection via penalization., Probab. Theory Related Fields , 113(3):301-413. · Zbl 0946.62036 · doi:10.1007/s004400050210
[13] Bartlett, P. L., Boucheron, S., and Lugosi, G. (2002). Model selection and error estimation., Machine Learning , 48:85-113. · Zbl 0998.68117 · doi:10.1023/A:1013999503812
[14] Bellman, R. E. and Dreyfus, S. E. (1962)., Applied Dynamic Programming . Princeton. · Zbl 0106.34901
[15] Bengio, Y. and Grandvalet, Y. (2004). No unbiased estimator of the variance of K-fold cross-validation., J. Mach. Learn. Res. , 5:1089-1105 (electronic). · Zbl 1222.68145
[16] Bhansali, R. J. and Downham, D. Y. (1977). Some properties of the order of an autoregressive model selected by a generalization of Akaike’s FPE criterion., Biometrika , 64(3):547-551. JSTOR: · Zbl 0379.62077
[17] Birgé, L. and Massart, P. (2001). Gaussian model selection., J. Eur. Math. Soc. (JEMS) , 3(3):203-268. · Zbl 1037.62001 · doi:10.1007/s100970100031
[18] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection., Probab. Theory Related Fields , 138(1-2):33-73. · Zbl 1112.62082 · doi:10.1007/s00440-006-0011-8
[19] Blanchard, G. and Massart, P. (2006). Discussion: “Local Rademacher complexities and oracle inequalities in risk minimization” [Ann. Statist., 34 (2006), no. 6, 2593-2656] by V. Koltchinskii. Ann. Statist. , 34(6):2664-2671. · doi:10.1214/009053606000001037
[20] Boucheron, S., Bousquet, O., and Lugosi, G. (2005). Theory of classification: a survey of some recent advances., ESAIM Probab. Stat. , 9:323-375 (electronic). · Zbl 1136.62355 · doi:10.1051/ps:2005018
[21] Bousquet, O. and Elisseeff, A. (2002). Stability and Generalization., J. Machine Learning Research , 2:499-526. · Zbl 1007.68083 · doi:10.1162/153244302760200704
[22] Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates., Biometrika , 71(2):353-360. JSTOR: · doi:10.1093/biomet/71.2.353
[23] Breiman, L. (1996). Heuristics of instability and stabilization in model selection., Ann. Statist. , 24(6):2350-2383. · Zbl 0867.62055 · doi:10.1214/aos/1032181158
[24] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984)., Classification and regression trees . Wadsworth Statistics/Probability Series. Wadsworth Advanced Books and Software, Belmont, CA. · Zbl 0541.62042
[25] Breiman, L. and Spector, P. (1992). Submodel selection and evaluation in regression. The X-random case., International Statistical Review , 60(3):291-319.
[26] Burman, P. (1989). A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods., Biometrika , 76(3):503-514. JSTOR: · Zbl 0677.62065 · doi:10.1093/biomet/76.3.503
[27] Burman, P. (1990). Estimation of optimal transformations using v-fold cross validation and repeated learning-testing methods., Sankhyā Ser. A , 52(3):314-345. · Zbl 0745.62073
[28] Burman, P., Chow, E., and Nolan, D. (1994). A cross-validatory method for dependent data., Biometrika , 81(2):351-358. JSTOR: · Zbl 0825.62669 · doi:10.1093/biomet/81.2.351
[29] Burman, P. and Nolan, D. (1992). Data-dependent estimation of prediction functions., J. Time Ser. Anal. , 13(3):189-207. · Zbl 0754.62018 · doi:10.1111/j.1467-9892.1992.tb00102.x
[30] Burnham, K. P. and Anderson, D. R. (2002)., Model selection and multimodel inference . Springer-Verlag, New York, second edition. A practical information-theoretic approach. · Zbl 1005.62007 · doi:10.1007/b97636
[31] Cao, Y. and Golubev, Y. (2006). On oracle inequalities related to smoothing splines., Math. Methods Statist. , 15(4):398-414.
[32] Celisse, A. (2008a). Model selection in density estimation via cross-validation. Technical report. · Zbl 1452.62264
[33] Celisse, A. (2008b)., Model Selection Via Cross-Validation in Density Estimation, Regression and Change-Points Detection . PhD thesis, University Paris-Sud 11, http://tel.archives-ouvertes.fr/tel-00346320/en/.
[34] Celisse, A. and Robin, S. (2008). Nonparametric density estimation by exact leave-p-out cross-validation., Computational Statistics and Data Analysis , 52(5):2350-2368. · Zbl 1452.62264
[35] Chow, Y. S., Geman, S., and Wu, L. D. (1983). Consistent cross-validated density estimation., Ann. Statist. , 11:25-38. · Zbl 0509.62033 · doi:10.1214/aos/1176346053
[36] Chu, C.-K. and Marron, J. S. (1991). Comparison of two bandwidth selectors with dependent errors., Ann. Statist. , 19(4):1906-1918. · Zbl 0738.62042 · doi:10.1214/aos/1176348377
[37] Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation., Numer. Math. , 31(4):377-403. · Zbl 0377.65007 · doi:10.1007/BF01404567
[38] Dalelane, C. (2005). Exact oracle inequality for sharp adaptive kernel density estimator. Technical report, arXiv.
[39] Daudin, J.-J. and Mary-Huard, T. (2008). Estimation of the conditional risk in classification: The swapping method., Comput. Stat. Data Anal. , 52(6):3220-3232. · Zbl 1452.62438
[40] Davies, S. L., Neath, A. A., and Cavanaugh, J. E. (2005). Cross validation model selection criteria for linear regression based on the Kullback-Leibler discrepancy., Stat. Methodol. , 2(4):249-266. · Zbl 1248.62110 · doi:10.1016/j.stamet.2005.05.002
[41] Davison, A. C. and Hall, P. (1992). On the bias and variability of bootstrap and cross-validation estimates of error rate in discrimination problems., Biometrika , 79(2):279-284. JSTOR: · Zbl 0751.62029 · doi:10.1093/biomet/79.2.279
[42] Devroye, L., Györfi, L., and Lugosi, G. (1996)., A probabilistic theory of pattern recognition , volume 31 of Applications of Mathematics (New York) . Springer-Verlag, New York. · Zbl 0853.68150
[43] Devroye, L. and Wagner, T. J. (1979). Distribution-free performance bounds for potential function rules., IEEE Transactions on Information Theory , 25(5):601-604. · Zbl 0432.62040 · doi:10.1109/TIT.1979.1056087
[44] Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms., Neur. Comp. , 10(7):1895-1924.
[45] Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation., J. Amer. Statist. Assoc. , 78(382):316-331. JSTOR: · Zbl 0543.62079 · doi:10.2307/2288636
[46] Efron, B. (1986). How biased is the apparent error rate of a prediction rule?, J. Amer. Statist. Assoc. , 81(394):461-470. JSTOR: · Zbl 0621.62073 · doi:10.2307/2289236
[47] Efron, B. (2004). The estimation of prediction error: covariance penalties and cross-validation., J. Amer. Statist. Assoc. , 99(467):619-642. With comments and a rejoinder by the author. · Zbl 1117.62324 · doi:10.1198/016214504000000692
[48] Efron, B. and Morris, C. (1973). Combining possibly related estimation problems (with discussion)., J. R. Statist. Soc. B , 35:379. JSTOR: · Zbl 0281.62030
[49] Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: the.632+ bootstrap method., J. Amer. Statist. Assoc. , 92(438):548-560. JSTOR: · Zbl 0887.62044 · doi:10.2307/2965703
[50] Fromont, M. (2007). Model selection by bootstrap penalization for classification., Mach. Learn. , 66(2-3):165-207.
[51] Geisser, S. (1974). A predictive approach to the random effect model., Biometrika , 61(1):101-107. JSTOR: · Zbl 0275.62065 · doi:10.1093/biomet/61.1.101
[52] Geisser, S. (1975). The predictive sample reuse method with applications., J. Amer. Statist. Assoc. , 70:320-328. · Zbl 0321.62077 · doi:10.2307/2285815
[53] Girard, D. A. (1998). Asymptotic comparison of (partial) cross-validation, GCV and randomized GCV in nonparametric regression., Ann. Statist. , 26(1):315-334. · Zbl 0932.62047 · doi:10.1214/aos/1030563988
[54] Grünwald, P. D. (2007)., The Minimum Description Length Principle . MIT Press, Cambridge, MA, USA.
[55] Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002)., A distribution-free theory of nonparametric regression . Springer Series in Statistics. Springer-Verlag, New York. · Zbl 1021.62024
[56] Hall, P. (1983). Large sample optimality of least squares cross-validation in density estimation., Ann. Statist. , 11(4):1156-1174. · Zbl 0599.62051
[57] Hall, P. (1987). On Kullback-Leibler loss and density estimation., The Annals of Statistics , 15(4):1491-1519. · Zbl 0678.62045 · doi:10.1214/aos/1176350606
[58] Hall, P., Lahiri, S. N., and Polzehl, J. (1995). On bandwidth choice in nonparametric regression with both short- and long-range dependent errors., Ann. Statist. , 23(6):1921-1936. · Zbl 0856.62041 · doi:10.1214/aos/1034713640
[59] Hall, P., Marron, J. S., and Park, B. U. (1992). Smoothed cross-validation., Probab. Theory Related Fields , 92(1):1-20. · Zbl 0742.62042 · doi:10.1007/BF01205233
[60] Hall, P. and Schucany, W. R. (1989). A local cross-validation algorithm., Statist. Probab. Lett. , 8(2):109-117. · Zbl 0676.62038 · doi:10.1016/0167-7152(89)90002-3
[61] Härdle, W. (1984). How to determine the bandwidth of some nonlinear smoothers in practice. In, Robust and nonlinear time series analysis (Heidelberg, 1983) , volume 26 of Lecture Notes in Statist. , pages 163-184. Springer, New York. · Zbl 0579.62050
[62] Härdle, W., Hall, P., and Marron, J. S. (1988). How far are automatically chosen regression smoothing parameters from their optimum?, J. Amer. Statist. Assoc. , 83(401):86-101. With comments by David W. Scott and Iain Johnstone and a reply by the authors. JSTOR: · Zbl 0644.62048 · doi:10.2307/2288922
[63] Hart, J. D. (1994). Automated kernel smoothing of dependent data by using time series cross-validation., J. Roy. Statist. Soc. Ser. B , 56(3):529-542. JSTOR: · Zbl 0800.62224
[64] Hart, J. D. and Vieu, P. (1990). Data-driven bandwidth choice for density estimation based on dependent data., Ann. Statist. , 18(2):873-890. · Zbl 0703.62045 · doi:10.1214/aos/1176347630
[65] Hart, J. D. and Wehrly, T. E. (1986). Kernel regression estimation using repeated measurements data., J. Amer. Statist. Assoc. , 81(396):1080-1088. JSTOR: · Zbl 0635.62030 · doi:10.2307/2289087
[66] Hastie, T., Tibshirani, R., and Friedman, J. (2009)., The elements of statistical learning . Springer Series in Statistics. Springer-Verlag, New York. Data mining, inference, and prediction. 2nd edition. · Zbl 1273.62005
[67] Herzberg, A. M. and Tsukanov, A. V. (1986). A note on modifications of jackknife criterion for model selection., Utilitas Math. , 29:209-216. · Zbl 0591.62063
[68] Herzberg, P. A. (1969). The parameters of cross-validation., Psychometrika , 34:Monograph Supplement. · Zbl 0175.18002
[69] Hesterberg, T. C., Choi, N. H., Meier, L., and Fraley, C. (2008). Least angle and l1 penalized regression: A review., Statistics Surveys , 2:61-93 (electronic). · Zbl 1189.62070 · doi:10.1214/08-SS035
[70] Hills, M. (1966). Allocation Rules and their Error Rates., J. Royal Statist. Soc. Series B , 28(1):1-31. JSTOR: · Zbl 0166.14501
[71] Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian Model Averaging: A tutorial., Statistical Science , 14(4):382-417. · Zbl 1059.62525 · doi:10.1214/ss/1009212519
[72] Huber, P. (1964). Robust estimation of a location parameter., Ann. Math. Statist. , 35:73-101. · Zbl 0136.39805 · doi:10.1214/aoms/1177703732
[73] John, P. W. M. (1971)., Statistical design and analysis of experiments . The Macmillan Co., New York. · Zbl 0231.62089
[74] Jonathan, P., Krzanowski, W. J., and McCarthy, W. V. (2000). On the use of cross-validation to assess performance in multivariate prediction., Stat. and Comput. , 10:209-229.
[75] Kearns, M., Mansour, Y., Ng, A. Y., and Ron, D. (1997). An Experimental and Theoretical Comparison of Model Selection Methods., Machine Learning , 27:7-50.
[76] Kearns, M. and Ron, D. (1999). Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation., Neural Computation , 11:1427-1453.
[77] Koltchinskii, V. (2001). Rademacher penalties and structural risk minimization., IEEE Trans. Inform. Theory , 47(5):1902-1914. · Zbl 1008.62614 · doi:10.1109/18.930926
[78] Lachenbruch, P. A. and Mickey, M. R. (1968). Estimation of Error Rates in Discriminant Analysis., Technometrics , 10(1):1-11. JSTOR: · doi:10.2307/1266219
[79] Larson, S. C. (1931). The shrinkage of the coefficient of multiple correlation., J. Educ. Psychol. , 22:45-55. · JFM 57.0663.12
[80] Lecué, G. (2006). Optimal oracle inequality for aggregation of classifiers under low noise condition. In Lugosi, G. and Simon, H. U., editors, 19th Annual Conference on Learning Theory, COLT 2006 , pages 364-378. Springer. · Zbl 1143.68546 · doi:10.1007/11776420_28
[81] Lecué, G. (2007). Suboptimality of penalized empirical risk minimization in classification. In, COLT 2007 , volume 4539 of Lecture Notes in Artificial Intelligence . Springer, Berlin. · Zbl 1203.68159
[82] Leung, D., Marriott, F., and Wu, E. (1993). Bandwidth selection in robust smoothing., J. Nonparametr. Statist. , 2:333-339. · Zbl 1360.62132 · doi:10.1080/10485259308832562
[83] Leung, D. H.-Y. (2005). Cross-validation in nonparametric regression with outliers., Ann. Statist. , 33(5):2291-2310. · Zbl 1086.62055 · doi:10.1214/009053605000000499
[84] Li, K.-C. (1985). From Stein’s unbiased risk estimates to the method of generalized cross validation., Ann. Statist. , 13(4):1352-1377. · Zbl 0605.62047 · doi:10.1214/aos/1176349742
[85] Li, K.-C. (1987). Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: discrete index set., Ann. Statist. , 15(3):958-975. · Zbl 0653.62037 · doi:10.1214/aos/1176350486
[86] Mallows, C. L. (1973). Some comments on C_p., Technometrics , 15:661-675. · Zbl 0269.62061 · doi:10.2307/1267380
[87] Markatou, M., Tian, H., Biswas, S., and Hripcsak, G. (2005). Analysis of variance of cross-validation estimators of the generalization error., J. Mach. Learn. Res. , 6:1127-1168 (electronic). · Zbl 1222.68258
[88] Massart, P. (2007)., Concentration inequalities and model selection , volume 1896 of Lecture Notes in Mathematics . Springer, Berlin. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003, With a foreword by Jean Picard. · Zbl 1170.60006 · doi:10.1007/978-3-540-48503-2
[89] Molinaro, A. M., Simon, R., and Pfeiffer, R. M. (2005). Prediction error estimation: a comparison of resampling methods., Bioinformatics , 21(15):3301-3307.
[90] Mosteller, F. and Tukey, J. W. (1968). Data analysis, including statistics. In Lindzey, G. and Aronson, E., editors, Handbook of Social Psychology, Vol. 2 . Addison-Wesley.
[91] Nadeau, C. and Bengio, Y. (2003). Inference for the generalization error., Machine Learning , 52:239-281. · Zbl 1039.68104 · doi:10.1023/A:1024068626366
[92] Nemirovski, A. (2000). Topics in non-parametric statistics. In Bernard, P., editor, Lectures on Probability Theory and Statistics, École d'Été de Probabilités de Saint-Flour XXVIII - 1998 , Lecture Notes in Mathematics. Springer, Berlin. By M. Émery, A. Nemirovski, D. Voiculescu. · Zbl 0998.62033
[93] Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression., Ann. Statist. , 12(2):758-765. · Zbl 0544.62063 · doi:10.1214/aos/1176346522
[94] Opsomer, J., Wang, Y., and Yang, Y. (2001). Nonparametric regression with correlated errors., Statist. Sci. , 16(2):134-153. · Zbl 1059.62537 · doi:10.1214/ss/1009213287
[95] Picard, R. R. and Cook, R. D. (1984). Cross-validation of regression models., J. Amer. Statist. Assoc. , 79(387):575-583. JSTOR: · Zbl 0547.62047 · doi:10.2307/2288403
[96] Politis, D. N., Romano, J. P., and Wolf, M. (1999)., Subsampling . Springer Series in Statistics. Springer-Verlag, New York. · Zbl 0943.60003
[97] Quenouille, M. H. (1949). Approximate tests of correlation in time-series., J. Roy. Statist. Soc. Ser. B. , 11:68-84. JSTOR: · Zbl 0035.09201
[98] Raftery, A. E. (1995). Bayesian model selection in social research., Sociological Methodology , 25:111-163.
[99] Ripley, B. D. (1996)., Pattern Recognition and Neural Networks . Cambridge Univ. Press. · Zbl 0853.62046
[100] Rissanen, J. (1983). Universal Prior for Integers and Estimation by Minimum Description Length., The Annals of Statistics , 11(2):416-431. · Zbl 0513.62005 · doi:10.1214/aos/1176346150
[101] Ronchetti, E., Field, C., and Blanchard, W. (1997). Robust linear model selection by cross-validation., J. Amer. Statist. Assoc. , 92:1017-1023. JSTOR: · Zbl 1067.62551 · doi:10.2307/2965566
[102] Rudemo, M. (1982). Empirical Choice of Histograms and Kernel Density Estimators., Scandinavian Journal of Statistics , 9:65-78. · Zbl 0501.62028
[103] Sauvé, M. (2009). Histogram selection in non-Gaussian regression., ESAIM: Probability and Statistics , 13:70-86. · Zbl 1180.62061 · doi:10.1051/ps:2008002
[104] Schuster, E. F. and Gregory, G. G. (1981). On the consistency of maximum likelihood nonparametric density estimators. In Eddy, W. F., editor, Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface , pages 295-298. Springer-Verlag, New York.
[105] Schwarz, G. (1978). Estimating the dimension of a model., Ann. Statist. , 6(2):461-464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[106] Shao, J. (1993). Linear model selection by cross-validation., J. Amer. Statist. Assoc. , 88(422):486-494. JSTOR: · Zbl 0773.62051 · doi:10.2307/2290328
[107] Shao, J. (1996). Bootstrap model selection., J. Amer. Statist. Assoc. , 91(434):655-665. JSTOR: · Zbl 0869.62030 · doi:10.2307/2291661
[108] Shao, J. (1997). An asymptotic theory for linear model selection., Statist. Sinica , 7(2):221-264. With comments and a rejoinder by the author. · Zbl 1003.62527
[109] Shibata, R. (1984). Approximate efficiency of a selection procedure for the number of regression variables., Biometrika , 71(1):43-49. JSTOR: · Zbl 0543.62053 · doi:10.1093/biomet/71.1.43
[110] Stone, C. (1984). An asymptotically optimal window selection rule for kernel density estimates., The Annals of Statistics , 12(4):1285-1297. · Zbl 0599.62052 · doi:10.1214/aos/1176346792
[111] Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions., J. Roy. Statist. Soc. Ser. B , 36:111-147. With discussion and a reply by the authors. JSTOR: · Zbl 0308.62063
[112] Stone, M. (1977). Asymptotics for and against cross-validation., Biometrika , 64(1):29-35. JSTOR: · Zbl 0368.62046 · doi:10.1093/biomet/64.1.29
[113] Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso., J. Royal Statist. Soc. Series B , 58(1):267-288. JSTOR: · Zbl 0850.62538
[114] van der Laan, M. J. and Dudoit, S. (2003). Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: Finite sample oracle inequalities and examples. Working Paper Series Working Paper 130, U.C. Berkeley Division of Biostatistics. available at, http://www.bepress.com/ucbbiostat/paper130.
[115] van der Laan, M. J., Dudoit, S., and Keles, S. (2004). Asymptotic optimality of likelihood-based cross-validation., Stat. Appl. Genet. Mol. Biol. , 3:Art. 4, 27 pp. (electronic). · Zbl 1038.62040
[116] van der Laan, M. J., Dudoit, S., and van der Vaart, A. W. (2006). The cross-validated adaptive epsilon-net estimator., Statist. Decisions , 24(3):373-395. · Zbl 1111.62003 · doi:10.1524/stnd.2006.24.3.373
[117] van der Vaart, A. W., Dudoit, S., and van der Laan, M. J. (2006). Oracle inequalities for multi-fold cross validation., Statist. Decisions , 24(3):351-371. · Zbl 1117.62042 · doi:10.1524/stnd.2006.24.3.351
[118] van Erven, T., Grünwald, P. D., and de Rooij, S. (2008). Catching up faster by switching sooner: A prequential solution to the AIC-BIC dilemma.
[119] Vapnik, V. (1982)., Estimation of dependences based on empirical data . Springer Series in Statistics. Springer-Verlag, New York. Translated from the Russian by Samuel Kotz. · Zbl 0499.62005
[120] Vapnik, V. N. (1998)., Statistical learning theory . Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons Inc., New York. A Wiley-Interscience Publication. · Zbl 0935.62007
[121] Vapnik, V. N. and Chervonenkis, A. Y. (1974)., Teoriya raspoznavaniya obrazov. Statisticheskie problemy obucheniya . Izdat. “Nauka”, Moscow. Theory of Pattern Recognition (In Russian). · Zbl 0284.68070
[122] Wahba, G. (1975). Periodic splines for spectral density estimation: The use of cross validation for determining the degree of smoothing., Communications in Statistics , 4:125-142. · Zbl 0305.62060 · doi:10.1080/03610927508827233
[123] Wahba, G. (1977). Practical Approximate Solutions to Linear Operator Equations When the Data are Noisy., SIAM Journal on Numerical Analysis , 14(4):651-667. JSTOR: · Zbl 0402.65032 · doi:10.1137/0714044
[124] Wegkamp, M. (2003). Model selection in nonparametric regression., The Annals of Statistics , 31(1):252-273. · Zbl 1019.62037 · doi:10.1214/aos/1046294464
[125] Yang, Y. (2001). Adaptive Regression by Mixing., J. Amer. Statist. Assoc. , 96(454):574-588. JSTOR: · Zbl 1018.62033 · doi:10.1198/016214501753168262
[126] Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation., Biometrika , 92(4):937-950. · Zbl 1151.62301 · doi:10.1093/biomet/92.4.937
[127] Yang, Y. (2006). Comparing learning methods for classification., Statist. Sinica , 16(2):635-657. · Zbl 1096.62071
[128] Yang, Y. (2007). Consistency of cross validation for comparing regression procedures., Ann. Statist. , 35(6):2450-2473. · Zbl 1129.62039 · doi:10.1214/009053607000000514
[129] Zhang, P. (1993). Model selection via multifold cross validation., Ann. Statist. , 21(1):299-313. · Zbl 0770.62053 · doi:10.1214/aos/1176349027