
Minimax fast rates for discriminant analysis with errors in variables. (English) Zbl 1388.62188

Summary: The effect of measurement errors in discriminant analysis is investigated. Given observations \(Z=X+\varepsilon\), where \(\varepsilon\) denotes random noise, the goal is to decide which of the two candidate densities \(f\) and \(g\) is the density of \(X\). We suppose that we have two learning samples at our disposal. The aim is to approach the best possible decision rule \(G^{\star}\), defined as a minimizer of the Bayes risk.

In the noise-free case (\(\varepsilon=0\)), minimax fast rates of convergence are well known under the margin assumption in discriminant analysis (see [E. Mammen and A. B. Tsybakov, Ann. Stat. 27, No. 6, 1808–1829 (1999; Zbl 0961.62058)]) and in the more general classification framework (see [A. B. Tsybakov, Ann. Stat. 32, No. 1, 135–166 (2004; Zbl 1105.62353)]). In this paper, we establish similar results in the noisy case, that is, when dealing with errors in variables. We prove minimax lower bounds for this problem and explain how these rates can be attained, using in particular an empirical risk minimization (ERM) method based on deconvolution kernel estimators.
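To illustrate the role of the deconvolution kernel estimator in the decision rule, here is a minimal Python sketch. It is not the authors' implementation: it assumes Laplace-distributed noise with known scale and the sinc kernel (whose Fourier transform is the indicator of \([-1,1]\)), both standard choices in the deconvolution literature, and it replaces the paper's ERM over a class of candidate sets by the simpler plug-in comparison \(\hat f(x)\ge\hat g(x)\). All function names (`deconv_kernel`, `deconv_density`, `decision_rule`) are hypothetical.

```python
import numpy as np

def deconv_kernel(u, h, sigma, n_t=801):
    """Deconvolution kernel K(u) = (1/2pi) * int_{-1}^{1} cos(t*u) / phi_eps(t/h) dt,
    assuming Laplace(0, sigma) noise, phi_eps(t) = 1/(1 + sigma^2 t^2), and the sinc
    kernel (Fourier transform = indicator of [-1, 1]); the integrand is even."""
    t = np.linspace(-1.0, 1.0, n_t)
    inv_phi = 1.0 + (sigma * t / h) ** 2          # 1 / phi_eps(t/h) for Laplace noise
    integrand = np.cos(np.outer(u, t)) * inv_phi  # shape (len(u), n_t)
    dt = t[1] - t[0]
    return integrand.sum(axis=1) * dt / (2.0 * np.pi)  # Riemann-sum quadrature

def deconv_density(x, Z, h, sigma):
    """Deconvolution kernel density estimate (1/(n*h)) * sum_i K((x - Z_i)/h)
    built from the noisy sample Z = X + eps."""
    u = (x[:, None] - Z[None, :]) / h
    K = deconv_kernel(u.ravel(), h, sigma).reshape(u.shape)
    return K.mean(axis=1) / h

def decision_rule(x, Z_f, Z_g, h, sigma):
    """Plug-in estimate of G_star: assign x to class f iff f_hat(x) >= g_hat(x)."""
    return deconv_density(x, Z_f, h, sigma) >= deconv_density(x, Z_g, h, sigma)

# Toy usage: two noisy learning samples, one drawn from each candidate density.
rng = np.random.default_rng(0)
sigma, h, n = 0.3, 0.4, 200
Z_f = rng.normal(-1.0, 1.0, n) + rng.laplace(0.0, sigma, n)  # X ~ f = N(-1,1), plus noise
Z_g = rng.normal(+1.0, 1.0, n) + rng.laplace(0.0, sigma, n)  # X ~ g = N(+1,1), plus noise
x = np.linspace(-4.0, 4.0, 9)
print(decision_rule(x, Z_f, Z_g, h, sigma))  # True where f_hat >= g_hat
```

The paper's ERM procedure minimizes a deconvoluted empirical risk over a class of candidate sets \(G\); the pointwise comparison above corresponds to the unrestricted case and is meant only to convey how the deconvolution kernel enters the decision rule.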

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation

References:

[1] Audibert, J.-Y. (2004). Classification under polynomial entropy and margin assumptions and randomized estimators. Preprint, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris VI and VII.
[2] Audibert, J.-Y. and Tsybakov, A.B. (2007). Fast learning rates for plug-in classifiers. Ann. Statist. 35 608-633. · Zbl 1118.62041 · doi:10.1214/009053606000001217
[3] Bartlett, P.L., Boucheron, S. and Lugosi, G. (2002). Model selection and error estimation. Machine Learning 48 85-113. · Zbl 0998.68117 · doi:10.1023/A:1013999503812
[4] Bartlett, P.L., Bousquet, O. and Mendelson, S. (2005). Local Rademacher complexities. Ann. Statist. 33 1497-1537. · Zbl 1083.62034 · doi:10.1214/009053605000000282
[5] Bartlett, P.L. and Mendelson, S. (2006). Empirical minimization. Probab. Theory Related Fields 135 311-334. · Zbl 1142.62348 · doi:10.1007/s00440-005-0462-3
[6] Bickel, P.J. and Ritov, Y. (2003). Nonparametric estimators which can be “plugged-in.” Ann. Statist. 31 1033-1053. · Zbl 1058.62031 · doi:10.1214/aos/1059655904
[7] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323-375. · Zbl 1136.62355 · doi:10.1051/ps:2005018
[8] Butucea, C. (2007). Goodness-of-fit testing and quadratic functional estimation from indirect observations. Ann. Statist. 35 1907-1930. · Zbl 1126.62028 · doi:10.1214/009053607000000118
[9] Carroll, R.J., Delaigle, A. and Hall, P. (2009). Nonparametric prediction in measurement error models. J. Amer. Statist. Assoc. 104 993-1003. · Zbl 1388.62075 · doi:10.1198/jasa.2009.tm07543
[10] Chapelle, O., Weston, J., Bottou, L. and Vapnik, V. (2001). Vicinal risk minimization. In Advances in Neural Information Processing Systems 416-422. Cambridge, MA: MIT Press.
[11] Delaigle, A. and Gijbels, I. (2006). Estimation of boundary and discontinuity points in deconvolution problems. Statist. Sinica 16 773-788. · Zbl 1107.62029
[12] Delaigle, A., Hall, P. and Meister, A. (2008). On deconvolution with repeated measurements. Ann. Statist. 36 665-685. · Zbl 1133.62026 · doi:10.1214/009053607000000884
[13] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. New York: Springer. · Zbl 0853.68150
[14] Engl, H.W., Hanke, M. and Neubauer, A. (2000). Regularization of Inverse Problems. Dordrecht: Kluwer Academic Publishers Group. · Zbl 0859.65054
[15] Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257-1272. · Zbl 0729.62033 · doi:10.1214/aos/1176348248
[16] Fan, J. and Truong, Y.K. (1993). Nonparametric regression with errors in variables. Ann. Statist. 21 1900-1925. · Zbl 0791.62042 · doi:10.1214/aos/1176349402
[17] Genovese, C.R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Minimax manifold estimation. J. Mach. Learn. Res. 13 1263-1291. · Zbl 1283.62112
[18] Goldstein, L. and Messer, K. (1992). Optimal plug-in estimators for nonparametric functional estimation. Ann. Statist. 20 1306-1328. · Zbl 0763.62023 · doi:10.1214/aos/1176348770
[19] Klemelä, J. and Mammen, E. (2010). Empirical risk minimization in inverse problems. Ann. Statist. 38 482-511. · Zbl 1181.62044 · doi:10.1214/09-AOS726
[20] Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 2593-2656. · Zbl 1118.62065 · doi:10.1214/009053606000001019
[21] Korostelëv, A.P. and Tsybakov, A.B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statistics 82. New York: Springer.
[22] Laurent, B., Loubes, J.-M. and Marteau, C. (2011). Testing inverse problems: A direct or an indirect problem? J. Statist. Plann. Inference 141 1849-1861. · Zbl 1394.62052 · doi:10.1016/j.jspi.2010.11.035
[23] Loubes, J.-M. and Marteau, C. (2014). Goodness-of-fit strategies from indirect observations. J. Nonparametr. Statist. · Zbl 1359.62147 · doi:10.1080/10485252.2013.827680
[24] Loustau, S. (2009). Penalized empirical risk minimization over Besov spaces. Electron. J. Stat. 3 824-850. · Zbl 1326.62157 · doi:10.1214/08-EJS316
[25] Mallat, S. (2000). Une Exploration des Signaux en Ondelettes. Paris: Éditions de l’École Polytechnique, Ellipses diffusion. · Zbl 1196.94006
[26] Mammen, E. and Tsybakov, A.B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808-1829. · Zbl 0961.62058 · doi:10.1214/aos/1017939240
[27] Massart, P. and Nédélec, É. (2006). Risk bounds for statistical learning. Ann. Statist. 34 2326-2366. · Zbl 1108.62007 · doi:10.1214/009053606000000786
[28] Meister, A. (2009). Deconvolution Problems in Nonparametric Statistics. Lecture Notes in Statistics 193. Berlin: Springer. · Zbl 1178.62028 · doi:10.1007/978-3-540-87557-4
[29] Mendelson, S. (2004). On the performance of kernel classes. J. Mach. Learn. Res. 4 759-771. · Zbl 1083.68097 · doi:10.1162/1532443041424337
[30] Tsybakov, A.B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135-166. · Zbl 1105.62353 · doi:10.1214/aos/1079120131
[31] Tsybakov, A.B. and van de Geer, S.A. (2005). Square root penalty: Adaptation to the margin in classification and in edge estimation. Ann. Statist. 33 1203-1224. · Zbl 1080.62047 · doi:10.1214/009053604000001066
[32] van de Geer, S.A. (2000). Empirical Processes in M-Estimation. Cambridge: Cambridge Univ. Press. · Zbl 1179.62073
[33] van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. New York: Springer. · Zbl 0862.60002
[34] Vapnik, V.N. (2000). The Nature of Statistical Learning Theory, 2nd ed. Statistics for Engineering and Information Science. New York: Springer. · Zbl 0934.62009
[35] Yang, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45 2271-2284. · Zbl 0962.62026 · doi:10.1109/18.796368