Abstract
The estimation of multivariate regression functions from bounded i.i.d. data is considered. The L2 error, with integration with respect to the design measure, is used as the error criterion. The distribution of the design is assumed to be concentrated on a finite set. Neural network estimates are defined by minimizing the empirical L2 risk over various sets of feedforward neural networks. Nonasymptotic bounds on the L2 error of these estimates are presented. The results imply that neural networks are able to adapt to additive regression functions and to regression functions which are a sum of ridge functions, and hence are able to circumvent the curse of dimensionality in these cases.
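As a minimal illustration only (not the paper's estimator or its network classes; all data, network sizes, and step sizes below are hypothetical), the setting of the abstract can be sketched as a least-squares fit of a one-hidden-layer sigmoidal feedforward network to bounded data whose design distribution is concentrated on a finite set, with an additive true regression function:

```python
# A minimal sketch (not the paper's construction): empirical L2 risk
# minimization over one-hidden-layer sigmoidal feedforward networks,
# with the design distribution concentrated on a finite set.
import numpy as np

rng = np.random.default_rng(0)

# Finite design set in R^2; the true regression function is additive,
# m(x) = x1^2 + sin(x2), a case the paper shows networks adapt to.
design = rng.uniform(-1.0, 1.0, size=(20, 2))
X = design[rng.integers(0, 20, size=200)]        # i.i.d. draws from the finite set
m = X[:, 0] ** 2 + np.sin(X[:, 1])
Y = np.clip(m + 0.1 * rng.standard_normal(200), -2.0, 2.0)   # bounded responses

k = 8                                            # hidden neurons (hypothetical size)
W = 0.5 * rng.standard_normal((2, k))
b = np.zeros(k)
v = 0.5 * rng.standard_normal(k)
c = 0.0

def forward(X):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid hidden layer
    return H, H @ v + c

def empirical_l2_risk(pred, Y):
    return np.mean((pred - Y) ** 2)

risk_init = empirical_l2_risk(forward(X)[1], Y)

# Plain gradient descent on the empirical L2 risk.
lr = 0.2
for _ in range(2000):
    H, pred = forward(X)
    r = 2.0 * (pred - Y) / len(Y)                # d(risk)/d(pred)
    dv, dc = H.T @ r, r.sum()
    dZ = np.outer(r, v) * H * (1.0 - H)          # back through the sigmoid
    dW, db = X.T @ dZ, dZ.sum(axis=0)
    W -= lr * dW; b -= lr * db; v -= lr * dv; c -= lr * dc

risk_final = empirical_l2_risk(forward(X)[1], Y)
print(risk_init, risk_final)                     # the empirical risk decreases
```

The paper's bounds concern the L2 error of the exact empirical risk minimizer over structured network classes; the gradient-descent loop above merely approximates that minimization for illustration.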
Hamers, M., Kohler, M. Nonasymptotic Bounds on the L2 Error of Neural Network Regression Estimates. Ann Inst Stat Math 58, 131–151 (2006). https://doi.org/10.1007/s10463-005-0005-9