Abstract
The estimation of multivariate regression functions from bounded i.i.d. data is considered. The L2 error, with integration with respect to the design measure, is used as the error criterion. The distribution of the design is assumed to be concentrated on a finite set. Neural network estimates are defined by minimizing the empirical L2 risk over various sets of feedforward neural networks. Nonasymptotic bounds on the L2 error of these estimates are presented. The results imply that neural networks are able to adapt to additive regression functions and to regression functions which are a sum of ridge functions, and hence are able to circumvent the curse of dimensionality in these cases.
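As a minimal illustration only (not the paper's estimator or its network classes; all data, network sizes, and step sizes below are hypothetical), the setting of the abstract can be sketched as a least-squares fit of a one-hidden-layer sigmoidal feedforward network to bounded data whose design distribution is concentrated on a finite set, with an additive true regression function:

```python
# A minimal sketch (not the paper's construction): empirical L2 risk
# minimization over one-hidden-layer sigmoidal feedforward networks,
# with the design distribution concentrated on a finite set.
import numpy as np

rng = np.random.default_rng(0)

# Finite design set in R^2; the true regression function is additive,
# m(x) = x1^2 + sin(x2), a case the paper shows networks adapt to.
design = rng.uniform(-1.0, 1.0, size=(20, 2))
X = design[rng.integers(0, 20, size=200)]        # i.i.d. draws from the finite set
m = X[:, 0] ** 2 + np.sin(X[:, 1])
Y = np.clip(m + 0.1 * rng.standard_normal(200), -2.0, 2.0)   # bounded responses

k = 8                                            # hidden neurons (hypothetical size)
W = 0.5 * rng.standard_normal((2, k))
b = np.zeros(k)
v = 0.5 * rng.standard_normal(k)
c = 0.0

def forward(X):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid hidden layer
    return H, H @ v + c

def empirical_l2_risk(pred, Y):
    return np.mean((pred - Y) ** 2)

risk_init = empirical_l2_risk(forward(X)[1], Y)

# Plain gradient descent on the empirical L2 risk.
lr = 0.2
for _ in range(2000):
    H, pred = forward(X)
    r = 2.0 * (pred - Y) / len(Y)                # d(risk)/d(pred)
    dv, dc = H.T @ r, r.sum()
    dZ = np.outer(r, v) * H * (1.0 - H)          # back through the sigmoid
    dW, db = X.T @ dZ, dZ.sum(axis=0)
    W -= lr * dW; b -= lr * db; v -= lr * dv; c -= lr * dc

risk_final = empirical_l2_risk(forward(X)[1], Y)
print(risk_init, risk_final)                     # the empirical risk decreases
```

The paper's bounds concern the L2 error of the exact empirical risk minimizer over structured network classes; the gradient-descent loop above merely approximates that minimization for illustration.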
Hamers, M., Kohler, M. Nonasymptotic Bounds on the L2 Error of Neural Network Regression Estimates. Ann Inst Stat Math 58, 131–151 (2006). https://doi.org/10.1007/s10463-005-0005-9