Abstract
Recently it was shown in several papers that backpropagation is able to find the global minimum of the empirical risk on the training data using over-parametrized deep neural networks. In this paper, a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting. In addition, a lower bound is presented which proves that these networks do not generalize well on new data, in the sense that networks which minimize the empirical risk do not achieve the optimal minimax rate of convergence for the estimation of smooth regression functions.
Acknowledgements
The authors would like to thank the Associate Editor and three anonymous referees for various invaluable comments, which helped very much to improve the presentation. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant RGPIN-2015-06412; the second author would like to thank NSERC for funding this work.
Citation
Michael Kohler and Adam Krzyżak. "Over-parametrized deep neural networks minimizing the empirical risk do not generalize well." Bernoulli 27(4), 2564–2597, November 2021. https://doi.org/10.3150/21-BEJ1323