Expressivity of deep neural networks. (English) Zbl 1523.68080

Grohs, Philipp (ed.) et al., Mathematical aspects of deep learning. Cambridge: Cambridge University Press. 149-199 (2023).
This article is a review of results concerning the expressivity of neural networks, including shallow and deep feedforward networks as well as convolutional, residual and recurrent neural networks.
It presents the universal approximation properties of these neural networks (NNs) as well as bounds on their complexity. The authors use a unified framework to describe the different NNs and sketch the main ideas of the proofs of many results.
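To fix notation, the following is a minimal sketch in the reviewer's words (the chapter's own formalisation may differ in details): a feedforward network of depth $L$ with weight matrices $W_\ell$, bias vectors $b_\ell$ and activation function $\varrho$ realises the map
\[
\Phi(x) = W_L\,\varrho\bigl(W_{L-1}\,\varrho(\cdots\varrho(W_1 x + b_1)\cdots) + b_{L-1}\bigr) + b_L,
\]
where $\varrho$ is applied componentwise, and its complexity is measured, e.g., by the depth $L$ and the number of nonzero entries of the $W_\ell$ and $b_\ell$.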
The second section introduces the classical results for shallow networks. These include Theorems 3.6 and 3.7 [V. Maiorov et al., J. Approx. Theory 99, No. 1, 95–111 (1999; Zbl 0940.41009); V. Maiorov and A. Pinkus, Neurocomputing 25, No. 1–3, 81–91 (1999; Zbl 0931.68093); R. A. DeVore et al., Manuscr. Math. 63, No. 4, 469–478 (1989; Zbl 0682.41033)], which give lower bounds, exponential in the input dimension, on the size of a shallow network achieving a prescribed approximation accuracy, i.e., the “curse of dimensionality”.
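A representative statement of this type, in the reviewer's paraphrase rather than in the chapter's exact formulation: if an approximation scheme assigns to every $f$ in the unit ball of the Sobolev space $W^{n,\infty}([0,1]^d)$ a vector of $M$ parameters depending continuously on $f$, then the achievable worst-case uniform error is at least of order
\[
M^{-n/d},
\]
so that error $\varepsilon$ requires $M \gtrsim \varepsilon^{-d/n}$ parameters, a number growing exponentially in the dimension $d$ for fixed smoothness $n$.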
The following section collects results for deeper NNs, including upper and lower bounds on their complexity. The results in this section are generally more recent and touch on topics such as encodability.
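To give the flavour of the upper bounds (again a paraphrase by the reviewer): every $f$ in the unit ball of $W^{n,\infty}([0,1]^d)$ can be approximated uniformly to accuracy $\varepsilon$ by a ReLU network of depth $O(\log(1/\varepsilon))$ with
\[
O\bigl(\varepsilon^{-d/n}\log(1/\varepsilon)\bigr)
\]
nonzero weights [D. Yarotsky, Neural Netw. 94, 103–114 (2017)], essentially matching the rate in the lower bound above.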
In Section 3.5, the authors recall the definitions of cartoon-like functions and of shearlet systems, which have been used to prove upper bounds on the complexity of NNs approximating this function class.
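For the reader's convenience, the standard definition in the reviewer's words: a cartoon-like function is a function of the form
\[
f = f_0 + f_1\,\chi_B,
\]
where $f_0, f_1 \in C^2([0,1]^2)$ and $B \subset [0,1]^2$ is a set whose boundary is a closed $C^2$ curve; such functions serve as a model for images consisting of smooth regions separated by smooth edges.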
Section 3.6 is concerned with approximation theorems under additional assumptions on the data: a hierarchical structure, data lying on a submanifold, or solutions of PDEs (a typical statement for the manifold case is paraphrased below). The following section contains general remarks comparing deep and shallow networks, with many references to the precise results.
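For the manifold case, a typical conclusion (loosely paraphrased by the reviewer) is that the rates are governed by the intrinsic dimension: for functions of smoothness $n$ on a $d$-dimensional compact submanifold $\mathcal{M} \subset \mathbb{R}^D$, networks with $M$ weights achieve uniform error of order
\[
M^{-n/d}
\]
up to constants and logarithmic factors, with the ambient dimension $D$ entering only mildly.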
In the last section, the authors describe results for specific network architectures: convolutional neural networks (CNNs), residual neural networks (ResNets) and a general version of recurrent neural networks (RNNs). These include results showing that CNNs can approximate continuous functions on images, results related to masking/segmentation, universality properties of ResNets, and results on sequence-to-sequence RNNs.
For the entire collection see [Zbl 1504.68008].

MSC:

68T07 Artificial neural networks and deep learning