
Nonlinear approximation via compositions. (English) Zbl 1475.41013

Given a function dictionary \(\mathcal{D}\) and an approximation budget \(N \in \mathbb{N}\), nonlinear approximation seeks the linear combination of the best \(N\) terms \(\{T_n\}_{1\le n\le N} \subseteq \mathcal{D}\) that approximates a given function \(f\) with the minimum approximation error. Motivated by the recent success of deep learning, the authors propose dictionaries whose elements are compositions, \(T = T^{(L)} \circ \cdots \circ T^{(1)}\), and implement each \(T\) as a ReLU feed-forward neural network (FNN) with \(L\) hidden layers. They further quantify the improvement of the best \(N\)-term approximation rate in terms of \(N\) as the depth \(L\) increases. Finally, they show that, for approximating Hölder continuous functions, dictionaries consisting of wide FNNs with a few hidden layers are more attractive in terms of computational efficiency than dictionaries of narrow and very deep FNNs, provided that the number of computer cores available for parallel computing exceeds \(N\).
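For orientation, the best \(N\)-term approximation error discussed above is conventionally defined as (standard notation in nonlinear approximation, cf. DeVore [14]; the norm is whichever norm the paper measures errors in)
\[ \varepsilon_N(f) := \inf_{\{T_n\}_{1\le n\le N}\subseteq \mathcal{D},\; \{c_n\}_{1\le n\le N}\subset \mathbb{R}} \Bigl\| f - \sum_{n=1}^{N} c_n T_n \Bigr\|. \]

The sketch below illustrates the two ingredients of this setup under stated assumptions: dictionary elements realized as ReLU FNNs with \(L\) hidden layers, and a greedy selection of \(N\) terms from sampled function values in the spirit of matching pursuit [36]. The names relu_fnn and greedy_n_term, the random dictionary, and the greedy selection rule are illustrative assumptions, not the paper's construction.

import numpy as np

def relu_fnn(x, weights, biases):
    """Evaluate a ReLU feed-forward network with L hidden layers at points x.

    weights, biases: lists of length L + 1; hidden layers apply ReLU,
    the output layer is linear.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(W @ a + b, 0.0)  # ReLU hidden layer
    W, b = weights[-1], biases[-1]
    return (W @ a + b).ravel()          # linear output layer

def greedy_n_term(f_vals, atoms, N):
    """Greedily pick N atoms (rows of `atoms`) to approximate f_vals.

    Each round selects the atom most correlated with the residual, then
    refits the coefficients of all chosen atoms by least squares.
    """
    residual, chosen = f_vals.copy(), []
    norms = np.linalg.norm(atoms, axis=1) + 1e-12   # guard against zero atoms
    for _ in range(N):
        scores = np.abs(atoms @ residual) / norms
        scores[chosen] = -np.inf                    # never reuse an atom
        chosen.append(int(np.argmax(scores)))
        A = atoms[chosen].T                         # shape: (points, n chosen)
        coeffs, *_ = np.linalg.lstsq(A, f_vals, rcond=None)
        residual = f_vals - A @ coeffs
    return chosen, coeffs

# Usage: approximate the Hölder continuous f(x) = |x - 1/2| on [0, 1]
# from a random dictionary of 500 one-hidden-layer (L = 1) ReLU FNNs.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)[None, :]             # 200 sample points
f_vals = np.abs(x - 0.5).ravel()
atoms = np.stack([
    relu_fnn(x,
             [rng.normal(size=(4, 1)), rng.normal(size=(1, 4))],
             [rng.normal(size=(4, 1)), rng.normal(size=(1, 1))])
    for _ in range(500)
])
idx, c = greedy_n_term(f_vals, atoms, N=10)
print("relative error:",
      np.linalg.norm(f_vals - atoms[idx].T @ c) / np.linalg.norm(f_vals))

Evaluating the \(N\) selected terms is embarrassingly parallel, which is plausibly why the core-count condition above matters: with at least \(N\) cores, every term of the linear combination can be evaluated simultaneously, so the wall-clock cost is governed by the depth of a single term rather than by \(N\).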

MSC:

41A63 Multidimensional problems
41A25 Rate of convergence, degree of approximation
30D05 Functional equations in the complex plane, iteration and composition of analytic functions of one complex variable
37L65 Special approximation methods (nonlinear Galerkin, etc.) for infinite-dimensional dissipative dynamical systems
68T07 Artificial neural networks and deep learning

Software:

AdaGrad

References:

[1] Anthony, M.; Bartlett, P. L., Neural network learning: Theoretical foundations (2009), Cambridge University Press, New York, NY, USA
[2] Barron, A. R., Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, 39, 3, 930-945 (1993) · Zbl 0818.68126
[3] Bartlett, P.; Maiorov, V.; Meir, R., Almost linear VC dimension bounds for piecewise polynomial networks, Neural Computation, 10 (1998), 2159-2173
[4] Bianchini, M.; Scarselli, F., On the complexity of neural network classifiers: A comparison between shallow and deep architectures, IEEE Transactions on Neural Networks and Learning Systems, 25, 8, 1553-1565 (2014)
[5] Candes, E. J.; Wakin, M. B., An introduction to compressive sampling, IEEE Signal Processing Magazine, 25, 2, 21-30 (2008)
[6] Chen, S.; Donoho, D., Basis pursuit, (Proceedings of 1994 28th Asilomar conference on signals, systems and computers, Vol. 1 (1994)), 41-44
[7] Cireşan, D. C.; Meier, U.; Masci, J.; Gambardella, L. M.; Schmidhuber, J., Flexible, high performance convolutional neural networks for image classification, (Proceedings of the twenty-second international joint conference on artificial intelligence - Volume Two, IJCAI’11 (2011), AAAI Press), 1237-1242, Retrieved from http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-210
[8] Costarelli, D.; Sambucini, A. R., Saturation classes for max-product neural network operators activated by sigmoidal functions, Results in Mathematics, 72, 3, 1555-1569 (2017) · Zbl 1376.41014
[9] Costarelli, D.; Vinti, G., Convergence for a family of neural network operators in Orlicz spaces, Mathematische Nachrichten, 290, 2-3, 226-235 (2017), Retrieved from https://onlinelibrary.wiley.com/doi/abs/10.1002/mana.201600006 · Zbl 1373.47010
[10] Costarelli, D.; Vinti, G., Approximation results in Orlicz spaces for sequences of Kantorovich max-product neural network operators, Results in Mathematics, 73, 1, 1-15 (2018) · Zbl 1390.41019
[11] Cybenko, G., Approximation by superpositions of a sigmoidal function, MCSS, 2, 303-314 (1989) · Zbl 0679.94019
[12] Daubechies, I., Ten lectures on wavelets (1992), Society for Industrial and Applied Mathematics, Retrieved from https://epubs.siam.org/doi/abs/10.1137/1.9781611970104 · Zbl 0776.42018
[13] Davis, G., Adaptive nonlinear approximations (1994)
[14] DeVore, R. A., Nonlinear approximation, Acta Numerica, 7, 51-150 (1998) · Zbl 0931.65007
[15] DeVore, R.; Ron, A., Approximation using scattered shifts of a multivariate function, Transactions of the American Mathematical Society, 362, 12, 6205-6229 (2010), Retrieved from http://www.jstor.org/stable/40997201 · Zbl 1215.41004
[16] Donoho, D. L., Compressed sensing, IEEE Transactions on Information Theory, 52, 4, 1289-1306 (2006) · Zbl 1288.94016
[17] Duchi, J.; Hazan, E.; Singer, Y., Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12, 2121-2159 (2011), Retrieved from http://dl.acm.org/citation.cfm?id=1953048.2021068 · Zbl 1280.68164
[18] Filip, S.-I.; Javeed, A.; Trefethen, L. N., Smooth random functions, random ODEs, and Gaussian processes, SIAM Review (2018), (in press) Retrieved from https://hal.inria.fr/hal-01944992
[19] Fukushima, K., Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, 36, 4, 193-202 (1980), Retrieved from https://doi.org/10.1007/BF00344251 · Zbl 0419.92009
[20] Hangelbroek, T.; Ron, A., Nonlinear approximation using Gaussian kernels, Journal of Functional Analysis, 259, 1, 203-219 (2010), Retrieved from http://www.sciencedirect.com/science/article/pii/S0022123610000467 · Zbl 1203.41015
[21] Harvey, N.; Liaw, C.; Mehrabian, A., Nearly-tight VC-dimension bounds for piecewise linear neural networks, (Kale, S.; Shamir, O., Proceedings of the 2017 conference on learning theory. Proceedings of machine learning research, Vol. 65 (2017), PMLR, Amsterdam, Netherlands), 1064-1068, Retrieved from http://proceedings.mlr.press/v65/harvey17a.html
[22] Hornik, K.; Stinchcombe, M.; White, H., Multilayer feedforward networks are universal approximators, Neural Networks, 2, 5, 359-366 (1989), Retrieved from http://www.sciencedirect.com/science/article/pii/0893608089900208 · Zbl 1383.92015
[23] Jiang, J., Design of neural networks for lossless data compression, Optical Engineering, 35, 7 (1996), Retrieved from https://doi.org/10.1117/1.600614
[24] Johnson, R.; Zhang, T., Accelerating stochastic gradient descent using predictive variance reduction, (Proceedings of the 26th international conference on neural information processing systems - Volume 1, NIPS’13 (2013), Curran Associates Inc., USA), 315-323, Retrieved from http://dl.acm.org/citation.cfm?id=2999611.2999647
[25] Joutsensalo, J., Nonlinear data compression and representation by combining self-organizing map and subspace rule, (Proceedings of 1994 IEEE International conference on neural networks (ICNN’94), Vol. 2 (1994)), 637-640
[26] Kawaguchi, K., Deep learning without poor local minima, (Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; Garnett, R., Advances in neural information processing systems, Vol. 29 (2016), Curran Associates, Inc.), 586-594, Retrieved from http://papers.nips.cc/paper/6112-deep-learning-without-poor-local-minima.pdf
[27] Kawaguchi, K.; Bengio, Y., Depth with Nonlinearity Creates No Bad Local Minima in ResNets (2018), Retrieved from https://arxiv.org/abs/1810.09038
[28] Kearns, M. J.; Schapire, R. E., Efficient distribution-free learning of probabilistic concepts, Journal of Computer and System Sciences, 48, 3, 464-497 (1994), Retrieved from http://dx.doi.org/10.1016/S0022-0000(05)80062-5 · Zbl 0822.68093
[29] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization, CoRR abs/1412.6980, Retrieved from http://arxiv.org/abs/1412.6980
[30] Kumar, V., Introduction to parallel computing (2002), Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA
[31] Lewicki, G.; Marino, G., Approximation of functions of finite variation by superpositions of a sigmoidal function, Applied Mathematics Letters, 17, 10, 1147-1152 (2004), Retrieved from http://www.sciencedirect.com/science/article/pii/S089396590481694X · Zbl 1067.41013
[32] Liang, S., & Srikant, R. (2016). Why deep neural networks? CoRR abs/1610.04161, Retrieved from http://arxiv.org/abs/1610.04161
[33] Lin, S.; Liu, X.; Rong, Y.; Xu, Z., Almost optimal estimates for approximation and learning by radial basis function networks, Machine Learning, 95, 2, 147-164 (2014), Retrieved from https://doi.org/10.1007/s10994-013-5406-z · Zbl 1320.68141
[34] Llanas, B.; Sainz, F., Constructive approximate interpolation by neural networks, Journal of Computational and Applied Mathematics, 188, 2, 283-308 (2006), Retrieved from http://www.sciencedirect.com/science/article/pii/S0377042705002566 · Zbl 1089.65012
[35] Lu, Z., Pu, H., Wang, F., Hu, Z., & Wang, L. (2017). The expressive power of neural networks: A view from the width, CoRR abs/1709.02540, Retrieved from http://arxiv.org/abs/1709.02540
[36] Mallat, S. G.; Zhang, Z., Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, 41, 12, 3397-3415 (1993) · Zbl 0842.94004
[37] Montanelli, H.; Du, Q., New error bounds for deep networks using sparse grids (2017), Retrieved from https://arxiv.org/abs/1712.08688
[38] Montanelli, H.; Yang, H.; Du, Q., Deep ReLU networks overcome the curse of dimensionality for bandlimited functions (2019), Retrieved from https://arxiv.org/abs/1903.00735v1
[39] Montufar, G. F.; Pascanu, R.; Cho, K.; Bengio, Y., On the number of linear regions of deep neural networks, (Ghahramani, Z.; Welling, M.; Cortes, C.; Lawrence, N. D.; Weinberger, K. Q., Advances in Neural Information Processing Systems 27 (2014), Curran Associates, Inc.), 2924-2932, Retrieved from http://papers.nips.cc/paper/5422-on-the-number-of-linear-regions-of-deep-neural-networks.pdf
[40] Nguyen, Q. N., & Hein, M. (2017). The loss surface of deep and wide neural networks, CoRR abs/1704.08045, Retrieved from http://arxiv.org/abs/1704.08045
[41] Ohlsson, H.; Yang, A. Y.; Dong, R.; Sastry, S. S., Nonlinear basis pursuit, (2013 Asilomar conference on signals, systems and computers (2013)), 115-119
[42] Petersen, P.; Voigtlaender, F., Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Networks, 108, 296-330 (2018), Retrieved from http://www.sciencedirect.com/science/article/pii/S0893608018302454 · Zbl 1434.68516
[43] Petrushev, P., Multivariate n-term rational and piecewise polynomial approximation, Journal of Approximation Theory, 121, 1, 158-197 (2003), Retrieved from http://www.sciencedirect.com/science/article/pii/S0021904502000606 · Zbl 1019.41004
[44] Rumelhart, D. E.; McClelland, J. L.; the PDP Research Group, University of California, San Diego, Psychological and biological models, A Bradford book (1986), MIT Press, Retrieved from https://books.google.com.sg/books?id=davmLgzusB8C
[45] Sakurai, A., Tight bounds for the VC-dimension of piecewise polynomial networks, (Advances in neural information processing systems (1999), Neural information processing systems foundation), 323-329
[46] Scherer, D.; Müller, A.; Behnke, S., Evaluation of pooling operations in convolutional architectures for object recognition, (Diamantaras, K.; Duch, W.; Iliadis, L. S., Artificial neural networks - ICANN 2010 (2010), Springer, Berlin, Heidelberg), 92-101
[47] Suzuki, T., Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality, (International conference on learning representations (2019)), Retrieved from https://openreview.net/forum?id=H1ebTsActm
[48] Tariyal, S., Majumdar, A., Singh, R., & Vatsa, M. (2016). Greedy deep dictionary learning, CoRR abs/1602.00203, Retrieved from http://arxiv.org/abs/1602.00203
[49] The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).
[50] Weinan, E., & Wang, Q. (2018). Exponential convergence of the deep neural network approximation for analytic functions, CoRR abs/1807.00297, Retrieved from http://arxiv.org/abs/1807.00297 · Zbl 1475.65007
[51] Werbos, P., Beyond regression: New tools for prediction and analysis in the behavioral sciences (1975), Harvard University, Retrieved from https://books.google.com.sg/books?id=z81XmgEACAAJ
[52] Xie, T. F.; Cao, F. L., The rate of approximation of Gaussian radial basis neural networks in continuous function space, Acta Mathematica Sinica, English Series, 29, 2, 295-302 (2013), Retrieved from https://doi.org/10.1007/s10114-012-1369-4 · Zbl 1263.41009
[53] Yarotsky, D., Error bounds for approximations with deep ReLU networks, Neural Networks, 94, 103-114 (2017), Retrieved from http://www.sciencedirect.com/science/article/pii/S0893608017301545 · Zbl 1429.68260
[54] Yarotsky, D., Optimal approximation of continuous functions by very deep ReLU networks, (Bubeck, S.; Perchet, V.; Rigollet, P., Proceedings of the 31st conference on learning theory. Proceedings of machine learning research, Vol. 75 (2018), PMLR), 639-649, Retrieved from http://proceedings.mlr.press/v75/yarotsky18a.html
[55] Zhang, S., Approximation theories for deep neural networks via function compositions (2019), National University of Singapore, submitted for publication
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.