
A decentralized training algorithm for echo state networks in distributed big data applications. (English) Zbl 1414.68074

Summary: The current big data deluge requires innovative solutions for performing efficient inference on large volumes of heterogeneous information. Apart from the well-known challenges arising from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feed-forward neural networks in such a distributed setting, less attention has been devoted to the decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks (ESNs). The algorithm is based on the well-known Alternating Direction Method of Multipliers (ADMM) optimization procedure. It is formulated purely in terms of local exchanges between neighboring agents, without reliance on a coordinating node, and it does not require the communication of training patterns, a crucial requirement in realistic big data implementations. Experimental results on large-scale artificial datasets show that it compares favorably with a fully centralized implementation in terms of speed, efficiency, and generalization accuracy.
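To make the setting concrete, below is a minimal, self-contained Python sketch of the kind of procedure the summary describes: a consensus-based ADMM loop that trains the linear readout of an ESN across several agents, each holding a private data shard and exchanging only low-dimensional weight vectors with its immediate neighbors. This is not the authors' exact formulation; the ring topology, sizes, and hyperparameters (LAMBDA, RHO, the number of consensus steps) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative setup: sizes, topology and hyperparameters are assumptions ---
N_AGENTS, RES_SIZE, N_LOCAL = 5, 50, 200
LAMBDA, RHO, ADMM_ITERS, CONSENSUS_STEPS = 1e-2, 1.0, 50, 3

# Ring topology: each agent communicates only with its two immediate neighbours.
neighbours = {k: [(k - 1) % N_AGENTS, (k + 1) % N_AGENTS] for k in range(N_AGENTS)}

# Shared, fixed (untrained) reservoir, rescaled towards the echo state property.
W_res = rng.standard_normal((RES_SIZE, RES_SIZE))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))
W_in = rng.standard_normal(RES_SIZE)

def run_reservoir(u):
    """Drive the reservoir with a scalar input sequence and collect its states."""
    x, states = np.zeros(RES_SIZE), []
    for u_t in u:
        x = np.tanh(W_res @ x + W_in * u_t)
        states.append(x.copy())
    return np.asarray(states)

# Each agent holds a private data shard; a common linear teacher defines targets.
w_true = rng.standard_normal(RES_SIZE)
H, y = [], []
for _ in range(N_AGENTS):
    Hk = run_reservoir(rng.standard_normal(N_LOCAL))
    H.append(Hk)
    y.append(Hk @ w_true + 0.01 * rng.standard_normal(N_LOCAL))

# Local sufficient statistics, computed once: raw patterns never leave an agent.
A = [Hk.T @ Hk for Hk in H]
b = [Hk.T @ yk for Hk, yk in zip(H, y)]

w = [np.zeros(RES_SIZE) for _ in range(N_AGENTS)]  # local readout estimates
z = [np.zeros(RES_SIZE) for _ in range(N_AGENTS)]  # local consensus variables
t = [np.zeros(RES_SIZE) for _ in range(N_AGENTS)]  # scaled dual variables

I = np.eye(RES_SIZE)
for _ in range(ADMM_ITERS):
    # 1) Local ridge-regularised ADMM step (closed form).
    for k in range(N_AGENTS):
        w[k] = np.linalg.solve(A[k] + (LAMBDA + RHO) * I,
                               b[k] + RHO * z[k] - t[k])
    # 2) Approximate the global average with a few neighbour-only consensus steps.
    z = [w[k] + t[k] / RHO for k in range(N_AGENTS)]
    for _ in range(CONSENSUS_STEPS):
        z = [(z[k] + sum(z[j] for j in neighbours[k])) / (1 + len(neighbours[k]))
             for k in range(N_AGENTS)]
    # 3) Dual ascent on the consensus constraint w_k = z_k.
    for k in range(N_AGENTS):
        t[k] += RHO * (w[k] - z[k])

# Agents should agree (approximately) on a single readout vector.
print("max disagreement:", max(np.linalg.norm(w[k] - w[0]) for k in range(1, N_AGENTS)))
```

Note that only the RES_SIZE-dimensional vectors in the consensus step cross the network; reservoir states and targets stay local, consistent with the constraint that training patterns are never communicated.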

MSC:

68T05 Learning and adaptive systems in artificial intelligence
