
Fully asynchronous policy evaluation in distributed reinforcement learning over networks. (English) Zbl 1480.93027

Summary: This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node in the network, each node can locally update its value function at any time using (possibly delayed) information from its neighbors. This is in sharp contrast to gossip-based schemes, in which only a pair of nodes updates concurrently. Even though the fully asynchronous setting involves a difficult multi-timescale decision problem, we design a novel incremental aggregated gradient (IAG) based distributed algorithm and develop a push-pull augmented graph approach to prove its exact convergence at a linear rate of \(\mathcal{O}(c^k)\), where \(c \in (0, 1)\) and \(k\) is the total number of updates within the entire network. Finally, numerical experiments validate that our method achieves a linear speedup with respect to the number of nodes and is robust to straggler nodes.
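As a rough illustration of the kind of update described above, the following Python sketch shows how a single node might interleave a mixing step with (possibly stale) neighbor parameters and an incremental aggregated gradient (IAG) step for policy evaluation with linear function approximation. It is a minimal, hypothetical sketch, not the authors' algorithm: the objective (a local mean-squared Bellman residual), the function names (local_update, bellman_residual_grad), and the mixing weights are assumptions made purely for illustration.

import numpy as np

# Hypothetical illustration (not the paper's exact algorithm): one node's
# asynchronous update combining a consensus/mixing step that may use stale
# (delayed) neighbor parameters with an incremental aggregated gradient (IAG)
# step over the node's local samples.

rng = np.random.default_rng(0)

d, n_samples = 5, 20                        # feature dimension, local samples
phi = rng.normal(size=(n_samples, d))       # features phi(s_i)
phi_next = rng.normal(size=(n_samples, d))  # features phi(s_i') of next states
r = rng.normal(size=n_samples)              # local rewards
gamma, alpha = 0.9, 0.05                    # discount factor, step size

theta = np.zeros(d)                         # local value-function weights
grad_table = np.zeros((n_samples, d))       # last gradient computed per sample
grad_sum = np.zeros(d)                      # running sum of grad_table rows


def bellman_residual_grad(theta, i):
    """Gradient of 0.5 * (Bellman residual)^2 on local sample i."""
    delta = r[i] + gamma * phi_next[i] @ theta - phi[i] @ theta
    return delta * (gamma * phi_next[i] - phi[i])


def local_update(theta, neighbor_thetas, weights):
    """One asynchronous step: mix with (possibly delayed) neighbor copies,
    then refresh a single sample's gradient and take an IAG step."""
    global grad_sum
    # Consensus/mixing with stale neighbor parameters (weights sum to 1).
    theta = weights[0] * theta + sum(w * th for w, th in zip(weights[1:], neighbor_thetas))
    # IAG: recompute one gradient, reuse the stored (stale) ones for the rest.
    i = rng.integers(n_samples)
    g_new = bellman_residual_grad(theta, i)
    grad_sum += g_new - grad_table[i]
    grad_table[i] = g_new
    return theta - alpha * grad_sum / n_samples


# Example: two stale neighbor copies with fixed mixing weights.
stale_neighbors = [0.1 * rng.normal(size=d), 0.1 * rng.normal(size=d)]
for _ in range(100):
    theta = local_update(theta, stale_neighbors, weights=[0.5, 0.25, 0.25])
print("learned weights (first three):", np.round(theta[:3], 3))

In a fully asynchronous deployment, each node would call local_update on its own clock with whatever neighbor copies have arrived so far, which is what the "(possibly delayed) information" in the summary refers to.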

MSC:

93A16 Multi-agent systems
93B70 Networked control
68T05 Learning and adaptive systems in artificial intelligence
68W15 Distributed algorithms

Software:

IMPALA; SAGA; AsySPA

References:

[1] Assran, M.; Rabbat, M., An empirical comparison of multi-agent optimization algorithms, (2017 IEEE global conference on signal and information processing (2017)), 573-577
[2] Assran, M. S.; Rabbat, M. G., Asynchronous gradient push, IEEE Transactions on Automatic Control, 66, 1, 168-183 (2021) · Zbl 1536.93788
[3] Assran, M.; Romoff, J.; Ballas, N.; Pineau, J.; Rabbat, M., Gossip-based actor-learner architectures for deep reinforcement learning, (Advances in neural information processing systems, Vol. 32 (2019)), 13320-13330
[4] Bertsekas, D. P.; Tsitsiklis, J. N., Parallel and distributed computation: Numerical methods (1989), Athena Scientific · Zbl 0743.65107
[5] Cassano, L.; Yuan, K.; Sayed, A. H., Multiagent fully decentralized value function learning with linear convergence rates, IEEE Transactions on Automatic Control, 66, 4, 1497-1512 (2021) · Zbl 1536.93019
[6] Chen, T.; Zhang, K.; Giannakis, G.; Başar, T., Communication-efficient distributed reinforcement learning (2018), arXiv preprint arXiv:1812.03239
[7] Defazio, A.; Bach, F.; Lacoste-Julien, S., SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, (Advances in neural information processing systems, Vol. 27 (2014)), 1646-1654
[8] Ding, D.; Wei, X.; Yang, Z.; Wang, Z.; Jovanović, M. R., Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual optimization (2019), arXiv preprint arXiv:1908.02805
[9] Doan, T.; Maguluri, S.; Romberg, J., Finite-time analysis of distributed TD(0) with linear function approximation on multi-agent reinforcement learning, (Proceedings of the 36th international conference on machine learning, Vol. 97 (2019)), 1626-1635
[10] Du, S. S.; Chen, J.; Li, L.; Xiao, L.; Zhou, D., Stochastic variance reduction methods for policy evaluation, (Proceedings of the 34th international conference on machine learning, Vol. 70 (2017)), 1049-1058
[11] Espeholt, L.; Soyer, H.; Munos, R.; Simonyan, K.; Mnih, V.; Ward, T.; et al., IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures, (Proceedings of the 35th international conference on machine learning, Vol. 80 (2018)), 1407-1416
[12] Gurbuzbalaban, M.; Ozdaglar, A.; Parrilo, P. A., On the convergence rate of incremental aggregated gradient algorithms, SIAM Journal on Optimization, 27, 2, 1035-1048 (2017) · Zbl 1366.90195
[13] Haochen, J.; Sra, S., Random shuffling beats SGD after finite epochs, (Proceedings of the 36th international conference on machine learning, Vol. 97 (2019)), 2624-2633
[14] Herz, A. V.; Marcus, C. M., Distributed dynamics in neural networks, Physical Review E, 47, 3, 2155 (1993)
[15] Kar, S.; Moura, J. M.; Poor, H. V., QD-learning: A collaborative distributed strategy for multi-agent reinforcement learning through consensus+innovations, IEEE Transactions on Signal Processing, 61, 7, 1848-1862 (2013) · Zbl 1393.94293
[16] Kober, J.; Bagnell, J. A.; Peters, J., Reinforcement learning in robotics: A survey, International Journal of Robotics Research, 32, 11, 1238-1274 (2013)
[17] Lian, X.; Zhang, W.; Zhang, C.; Liu, J., Asynchronous decentralized parallel stochastic gradient descent, (Proceedings of the 35th international conference on machine learning, Vol. 80 (2018)), 3043-3052
[18] Mannion, P.; Mason, K.; Devlin, S.; Duggan, J.; Howley, E., Dynamic economic emissions dispatch optimisation using multi-agent reinforcement learning, (Proceedings of the adaptive and learning agents workshop (at AAMAS 2016) (2016))
[19] Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; et al., Asynchronous methods for deep reinforcement learning, (Proceedings of the 33rd international conference on machine learning, Vol. 48 (2016)), 1928-1937
[20] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G., Human-level control through deep reinforcement learning, Nature, 518, 7540, 529-533 (2015)
[21] Mokhtari, A.; Ribeiro, A., DSA: Decentralized double stochastic averaging gradient algorithm, Journal of Machine Learning Research, 17, 61, 1-35 (2016) · Zbl 1360.68699
[22] Nedić, A.; Olshevsky, A.; Shi, W., Achieving geometric convergence for distributed optimization over time-varying graphs, SIAM Journal on Optimization, 27, 4, 2597-2633 (2017) · Zbl 1387.90189
[23] Nedić, A.; Ozdaglar, A., Convergence rate for consensus with delays, Journal of Global Optimization, 47, 3, 437-456 (2010) · Zbl 1214.93012
[24] Pu, S.; Shi, W.; Xu, J.; Nedić, A., Push-pull gradient methods for distributed optimization in networks, IEEE Transactions on Automatic Control, 66, 1, 1-16 (2021) · Zbl 1536.93337
[25] Qu, G.; Lin, Y.; Wierman, A.; Li, N., Scalable multi-agent reinforcement learning for networked systems with average reward, (Advances in neural information processing systems, Vol. 33 (2020)), 2074-2086
[26] Qu, C.; Mannor, S.; Xu, H.; Qi, Y.; Song, L.; Xiong, J., Value propagation for decentralized networked deep multi-agent reinforcement learning, (Advances in neural information processing systems, Vol. 32 (2019)), 1182-1191
[27] Qureshi, M. I.; Xin, R.; Kar, S.; Khan, U. A., Push-SAGA: A decentralized stochastic algorithm with variance reduction over directed graphs (2020), arXiv preprint arXiv:2008.06082
[28] Ren, J.; Haupt, J., A communication efficient hierarchical distributed optimization algorithm for multi-agent reinforcement learning, (Real-world sequential decision making workshop at international conference on machine learning (2019))
[29] Schmidt, M.; Le Roux, N.; Bach, F., Minimizing finite sums with the stochastic average gradient, Mathematical Programming, 162, 1-2, 83-112 (2017) · Zbl 1358.90073
[30] Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A., Mastering the game of go without human knowledge, Nature, 550, 7676, 354-359 (2017)
[31] Sutton, R. S.; Barto, A. G., Reinforcement learning: An introduction (2018), MIT Press · Zbl 1407.68009
[32] Sutton, R. S.; Maei, H. R.; Precup, D.; Bhatnagar, S.; Silver, D.; Szepesvári, C., Fast gradient-descent methods for temporal-difference learning with linear function approximation, (Annual international conference on machine learning (2009), ACM), 993-1000
[33] Tian, Y.; Sun, Y.; Scutari, G., Achieving linear convergence in distributed asynchronous multiagent optimization, IEEE Transactions on Automatic Control, 65, 12, 5264-5279 (2020) · Zbl 1536.93048
[34] Touri, B., Product of random stochastic matrices and distributed averaging (2012), Springer Science & Business Media · Zbl 1244.15023
[35] Van der Pol, E.; Oliehoek, F. A., Coordinated deep reinforcement learners for traffic light control, (Proceedings of learning, inference and control of multi-agent systems (at NIPS 2016) (2016))
[36] Vinyals, O.; Babuschkin, I.; Czarnecki, W. M.; Mathieu, M.; Dudzik, A.; Chung, J., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature, 575, 7782, 350-354 (2019)
[37] Wai, H.-T.; Yang, Z.; Wang, Z.; Hong, M., Multi-agent reinforcement learning via double averaging primal-dual optimization, (Advances in neural information processing systems, Vol. 31 (2018)), 9649-9660
[38] Xie, P.; You, K.; Tempo, R.; Song, S.; Wu, C., Distributed convex optimization with inequality constraints over time-varying unbalanced digraphs, IEEE Transactions on Automatic Control, 63, 12, 4331-4337 (2018) · Zbl 1423.90193
[39] Xin, R.; Khan, U. A., A linear algorithm for optimization over directed graphs with geometric convergence, IEEE Control Systems Letters, 2, 3, 315-320 (2018)
[40] Xin, R.; Khan, U. A.; Kar, S., Variance-reduced decentralized stochastic optimization with accelerated convergence, IEEE Transactions on Signal Processing, 68, 6255-6271 (2020) · Zbl 07591175
[41] Xin, R.; Pu, S.; Nedić, A.; Khan, U. A., A general framework for decentralized optimization with first-order methods, Proceedings of the IEEE, 108, 11, 1869-1889 (2020)
[42] Xu, J.; Zhu, S.; Soh, Y. C.; Xie, L., Convergence of asynchronous distributed gradient methods over stochastic networks, IEEE Transactions on Automatic Control, 63, 2, 434-448 (2017) · Zbl 1390.90433
[43] Zhang, K.; Yang, Z.; Başar, T., Multi-agent reinforcement learning: A selective overview of theories and algorithms, Handbook of reinforcement learning and control, 321-384 (2021) · Zbl 07608712
[44] Zhang, K.; Yang, Z.; Liu, H.; Zhang, T.; Başar, T., Fully decentralized multi-agent reinforcement learning with networked agents, (Proceedings of the 35th international conference on machine learning, Vol. 80 (2018)), 5872-5881
[45] Zhang, J.; You, K., Asynchronous decentralized optimization in directed networks (2019), arXiv preprint arXiv:1901.08215
[46] Zhang, J.; You, K., AsySPA: An exact asynchronous algorithm for convex optimization over digraphs, IEEE Transactions on Automatic Control, 65, 6, 2494-2509 (2020) · Zbl 07256364