×

Learning with minimal information in continuous games. (English) Zbl 1466.91036

Summary: While payoff-based learning models are almost exclusively devised for finite action games, where players can test every action, it is harder to design such learning processes for continuous games. We construct a stochastic learning rule, designed for games with continuous action sets, which requires no sophistication from the players and is simple to implement: players update their actions according to variations in own payoff between current and previous action. We then analyze its behavior in several classes of continuous games and show that convergence to a stable Nash equilibrium is guaranteed in all games with strategic complements as well as in concave games, while convergence to Nash equilibrium occurs in all locally ordinal potential games as soon as Nash equilibria are isolated.

MSC:

91A26 Rationality and learning in game theory

References:

[1] Arrow, Kenneth J. and LeonidHurwicz (1960), “Stability of the gradient process in n‐person games.” Journal of the Society for Industrial and Applied Mathematics, 8, 280-294. · Zbl 0101.37104
[2] Babichenko, Yakov (2012), “Completely uncoupled dynamics and Nash equilibria.” Games and Economic Behavior, 76, 1-14. · Zbl 1274.91022
[3] Benaïm, Michel (1996), “A dynamical system approach to stochastic approximations.” SIAM Journal on Control and Optimization, 34, 437-472. · Zbl 0841.62072
[4] Benaïm, Michel (1999), “Dynamics of stochastic approximation algorithms.” In Séminaire de Probabilités, XXXIII, volume 1709 of Lecture Notes in Mathematics, 1-68, Springer, Berlin. · Zbl 0955.62085
[5] Benaïm, Michel and MathieuFaure (2012), “Stochastic approximations, cooperative dynamics and supermodular games.” Annals of Applied Probability, 22, 2133-2164. · Zbl 1429.62359
[6] Benaim, Michel, JosefHofbauer, and SylvainSorin (2005), “Stochastic approximations and differential inclusions.” SIAM Journal on Control and Optimization, 44, 328-348. · Zbl 1087.62091
[7] Brandiere, Odile and MarieDuflo (1996), “Les algorithmes stochastiques contournent‐ils les pièges?” Annales de l’I.H.P. Probabilités et Statistiques, 32, 395-427. · Zbl 0849.62043
[8] Bravo, Mario, DavidLeslie, and PanayotisMertikopoulos (2018), “Bandit learning in concave N‐person games.” In Advances in Neural Information Processing Systems 31, 5661-5671.
[9] Dindos, Martin and ClaudioMezzetti (2006), “Better‐reply dynamics and global convergence to Nash equilibrium in aggregative games.” Games and Economic Behavior, 54, 261-292. · Zbl 1125.91004
[10] Foster, Dean P. and H. PeytonYoung (2006), “Regret testing: Learning to play Nash equilibrium without knowing you have an opponent.” Theoretical Economics, 1, 341-367.
[11] Germano, Fabrizio and GáborLugosi (2007), “Global Nash convergence of Foster and Young”s regret testing.” Games and Economic Behavior, 60, 135-154. · Zbl 1155.91318
[12] Hart, Sergiu and AndreuMas‐Colell (2003), “Uncoupled dynamics do not lead to Nash equilibrium.” American Economic Review, 93, 1830-1836.
[13] Hart, Sergiu and AndreuMas‐Colell (2006), “Stochastic uncoupled dynamics and Nash equilibrium.” Games and Economic Behavior, 57, 286-303. · Zbl 1156.91319
[14] Hazan, Elad (2016), “Introduction to online convex optimization.” Foundations and Trends in Optimization, 2, 157-325. · Zbl 1503.68004
[15] Huck, Steffen, Hans‐TheoNormann, and JörgOechssler (2004), “Through trial and error to collusion.” International Economic Review, 45, 205-224.
[16] Mertikopoulos, Panayotis and ZhengyuanZhou (2019), “Learning in games with continuous action spaces and unknown payoff functions.” Mathematical Programming, 173, 465-507. · Zbl 1420.91027
[17] Nesterov, Yurii (2009), “Primal‐dual subgradient methods for convex problems.” Mathematical Programming, 120, 221-259. · Zbl 1191.90038
[18] Pemantle, Robin (1990), “Nonconvergence to unstable points in urn models and stochastic approximations.” Annals of Probability, 18, 698-712. · Zbl 0709.60054
[19] Perkins, Steven and David S.Leslie (2014), “Stochastic fictitious play with continuous action sets.” Journal of Economic Theory, 152, 179-213. · Zbl 1297.91010
[20] Posch, Martin (1997), “Cycling in a stochastic learning algorithm for normal form games.” Journal of Evolutionary Economics, 7, 193-207.
[21] Rosen, J. Ben (1965), “Existence and uniqueness of equilibrium points for concave n‐person games.” Econometrica, 33, 520-534. · Zbl 0142.17603
[22] Ruelle, David (1981), “Small random perturbations of dynamical systems and the definition of attractors.” Communications in Mathematical Physics, 82, 137-151. · Zbl 0482.58017
[23] Tatarenko, Tatiana (2018), “Stochastic learning in potential games: Communication and payoff‐based approaches.” Unpublished paper, arXiv:1804.04403. · Zbl 1511.91009
[24] Vives, Xavier (1990), “Nash equilibrium with strategic complementarities.” Journal of Mathematical Economics, 19, 305-321. · Zbl 0708.90094
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.