
An adjusted payoff-based procedure for normal form games. (English) Zbl 1349.91062

Summary: We study a simple adaptive model in the framework of an \(N\)-player normal form game. The model consists of a repeated game in which each player knows only her own action space and the payoff she obtains at each stage, not those of the other agents. To update her mixed action, each player computes the average payoff vector she has obtained so far, using the number of times she has played each pure action. The resulting stochastic process is analyzed via the ODE method from stochastic approximation theory. We are interested in the convergence of the process to rest points of the related continuous dynamics. Results concerning almost sure convergence and convergence with positive probability are obtained and applied to a traffic game. We also provide examples where convergence occurs with probability zero.
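The scheme described above lends itself to a compact simulation. Below is a minimal Python sketch of the general payoff-based procedure as the summary describes it: each player keeps a running average payoff per pure action, computed only from her own realized payoffs and her own play counts, and forms her next mixed action from these estimates. The softmax (logit) map, the parameter beta, and the toy three-player congestion payoff are illustrative assumptions; the paper's actual "adjusted" update rule is not specified in this summary.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x, beta):
    # Numerically stable logit map from average payoffs to a mixed action.
    # This map is an assumed stand-in for the paper's adjustment rule.
    z = np.exp(beta * (x - x.max()))
    return z / z.sum()

def run(payoffs, n_actions, T=10_000, beta=5.0):
    # payoffs[i](profile) returns player i's realized payoff; each player
    # observes only this number, never the opponents' actions or payoffs.
    N = len(n_actions)
    counts = [np.zeros(k) for k in n_actions]   # plays of each pure action
    avg = [np.zeros(k) for k in n_actions]      # average payoff per pure action
    for _ in range(T):
        mixed = [softmax(avg[i], beta) for i in range(N)]
        profile = [rng.choice(n_actions[i], p=mixed[i]) for i in range(N)]
        for i in range(N):
            u = payoffs[i](profile)             # own realized payoff only
            a = profile[i]
            counts[i][a] += 1
            # Incremental running average over the plays of action a.
            avg[i][a] += (u - avg[i][a]) / counts[i][a]
    return [softmax(avg[i], beta) for i in range(N)]

# Hypothetical traffic-style example: three players each pick route 0 or 1;
# the payoff is minus the congestion (number of players) on the chosen route.
def make_payoff(i):
    return lambda profile: -sum(1 for a in profile if a == profile[i])

print(run([make_payoff(k) for k in range(3)], n_actions=[2, 2, 2], T=5_000))

Note that the step size 1/(number of plays of the action) gives a Robbins-Monro-type recursion (cf. [27] below), which is what makes the empirical averages amenable to the ODE method of stochastic approximation.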

MSC:

91A26 Rationality and learning in game theory
91A10 Noncooperative games
91A06 \(n\)-person games, \(n>2\)
91A20 Multistage and repeated games
62L20 Stochastic approximation
93E35 Stochastic learning and adaptive control

References:

[1] Alós-Ferrer C, Netzer N (2010) The logit-response dynamics. Games Econom. Behav. 68(2):413-427. · Zbl 1207.91017
[2] Beggs AW (2005) On the convergence of reinforcement learning. J. Econom. Theory 122(1):1-36. · Zbl 1118.91025
[3] Benaïm M (1999) Dynamics of stochastic approximation algorithms. Séminaire de Probabilités XXXIII, Lecture Notes in Math., Vol. 1709 (Springer, Berlin), 1-68. · Zbl 0955.62085
[4] Benveniste A, Métivier M, Priouret P (1990) Adaptive Algorithms and Stochastic Approximations (Springer, Berlin).
[5] Blume L (1993) The statistical mechanics of strategic interaction. Games Econom. Behav. 5(3):387-424. · Zbl 0797.90123
[6] Börgers T, Sarin R (1997) Learning through reinforcement and replicator dynamics. J. Econom. Theory 77(1):1-14. · Zbl 0892.90198
[7] Brandière O, Duflo M (1996) Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. H. Poincaré Probab. Statist. 32(3):395-427. · Zbl 0849.62043
[8] Brown GW (1951) Iterative solution of games by fictitious play. Activity Analysis of Production and Allocation (John Wiley & Sons, New York), 374-376.
[9] Chen HF (2002) Stochastic Approximation and Its Applications (Kluwer Academic Publishers, Dordrecht, Netherlands).
[10] Cominetti R, Melo E, Sorin S (2010) A payoff-based learning procedure and its application to traffic games. Games Econom. Behav. 70(1):71-83. · Zbl 1244.91012
[11] Conley C (1978) Isolated Invariant Sets and the Morse Index (American Mathematical Society, Providence, RI). · Zbl 0397.34056
[12] Duflo M (1997) Random Iterative Models, Applications of Mathematics, Vol. 34 (Springer, Berlin).
[13] Erev I, Roth AE (1998) Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer. Econom. Rev. 88(4):848-881.
[14] Freund Y, Schapire RE (1999) Adaptive game playing using multiplicative weights. Games Econom. Behav. 29(1-2):79-103. · Zbl 0964.91007
[15] Fudenberg D, Levine DK (1998) The Theory of Learning in Games (MIT Press, Cambridge, MA).
[16] Hart S, Mas-Colell A (2000) A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5):1127-1150. · Zbl 1020.91003
[17] Hart S, Mas-Colell A (2001) A reinforcement procedure leading to correlated equilibrium. Economics Essays: A Festschrift for Werner Hildenbrand (Springer, Berlin), 181-200. · Zbl 1023.91004
[18] Hofbauer J, Hopkins E (2005) Learning in perturbed asymmetric games. Games Econom. Behav. 52(1):133-152. · Zbl 1099.91028
[19] Kushner HJ, Yin G (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer-Verlag, New York).
[20] Laslier JF, Topol R, Walliser B (2001) A behavioral learning process in games. Games Econom. Behav. 37(2):340-366. · Zbl 1019.91010
[21] Leslie DS, Collins EJ (2005) Individual Q-learning in normal form games. SIAM J. Control Optim. 44(2):495-514. · Zbl 1210.93085
[22] Marden JR, Shamma JS (2012) Revisiting log-linear learning: Asynchrony, completeness and payoff-based implementation. Games Econom. Behav. 75(2):788-808. · Zbl 1239.91017
[23] Milnor JW (1997) Topology from the Differentiable Viewpoint (Princeton University Press, Princeton, NJ).
[24] Monderer D, Shapley LS (1996) Potential games. Games Econom. Behav. 14(1):124-143. · Zbl 0862.90137
[25] Pemantle R (1990) Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18(2):698-712. · Zbl 0709.60054
[26] Posch M (1997) Cycling in a stochastic learning algorithm for normal form games. J. Evol. Econom. 7(2):193-207.
[27] Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400-407. · Zbl 0054.05901
[28] Sandholm WH (2010) Population Games and Evolutionary Dynamics (MIT Press, Cambridge, MA).
[29] Schreiber SJ (2001) Urn models, replicator processes, and random genetic drift. SIAM J. Appl. Math. 61(6):2148-2167. · Zbl 0993.92026
[30] Shiryaev AN (1996) Probability (Springer, New York).