×

Randomized sampling for large zero-sum games. (English) Zbl 1319.91012

Summary: This paper addresses the solution of large zero-sum matrix games using randomized methods. We formalize a procedure, termed as the sampled security policy (SSP) algorithm, by which a player can compute policies that, with a high confidence, are security policies against an adversary using randomized methods to explore the possible outcomes of the game. The SSP algorithm essentially consists of solving a stochastically sampled subgame that is much smaller than the original game. We also propose a randomized algorithm, termed as the sampled security value (SSV) algorithm, which computes a high-confidence security-level (i.e., worst-case outcome) for a given policy, which may or may not have been obtained using the SSP algorithm. For both the SSP and the SSV algorithms we provide results to determine how many samples are needed to guarantee a desired level of confidence. We start by providing results when the two players sample policies with the same distribution and subsequently extend these results to the case of mismatched distributions. We demonstrate the usefulness of these results in a hide-and-seek game that exhibits exponential complexity.

MSC:

91A05 2-person games
91A60 Probabilistic games; gambling
68W20 Randomized algorithms

Software:

PRISM
Full Text: DOI

References:

[1] Alamo, T.; Tempo, R.; Camacho, E. F., Randomized strategies for probabilistic solutions of uncertain feasibility and optimization problems, IEEE Transactions on Automatic Control, 54, 11, 2545-2559 (2009) · Zbl 1367.90106
[3] Basar, T.; Olsder, G. J., Dynamic non-cooperative game theory (1999), SIAM: SIAM Philadelphia, PA, USA · Zbl 0946.91001
[4] Bellman, R., Dynamic programming treatment of the traveling salesman problem, Journal of the Association for Computing Machinery, 9, 61-63 (1962) · Zbl 0106.14102
[9] Browne, C. B.; Powley, E.; Whitehouse, D.; Lucas, S. M.; Cowling, P. I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S., A survey of Monte Carlo tree search methods, IEEE Transactions on Computational Intelligence and AI in Games, 4, 1, 1-43 (2012)
[10] Calafiore, G., On the expected probability of constraint violation in sampled convex programs, Journal of Optimization Theory and Applications, 143, 2, 405-412 (2009) · Zbl 1177.90316
[11] Calafiore, G., Random convex programs, SIAM Journal on Optimization, 20, 6, 3427-3463 (2010) · Zbl 1211.90168
[12] Calafiore, G. C.; Campi, M. C., The scenario approach to robust control design, IEEE Transactions on Automatic Control, 51, 5, 742-753 (2006) · Zbl 1366.93457
[13] Campi, M. C.; Calafiore, G. C., Notes on the scenario design approach, IEEE Transactions on Automatic Control, 54, 2, 382-385 (2009) · Zbl 1367.93223
[14] Campi, M. C.; Garatti, S., The exact feasibility of randomized solutions of robust convex programs, SIAM Journal on Control and Optimization, 19, 3, 1211-1230 (2008) · Zbl 1180.90235
[15] Campi, M. C.; Garatti, S.; Prandini, M., The scenario approach for systems and control design, Annual Reviews in Control, 33, 2, 149-157 (2009)
[16] de Farias, D. P.; Roy, B. V., On constraint sampling in the linear programming approach to approximate dynamic programming, Mathematics of Operations Research, 29, 3, 462-478 (2004) · Zbl 1082.90124
[17] Erdoǧan, E.; Iyengar, G., Ambiguous chance constrained problems and robust optimization, Mathematical Programming, 107, 1-2, 37-61 (2006) · Zbl 1134.90028
[18] Frank, A., Brute force search in games of imperfect information, (Heuristic programming in artificial intelligence 2: the second computer olympiad (1989), Ellis Horwood), 204-209, (Chapter)
[19] Frank, I.; Basin, D., Optimal play against best defence: complexity and heuristics, (Herik, H. J.; Iida, H., Computers and games. Computers and games, Lecture notes in computer science, Vol. 1558 (1999), Springer: Springer Berlin, Heidelberg), 50-73
[23] Hinton, Andrew; Kwiatkowska, Marta; Norman, Gethin; Parker, David, PRISM: a tool for automatic verification of probabilistic systems, (Hermanns, Holger; Palsberg, Jens, Tools and algorithms for the construction and analysis of systems. Tools and algorithms for the construction and analysis of systems, Lecture notes in computer science, Vol. 3920 (2006), Springer: Springer Berlin, Heidelberg), 441-444
[27] Lye, K.-W.; Wing, J., Game strategies in network security, International Journal of Information Security, 4, 71-86 (2005)
[28] Motwani, R.; Raghavan, P., Randomized algorithms (1995), Cambridge University Press · Zbl 0849.68039
[30] Tempo, R.; Bai, E. W.; Dabbene, F., Probabilistic robustness analysis: explicit bounds for the minimum number of samples, Systems & Control Letters, 30, 5, 237-242 (1997) · Zbl 0901.93017
[31] Tempo, R.; Calafiore, G.; Dabbene, F., Randomized algorithms for analysis and control of uncertain systems (2004), Springer-Verlag: Springer-Verlag London
[32] Vapnik, V., Statistical learning theory (1998), John Wiley: John Wiley New York · Zbl 0935.62007
[33] Vidyasagar, M., Statistical learning theory and randomized algorithms for control, IEEE Control Systems Magazine, 18, 6, 69-85 (1998)
[34] Vidyasagar, M.; Blondel, V. D., Probabilistic solutions to some NP—hard matrix problems, Automatica, 37, 9, 1397-1405 (2001) · Zbl 1031.93165
[35] Von Neumann, J., Zur theorie der gesellschaftsspiele, Mathematische Annalen, 100, 295-320 (1928) · JFM 54.0543.02
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.