
A survey of opponent modeling in adversarial domains. (English) Zbl 1522.68607

Summary: Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.
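
As a minimal illustration of the loop the summary describes (observe the opponent, update a model, predict the next action, best-respond), the following Python sketch implements a frequency-count opponent model for rock-paper-scissors. It is an illustrative toy in the spirit of the surveyed work, not an algorithm taken from the paper, and all names in it are ours.

from collections import Counter

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # move that beats the key

class FrequencyOpponentModel:
    """Toy opponent model: a smoothed empirical distribution over the opponent's actions."""

    def __init__(self):
        # Start each count at 1 (Laplace smoothing) so unseen actions keep nonzero mass.
        self.counts = Counter({a: 1 for a in ACTIONS})

    def observe(self, opponent_action):
        # Incorporate one observed opponent action into the model.
        self.counts[opponent_action] += 1

    def predict(self):
        # Predicted next action: the opponent's most frequent action so far.
        return self.counts.most_common(1)[0][0]

    def best_response(self):
        # Respond with the action that beats the predicted one.
        return BEATS[self.predict()]

model = FrequencyOpponentModel()
for action in ("rock", "rock", "paper", "rock"):  # observed opponent history
    model.observe(action)
print(model.predict())        # rock
print(model.best_response())  # paper

The techniques surveyed replace the counter with richer learners (hidden Markov models, case bases, deep networks) and must additionally cope with partial observability and adaptive opponents, but the observe/predict/respond structure is the same.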

MSC:

68T42 Agent technology and artificial intelligence
68T05 Learning and adaptive systems in artificial intelligence
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
91A80 Applications of game theory
Full Text: DOI

References:

[1] Adachi, Y., Ito, M., & Naruse, T. (2016). Classifying the strategies of an opponent team based on a sequence of actions in the RoboCup SSL. In Robot World Cup, pp. 109-120. Springer.
[2] Ahmadi, M., Lamjiri, A. K., Nevisi, M. M., Habibi, J., & Badie, K. (2003). Using a two-layered case-based reasoning for prediction in soccer coach. In International Conference on Machine Learning; Models, Technologies and Applications, pp. 181-185.
[3] Al-Shedivat, M., Bansal, T., Burda, Y., Sutskever, I., Mordatch, I., & Abbeel, P. (2017). Continuous adaptation via meta-learning in nonstationary and competitive environments. arXiv preprint arXiv:1710.03641.
[4] Albrecht, S. V., & Stone, P. (2018). Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence, 258, 66-95. · Zbl 1433.68460
[5] Alimadadi, S., Mesbah, A., & Pattabiraman, K. (2018). Inferring hierarchical motifs from execution traces. In Proceedings of the 40th International Conference on Software Engineering (ICSE).
[6] Conitzer, V., & Sandholm, T. (2003). AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of the Twentieth International Conference on Machine Learning, pp. 83-90.
[7] Archibald, C., & Nieves-Rivera, D. (2018). Execution skill estimation. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1859-1861.
[8] Archibald, C., & Nieves-Rivera, D. (2019). Bayesian execution skill estimation. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6014-6021.
[9] Arora, S., & Doshi, P. (2021). A survey of inverse reinforcement learning: Challenges, methods and progress. Artificial Intelligence, 103500. · Zbl 1519.68207
[10] Baarslag, T., Hendrikx, M. J. C., Hindriks, K. V., & Jonker, C. M. (2016). Learning about the opponent in automated bilateral negotiation: A comprehensive survey of opponent modeling techniques. Autonomous Agents and Multi-Agent Systems, 30(5), 849-898.
[11] Baez, S. (2015). Predicting opponent team activity in a RoboCup environment. arXiv preprint arXiv:1503.01446.
[12] Baker, C. L., Tenenbaum, J. B., & Saxe, R. R. (2007). Goal inference as inverse planning. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 29.
[13] Bakkes, S. C., Spronck, P. H., & van Lankveld, G. (2012). Player behavioural modelling for video games. Entertainment Computing, 3(3), 71-79.
[14] Ball, D., & Wyeth, G. (2003). Classifying an opponent’s behaviour in robot soccer. In Proceedings of the Australasian Conference on Robotics and Automation.
[15] Bard, N., & Bowling, M. (2007). Particle filtering for dynamic agent modelling in simplified poker. In Proceedings of the National Conference on Artificial Intelligence, Vol. 22, p. 515.
[16] Beetz, M., Hoyningen-Huene, N. v., Bandouch, J., Kirchlechner, B., Gedikli, S., & Maldonado, A. (2006). Camera-based observation of football games for analyzing multi-agent activities. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 42-49.
[17] Beetz, M., Kirchlechner, B., & Lames, M. (2005). Computerized real-time analysis of football games. IEEE Pervasive Computing, 4(3), 33-39.
[18] Belghazi, M. I., Baratin, A., Rajeswar, S., Ozair, S., Bengio, Y., Courville, A., & Hjelm, R. D. (2018). MINE: Mutual information neural estimation. arXiv preprint arXiv:1801.04062.
[19] Bengio, Y., & Frasconi, P. (1995). An input output HMM architecture. In Advances in Neural Information Processing Systems, pp. 427-434.
[20] Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research (MOR), 27(4), 819-840. · Zbl 1082.90593
[21] Bhandari, I., Colet, E., Parker, J., Pines, Z., Pratap, R., & Ramanujam, K. (1997). Advanced Scout: Data mining and knowledge discovery in NBA data. Data Mining and Knowledge Discovery, 1(1), 121-125.
[22] Biermann, A. W., & Feldman, J. A. (1972). On the synthesis of finite-state machines from samples of their behavior. IEEE Transactions on Computers, 100(6), 592-597. · Zbl 0243.94039
[23] Biswas, J., Mendoza, J. P., Zhu, D., Choi, B., Klee, S., & Veloso, M. (2014). Opponent-driven planning and execution for pass, attack, and defense in a multi-robot soccer team. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pp. 493-500.
[24] Blaylock, N., & Allen, J. (2006). Fast hierarchical goal schema recognition. In Proceedings of the National Conference on Artificial Intelligence, Vol. 21, p. 796.
[25] Bombini, G., Di Mauro, N., Ferilli, S., & Esposito, F. (2010). Classifying agent behaviour through relational sequential patterns. In KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications, pp. 273-282. Springer.
[26] Bowling, M., & Veloso, M. (2001). Rational and convergent learning in stochastic games. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Vol. 17, pp. 1021-1026.
[27] Brockbank, E., & Vul, E. (2021). Formalizing opponent modeling with the rock, paper, scissors game. Games, 12(3), 70. · Zbl 1484.91006
[28] Brown, N., Bakhtin, A., Lerer, A., & Gong, Q. (2020). Combining deep reinforcement learning and search for imperfect-information games. Proceedings of the Thirty-Fourth Conference on Neural Information Processing Systems.
[29] Brown, N., & Sandholm, T. (2019). Solving imperfect-information games via discounted regret minimization. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1829-1836.
[30] Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., & Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1-43.
[31] Browning, B., Bruce, J., Bowling, M., & Veloso, M. (2005). STP: Skills, Tactics, and Plays for multi-robot control in adversarial environments. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 219(1), 33-52.
[32] Bui, H. H., Venkatesh, S., & West, G. (2002). Policy recognition in the abstract hidden Markov model. Journal of Artificial Intelligence Research, 17, 451-499. · Zbl 1053.68101
[33] Butler, S., & Demiris, Y. (2009). Predicting the movements of robot teams using generative models. In Distributed Autonomous Robotic Systems, Vol. 8, pp. 533-542. Springer.
[34] Carmel, D., & Markovitch, S. (1993). Learning models of opponent’s strategy in game playing. In Proceedings of the 1993 AAAI Fall Symposium on Games: Planning and Learning.
[35] Carmel, D., & Markovitch, S. (1996a). Incorporating opponent models into adversary search. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Vol. 1, pp. 120-125.
[36] Carmel, D., & Markovitch, S. (1996b). Learning models of intelligent agents. In Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, Vol. 1, pp. 62-67.
[37] Carmel, D., & Markovitch, S. (1996c). Opponent modeling in multi-agent systems. Adaption and Learning in Multi-agent Systems, 40-52.
[38] Ceren, R., He, K., Doshi, P., & Banerjee, B. (2021). PALO bounds for reinforcement learning in partially observable stochastic games. Neurocomputing, 420, 36-56.
[39] Chakraborty, D., Agmon, N., & Stone, P. (2013). Targeted opponent modeling of memory-bounded agents. In Proceedings of the Adaptive Learning Agents Workshop (ALA).
[40] Chen, H., Wang, C., Huang, J., & Gong, J. (2020). Efficient use of heuristics for accelerating XCS-based policy learning in Markov games. arXiv preprint arXiv:2005.12553.
[41] Chen, S., & Arkin, R. C. (2021). Counter-misdirection in behavior-based multi-robot teams. In IEEE International Conference on Intelligence and Safety for Robotics (ISR).
[42] Cliff, O. M., Lizier, J. T., Wang, X. R., Wang, P., Obst, O., & Prokopenko, M. (2013). Towards quantifying interaction networks in a football match. In Robot Soccer World Cup, pp. 1-12. Springer.
[43] Cliff, O. M., Lizier, J. T., Wang, X. R., Wang, P., Obst, O., & Prokopenko, M. (2017). Quantifying long-range interactions and coherent structure in multi-agent dynamics. Artificial Life, 23(1), 34-57.
[44] Da Silva, F. L., & Costa, A. H. R. (2019). A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research, 64, 645-703. · Zbl 1489.68221
[45] Davis, T., Waugh, K., & Bowling, M. (2019). Solving large extensive-form games with strategy constraints. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1861-1868.
[46] De Weerd, H., Verbrugge, R., & Verheij, B. (2013). How much does it help to know what she knows you know? An agent-based simulation study. Artificial Intelligence, 199, 67-92. · Zbl 1284.68567
[47] Devaney, M., & Ram, A. (1998). Needles in a haystack: Plan recognition in large spatial domains involving multiple agents. In Proceedings of the Fifteenth National Conference on Artificial Intelligence and the Tenth Innovative Applications of Artificial Intelligence Conference, pp. 942-947.
[48] Ding, S., Zhu, H., Jia, W., & Su, C. (2012). A survey on feature extraction for pattern recognition. Artificial Intelligence Review, 37(3), 169-180.
[49] Donkers, H., Uiterwijk, J. W. H. M., & van den Herik, H. J. (2001). Probabilistic opponent-model search. Information Sciences, 135(3-4), 123-149. · Zbl 1002.68780
[50] Donkers, H. H. L. M. (2003). Nosce hostem: Searching with opponent models.
[51] Doshi, P., & Gmytrasiewicz, P. J. (2009). Monte Carlo sampling methods for approximating interactive POMDPs. Journal of Artificial Intelligence Research, 34, 297-337. · Zbl 1182.68236
[52] Doucet, A., De Freitas, N., Murphy, K., & Russell, S. (2000). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 176-183.
[53] Duda, R. O., Hart, P. E., & Stork, D. G. (2012). Pattern Classification. John Wiley & Sons. · Zbl 0968.68140
[54] Egorov, M., Kochenderfer, M. J., & Uudmae, J. J. (2016). Target surveillance in adversarial environments using POMDPs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 2473-2479.
[55] Emery-Montemerlo, R., Gordon, G., Schneider, J., & Thrun, S. (2004). Approximate solutions for partially observable stochastic games with common payoffs. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems.
[56] Erdogan, C., & Veloso, M. M. (2011). Action selection via learning behavior patterns in multi-robot domains. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence.
[57] Ernst, M. D., Perkins, J. H., Guo, P. J., McCamant, S., Pacheco, C., Tschantz, M. S., & Xiao, C. (2007). The Daikon system for dynamic detection of likely invariants. Science of Computer Programming, 69(1-3), 35-45. · Zbl 1161.68390
[58] Esposito, F., Di Mauro, N., Basile, T., & Ferilli, S. (2008). Multi-dimensional relational sequence mining. Fundamenta Informaticae, 89(1), 23-43. · Zbl 1155.68484
[59] Everett, R., & Roberts, S. (2018). Learning against non-stationary agents with opponent modelling and deep reinforcement learning. In 2018 AAAI Spring Symposium Series.
[60] Fagan, M., & Cunningham, P. (2003). Case-based plan recognition in computer games. In International Conference on Case-Based Reasoning, pp. 161-170. Springer. · Zbl 1045.68703
[61] Fagundes, M. S., Meneguzzi, F., Bordini, R. H., & Vieira, R. (2014). Dealing with ambiguity in plan recognition under time constraints. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pp. 389-396.
[62] Fard, A. M., Salmani, V., Naghibzadeh, M., Nejad, S. K., & Ahmadi, H. (2007). Game theory-based data mining technique for strategy making of a soccer simulation coach agent. In International Symposium on Intelligent Systems and Technologies Applications, Vol. 2007, pp. 54-65.
[63] Farina, G., Kroer, C., Brown, N., & Sandholm, T. (2019). Stable-predictive optimistic counterfactual regret minimization. In Proceedings of the International Conference on Machine Learning, pp. 1853-1862.
[64] Farouk, G. M., Moawad, I. F., & Aref, M. M. (2017). A machine learning based system for mostly automating opponent modeling in real-time strategy games. In Proceedings of the 12th International Conference on Computer Engineering and Systems (ICCES), pp. 337-346. IEEE.
[65] Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the Association for Computing Machinery, 24(6), 381-395.
[66] Floyd, M. W., Karneeb, J., & Aha, D. W. (2017). Case-based team recognition using learned opponent models. In International Conference on Case-Based Reasoning, pp. 123-138. Springer.
[67] Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In Proceedings of the Fifteenth International Conference on Machine Learning.
[68] Fukushima, T., Nakashima, T., & Akiyama, H. (2017). Online opponent formation identification based on position information. RoboCup 2017.
[69] Gallego, V., Naveiro, R., & Insua, D. R. (2019). Reinforcement learning under threats. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Vol. 33, pp. 9939-9940.
[70] Ganzfried, S., Nowak, A., & Pinales, J. (2018). Successful Nash equilibrium agent for a 3-player imperfect-information game. arXiv preprint arXiv:1804.04789. · Zbl 1401.91018
[71] Ganzfried, S., & Sandholm, T. (2011). Game theory-based opponent modeling in large imperfect-information games. In The 10th International Conference on Autonomous Agents and Multiagent Systems, Vol. 2, pp. 533-540.
[72] Ganzfried, S., & Sandholm, T. (2015). Safe opponent exploitation. ACM Transactions on Economics and Computation (TEAC), 3(2), 1-28.
[73] Gaurav, S., & Ziebart, B. D. (2019). Discriminatively learning inverse optimal control models for predicting human intentions. In International Conference on Autonomous Agents and Multiagent Systems.
[74] Geib, C. W., & Goldman, R. P. (2009). A probabilistic plan recognition algorithm based on plan tree grammars. Artificial Intelligence, 173(11), 1101-1132.
[75] Gold, E. M. (1978). Complexity of automaton identification from given data. Information and Control, 37(3), 302-320. · Zbl 0376.68041
[76] Gold, K. (2010). Training goal recognition online from low-level inputs in an action-adventure game. In Sixth Artificial Intelligence and Interactive Digital Entertainment Conference.
[77] Goldman, A. I., et al. (2012). Theory of mind. The Oxford Handbook of Philosophy of Cognitive Science, 1.
[78] Green, C. (2004). Phased searching with NEAT: Alternating between complexification and simplification. Unpublished manuscript.
[79] Grover, A., Al-Shedivat, M., Gupta, J. K., Burda, Y., & Edwards, H. (2018). Learning policy representations in multiagent systems. arXiv preprint arXiv:1806.06464.
[80] Hadfield-Menell, D., Russell, S. J., Abbeel, P., & Dragan, A. (2016). Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29, 3909-3917.
[81] Han, K., Veloso, M., et al. (2000). Automated robot behavior recognition. In Robotics Research International Symposium, Vol. 9, pp. 249-256.
[82] Hansen, E. A., Bernstein, D. S., & Zilberstein, S. (2004). Dynamic programming for partially observable stochastic games. In Proceedings of the Eighteenth AAAI Conference on Artificial Intelligence, Vol. 4, pp. 709-715.
[83] Hawasly, M., & Ramamoorthy, S. (2013). Lifelong transfer learning with an option hierarchy. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1341-1346. IEEE.
[84] Hayes, R., & Beling, P. (2018). Unsupervised hierarchical clustering of build orders in a real-time strategy game. The Computer Games Journal, 7(1), 5-26.
[85] He, H., Boyd-Graber, J., Kwok, K., & Daumé III, H. (2016). Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pp. 1804-1813.
[86] Hernandez-Leal, P., Kaisers, M., Baarslag, T., & de Cote, E. M. (2017a). A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183.
[87] Hernandez-Leal, P., Zhan, Y., Taylor, M. E., Sucar, L. E., & de Cote, E. M. (2017b). Efficiently detecting switches against non-stationary opponents. Autonomous Agents and Multi-Agent Systems, 31(4), 767-789.
[88] Heule, M. J., & Verwer, S. (2013). Software model synthesis using satisfiability solvers. Empirical Software Engineering, 18(4), 825-856.
[89] Hoang, T. N., & Low, K. H. (2013). Interactive POMDP lite: Towards practical planning to predict and exploit intentions for interacting with self-interested agents. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence.
[90] Hoang, T. N., Xiao, Y., Sivakumar, K., Amato, C., & How, J. (2017). Near-optimal adversarial policy switching for decentralized asynchronous multi-agent systems. arXiv preprint arXiv:1710.06525.
[91] Hoffmann, J., Porteous, J., & Sebastia, L. (2004). Ordered landmarks in planning. Journal of Artificial Intelligence Research, 22, 215-278. · Zbl 1080.68670
[92] Hong, J. (2001). Goal recognition through goal graph analysis. Journal of Artificial Intelligence Research, 15, 1-30. · Zbl 0970.68193
[93] Hong, Z.-W., Su, S.-Y., Shann, T.-Y., Chang, Y.-H., & Lee, C.-Y. (2017). A deep policy inference Q-network for multi-agent systems. arXiv preprint arXiv:1712.07893.
[94] Howe, A. E., & Cohen, P. R. (1995). Understanding planner behavior. Artificial Intelligence, 76(1-2), 125-166.
[95] Hsieh, J.-L., & Sun, C.-T. (2008). Building a player strategy model by analyzing replays of real-time strategy games. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks, pp. 3106-3111. IEEE.
[96] Huo, X., Ni, X. S., & Smith, A. K. (2007). A survey of manifold-based learning methods. Recent Advances in Data Mining of Enterprise Data, 691-745.
[97] Iglesias, J. A., Ledezma, A., & Sanchis, A. (2009). Caos Coach 2006 Simulation Team: An opponent modelling approach. Computing and Informatics.
[98] Intille, S. S., & Bobick, A. F. (1999). A framework for recognizing multi-agent action from visual evidence. In Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference, pp. 518-525.
[99] Intille, S. S., & Bobick, A. F. (2001). Recognizing planned, multiperson action. Computer Vision and Image Understanding, 81(3), 414-445. · Zbl 1011.68553
[100] Isaacs, R. (1955). A card game with bluffing. The American Mathematical Monthly, 62(2), 99-108. · Zbl 0064.13403
[101] Izenman, A. J. (2012). Introduction to manifold learning. Wiley Interdisciplinary Reviews: Computational Statistics, 4(5), 439-446.
[102] Johanson, M., & Bowling, M. (2009). Data biased robust counter strategies. In Artificial Intelligence and Statistics, pp. 264-271.
[103] Johanson, M., Bowling, M., & Zinkevich, M. (2008). Computing robust counter-strategies. In Advances in Neural Information Processing Systems.
[104] Kabanza, F., Bellefeuille, P., Bisson, F., Benaskeur, A. R., & Irandoust, H. (2010). Opponent behaviour recognition for real-time strategy games. Plan, Activity, and Intent Recognition, 10(5).
[105] Kaminka, G. A., Fidanboylu, M., Chang, A., & Veloso, M. M. (2002). Learning the sequential coordinated behavior of teams from observations. In Robot Soccer World Cup, pp. 111-125. Springer.
[106] Kamrani, F., Luotsinen, L. J., & Løvlid, R. A. (2016). Learning objective agent behavior using a data-driven modeling approach. In 2016 IEEE International Conference on Systems, Man, and Cybernetics, pp. 2175-2181. IEEE.
[107] Kar, D., Ford, B., Gholami, S., Fang, F., Plumptre, A., Tambe, M., Driciru, M., Wanyama, F., Rwetsiba, A., Nsubaga, M., et al. (2017). Cloudy with a chance of poaching: Adversary behavior modeling and forecasting with real-world poaching data. In Proceedings of the Sixteenth Conference on Autonomous Agents and MultiAgent Systems, pp. 159-167.
[108] Keren, S., Gal, A., & Karpas, E. (2015). Goal recognition design for non-optimal agents. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 3298-3304.
[109] Kim, M., & Kim, K. (2017). Opponent modeling based on action table for MCTS-based fighting game AI. In Proceedings of the IEEE Conference on Computational Intelligence and Games, pp. 178-180.
[110] Kitano, H., Asada, M., Kuniyoshi, Y., Noda, I., & Osawa, E. (1997). RoboCup: The robot world cup initiative. In Proceedings of the First International Conference on Autonomous Agents, AGENTS ’97, pp. 340-347, New York, NY, USA. ACM.
[111] Kovalchik, S., Ingram, M., Weeratunga, K., & Goncu, C. (2020). Space-time VON CRAMM: Evaluating decision-making in tennis with Variational generatiON of Complete Resolution Arcs via Mixture Modeling. arXiv preprint arXiv:2005.12853.
[112] Krka, I., Brun, Y., & Medvidovic, N. (2014). Automatic mining of specifications from invocation traces and method invariants. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 178-189. Association for Computing Machinery.
[113] Kroer, C., & Sandholm, T. (2020). Limited lookahead in imperfect-information games. Artificial Intelligence, 103218. · Zbl 1437.91067
[114] Kuhlmann, G., Knox, W. B., & Stone, P. (2006). Know thine enemy: A champion RoboCup coach agent. In Proceedings of the National Conference on Artificial Intelligence, Vol. 21, pp. 1463-1468.
[115] Lattner, A. D., Miene, A., Visser, U., & Herzog, O. (2005). Sequential pattern mining for situation and behavior prediction in simulated robotic soccer. In Robot Soccer World Cup, pp. 118-129. Springer.
[116] Laviers, K., & Sukthankar, G. (2011). A real-time opponent modeling system for Rush Football. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Vol. 22, pp. 2476-2481.
[117] Laviers, K., Sukthankar, G., Aha, D. W., Molineaux, M., Darken, C., et al. (2009). Improving offensive performance through opponent modeling. In Proceedings of the AAAI Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE).
[118] Ledezma, A., Aler, R., Sanchis, A., & Borrajo, D. (2009). OMBO: An opponent modeling approach. AI Communications, 22(1), 21-35. · Zbl 1200.68246
[119] Leece, M., & Jhala, A. (2014). Opponent state modeling in RTS games with limited information using Markov random fields. In 2014 IEEE Conference on Computational Intelligence and Games, pp. 1-7. IEEE.
[120] Li, R., & Chellappa, R. (2010). Group motion segmentation using a spatio-temporal driving force model. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2038-2045. IEEE.
[121] Li, R., Chellappa, R., & Zhou, S. K. (2009). Learning multi-modal densities on discriminative temporal interaction manifold for group activity recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2450-2457. IEEE.
[122] Li, W., Wang, X., Jin, B., Sheng, J., & Zha, H. (2021). Dealing with non-stationarity in multi-agent reinforcement learning via trust region decomposition. arXiv preprint arXiv:2102.10616.
[123] Lin, X., Beling, P. A., & Cogill, R. (2017). Multiagent inverse reinforcement learning for two-person zero-sum games. IEEE Transactions on Games, 10(1), 56-68.
[124] Liu, A., Chen, J., Yu, M., Zhai, Y., Zhou, X., & Liu, J. (2018). Watch the unobserved: A simple approach to parallelizing Monte Carlo tree search. arXiv preprint arXiv:1810.11755.
[125] Lockett, A. J., Chen, C. L., & Miikkulainen, R. (2007). Evolving explicit opponent models in game playing. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 2106-2113. Association for Computing Machinery.
[126] Lu, F., Yamamoto, K., Nomura, L. H., Mizuno, S., Lee, Y., & Thawonmas, R. (2013). Fighting game artificial intelligence competition platform. In Proceedings of the 2nd IEEE Global Conference on Consumer Electronics, pp. 320-323. IEEE.
[127] Lucey, P., Oliver, D., Carr, P., Roth, J., & Matthews, I. (2013). Assessing team strategy using spatiotemporal data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1366-1374.
[128] Marín, C. A., Castillo, L. P., & Garrido, L. (2005). Dynamic adaptive opponent modeling: Predicting opponent motion while playing soccer. In Fifth European Workshop on Adaptive Agents and Multiagent Systems, Paris, France.
[129] Markovitch, S., & Reger, R. (2005). Learning and exploiting relative weaknesses of opponent agents. Autonomous Agents and Multi-Agent Systems, 10(2), 103-130.
[130] Martin, B. (1995). Instance-based learning: Nearest neighbour with generalisation. University of Waikato, Department of Computer Science.
[131] Masters, P., & Sardina, S. (2017). Cost-based goal recognition for path-planning. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 750-758.
[132] Masters, P., & Sardina, S. (2021). Expecting the unexpected: Goal recognition for rational and irrational agents. Artificial Intelligence, 103490. · Zbl 1519.68235
[133] McCracken, P., & Bowling, M. (2004). Safe strategies for agent modelling in games. In Proceedings of the AAAI Fall Symposium on Artificial Multi-agent Learning, pp. 103-110.
[134] Meek, C., & Glymour, C. (1994). Conditioning and intervening. The British Journal for the Philosophy of Science, 45(4), 1001-1021. · Zbl 0813.62003
[135] Mescheder, D., Tuyls, K., & Kaisers, M. (2011). Opponent modeling with POMDPs. In Proceedings of the 23rd Belgium-Netherlands Conference on Artificial Intelligence (BNAIC 2011), pp. 152-159.
[136] Molineaux, M., Aha, D. W., & Sukthankar, G. (2009). Beating the defense: Using plan recognition to inform learning agents. Tech. rep., DTIC Document.
[137] Mordatch, I., & Abbeel, P. (2018). Emergence of grounded compositional language in multiagent populations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.
[138] Müllner, D. (2011). Modern hierarchical agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378.
[139] Nair, R., Tambe, M., Marsella, S., & Raines, T. (2004). Automated assistants for analyzing team behaviors. Autonomous Agents and Multi-Agent Systems, 8(1), 69-111.
[140] Ng, B., Boakye, K., Meyers, C., & Wang, A. (2012). Bayes-adaptive interactive POMDPs. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.
[141] Ng, B., Meyers, C., Boakye, K., & Nitao, J. (2010). Towards applying interactive POMDPs to real-world adversary modeling. In The Twenty-Second Annual Conference on Innovative Applications of Artificial Intelligence (IAAI).
[142] Nguyen, T. H., Yang, R., Azaria, A., Kraus, S., & Tambe, M. (2013). Analyzing the effectiveness of adversary modeling in security games. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence.
[143] Ontañón, S., Synnaeve, G., Uriarte, A., Richoux, F., Churchill, D., & Preuss, M. (2013). A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, 5(4), 293-311.
[144] Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345-1359.
[145] Panella, A., & Gmytrasiewicz, P. (2017). Interactive POMDPs with finite-state models of other agents. Autonomous Agents and Multi-Agent Systems, 31(4), 861-904.
[146] Papoudakis, G., & Albrecht, S. V. (2020). Variational autoencoders for opponent modeling in multi-agent systems. arXiv preprint arXiv:2001.10829.
[147] Papoudakis, G., Christianos, F., Rahman, A., & Albrecht, S. V. (2019). Dealing with non-stationarity in multi-agent deep reinforcement learning. arXiv preprint arXiv:1906.04737.
[148] Pereira, R. F., Oren, N., & Meneguzzi, F. (2017). Landmark-based heuristics for goal recognition. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. · Zbl 1478.68331
[149] Perše, M., Kristan, M., Kovačič, S., Vučković, G., & Perš, J. (2009). A trajectory-based analysis of coordinated team activity in a basketball game. Computer Vision and Image Understanding, 113(5), 612-621.
[150] Pita, J., Jain, M., Marecki, J., Ordóñez, F., Portway, C., Tambe, M., Western, C., Paruchuri, P., & Kraus, S. (2008). Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport. In Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems: Industrial Track, pp. 125-132.
[151] Pitt, L., & Warmuth, M. K. (1993). The minimum consistent DFA problem cannot be approximated within any polynomial. Journal of the ACM (JACM), 40(1), 95-142. · Zbl 0774.68084
[152] Pless, R., & Souvenir, R. (2009). A survey of manifold learning for images. IPSJ Transactions on Computer Vision and Applications, 1, 83-94.
[153] Ponsen, M., De Jong, S., & Lanctot, M. (2011). Computing approximate Nash equilibria and robust best-responses using sampling. Journal of Artificial Intelligence Research, 42, 575-605. · Zbl 1235.91037
[154] Pourmehr, S., & Dadkhah, C. (2011). An overview on opponent modeling in RoboCup soccer simulation 2D. In Robot Soccer World Cup, pp. 402-414. Springer.
[155] Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Vol. 5, pp. 817-822.
[156] Pozanco, A., Yolanda, E., Fernández, S., & Borrajo, D. (2018). Counterplanning using goal recognition and landmarks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp. 4808-4814.
[157] Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4), 515-526.
[158] Quinlan, J. R. (1993). Combining instance-based and model-based learning. In Proceedings of the Tenth International Conference on Machine Learning, pp. 236-243.
[159] Rahman, M., & Oh, J. C. (2018). Online learning for patrolling robots against active adversarial attackers. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 477-488. Springer.
[160] Rakoczy, H., Warneken, F., & Tomasello, M. (2008). The sources of normativity: Young children’s awareness of the normative structure of games. Developmental Psychology, 44(3), 875.
[161] Ramírez, M., & Geffner, H. (2009). Plan recognition as planning. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, pp. 1778-1783. Morgan Kaufmann Publishers Inc.
[162] Ramírez, M., & Geffner, H. (2010). Probabilistic plan recognition using off-the-shelf classical planners. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pp. 1121-1126.
[163] Ramírez, M., & Geffner, H. (2011). Goal recognition over POMDPs: Inferring the intention of a POMDP agent. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pp. 2009-2014.
[164] Rathnasabapathy, B., Doshi, P., & Gmytrasiewicz, P. (2006). Exact solutions of interactive POMDPs using behavioral equivalence. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 1025-1032.
[165] Riley, P., & Veloso, M. (2000). On behavior classification in adversarial environments. In Distributed Autonomous Robotic Systems, Vol. 4, pp. 371-380. Springer.
[166] Riley, P., & Veloso, M. (2001). Recognizing probabilistic opponent movement models. In Robot Soccer World Cup, pp. 453-458. Springer. · Zbl 1050.68878
[167] Rosman, B., Hawasly, M., & Ramamoorthy, S. (2016). Bayesian policy reuse. Machine Learning, 104(1), 99-127. · Zbl 1454.68129
[168] Rovatsos, M., Weiß, G., & Wolf, M. (2003). Multiagent learning for open systems: A study in opponent classification. In Adaptive Agents and Multi-agent Systems, pp. 66-87. Springer. · Zbl 1032.68705
[169] Rusu, A. A., Colmenarejo, S. G., Gulcehre, C., Desjardins, G., Kirkpatrick, J., Pascanu, R., Mnih, V., Kavukcuoglu, K., & Hadsell, R. (2015). Policy distillation. arXiv preprint arXiv:1511.06295.
[170] Sadilek, A., & Kautz, H. A. (2010). Recognizing multi-agent activities from GPS data. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Vol. 39, p. 109. · Zbl 1237.68161
[171] Sandholm, T. (2007). Perspectives on multiagent learning. Artificial Intelligence, 171(7), 382-391. · Zbl 1168.68492
[172] Saria, S., & Mahadevan, S. (2004). Probabilistic plan recognition in multiagent systems. In International Conference on Automated Planning and Scheduling, pp. 287-296.
[173] Schadd, F., Bakkes, S., & Spronck, P. (2007). Opponent modeling in real-time strategy games. In GAMEON, pp. 61-70.
[174] Seuken, S., & Zilberstein, S. (2008). Formal models and algorithms for decentralized decision making under uncertainty. Autonomous Agents and Multi-Agent Systems (JAAMAS), 17(2), 190-250.
[175] Shao, K., Zhu, Y., & Zhao, D. (2018). StarCraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Transactions on Emerging Topics in Computational Intelligence, 3(1), 73-84.
[176] Sheikhpour, R., Sarram, M. A., Gharaghani, S., & Chahooki, M. A. Z. (2017). A survey on semi-supervised feature selection methods. Pattern Recognition, 64, 141-158. · Zbl 1429.68239
[177] Shen, M., & How, J. P. (2019a). Active perception in adversarial scenarios using maximum entropy deep reinforcement learning. arXiv preprint arXiv:1902.05644.
[178] Shen, M., & How, J. P. (2019b). Robust opponent modeling via adversarial ensemble reinforcement learning in asymmetric imperfect-information games. arXiv preprint arXiv:1909.08735.
[179] Shieh, E., An, B., Yang, R., Tambe, M., Baldwin, C., DiRenzo, J., Maule, B., & Meyer, G. (2012). PROTECT: A deployed game theoretic system to protect the ports of the United States. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, Vol. 1, pp. 13-20.
[180] Siddiquie, B., Yacoob, Y., & Davis, L. (2009). Recognizing plays in American Football videos. University of Maryland.
[181] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
[182] Sonu, E., & Doshi, P. (2015). Scalable solutions of interactive POMDPs using generalized and bounded policy iteration. Autonomous Agents and Multi-Agent Systems, 29(3), 455-494.
[183] Spronck, P., & den Teuling, F. (2010). Player modeling in Civilization IV. In Sixth Artificial Intelligence and Interactive Digital Entertainment Conference.
[184] Stankiewicz, J. A., & Schadd, M. P. (2009). Opponent modeling in Stratego. Natural Computing.
[185] Stanley, K. O., & Miikkulainen, R. (2002). Continual coevolution through complexification. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, pp. 113-120.
[186] Steffens, T. (2002). Feature-based declarative opponent-modelling in multi-agent systems. Robot Soccer World Cup.
[187] Steffens, T. (2005). Similarity-based opponent modelling using imperfect domain theories. In IEEE Conference on Computational Intelligence in Games (CIG).
[188] Stone, P., Riley, P., & Veloso, M. (2000). Defining and using ideal teammate and opponent agent models: A case study in robotic soccer. In Fourth International Conference on MultiAgent Systems, pp. 441-442. IEEE.
[189] Stone, P., & Veloso, M. (2000). Layered learning. In European Conference on Machine Learning, pp. 369-381. Springer.
[190] Storcheus, D., Rostamizadeh, A., & Kumar, S. (2015). A survey of modern questions and challenges in feature extraction. In Feature Extraction: Modern Questions and Challenges, pp. 1-18.
[191] Sturtevant, N. (2004). Current challenges in multi-player game search. In International Conference on Computers and Games, pp. 285-300. Springer.
[192] Sturtevant, N., & Bowling, M. (2006). Robust game play against unknown opponents. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 713-719.
[193] Sturtevant, N., Zinkevich, M., & Bowling, M. (2006). Prob-maxn: Playing N-player games with opponent models. In Proceedings of the Twentieth AAAI Conference on Artificial Intelligence, Vol. 6, pp. 1057-1063.
[194] Sukthankar, G., Geib, C., Bui, H. H., Pynadath, D., & Goldman, R. P. (2014). Plan, Activity, and Intent Recognition: Theory and Practice. Newnes.
[195] Sukthankar, G., & Sycara, K. (2006). Robust recognition of physical team behaviors using spatio-temporal models. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 638-645.
[196] Sukthankar, G., & Sycara, K. (2007). Policy recognition for multi-player tactical scenarios. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems, p. 16.
[197] Synnaeve, G., & Bessiere, P. (2011). A Bayesian model for opening prediction in RTS games with application to StarCraft. In IEEE Conference on Computational Intelligence and Games (CIG), pp. 281-288. IEEE.
[198] Synnaeve, G., Lin, Z., Gehring, J., Gant, D., Mella, V., Khalidov, V., Carion, N., & Usunier, N. (2018). Forward modeling for partial observation strategy games: A StarCraft defogger. In Advances in Neural Information Processing Systems, pp. 10738-10748.
[199] Takács, B., Butler, S., & Demiris, Y. (2007). Multi-agent behaviour segmentation via spectral clustering. In Proceedings of the PAIR Workshop at the Twenty-First AAAI Conference on Artificial Intelligence, pp. 74-81.
[200] Tang, Z., Yu, C., Chen, B., Xu, H., Wang, X., Fang, F., Du, S., Wang, Y., & Wu, Y. (2021). Discovering diverse multi-agent strategic behavior via reward randomization. arXiv preprint arXiv:2103.04564.
[201] Tang, Z., Zhu, Y., Zhao, D., & Lucas, S. M. (2020). Enhanced rolling horizon evolution algorithm with opponent model learning: Results for the fighting game AI competition. arXiv preprint arXiv:2003.13949.
[202] Tian, R., Sun, L., & Tomizuka, M. (2021a). Bounded risk-sensitive Markov games: Forward policy design and inverse reward learning with iterative reasoning and cumulative prospect theory. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence.
[203] Tian, R., Tomizuka, M., & Sun, L. (2021b). Learning human rewards by inferring their latent intelligence levels in multi-agent games: A theory-of-mind approach with application to driving data. arXiv preprint arXiv:2103.04289.
[204] Torkaman, A., & Safabakhsh, R. (2019). Robust opponent modeling in real-time strategy games using Bayesian networks. Journal of AI and Data Mining, 7(1), 149-159.
[205] Trevizan, F. W., & Veloso, M. M. (2010). Learning opponent’s strategies in the RoboCup Small Size League. In Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems, Vol. 10.
[206] Tucker, A., Gleave, A., & Russell, S. (2018). Inverse reinforcement learning for video games. arXiv preprint arXiv:1810.10593.
[207] Vail, D. L., & Veloso, M. M. (2008). Feature selection for activity recognition in multi-robot domains. In Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, Vol. 8, pp. 1415-1420.
[208] Vail, D. L., Veloso, M. M., & Lafferty, J. D. (2007). Conditional random fields for activity recognition. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems, p. 235.
[209] Vered, M., & Kaminka, G. A. (2017). Heuristic online goal recognition in continuous domains. arXiv preprint arXiv:1709.09839.
[210] Visser, U., & Weland, H.-G. (2002). Using online learning to analyze the opponent’s behavior. In Robot Soccer World Cup, pp. 78-93. Springer.
[211] Von Neumann, J., & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press. · Zbl 0063.05930
[212] Wang, J., Zhu, T., Li, H., Hsueh, C.-H., & Wu, I.-C. (2017). Belief-state Monte Carlo tree search for Phantom Go. IEEE Transactions on Games, 10(2), 139-154.
[213] Wang, M., Wang, Z., Talbot, J., Gerdes, J. C., & Schwager, M. (2021). Game-theoretic planning for self-driving cars in multivehicle competitive scenarios. IEEE Transactions on Robotics.
[214] Wang, Z., Boularias, A., Mülling, K., & Peters, J. (2011). Balancing safety and exploitability in opponent modeling. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence.
[215] Weber, B. G., & Mateas, M. (2009). A data mining approach to strategy prediction. In 2009 IEEE Symposium on Computational Intelligence and Games, pp. 140-147. IEEE.
[216] Wei, X., Lucey, P., Morgan, S., & Sridharan, S. (2013). Sweet-spot: Using spatiotemporal data to discover and predict shots in tennis. In MIT Sloan Sports Analytics Conference.
[217] Weiss, K., Khoshgoftaar, T. M., & Wang, D. (2016). A survey of transfer learning. Journal of Big Data, 3(1), 9.
[218] Wendler, J., & Bach, J. (2003). Recognizing and predicting agent behavior with case based reasoning. In Robot Soccer World Cup, pp. 729-738. Springer.
[219] Willmott, S., Richardson, J., Bundy, A., & Levine, J. (2001). Applying adversarial planning techniques to Go. Theoretical Computer Science, 252(1-2), 45-82. · Zbl 0962.91011
[220] Wu, Z., Li, K., Zhao, E., Xu, H., Zhang, M., Fu, H., An, B., & Xing, J. (2021). L2E: Learning to exploit your opponent. arXiv preprint arXiv:2102.09381.
[221] Wunder, M., Kaisers, M., Yaros, J. R., & Littman, M. (2011). Using iterated reasoning to predict opponent strategies. In The 10th International Conference on Autonomous Agents and Multiagent Systems, Vol. 2, pp. 593-600.
[222] Xu, Z., & Julius, A. A. (2018). Census signal temporal logic inference for multiagent group behavior analysis. IEEE Transactions on Automation Science and Engineering, 15(1), 264-277.
[223] Yang, R., Ford, B., Tambe, M., & Lemieux, A. (2014). Adaptive resource allocation for wildlife protection against illegal poachers. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pp. 453-460.
[224] Yang, T., Hao, J., Meng, Z., Zhang, C., Zheng, Y., & Zheng, Z. (2019). Towards efficient detection and optimal response against sophisticated opponents. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 623-629.
[225] Yasui, K., Kobayashi, K., Murakami, K., & Naruse, T. (2013). Analyzing and learning an opponent’s strategies in the RoboCup Small Size League. In Robot Soccer World Cup, pp. 159-170. Springer.
[226] Yin, Q., Yue, S., Zha, Y., & Jiao, P. (2016). A semi-Markov decision model for recognizing the destination of a maneuvering agent in real time strategy games. Mathematical Problems in Engineering, 2016. · Zbl 1400.90296
[227] Yolanda, E., R-Moreno, M. D., Smith, D. E., et al. (2015). A fast goal recognition technique based on interaction estimates. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence.
[228] Yu, X., Jiang, J., Jiang, H., & Lu, Z. (2021). Model-based opponent modeling. arXiv preprint arXiv:2108.01843.
[229] Zeng, Y., & Doshi, P. (2012). Exploiting model equivalences for solving interactive dynamic influence diagrams. Journal of Artificial Intelligence Research, 43, 211-255. · Zbl 1237.68199
[230] Zhang, Y., Rădulescu, R., Mannion, P., Roijers, D. M., & Nowé, A. (2020). Opponent modelling using policy reconstruction for multi-objective normal form games. In Proceedings of the Adaptive and Learning Agents Workshop (ALA-20) at AAMAS (under review).
[231] Zheng, Y., Meng, Z., Hao, J., Zhang, Z., Yang, T., & Fan, C. (2018). A deep Bayesian policy reuse approach against non-stationary agents. In Advances in Neural Information Processing Systems, pp. 954-964.
[232] Zhifei, S., & Joo, E. M. (2012). A survey of inverse reinforcement learning techniques. International Journal of Intelligent Computing and Cybernetics.
[233] Zhu, X. J. (2005). Semi-supervised learning literature survey. Tech. rep., University of Wisconsin-Madison Department of Computer Sciences.
[234] Zhuo, H. H., Yang, Q., & Kambhampati, S. (2012). Action-model based multi-agent plan recognition. In Advances in Neural Information Processing Systems, pp. 368-376.
[235] Zhuo, H. H., & Li, L. (2011). Multi-agent plan recognition with partial team traces and plan libraries. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, the data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.