
Model checking for safe navigation among humans. (English) Zbl 1514.68136

McIver, Annabelle (ed.) et al., Quantitative evaluation of systems. 15th international conference, QEST 2018, Beijing, China, September 4–7, 2018. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 11024, 207-222 (2018).
Summary: We investigate the use of probabilistic model checking to synthesise optimal strategies for autonomous systems that operate among uncontrollable agents such as humans. To formally assess such uncontrollable behaviour, we use models obtained from reinforcement learning. These behaviour models are based, for example, on data collected in experiments in which humans execute dynamic tasks in a virtual environment. We first describe a method to translate such behaviour models into Markov decision processes (MDPs). The composition of these MDPs with models for (controllable) autonomous systems gives rise to stochastic games (SGs). MDPs and SGs are amenable to probabilistic model checking, which enables the synthesis of strategies that provably adhere to formal specifications such as probabilistic temporal logic constraints. Experiments with a prototype provide (1) systematic insights into the credibility and the characteristics of behavioural models and (2) methods for automated synthesis of strategies satisfying guarantees on their required characteristics in the presence of humans.
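The core pipeline described in the summary, translating an (uncontrollable) behaviour model into an MDP and synthesising a strategy that maximises the probability of satisfying a reachability objective, can be sketched with plain value iteration. The states, actions, and transition probabilities below are invented for illustration (a robot choosing "wait" or "go" at a crossing shared with a human) and are not taken from the paper:

```python
# Toy MDP as {state: {action: [(prob, next_state), ...]}}.
# Transition probabilities stand in for a learned human behaviour model.
mdp = {
    "start": {
        "go":   [(0.8, "goal"), (0.2, "collision")],
        "wait": [(1.0, "clear")],
    },
    "clear": {
        "go":   [(0.95, "goal"), (0.05, "collision")],
        "wait": [(1.0, "clear")],
    },
    "goal": {},       # absorbing target state
    "collision": {},  # absorbing failure state
}

def max_reach_prob(mdp, target, iters=1000):
    """Value iteration for Pmax(F target): the maximal probability,
    over all strategies, of eventually reaching the target state."""
    v = {s: (1.0 if s == target else 0.0) for s in mdp}
    for _ in range(iters):
        for s, actions in mdp.items():
            if s == target or not actions:
                continue  # fixed value for target / absorbing states
            v[s] = max(sum(p * v[t] for p, t in succ)
                       for succ in actions.values())
    # The optimal strategy picks, per state, the action achieving the max.
    strategy = {
        s: max(actions, key=lambda a: sum(p * v[t] for p, t in actions[a]))
        for s, actions in mdp.items() if actions and s != target
    }
    return v, strategy

values, strategy = max_reach_prob(mdp, "goal")
print(values["start"], strategy["start"])  # waiting first is safer here
```

In this toy instance, waiting for the crossing to clear (reaching the goal with probability 0.95) beats going immediately (0.8), so the synthesised strategy chooses "wait" in the initial state. Tools such as PRISM and Storm, cited in the paper, perform this kind of computation at scale and against full probabilistic temporal logic specifications rather than plain reachability.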
For the entire collection see [Zbl 1398.68036].

MSC:

68Q60 Specification and verification (program logics, model checking, etc.)
68T05 Learning and adaptive systems in artificial intelligence
68T42 Agent technology and artificial intelligence
90C40 Markov and semi-Markov decision processes
93A16 Multi-agent systems
93C85 Automated systems (robots, etc.) in control theory
Full Text: DOI

References:

[1] Brafman, R.I., Tennenholtz, M.: On partially controlled multi-agent systems. J. Artif. Intell. Res. 4, 477-507 (1996) · Zbl 0900.68160 · doi:10.1613/jair.318
[2] Dresner, K., Stone, P.: A multiagent approach to autonomous intersection management. J. Artif. Intell. Res. 31, 591-656 (2008) · doi:10.1613/jair.2502
[3] Wellman, M.P., Wurman, P.R., O’Malley, K., Bangera, R., Reeves, D., Walsh, W.E.: Designing the market game for a trading agent competition. IEEE Internet Comput. 5(2), 43-51 (2001) · doi:10.1109/4236.914647
[4] Khandelwal, P., et al.: BWIBots: a platform for bridging the gap between AI and human-robot interaction research. Int. J. Robot. Res. 36, 635-659 (2017) · doi:10.1177/0278364916688949
[5] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994) · Zbl 0829.90134 · doi:10.1002/9780470316887
[6] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
[7] Kwiatkowska, M.Z.: Model checking for probability and time: from theory to practice. In: LICS, p. 351. IEEE Computer Society (2003)
[8] Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585-591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47 · doi:10.1007/978-3-642-22110-1_47
[9] Dehnert, C., Junges, S., Katoen, J.-P., Volk, M.: A STORM is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 592-600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31
[10] Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Aspects Comput. 6(5), 512-535 (1994) · Zbl 0820.68113 · doi:10.1007/BF01211866
[11] Condon, A.: The complexity of stochastic games. Inf. Comput. 96(2), 203-224 (1992) · Zbl 0756.90103 · doi:10.1016/0890-5401(92)90048-K
[12] Kwiatkowska, M., Parker, D., Wiltsche, C.: PRISM-Games 2.0: a tool for multi-objective strategy synthesis for stochastic games. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 560-566. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_35 · doi:10.1007/978-3-662-49674-9_35
[13] Dean, T.L., Givan, R.: Model minimization in Markov decision processes. In: AAAI/IAAI, pp. 106-111. AAAI Press/The MIT Press (1997)
[14] Tong, M.H., Zohar, O., Hayhoe, M.M.: Control of gaze while walking: task structure, reward, and uncertainty. J. Vis. 17(1), 28 (2017) · doi:10.1167/17.1.28
[15] Rothkopf, C.A., Ballard, D.H.: Modular inverse reinforcement learning for visuomotor behaviour. Biol. Cybern. 107(4), 477-490 (2013) · Zbl 1294.68137 · doi:10.1007/s00422-013-0562-6
[16] Sprague, N., Ballard, D.: Multiple-goal reinforcement learning with modular Sarsa(0). In: IJCAI, pp. 1445-1447 (2003)
[17] Ballard, D.H., Kit, D., Rothkopf, C.A., Sullivan, B.: A hierarchical modular architecture for embodied cognition. Multisens. Res. 26(1-2), 177-204 (2013) · doi:10.1163/22134808-00002414
[18] Leong, Y.C., Radulescu, A., Daniel, R., DeWoskin, V., Niv, Y.: Dynamic interaction between reinforcement learning and attention in multidimensional environments. Neuron 93(2), 451-463 (2017) · doi:10.1016/j.neuron.2016.12.040
[19] Konur, S., Dixon, C., Fisher, M.: Analysing robot swarm behaviour via probabilistic model checking. Robot. Auton. Syst. 60(2), 199-213 (2012) · doi:10.1016/j.robot.2011.10.005
[20] Johnson, B., Kress-Gazit, H.: Analyzing and revising synthesized controllers for robots with sensing and actuation errors. Int. J. Robot. Res. 34(6), 816-832 (2015) · doi:10.1177/0278364914562980
[21] Giaquinta, R., Hoffmann, R., Ireland, M., Miller, A., Norman, G.: Strategy synthesis for autonomous agents using PRISM. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol. 10811, pp. 220-236. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77935-5_16 · doi:10.1007/978-3-319-77935-5_16
[22] Chen, T., Kwiatkowska, M., Simaitis, A., Wiltsche, C.: Synthesis for multi-objective stochastic games: an application to autonomous urban driving. In: Joshi, K., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds.) QEST 2013. LNCS, vol. 8054, pp. 322-337. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40196-1_28 · doi:10.1007/978-3-642-40196-1_28
[23] Feng, L., Wiltsche, C., Humphrey, L., Topcu, U.: Synthesis of human-in-the-loop control protocols for autonomous systems. IEEE Trans. Autom. Sci. Eng. 13(2), 450-462 (2016) · doi:10.1109/TASE.2016.2530623
[24] Lacerda, B., Parker, D., Hawes, N.: Optimal policy generation for partially satisfiable co-safe LTL specifications. In: IJCAI, pp. 1587-1593. AAAI Press (2015)
[25] Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: ICML, pp. 157-163 (1994)
[26] Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artif. Intell. 136(2), 215-250 (2002) · Zbl 0995.68075 · doi:10.1016/S0004-3702(02)00121-2
[27] Bruni, R., Corradini, A., Gadducci, F., Lluch Lafuente, A., Vandin, A.: Modelling and analyzing adaptive self-assembly strategies with Maude. In: Durán, F. (ed.) WRLA 2012. LNCS, vol. 7571, pp. 118-138. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34005-5_7 · doi:10.1007/978-3-642-34005-5_7
[28] Katoen, J.P.: The probabilistic model checking landscape. In: LICS, pp. 31-45. ACM (2016) · Zbl 1401.68201
[29] García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16(1), 1437-1480 (2015) · Zbl 1351.68209
[30] Sculley, D., Phillips, T., Ebner, D., Chaudhary, V., Young, M.: Machine learning: the high-interest credit card of technical debt. In: SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop) (2014)
[31] Winterer, L., et al.: Motion planning under partial observability using game-based abstraction. In: CDC, pp. 2201-2208. IEEE (2017)
[32] Etessami, K., Kwiatkowska, M.Z., Vardi, M.Y., Yannakakis, M.: Multi-objective model checking of Markov decision processes. Log. Methods Comput. Sci. 4(4), 1-21 (2008) · Zbl 1161.68565
[33] Agha, G., Palmskog, K.: A survey of statistical model checking. ACM Trans. Model. Comput. Simul. 28(1), 6:1-6:39 (2018) · Zbl 1542.68091 · doi:10.1145/3158668
[34] Wachter, B., Zhang, L., Hermanns, H.: Probabilistic model checking modulo theories. In: QEST, pp. 129-140. IEEE CS (2007)
[35] Brázdil, T., et al.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98-114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8 · Zbl 1448.68290 · doi:10.1007/978-3-319-11936-6_8