×

Good-for-MDPs automata for probabilistic analysis and reinforcement learning. (English) Zbl 1507.68167

Biere, Armin (ed.) et al., Tools and algorithms for the construction and analysis of systems. 26th international conference, TACAS 2020, held as part of the European joint conferences on theory and practice of software, ETAPS 2020, Dublin, Ireland, April 25–30, 2020. Proceedings. Part I. Cham: Springer. Lect. Notes Comput. Sci. 12078, 306-323 (2020).
Summary: We characterize the class of nondeterministic \(\omega \)-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata ‘good-for-MDPs’ (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties – they are Büchi automata with low branching degree obtained through a simple construction – and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning.
For the entire collection see [Zbl 1496.68011].

MSC:

68Q45 Formal languages and automata
90C40 Markov and semi-Markov decision processes

References:

[1] T. Babiak, M. Křetínský, V. Rehák, and J. Strejcek. LTL to Büchi automata translation: Fast and more deterministic. In Tools and Algorithms for the Construction and Analysis of Systems, pages 95-109, 2012. · Zbl 1352.68142
[2] Ch. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, 2008. · Zbl 1179.68076
[3] C. Courcoubetis and M. Yannakakis. Verifying temporal properties of finite-state probabilistic programs. In Foundations of Computer Science, pages 338-345. IEEE, 1988.
[4] C. Courcoubetis and M. Yannakakis. The complexity of probabilistic verification. J. ACM, 42(4):857-907, July 1995. · Zbl 0885.68109
[5] L. de Alfaro. Formal Verification of Probabilistic Systems. PhD thesis, Stanford University, 1998.
[6] P. Dhariwal, Ch. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, Y. Wu, and P. Zhokhov. Openai baselines. https://github.com/openai/baselines, 2017.
[7] D. L. Dill, A. J. Hu, and H. Wong-Toi. Checking for language inclusion using simulation relations. In Computer Aided Verification, pages 255-265, July 1991. LNCS 575.
[8] A. Duret-Lutz, A. Lewkowicz, A. Fauchille, T. Michaud, E. Renault, and L. Xu. Spot 2.0 - A framework for LTL and \(\omega \)-automata manipulation. In Automated Technology for Verification and Analysis, pages 122-129, 2016.
[9] K. Etessami, T. Wilke, and R. A. Schuller. Fair simulation relations, parity games, and state space reduction for Büchi automata. SIAM J. Comput., 34(5):1159-1175, 2005. · Zbl 1079.68050
[10] S. Gurumurthy, R. Bloem, and F. Somenzi. Fair simulation minimization. In Computer Aided Verification (CAV’02), pages 610-623, July 2002. LNCS 2404. · Zbl 1010.68086
[11] E. M. Hahn, G. Li, S. Schewe, A. Turrini, and L. Zhang. Lazy probabilistic model checking without determinisation. In Concurrency Theory, pages 354-367, 2015. · Zbl 1374.68290
[12] E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak. Omega-regular objectives in model-free reinforcement learning. In Tools and Algorithms for the Construction and Analysis of Systems, pages 395-412, 2019. LNCS 11427. · Zbl 1527.68177
[13] E. M. Hahn, M. Perez, F. Somenzi, A. Trivedi, S. Schewe, and D. Wojtczak. Good-for-MDPs automata. arXiv e-prints, abs/1909.05081, September 2019.
[14] T. Henzinger, O. Kupferman, and S. Rajamani. Fair simulation. In Concurrency Theory, pages 273-287, 1997. LNCS 1243. · Zbl 1009.68071
[15] T. A. Henzinger and N. Piterman. Solving games without determinization. In Computer Science Logic, pages 394-409, September 2006. LNCS 4207. · Zbl 1225.68118
[16] D. Kini and M. Viswanathan. Optimal translation of LTL to limit deterministic automata. In Tools and Algorithms for the Construction and Analysis of Systems, pages 113-129, 2017. · Zbl 1452.68121
[17] J. Klein, D. Müller, Ch. Baier, and S. Klüppelholz. Are good-for-games automata good for probabilistic model checking? In Language and Automata Theory and Applications, pages 453-465. Springer, 2014. · Zbl 1408.68093
[18] J. Klein, D. Müller, Ch. Baier, and S. Klüppelholz. Are good-for-games automata good for probabilistic model checking? In Language and Automata Theory and Applications, pages 453-465, 2014. · Zbl 1408.68093
[19] J. Křetínský, T. Meggendorfer, S. Sickert, and Ch. Ziegler. Rabinizer 4: from LTL to your favourite deterministic automaton. In Computer Aided Verification, pages 567-577. Springer, 2018.
[20] J. Křetínský, T. Meggendorfer, and S. Sickert. Owl: A library for \(\omega \)-words, automata, and LTL. In Automated Technology for Verification and Analysis, pages 543-550, 2018. · Zbl 1517.68203
[21] R. Milner. An algebraic definition of simulation between programs. Int. Joint Conf. on Artificial Intelligence, pages 481-489, 1971.
[22] N. Piterman. From deterministic Büchi and Streett automata to deterministic parity automata. Logical Methods in Computer Science, 3(3):1-21, 2007. · Zbl 1125.68067
[23] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY, USA, 1994. · Zbl 0829.90134
[24] S. Safra. Complexity of Automata on Infinite Objects. PhD thesis, The Weizmann Institute of Science, March 1989.
[25] S. Schewe. Beyond hyper-minimisation—minimising DBAs and DPAs isNP-complete. In Foundations of Software Technology and Theoretical Computer Science, FSTTCS, pages 400-411, 2010. · Zbl 1245.68096
[26] S. Schewe and T. Varghese. Tight bounds for the determinisation and complementation of generalised Büchi automata. In Automated Technology for Verification and Analysis, pages 42-56, 2012. · Zbl 1374.68265
[27] S. Schewe and T. Varghese. Determinising parity automata. In Mathematical Foundations of Computer Science, pages 486-498, 2014. · Zbl 1425.68229
[28] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017.
[29] S. Sickert, J. Esparza, S. Jaax, and J. Křetínský. Limit-deterministic Büchi automata for linear temporal logic. In Computer Aided Verification, pages 312-332, 2016. LNCS 9780. · Zbl 1411.68054
[30] S. Sickert and J. Křetínský. MoChiBA: Probabilistic LTL model checking using limit-deterministic Büchi automata. In Automated Technology for Verification and Analysis, pages 130-137, 2016.
[31] F. Somenzi and R. Bloem. Efficient Büchi automata from LTL formulae. In Computer Aided Verification, pages 248-263, July 2000. LNCS 1855. · Zbl 0974.68086
[32] M.-H. Tsai, S. Fogarty, M. Y. Vardi, and Y.-K. Tsay. State of Büchi complementation. Logical Mehods in Computer Science, 10(4), 2014. · Zbl 1448.68278
[33] M.-H. Tsai, Y.-K. Tsay, and Y.-S. Hwang. GOAL for games, omega-automata, and logics. In Computer Aided Verification, pages 883-889, 2013.
[34] M. Y. Vardi. Automatic verification of probabilistic concurrent finite state programs. In Foundations of Computer Science, pages 327-338, 1985.
[35] E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak. Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning Figshare (2020), https://doi.org/10.6084/m9.figshare.11882739.
[36] A. Hartmanns and M. Seidl. tacas20ae.ova. Figshare (2019) https://doi.org/10.6084/m9.figshare.9699839.v2
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.