×

Fast matching of regular patterns with synchronizing counting. (English) Zbl 07770347

Kupferman, Orna (ed.) et al., Foundations of software science and computation structures. 26th international conference, FOSSACS 2023, held as part of the European joint conferences on theory and practice of software, ETAPS 2023, Paris, France, April 22–27, 2023. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 13992, 392-412 (2023).
Summary: Fast matching of regular expressions with bounded repetition, aka counting, such as \(\mathtt{(ab)\{50,100\}} \), i.e., matching linear in the length of the text and independent of the repetition bounds, has been an open problem for at least two decades. We show that, for a wide class of regular expressions with counting, which we call synchronizing, fast matching is possible. We empirically show that the class covers nearly all counting used in usual applications of regex matching. This complexity result is based on an improvement and analysis of a recent matching algorithm that compiles regexes to deterministic counting-set automata (automata with registers that hold sets of numbers).
For the entire collection see [Zbl 1524.68006].

MSC:

68Nxx Theory of software
68Qxx Theory of computing

Software:

GitHub; Snort

References:

[1] Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools (2nd Edition). Addison Wesley (August 2006), http://www.amazon.ca/exec/obidos/redirect?tag=citeulike09-20 &path=ASIN/0321486811 · Zbl 1429.68002
[2] Antimirov, V.: Partial derivatives of regular expressions and finite automaton constructions. Theoretical Computer Science 155(2), 291 - 319 (1996). doi:10.1016/0304-3975(95)00182-4, doi:10.1016/0304-3975(95)00182-4 · Zbl 0872.68120
[3] Baldwin, A.: Regular expression denial of service affecting express.js. https://medium.com/node-security/regular-expression-denial-of-service-affecting- express-js-9c397c164c43 (2016)
[4] Björklund, H., Martens, W., Timm, T.: Efficient incremental evaluation of succinct regular expressions. In: CIKM’15. ACM (2015). doi:10.1145/2806416.2806434
[5] Chapman, C., Stolee, K.T.: Exploring regular expression usage and context in python. In: Zeller, A., Roychoudhury, A. (eds.) Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, July 18-20, 2016. pp. 282-293. ACM (2016). doi:10.1145/2931037.2931073, doi:10.1145/2931037.2931073
[6] contributors, W.: Regular expression—wikipedia (2019), https://en.wikipedia.org/w/index.php?title=Regular_expression &
[7] Davis, J.C.: Rethinking regex engines to address ReDoS. In: ESEC/FSE’19. pp. 1256-1258. ACM (2019)
[8] Davis, J.C., Coghlan, C.A., Servant, F., Lee, D.: The impact of regular expression denial of service (redos) in practice: an empirical study at the ecosystem scale. In: Leavens, G.T., Garcia, A., Pasareanu, C.S. (eds.) Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018. pp. 246-256. ACM (2018). doi:10.1145/3236024.3236027, doi:10.1145/3236024.3236027
[9] Davis, J.C., Coghlan, C.A., Servant, F., Lee, D.: The impact of regular expression denial of service (ReDoS) in practice: An empirical study at the ecosystem scale. In: ESEC/FSE’18. pp. 246-256. ACM (2018)
[10] Davis, J.C., Michael IV, L.G., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: ESEC/FSE’19. pp. 1256-1258. ACM (2019)
[11] Davis, J.C., Servant, F., Lee, D.: Using selective memoization to defeat regular expression denial of service (ReDoS). In: 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021. pp. 1-17. IEEE (2021). doi:10.1109/SP40001.2021.00032, doi:10.1109/SP40001.2021.00032
[12] docs.rs: regex - rust. https://docs.rs/regex/1.5.4/regex/ (2021)
[13] Exchange, S.: Outage postmortem. http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016 (2016)
[14] Gelade, W., Gyssens, M., Martens, W.: Regular expressions with counting: Weak versus strong determinism. In: Mathematical Foundations of Computer Science 2009. pp. 369-381. Springer Berlin Heidelberg, Berlin, Heidelberg (2009). doi:10.1007/978-3-642-03816-7_32 · Zbl 1250.68158
[15] Gelade, W., Gyssens, M., Martens, W.: Regular expressions with counting: Weak versus strong determinism. SIAM J. Comput. 41(1), 160-190 (2012). doi:10.1137/100814196,extended version of paper in MFCS’09 · Zbl 1252.68146
[16] Glushkov, V.M.: The abstract theory of automata. Russian Math. Surveys 16, 1-53 (1961). doi:10.1070/RM1961v016n05ABEH004112 · Zbl 0104.35404
[17] Google: RE2. https://github.com/google/re2
[18] Graham-Cumming, J.: Details of the Cloudflare outage on july 2, 2019. https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/ (2019)
[19] Haertel, M., et al.: GNU grep. https://www.gnu.org/software/grep/
[20] Holík, L., Lengál, O., Saarikivi, O., Turoňová, L., Veanes, M., Vojnar, T.: Succinct determinisation of counting automata via sphere construction. In: Proc. of APLAS’19. LNCS, vol. 11893, pp. 468-489. Springer (2019). doi:10.1007/978-3-030-34175-6_24 · Zbl 1542.68085
[21] Holík, L., Síč, J., Turoňová, L., Vojnar, T.: Fast matching of regular patterns with synchronizing counting (technical report). Tech. rep., Brno University of Technology (2023), doi:10.48550/arXiv.2301.12851
[22] Hovland, D.: Regular expressions with numerical constraints and automata with counters. In: ICTAC. LNCS, vol. 5684, pp. 231-245. Springer (2009). doi:10.1007/978-3-642-03466-4_15 · Zbl 1250.68161
[23] Hovland, D.: The membership problem for regular expressions with unordered concatenation and numerical constraints. In: Language and Automata Theory and Applications. pp. 313-324. Springer Berlin Heidelberg, Berlin, Heidelberg (2012). doi:10.1007/978-3-642-28332-1_27 · Zbl 1351.68140
[24] Hromkovič, J., Seibert, S., Wilke, T.: Translating regular expressions into small \(\epsilon \)-free nondeterministic finite automata. In: Reischuk, R., Morvan, M. (eds.) STACS 97. pp. 55-66. Springer Berlin Heidelberg, Berlin, Heidelberg (1997) · Zbl 1498.68138
[25] Kilpeläinen, P., Tuhkanen, R.: Regular expressions with numerical occurrence indicators - preliminary results. In: SPLST’03. pp. 163-173. University of Kuopio, Department of Computer Science (2003)
[26] Kilpeläinen, P., Tuhkanen, R.: One-unambiguity of regular expressions with numeric occurrence indicators. Information and Computation 205(6), 890-916 (2007). doi:10.1016/j.ic.2006.12.003 · Zbl 1118.68079
[27] M. Roesch et al.: Snort: A Network Intrusion Detection and Prevention System,. http://www.snort.org
[28] RegExLib.com: The Internet’s first Regular Expression Library. http://regexlib.com/
[29] Robin Sommer et al.: The Bro Network Security Monitor, http://www.bro.org
[30] Saarikivi, O., Veanes, M., Wan, T., Xu, E.: Symbolic regex matcher. In: Vojnar, T., Zhang, L. (eds.) TACAS’2019. LNCS, vol. 11427, pp. 372-378. Springer (2019). doi:10.1007/978-3-030-17462-0_24, doi:10.1007/978-3-030-17462-0_24
[31] Smith, R., Estan, C., Jha, S.: XFA: faster signature matching with extended automata. In: IEEE Symposium on Security and Privacy. IEEE (2008). doi:10.1109/SP.2008.14
[32] Smith, R., Estan, C., Jha, S., Siahaan, I.: Fast signature matching using extended finite automaton (XFA). In: ICISS’08. LNCS, vol. 5352, pp. 158-172. Springer (2008). doi:10.1007/978-3-540-89862-7_15
[33] Sperberg-McQueen, M.: Notes on finite state automata with counters. https://www.w3.org/XML/2004/05/msm-cfa.html, https://www.w3.org/XML/2004/05/msm-cfa.html, accessed: 2018-08-08
[34] The Sagan team: The Sagan Log Analysis Engine, https://quadrantsec.com/sagan_log_analysis_engine/
[35] Thompson, K.: Programming techniques: Regular expression search algorithm. Commun. ACM 11(6), 419-422 (1968) · Zbl 0164.46205
[36] Turoňová, L., Holík, L., Lengál, O., Saarikivi, O., Veanes, M., Vojnar, T.: Regex matching with counting-set automata. Proc. ACM Program. Lang. 4(OOPSLA), 218:1-218:30 (2020)
[37] Turoňová, L., Holík, L., Lengál, O., Veanes, M., Vojnar, T.: Counting in regexes considered harmful (2022)
[38] Češka, M., Havlena, V., Holík, L., Lengál, O., Vojnar, T.: Approximate reduction of finite automata for high-speed network intrusion detection. In: Proc. of TACAS’18. LNCS, vol. 10806. Springer (2018). doi:10.1007/978-3-319-89963-3_9
[39] Wang, P., Stolee, K.T.: How well are regular expressions tested in the wild? In: Leavens, G.T., Garcia, A., Pasareanu, C.S. (eds.) Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018. pp. 668-678. ACM (2018). doi:10.1145/3236024.3236072, doi:10.1145/3236024.3236072
[40] Wang, X., Hong, Y., Chang, H., Park, K., Langdale, G., Hu, J., Zhu, H.: Hyperscan: A fast multi-pattern regex matcher for modern CPUs. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). pp. 631-648. USENIX Association, Boston, MA (Feb 2019), https://www.usenix.org/conference/nsdi19/presentation/wang-xiang
[41] Wübbeling, M.: Regular expression security. ADMIN 55 (2020)
[42] Yang, L., Karim, R., Ganapathy, V., Smith, R.: Improving NFA-based signature matching using ordered binary decision diagrams. In: Recent Advances in Intrusion Detection. pp. 58-78. Springer Berlin Heidelberg (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.