skip to main content
research-article
Open access

Decision procedures for path feasibility of string-manipulating programs with complex operations

Published: 02 January 2019 Publication History

Abstract

The design and implementation of decision procedures for checking path feasibility in string-manipulating programs is an important problem, with such applications as symbolic execution of programs with strings and automated detection of cross-site scripting (XSS) vulnerabilities in web applications. A (symbolic) path is given as a finite sequence of assignments and assertions (i.e. without loops), and checking its feasibility amounts to determining the existence of inputs that yield a successful execution. Modern programming languages (e.g. JavaScript, PHP, and Python) support many complex string operations, and strings are also often implicitly modified during a computation in some intricate fashion (e.g. by some autoescaping mechanisms).
In this paper we provide two general semantic conditions which together ensure the decidability of path feasibility: (1) each assertion admits regular monadic decomposition (i.e. is an effectively recognisable relation), and (2) each assignment uses a (possibly nondeterministic) function whose inverse relation preserves regularity. We show that the semantic conditions are expressive since they are satisfied by a multitude of string operations including concatenation, one-way and two-way finite-state transducers, replaceall functions (where the replacement string could contain variables), string-reverse functions, regular-expression matching, and some (restricted) forms of letter-counting/length functions. The semantic conditions also strictly subsume existing decidable string theories (e.g. straight-line fragments, and acyclic logics), and most existing benchmarks (e.g. most of Kaluza’s, and all of SLOG’s, Stranger’s, and SLOTH’s benchmarks). Our semantic conditions also yield a conceptually simple decision procedure, as well as an extensible architecture of a string solver in that a user may easily incorporate his/her own string functions into the solver by simply providing code for the pre-image computation without worrying about other parts of the solver. Despite these, the semantic conditions are unfortunately too general to provide a fast and complete decision procedure. We provide strong theoretical evidence for this in the form of complexity results. To rectify this problem, we propose two solutions. Our main solution is to allow only partial string functions (i.e., prohibit nondeterminism) in condition (2). This restriction is satisfied in many cases in practice, and yields decision procedures that are effective in both theory and practice. Whenever nondeterministic functions are still needed (e.g. the string function split), our second solution is to provide a syntactic fragment that provides a support of nondeterministic functions, and operations like one-way transducers, replaceall (with constant replacement string), the string-reverse function, concatenation, and regular-expression matching. We show that this fragment can be reduced to an existing solver SLOTH that exploits fast model checking algorithms like IC3.
We provide an efficient implementation of our decision procedure (assuming our first solution above, i.e., deterministic partial string functions) in a new string solver OSTRICH. Our implementation provides built-in support for concatenation, reverse, functional transducers (FFT), and replaceall and provides a framework for extensibility to support further string functions. We demonstrate the efficacy of our new solver against other competitive solvers.

Supplementary Material

WEBM File (a49-wu.webm)

References

[1]
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2017. Flatten and conquer: a framework for efficient analysis of string constraints. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. ACM, New York, NY, USA, 602–617.
[2]
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukás Holík, Ahmed Rezine, and Philipp Rümmer. 2018. TRAU: SMT Solver for String Constraints. In Formal Methods in Computer Aided Design, FMCAD 2018. To appear.
[3]
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Lukás Holík, Ahmed Rezine, Philipp Rümmer, and Jari Stenman. 2014. String Constraints for Verification. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 150–166.
[4]
Rajeev Alur and Jyotirmoy V. Deshmukh. 2011. Nondeterministic Streaming String Transducers. In Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II. Springer, 1–20.
[5]
Christel Baier and Joost-Pieter Katoen. 2008. Principles of Model Checking (Representation and Mind Series). The MIT Press.
[6]
Pablo Barceló, Diego Figueira, and Leonid Libkin. 2013. Graph Logics with Rational Relations. Logical Methods in Computer Science 9, 3 (2013).
[7]
Clark W. Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. 2009. Satisfiability Modulo Theories. In Handbook of Satisfiability. IOS Press, 825–885.
[8]
Michael Benedikt, Leonid Libkin, Thomas Schwentick, and Luc Segoufin. 2003. Definable relations and first-order query languages over strings. J. ACM 50, 5 (2003), 694–751.
[9]
Murphy Berzish, Vijay Ganesh, and Yunhui Zheng. 2017. Z3str3: A string solver with theory-aware heuristics. In 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2-6, 2017. IEEE, 55–59.
[10]
Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. 2009. Path feasibility analysis for string-manipulating programs. In Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2009. Springer, 307–321.
[11]
J Richard Büchi and Steven Senger. 1990. Definability in the existential theory of concatenation and undecidable extensions of this theory. In The Collected Works of J. Richard Büchi. Springer, 671–683.
[12]
Thierry Cachat and Igor Walukiewicz. 2007. The Complexity of Games on Higher Order Pushdown Automata. CoRR abs/0705.0262 (2007). arXiv: 0705.0262 http://arxiv.org/abs/0705.0262
[13]
Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, USA, 209–224. http://dl.acm.org/citation.cfm?id=1855741. 1855756
[14]
Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2006. EXE: Automatically Generating Inputs of Death. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS ’06). ACM, New York, NY, USA, 322–335.
[15]
Cristian Cadar and Koushik Sen. 2013. Symbolic Execution for Software Testing: Three Decades Later. Commun. ACM 56, 2 (Feb. 2013), 82–90.
[16]
Olivier Carton, Christian Choffrut, and Serge Grigorieff. 2006. Decision problems among the main subfamilies of rational relations. ITA 40, 2 (2006), 255–275.
[17]
Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, and Zhilin Wu. 2018a. What is decidable about string constraints with the ReplaceAll function. PACMPL 2, POPL (2018), 3:1–3:29.
[18]
Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. 2018b. Decision Procedures for Path Feasibility of String-Manipulating Programs with Complex Operations. CoRR abs/1811.03167 (2018). arXiv: 1811.03167 https://arxiv.org/abs/1811.03167
[19]
Yan Chen. 2018a. Solving String Constraints with ReplaceAll Function. Master’s thesis. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China.
[20]
Yan Chen. 2018b. Z3-replaceall. https://github.com/TinyYan/z3- replaceAll . Referred in Jan 2018.
[21]
Christian Choffrut. 2006. Relations over Words and Logic: A Chronology. Bulletin of the EATCS 89 (2006), 159–163.
[22]
Loris D’Antoni and Margus Veanes. 2013. Static Analysis of String Encoders and Decoders. In Verification, Model Checking, and Abstract Interpretation, VMCAI. Springer, 209–228.
[23]
Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: introduction and applications. Commun. ACM 54, 9 (2011), 69–77.
[24]
J. Dénes. 1967. Connections between transformation semigroups and graphs. In Theory of Graphs. Gordon & Breach.
[25]
Volker Diekert. 2002. Makanin’s Algorithm. In Algebraic Combinatorics on Words, M. Lothaire (Ed.). Encyclopedia of Mathematics and its Applications, Vol. 90. Cambridge University Press, Chapter 12, 387–442.
[26]
Joost Engelfriet and Hendrik Jan Hoogeboom. 2001. MSO definable string transductions and two-way finite-state transducers. ACM Trans. Comput. Log. 2, 2 (2001), 216–254.
[27]
Emmanuel Filiot, Olivier Gauwin, Pierre-Alain Reynier, and Frédéric Servais. 2013. From Two-Way to One-Way Finite State Transducers. In 28th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2013, New Orleans, LA, USA, June 25-28, 2013. IEEE Computer Society, 468–477.
[28]
Vijay Ganesh, Mia Minnes, Armando Solar-Lezama, and Martin C. Rinard. 2012. Word Equations with Length Constraints: What’s Decidable?. In Hardware and Software: Verification and Testing - 8th International Haifa Verification Conference, HVC 2012, Haifa, Israel, November 6-8, 2012. Revised Selected Papers. Springer, 209–226.
[29]
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. SIGPLAN Not. 40, 6 (June 2005), 213–223.
[30]
Lukás Holík, Petr Janku, Anthony W. Lin, Philipp Rümmer, and Tomás Vojnar. 2018. String constraints with concatenation and transducers solved efficiently. PACMPL 2, POPL (2018), 4:1–4:32.
[31]
Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. 2011. Fast and Precise Sanitizer Analysis with BEK. In 20th USENIX Security Symposium, San Francisco, CA, USA, August 8-12, 2011, Proceedings. USENIX Association. http://static.usenix.org/events/sec11/tech/full_papers/Hooimeijer.pdf
[32]
John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley.
[33]
Artur Jez. 2016. Recompression: A Simple and Powerful Technique for Word Equations. J. ACM 63, 1 (2016), 4:1–4:51.
[34]
Neil D. Jones. 2001. The expressive power of higher-order types or, life without CONS. Journal of Functional Programming 11, 1 (2001), 55–94. http://journals.cambridge.org/action/displayAbstract?aid=68581
[35]
Christoph Kern. 2014. Securing the tangled web. Commun. ACM 57, 9 (2014), 38–47.
[36]
Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J. Guo, Pieter Hooimeijer, and Michael D. Ernst. 2012. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Trans. Softw. Eng. Methodol. 21, 4 (2012), 25.
[37]
James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (1976), 385–394.
[38]
Dexter Kozen. 1997. Automata and Computability. Springer.
[39]
Daniel Kroening and Ofer Strichman. 2008. Decision Procedures. Springer.
[40]
Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions. In Computer Aided Verification - 26th International Conference, CAV 2014. Springer, 646–662.
[41]
Leonid Libkin. 2003. Variable independence for first-order definable constraints. ACM Trans. Comput. Log. 4, 4 (2003), 431–451.
[42]
Anthony W. Lin and Pablo Barceló. 2016. String Solving with Word Equations and Transducers: Towards a Logic for Analysing Mutation XSS. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). Springer, 123–136.
[43]
K. L. McMillan. 1993. Symbolic model checking. Kluwer.
[44]
Yasuhiko Minamide. 2005. Static approximation of dynamically generated Web pages. In Proceedings of the 14th international conference on World Wide Web, WWW 2005. ACM, 432–441.
[45]
Christophe Morvan. 2000. On Rational Graphs. In Foundations of Software Science and Computation Structures, Third International Conference, FOSSACS 2000. Springer, 252–266.
[46]
Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. Abstract DPLL and Abstract DPLL Modulo Theories. In Logic for Programming, Artificial Intelligence, and Reasoning, 11th International Conference, LPAR 2004 (LNCS), Vol. 3452. Springer, 36–50.
[47]
Nicholas Pippenger. 2010. Theories of Computability. Cambridge University Press.
[48]
Wojciech Plandowski. 2004. Satisfiability of word equations with constants is in PSPACE. J. ACM 51, 3 (2004), 483–496.
[49]
Philipp Rümmer. 2008. A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In Logic for Programming, Artificial Intelligence, and Reasoning, 15th International Conference, LPAR 2008, LNCS 5330. Springer, 274–289.
[50]
Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berleley/Oakland, California, USA. IEEE, 513–528.
[51]
Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. 2013. Jalangi: a selective record-replay and dynamic analysis framework for JavaScript. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 488–498.
[52]
Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, ESEC/SIGSOFT FSE 2005. ACM, 263–272.
[53]
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A Symbolic String Solver for Vulnerability Detection in Web Applications. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014. ACM, 1232–1243.
[54]
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive Reasoning over Recursively-Defined Strings. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. Springer, 218–240.
[55]
Andrew van der Stock, Brian Glas, Neil Smithline, and Torsten Gigler. 2017. OWASP Top 10 – 2017. https://www.owasp. org/index.php/Top_10- 2017_Top_10 . Referred January 2018.
[56]
Margus Veanes, Nikolaj Bjørner, Lev Nachmanson, and Sergey Bereg. 2017. Monadic Decomposition. J. ACM 64, 2 (2017), 14:1–14:28.
[57]
Hung-En Wang, Tzung-Lin Tsai, Chun-Han Lin, Fang Yu, and Jie-Hong R. Jiang. 2016. String Analysis via Automata Manipulation with Logic Circuit Representation. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 9779. Springer, 241–260.
[58]
Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Eui Chul Richard Shin, and Dawn Song. 2011. A Systematic Analysis of XSS Sanitization in Web Application Frameworks. In Computer Security - ESORICS 2011 - 16th European Symposium on Research in Computer Security, Leuven, Belgium, September 12-14, 2011. Proceedings. Springer, 150–171.
[59]
Fang Yu, Muath Alkhalaf, and Tevfik Bultan. 2010. Stranger: An Automata-Based String Analysis Tool for PHP. In Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010. Springer, 154–157.
[60]
Fang Yu, Muath Alkhalaf, Tevfik Bultan, and Oscar H. Ibarra. 2014. Automata-based Symbolic String Analysis for Vulnerability Detection. Form. Methods Syst. Des. 44, 1 (2014), 44–70.
[61]
Bohdan Zelinka. 1981. Graphs of semigroups. Časopis pro pěstování matematiky 106, 4 (1981), 407–408. http://eudml.org/ doc/19323
[62]
Yunhui Zheng, Vijay Ganesh, Sanu Subramanian, Omer Tripp, Julian Dolby, and Xiangyu Zhang. 2015. Effective Search-Space Pruning for Solvers of String Equations, Regular Expressions and Length Constraints. In Computer Aided Verification -27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I. Springer, 235–254.
[63]
Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. 2013. Z3-str: a Z3-based string solver for web application analysis. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013, Saint Petersburg, Russian Federation, August 18-26, 2013. ACM, 114–124.

Cited By

View all
  • (2024)Repairing Regex-Dependent String FunctionsProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695005(294-305)Online publication date: 27-Oct-2024
  • (2024)A Constraint Solving Approach to Parikh Images of Regular LanguagesProceedings of the ACM on Programming Languages10.1145/36498558:OOPSLA1(1235-1263)Online publication date: 29-Apr-2024
  • (2024)Parikh’s Theorem Made SymbolicProceedings of the ACM on Programming Languages10.1145/36329078:POPL(1945-1977)Online publication date: 5-Jan-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 3, Issue POPL
January 2019
2275 pages
EISSN:2475-1421
DOI:10.1145/3302515
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 January 2019
Published in PACMPL Volume 3, Issue POPL

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Decision Procedures
  2. ReplaceAll
  3. Reverse
  4. Straight-Line Programs
  5. String Constraints
  6. Transducers

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)195
  • Downloads (Last 6 weeks)47
Reflects downloads up to 22 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Repairing Regex-Dependent String FunctionsProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695005(294-305)Online publication date: 27-Oct-2024
  • (2024)A Constraint Solving Approach to Parikh Images of Regular LanguagesProceedings of the ACM on Programming Languages10.1145/36498558:OOPSLA1(1235-1263)Online publication date: 29-Apr-2024
  • (2024)Parikh’s Theorem Made SymbolicProceedings of the ACM on Programming Languages10.1145/36329078:POPL(1945-1977)Online publication date: 5-Jan-2024
  • (2024)Mata: A Fast and Simple Finite Automata LibraryTools and Algorithms for the Construction and Analysis of Systems10.1007/978-3-031-57249-4_7(130-151)Online publication date: 5-Apr-2024
  • (2024)Z3-Noodler: An Automata-based String SolverTools and Algorithms for the Construction and Analysis of Systems10.1007/978-3-031-57246-3_2(24-33)Online publication date: 6-Apr-2024
  • (2024) Tarsis : An effective automata‐based abstract domain for string analysis Journal of Software: Evolution and Process10.1002/smr.2647Online publication date: 14-Feb-2024
  • (2023)Solving String Constraints with Lengths by StabilizationProceedings of the ACM on Programming Languages10.1145/36228727:OOPSLA2(2112-2141)Online publication date: 16-Oct-2023
  • (2023)Data Constraint Mining for Automatic Reconciliation Scripts GenerationProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598122(1119-1130)Online publication date: 12-Jul-2023
  • (2023)Black Ostrich: Web Application Scanning with String SolversProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security10.1145/3576915.3616582(549-563)Online publication date: 15-Nov-2023
  • (2023)On the Expressive Power of String ConstraintsProceedings of the ACM on Programming Languages10.1145/35712037:POPL(278-308)Online publication date: 11-Jan-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media