×

Can an A.I. win a medal in the mathematical olympiad? – Benchmarking mechanized mathematics on pre-university problems. (English) Zbl 1462.68152

Summary: This paper introduces a benchmark problem library for mechanized mathematics including computer algebra and automated theorem proving. The library consists of pre-university mathematical problems taken from exercise problem books, university entrance examinations, and the International Mathematical Olympiads. The subject areas include real algebra, geometry, number theory, pre-calculus, calculus, and combinatorics. It thus includes problems in various areas of pre-university mathematics and with a variety of difficulty. Unlike other existing benchmark libraries, this one contains problems that are formalized so that they are obtainable as the result of mechanical translation of the original problems expressed in natural language. In other words, the library is designed to support the integration of the technologies of mechanized mathematics and natural language processing towards the goal of end-to-end automatic mathematical problem solving. The paper also presents preliminary experimental results of our prototype reasoning component of an end-to-end system on the library. The library is publicly available through the Internet.

MSC:

68T01 General topics in artificial intelligence
00A07 Problem books
68V15 Theorem proving (automated and interactive theorem provers, deduction, resolution, etc.)
68V20 Formalization of mathematics in connection with theorem provers
68W30 Symbolic computation and algebraic computation
Full Text: DOI

References:

[1] B. Akbarpour and L.C. Paulson, MetiTarski: An automatic theorem prover for real-valued special functions,Journal of Automated Reasoning44(3) (2010), 175-205. doi:10.1007/s10817009-9149-2. · Zbl 1215.68206
[2] N.H. Arai, Tractability of cut-free Gentzen type propositional calculus with permutation inference,Theoretical Computer Science170(1) (1996), 129-144. doi:10.1016/S03043975(96)80704-3. · Zbl 0874.03065
[3] N.H. Arai and R. Masukawa, How to find symmetries hidden in combinatorial problems, in:Proceedings of the Eighth Symposium on the Integration of Symbolic Computation and Mechanized Reasoning, 2000. · Zbl 0986.68126
[4] R. Atkey, P. Johann and A. Kennedy, Abstraction and invariance for algebraically indexed types, in:Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. POPL ’13, ACM, New York, NY, USA, 2013, pp. 87-100,http://doi.acm.org/10.1145/ 2429069.2429082. · Zbl 1301.68183
[5] J. Avigad, R.Y. Lewis and C. Roux, A heuristic prover for real inequalities,Journal of Automated Reasoning56(3) (2016), 367-386. doi:10.1007/s10817-015-9356-y. · Zbl 1356.68174
[6] C. Barrett, A. Stump and C. Tinelli, The Satisfiability Modulo Theories Library (SMT-LIB), 2010,www.SMT-LIB.org.
[7] D.G. Bobrow, Natural language input for a computer problem solving system, PhD thesis, Massachusetts Institute of Technology, 1964.
[8] J. Bos, Wide-coverage semantic analysis with boxer, in:Semantics in Text Processing. STEP 2008 Conference Proceedings. Research in Computational Semantics, J. Bos and R. Delmonte, eds, College Publications, 2008, pp. 277-286. doi:10. 3115/1626481.1626503.
[9] R. Bradford, J.H. Davenport, M. England, S. McCallum and D. Wilson, Truth table invariant cylindrical algebraic decomposition,Journal of Symbolic Computation76(2016), 1-35. doi:10.1016/j.jsc.2015.11.002. · Zbl 1351.68314
[10] C.W. Brown, Open non-uniform cylindrical algebraic decompositions, in:Proceedings of the 2015 ACM on International Symposium on Symbolic and Algebraic Computation. ISSAC ’15, ACM, New York, NY, USA, 2015, pp. 85-92. doi:10. 1145/2755996.2756654. · Zbl 1346.68273
[11] C. Chen and M. Moreno Maza, Quantifier elimination by cylindrical algebraic decomposition based on regular chains,Journal of Symbolic Computation75(C) (2016), 74-93. doi:10. 1016/j.jsc.2015.11.008. · Zbl 1398.68695
[12] S. Clark and J.R. Curran, Wide-coverage efficient statistical parsing with CCG and log-linear models,Computational Linguistics33(2007). doi:10.1162/coli.2007.33.4.493. · Zbl 1234.68402
[13] J.H. Davenport and J. Heintz, Real quantifier elimination is doubly exponential,Journal of Symbolic Computation5(1-2) (1988), 29-35,http://www.sciencedirect.com/ science/article/pii/S074771718880004X. doi:10.1016/S07477171(88)80004-X. · Zbl 0663.03015
[14] L.A. Dennis, J. Gow and C. Schürmann, Challenge Problems for Inductive Theorem Provers v1.0, Technical report ULCS07-004, University of Liverpool, Department of Computer Science, 2007.
[15] M. England and J.H. Davenport, Experience with heuristics, benchmarks & standards for cylindrical algebraic decomposition, in:Proceedings of the 1st Workshop on Satisfiability Checking and Symbolic Computation Co-Located with 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2016), Timisoara, Romania, September 24, 2016, E. Ábrahám, J.H. Davenport and P. Fontaine, eds, CEUR Workshop Proceedings, Vol. 1804, CEUR-WS.org, 2016, pp. 24-31,http://ceur-ws.org/Vol-1804/ paper-06.pdf.
[16] R. Fukasaku, H. Iwane and Y. Sato, Improving a CGS-QE algorithm, in:Revised Selected Papers of the 6th International Conference on Mathematical Aspects of Computer and Information Sciences - Volume 9582. MACIS 2015, Springer-Verlag Inc., New York, NY, USA, 2016, pp. 231-235. · Zbl 1460.13051
[17] J.P. Gelb, Experiments with a natural language problemsolving system, in:Proceedings of the 2nd International Joint Conference on Artificial Intelligence. IJCAI’71, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1971, pp. 455- 462.
[18] A. Grabowski, A. Korniłowicz and A. Naumowicz, Mizar in a nutshell,Journal of Formalized Reasoning3(2) (2010), 153- 245. · Zbl 1211.68369
[19] J. Harrison, Without loss of generality, in:Theorem Proving in Higher Order Logics, 22nd International Conference, TPHOLs 2009, Proceedings, Munich, Germany, August 17- 20, 2009, 2009, pp. 43-59. doi:10.1007/978-3-642-033599_3. · Zbl 1252.68254
[20] H.H. Hoos and T. Stützle,SATLIB: An Online Resource for Research on SAT, IOS Press, 2000, pp. 283-292. · Zbl 0979.68128
[21] M.J. Hosseini, H. Hajishirzi, O. Etzioni and N. Kushman, Learning to solve arithmetic word problems with verb categorization, in:Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP, Doha, Qatar, October 25-29, 2014, A meeting of SIGDAT, a Special Interest Group of the ACL, 2014, pp. 523-533,http:// aclweb.org/anthology/D/D14/D14-1058.pdf. doi:10.3115/v1/ D14-1058.
[22] H. Iwane and H. Anai, Formula simplification for real quantifier elimination using geometric invariance, in:Proceedings of the 42nd International Symposium on Symbolic and Algebraic Computation (ISSAC-2017), 2017, to appear. · Zbl 1457.68326
[23] H. Iwane, T. Matsuzaki, N. Arai and H. Anai, Automated natural language geometry math problem solving by real quantier elimination, in:Proceedings of the 10th International Workshop on Automated Deduction (ADG2014), 2014, pp. 75-84.
[24] H. Iwane, H. Yanami, H. Anai and K. Yokoyama, An effective implementation of symbolic-numeric cylindrical algebraic · Zbl 1291.68433
[25] C. Kaliszyk, G. Sutcliffe and F. Rabe, TH1: The TPTP typed higher-order form with rank-1 polymorphism, in:Proceedings of the 5th Workshop on Practical Aspects of Automated Reasoning (PAAR 2016), 2016, pp. 41-55.
[26] M. Kerber and M. Pollet, A tough nut for mathematical knowledge management, in:Mathematical Knowledge Management, Springer, 2006, pp. 81-95. doi:10.1007/11618027_6. · Zbl 1151.68665
[27] M. Kobayashi, H. Iwane, T. Matsuzaki and H. Anai, Efficient subformula orders for real quantifier elimination of non-prenex formulas, in:Mathematical Aspects of Computer and Information Sciences - 6th International Conference, MACIS 2015, Revised Selected Papers, Berlin, Germany, November 11-13, 2015, pp. 236-251. doi:10.1007/978-3-319-32859-1_21. · Zbl 1460.68086
[28] R. Koncel-Kedziorski, H. Hajishirzi, A. Sabharwal, O. Etzioni and S. Ang, Parsing algebraic word problems into equations, Transactions of the Association for Computational Linguistics3(2015), 585-597,https://transacl.org/ojs/index.php/tacl/ article/view/692.
[29] N. Kushman, Y. Artzi, L. Zettlemoyer and R. Barzilay, Learning to automatically solve algebra word problems, in:Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers, Association for Computational Linguistics, Baltimore, Maryland, 2014, pp. 271-281,http://www.aclweb.org/anthology/ P14-1026. doi:10.3115/v1/P14-1026.
[30] T. Kwiatkowksi, L. Zettlemoyer, S. Goldwater and M. Steedman, Inducing probabilistic CCG grammars from logical form with higher-order unification, in:Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2010, pp. 1223-1233.
[31] T. Matsuzaki, T. Ito, H. Iwane, H. Anai and N.H. Arai, Semantic parsing of pre-university math problems, in:Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2017, to appear.
[32] T. Matsuzaki, H. Iwane, H. Anai and N.H. Arai, The most uncreative examinee: A first step toward wide coverage natural language math problem solving, in:Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014, pp. 1098-1104.
[33] T. Matsuzaki, H. Iwane, M. Kobayashi, Y. Zhan, R. Fukasaku, J. Kudo, H. Anai and N.H. Arai, Race against the teens – benchmarking mechanized math on pre-university problems, in:Automated Reasoning - 8th International Joint Conference, IJCAR 2016, Coimbra, Portugal, June 27, N. Olivetti and A. Tiwari, eds, Proceedings. Lecture Notes in Computer Science, Vol. 9706, Springer, 2016, pp. 213-227, July 2, 2016. · Zbl 1475.68459
[34] T. Matsuzaki, M. Kobayashi and N.H. Arai, An informationprocessing account of representation change: International mathematical olympiad problems are hard not only for humans, in:Proceedings of the 38th Annual Conference of the Cognitive Science Society, Cognitive Science Society, 2016, pp. 2297-2302.
[35] J. McCarthy, A tough nut for proof procedures, Technical report Stanford Artificial Intelligence Project Memo 16, Stanford University, 1964.
[36] A. Mitra and C. Baral, Learning to use formulas to solve simple arithmetic problems, in:Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016, pp. 2144-2153.
[37] P. Quaresma, Thousands of geometric problems for geometric theorem provers (TGTP), in:Automated Deduction in Geometry, P. Schreck, J. Narboux and J. Richter-Gebert, eds, Lecture Notes in Computer Science, Vol. 6877, Springer, Berlin Heidelberg, 2011, pp. 169-181. doi:10.1007/978-3-642-250705_10. · Zbl 1350.68243
[38] D. Raggi, A. Bundy, G. Grov and A. Pease, Automating change of representation for proofs in,Discrete Mathematics (Extended Version). Mathematics in Computer Science10(4) (2016), 429-457. · Zbl 1409.68261
[39] S. Roy and D. Roth, Solving general arithmetic word problems, in:Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics (ACL), 2015, pp. 1743- 1752. doi:10.18653/v1/D15-1202.
[40] S. Shi, Y. Wang, C.-Y. Lin, X. Liu and Y. Rui, Automatically solving number word problems by semantic parsing and reasoning, in:EMNLP, L. Màrquez, C. Callison-Burch, J. Su, D. Pighin and Y. Marton, eds, The Association for Computational Linguistics, 2015, pp. 1132-1142,http://dblp.uni-trier. de/db/conf/emnlp/emnlp2015.html#ShiWLLR15.
[41] S. Shuppan, 2014, Studyaid D.B. Chart-Shiki database (in Japanese).
[42] M. Steedman,The Syntactic Process. Bradford Books, MIT Press, Cambridge, 2001.
[43] Strzebo´nski, A., Cylindrical algebraic decomposition using local projections,Journal of Symbolic Computation76(C) (2016), 36-64. doi:10.1016/j.jsc.2015.11.018. · Zbl 1350.14042
[44] G. Sutcliffe, The TPTP problem library and associated infrastructure: The FOF and CNF parts, v3.5.0,Journal of Automated Reasoning43(4) (2009), 337-362. doi:10.1007/s10817009-9143-8. · Zbl 1185.68636
[45] G. Sutcliffe, M. Stickel, S. Schulz, Urban and J. Answer, Extraction for TPTP,http://www.cs.miami.edu/ tptp/TPTP/ Proposals/AnswerExtraction.html.
[46] A. Tarski,A Decision Method for Elementary Algebra and Geometry, University of California Press, Berkeley, 1951. · Zbl 0044.25102
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.