Abstract
This work focuses on the task of Mathematical Answer Retrieval and studies the factors a recent Transformer-Encoder-based Language Model (LM) uses to assess the relevance of an answer to a given mathematical question. Specifically, we investigate three factors: (1) the general influence of mathematical formulae, (2) the usage of structural information of those formulae, and (3) the overlap of variable names between answers and questions. Our findings indicate that the LM for Mathematical Answer Retrieval mainly relies on shallow features such as the overlap of variables between questions and answers. Furthermore, we identified a harmful shortcut in the training data that hinders the usage of structural information; removing this shortcut improved the overall accuracy. We want to foster future research on how LMs are trained for Mathematical Answer Retrieval and provide a basic evaluation setup (Link to repository: https://github.com/AnReu/math_analysis) for existing models.
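To illustrate the kind of shallow feature the abstract refers to, the variable-name overlap between a question and an answer can be computed with a simple heuristic. The sketch below is a minimal illustration only, not the paper's actual feature extraction: it assumes variables are isolated single Latin letters and scores the overlap with the Jaccard index.

```python
import re


def variable_overlap(question: str, answer: str) -> float:
    """Jaccard overlap of single-letter variable names in two math texts.

    Crude heuristic: a variable is a single Latin letter that is not
    part of a longer word (so 'x' in 'x + y' counts, 'S' in 'Solve'
    does not).
    """
    def variables(text: str) -> set:
        return set(re.findall(r"(?<![A-Za-z])([A-Za-z])(?![A-Za-z])", text))

    q_vars, a_vars = variables(question), variables(answer)
    if not q_vars and not a_vars:
        return 0.0
    return len(q_vars & a_vars) / len(q_vars | a_vars)
```

For example, `variable_overlap("Solve x + y = 2", "Then x = 2 - y")` yields 1.0, since both texts use exactly the variables x and y. A retrieval model that leans on such a signal can rank answers plausibly without using any formula structure, which is the concern the paper raises.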
Notes
- 1.
- 2. We use a custom tokenizer, e.g., is tokenized as .
Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful feedback and comments. This work was supported by the DFG under Germany’s Excellence Strategy, Grant No. EXC-2068-390729961, Cluster of Excellence “Physics of Life” of TU Dresden. Furthermore, the authors are grateful to the GWK for supporting this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Reusch, A., Gonsior, J., Hartmann, C., Lehner, W. (2024). Investigating the Usage of Formulae in Mathematical Answer Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_15
DOI: https://doi.org/10.1007/978-3-031-56027-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer Science (R0)