
The ChEMU 2022 Evaluation Campaign: Information Extraction in Chemical Patents

Conference paper, Advances in Information Retrieval (ECIR 2022)

Abstract

The discovery of new chemical compounds is a key driver of the chemistry and pharmaceutical industries, as well as many other industrial sectors. Patents serve as a critical source of information about new chemical compounds. The ChEMU (Cheminformatics Elsevier Melbourne Universities) lab addresses information extraction over chemical patents and aims to advance the state of the art on this topic. ChEMU 2022, held as part of the 13th Conference and Labs of the Evaluation Forum (CLEF-2022), will be the third ChEMU lab. The ChEMU 2020 lab provided two information extraction tasks: named entity recognition and event extraction. The ChEMU 2021 lab introduced two more tasks: chemical reaction reference resolution and anaphora resolution. For ChEMU 2022, we plan to re-run all four tasks and add a fifth task, semantic classification of tables. In this paper, we introduce ChEMU 2022, including its motivation, goals, tasks, resources, and evaluation framework.


Notes

  1. Reaxys® Copyright ©2021 Elsevier Life Sciences IP Limited. Reaxys is a trademark of Elsevier Life Sciences IP Limited, used under license. https://www.reaxys.com.

  2. http://chemu.eng.unimelb.edu.au/.


Author information

Correspondence to Karin Verspoor.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Y. et al. (2022). The ChEMU 2022 Evaluation Campaign: Information Extraction in Chemical Patents. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_50


  • DOI: https://doi.org/10.1007/978-3-030-99739-7_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99738-0

  • Online ISBN: 978-3-030-99739-7

  • eBook Packages: Computer Science, Computer Science (R0)
