Document-Level Iterative Entity and Relation Extraction for Materials Scientific Literature

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14877)

Abstract

Existing document-level entity and relation extraction methods mainly concentrate on generic semantics. Scientific literature, however, and the materials domain in particular, contains many entities that are iteratively generated from other entities under certain conditions. These differ from traditional overlapping entities and overlapping relations, which share at least one entity. To address this challenge, this paper proposes a new entity type in the field of information extraction, the iterative entity, as described above. We also propose a document-level Iterative Entity and Relation Extraction method for Materials scientific literature, named MatIERE, which contains two modules: 1) a named entity extraction model for materials scientific literature with hybrid rule and semantic block labeling, which extracts entities such as materials, processes, and material pronouns, and uses a materials process knowledge base to refine the extraction results; 2) a rule-based iterative entity and relation recognition algorithm. This algorithm first uses material pronouns as relation trigger words to search the context for the corresponding entity or iterative entity and establish the relation; it then adds the entity-relation triples extracted in the current iteration to the entity set as iterative entities, and this set is the input to the next iteration for extracting document-level iterative entities and relations. In the experiments, we first construct a dataset from materials scientific literature with a total of 48,714 entities and 22,885 document-level relations containing iterative entities. The comparison results show that our approach significantly outperforms other baseline models.
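The iterative step in the second module can be illustrated with a toy sketch. This is not the authors' implementation: the `Entity` class, the nearest-preceding-antecedent rule, the choice of the intervening process as the relation label, the `refers_to` fallback, and the sample sentence are all assumptions made for illustration. The sketch only shows the general idea that a triple resolved in one round is added back to the entity set as an iterative entity, so that a later pronoun can resolve to it in the next round.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); hypothetical format


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str      # "material", "process", "pronoun", or "iterative"
    position: int  # hypothetical token offset in the document


def _nearest_before(target: Entity, entities: List[Entity], kinds) -> Optional[Entity]:
    """Closest entity of the given kinds that precedes `target` (assumed rule)."""
    candidates = [e for e in entities
                  if e.kind in kinds and e.position < target.position]
    return max(candidates, key=lambda e: e.position, default=None)


def extract_iteratively(entities: List[Entity], max_rounds: int = 10):
    current = list(entities)
    triples: List[Triple] = []
    for _ in range(max_rounds):
        snapshot = list(current)  # entity set that is input to this round
        progressed = False
        for pronoun in [e for e in snapshot if e.kind == "pronoun"]:
            head = _nearest_before(
                pronoun, snapshot, ("material", "iterative", "pronoun"))
            # Defer if the antecedent is itself a pronoun not yet resolved.
            if head is None or head.kind == "pronoun":
                continue
            between = [e for e in snapshot if e.kind == "process"
                       and head.position < e.position < pronoun.position]
            relation = (max(between, key=lambda e: e.position).text
                        if between else "refers_to")
            triples.append((head.text, relation, pronoun.text))
            # The triple re-enters the entity set as an iterative entity.
            merged = Entity(f"({head.text} | {relation} | {pronoun.text})",
                            "iterative", pronoun.position)
            current.remove(pronoun)
            current.append(merged)
            progressed = True
        if not progressed:
            break
    return triples, current


# Invented example: "The Cu-Cr alloy was solution treated. The sample was
# cold rolled. The specimen was aged."
doc = [
    Entity("Cu-Cr alloy", "material", 0),
    Entity("solution treated", "process", 1),
    Entity("the sample", "pronoun", 2),
    Entity("cold rolled", "process", 3),
    Entity("the specimen", "pronoun", 4),
    Entity("aged", "process", 5),
]
triples, final_entities = extract_iteratively(doc)
```

In this sketch the first round resolves "the sample" to the alloy, and only the second round can resolve "the specimen", because its antecedent is the iterative entity created in the first round.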

Author information

Corresponding author

Correspondence to Jinguo You.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Geng, Q., You, J., Guo, H., Huang, X., Tao, J., Yi, J. (2024). Document-Level Iterative Entity and Relation Extraction for Materials Scientific Literature. In: Huang, DS., Si, Z., Zhang, Q. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14877. Springer, Singapore. https://doi.org/10.1007/978-981-97-5669-8_41

  • DOI: https://doi.org/10.1007/978-981-97-5669-8_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5668-1

  • Online ISBN: 978-981-97-5669-8

  • eBook Packages: Computer Science (R0)
