Abstract
Existing document-level entity and relation extraction methods focus mainly on generic semantics. In scientific literature, however, and especially in the materials domain, a large number of entities are generated iteratively from other entities under certain conditions. Such entities differ from traditional overlapping entities and overlapping relations, which share at least one entity. To address this challenge, this paper introduces a new entity type for information extraction, the Iterative Entity, as defined above. We also propose MatIERE, a document-level Iterative Entity and Relation Extraction method for Materials scientific literature, which consists of two modules: (1) a named entity extraction model for materials scientific literature based on hybrid rule and semantic block labeling, which extracts entities such as materials, processes, and material pronouns, and which uses a materials process knowledge base to refine the extraction results; and (2) a rule-based iterative entity and relation recognition algorithm, which first uses material pronouns as relation trigger words to search the context for the corresponding entity or iterative entity and establish the relation, and then adds the entity-relation triples extracted in the current iteration to the entity set as iterative entities, so that this set serves as the input to the next iteration for extracting document-level iterative entities and relations. For the experiments, we construct a dataset from materials scientific literature containing 48,714 entities and 22,885 document-level relations involving iterative entities. Comparative results show that our approach significantly outperforms the baseline models.
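The iterative loop of module (2) can be illustrated with a minimal Python sketch. It is not MatIERE's implementation: the mention format, the nearest-antecedent rule, the `processed_by` relation label and the small `PROCESS_KB` set standing in for the materials process knowledge base are all hypothetical. Each pronoun is resolved only against the entity set produced by earlier iterations; if its nearest preceding antecedent is itself an unresolved pronoun, it is deferred. A chain of processing steps therefore needs one iteration per step, and every triple built along the way re-enters the entity set as an iterative entity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical stand-in for the materials process knowledge base:
# process mentions not listed here are discarded as a crude refinement step.
PROCESS_KB = {"solution treatment", "aging", "annealing", "cold rolling"}


@dataclass(frozen=True)
class Mention:
    text: str
    kind: str  # "material", "process" or "pronoun" (illustrative labels)
    pos: int   # order of the mention in the document


@dataclass(frozen=True)
class Triple:
    head: "Entity"  # entity or iterative entity the process acts on
    relation: str   # hypothetical relation label
    tail: str       # the process mention


# An entity is either a plain material name or a triple (an iterative entity).
Entity = Union[str, Triple]


def extract_iterative(
    mentions: List[Mention], max_iter: int = 10
) -> Tuple[Dict[int, Entity], List[Triple]]:
    mentions = sorted(mentions, key=lambda m: m.pos)
    mentions = [m for m in mentions if m.kind != "process" or m.text in PROCESS_KB]
    # Entity set keyed by mention position, seeded with explicit materials.
    entities: Dict[int, Entity] = {
        m.pos: m.text for m in mentions if m.kind == "material"
    }
    triples: List[Triple] = []
    for _ in range(max_iter):
        found: Dict[int, Entity] = {}
        for p in mentions:
            if p.kind != "pronoun" or p.pos in entities:
                continue  # only unresolved pronouns act as triggers
            # Candidate antecedent: nearest preceding material or pronoun.
            prev = [m for m in mentions if m.pos < p.pos and m.kind != "process"]
            if not prev or prev[-1].pos not in entities:
                continue  # antecedent not resolved yet: retry next iteration
            ante = prev[-1]
            head = entities[ante.pos]
            # Each process between antecedent and pronoun wraps the head in a triple.
            for m in mentions:
                if m.kind == "process" and ante.pos < m.pos < p.pos:
                    head = Triple(head, "processed_by", m.text)
                    triples.append(head)
            found[p.pos] = head  # the pronoun now denotes this (iterative) entity
        if not found:
            break  # fixed point reached: nothing new this iteration
        entities.update(found)  # new iterative entities feed the next iteration
    return entities, triples


if __name__ == "__main__":
    doc = [
        Mention("Cu-Cr alloy", "material", 0),
        Mention("solution treatment", "process", 1),
        Mention("the treated alloy", "pronoun", 2),
        Mention("aging", "process", 3),
        Mention("the aged sample", "pronoun", 4),
    ]
    entities, triples = extract_iterative(doc)
    for t in triples:
        print(t)
```

On this toy document, "the treated alloy" is resolved in the first iteration, while "the aged sample" can only be resolved in the second, once the triple (Cu-Cr alloy, processed_by, solution treatment) has joined the entity set; the result is the nested iterative entity ((Cu-Cr alloy, processed_by, solution treatment), processed_by, aging).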
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Geng, Q., You, J., Guo, H., Huang, X., Tao, J., Yi, J. (2024). Document-Level Iterative Entity and Relation Extraction for Materials Scientific Literature. In: Huang, DS., Si, Z., Zhang, Q. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14877. Springer, Singapore. https://doi.org/10.1007/978-981-97-5669-8_41
DOI: https://doi.org/10.1007/978-981-97-5669-8_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5668-1
Online ISBN: 978-981-97-5669-8
eBook Packages: Computer Science, Computer Science (R0)