research-article

Research on medical text named entity recognition based on Two-stage approach

Published: 24 July 2024

Abstract

Most current work on medical named entity recognition focuses on extracting continuous entities and ignores overlapping and discontinuous ones. Sequence annotation frameworks designed for continuous entities (BIO, etc.) are poorly suited to discontinuous and overlapping entities, which makes entities with these structures inherently difficult to recognize. This work therefore builds on the two-stage approach, which decomposes entity recognition into two subtasks, segment generation and segment combination. It considers a new annotation framework and incorporates character features to enrich the features of discontinuous and overlapping entities in the data. In the model, BERT input vectors replace the pretrained Word2Vec inputs, and relative position self-attention layers are added to strengthen the model's data processing ability. The experimental results show that, compared with mainstream model frameworks, this framework better handles the recognition of discontinuous and overlapping entities in medical texts.
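As a rough illustration of the representational problem described in the abstract, the sketch below shows why a single BIO tag sequence cannot encode two entities that share a token, and how a segment-plus-combination representation can. It is not the paper's model. The example phrase, the function names `bio_tags` and `combine`, and the index-based encoding are all assumptions made for illustration, and the segments and combinations are written by hand, whereas in the two-stage approach a trained model would predict them.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def bio_tags(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode contiguous spans as one BIO tag sequence.

    Raises ValueError when two spans share a token, because each token
    can carry only a single tag in this scheme.
    """
    tags = ["O"] * n_tokens
    for start, end in spans:
        for i in range(start, end + 1):
            if tags[i] != "O":
                raise ValueError(f"token {i} belongs to two entities")
            tags[i] = "B" if i == start else "I"
    return tags


def combine(tokens: List[str], segments: List[Span],
            combinations: List[Tuple[int, ...]]) -> List[str]:
    """Join segments (referenced by index) into entity strings."""
    entities = []
    for combo in combinations:
        words: List[str] = []
        for seg_idx in combo:
            start, end = segments[seg_idx]
            words.extend(tokens[start:end + 1])
        entities.append(" ".join(words))
    return entities


# Hypothetical example phrase with two entities:
# "muscle pain" and the discontinuous "muscle ... fatigue".
tokens = ["muscle", "pain", "and", "fatigue"]

# A single contiguous entity is easy to tag.
single = bio_tags(len(tokens), [(0, 1)])

# Tagging a second entity that reuses "muscle" fails.
try:
    bio_tags(len(tokens), [(0, 1), (0, 0)])
    bio_failed = False
except ValueError:
    bio_failed = True

# Stage 1 (hand-written here): candidate segments.
segments = [(0, 0), (1, 1), (3, 3)]  # "muscle", "pain", "fatigue"
# Stage 2 (hand-written here): which segments form one entity.
combinations = [(0, 1), (0, 2)]

entities = combine(tokens, segments, combinations)
print(single)
print(bio_failed)
print(entities)
```

Because the segment "muscle" may appear in more than one combination, this encoding covers overlapping and discontinuous entities in the same structure.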

Published In

CSAIDE '24: Proceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy
March 2024
676 pages
ISBN: 9798400718212
DOI: 10.1145/3672919

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

CSAIDE 2024
