FulltextAttention: Full-text Classification of Scientific Papers Using Combined Model of BiGRU and Bahdanau Attention

Y Jang, H Choi, K Won, SY Shin
Proceedings of the 2023 International Conference on Research in Adaptive and Convergent Systems (RACS), 2023. ACM Digital Library (dl.acm.org)
This study proposes FulltextAttention, a novel approach for classifying lengthy documents, particularly scientific papers. The proposed model tackles the length constraint of transformer-based attention mechanisms by dividing documents into fixed-size chunks. It consists of two modules: a document segmentation module that splits the input document into manageable fixed-size chunks, and an attention module that performs token-level and chunk-level attention in a two-step process. Unlike existing transformer-based models, FulltextAttention is not limited by the length of the input sequence because it uses an RNN-based bidirectional GRU (BiGRU). The attention module employs Bahdanau attention, which is applicable to RNNs. The model is trained by passing feature embeddings through the BiGRU layer, applying Bahdanau attention, and optimizing the parameters to minimize the loss. Our experiments on scientific papers about RF-EMF (radio-frequency electromagnetic fields) show that the proposed model, using full text, achieves performance comparable to the state-of-the-art transformer-based BERT model using abstract data. Although the limited size of the full-text dataset affected the model's performance, this can be addressed by augmenting the dataset in future work. Furthermore, the proposed model shows potential for efficient use in other downstream tasks, such as question answering (Q&A).
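For illustration, the following is a minimal sketch, in PyTorch, of how the two modules described above could fit together. It is not the authors' implementation: every class name and hyperparameter (chunk size, hidden size, vocabulary size) is a hypothetical stand-in, and the attention layer is the single-sequence additive pooling commonly adapted from Bahdanau attention, since the abstract does not give implementation details.

# Sketch of a chunked BiGRU model with two-step additive attention.
# All names and hyperparameters are assumptions, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Additive (Bahdanau-style) attention that pools a sequence into one vector."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, states):                  # states: (batch, seq, hidden)
        energy = self.score(torch.tanh(self.proj(states)))  # (batch, seq, 1)
        weights = F.softmax(energy, dim=1)      # attention weights over seq
        return (weights * states).sum(dim=1)    # (batch, hidden)

class FulltextAttentionSketch(nn.Module):
    """Two-step attention: token-level within chunks, then chunk-level."""
    def __init__(self, vocab_size, emb_dim=128, hidden=128, num_classes=2,
                 chunk_size=512):
        super().__init__()
        self.chunk_size = chunk_size
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.token_gru = nn.GRU(emb_dim, hidden, bidirectional=True,
                                batch_first=True)
        self.token_attn = BahdanauAttention(2 * hidden)
        self.chunk_gru = nn.GRU(2 * hidden, hidden, bidirectional=True,
                                batch_first=True)
        self.chunk_attn = BahdanauAttention(2 * hidden)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, doc_len)
        b, n = token_ids.shape
        pad = (-n) % self.chunk_size            # pad so length divides evenly
        token_ids = F.pad(token_ids, (0, pad))
        chunks = token_ids.view(b, -1, self.chunk_size)   # segmentation module
        b, n_chunks, c = chunks.shape
        emb = self.embed(chunks.reshape(b * n_chunks, c))
        tok_states, _ = self.token_gru(emb)               # BiGRU over tokens
        chunk_vecs = self.token_attn(tok_states)          # step 1: token attention
        chunk_vecs = chunk_vecs.view(b, n_chunks, -1)
        chunk_states, _ = self.chunk_gru(chunk_vecs)      # BiGRU over chunks
        doc_vec = self.chunk_attn(chunk_states)           # step 2: chunk attention
        return self.classifier(doc_vec)                   # class logits

# Usage: classify a batch of two documents of 1,200 tokens each.
model = FulltextAttentionSketch(vocab_size=30000)
logits = model(torch.randint(1, 30000, (2, 1200)))
print(logits.shape)  # torch.Size([2, 2])

Because the chunk-level BiGRU consumes however many chunk vectors the segmentation step produces, nothing in this sketch imposes a fixed cap on document length, which is the property the abstract contrasts with transformer-based models.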