DSCJA-Captioner: Dual-Branch Spatial and Channel Joint Attention for Image Captioning

X Tian, X Yang, S Ma, B Song, Z He
2023 18th International Conference on Intelligent Systems and …, 2023, ieeexplore.ieee.org
Transformer-based image captioning models have been widely used in recent years, but most existing attention mechanisms are designed to capture only spatial dependencies. This is inadequate for image captioning, whose performance also depends heavily on the categories and attributes of the objects. Meanwhile, in the decoding process, text and vision information are fused by simple concatenation, so the two modalities are not deeply fused and the vision information is not fully utilized, which limits the representation capability of the model. To remedy these limitations, we propose a Dual-branch Spatial and Channel Joint Attention for the image captioning task, which captures both spatial and channel information to improve the representation capability of the model. It also uses a Cross Pre-Fusion module in the decoder to explore the deep relationship between text and vision information and thereby improve the quality of the generated sentences. The entire model is abbreviated as DSCJA-Captioner. Finally, we conduct extensive experiments on the MS COCO dataset to validate the effectiveness of our method. Compared with state-of-the-art models, our model is competitive.
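The abstract does not specify how the two branches are built or combined, so the following is only a minimal sketch of the general idea of attending over spatial positions and over channels in parallel. It is not the authors' implementation. The function names (`self_attention`, `dual_branch_attention`), the use of unparameterised scaled dot-product attention, and the element-wise sum that merges the branches are all assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(tokens):
    """Scaled dot-product self-attention over the rows of `tokens`.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    dim = len(tokens[0])
    scores = matmul(tokens, transpose(tokens))
    scaled = [[s / math.sqrt(dim) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    return matmul(weights, tokens), weights


def dual_branch_attention(features):
    """Hypothetical dual-branch block for `features` of shape (regions, channels)."""
    # Spatial branch: each image region attends to every other region.
    spatial_out, spatial_w = self_attention(features)
    # Channel branch: transpose so that each channel becomes a token.
    channel_out_t, channel_w = self_attention(transpose(features))
    channel_out = transpose(channel_out_t)
    # Assumption: the branches are merged by an element-wise sum.
    joint = [
        [s + c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(spatial_out, channel_out)
    ]
    return joint, spatial_w, channel_w


# Toy input: 2 image regions, 3 feature channels.
features = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]]
joint, spatial_w, channel_w = dual_branch_attention(features)
```

In this toy setup the spatial branch yields a 2 x 2 weight matrix over regions and the channel branch a 3 x 3 weight matrix over channels, while the joint output keeps the input's 2 x 3 shape. A real model would add learned projections, multiple heads, and normalisation around each branch.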