Computer Science > Computer Vision and Pattern Recognition

arXiv:2408.14998 (cs)

[Submitted on 27 Aug 2024]

Title: FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting

Authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós, Saumik Bhattacharya

Abstract: The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions. This paper presents FastTextSpotter, a framework that integrates a Swin Transformer visual backbone with a Transformer Encoder-Decoder architecture, enhanced by a novel, faster self-attention unit, SAC2, to improve processing speed while maintaining accuracy. FastTextSpotter has been validated on multiple datasets, including ICDAR2015 for regular text and CTW1500 and TotalText for arbitrary-shaped text, and benchmarked against current state-of-the-art models. Our results indicate that FastTextSpotter not only achieves superior accuracy in detecting and recognizing multilingual scene text (English and Vietnamese) but also improves model efficiency, thereby setting new benchmarks in the field. This study underscores the potential of advanced transformer architectures to improve the adaptability and speed of text spotting applications in diverse real-world settings. The dataset, code, and pre-trained models have been released on our GitHub.

Comments:	Accepted at ICPR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.14998 [cs.CV]
	(or arXiv:2408.14998v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.14998

Submission history

From: Alloy Das
[v1] Tue, 27 Aug 2024 12:28:41 UTC (12,786 KB)
