Conference Articles

Fourier Feature-based CBAM and Vision Transformer for Text Detection in Drone Images

Ayush Roy, Indian Statistical Institute, Kolkata
Palaiahnakote Shivakumara, Universiti Malaya
Umapada Pal, Indian Statistical Institute, Kolkata
Hamam Mokayed, Luleå University of Technology
Marcus Liwicki, Luleå University of Technology

Document Type

Conference Article

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

Drones are increasingly used in real-world applications such as monitoring, surveillance, and security. Most existing scene text detection methods, however, were developed for normal scene images. This work aims to develop a model for detecting text in drone images as well as scene images. To reduce the adverse effects of degradation in drone images, we explore the combination of the Fourier transform and the Convolutional Block Attention Module (CBAM), which enhances degraded information in the images without affecting high-contrast images. This combination helps extract prominent features that represent text irrespective of degradations. The refined features extracted from the Fourier Contouring Network (FCN) are then supplied to a Vision Transformer, which uses ResNet50 as a backbone and an encoder-decoder for text detection in both drone and scene images. Hence, the model is called the Fourier Transform based Transformer. Experimental results on drone datasets and on the natural scene text detection benchmarks Total-Text and ICDAR 2015 show that the proposed model is effective and outperforms state-of-the-art models.
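
To make the idea of pairing Fourier features with CBAM concrete, the following is a minimal, illustrative sketch in plain Python. It is not the authors' implementation: the abstract gives no layer sizes or weights, so the helper names (`dft2_log_magnitude`, `tiny_mlp`, `cbam`), the untrained placeholder weights, and the choice to stack the log-magnitude spectrum with the image as a second channel are all assumptions. The spatial branch also uses a per-pixel weighted sum where CBAM normally applies a 7x7 convolution.

```python
import cmath
import math

def dft2_log_magnitude(img):
    """Naive 2D DFT of an H x W image; returns log(1 + |F(u, v)|)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2j * math.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(angle)
            out[u][v] = math.log1p(abs(acc))
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tiny_mlp(vec):
    """Hypothetical shared MLP: 1-unit ReLU bottleneck, fixed weights."""
    hidden = max(0.0, sum(vec) / len(vec))
    return [hidden] * len(vec)

def channel_attention(feats, mlp):
    """One weight per channel: sigmoid(mlp(avg_pool) + mlp(max_pool))."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feats]
    mx = [max(map(max, ch)) for ch in feats]
    a, m = mlp(avg), mlp(mx)
    return [sigmoid(a[k] + m[k]) for k in range(len(feats))]

def spatial_attention(feats, w_avg=1.0, w_max=1.0):
    """One weight per pixel from channel-wise mean and max.
    Assumption: a 1x1 weighted sum stands in for CBAM's 7x7 convolution."""
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    return [[sigmoid(w_avg * sum(f[y][x] for f in feats) / c
                     + w_max * max(f[y][x] for f in feats))
             for x in range(w)] for y in range(h)]

def cbam(feats, mlp):
    """Apply channel attention, then spatial attention (CBAM ordering)."""
    ca = channel_attention(feats, mlp)
    scaled = [[[ca[k] * val for val in row] for row in ch]
              for k, ch in enumerate(feats)]
    sa = spatial_attention(scaled)
    return [[[sa[y][x] * ch[y][x] for x in range(len(ch[0]))]
             for y in range(len(ch))] for ch in scaled]

# Toy 4 x 4 "image" with a bright vertical stroke standing in for text.
img = [[0.0, 1.0, 1.0, 0.0] for _ in range(4)]
feats = [img, dft2_log_magnitude(img)]  # channel 0: pixels, channel 1: spectrum
refined = cbam(feats, tiny_mlp)
```

Because both attention maps lie in (0, 1], the refined features keep the shape of the input and are never larger than the original non-negative features; in a trained model the learned weights would decide which channels and locations are suppressed.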

First Page

257

Last Page

271

DOI

10.1007/978-3-031-41501-2_18

Publication Date

1-1-2023

Recommended Citation

Roy, Ayush; Shivakumara, Palaiahnakote; Pal, Umapada; Mokayed, Hamam; and Liwicki, Marcus, "Fourier Feature-based CBAM and Vision Transformer for Text Detection in Drone Images" (2023). Conference Articles. 568.
https://digitalcommons.isical.ac.in/conf-articles/568

This document is currently not available here.
