A New Unsupervised Approach for Text Localization in Shaky and Non-shaky Scene Video
Document Type
Conference Article
Publication Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
Text detection in shaky and non-shaky videos is challenging due to poor video quality and the presence of static and dynamic obstacles. Video captured by a camera that shakes, for example due to wind, is considered shaky video, while video captured by a fixed camera is considered non-shaky video. Most state-of-the-art methods achieve their best results by exploring the concept of deep learning. The present study proposes an unsupervised approach for text spotting in shaky and non-shaky videos. In the first stage, our method selects keyframes from the input video by estimating the similarity between temporal frames; we name these activation frames. For each activation frame, the proposed method extracts statistical features, such as orientation, spectral, edge density and intensity features, that represent text information. The extracted features are fed to a K-means clustering method to obtain the text clusters, which yields text regions in the activation frames. For each region, the proposed method uses optical flow to extract spatial consistency, motion consistency and depth map consistency for localizing text using temporal voting with non-maximum suppression. Experiments are conducted on our shaky and non-shaky dataset, and on the ICDAR 2015 benchmark dataset. From the experiments, it can be seen that the proposed method is superior to existing methods.
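The clustering step described in the abstract (statistical features fed to K-means to obtain text clusters) can be sketched roughly as follows. The helper name `cluster_text_regions`, the two-column feature layout, and the edge-density heuristic for selecting the text cluster are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_text_regions(features, k=2, seed=0):
    """Cluster per-region feature vectors (e.g. edge density and
    intensity statistics) into k groups with K-means.

    Assumption: column 0 holds edge density, and the cluster with the
    higher mean edge density is taken as the text cluster. This is a
    plausible heuristic, not the paper's exact method.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
    text_label = max(range(k),
                     key=lambda c: features[km.labels_ == c, 0].mean())
    # Boolean mask: True for regions assigned to the "text" cluster.
    return km.labels_ == text_label

# Synthetic example: text-like regions have high edge density/intensity,
# background regions low values, so k=2 clustering separates them.
rng = np.random.default_rng(0)
text_feats = rng.normal([0.8, 0.7], 0.05, size=(20, 2))
background_feats = rng.normal([0.1, 0.2], 0.05, size=(20, 2))
feats = np.vstack([text_feats, background_feats])
mask = cluster_text_regions(feats)
```

With well-separated synthetic features, the mask flags exactly the first twenty (text-like) regions; on real activation frames the features would come from the orientation, spectral, edge density and intensity measurements the abstract describes.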
First Page
162
Last Page
179
DOI
10.1007/978-3-031-70549-6_10
Publication Date
1-1-2024
Recommended Citation
Halder, Arnab; Palaiahnakote, Shivakumara; Pal, Umapada; Blumenstein, Michael; and Liu, Cheng-Lin, "A New Unsupervised Approach for Text Localization in Shaky and Non-shaky Scene Video" (2024). Conference Articles. 821.
https://digitalcommons.isical.ac.in/conf-articles/821