A New Unsupervised Approach for Text Localization in Shaky and Non-shaky Scene Video

Document Type

Conference Article

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

Text detection in shaky and non-shaky videos is challenging due to poor video quality and the presence of static and dynamic obstacles. Video captured by a camera that shakes, for example in the wind, is considered shaky video, while video captured by a fixed camera is considered non-shaky video. Most state-of-the-art methods achieve their best results by exploiting deep learning. The present study proposes an unsupervised approach for text spotting in shaky and non-shaky videos. In the first stage, our method selects keyframes, which we name activation frames, from the input video by estimating the similarity between temporal frames. For each activation frame, the proposed method extracts statistical features, such as orientation, spectral, edge density and intensity features, that represent text information. The extracted features are fed to a K-means clustering method to obtain text clusters, which yield text regions in the activation frames. For each region, the proposed method uses optical flow to extract spatial, motion and depth-map consistency cues for localizing text through temporal voting with non-maximum suppression. Experiments are conducted on our shaky and non-shaky dataset and on the benchmark ICDAR 2015 dataset. From the experiments, it can be seen that the proposed method is superior to existing methods.
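To make the pipeline described above concrete, the following is a minimal sketch of its unsupervised stages: activation-frame selection by frame similarity, block-level statistical features, K-means text clustering, and an optical-flow consistency score. This is an illustration under stated assumptions, not the authors' implementation: the normalized-correlation similarity measure, the block size, the k=2 text/background setting, the concrete feature definitions, and the helper names (`select_activation_frames`, `block_features`, `text_region_mask`, `flow_consistency`) are all hypothetical choices.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans

def select_activation_frames(frames, thresh=0.9):
    """Keep a frame when it differs enough from the last kept frame
    (normalized correlation is an assumed similarity measure)."""
    keep = [frames[0]]
    for f in frames[1:]:
        sim = cv2.matchTemplate(f, keep[-1], cv2.TM_CCOEFF_NORMED)[0, 0]
        if sim < thresh:
            keep.append(f)
    return keep

def block_features(gray, block=16):
    """Per-block statistical features over a grayscale activation frame."""
    h, w = gray.shape
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    edges = cv2.Canny(gray, 100, 200)
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = gray[y:y + block, x:x + block].astype(np.float32)
            # Orientation: mean gradient direction (assumed proxy).
            orient = np.arctan2(gy[y:y + block, x:x + block],
                                gx[y:y + block, x:x + block]).mean()
            # Spectral: mean high-frequency FFT magnitude of the block.
            spec = np.abs(np.fft.fft2(patch))[1:, 1:].mean()
            # Edge density: fraction of Canny edge pixels in the block.
            edge_d = edges[y:y + block, x:x + block].mean() / 255.0
            # Intensity: mean and spread of gray levels.
            feats.append([orient, spec, edge_d, patch.mean(), patch.std()])
            coords.append((x, y))
    return np.array(feats, dtype=np.float32), coords

def text_region_mask(gray, block=16, k=2):
    """Cluster block features with K-means and keep, as the text cluster,
    the one with the larger mean edge density (an assumed heuristic)."""
    feats, coords = block_features(gray, block)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    text_label = max(range(k), key=lambda c: feats[labels == c, 2].mean())
    mask = np.zeros(gray.shape, np.uint8)
    for (x, y), lab in zip(coords, labels):
        if lab == text_label:
            mask[y:y + block, x:x + block] = 255
    return mask

def flow_consistency(prev_gray, gray, mask):
    """Score a candidate region by the coherence of its dense optical flow,
    a stand-in for the spatial/motion consistency cues in the abstract."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    region = flow[mask > 0]
    # Low variance of flow vectors inside the region = coherent motion.
    return float(-region.var()) if region.size else 0.0
```

In this sketch, regions whose consistency score persists across activation frames would then be accumulated by temporal voting and pruned with non-maximum suppression; those two steps are omitted here for brevity.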

First Page

162

Last Page

179

DOI

10.1007/978-3-031-70549-6_10

Publication Date

1-1-2024
