A New Hybrid Method for Caption and Scene Text Classification in Action Video Images

Article Type

Research Article

Publication Title

International Journal of Pattern Recognition and Artificial Intelligence


Achieving a high recognition rate for text in action video images is challenging because several types of text appear against backgrounds with unpredictable actions. In this paper, we propose a new method for classifying caption text (text edited into the video) and scene text (text that is part of the recorded scene) in video images. This work considers five action classes, namely Yoga, Concert, Teleshopping, Craft, and Recipes, in which both types of text are expected to play a vital role in understanding the video content. The proposed method introduces a new fusion criterion based on Discrete Cosine Transform (DCT) and Fourier coefficients to obtain reconstructed images for caption and scene text. The fusion criterion computes the variances of the coefficients at corresponding pixels of the DCT and Fourier images, and these variances serve as the respective weights. This step yields Reconstructed image-1. Inspired by the ability of Chebyshev-Harmonic-Fourier-Moments (CHFM) to reconstruct a redundancy-free image, we explore CHFM to obtain Reconstructed image-2. The reconstructed images, along with the input image, are passed to a Deep Convolutional Neural Network (DCNN) that classifies the text as caption or scene text. Experimental results on the five action classes and a comparative study with existing methods demonstrate that the proposed method is effective. In addition, recognition results obtained from different methods before and after classification show that recognition performance improves significantly once caption and scene text have been classified.
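The variance-weighted fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not specify the details, so several choices here are assumptions: the variance is taken over a small window around each coefficient (the hypothetical `radius` parameter), the Fourier image is represented by its magnitude, the two variances are normalized so the weights sum to one, and the fused coefficients are mapped back to the image domain with an inverse DCT. The function names `dct_matrix`, `local_variance`, and `fuse_dct_fourier` are illustrative only.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def local_variance(coeffs, radius=1):
    """Variance of the (2*radius+1)^2 window around each coefficient.

    Assumption: the abstract only says "variances for coefficients of
    corresponding pixels"; a local window is one possible reading.
    """
    size = 2 * radius + 1
    padded = np.pad(coeffs, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.var(axis=(-2, -1))


def fuse_dct_fourier(image, radius=1):
    """Fuse DCT and Fourier coefficients using variance-derived weights."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    c_h, c_w = dct_matrix(h), dct_matrix(w)

    dct = c_h @ img @ c_w.T                            # 2-D DCT-II
    fourier = np.abs(np.fft.fft2(img, norm="ortho"))   # Fourier magnitude (assumption)

    var_d = local_variance(dct, radius)
    var_f = local_variance(fourier, radius)

    # Normalize the variances into weights; fall back to equal weights
    # where both variances are zero (assumption).
    total = var_d + var_f
    safe_total = np.where(total > 0, total, 1.0)
    w_d = np.where(total > 0, var_d / safe_total, 0.5)
    w_f = 1.0 - w_d

    fused = w_d * dct + w_f * fourier
    # Assumption: return to the image domain through the inverse DCT.
    reconstructed = c_h.T @ fused @ c_w
    return reconstructed, w_d, w_f


rng = np.random.default_rng(0)
sample = rng.random((8, 12))
recon, weight_dct, weight_fourier = fuse_dct_fourier(sample)
```

In this sketch, a coefficient whose neighbourhood varies more strongly in one transform receives a larger weight from that transform, which is one plausible way to read "the same variances are considered as the respective weights".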




