Arabic and Latin Scene Text Recognition by Combining Handcrafted and Deep-Learned Features

Article Type

Research Article

Publication Title

Arabian Journal for Science and Engineering


Recognizing text in camera-captured images has been an important research topic for the last few decades. In this paper, we address the recognition of Latin and Arabic text in natural scenes. To this end, we present a comparative study of handcrafted and hybrid features. To extract handcrafted features, we employ a standard bag of features (BoF) model based on a variant of dense scale-invariant feature transform (SIFT) features. Hybrid features, in contrast, are obtained by combining handcrafted features with deep-learned features using a deep sparse auto-encoder (SAE). Specifically, an SAE-based method is applied in the local feature learning step to enhance both the discriminative and the representative abilities of character image features. In the recognition step, we use a hidden Markov model (HMM), which yields a hybrid BoF-SAE-HMM architecture. We extensively evaluate our system on various cropped-word datasets of Arabic and Latin script. Using handcrafted features, the mean recognition accuracy is 70.5% for Arabic and 82.3% for Latin script. Using hybrid features, the mean recognition accuracy reaches 79.2% for Arabic and 91.7% for Latin script. Hence, combining deep-learned features with handcrafted features improves mean recognition accuracy by 8.7 percentage points for Arabic and 9.4 percentage points for Latin script.
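
As a rough illustration of the BoF encoding step mentioned in the abstract, the sketch below quantizes local descriptors against a codebook and builds a normalized histogram. It is a generic, minimal example and not the authors' implementation: the function name `bof_histogram`, the toy 2-D descriptors (standing in for dense SIFT vectors), and the hand-picked codebook (which would normally be learned, for example by k-means) are all assumptions made for illustration. The SAE and HMM stages are not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bof_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and return
    an L1-normalized histogram of codeword counts (hypothetical helper)."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the closest codeword by Euclidean distance.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data (assumed): 2-D points stand in for real SIFT descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]
hist = bof_histogram(descriptors, codebook)
print(hist)
```

In a pipeline of the kind the abstract outlines, a histogram like this would serve as the handcrafted feature vector; how it is combined with SAE-learned features and fed to the HMM is specific to the paper and is not reproduced here.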

