Bangla handwritten character segmentation using structural features: A supervised and bootstrapping approach
Article Type
Research Article
Publication Title
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract
In this article, we propose a new framework for segmenting Bangla handwritten word images into meaningful individual symbols, or pseudo-characters. Segmentation is not usually treated as a classification problem; in the present study, however, it is formulated as a two-class supervised classification problem. The method employs an SVM classifier to select the segmentation points on the word image on the basis of various structural features. To train the SVM classifier, an unannotated training set is first prepared from candidate segmentation points. The training set is then clustered, and each cluster is labeled with minimal manual intervention. A semi-automatic bootstrapping technique is also employed to enlarge the training set with new samples. The overall architecture is a basic step toward building an annotation system for the segmentation problem, which has not been investigated so far. The experimental results show that our segmentation method is efficient in segmenting not only word images but also handwritten texts. As part of this work, a database of Bangla handwritten word images has also been developed. Considering our data collection method and a statistical analysis of our lexicon set, we claim that the relevant characteristics of an ideal lexicon set are present in our handwritten word image database.
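The bootstrapping idea in the abstract can be illustrated with a minimal self-training sketch. It is not the authors' implementation: the abstract does not specify the features, the clustering step, or the acceptance rule, so everything below is an assumption made for illustration. A nearest-centroid scorer stands in for the SVM so that the example needs only the Python standard library; the function names, the two-dimensional feature vectors, and the `threshold` parameter are all hypothetical. Label 1 means "segmentation point" and label 0 means "not a segmentation point".

```python
from math import dist

def centroids(samples):
    """Mean feature vector per label; samples is a list of (features, label)."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}

def score(model, x):
    """Signed margin: positive when x is closer to the label-1 centroid."""
    return dist(x, model[0]) - dist(x, model[1])

def bootstrap(labeled, unlabeled, threshold=1.0, rounds=5):
    """Grow the labeled set with confident predictions (assumed rule).

    Candidates whose absolute margin stays below `threshold` are returned
    separately so that a human annotator can label them.
    """
    labeled = list(labeled)
    pending = list(unlabeled)
    for _ in range(rounds):
        model = centroids(labeled)          # retrain on the current labeled set
        confident, uncertain = [], []
        for x in pending:
            s = score(model, x)
            if abs(s) >= threshold:
                confident.append((x, 1 if s > 0 else 0))
            else:
                uncertain.append(x)
        if not confident:                   # nothing new was accepted; stop
            break
        labeled.extend(confident)
        pending = uncertain
    return labeled, pending

# Hypothetical structural feature vectors for candidate segmentation points.
seed = [((0.0, 0.0), 0), ((0.2, 0.0), 0), ((4.0, 4.0), 1), ((4.0, 3.8), 1)]
pool = [(0.1, 0.1), (3.9, 4.1), (2.0, 2.0)]

grown, review = bootstrap(seed, pool)
print(len(grown), review)
```

In this toy run the two candidates near the existing classes are accepted automatically, while the ambiguous one is left in `review` for manual labeling, which is one plausible reading of "semi-automatic".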
DOI
10.1145/2890497
Publication Date
4-1-2016
Recommended Citation
Bhowmik, Tapan Kumar; Parui, Swapan Kumar; Roy, Utpal; and Schomaker, Lambert, "Bangla handwritten character segmentation using structural features: A supervised and bootstrapping approach" (2016). Journal Articles. 4113.
https://digitalcommons.isical.ac.in/journal-articles/4113
Comments
Open Access; Green Open Access