Cleaning of online Bangla free-form handwritten text
Article Type
Research Article
Publication Title
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract
In ordinary free-form handwritten text, repetition (writing the same stroke several times in the same place), over-writing, and crossing out are very common. In this article, we refer to these three types of writing as "noise." Cleaning such noisy text to extract the useful text is an important task for robust recognition. To the best of our knowledge, no work has been reported on cleaning such noise from online text in any script; hence, in this article, we propose an automatic text-cleaning approach for online handwriting recognition. First, crossing-out noise made with straight strike-through lines is detected using the straightness criteria of online strokes. Next, regions containing repetition, over-writing, and other types of crossing out are located using the positional information of the overlapping strokes. Features such as stroke density and self-intersections of strokes are computed from the strokes of the located regions to predict the type of noise, and the predicted type determines how each region is cleaned. For crossing out, all strokes of the crossing-out region are removed. For repetition and over-writing, the strokes written earlier are removed and the latest strokes are kept. Finally, delayed strokes are properly arranged and the word is passed to an online recognizer. Although recognition of free-form handwriting is quite difficult, in this attempt we obtained up to 70.71% improvement in word-recognition accuracy after noise cleaning.
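As a rough illustration of the first two steps described in the abstract, the Python sketch below scores the straightness of an online stroke and checks whether two strokes occupy overlapping positions. The abstract does not define either measure, so everything here is an assumption made for illustration: the chord-to-path-length ratio, the bounding-box overlap test, the 0.95 threshold, and the function names are hypothetical and are not the authors' method.

```python
import math

# A stroke is a list of (x, y) pen positions in writing order.

def straightness(stroke):
    """Ratio of endpoint distance to traced path length (assumed measure).

    Values close to 1.0 indicate a nearly straight stroke.
    """
    path = sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))
    if path == 0:
        return 0.0  # empty, single-point, or stationary stroke
    return math.dist(stroke[0], stroke[-1]) / path

def is_strike_through(stroke, threshold=0.95):
    """Flag a stroke as a candidate straight strike-through line.

    The threshold is a hypothetical value, not taken from the article.
    """
    return len(stroke) >= 2 and straightness(stroke) >= threshold

def bbox(stroke):
    """Axis-aligned bounding box as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if the bounding boxes of strokes a and b intersect."""
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

line = [(0, 0), (5, 0), (10, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
far_away = [(20, 20), (21, 21)]
```

With these sample strokes, `is_strike_through(line)` holds while `is_strike_through(zigzag)` does not, and `boxes_overlap(line, zigzag)` holds while `boxes_overlap(line, far_away)` does not. A fuller pipeline would still need the noise-type prediction and stroke-removal steps that the abstract describes.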
DOI
10.1145/3145538
Publication Date
9-1-2017
Recommended Citation
Bhattacharya, Nilanjana; Pal, Umapada; and Roy, Partha Pratim, "Cleaning of online Bangla free-form handwritten text" (2017). Journal Articles. 2414.
https://digitalcommons.isical.ac.in/journal-articles/2414