Journal Articles

An approach for detecting and cleaning of struck-out handwritten text

Bidyut B. Chaudhuri, Indian Statistical Institute, Kolkata
Chandranath Adak, Indian Statistical Institute, Kolkata

Article Type

Research Article

Publication Title

Pattern Recognition

Abstract

This paper deals with the identification and processing of struck-out text in unconstrained offline handwritten document images. If fed to an OCR engine, such text produces nonsensical character strings. Here we present a combined (a) pattern-classification and (b) graph-based method for identifying such text. In (a), a feature-based two-class (normal vs. struck-out text) SVM classifier is used to detect moderate-sized struck-out components. In (b), the skeleton of the text component is treated as a graph, and the strike-out stroke is identified using a constrained shortest-path algorithm. To identify zigzag or wavy strike-outs, all paths are found and some properties of zigzag and wavy lines are utilized. Some other types of strike-out stroke are also detected by modifying the above method. Large multi-word and multi-line struck-outs are segmented into smaller components and treated as above. The detected struck-out text can then be blocked from entering the OCR engine. In another kind of application involving historical documents, page images along with their annotated ground truth are to be generated. In this case the strike-out strokes can be deleted from the words, and the cleaned words then fed to the OCR engine. For this purpose an inpainting-based cleaning approach is employed. We worked on 500 pages of documents and obtained an overall F-measure of 91.56% (91.06%) in English (Bengali) script for struck-out text detection. For strike-out stroke identification and deletion, the F-measures obtained were 89.65% (89.31%) and 91.16% (89.29%), respectively.
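The graph-based step (b) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the constraint used, so the vertical-step penalty below is an assumption chosen to favour near-horizontal strokes, and the function names, the 8-connected pixel graph, and the toy skeleton are all hypothetical.

```python
import heapq
import math


def skeleton_pixels(rows):
    """Collect (row, col) coordinates of '#' pixels from a text-drawn skeleton."""
    return {(r, c)
            for r, line in enumerate(rows)
            for c, ch in enumerate(line)
            if ch == "#"}


def constrained_shortest_path(pixels, start, goal, vertical_penalty=1.0):
    """Dijkstra over 8-connected skeleton pixels.

    Assumed constraint (not from the paper): each step costs its Euclidean
    length plus vertical_penalty per row changed, so flat paths are preferred.
    Returns the list of pixels from start to goal, or None if unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        r, c = node
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                if (dr, dc) == (0, 0) or nxt not in pixels:
                    continue
                nd = d + math.hypot(dr, dc) + vertical_penalty * abs(dr)
                if nd < dist.get(nxt, math.inf):
                    dist[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    # Walk the predecessor links back from goal to start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy skeleton: a vertical letter stroke crossed by a horizontal strike-out.
rows = [
    "..#....",
    "..#....",
    "#######",
    "..#....",
    "..#....",
]
pixels = skeleton_pixels(rows)
path = constrained_shortest_path(pixels, (2, 0), (2, 6))
print(path)
```

In this toy case the end points are given by hand; in practice they would have to be chosen from the skeleton itself, for example its leftmost and rightmost pixels, which is again an assumption rather than something the abstract specifies.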

First Page

282

Last Page

294

DOI

10.1016/j.patcog.2016.07.032

Publication Date

1-1-2017

Recommended Citation

Chaudhuri, Bidyut B. and Adak, Chandranath, "An approach for detecting and cleaning of struck-out handwritten text" (2017). Journal Articles. 2807.
https://digitalcommons.isical.ac.in/journal-articles/2807

This document is currently not available here.
