Segmentation of offline handwritten Arabic text

Document Type

Conference Article

Publication Title

1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017

Abstract

Arabic script is cursive in both printed and handwritten forms, and this intrinsic cursiveness makes segmentation challenging. An Arabic word generally consists of multiple parts known as Parts of Arabic Words (PAWs), or simply sub-words. Sub-words frequently share the same vertical space (they occupy overlapping image columns), which makes segmentation by vertical projection ineffective. Several Arabic letters have annexed parts (diacritics) located above or below the main body of the character, and the relative positions of these annexed parts and the main parts vary considerably in handwritten text. This paper addresses the segmentation of offline handwritten Arabic text down to the character level. First, graph-theoretic modeling is used to extract the connected components of the word image. These components are analyzed in detail to segment the input image into sub-words. Next, the diacritics are removed. Then, a large number of candidate segmentation points are identified using two strategies that employ stroke thickness as a heuristic. Final segmentation points are selected from the candidates by applying a set of rules. Finally, each sub-word is segmented, and the diacritics are reassigned to their respective segments, taking diacritic displacement into account. Experiments were conducted on a set of handwritten Arabic text images drawn from the IFN/ENIT dataset. The results obtained are encouraging.
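The abstract's point that overlapping sub-words defeat vertical projection, while connected components still separate them, can be illustrated with a toy example. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it treats each foreground pixel of a binary image as a graph vertex with edges to its 8 neighbours, extracts connected components by breadth-first search, and compares the result with a plain vertical projection profile. The tiny `SAMPLE` image, the helper names, and the pixel-count threshold `min_main_size` used to set small components aside as diacritic candidates are all hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)

# Hypothetical binary image: 1 = ink, 0 = background.
# Two strokes overlap in columns 2-3; a one-pixel "dot" sits below.
SAMPLE = [
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]


def connected_components(img: List[List[int]]) -> List[Set[Pixel]]:
    """Group 8-connected ink pixels by breadth-first search over the pixel graph."""
    rows, cols = len(img), len(img[0])
    seen: Set[Pixel] = set()
    components: List[Set[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or (r, c) in seen:
                continue
            # Start a new component and flood it through neighbouring ink pixels.
            comp: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(comp)
    return components


def projection_gaps(img: List[List[int]]) -> List[int]:
    """Return the columns whose vertical projection (ink count) is zero."""
    cols = len(img[0])
    return [c for c in range(cols) if sum(row[c] for row in img) == 0]


def split_main_and_diacritics(components: List[Set[Pixel]], min_main_size: int):
    """Crude hypothetical heuristic: components smaller than the threshold are diacritic candidates."""
    main = [comp for comp in components if len(comp) >= min_main_size]
    marks = [comp for comp in components if len(comp) < min_main_size]
    return main, marks


if __name__ == "__main__":
    comps = connected_components(SAMPLE)
    print(len(comps), "components; zero-ink columns:", projection_gaps(SAMPLE))
    # prints: 3 components; zero-ink columns: []
    main, marks = split_main_and_diacritics(comps, min_main_size=3)
    print(len(main), "main parts,", len(marks), "diacritic candidate(s)")
```

In the sample, the two strokes overlap in columns 2 and 3, so the projection profile has no empty column at which to cut, whereas the component analysis still separates them. A fixed pixel-count threshold is only a rough stand-in for the component analysis the paper describes; real diacritics vary in size and position, which is why the abstract treats their removal and reassignment as separate steps.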

First Page

41

Last Page

45

DOI

10.1109/ASAR.2017.8067757

Publication Date

10-13-2017

