Low Resource Degraded Quality Document Image Binarization - Domain Adaptation is the Way
Document Type
Conference Article
Publication Title
ACM International Conference Proceeding Series
Abstract
Image binarization plays a crucial role in the automatic analysis of degraded documents from their captured images. However, the task is often difficult for several reasons, including the high similarity between noisy background pixels and faded foreground pixels. The study presented here focuses on binarizing images of low-resource, degraded quality documents, using a recently collected set of image samples of several rare, ancient and severely degraded printed documents in Bangla, the second most popular script in India and the fifth most popular in the world. This new collection of degraded document image samples is henceforth referred to as 'ISIDDI2' and consists of 139 images of old Bangla document pages. Samples of 'ISIDDI', an existing database of degraded Bangla document image samples, have also been used in the present study. A novel deep architecture based on attention UNET++ with dilated convolution operations is proposed for this binarization task. The model is optimized using a distance reciprocal distortion (DRD) loss, which reflects distortion as perceived by human vision. Since binarization ground truth is not available for the samples of either 'ISIDDI2' or 'ISIDDI', the proposed network has been trained on samples of the DIBCO and H-DIBCO datasets, and an unsupervised domain adaptation (DA) module is employed to adapt the proposed architecture to the degradation patterns of 'ISIDDI2' or 'ISIDDI' samples. The proposed binarization strategy also includes a post-processing operation, based on a modified k-neighbourhood approach, for the recovery of broken characters. Results of our extensive experimentation show that the proposed binarization strategy improves on the binarization output of state-of-the-art methods on both the ISIDDI2 and ISIDDI datasets. Its performance on well-known DIBCO samples is also satisfactory.
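For readers unfamiliar with DRD, the following is a minimal sketch of the distance reciprocal distortion measure as it is commonly defined for evaluating binarization output. It is not the authors' loss formulation, which the abstract does not specify. The function names, the 5x5 weight window, the 8x8 block size, edge padding at the image border, and the handling of partial blocks are all assumptions made for illustration.

```python
import numpy as np


def drd_weights(m=5):
    """Normalized m x m weights: reciprocal distance to the centre, centre weight 0."""
    c = m // 2
    yy, xx = np.mgrid[-c:c + 1, -c:c + 1]
    dist = np.hypot(yy, xx)
    w = np.zeros((m, m), dtype=np.float64)
    w[dist > 0] = 1.0 / dist[dist > 0]
    return w / w.sum()


def count_nonuniform_blocks(gt, block=8):
    """Count blocks of the ground truth that contain both 0 and 1 pixels.

    Assumption: partial blocks at the right and bottom borders are counted too.
    """
    h, w = gt.shape
    n = 0
    for r in range(0, h, block):
        for c in range(0, w, block):
            tile = gt[r:r + block, c:c + block]
            if tile.min() != tile.max():
                n += 1
    return n


def drd(gt, pred, m=5, block=8):
    """DRD between a binary ground truth and a binary prediction (values 0 or 1)."""
    gt = np.asarray(gt, dtype=np.uint8)
    pred = np.asarray(pred, dtype=np.uint8)
    if gt.shape != pred.shape:
        raise ValueError("gt and pred must have the same shape")
    w = drd_weights(m)
    c = m // 2
    # Assumption: replicate border pixels so every window is full size.
    padded = np.pad(gt, c, mode="edge").astype(np.float64)
    total = 0.0
    # Each wrongly classified pixel contributes the weighted disagreement
    # between its predicted value and the ground-truth neighbourhood.
    for y, x in zip(*np.nonzero(gt != pred)):
        window = padded[y:y + m, x:x + m]
        total += float(np.sum(np.abs(window - float(pred[y, x])) * w))
    # Guard against a completely uniform ground truth.
    return total / max(count_nonuniform_blocks(gt, block), 1)
```

A lower value indicates less visually perceptible distortion. Flipped pixels surrounded by ground-truth pixels of the opposite value are penalized most, which is why the measure is considered closer to human perception than a plain pixel error count.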
DOI
10.1145/3571600.3571614
Publication Date
12-8-2022
Recommended Citation
Kundu, Ahana and Bhattacharya, Ujjwal, "Low Resource Degraded Quality Document Image Binarization - Domain Adaptation is the Way" (2022). Conference Articles. 370.
https://digitalcommons.isical.ac.in/conf-articles/370