Low Resource Degraded Quality Document Image Binarization - Domain Adaptation is the Way

Document Type

Conference Article

Publication Title

ACM International Conference Proceeding Series


Image binarization plays a crucial role in the automatic analysis of degraded documents from their captured images. The task is often difficult for several reasons, including the high similarity between noisy background pixels and faded foreground pixels. This study focuses on binarizing images of low-resource, degraded-quality documents, using a recently collected set of image samples of several rare, ancient, and severely degraded printed documents in Bangla, the 2nd most popular script of India and the 5th most popular in the world. This new collection, henceforth referred to as 'ISIDDI2', consists of 139 images of old Bangla document pages. Samples of 'ISIDDI', an existing database of degraded Bangla document images, are also used in the study. A novel deep architecture based on attention UNET++ with dilated convolution is proposed for the binarization task. The model is optimized using a distance reciprocal distortion (DRD) loss, a measure aligned with human visual perception. Since binarization ground truth is not available for either 'ISIDDI2' or 'ISIDDI', the proposed network is trained on samples from the DIBCO and H-DIBCO datasets, and an unsupervised domain adaptation (DA) module adapts the architecture to the degradation patterns of the 'ISIDDI2' or 'ISIDDI' samples. The binarization strategy also includes a post-processing operation, based on a modified k-neighbourhood approach, for recovering broken characters. Extensive experiments show that the proposed strategy improves on the binarization output of state-of-the-art methods on both the ISIDDI2 and ISIDDI datasets. Its performance on the well-known DIBCO samples is also satisfactory.
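The abstract names DRD as the training objective but does not give its formulation. As a rough illustration only, the sketch below computes the standard DRD evaluation measure as it is commonly defined for DIBCO-style benchmarks: each wrongly classified pixel is penalized by a distance-weighted count of disagreeing ground-truth pixels in its 5x5 neighbourhood, and the total is divided by the number of non-uniform 8x8 ground-truth blocks. This is not the authors' loss, which would need a differentiable form. The function names, the border handling (cropping the window), and the guard against zero non-uniform blocks are assumptions made for this example.

```python
import numpy as np

def drd_weight_matrix(size=5):
    """Normalized reciprocal-distance weights; the centre weight is 0."""
    c = size // 2
    ii, jj = np.mgrid[0:size, 0:size]
    dist = np.sqrt((ii - c) ** 2 + (jj - c) ** 2)
    w = np.zeros((size, size), dtype=float)
    w[dist > 0] = 1.0 / dist[dist > 0]
    return w / w.sum()

def drd(gt, pred, block=8):
    """DRD between two binary images (values 0/1) of equal shape.

    Assumptions of this sketch: windows are cropped at image borders,
    partial edge blocks count towards NUBN, and NUBN is clamped to 1.
    """
    gt = np.asarray(gt, dtype=np.uint8)
    pred = np.asarray(pred, dtype=np.uint8)
    h, wd = gt.shape
    w = drd_weight_matrix()
    r = w.shape[0] // 2

    # Sum the weighted distortion over every pixel where pred differs from gt.
    total = 0.0
    ys, xs = np.nonzero(gt != pred)
    for y, x in zip(ys, xs):
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        x0, x1 = max(x - r, 0), min(x + r + 1, wd)
        win = gt[y0:y1, x0:x1].astype(float)
        wts = w[y0 - y + r:y1 - y + r, x0 - x + r:x1 - x + r]
        total += float(np.sum(np.abs(win - float(pred[y, x])) * wts))

    # NUBN: ground-truth blocks that are neither all 0 nor all 1.
    nubn = 0
    for by in range(0, h, block):
        for bx in range(0, wd, block):
            blk = gt[by:by + block, bx:bx + block]
            s = int(blk.sum())
            if 0 < s < blk.size:
                nubn += 1

    return total / max(nubn, 1)

# Usage: a 16x16 ground truth with a centred 8x8 square, one flipped pixel.
gt = np.zeros((16, 16), dtype=np.uint8)
gt[4:12, 4:12] = 1
pred = gt.copy()
pred[8, 8] = 0
score = drd(gt, pred)
```

Lower values indicate less visually perceptible distortion; identical images give a score of 0.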



Publication Date

