Author (Researcher Name)

Date of Submission

6-2024

Date of Award

6-13-2025

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)

Supervisor

Bhattacharya, Ujjwal

Abstract (Summary of the Work)

In this study, I explored degraded document binarization by reviewing two recent frameworks and implementing their models in PyTorch. The first model is a conditional GAN (cGAN) based on the DE-GAN framework [41], which enhances degraded documents by restoring their quality prior to binarization. The second model employs vision transformers [40] and is inspired by the DocBinFormer architecture, which uses an autoencoder design with both an encoder and a decoder for binarization. Both models were evaluated on the ISI-Bengali dataset. In the experiments, DE-GAN improved document quality by 4% relative to the degraded input, while the vision transformer model achieved a 14% improvement, which suggests that transformer-based approaches are effective for document enhancement and binarization.

Comments

System generated keywords

Control Number

CS2310

DOI

https://dspace.isical.ac.in/items/d031a1ea-41ef-4ae7-9b4f-c02c0b544e77

DSpace Identifier

http://hdl.handle.net/10263/7585
