Conference Articles

A Hybrid Deep Architecture for Robust Recognition of Text Lines of Degraded Printed Documents

Chandan Biswas, Indian Statistical Institute, Kolkata
Partha Sarathi Mukherjee, Indian Statistical Institute, Kolkata
Koyel Ghosh, Nopany Institute of Health Care Studies
Ujjwal Bhattacharya, Indian Statistical Institute, Kolkata
Swapan K. Parui, Indian Statistical Institute, Kolkata

Document Type

Conference Article

Publication Title

Proceedings - International Conference on Pattern Recognition

Abstract

Over the last 20 years, significant research has been undertaken on automatic recognition of printed documents, including documents in Bangla, a major Indian script. These studies have mainly centered on comparatively well-behaved, good-quality printed documents. However, many large archives include significant volumes of older documents that are so degraded in their present form that they cannot be reasonably transcribed using existing OCR (Optical Character Recognition) approaches. At the same time, automatic recognition of the printed contents of these documents has significant application potential, such as generation of descriptive metadata, full-text searching, and information extraction. The contributions of the present study are (i) creation of a moderately large annotated database of degraded Bangla documents for recognition studies, (ii) development of a Gaussian mixture model based strategy for extracting text components from the complex noisy backgrounds of such documents, and (iii) development of a line-level recognition scheme for degraded Bangla documents. We have studied two different CNN-BLSTM-CTC hybrid architectures for this recognition problem. The winning architecture uses the first convolution layer of the CNN in a fashion similar to the inception model of deep learning methodologies.

First Page

3174

Last Page

3179

DOI

10.1109/ICPR.2018.8545409

Publication Date

11-26-2018

Recommended Citation

Biswas, Chandan; Mukherjee, Partha Sarathi; Ghosh, Koyel; Bhattacharya, Ujjwal; and Parui, Swapan K., "A Hybrid Deep Architecture for Robust Recognition of Text Lines of Degraded Printed Documents" (2018). Conference Articles. 36.
https://digitalcommons.isical.ac.in/conf-articles/36

This document is currently not available here.
