Conference Articles

Automatic extraction of text and non-text information directly from compressed document images

Mohammed Javed, NMAM Institute of Technology
P. Nagabhushan, University of Mysore
Bidyut B. Chaudhuri, Indian Statistical Institute, Kolkata

Document Type

Conference Article

Publication Title

Advances in Intelligent Systems and Computing

Abstract

Text, images, audio, and video make up most of the Big Data generated in today’s tech-savvy world. Such data are preferably archived and transmitted in compressed form for storage and transmission efficiency. Although compression makes data efficient to store and transmit, it makes processing expensive: the data must be decompressed every time it is processed, which requires additional computing resources. It would therefore be a novel step if data processing and information extraction could be carried out directly on the compressed data, without decompression. Against this backdrop, the paper demonstrates a novel technique for extracting text and non-text information directly from compressed document images (supported by the TIFF and PDF formats) using correlation-entropy features computed directly from the compressed representation. Experimental results on compressed printed text document images validate the proposed method and show that the text and non-text information extracted from the compressed document is identical to that obtained from the uncompressed representation.

DOI

10.1007/978-3-319-52941-7_5

Publication Date

1-1-2017

Recommended Citation

Javed, Mohammed; Nagabhushan, P.; and Chaudhuri, Bidyut B., "Automatic extraction of text and non-text information directly from compressed document images" (2017). Conference Articles. 347.
https://digitalcommons.isical.ac.in/conf-articles/347

