Conference Articles

Spotting of keyword directly in run-length compressed documents

Mohammed Javed, University of Mysore
P. Nagabhushan, University of Mysore
Bidyut Baran Chaudhuri, University of Mysore

Document Type

Conference Article

Publication Title

Advances in Intelligent Systems and Computing

Abstract

With the rapid growth of digital libraries, e-governance and Internet applications, a huge volume of documents is being generated, communicated and archived in compressed form to provide better storage and transfer efficiency. In such a large repository of compressed documents, frequently used operations like keyword searching and document retrieval have to be carried out after decompression and subsequently with the help of OCR. Therefore, developing a keyword spotting technique that works directly on compressed documents is a promising and challenging research issue. Against this backdrop, the paper presents a novel approach for searching keywords directly in run-length compressed documents without going through the stages of decompression and OCR. The proposed method extracts simple, font-size-invariant features, such as the number of run transitions and the correlation of runs over selected regions of the test words, and matches them with those of the user-queried word. In the subsequent step, keywords are spotted in the compressed document based on the matching score. The idea of decompression-less and OCR-less word spotting directly in compressed documents is the major contribution of this paper. The method is tested on a data set of compressed documents, and the preliminary results validate the proposed idea.
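
The abstract does not give the feature definitions in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a word region is available as rows of run lengths (alternating background and foreground runs), counts run transitions per row without reconstructing pixels, resamples that profile to a fixed length as a simple stand-in for font-size invariance, and scores a match with Pearson correlation. All function names and the `bins` parameter are hypothetical.

```python
from math import sqrt


def run_lengths(row):
    """Run-length encode one binary pixel row as alternating runs.

    The first run is always background (0), so it has length 0
    when the row starts with a foreground pixel.
    """
    runs, current, count = [], 0, 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs


def transitions_from_runs(runs):
    """Count colour changes in a row using only its run lengths."""
    non_empty = sum(1 for r in runs if r > 0)
    return max(0, non_empty - 1)


def resample(profile, bins):
    """Nearest-neighbour resampling of a profile to a fixed length."""
    n = len(profile)
    return [profile[i * n // bins] for i in range(bins)]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    cov = sum(x * y for x, y in zip(da, db))
    var_a = sum(x * x for x in da)
    var_b = sum(y * y for y in db)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def match_score(query_runs, test_runs, bins=6):
    """Correlate the row-wise transition profiles of two word regions.

    Both arguments are lists of run-length rows (assumed input format).
    """
    q = resample([transitions_from_runs(r) for r in query_runs], bins)
    t = resample([transitions_from_runs(r) for r in test_runs], bins)
    return pearson(q, t)


def scale2x(image):
    """Double a binary image in both directions (to mimic a larger font)."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


# Toy data: pixels are used only to build the run-length rows.
query = [[0, 1, 0, 1, 0], [0, 1, 1, 1, 0], [0, 1, 0, 1, 0]]
other = [[0, 1, 1, 1, 0], [0, 1, 0, 1, 0], [0, 1, 1, 1, 0]]

query_runs = [run_lengths(r) for r in query]
larger_runs = [run_lengths(r) for r in scale2x(query)]
other_runs = [run_lengths(r) for r in other]

same_word_score = match_score(query_runs, larger_runs)
other_word_score = match_score(query_runs, other_runs)
```

In this toy setup, a word region would be flagged as a match when its score exceeds a chosen threshold; the threshold value is an open design choice that the abstract does not specify.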

First Page

367

Last Page

376

DOI

10.1007/978-981-10-2104-6_33

Publication Date

1-1-2017

Recommended Citation

Javed, Mohammed; Nagabhushan, P.; and Chaudhuri, Bidyut Baran, "Spotting of keyword directly in run-length compressed documents" (2017). Conference Articles. 355.
https://digitalcommons.isical.ac.in/conf-articles/355

This document is currently not available here.
