Spotting of keyword directly in run-length compressed documents

Document Type

Conference Article

Publication Title

Advances in Intelligent Systems and Computing

Abstract

With the rapid growth of digital libraries, e-governance and Internet applications, huge volume of documents are being generated, communicated and archived in the compressed form to provide better storage and transfer efficiencies. In such a large repository of compressed documents, the frequently used operations like keyword searching and document retrieval have to be carried out after decompression and subsequently with the help of an OCR. Therefore developing keyword spotting technique directly in compressed documents is a potential and challenging research issue. In this backdrop, the paper presents a novel approach for searching keywords directly in run-length compressed documents without going through the stages of decompression and OCRing. The proposed method extracts simple and straightforward font size invariant features like number of run transitions and correlation of runs over the selected regions of test words, and matches with that of the user queried word. In the subsequent step, based on the matching score, the keywords are spotted in the compressed document. The idea of decompression-less and OCR-less word spotting directly in compressed documents is the major contribution of this paper. The method is experimented on a data set of compressed documents and the preliminary results obtained validate the proposed idea.

First Page

367

Last Page

376

DOI

10.1007/978-981-10-2104-6_33

Publication Date

1-1-2017

This document is currently not available here.

Share

COinS