Spotting of keyword directly in run-length compressed documents
Document Type
Conference Article
Publication Title
Advances in Intelligent Systems and Computing
Abstract
With the rapid growth of digital libraries, e-governance and Internet applications, huge volumes of documents are being generated, communicated and archived in compressed form to provide better storage and transfer efficiency. In such a large repository of compressed documents, frequently used operations such as keyword searching and document retrieval have to be carried out after decompression and then with the help of OCR. Therefore, developing a keyword spotting technique that works directly on compressed documents is a promising and challenging research problem. Against this backdrop, the paper presents a novel approach for searching keywords directly in run-length compressed documents without going through the stages of decompression and OCR. The proposed method extracts simple, font-size-invariant features, such as the number of run transitions and the correlation of runs over selected regions of the test words, and matches them with those of the user-queried word. Based on the resulting matching score, the keywords are then spotted in the compressed document. The idea of decompression-less and OCR-less word spotting directly in compressed documents is the major contribution of this paper. The method is tested on a data set of compressed documents, and the preliminary results obtained validate the proposed idea.
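The abstract names the features but does not define them precisely. The Python sketch below is a hypothetical illustration of the general idea, not the authors' algorithm. It rests on several assumptions. Each image row is stored as run lengths that alternate white, black, white, and so on, starting with a white run that may be 0. "Run transitions" are counted per row and averaged over a fixed number of horizontal bands (the `bins` parameter) so that words of different heights give comparable profiles. A Pearson correlation between profiles stands in for the paper's matching score, and the 0.9 threshold is arbitrary. All function names are made up for this illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# Assumed RLE convention (hypothetical, not taken from the paper): each image
# row is a list of run lengths alternating white, black, white, ...; the first
# run is white and may be 0 when the row starts with a black pixel.
Row = List[int]


def row_transitions(row: Row) -> int:
    """Count white/black colour changes in one run-length encoded row."""
    runs = row[1:] if row and row[0] == 0 else row
    return max(len(runs) - 1, 0)


def transition_profile(rows: Sequence[Row], bins: int = 8) -> List[float]:
    """Average transition count over `bins` horizontal bands of a word image.

    The transition count of a row depends on how many strokes the scan line
    crosses, not on stroke width. Averaging over a fixed number of bands also
    removes the dependence on word height, so the profile is roughly
    font-size invariant (an assumption of this sketch).
    """
    if not rows:
        raise ValueError("word image has no rows")
    counts = [row_transitions(r) for r in rows]
    h = len(counts)
    profile = []
    for i in range(bins):
        start = i * h // bins
        end = max((i + 1) * h // bins, start + 1)  # every band gets >= 1 row
        band = counts[start:end]
        profile.append(sum(band) / len(band))
    return profile


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def spot(query: Sequence[Row], candidates: Sequence[Sequence[Row]],
         bins: int = 8, threshold: float = 0.9) -> List[Tuple[int, float]]:
    """Return (index, score) for each candidate word scoring >= threshold."""
    q = transition_profile(query, bins)
    hits = []
    for idx, word in enumerate(candidates):
        score = pearson(q, transition_profile(word, bins))
        if score >= threshold:
            hits.append((idx, score))
    return hits


if __name__ == "__main__":
    # Toy 8-row "word", 10 pixels wide (made-up data).
    query = [[10], [2, 2, 2, 2, 2], [2, 6, 2], [2, 2, 2, 2, 2],
             [0, 3, 4, 3], [10], [1] * 10, [2, 2, 2, 2, 2]]
    # The same word at twice the size: each row repeated, each run doubled.
    bigger = [[2 * r for r in row] for row in query for _ in range(2)]
    # A different toy word.
    other = [[10], [10], [0, 10], [4, 2, 4], [4, 2, 4], [4, 2, 4],
             [0, 10], [10]]
    # Only candidate 1 (the rescaled query) should be reported.
    print(spot(query, [other, bigger]))
```

Because the rescaled word reproduces the query's banded profile exactly, it correlates perfectly with the query, while the unrelated toy word scores well below the threshold. A real system would work on segmented word regions from the compressed document and would likely need richer features than this single profile.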
First Page
367
Last Page
376
DOI
10.1007/978-981-10-2104-6_33
Publication Date
1-1-2017
Recommended Citation
Javed, Mohammed; Nagabhushan, P.; and Chaudhuri, Bidyut Baran, "Spotting of keyword directly in run-length compressed documents" (2017). Conference Articles. 355.
https://digitalcommons.isical.ac.in/conf-articles/355