A comparative study of two recent word spotting techniques in the run-length compressed domain
Document Type
Conference Article
Publication Title
2017 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2017
Abstract
This paper presents a comparative study of two recent word spotting techniques ([1] and [2]) that operate directly in the run-length compressed domain. The first technique relies on partial decompression and limited use of OCR, while the second is completely decompression-less and OCR-less. Both techniques initially use the word bounding-box ratio feature to match words in the database of compressed document images. For every matching test word, the first model decompresses and OCRs the first two characters and matches them against the corresponding keyword characters. If that match succeeds, the remaining characters of the test word are decompressed and OCRed, and finally matched with the keyword. The second model extracts run-based features, such as the number of run transitions and the corresponding correlation of runs along selected regions of the matching test word, and matches them against those of the specified keyword. The proposed methods work in the Run-Length Compressed Domain (RLCD) and can operate on CCITT Group 3 1D, CCITT Group 3 2D, and CCITT Group 4 2D compressed documents supported by the TIFF and PDF file formats. The efficacy of the proposed models is demonstrated through experimental results and comparative analysis.
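The matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 10% ratio tolerance, the `decode` callback standing in for partial decompression plus OCR, and the use of a Pearson correlation over per-row transition counts are all assumptions made for illustration. Each scan line is assumed to be already available as a list of alternating white/black run lengths.

```python
from math import sqrt
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int]  # (width, height) of a word bounding box, in pixels
Row = List[int]        # run lengths of one scan line, alternating white/black,
                       # starting with a white run (0 if the line starts black)


def ratio_matches(keyword_box: Box, test_box: Box, tolerance: float = 0.1) -> bool:
    """Initial filter: compare width/height ratios (tolerance is an assumed value)."""
    keyword_ratio = keyword_box[0] / keyword_box[1]
    test_ratio = test_box[0] / test_box[1]
    return abs(keyword_ratio - test_ratio) <= tolerance * keyword_ratio


def two_stage_match(keyword: str, decode: Callable[[int, Optional[int]], str]) -> bool:
    """First model: recognise two characters, continue only if they match.

    `decode(start, end)` is a hypothetical callback that decompresses and OCRs
    characters [start, end) of the test word (end=None means "to the end").
    """
    prefix = decode(0, 2)
    if prefix != keyword[:2]:
        return False  # early rejection: the rest is never decompressed
    return prefix + decode(2, None) == keyword


def run_transitions(row: Row) -> int:
    """Number of colour changes in a scan line: non-zero runs minus one."""
    nonzero_runs = sum(1 for run in row if run > 0)
    return max(nonzero_runs - 1, 0)


def transition_profile(rows: Sequence[Row]) -> List[int]:
    """Second model feature: transition count for each selected scan line."""
    return [run_transitions(row) for row in rows]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    # Toy data: three selected scan lines for the keyword and for a test word.
    keyword_rows = [[2, 3, 2, 3, 2], [0, 5, 2, 5], [2, 3, 2, 3, 2]]
    test_rows = [[3, 2, 3, 2, 2], [1, 4, 2, 5], [2, 3, 2, 3, 2]]
    if ratio_matches((120, 30), (118, 30)):
        score = pearson(transition_profile(keyword_rows), transition_profile(test_rows))
        print(f"transition-profile correlation: {score:.3f}")
```

In this sketch the bounding-box ratio acts as a cheap pre-filter, so the more expensive step (partial OCR in the first model, run-feature comparison in the second) is applied only to candidate words that pass it.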
First Page
824
Last Page
830
DOI
10.1109/ICACCI.2017.8125944
Publication Date
11-30-2017
Recommended Citation
Javed, Mohammed; Nagabhushan, P.; and Chaudhuri, Bidyut B., "A comparative study of two recent word spotting techniques in the run-length compressed domain" (2017). Conference Articles. 186.
https://digitalcommons.isical.ac.in/conf-articles/186