A comparative study of two recent word spotting techniques in the run-length compressed domain

Document Type

Conference Article

Publication Title

2017 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2017

Abstract

This paper presents a comparative study of two recent word spotting techniques ([1] and [2]) that operate directly in the run-length compressed domain. The first technique relies on partial decompression and limited use of OCR; the second requires neither decompression nor OCR. Both techniques first use the word bounding box ratio feature to find candidate matches among the words in a database of compressed document images. For each matching test word, the first model decompresses and OCRs the first two characters and compares them with the corresponding keyword characters. If they match, the remaining characters of the test word are decompressed, OCRed, and matched against the keyword. The second model instead extracts run-based features, such as the number of run transitions and the corresponding correlation of runs along selected regions of the matching test word, and matches them against those of the specified keyword. Both methods work in the Run-Length Compressed Domain (RLCD) and can operate on CCITT Group 3 1D, CCITT Group 3 2D, and CCITT Group 4 2D compressed documents, as supported by the TIFF and PDF file formats. The efficacy of the two models is demonstrated through experimental results and comparative analysis.
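
As an illustration only, the following minimal sketch shows how a bounding box ratio pre-filter and a run-transition comparison might look on run-length data. It is not the implementation from [1] or [2]. The row representation (alternating white/black run lengths, starting with white), the function names `ratio_matches`, `run_transitions`, and `profile_correlation`, the tolerance value, and the use of Pearson correlation over per-row transition counts are all assumptions made for this example.

```python
from math import sqrt
from typing import List, Tuple

# Assumed representation: each row is a list of alternating run lengths,
# starting with a white run (which may be 0 if the row starts with black).
Row = List[int]


def ratio_matches(keyword_box: Tuple[int, int],
                  test_box: Tuple[int, int],
                  tolerance: float = 0.1) -> bool:
    """Pre-filter: compare width/height ratios of two word bounding boxes.

    Boxes are (width, height) in pixels. The relative tolerance is a
    hypothetical parameter, not a value taken from the paper.
    """
    kw_ratio = keyword_box[0] / keyword_box[1]
    test_ratio = test_box[0] / test_box[1]
    return abs(kw_ratio - test_ratio) <= tolerance * kw_ratio


def run_transitions(row: Row) -> int:
    """Count colour changes along one row without decompressing it.

    Every boundary between two consecutive non-empty runs is one transition.
    """
    non_empty = [r for r in row if r > 0]
    return max(len(non_empty) - 1, 0)


def profile_correlation(a: List[int], b: List[int]) -> float:
    """Pearson correlation of two equal-length transition profiles.

    Used here as a stand-in for the correlation feature mentioned in the
    abstract; the paper's exact definition may differ.
    """
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Flat profiles: treat identical ones as a perfect match.
        return 1.0 if a == b else 0.0
    return cov / sqrt(var_a * var_b)


# Toy word images: three rows each, seven pixels wide.
keyword_rows = [[2, 3, 2], [0, 7], [1, 1, 3, 1, 1]]
test_rows = [[1, 4, 2], [0, 7], [1, 2, 2, 1, 1]]

keyword_profile = [run_transitions(r) for r in keyword_rows]
test_profile = [run_transitions(r) for r in test_rows]
```

In this sketch a test word would be passed to the more expensive comparison only if `ratio_matches` returns true, which mirrors the two-stage structure described in the abstract: a cheap geometric filter first, followed by feature matching on the surviving candidates.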

First Page

824

Last Page

830

DOI

10.1109/ICACCI.2017.8125944

Publication Date

11-30-2017

