Use of Gaussian Pyramid for MSER Based Text Extraction from Scene Image.

Date of Submission

December 2014

Date of Award

Winter 12-12-2015

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)


Bhattacharya, Ujjwal (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

The potential of automatic text extraction from scene images as an application keeps growing with advances in technology, especially since smartphones flooded the market. However, it is a difficult problem because of the enormous variations in lighting conditions, the presence of noise, and similar factors in such images. Researchers are now working extensively towards a robust strategy for this purpose. A few standard databases of camera-captured scene images are now publicly available for reporting the performance of each new strategy. Over the past year, we studied several strategies towards the development of a robust method for extracting scene text from such camera-captured outdoor scenes. In this study, we developed a novel scheme for scene text extraction that decomposes the input image into a Gaussian pyramid and obtains Maximally Stable Extremal Regions (MSERs) at each level of the pyramid, so as to use information at different scales. We select only a subset of the MSERs at each level based on a few commonly used rules. We carefully chose a set of weights for combining the selected MSERs from the different levels into a combined set of MSERs. These combined MSERs provide the initial guess of possible text regions in the input image. In the next phase, we compute three features (strong edge, stroke width, and edge gradient) for each MSER in the initial guess and design a rule to discard the non-text MSERs from the combined set. The proposed method is naturally scale-insensitive to a reasonable extent. Moreover, it is script-independent. Experimental results on the ICDAR 2003 competition dataset have been obtained. Additionally, we applied the approach to several locally captured outdoor scene images that contain Bangla and/or Devanagari text. Finally, we compared the performance of the proposed method with three other state-of-the-art approaches.
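The multi-scale idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the 5-tap binomial kernel, the function names (`gaussian_pyramid`, `combine_levels`), the nearest-neighbour upsampling, and the example weights are all assumptions made for illustration, and the abstract does not state the actual weights or selection rules. MSER detection itself is not implemented here; the per-level binary masks stand in for the selected MSERs, which in practice could come from a library such as OpenCV (`cv2.MSER_create`, `cv2.pyrDown`).

```python
# Illustrative sketch only; kernel, names, and weights are assumptions.

KERNEL = (1, 4, 6, 4, 1)  # 5-tap binomial approximation of a Gaussian; sums to 16


def _blur_1d(row):
    """Blur one row with KERNEL, replicating the border pixels."""
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)  # clamp the index at the borders
            acc += w * row[j]
        out.append(acc / 16.0)
    return out


def gaussian_blur(image):
    """Separable blur: rows first, then columns."""
    rows = [_blur_1d(r) for r in image]
    cols = [_blur_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def pyr_down(image):
    """Blur, then keep every second row and column."""
    blurred = gaussian_blur(image)
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(image, levels):
    """Return a list with `levels` entries; entry 0 is the input image."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(pyr_down(pyramid[-1]))
    return pyramid


def combine_levels(masks, weights):
    """Weighted vote of per-level binary masks at level-0 resolution.

    masks[k] is a 0/1 mask at pyramid level k (a stand-in for the MSERs
    selected at that level). Pixel (y, x) at level 0 maps to
    (y >> k, x >> k) at level k, i.e. nearest-neighbour upsampling.
    """
    h, w = len(masks[0]), len(masks[0][0])
    score = [[0.0] * w for _ in range(h)]
    for k, (mask, wt) in enumerate(zip(masks, weights)):
        mh, mw = len(mask), len(mask[0])
        for y in range(h):
            for x in range(w):
                yy = min(y >> k, mh - 1)
                xx = min(x >> k, mw - 1)
                score[y][x] += wt * mask[yy][xx]
    return score


if __name__ == "__main__":
    flat = [[10] * 8 for _ in range(8)]          # constant 8x8 test image
    pyr = gaussian_pyramid(flat, 3)
    print([(len(p), len(p[0])) for p in pyr])    # sizes of the three levels
    # Hypothetical weights; the abstract does not give the real ones.
    print(combine_levels([[[1, 0], [0, 0]], [[1]]], [0.5, 0.25]))
```

Thresholding the combined score map would give a candidate text mask corresponding to the "initial guess" stage; the later filtering by strong edge, stroke width, and edge gradient is outside the scope of this sketch.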



Creative Commons License

Creative Commons Attribution 4.0 International License

