Date of Submission

5-28-2005

Date of Award

5-28-2006

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Doctoral Thesis

Degree Name

Doctor of Philosophy

Subject Name

Computer Science

Department

Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)

Supervisor

Chaudhuri, Bidyut Baran (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

This thesis presents a systematic study of the recognition of printed and handwritten mathematical expressions. Automatic recognition of printed expressions is an essential requirement for efficient Optical Character Recognition (OCR) of scientific paper documents. Recognition of handwritten expressions, on the other hand, has been attempted in the online environment. Here, expressions are written with an electronic data tablet and stylus, which provide a convenient alternative to the keyboard or mouse for entering data into a computer.
Previous studies dealing with different aspects of expression recognition are reviewed first. Next, the scope of the thesis, its layout, and its contributions are outlined. The discussion of OCR of printed expressions starts with the construction of a representative corpus of scientific documents taken from various branches of science. Methods for groundtruthing the expressions contained in the documents, statistical analysis of the corpus, etc. are presented to facilitate research on expression recognition.
Next, issues related to the recognition of expressions are discussed in detail, chapter by chapter. For printed documents, identification of expression zones is considered so that existing OCR systems can be smoothly upgraded to properly handle documents containing expressions. Such an identification step leaves the main OCR engine undisturbed while a specially designed module handles the recognition of expressions. Online recognition of handwritten expressions assumes that expressions are entered in isolation; therefore, no component for identifying expression zones is needed in the online environment.
Recognition of expressions in either environment (printed or handwritten) involves two major stages: (i) symbol recognition and (ii) interpretation of expression structure. Techniques to realize these stages are presented for both printed and handwritten expressions. All processing modules are methodically tested on a large dataset to attest to the feasibility of the proposed approaches.
Errors encountered in the different modules are analyzed in detail, and a set of error-correcting rules is formulated. The design of the rules exploits several kinds of contextual information to improve the overall expression recognition accuracy for both printed and handwritten expressions. A method for evaluating the performance of an expression recognition system is presented. The proposed performance measure considers several nontrivial issues related to the expression recognition task and provides a single figure of merit for judging the efficiency of a system. The thesis concludes with a summary of its achievements and a discussion of future extensions of the present study.

Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843774

Control Number

ISILib-TH351

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/2146

Included in

Mathematics Commons
