Date of Submission


Date of Award


Institute Name (Publisher)

Indian Statistical Institute

Document Type

Doctoral Thesis

Degree Name

Doctor of Philosophy

Subject Name

Computer Science


Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)


Chaudhuri, Bidyut Baran (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

This thesis presents a systematic study of the recognition of printed and handwritten mathematical expressions. Automatic recognition of printed expressions is an essential requirement for efficient Optical Character Recognition (OCR) of scientific documents. Recognition of handwritten expressions, on the other hand, has been attempted in an online environment. Here, expressions are written with an electronic data tablet and stylus, which provides a convenient alternative to the keyboard or mouse for entering data into a computer.

Previous studies dealing with different aspects of expression recognition are first reviewed. Next, the scope of the present thesis, its layout and its contributions are outlined. The discussion of OCR of printed expressions starts with the construction of a representative corpus of scientific documents taken from various branches of science. Methods for groundtruthing the expressions contained in these documents, a statistical analysis of the corpus, and related material are presented to facilitate research on expression recognition.

Next, issues related to the recognition of expressions are discussed in detail, chapter by chapter. For printed documents, identification of expression zones is considered so that existing OCR systems can be smoothly upgraded to handle documents containing expressions. Such an identification step keeps the main OCR engine undisturbed, while a specially designed module handles the recognition of expressions. Online recognition of handwritten expressions assumes that expressions are entered in isolation; therefore, no component for identifying expression zones is needed in the online environment.

Recognition of expressions in either setting (printed or handwritten) involves two major stages: (i) symbol recognition and (ii) interpretation of the expression structure. Techniques to realize these stages are presented for both printed and handwritten expressions.

All processing modules are methodically tested on a large dataset to demonstrate the feasibility of the proposed approaches.

Errors encountered in the different modules are analyzed in detail, and a set of error-correcting rules is formulated. The rules exploit several kinds of contextual information to improve the overall expression recognition accuracy for both printed and handwritten expressions. A method for evaluating the performance of an expression recognition system is also presented. The proposed performance measure considers several nontrivial issues related to the expression recognition task and provides a single figure of merit for judging the overall performance of a system. The thesis concludes with a summary of its achievements and a discussion of possible future extensions of the present study.


ProQuest Collection ID:

Control Number


Creative Commons License

Creative Commons Attribution 4.0 International License


Included in

Mathematics Commons