Master’s Dissertations

No-Reference Quality Assessment for OCR'D Documents.

Arnab Biswas, Indian Statistical Institute

Date of Submission

December 2016

Date of Award

Winter 12-12-2017

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)

Supervisor

Garain, Utpal (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

This thesis deals with predicting the quality of a text document generated by an OCR system. Because OCR systems are prone to mistakes when converting an imaged document to machine-readable form, this research is concerned with finding errors in OCR'd text and classifying the OCR'd document accordingly. So far, the OCR community has dealt with this problem by one of two methods: (i) manual labeling of the errors or (ii) comparing the OCR'd document against the true text file. Manual counting of errors is infeasible in commercial settings, whereas the true text file is often not available for comparison. This work attempts to develop methods for automatic quality prediction of OCR'd documents under the no-reference condition. Bengali has been taken as the reference language. The use of lexicons and language models has been explored in several directions. Experiments with a large corpus of OCR'd documents show that a lexicon-based approach coupled with a suitable edit distance measure could be a viable method for no-reference quality assessment of OCR'd documents.
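
To make the lexicon-plus-edit-distance idea concrete, here is a minimal Python sketch. It is not the dissertation's implementation: the function names, the unit-cost Levenshtein distance, the length normalisation, and the toy English lexicon are all assumptions chosen for illustration. It also compares strings code point by code point, which for Bengali would likely need to be replaced by comparison over grapheme clusters.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_error(word: str, lexicon: set) -> float:
    """Hypothetical per-word error in [0, 1]: distance to the nearest
    lexicon entry, divided by the word length."""
    if word in lexicon:
        return 0.0
    if not lexicon:
        return 1.0  # nothing to compare against
    nearest = min(edit_distance(word, entry) for entry in lexicon)
    return min(1.0, nearest / max(len(word), 1))


def document_quality(text: str, lexicon: set) -> float:
    """Hypothetical no-reference score in [0, 1]; 1.0 means every
    whitespace-separated token was found in the lexicon."""
    words = text.split()
    if not words:
        return 1.0
    return 1.0 - sum(word_error(w, lexicon) for w in words) / len(words)


# Toy lexicon (assumption); a real system would load a large word list.
lexicon = {"the", "cat", "sat"}
clean_score = document_quality("the cat sat", lexicon)
noisy_score = document_quality("the cot sat", lexicon)
```

In this sketch a document whose tokens all appear in the lexicon scores 1.0, and the score drops as tokens drift further from their nearest lexicon entry. No ground-truth text is consulted at any point, which is what makes the score "no-reference".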

Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843110

Control Number

ISI-DISS-2016-352

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/6512

Recommended Citation

Biswas, Arnab, "No-Reference Quality Assessment for OCR'D Documents." (2017). Master’s Dissertations. 96.
https://digitalcommons.isical.ac.in/masters-dissertations/96

This document is currently not available here.
