Development of a Windows-Based Software for Building of Bengali Database from Scanned Documents (Soft Copy of Bengali Dictionary).

Date of Submission

December 2008

Date of Award

Winter 12-12-2009

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Applied Statistics Unit (ASU-Kolkata)


Sarkar, Palash (ASU-Kolkata; ISI)

Abstract (Summary of the Work)

To learn a language, we need a dictionary in that language. A dictionary is also helpful when writing a document in any language. In the 21st century we rarely write with pen and paper; documents are typed on a computer. This is true not only for English but for other languages as well. All languages have their own dictionaries, but these are usually printed books. There is now a need for a soft copy of a dictionary in each subject, which can be linked to an editor for that language.

In this dissertation we try to build software that serves as a soft copy of a Bengali dictionary, like the soft copy of the Oxford Dictionary available for English. A small GUI is displayed on the screen, where the user writes the word whose meaning he/she wants to know. When the user presses the 'search' button, the meaning is displayed. The GUI has an option that lets the user hear the pronunciation of the word typed in the text area. The software must also provide cross-linking of words, i.e. if the user wants to see the meaning of another word that appears in the meaning of some word, he/she can find it simply by clicking on that word, with no need to type it again.

To the best of our knowledge, no soft copy of a Bengali dictionary is available to date. That is why we are trying to make such a dictionary.

1.1 Design of Work

To make a dictionary, we first have to build a Bengali word database. Typing all the words from a Bengali dictionary is cumbersome, so our idea is to use software that automates the process. For this reason we use Optical Character Recognition (OCR) software to recognize Bengali words. We expected that this would make preparing a Bengali dictionary database very easy: giving a scanned page of a Bengali dictionary as input to the OCR software yields a soft copy of that page, and in this way the whole dictionary can be obtained as a soft copy on our machine. However, the OCR output is an unstructured TeX representation of the scanned dictionary pages. We have therefore developed a program that, depending on the pattern of the TeX output, puts the OCR output into the database in a structured way (i.e. segmented into different parts). Thus, using OCR, we have been able to build a Bengali database.

The second phase of the project is the GUI, where the user types a Bengali word to find its meaning, which is already in the database, and the meaning is displayed in Bengali fonts. The user writes the word in romanized Bengali. We convert it to the corresponding LaTeX representation and then extract the meaning by applying a search query to the database. The TeX representation of the meaning is then converted to the corresponding Bengali fonts and displayed on the screen. The user can also hear the pronunciation of the word, which is one of the advantages over a hard copy of the dictionary.
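
The two phases described in the abstract (segmenting the OCR output into a database, then looking up a typed word) can be illustrated with a minimal sketch. This is not the dissertation's implementation. The entry pattern (a headword in `\textbf{...}` followed by its meaning), the table layout, the function names, and the sample words are all assumptions made for illustration, since the abstract does not give the actual OCR output format. The romanized-to-TeX conversion is reduced to a placeholder.

```python
import re
import sqlite3

# Hypothetical entry pattern: a bold headword followed by its meaning.
# The real pattern of the OCR output is not given in the abstract.
ENTRY_RE = re.compile(r"^\\textbf\{(?P<head>[^}]+)\}\s*(?P<meaning>.+)$")


def segment_entry(line):
    """Split one TeX-like line into (headword, meaning); None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return match.group("head"), match.group("meaning").strip()


def build_database(lines):
    """Store every line that parses as an entry in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (headword TEXT PRIMARY KEY, meaning TEXT)")
    for line in lines:
        entry = segment_entry(line)
        if entry is not None:
            conn.execute("INSERT OR REPLACE INTO entries VALUES (?, ?)", entry)
    conn.commit()
    return conn


def romanized_to_tex(word):
    """Placeholder (assumption): the romanized input already equals the stored key."""
    return word.strip()


def lookup(conn, romanized_word):
    """Return the stored meaning for a romanized word, or None if absent."""
    row = conn.execute(
        "SELECT meaning FROM entries WHERE headword = ?",
        (romanized_to_tex(romanized_word),),
    ).fetchone()
    return row[0] if row else None


# Invented sample lines standing in for OCR output.
sample = [
    r"\textbf{kalam} lekhanI",
    "a line that is not a dictionary entry",
    r"\textbf{lekhanI} kalam",
]
conn = build_database(sample)
print(lookup(conn, "kalam"))
```

Because each meaning here is itself a headword in the table, calling `lookup` on a returned meaning mimics the cross-linking behaviour described above, where clicking a word in a meaning shows that word's own entry.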



Creative Commons License

Creative Commons Attribution 4.0 International License
