Rule Based "Sandhi Bicched" (De-Euphonization) of Bengali.

Date of Submission

December 1992

Date of Award

12-12-1993

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Machine Intelligence Unit (MIU-Kolkata)


Chaudhuri, Bidyut Baran (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

In an inflectional language, words are formed by conjoining more primitive linguistic entities called morphemes. Meanings of words are derived from the meanings of the constituent morphemes. Since most Indian languages are richly inflectional, efficient morphological-level processing is necessary in Natural Language Processing (NLP) systems for Indian languages. The major responsibility of a morphological sub-system in an overall NLP system is to parse a word into its constituent morphemes. Quite often, a stem morpheme may be considered to be formed by combining two simpler stems. Sometimes the stems that combine undergo euphony, or Sandhi; i.e., the final portion of the left stem and the initial portion of the right stem undergo a deformation in spelling. The deformations along the boundary are guided by well-defined morpho-rules. The system described here augments an earlier morphological processor for Bengali by incorporating the ability to perform fragmentation of stems. With this feature, euphonized stems need not be explicitly stored. As a result, a substantial reduction in redundancy is achieved in the storage of the lexical database. The technique involves imparting additional power to finite state automata.
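The idea of undoing a boundary deformation by rule can be illustrated with a small sketch. The code below is not the dissertation's method: it replaces the augmented finite state automata with a plain rule table and set lookup, and the rules, the lexicon, the function name `split_sandhi`, and the ASCII transliteration (capital letters for long vowels, e.g. `A` for long a) are all assumptions made for illustration. The three rule families shown are patterned on classical vowel sandhi (a/A + a/A gives A, a + i gives e, a + u gives o).

```python
# Hypothetical rule table: (surface form at the boundary,
#                           ending restored on the left stem,
#                           beginning restored on the right stem).
RULES = [
    ("A", "a", "a"),
    ("A", "a", "A"),
    ("A", "A", "a"),
    ("A", "A", "A"),
    ("e", "a", "i"),
    ("o", "a", "u"),
]

# Toy lexicon of un-euphonized stems (illustrative, romanized).
LEXICON = {"hima", "Alaya", "vidyA", "deva", "indra", "sUrya", "udaya"}


def split_sandhi(word, lexicon=LEXICON, rules=RULES):
    """Return every (left_stem, right_stem) pair that some rule can
    recover from `word` such that both stems are in the lexicon."""
    results = []
    # Try every interior position as a candidate stem boundary.
    for i in range(1, len(word)):
        for surface, left_end, right_start in rules:
            if word.startswith(surface, i):
                # Undo the deformation on both sides of the boundary.
                left = word[:i] + left_end
                right = right_start + word[i + len(surface):]
                if left in lexicon and right in lexicon:
                    results.append((left, right))
    return results


if __name__ == "__main__":
    for w in ["himAlaya", "vidyAlaya", "devendra", "sUryodaya"]:
        print(w, "->", split_sandhi(w))
```

Because only the simple stems are kept in the lexicon, a compound such as `himAlaya` never needs its own entry; it is recovered on demand as `hima` + `Alaya`. That is the storage saving the abstract refers to, shown here in a deliberately simplified form.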



Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

