Software for Maintaining a Conceptual Hierarchy of Objects to Aid in Theta-Mapping in a Natural Language Processing System for Bangla.

Date of Submission

December 1992

Date of Award

Winter 12-12-1993

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)


Chaudhuri, Bidyut Baran (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

Natural Language Processing (NLP) is primarily concerned with making the machine understand the inherent meanings of sentences in a natural language. The concept of inherent meaning, or semantics, of a sentence is however quite fuzzy. Most workers in NLP explain the semantics of a sentence in the following manner. A (simple) sentence describes an action. The action can be thought of as a drama being staged. The role of the name of the drama is played by the verb. Just as in scripts of actual dramas, various roles are specified for a verb. In a sentence, various phrases play the different roles of the verb of the sentence. In the literature (1), the roles described above are called thematic roles or theta roles (θ-roles). Different verbs have different numbers of theta roles. Although linguists have not yet been able to propose a mutually agreeable set of θ-roles, some of the roles are quite standard. For example, almost all verbs have an AGENT θ-role, which signifies who performed the action described by the verb. Similarly, there are some verbs which describe actions where some object gets affected, as in "Ram killed the tiger", where the tiger gets affected. Such verbs signify the affected object by a PATIENT θ-role. The appendix lists the θ-roles considered in the present project. The accepted definition of the semantics of a sentence is a mapping that associates the θ-role(s) of the verb of the sentence with the various phrases. For Bangla (Bengali), as for any language, the phrases that play a θ-role in a sentence are either Noun Phrases (NP) or Prepositional/Post-positional Phrases (PP). In either type of phrase, there is a main noun or pronoun (a pronoun ultimately is just a wild-card for a noun) with other optional structures (words or subwords) decorating the main noun or pronoun.
In the semantic definition of a sentence, the theta-mapping involves only the main or head noun of a phrase.
Since human beings are also required to understand meanings of sentences, it is imperative that they are able to perform theta-mapping in a fast and computationally efficient manner. In order to aid the brain to carry out the computation effectively, speakers of a particular language agree upon certain rules of syntax for the language. The syntactic rules specify how different entities (phrases, words, parts of words called morphemes, etc.) are arranged in a sentence. Generally speaking, and more so when one attempts to emulate human linguistic behaviour as in NLP, only syntactically correct sentences can have semantic descriptions. The rules of syntax are normally such that most of the theta-mapping can be done in the process of verifying the syntactic correctness of a sentence (as in syntax-directed translation in compilers). In computer science, there is a branch of study dealing with algorithms for proving sentences to be syntactically correct. Such algorithms are called parsers.
Natural Language parsers are primarily involved in detecting the verb and the other phrases of a sentence along with the case of each phrase. Case is a (normally) overt syntactic manifestation signified by a) the order of occurrence of the phrase (as in English and similar languages), or b) conjoining between the stem morpheme of the head noun of a phrase and a case-marking inflection (as in Indian languages), or c) a combination of a) and b) above.
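The idea of theta-mapping driven by case can be illustrated with a minimal Python sketch. It is not the dissertation's software: the names `Phrase`, `VERB_FRAMES`, `SUFFIX_CASES`, `case_by_order`, `case_by_suffix` and `theta_map` are all hypothetical, the case labels are assumed, and the one-entry suffix table is a toy stand-in for real morphological analysis. The sketch assigns a case to each head noun either by position (option a) or by a case-marking suffix (option b), then looks up the θ-role that the verb associates with that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    head: str  # head noun or pronoun of the NP/PP
    case: str  # case label assigned during parsing (assumed labels)


# Hypothetical verb entry: which case fills which theta role.
VERB_FRAMES = {
    "kill": {"nominative": "AGENT", "accusative": "PATIENT"},
}

# Toy, transliterated suffix table assumed purely for illustration.
SUFFIX_CASES = {"ke": "accusative"}


def case_by_order(heads):
    """Option a): derive case from the order of occurrence of the phrases."""
    positional_cases = ["nominative", "accusative"]
    return [Phrase(h, c) for h, c in zip(heads, positional_cases)]


def case_by_suffix(word):
    """Option b): split a word into stem and case-marking inflection."""
    for suffix, case in SUFFIX_CASES.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            return Phrase(word[: -len(suffix)], case)
    # No marker found: treat the bare stem as nominative in this sketch.
    return Phrase(word, "nominative")


def theta_map(verb, phrases):
    """Associate each theta role of the verb with the head noun filling it."""
    frame = VERB_FRAMES[verb]
    mapping = {}
    for phrase in phrases:
        role = frame.get(phrase.case)
        if role is not None:
            mapping[role] = phrase.head
    return mapping


by_order = theta_map("kill", case_by_order(["Ram", "tiger"]))
by_suffix = theta_map("kill", [case_by_suffix("Ram"), case_by_suffix("baghke")])
```

For the abstract's example sentence, `by_order` pairs AGENT with "Ram" and PATIENT with "tiger". The suffix-based call reaches the same roles from inflection alone, which is why only the head noun and its case need to be passed to the mapping step.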


Creative Commons License

Creative Commons Attribution 4.0 International License

