In search of a suitable method for disambiguation of word senses in Bengali

Article Type

Research Article

Publication Title

International Journal of Speech Technology

Abstract

The paper presents a study of word sense disambiguation (WSD) in Bengali, one of the less-resourced Indian languages. The work is carried out in two sequential phases. In the first phase, four well-known approaches that are often applied to word sense disambiguation are studied using their traditional methods, and suitable modifications are made and implemented to improve the results. In the second phase, a combined approach is proposed on the basis of the results of these initial experiments.

In the supervised module, four commonly used classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Naïve Bayes (NB), serve as baselines for sense classification. Tested on 13 frequently used ambiguous Bengali words retrieved from a Bengali text corpus, these baselines achieve accuracies of 63.84%, 76.9%, 76.23%, and 80.23%, respectively. Two modifications are then applied to raise accuracy: (a) incorporating lemmatization into the system, which yields 68.30%, 79%, 78.23%, and 82.30%, respectively, and (b) applying bootstrapping to the systems that already include lemmatization, which yields 70.92%, 79.15%, 79.53%, and 83%, respectively.

In the knowledge-based module, the traditional Lesk algorithm is implemented as the baseline and achieves 31% accuracy. Expanding the context of the sentences with the Bengali WordNet (Context Expansion, CE) raises this to 75%.

In the unsupervised module, the baseline strategy achieves 36.2% accuracy. Two modifications are adopted to improve on it: (a) Principal Component Analysis (PCA) over the feature vector, which yields 51.2% accuracy, and (b) context expansion of the sentences using the Bengali WordNet together with PCA, which yields 61% accuracy.

Finally, a combined approach that draws on the effective aspects of all three modules achieves the highest accuracy in sense disambiguation, 92%.
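The knowledge-based module rests on the Lesk idea: choose the sense whose dictionary gloss shares the most words with the surrounding context. Context expansion enlarges that context with related words before the overlap is counted. The sketch below is a minimal, hypothetical illustration of a simplified Lesk with one plausible form of context expansion, not the authors' implementation. The function names, the toy English glosses, and the `related` dictionary (a stand-in for Bengali WordNet lookups such as synonyms) are all assumptions made for illustration; the abstract does not specify how the expansion is done.

```python
from typing import Dict, Iterable, Optional, Set


def expand_context(context: Iterable[str], related: Dict[str, Set[str]]) -> Set[str]:
    """Return the context words plus any words listed as related to them.

    `related` is a hypothetical stand-in for Bengali WordNet lookups
    (for example, synonyms of each context word).
    """
    words = set(context)
    expanded = set(words)
    for word in words:
        expanded |= related.get(word, set())
    return expanded


def simplified_lesk(
    context: Iterable[str],
    senses: Dict[str, Iterable[str]],
    related: Optional[Dict[str, Set[str]]] = None,
) -> Optional[str]:
    """Pick the sense whose gloss shares the most words with the context.

    If `related` is given, the context is expanded first. Ties go to the
    sense listed first in `senses`.
    """
    ctx = expand_context(context, related) if related else set(context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.items():
        overlap = len(ctx & set(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Toy data with English placeholder tokens (hypothetical, not from the paper).
senses = {
    "finance": ["institution", "that", "keeps", "money", "deposit"],
    "river": ["sloping", "land", "beside", "water", "river"],
}
context = ["he", "sat", "by", "the", "stream"]
related = {"stream": {"water", "river", "brook"}}

print(simplified_lesk(context, senses))           # finance (no overlap; first sense wins the tie)
print(simplified_lesk(context, senses, related))  # river
```

Without expansion, no context word appears in either gloss, so the first sense wins by default. Once the related words are added, the river gloss overlaps on two words, which illustrates why enlarging a short context can help a gloss-overlap method.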

First Page

439

Last Page

454

DOI

10.1007/s10772-020-09787-8

Publication Date

6-1-2021
