Journal Articles

Improving semantic similarity with cross-lingual resources: A study in Bangla — A low resourced language

Rajat Pandit, West Bengal State University
Saptarshi Sengupta, University of Minnesota Duluth
Sudip Kumar Naskar, Jadavpur University
Niladri Sekhar Dash, Indian Statistical Institute, Kolkata
Mohini Mohan Sardar, West Bengal State University

Article Type

Research Article

Publication Title

Informatics

Abstract

Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest, as understanding it can provide insight into how human beings comprehend meaning and make associations between words. However, when the problem is viewed from the standpoint of machine understanding, particularly for under-resourced languages, it becomes a different problem altogether. In this paper, semantic similarity is explored in Bangla, a less-resourced language. To ameliorate the situation in such languages, the most rudimentary method (path-based) and the latest state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented with cross-lingual resources in English, and the results obtained are highly encouraging. Two semantic similarity approaches are explored for Bangla, namely the path-based model and the distributional model, and their cross-lingual counterparts were built using the English WordNet and English corpora. The proposed methods were evaluated on a dataset of 162 Bangla word pairs annotated by five expert raters. The correlation scores between the four metrics (the two monolingual approaches and their two cross-lingual counterparts) and the human evaluation scores demonstrate the marked improvement that the cross-lingual approach brings to semantic similarity calculation for Bangla.
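As a rough illustration of the kind of pipeline the abstract describes, the Python sketch below does three things. It maps Bangla words to English, scores each English pair with either a path-based measure or a distributional (cosine) measure, and then correlates those scores with human ratings. Everything in it is a hypothetical stand-in: the dictionary `BN_TO_EN`, a tiny hand-made is-a hierarchy in place of English WordNet, made-up 3-dimensional vectors in place of Word2Vec embeddings, and the word pairs and human ratings. It is not the authors' implementation. The abstract also does not say how the translation step or the correlation was computed, so Pearson correlation is assumed here.

```python
import math
from collections import deque

# Hypothetical Bangla-to-English dictionary (assumption: the paper's translation step is not described here).
BN_TO_EN = {"গরু": "cow", "ছাগল": "goat", "কুকুর": "dog", "বাস": "bus"}

# Toy is-a hierarchy (child -> parent) standing in for English WordNet.
HYPERNYM = {
    "cow": "bovid", "goat": "bovid", "bovid": "mammal",
    "dog": "canine", "canine": "carnivore", "carnivore": "mammal",
    "mammal": "animal", "animal": "entity",
    "bus": "vehicle", "vehicle": "artifact", "artifact": "entity",
}

# Made-up 3-d vectors standing in for Word2Vec embeddings trained on an English corpus.
VECTORS = {
    "cow": [0.8, 0.2, 0.1], "goat": [0.7, 0.3, 0.1],
    "dog": [0.5, 0.5, 0.0], "bus": [0.0, 0.1, 0.9],
}


def path_length(a, b):
    """Number of is-a edges on the shortest path between a and b (BFS), or None if unconnected."""
    graph = {}
    for child, parent in HYPERNYM.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def path_similarity(a, b):
    """Path-based score in the style of WordNet path similarity: 1 / (1 + shortest path), 0 if no path."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def cosine_similarity(a, b):
    """Distributional score: cosine of the angle between the two word vectors."""
    u, v = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def cross_lingual(sim):
    """Wrap an English similarity function so it accepts Bangla words via BN_TO_EN."""
    return lambda w1, w2: sim(BN_TO_EN[w1], BN_TO_EN[w2])


def pearson(xs, ys):
    """Pearson correlation between metric scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical Bangla word pairs with made-up human ratings (not the paper's 162-pair dataset).
PAIRS = [("গরু", "ছাগল"), ("গরু", "কুকুর"), ("গরু", "বাস")]
HUMAN = [4.0, 2.5, 0.5]

path_scores = [cross_lingual(path_similarity)(a, b) for a, b in PAIRS]
cos_scores = [cross_lingual(cosine_similarity)(a, b) for a, b in PAIRS]
print("path:  ", [round(s, 3) for s in path_scores], "r =", round(pearson(path_scores, HUMAN), 3))
print("cosine:", [round(s, 3) for s in cos_scores], "r =", round(pearson(cos_scores, HUMAN), 3))
```

In practice, the toy helpers could be replaced by NLTK's WordNet interface (`Synset.path_similarity`) and gensim's `KeyedVectors.similarity`. Spearman's rank correlation is an equally common choice for the final comparison with human judgements.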

DOI

10.3390/informatics6020019

Publication Date

5-5-2019

Comments

Open Access, Gold, Green

Recommended Citation

Pandit, Rajat; Sengupta, Saptarshi; Naskar, Sudip Kumar; Dash, Niladri Sekhar; and Sardar, Mohini Mohan, "Improving semantic similarity with cross-lingual resources: A study in Bangla — A low resourced language" (2019). Journal Articles. 843.
https://digitalcommons.isical.ac.in/journal-articles/843
