Accurate word alignment induction for Bangla-Odia using the EM algorithm

Document Type

Book Chapter

Publication Title

Information and Knowledge Systems

Abstract

This study uses the Expectation-Maximization (EM) algorithm to demonstrate highly accurate word alignment between Bangla and Odia. The full mathematical procedure is developed and illustrated through a series of examples drawn from selected Bangla-Odia words. The EM approach works together with an "argmax" function, which tracks the mappings between words of the source and target languages in parallel sentences and selects the mapping with the highest probability. This makes it possible to compute the lexical link between the words of two parallel sentences, and the results show which target-language word is aligned with which source-language word. By iteratively re-estimating the probability values, that is, by looping the EM algorithm, one can find the maximum likelihood estimate (MLE) or maximum a posteriori (MAP) estimate of the parameters of a probabilistic model that depends on unobserved latent variables. Because it requires numerous machine learning techniques and extensive mathematical modelling, lexical alignment for translation is one of the hardest problems to solve. In light of these challenges, we attempt to clarify several lexical issues that arise when assessing bilingual literature translated from Bangla (the source language) into Odia (the target language). The biggest obstacle in word alignment is the "word divergence" or "lexical divergence" problem. The EM algorithm alone cannot solve it; the only way to do so is to use a bilingual dictionary, also called a lexical database, that is grounded in scientific study and thoroughly tested mathematically. Such bilingual dictionaries or lexical databases are widely used to address word or lexical divergence at the phrase level. The most problematic case is identifying single-word units in the source text that are rendered as multiword units in the target text.
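The EM-plus-argmax procedure summarized in the abstract can be illustrated with a minimal Python sketch. It assumes an IBM Model 1-style lexical translation model t(target | source) with uniform initialization and no NULL source word; the abstract does not name the model, so this shows the general technique rather than the chapter's own implementation. The function names `train_ibm1` and `align`, and the tokens `b1` to `b4` (standing in for Bangla words) and `o1` to `o4` (standing in for Odia words), are hypothetical placeholders, not data from the chapter.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=20):
    """Estimate lexical translation probabilities t(tgt | src) with EM.

    Assumption: an IBM Model 1-style model with no NULL source token.
    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (tgt_word, src_word) -> probability.
    """
    tgt_vocab = {w for _, tgt in corpus for w in tgt}
    # Initialise t(tgt | src) uniformly over the target vocabulary.
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count c(tgt, src)
        total = defaultdict(float)  # expected count c(src)
        # E-step: spread each target word's alignment probability
        # over the source words of the same sentence pair.
        for src, tgt in corpus:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: re-normalise expected counts into probabilities,
        # so that sum over tgt of t(tgt | src) == 1 for every src.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)


def align(src, tgt, t):
    """Link each target word to argmax over src of t(tgt | src)."""
    links = []
    for f in tgt:
        best = max(src, key=lambda e: t.get((f, e), 0.0))
        links.append((best, f))
    return links


if __name__ == "__main__":
    # Hypothetical placeholder tokens, not the chapter's data.
    corpus = [
        (["b1", "b2"], ["o1", "o2"]),
        (["b1", "b3"], ["o1", "o3"]),
        (["b4", "b3"], ["o4", "o3"]),
    ]
    t = train_ibm1(corpus, iterations=20)
    for src, tgt in corpus:
        print(align(src, tgt, t))
```

Each target word is linked to exactly one source word, so a single source word may receive several target links. The bilingual-dictionary treatment of lexical divergence that the abstract describes is not modelled in this sketch.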

First Page

1

Last Page

16

Publication Date

12-12-2023
