Application of Expectation–Maximization Algorithm to Solve Lexical Divergence in Bangla–Odia Machine Translation

Document Type

Conference Article

Publication Title

Smart Innovation, Systems and Technologies


This paper presents word alignment between Bangla and Odia using the expectation–maximization (EM) algorithm, with high-accuracy output. The full mathematical calculation is worked through on a set of example Bangla–Odia sentences. Together with the ‘argmax function’, the EM algorithm finds the maximum-likelihood probability values that determine the mapping between two or more words of the source-language and target-language sentences. The lexical relationship between the words of two parallel sentences becomes known once these values are computed: they indicate which target-language word is aligned with which source-language word. Because EM is an iterative procedure, the word relationships between the source and target languages are found by repeatedly re-estimating probability values in terms of maximum likelihood estimation (MLE). EM is used to find the MLE or maximum a posteriori (MAP) estimate of the parameters of a probability model that depends on unobserved latent variable(s). Lexical alignment for translation has for years been one of the toughest challenges because it involves several machine learning algorithms and mathematical modeling. With these issues in mind, we describe the nature of the lexical problems that arise when analyzing bilingual translated texts between Bangla (the source language) and Odia (the target language). In word alignment, handling ‘word divergence’ or ‘lexical divergence’ is the main issue and a challenging task. The EM algorithm does not solve it; it can be handled only with a bilingual dictionary, also called a lexical database, which is examined experimentally and tested only mathematically. Problems of word divergence are normally addressed at the phrase level using bilingual dictionaries or lexical databases.
The basic challenge lies in identifying the single-word units of the source text that are converted into multiword units in the target text.
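
The abstract does not specify the exact alignment model, so the following is only a minimal sketch of how EM and argmax can combine for word alignment, assuming an IBM Model 1 style lexical translation table t(target | source). The function names `train_ibm1` and `align` are hypothetical, and the tokens `b1`..`b4` and `o1`..`o4` are placeholders standing in for Bangla and Odia words, not real data from the paper.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """Estimate t(target | source) by EM (IBM Model 1 style, an assumption)."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Uniform start: every source word is equally likely to produce any target word.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (target, source) link
        total = defaultdict(float)  # expected count of each source word
        # E-step: spread each target word over the source words of its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: re-estimate t from the expected counts (maximum likelihood).
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def align(src, tgt, t):
    """Link each target word to the source word with the highest t (argmax)."""
    return [(tw, max(src, key=lambda sw: t.get((tw, sw), 0.0))) for tw in tgt]


# Placeholder parallel corpus: (source tokens, target tokens).
pairs = [
    (["b1", "b2"], ["o1", "o2"]),
    (["b1", "b3"], ["o1", "o3"]),
    (["b4", "b3"], ["o4", "o3"]),
]
t = train_ibm1(pairs)
print(align(["b1", "b2"], ["o1", "o2"], t))
```

Because the argmax is taken separately for each target word, several target words may link to the same source word, which is the one-to-many shape that lexical divergence takes. As the abstract notes, deciding that such links form a single multiword unit still needs a bilingual dictionary or lexical database.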
