NeuMorph: Neural morphological tagging for low-resource languages - An experimental study for Indic languages

Article Type

Research Article

Publication Title

ACM Transactions on Asian and Low-Resource Language Information Processing

Abstract

This article deals with morphological tagging for low-resource languages. Five Indic languages are taken as reference, and two severely resource-poor languages, Coptic and Kurmanji, are also considered. The task entails predicting the morphological tag (case, degree, gender, etc.) of an in-context word. We hypothesize that predicting the tag of a word does not always require considering a longer context such as the entire sentence. In this light, we study the usefulness of the convolution operation, which results in a convolutional neural network (CNN)-based morphological tagger. Our proposed model (BLSTM-CNN) yields insightful results when compared with the present state of the art. Following the recent trend, the task is carried out under three different settings: single language, across languages, and across keys. Whereas previous models used only character-level features, we show that adding word vectors to the character-level embeddings significantly improves the performance of all the models. Obtaining high-quality word vectors for resource-poor languages remains a challenge. In that scenario, the proposed character-level BLSTM-CNN proves to be the most effective.
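The abstract's central hypothesis is that a word's tag can often be predicted from a local window rather than the whole sentence. It also reports that word vectors can be combined with character-level features. The following is a minimal pure-Python sketch of those two ideas and is not the authors' implementation.

Assumptions of the sketch:
- All function names are hypothetical.
- The character-level vectors are assumed to be precomputed, for example by a BLSTM over characters. That BLSTM is omitted here.
- The convolution uses a single odd window width with zero ("same") padding and a ReLU.
- The output layer predicts a label for one tag key only.

```python
from typing import List

Vector = List[float]


def concat_features(char_vecs: List[Vector], word_vecs: List[Vector]) -> List[Vector]:
    """Concatenate each token's character-level vector with its word vector (hypothetical helper)."""
    if len(char_vecs) != len(word_vecs):
        raise ValueError("need one character-level vector and one word vector per token")
    return [c + w for c, w in zip(char_vecs, word_vecs)]


def conv1d_same(seq: List[Vector], kernel: List[List[Vector]], bias: List[float]) -> List[Vector]:
    """Zero-padded ('same') 1-D convolution over the token axis, followed by ReLU.

    kernel[f][j] is the weight vector of filter f at window offset j.
    With window width k, the output at token i depends only on tokens i - k//2 .. i + k//2.
    """
    k = len(kernel[0])
    if k % 2 == 0:
        raise ValueError("window width must be odd")
    half = k // 2
    zero = [0.0] * len(seq[0])
    out = []
    for i in range(len(seq)):
        feats = []
        for f, filt in enumerate(kernel):
            s = bias[f]
            for j in range(k):
                pos = i - half + j
                x = seq[pos] if 0 <= pos < len(seq) else zero  # zero padding at sentence edges
                s += sum(w * v for w, v in zip(filt[j], x))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def predict_tags(feats: List[Vector], weights: List[Vector], tag_names: List[str]) -> List[str]:
    """Linear scoring plus argmax per token, for a single tag key (hypothetical output layer)."""
    preds = []
    for h in feats:
        scores = [sum(w * x for w, x in zip(row, h)) for row in weights]
        preds.append(tag_names[scores.index(max(scores))])
    return preds


if __name__ == "__main__":
    # Toy 5-token sentence: 2-d "character-level" vectors plus 1-d "word" vectors.
    char = [[float(i), 1.0] for i in range(5)]
    word = [[0.5] for _ in range(5)]
    seq = concat_features(char, word)
    # One filter of width 3 that sums the first feature over the window.
    kernel = [[[1.0, 0.0, 0.0]] * 3]
    print(conv1d_same(seq, kernel, [0.0]))
```

With a window width of 3, changing the last token of the toy sentence alters only the outputs at the last two positions. This is the locality that motivates a CNN over a sentence-wide recurrent context.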

DOI

10.1145/3342354

Publication Date

10-1-2019