Bilingual Parallel Corpora Mining from the Web for Improving Hindi-English Machine Translation System.
Date of Submission
December 2015
Date of Award
Winter 12-12-2016
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Garain, Utpal (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
Bilingual parallel corpora are used in many applications in Natural Language Processing (NLP) and beyond; machine translation is a well-known example. This work presents a system for mining bilingual parallel corpora from the web. We first collect candidate websites that contain Hindi-English text by supplying the system with the Hindi-English language pair and a list of Hindi words. A Hindi-English parallel corpus is then mined from these candidate websites. Although the system has room for improvement, the resulting parallel corpus is highly accurate. We have not built a very large Hindi-English parallel corpus, because our initial goal was to establish an approach. We show that this corpus improves machine translation, and we give details of the Hindi-English parallel corpora mined from the web so far. The system requires no manual effort and can be used to mine both domain-specific and general-domain bilingual parallel corpora. As the amount of data on the web grows over time, the system can be used to build larger parallel corpora in the future.
Control Number
ISI-DISS-2015-324
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6481
Recommended Citation
Kumar, Ravindra, "Bilingual Parallel Corpora Mining from the Web for Improving Hindi-English Machine Translation System." (2016). Master’s Dissertations. 236.
https://digitalcommons.isical.ac.in/masters-dissertations/236
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843259