Search in Transliterated Domain for Marathi.
Date of Submission
December 2015
Date of Award
Winter 12-12-2016
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Mitra, Mandar (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
A large number of languages, including Arabic, Russian, and most South and Southeast Asian languages, are written in indigenous scripts. However, websites and user-generated content in these languages are often written in Roman script for various reasons. A challenge that search engines face when processing transliterated queries and documents is extensive spelling variation. In this report, we address word-level language identification for Marathi text that is written in Roman script and mixed with English words. We use a method based on character-level n-grams, which gives around 98% mean accuracy under 10-fold cross-validation. In addition to word-level language identification, we handle transliteration of Marathi words from Roman script to Devanagari script; our method achieves around 95% character-level unigram precision. We have also implemented an ad hoc retrieval system for Marathi news data based on transliterated queries. Comparing its results with those obtained using the original Devanagari-script queries, we found that retrieval performance did not deteriorate significantly.
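As an illustration of the character-level n-gram idea mentioned in the abstract, the following is a minimal sketch of word-level language identification. It is not the dissertation's implementation: the abstract does not state which classifier or n-gram orders were used, so the naive Bayes scoring with add-one smoothing, the boundary markers, the class and function names, and the tiny training lists are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(word, n_max=3):
    """Return all character n-grams (n = 1..n_max) of a word.

    The word is lowercased and padded with '^' and '$' so that
    word-initial and word-final patterns become distinct features.
    """
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical naive Bayes classifier over character n-grams."""

    def __init__(self, n_max=3):
        self.n_max = n_max
        self.counts = {}    # label -> Counter of n-gram frequencies
        self.totals = {}    # label -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any label

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            grams = char_ngrams(word, self.n_max)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def score(self, word, label):
        """Add-one smoothed log-likelihood of the word's n-grams under a label."""
        vocab_size = len(self.vocab)
        counts, total = self.counts[label], self.totals[label]
        return sum(math.log((counts[g] + 1) / (total + vocab_size))
                   for g in char_ngrams(word, self.n_max))

    def predict(self, word):
        """Return the label with the highest smoothed log-likelihood."""
        return max(self.counts, key=lambda lab: self.score(word, lab))


# Toy training data (assumed for illustration; not from the dissertation).
marathi_words = ["aahe", "nahi", "mala", "tumhi", "kasa", "pahije"]
english_words = ["the", "search", "query", "script", "with", "this"]

model = NgramLanguageIdentifier(n_max=3).fit(
    marathi_words + english_words,
    ["mr"] * len(marathi_words) + ["en"] * len(english_words),
)

for token in "mala the search result pahije".split():
    print(token, model.predict(token))
```

With a realistic training vocabulary, the same scoring could be evaluated by k-fold cross-validation, as the abstract reports; the toy lists here are far too small for the printed labels to be meaningful on unseen words.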
Control Number
ISI-DISS-2015-319
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6476
Recommended Citation
Patil, Dnyaneshwar Shivshankar, "Search in Transliterated Domain for Marathi." (2016). Master’s Dissertations. 155.
https://digitalcommons.isical.ac.in/masters-dissertations/155
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843175