Part 1. An Explainer for Information Retrieval Research. Part 2. Open Domain Complex Question Answering.
Date of Submission
December 2020
Date of Award
Winter 12-12-2021
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Mitra, Mandar (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
This thesis is organised in two parts. First, an explainability in Information retrieval (IR) research where we focus on the performance of the IR models. We present a toolkit I-REX to illustrate the performance and explainability of IR systems. It is an interactive interface built on top of Lucene and gives a white box view of any proposed method. It is implemented as a web based and as well as shell based interface to provide an intuitive explanations and performance of IR systems. The baseline retrieval models such as LM, BM25 and DFR, and a set of well-defined features enable debugging the performance of retrieval experiments such as ad-hoc IR or query expansion. Next we worked on an open domain complex factoid Question Answering (QA). Creating annotated data in QA problem requires lot of resources and it is very time consuming. The available datasets are often domain specific and most of the times created for some specific languages. Therefore we mainly focus on answering the questions in an unsupervised way. As a benchmark data we used the data provided by Lu et al. (Quest). It mainly focuses on complex questions which cannot be answered by knowledge graphs (KGs) directly. Our architecture uses corpus signals over the various documents along with the traditional QA pipeline to answer the complex questions. We proposed a set of modified evaluation protocols to overcome some serious pitfalls in the evaluation measure used in Quest. We also compared the performances of our architecture with another neural benchmark model DrQA. Experiments on this benchmark datasets have shown that our model significantly outperforms Quest and DrQA. We find this very encouraging since DrQA is trained on SQuAD, TREC Questions, WebQuestion, WikiMovies while our proposed method is unsupervised in nature.
Control Number
ISI-DISS-2020-19
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/7173
Recommended Citation
Saha, Sourav, "Part 1. An Explainer for Information Retrieval Research. Part 2. Open Domain Complex Question Answering." (2021). Master’s Dissertations. 19.
https://digitalcommons.isical.ac.in/masters-dissertations/19
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28842708