Automated Text Illustration Problem.
Date of Submission
December 2016
Date of Award
Winter 12-12-2017
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Mitra, Mandar (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
Images and text are two different modes of communication. The readability and comprehensibility of a large volume of text can be greatly increased by accompanying it with a sequence of images. An important research question is how a query should be formulated from a segment of text so that relevant images can be retrieved from a data set to illustrate that text. In this project, we propose a number of systems that address this problem by illustrating a text with the most appropriate images. To build these systems, we used the ImageCLEF 2010 and Wikipedia 2016 data sets. In the first phase of the work, a set of children's stories was illustrated with images from the ImageCLEF 2010 data set. In the second phase, the Wikipedia 2016 data dump was used to investigate how our query formulation methods perform on a collection as large as Wikipedia. An image data set was built from this dump, and the textual content of Wikipedia pages was used as queries. The research challenges in developing an automated text illustration system included automatically extracting the concepts to be illustrated from a full text page, exploring how to use these extracted concepts for query representation in order to retrieve a ranked list of images per query, and investigating how to merge the ranked lists obtained for the individual concepts into a single ranked list of candidate relevant images per text page. For query formulation, text segmentation, a relevance feedback method, and a POS-tag-based technique were used. The experiments show that on the ImageCLEF 2010 data set, the query formulation and expansion technique based on relevance feedback performs better than all other approaches. On the Wikipedia 2016 data set, on the other hand, the POS-tag-based method outperforms all other query formulation techniques, mainly because this approach uses only noun phrases to formulate the query. Instead of a verbose query, this method therefore yields a concise query that is still adequate to describe the core content of a large text. A detailed performance analysis of the various systems is also reported, using standard metrics from the information retrieval field.
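The merging step mentioned in the abstract, combining the ranked lists retrieved for the individual concepts into one ranked list per text page, can be illustrated with a minimal sketch. The abstract does not say which fusion method the dissertation uses, so reciprocal rank fusion is swapped in here purely as an assumption. The function name, the smoothing constant `k=60`, the concept names, and the image identifiers are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of image ids into a single ranking.

    Assumption: reciprocal rank fusion stands in for the (unspecified)
    merging method of the dissertation. Each image scores 1 / (k + rank)
    in every list where it appears, and the scores are summed.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, image_id in enumerate(ranking, start=1):
            scores[image_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by image id so the
    # result is deterministic.
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


# Hypothetical ranked lists, one per concept extracted from a text page.
per_concept = {
    "dragon": ["img_07", "img_02", "img_11"],
    "castle": ["img_02", "img_05", "img_07"],
    "forest": ["img_02", "img_09"],
}

fused = reciprocal_rank_fusion(list(per_concept.values()))
print(fused)
```

An image that is retrieved for several concepts of the same page, such as `img_02` here, rises to the top of the fused list, which is one plausible way to favour images that cover more of the page's content.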
Control Number
ISI-DISS-2016-309
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6466
Recommended Citation
Chakraborty, Swarnendu, "Automated Text Illustration Problem." (2017). Master’s Dissertations. 150.
https://digitalcommons.isical.ac.in/masters-dissertations/150
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843169