Automated Text Illustration Problem.

Date of Submission

December 2016

Date of Award

Winter 12-12-2017

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)


Mitra, Mandar (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

Image and text are two different modes of communication. The readability and comprehensibility of a long text can be greatly improved by accompanying it with a sequence of images. An important research question is how to formulate a query from a segment of text so that relevant images can be retrieved from a data set to illustrate that text. In this project, we propose a number of systems that address this problem by illustrating a text with the most appropriate images. To build these systems, we used the ImageCLEF 2010 and Wikipedia 2016 data sets. In the first phase of this work, a set of children's stories was illustrated with images from the ImageCLEF 2010 data set. In the second phase, the Wikipedia 2016 data dump was used to investigate how our query formulation methods perform on a data set as large as Wikipedia. An image data set was built from this dump, and the textual content of each Wikipedia page was used as the query.

Developing an automated text illustration system posed several research challenges:
- automatically extracting the concepts to be illustrated from a full page of text;
- using these extracted concepts to represent queries so that a ranked list of images is retrieved per query;
- merging the ranked lists obtained for the individual concepts into a single ranked list of candidate relevant images per text page.

For query formulation, this work uses text segmentation, a relevance feedback method, and a POS-tag-based technique. The experiments show that on the ImageCLEF 2010 data set, query formulation and expansion based on relevance feedback performs better than all other approaches. On the Wikipedia 2016 data set, on the other hand, the POS-tag-based method outperforms all other query formulation techniques, mainly because it uses only noun phrases to formulate the query.
As a result, instead of a verbose query, this method produces a concise, focused query that still describes the core content of a long text. This work also reports detailed performance analyses of the various systems using standard information retrieval metrics.
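The POS-tag-based query formulation described above keeps only the noun phrases of a text. The sketch below illustrates that idea under simplifying assumptions. The input is assumed to be already tagged with Penn Treebank tags; in practice these would come from a POS tagger, and here they are written by hand. A noun phrase is approximated by the pattern "zero or more adjectives followed by one or more nouns". The function names `noun_phrases` and `formulate_query` are hypothetical and are not taken from the dissertation.

```python
from typing import List, Tuple

# Penn Treebank tags treated as noun-phrase members (a simplifying assumption).
ADJ_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return maximal spans matching (adjective)* (noun)+ from tagged tokens."""
    phrases: List[str] = []
    current: List[str] = []
    has_noun = False
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(word.lower())
            has_noun = True
        elif tag in ADJ_TAGS and not has_noun:
            # Adjectives before the first noun belong to the phrase.
            current.append(word.lower())
        else:
            # Any other token, or an adjective after a noun, closes the span.
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
            if tag in ADJ_TAGS:
                # The adjective may open the next phrase.
                current.append(word.lower())
    if has_noun:
        phrases.append(" ".join(current))
    return phrases


def formulate_query(tagged: List[Tuple[str, str]]) -> List[str]:
    """Keep each noun phrase once, in order of first occurrence."""
    return list(dict.fromkeys(noun_phrases(tagged)))


# Hand-tagged example sentence (tags would normally come from a POS tagger).
sentence = [
    ("The", "DT"), ("little", "JJ"), ("red", "JJ"), ("hen", "NN"),
    ("found", "VBD"), ("a", "DT"), ("grain", "NN"), ("of", "IN"),
    ("wheat", "NN"), (".", "."),
]
print(formulate_query(sentence))  # ['little red hen', 'grain', 'wheat']
```

Joining the phrases yields a short keyword query such as "little red hen grain wheat". This is the kind of concise query the abstract contrasts with a verbose one.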
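For the relevance feedback approach, the abstract does not specify the exact formula. One common choice is Rocchio feedback, sketched below as an assumption. The query vector is moved toward the centroid of documents judged relevant and away from the centroid of non-relevant ones. In pseudo-relevance feedback, the "relevant" documents could be the captions of the top-ranked images. The function `rocchio_expand` and the default weights (alpha = 1.0, beta = 0.75, gamma = 0.15) are illustrative, not the dissertation's settings.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Vector = Dict[str, float]  # sparse term -> weight


def rocchio_expand(
    query: Vector,
    relevant: List[Vector],
    nonrelevant: List[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
    top_k: Optional[int] = None,
) -> Vector:
    """Rocchio update: alpha*q + beta*centroid(rel) - gamma*centroid(nonrel)."""
    new: Dict[str, float] = defaultdict(float)
    for term, w in query.items():
        new[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new[term] -= gamma * w / len(nonrelevant)
    # Terms with non-positive weight are dropped from the expanded query.
    expanded = {t: w for t, w in new.items() if w > 0}
    if top_k is not None:
        # Keep the top_k heaviest terms (ties broken alphabetically).
        best = sorted(expanded.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        expanded = dict(best)
    return expanded


q = {"hen": 1.0, "wheat": 1.0}
# Pseudo-relevance feedback: treat captions of top-ranked images as relevant.
rel = [{"hen": 1.0, "farm": 1.0}, {"wheat": 1.0, "farm": 1.0, "field": 1.0}]
print(rocchio_expand(q, rel, [], top_k=3))
# {'hen': 1.375, 'wheat': 1.375, 'farm': 0.75}
```

The expansion adds "farm", a term the original query lacked but that co-occurs in the feedback captions. This illustrates how feedback can enrich a short query.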
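Merging the per-concept ranked lists into a single list per text page can be done in several ways, and the abstract does not say which was used. The sketch below uses reciprocal rank fusion as one plausible stand-in. Each image scores the sum of 1/(k + rank) over the lists in which it appears, with the conventional k = 60. The function name and the image identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]], k: int = 60, top_n: Optional[int] = None
) -> List[str]:
    """Fuse per-concept rankings; each input list is ordered best-first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, image_id in enumerate(ranking, start=1):
            scores[image_id] += 1.0 / (k + rank)
    # Higher fused score first; ties broken by image id for determinism.
    fused = sorted(scores, key=lambda img: (-scores[img], img))
    return fused if top_n is None else fused[:top_n]


# One ranked list of image ids per extracted concept (hypothetical ids).
per_concept = [
    ["img3", "img1", "img7"],
    ["img1", "img4"],
    ["img4", "img1", "img3"],
]
print(reciprocal_rank_fusion(per_concept))  # ['img1', 'img4', 'img3', 'img7']
```

Reciprocal rank fusion uses only ranks, so it needs no normalization of retrieval scores across lists. Score-based fusion, such as summing normalized scores (CombSUM), is an alternative when the scores are comparable.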
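The abstract refers to standard information retrieval metrics without naming them. Precision at rank k and mean average precision (MAP) are typical examples. The sketch below assumes binary relevance judgments for each text page, and its function names are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved images that are relevant."""
    return sum(1 for img in ranking[:k] if img in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant image."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, img in enumerate(ranking, start=1):
        if img in relevant:
            hits += 1
            total += hits / rank
    # Relevant images that were never retrieved contribute zero precision.
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over text pages; each run is a (ranking, relevant set) pair."""
    aps = [average_precision(r, rel) for r, rel in runs]
    return sum(aps) / len(aps) if aps else 0.0


ranking = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_at_k(ranking, relevant, 2))  # 0.5
print(round(average_precision(ranking, relevant), 4))  # 0.5556
```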



Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.


This document is currently not available here.