Phrase Rank-An Iterative Graph-Based Algorithm for Unsupervised Key-Phrases Extraction.
Date of Submission
December 2015
Date of Award
Winter 12-12-2016
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Majumdar, Debapriyo (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
Key-phrases are the set of important phrases in a text: each phrase conveys some unique and important information from the document, and the complete set can represent the whole document. Presenting the key-phrases as a summary gives the user quick insight into the document. Key-phrases can also be utilized for indexing and in other information retrieval applications.
Researchers have proposed both supervised and unsupervised techniques for extracting key-phrases from a text document; we focus our study on the unsupervised algorithms only. The state-of-the-art unsupervised algorithms are based on different models: some use simple clustering, while others are built on language models or graph-based models. We studied various graph-based algorithms, including TextRank, SingleRank and ExpandRank, and designed a new, enhanced graph-based model to extract key-phrases from a text document.
The earlier graph-based models use an undirected word graph, which captures only term-term association in the document. Our enhanced model represents more and better relationships between terms by using a directed weighted graph. In addition, our model is designed to construct better multi-word phrases through proper validation checks, and it uses a phrase graph instead of a word graph. The purpose of the phrase graph is to capture the relationships between the phrases of the text document, which is a better representation of the document than the word graph.
We evaluated our model by comparing the classification efficiency of the key-phrases it generates with that of the phrases generated by the earlier graph-based models, and we also performed a manual evaluation to check whether our algorithm generates better valid multi-word phrases. Both evaluations favour our new model: it achieves 44.05% classification accuracy, compared with 38.60% for TextRank and 33.48% for SingleRank, which is around a 14% improvement over TextRank and a 24% improvement over SingleRank. The manual evaluation also shows that our new model generates more valid phrases than TextRank and SingleRank.
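To make the idea of ranking over a directed weighted graph more concrete, the following is a minimal sketch of a weighted PageRank-style iteration on a directed co-occurrence graph. It is not the dissertation's Phrase Rank algorithm: the abstract does not specify how edges are directed or weighted, nor how phrases are validated. The function names `build_directed_graph` and `rank_nodes`, the window size, the damping factor and the use of single tokens as nodes are all assumptions made for illustration.

```python
from collections import defaultdict


def build_directed_graph(tokens, window=2):
    """Add an edge u -> v whenever v follows u within `window` tokens.

    Edge weights count how often that ordered co-occurrence is seen.
    (Assumed construction; the dissertation may define edges differently.)
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:
                graph[u][v] += 1.0
    return graph


def rank_nodes(graph, damping=0.85, iterations=50):
    """Run a weighted PageRank-style update on a directed graph."""
    # Collect every node, including those with only incoming edges.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    scores = {node: 1.0 / n for node in nodes}
    out_weight = {u: sum(graph[u].values()) for u in nodes if u in graph}

    for _ in range(iterations):
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        # Score held by nodes without outgoing edges is spread uniformly.
        dangling = sum(scores[u] for u in nodes if out_weight.get(u, 0.0) == 0.0)
        for u in nodes:
            total = out_weight.get(u, 0.0)
            if total == 0.0:
                continue
            # Each node passes its score along its out-edges,
            # in proportion to the edge weights.
            for v, weight in graph[u].items():
                new_scores[v] += damping * scores[u] * weight / total
        for node in nodes:
            new_scores[node] += damping * dangling / n
        scores = new_scores
    return scores


if __name__ == "__main__":
    text = "graph based ranking of phrases uses graph based scores"
    ranked = rank_nodes(build_directed_graph(text.split()))
    for term in sorted(ranked, key=ranked.get, reverse=True)[:3]:
        print(term, round(ranked[term], 4))
```

Using phrases rather than single tokens as nodes would leave the ranking step unchanged; only the graph construction would differ.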
Control Number
ISI-DISS-2015-322
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6479
Recommended Citation
Chhabra, Mayur, "Phrase Rank-An Iterative Graph-Based Algorithm for Unsupervised Key-Phrases Extraction." (2016). Master’s Dissertations. 47.
https://digitalcommons.isical.ac.in/masters-dissertations/47
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843060