Phrase Rank-An Iterative Graph-Based Algorithm for Unsupervised Key-Phrases Extraction.

Date of Submission

December 2015

Date of Award

Winter 12-12-2016

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)

Supervisor

Majumdar, Debapriyo (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

Key-phrases are the set of important phrases in a text, where each phrase provides some unique and important information from the document and the complete set can represent the whole document. Presenting the key-phrases as a summary gives the user a quick insight into the document. Key-phrases can also be utilized for indexing and in other information retrieval applications.
Researchers have proposed both supervised and unsupervised techniques for extracting key-phrases from a text document; we focus our study on unsupervised algorithms only. The state-of-the-art unsupervised algorithms are based on different models: some rely on simple clustering, while others are built on language models or graph-based models. We studied several graph-based algorithms, including Text-Rank, Single-Rank and ExpandRank, and designed a new, enhanced graph-based model to extract key-phrases from a text document.
The earlier graph-based models use an undirected word-graph, which captures only term-term association in the document. Our enhanced model represents more and better relationships between terms by using a directed weighted graph. In addition, our model is designed to construct better multi-word phrases through proper validation checks, and it uses a phrase-graph instead of a word-graph. The purpose of the phrase-graph is to capture the relationships between the phrases of the text document, which is a better representation of the document than the word-graph.
We evaluated our model by comparing the classification efficiency of the key-phrases it generates with that of the phrases generated by the earlier graph-based models, and we also performed a manual evaluation to check whether our algorithm generates better valid multi-word phrases. Both evaluations favour our new model: it achieves 44.05% classification accuracy, compared with 38.60% for the Text-Rank algorithm and 33.48% for the Single-Rank algorithm, which is around a 14% improvement over Text-Rank and a 24% improvement over Single-Rank. The manual evaluation also shows that our new model generates more valid phrases than the Text-Rank and Single-Rank algorithms.
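The improvement percentages quoted in the abstract can be checked from the three reported accuracies. The sketch below is illustrative only: the abstract does not say how "improvement" was computed, so both helper functions (`relative_gain` and `gap_share`, hypothetical names) are assumptions about possible definitions, not the dissertation's stated method.

```python
# Reported classification accuracies from the abstract, in percent.
ACCURACY = {"new model": 44.05, "Text-Rank": 38.60, "Single-Rank": 33.48}


def relative_gain(new: float, baseline: float) -> float:
    """Gain of `new` over `baseline`, as a percentage of the baseline (assumed definition)."""
    return (new - baseline) / baseline * 100.0


def gap_share(new: float, baseline: float) -> float:
    """Gap between `new` and `baseline`, as a percentage of `new` (alternative assumed definition)."""
    return (new - baseline) / new * 100.0


if __name__ == "__main__":
    new = ACCURACY["new model"]
    for name in ("Text-Rank", "Single-Rank"):
        base = ACCURACY[name]
        print(
            f"{name}: absolute gap {new - base:.2f} points, "
            f"relative gain {relative_gain(new, base):.1f}%, "
            f"gap as share of new accuracy {gap_share(new, base):.1f}%"
        )
```

Under the first definition, the gain over Text-Rank is about 14.1%, in line with the quoted figure, while the gain over Single-Rank is about 31.6%. The quoted 24% for Single-Rank matches the second definition (the gap as a share of the new model's accuracy), so the two quoted percentages may rest on different bases.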

Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843060

Control Number

ISI-DISS-2015-322

Creative Commons License

Creative Commons Attribution 4.0 International License

DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/6479

This document is currently not available here.
