Large Scale Hierarchical Text Classification.
Date of Submission
December 2013
Date of Award
Winter 12-12-2014
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Parui, Swapan Kumar (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
Due to the growing amount of textual data, automatic methods for organizing the data are needed. Automatic text classification is one of these methods. It automatically assigns documents to a set of classes based on the textual content of the document. Large-scale multi-labeled text classification is an emerging field because real web data have several million samples and about half a million non-exclusive categories. This is a challenging task because it is hard for a single algorithm to achieve both good performance and scalability at the same time. Normally, the set of classes is hierarchically structured, but most of today's classification approaches ignore hierarchical structures, thereby losing valuable human knowledge. This thesis exploits the hierarchical organization of classes to improve accuracy and reduce computational complexity. Experiments are performed on the Track 1 medium-size Wikipedia data set from the ECML/PKDD 2012 Discovery Challenge. A top-down hierarchical classification method is proposed that uses a local classifier at each intermediate node.
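The top-down idea summarized in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the toy hierarchy, the training texts, the word-overlap scoring, and the names `HIERARCHY`, `TRAIN`, `train_local_classifiers`, and `classify` are all assumptions chosen for illustration, and the sketch follows a single path to one leaf rather than handling multiple labels.

```python
from collections import Counter

# Hypothetical toy hierarchy: parent -> children. Leaves have no entry.
HIERARCHY = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}

# Hypothetical training data: (text, leaf label).
TRAIN = [
    ("quantum particle energy field", "physics"),
    ("cell gene protein organism", "biology"),
    ("goal striker league match", "football"),
    ("serve racket set match", "tennis"),
]


def leaves_under(node):
    """Return all leaf labels in the subtree rooted at node."""
    children = HIERARCHY.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves_under(child)]


def train_local_classifiers(train):
    """For each internal node, build one word-count profile per child."""
    models = {}
    for parent, children in HIERARCHY.items():
        profiles = {}
        for child in children:
            covered = set(leaves_under(child))
            counts = Counter()
            for text, label in train:
                if label in covered:
                    counts.update(text.split())
            profiles[child] = counts
        models[parent] = profiles
    return models


def classify(text, models, root="root"):
    """Route a document from the root to a leaf, one local decision per level."""
    tokens = text.split()
    node, path = root, [root]
    while node in models:
        profiles = models[node]
        # Stand-in local classifier: pick the child whose word profile
        # overlaps most with the document (ties go to the first child).
        node = max(profiles, key=lambda c: sum(profiles[c][t] for t in tokens))
        path.append(node)
    return path


models = train_local_classifiers(TRAIN)
print(classify("the striker scored a goal", models))
```

In this sketch each document is compared only with the children of the nodes on its path, not with every leaf category, which is the kind of saving a top-down scheme aims for on a large hierarchy.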
Control Number
ISI-DISS-2013-303
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6460
Recommended Citation
Saha, Gourab, "Large Scale Hierarchical Text Classification." (2014). Master’s Dissertations. 66.
https://digitalcommons.isical.ac.in/masters-dissertations/66
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843079