Large Scale Hierarchical Text Classification.

Date of Submission

December 2013

Date of Award

Winter 12-12-2014

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)


Parui, Swapan Kumar (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

Due to the growing amount of textual data, automatic methods for organizing it are needed. Automatic text classification is one of these methods: it assigns documents to a set of classes based on their textual content. Large-scale multi-label text classification is an emerging field because real web data contain several million samples and about half a million non-exclusive categories. The task is challenging because it is hard for a single algorithm to achieve both good performance and scalability at the same time. Normally, the set of classes is hierarchically structured, but most of today's classification approaches ignore the hierarchical structure, thereby losing valuable human knowledge. This thesis exploits the hierarchical organization of classes to improve accuracy and reduce computational complexity. Experiments are performed on the Track 1 medium-size Wikipedia data set from the ECML/PKDD 2012 Discovery Challenge. A top-down hierarchical classification method is proposed that uses a local classifier at each intermediate node.
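The top-down idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not say which local classifier is used, so a simple nearest-centroid rule over bag-of-words counts is assumed here. The sketch also follows a single path to one leaf, whereas the actual task is multi-label. All names (`TopDownClassifier`, `bow`, `cosine`, the toy hierarchy and documents) are hypothetical.

```python
from collections import Counter, defaultdict
import math


def bow(text):
    """Lowercased bag-of-words term counts (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TopDownClassifier:
    """Top-down routing: each internal node picks one of its children."""

    def __init__(self, children):
        # children maps an internal node to the list of its child nodes.
        self.children = children
        self.parent = {c: p for p, cs in children.items() for c in cs}
        # Summed term counts of all training documents below each node.
        self.centroids = defaultdict(Counter)

    def fit(self, docs, leaf_labels):
        for text, leaf in zip(docs, leaf_labels):
            vec = bow(text)
            node = leaf
            # Credit the document to its leaf and to every ancestor below the root.
            while node in self.parent:
                self.centroids[node].update(vec)
                node = self.parent[node]
        return self

    def predict(self, text, root="root"):
        vec = bow(text)
        node = root
        # The local decision at each internal node only compares that node's
        # children, so far fewer classes are scored than in a flat classifier.
        while node in self.children:
            node = max(self.children[node],
                       key=lambda c: cosine(vec, self.centroids[c]))
        return node


# Toy hierarchy and training data (made up for illustration only).
hierarchy = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}
docs = [
    "quantum particle energy physics",
    "cell gene protein biology",
    "goal striker league football",
    "racket serve court tennis",
]
labels = ["physics", "biology", "football", "tennis"]

clf = TopDownClassifier(hierarchy).fit(docs, labels)
print(clf.predict("the gene encodes a protein"))  # biology
```

In this sketch a document is scored against only the children of the nodes on its path, which is one way a hierarchy can reduce computation. The trade-off is that a wrong decision near the root cannot be corrected further down.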


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

