Hierarchical Approach to Document Classification of 20 Newsgroup Dataset.
Date of Submission
December 2016
Date of Award
Winter 12-12-2017
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Parui, Swapan Kumar (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
The aim of the dissertation is to develop an algorithm that classifies the documents of the 20 Newsgroup data set into their proper classes. Documents are represented as vectors of uni-gram counts (the number of times each uni-gram occurs in a document), and different methods are applied to this representation to find the one that gives the best classification accuracy. A hierarchical structure for classification is followed, and different methods are compared within it to see which gives the best accuracy. Different ways of detecting outliers in the training set are also applied, and the detected outliers are removed from the training set to improve accuracy.
Control Number
ISI-DISS-2016-335
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6492
Recommended Citation
Dhamija, Kanishka, "Hierarchical Approach to Document Classification of 20 Newsgroup Dataset." (2017). Master’s Dissertations. 156.
https://digitalcommons.isical.ac.in/masters-dissertations/156
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843176