A supervised term selection technique for effective text categorization
Article Type
Research Article
Publication Title
International Journal of Machine Learning and Cybernetics
Abstract
Term selection methods in text categorization reduce the size of the vocabulary to improve the quality of the classifier. Each corpus generally contains many irrelevant and noisy terms, which reduce the effectiveness of text categorization. Term selection therefore focuses on identifying the relevant terms for each category without affecting the quality of text categorization. A new supervised term selection technique is proposed for dimensionality reduction. The method assigns a score to each term of a corpus based on its similarity with all the categories, and the terms of the corpus are then ranked by this score. Subsequently, the significant terms of each category are selected to create the final subset of terms, irrespective of the size of the category. The performance of the proposed technique is compared with that of nine other term selection methods for categorization of several well-known text corpora using kNN and SVM classifiers. The empirical results show that the proposed method performs significantly better than the other methods in most cases across all the corpora.
First Page
877
Last Page
892
DOI
10.1007/s13042-015-0421-y
Publication Date
10-1-2016
Recommended Citation
Basu, Tanmay and Murthy, C. A., "A supervised term selection technique for effective text categorization" (2016). Journal Articles. 4081.
https://digitalcommons.isical.ac.in/journal-articles/4081