Hybrid approach for text categorization: A case study with Bangla news article
Article Type
Research Article
Publication Title
Journal of Information Science
Abstract
The rapid growth of online text driven by the Internet has intensified and revived interest in sorting, managing and categorizing documents into their respective domains. This creates a pressing need for an automatic text categorization system that assigns each document to its appropriate domain. This article demonstrates the effectiveness of a hybrid approach that combines text-based and graph-based features. The hybrid approach was applied to 14,373 Bangla articles comprising 5,722,569 tokens, collected from various online news corpora and covering nine categories. The article also reports the results of applying each feature type on its own, to explain how each works in isolation. For classification, the feature sets were passed to Bayesian classifiers, which yielded satisfactory results, reaching 98.73% accuracy with Naïve Bayes Multinomial (NBM). To test the robustness and language independence of the system, experiments were also performed on two popular English datasets.
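The best reported accuracy comes from Naïve Bayes Multinomial (NBM). For readers unfamiliar with it, a minimal sketch of the standard multinomial Naive Bayes decision rule over bag-of-words token counts with Laplace smoothing is shown below. It picks the class c that maximizes log P(c) + Σ_t n_t · log P(t | c), where n_t is the count of token t in the document. This is not the authors' implementation. It does not reproduce the hybrid text-based and graph-based features, which the abstract does not specify. The function names `train_nbm` and `predict_nbm` and the toy English documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nbm(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model (hypothetical helper).

    docs: list of token lists; labels: list of class names; alpha: Laplace smoothing.
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)  # accumulate per-class token frequencies
    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed denominator: total tokens in class c plus alpha per vocabulary entry
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {t: math.log((token_counts[c][t] + alpha) / total) for t in vocab}
    return vocab, log_prior, log_likelihood

def predict_nbm(model, doc):
    """Return the class with the highest log-posterior score (hypothetical helper)."""
    vocab, log_prior, log_likelihood = model
    scores = {}
    for c, prior in log_prior.items():
        score = prior
        for tok in doc:  # each occurrence adds once, so repeated tokens count n_t times
            if tok in vocab:  # tokens unseen during training are ignored
                score += log_likelihood[c][tok]
        scores[c] = score
    return max(scores, key=scores.get)

# Toy, illustrative data only (not from the article's datasets)
train_docs = [
    ["match", "goal", "team", "goal"],
    ["team", "win", "match"],
    ["election", "vote", "party"],
    ["party", "minister", "vote"],
]
train_labels = ["sports", "sports", "politics", "politics"]
model = train_nbm(train_docs, train_labels)
print(predict_nbm(model, ["goal", "team", "score"]))  # sports
print(predict_nbm(model, ["vote", "minister"]))       # politics
```

Working in log space avoids numerical underflow when many small per-token probabilities are multiplied across long news articles.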
First Page
762
Last Page
777
DOI
https://doi.org/10.1177/01655515211027770
Publication Date
6-1-2023
Recommended Citation
Dhar, Ankita; Mukherjee, Himadri; Roy, Kaushik; Santosh, K. C.; and Dash, Niladri Sekhar, "Hybrid approach for text categorization: A case study with Bangla news article" (2023). Journal Articles. 3686.
https://digitalcommons.isical.ac.in/journal-articles/3686