Journal Articles

CESS-A system to categorize bangla web text documents

Ankita Dhar, West Bengal State University
Himadri Mukherjee, West Bengal State University
Niladri Sekhar Dash, Indian Statistical Institute, Kolkata
Kaushik Roy, West Bengal State University

Article Type

Research Article

Publication Title

ACM Transactions on Asian and Low-Resource Language Information Processing

Abstract

Technology has evolved remarkably, leading to an exponential increase in the availability of digital text documents from disparate domains over the Internet. This makes information retrieval a time- and resource-consuming task. A system that can categorize such documents by domain can therefore help users obtain the required information with relative ease and also reduce the workload of search engines. This article presents a text categorization system (CESS) that categorizes text documents using newly proposed hybrid features that combine term frequency-inverse document frequency-inverse class frequency and modified chi-square methods. Experiments were performed on real-world Bangla documents from eight domains comprising 24,29,857 tokens, and the highest accuracy of 99.91% was obtained with multilayer perceptron-based classification. The system was also tested on the Reuters-21578 and 20 Newsgroups datasets, obtaining accuracies of 97.29% and 94.67%, respectively, to demonstrate its language-independent nature.

DOI

10.1145/3398070

Publication Date

8-1-2020

Recommended Citation

Dhar, Ankita; Mukherjee, Himadri; Dash, Niladri Sekhar; and Roy, Kaushik, "CESS-A system to categorize bangla web text documents" (2020). Journal Articles. 175.
https://digitalcommons.isical.ac.in/journal-articles/175
