Journal Articles

Text categorization: past and present

Article Type

Research Article

Publication Title

Artificial Intelligence Review

Abstract

Automatic text categorization is the task of sorting text documents into pre-defined categories using machine learning algorithms. It is one of the most important approaches to organizing and making use of the large volume of information that exists in unstructured form. Text categorization is now an extensively researched field within text mining and natural language processing. Word sense and the semantic relationships among terms, text documents, and categories are essential for enhancing categorization performance. Various surveys on text categorization are already available; they cover techniques for various text representation schemes to some extent but do not include several approaches that have been explored in text categorization beyond the standard techniques. Here, an exhaustive analysis of different text categorization approaches beyond the conventional ones has been undertaken. This survey paper explores a wide variety of algorithms used for categorizing text documents and assembles the existing works into three basic fields: conventional methods, fuzzy logic-based methods, and deep learning-based methods. Conventional methods are further categorized into three fields: text categorization using handcrafted features, text categorization using nature-inspired algorithms, and text categorization using graph-based methods. Furthermore, this survey provides a clear idea of the available libraries used for different algorithms, the availability of datasets, and the categorization technologies explored in various Indian and non-Indian languages.

First Page

3007

Last Page

3054

DOI

10.1007/s10462-020-09919-1

Publication Date

4-1-2021

Recommended Citation

Dhar, Ankita; Mukherjee, Himadri; Dash, Niladri Sekhar; and Roy, Kaushik, "Text categorization: past and present" (2021). Journal Articles. 2025.
https://digitalcommons.isical.ac.in/journal-articles/2025

This document is currently not available here.
