Conference Articles

Application of TF-IDF feature for categorizing documents of online Bangla web text corpus

Ankita Dhar, West Bengal State University
Niladri Sekhar Dash, Indian Statistical Institute, Kolkata
Kaushik Roy, West Bengal State University

Document Type

Conference Article

Publication Title

Advances in Intelligent Systems and Computing

Abstract

This paper explores the use of standard features and machine learning approaches for categorizing Bangla text documents from an online Web corpus. The TF-IDF feature, combined with a dimensionality reduction technique (40% of TF), is used to bring precision to the lexical matching process that identifies the domain category or class of a text document. The approach rests on the general observation that text categorization, or text classification, is the task of automatically sorting a set of text documents into predefined categories. Although a wide range of methods has been applied to English texts, only limited studies have been carried out on Indian language texts, including Bangla. Hence, this work analyzes the efficiency of the above categorization method on Bangla text documents. For verification and validation, Bangla text documents obtained from various online Web sources are normalized and used as inputs for the experiment. The experimental results show that the feature extraction method, together with the LIBLINEAR classification model, achieves quite satisfactory performance on high-dimensional feature sets and relatively noisy document feature vectors.
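
The feature pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not spell out how "40% of TF" is applied, so the sketch assumes it means keeping the top 40% of terms ranked by corpus-wide term frequency. It also assumes the common TF-IDF form, relative term frequency times log(N / document frequency). The function names and the transliterated toy tokens are hypothetical.

```python
import math
from collections import Counter


def select_vocabulary(docs, keep_fraction=0.4):
    """Keep the top `keep_fraction` of terms by corpus-wide term frequency.

    Assumption: this is one possible reading of "40% of TF" in the abstract.
    Ties are broken alphabetically so the result is deterministic.
    """
    totals = Counter(token for doc in docs for token in doc)
    ranked = sorted(totals, key=lambda term: (-totals[term], term))
    keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:keep]


def tfidf_vectors(docs, vocabulary):
    """Return one TF-IDF vector per document over the given vocabulary.

    Assumption: tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            if doc_freq[term] else 0.0
            for term in vocabulary
        ])
    return vectors


# Hypothetical, already tokenized and normalized toy documents.
docs = [
    ["khela", "dal", "khela"],
    ["dal", "bajar"],
    ["bajar", "dam", "dal"],
]
vocabulary = select_vocabulary(docs)
vectors = tfidf_vectors(docs, vocabulary)
```

The resulting vectors could then be passed to a linear classifier such as LIBLINEAR. That step is omitted here because the abstract gives no training configuration.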

DOI

10.1007/978-981-10-7566-7_6

Publication Date

1-1-2018

Recommended Citation

Dhar, Ankita; Dash, Niladri Sekhar; and Roy, Kaushik, "Application of TF-IDF feature for categorizing documents of online Bangla web text corpus" (2018). Conference Articles. 158.
https://digitalcommons.isical.ac.in/conf-articles/158
