Conference Articles

Classification of text documents through distance measurement: An experiment with multi-domain Bangla text documents

Ankita Dhar, West Bengal State University
Niladri Sekhar Dash, Indian Statistical Institute, Kolkata
Kaushik Roy, West Bengal State University

Document Type

Conference Article

Publication Title

Proceedings - 2017 3rd International Conference on Advances in Computing, Communication and Automation (Fall), ICACCA 2017

Abstract

This paper explores the use of two similarity measures for categorizing Bangla text documents into their respective domains. Cosine Similarity and Euclidean Distance have been used as the similarity measures on a vector space model based on TF-IDF features. The domains of interest are Business, State, Medical, Sports, and Science texts, which are used as inputs for analysis. Recognition accuracies of 95.80% for Cosine Similarity and 95.20% for Euclidean Distance are achieved on 1000 text documents. This confirms that an unsupervised feature extraction technique may be treated as one of the useful methods for automatic text classification in Bangla (and in other Indian language documents), if input texts are not pre-classified based on certain predefined linguistic or statistical parameters. Comparative experiments on the dataset using several classification algorithms show that the distance measures perform better than the other classifiers.
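The abstract does not specify how documents are assigned to domains once the TF-IDF vectors are built. The following minimal sketch makes several assumptions. Documents are whitespace-tokenized. Term frequency is the raw count divided by document length, and IDF is log(N / df). Each domain is represented by the mean TF-IDF vector (centroid) of its training documents. A test document then goes to the domain with the highest Cosine Similarity or the smallest Euclidean Distance. The helper names (`fit_idf`, `tfidf`, `centroids`, `classify`) and the toy English tokens are hypothetical placeholders. They are not the authors' preprocessing, features, or Bangla data.

```python
import math
from collections import Counter, defaultdict


def fit_idf(train_docs):
    """IDF table from tokenized training documents: idf(t) = log(N / df(t)) (assumed variant)."""
    n = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are dropped."""
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def euclidean(a, b):
    """Euclidean distance between two sparse vectors over the union of their terms."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in keys))


def centroids(vectors, labels):
    """Mean TF-IDF vector per domain label (hypothetical nearest-centroid setup)."""
    sums, counts = defaultdict(Counter), Counter(labels)
    for vec, label in zip(vectors, labels):
        sums[label].update(vec)  # adds float weights term by term
    return {lab: {t: w / counts[lab] for t, w in s.items()} for lab, s in sums.items()}


def classify(doc, idf, cents, measure="cosine"):
    """Highest cosine similarity wins; for Euclidean, the smallest distance wins."""
    vec = tfidf(doc, idf)
    if measure == "cosine":
        return max(cents, key=lambda lab: cosine(vec, cents[lab]))
    return min(cents, key=lambda lab: euclidean(vec, cents[lab]))


# Toy placeholder corpus (not the paper's data): two domains, two documents each.
train = [
    ("sports", "match goal team win".split()),
    ("sports", "team player goal score".split()),
    ("medical", "doctor patient hospital medicine".split()),
    ("medical", "patient disease doctor treatment".split()),
]
labels, docs = zip(*train)
idf = fit_idf(docs)
cents = centroids([tfidf(d, idf) for d in docs], labels)

query = "goal team player".split()
print(classify(query, idf, cents, "cosine"))     # -> sports
print(classify(query, idf, cents, "euclidean"))  # -> sports
```

Note that the two measures rank domains in opposite directions: a larger cosine means more similar, while a larger Euclidean distance means less similar. This is why the sketch uses `max` for one and `min` for the other.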

DOI

10.1109/ICACCAF.2017.8344721

Publication Date

7-2-2017

Recommended Citation

Dhar, Ankita; Dash, Niladri Sekhar; and Roy, Kaushik, "Classification of text documents through distance measurement: An experiment with multi-domain Bangla text documents" (2017). Conference Articles. 214.
https://digitalcommons.isical.ac.in/conf-articles/214

This document is currently not available here.
