Journal Articles

On some transformations of high dimension, low sample size data for nearest neighbor classification

Subhajit Dutta, Indian Institute of Technology Kanpur
Anil K. Ghosh, Indian Statistical Institute, Kolkata

Article Type

Research Article

Publication Title

Machine Learning

Abstract

For data with more variables than observations, phenomena like concentration of pairwise distances, violation of cluster assumptions and presence of hubness often have adverse effects on the performance of the classic nearest neighbor classifier. To cope with such problems, dimension reduction techniques such as those based on random linear projections and principal component directions have been proposed in the literature. In this article, we construct nonlinear transformations of the data based on inter-point distances, which also reduce the data dimension. More importantly, for such high dimension, low sample size data, they enhance separability among the competing classes in the transformed space. When the classic nearest neighbor classifier is used on the transformed data, it usually yields lower misclassification rates. Under appropriate regularity conditions, we derive asymptotic results on misclassification probabilities of nearest neighbor classifiers based on the l2 norm and the lp norms (with p∈(0,1]) in the transformed space, when the training sample size remains fixed and the dimension of the data grows to infinity. The strength of the proposed transformations in the classification context is demonstrated by analyzing several simulated and benchmark data sets.
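The abstract does not spell out the transformations, so the following is only an illustrative sketch of the general idea, not the authors' construction. It assumes one simple distance-based map: each observation is replaced by its average scaled Euclidean distance to the training points of each class, and a 1-nearest-neighbor rule is then applied to those features. The function names (`scaled_distances`, `average_distance_features`, `one_nn_predict`) and the toy data (two Gaussian classes with equal means and different scales) are assumptions made for illustration.

```python
import numpy as np


def scaled_distances(A, B):
    """Euclidean distances between rows of A and rows of B, divided by sqrt(dimension)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.linalg.norm(diff, axis=2) / np.sqrt(A.shape[1])


def average_distance_features(X, X_train, y_train, is_train=False):
    """Map each row of X to its average scaled distance to each training class.

    Hypothetical helper: one simple distance-based transformation, assumed here
    for illustration. Set is_train=True when X is X_train itself.
    """
    classes = np.unique(y_train)
    dist = scaled_distances(X, X_train)
    feats = np.empty((X.shape[0], classes.size))
    for j, c in enumerate(classes):
        in_class = y_train == c
        counts = np.full(X.shape[0], in_class.sum(), dtype=float)
        if is_train:
            # A training point is at distance 0 from itself, so leave it
            # out of the average over its own class.
            counts[in_class] -= 1
        feats[:, j] = dist[:, in_class].sum(axis=1) / counts
    return feats


def one_nn_predict(Z_train, y_train, Z_test):
    """Classic 1-nearest-neighbor rule with the l2 norm."""
    dist = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=2)
    return y_train[dist.argmin(axis=1)]


# Toy high dimension, low sample size data: same mean, different scales.
rng = np.random.default_rng(0)
d, n_train, n_test = 1000, 10, 50
X_train = np.vstack([rng.normal(0, 1, (n_train, d)), rng.normal(0, 2, (n_train, d))])
y_train = np.repeat([0, 1], n_train)
X_test = np.vstack([rng.normal(0, 1, (n_test, d)), rng.normal(0, 2, (n_test, d))])
y_test = np.repeat([0, 1], n_test)

# 1-NN on the raw d-dimensional data.
acc_raw = np.mean(one_nn_predict(X_train, y_train, X_test) == y_test)

# 1-NN on the 2-dimensional distance-based features.
Z_train = average_distance_features(X_train, X_train, y_train, is_train=True)
Z_test = average_distance_features(X_test, X_train, y_train)
acc_transformed = np.mean(one_nn_predict(Z_train, y_train, Z_test) == y_test)

print(f"1-NN accuracy, raw data:         {acc_raw:.2f}")
print(f"1-NN accuracy, transformed data: {acc_transformed:.2f}")
```

In this toy setting the raw distances concentrate, so nearly every test point is nearest to a training point from the smaller-scale class. The two-dimensional features place the classes around distinct points, which is the kind of enhanced separability the abstract describes.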

DOI

10.1007/s10994-015-5495-y

Publication Date

1-1-2016

Comments

Open Access; Bronze Open Access

Recommended Citation

Dutta, Subhajit and Ghosh, Anil K., "On some transformations of high dimension, low sample size data for nearest neighbor classification" (2016). Journal Articles. 4349.
https://digitalcommons.isical.ac.in/journal-articles/4349
