On some transformations of high dimension, low sample size data for nearest neighbor classification
Article Type
Research Article
Publication Title
Machine Learning
Abstract
For data with more variables than observations, phenomena like concentration of pairwise distances, violation of cluster assumptions and presence of hubness often have adverse effects on the performance of the classic nearest neighbor classifier. To cope with such problems, dimension reduction techniques such as those based on random linear projections and principal component directions have been proposed in the literature. In this article, we construct nonlinear transformations of the data based on inter-point distances, which also lead to a reduction in data dimension. More importantly, for such high dimension, low sample size data, they enhance separability among the competing classes in the transformed space. When the classic nearest neighbor classifier is used on the transformed data, it usually yields lower misclassification rates. Under appropriate regularity conditions, we derive asymptotic results on misclassification probabilities of nearest neighbor classifiers based on the l2 norm and the lp norms (with p∈(0,1]) in the transformed space, when the training sample size remains fixed and the dimension of the data grows to infinity. The strength of the proposed transformations in the classification context is demonstrated by analyzing several simulated and benchmark data sets.
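The idea of classifying in a distance-based transformed space can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the particular map (each point is replaced by its average scaled l2 distance to the training points of each class), the leave-one-out handling of training points, the helper names `scaled_dist`, `avg_dist_features` and `one_nn`, and the simulated two-class Gaussian data that differ only in scale. It should not be read as the authors' exact construction.

```python
import numpy as np

def scaled_dist(A, B):
    """Pairwise Euclidean distances between rows of A and B, divided by sqrt(d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2) / A.shape[1])

def avg_dist_features(D, y_train, classes, leave_one_out=False):
    """Hypothetical transformation: average distance from each point to each class.

    D has one row per point and one column per training point.
    """
    D = D.astype(float).copy()
    if leave_one_out:
        # D is the train-by-train matrix here; a training point ignores itself.
        np.fill_diagonal(D, np.nan)
    return np.column_stack(
        [np.nanmean(D[:, y_train == c], axis=1) for c in classes]
    )

def one_nn(F_train, y_train, F_test):
    """Classic 1-nearest-neighbor rule under the Euclidean distance."""
    return y_train[scaled_dist(F_test, F_train).argmin(axis=1)]

# Simulated data (assumed setup): same mean, different scale, d much larger than n.
rng = np.random.default_rng(0)
d, n_per_class, m_per_class = 2000, 10, 20
X_train = np.vstack([rng.normal(0, 1, (n_per_class, d)),
                     rng.normal(0, 2, (n_per_class, d))])
y_train = np.repeat([0, 1], n_per_class)
X_test = np.vstack([rng.normal(0, 1, (m_per_class, d)),
                    rng.normal(0, 2, (m_per_class, d))])
y_test = np.repeat([0, 1], m_per_class)

classes = np.array([0, 1])
F_train = avg_dist_features(scaled_dist(X_train, X_train), y_train, classes,
                            leave_one_out=True)
F_test = avg_dist_features(scaled_dist(X_test, X_train), y_train, classes)

acc_raw = np.mean(one_nn(X_train, y_train, X_test) == y_test)
acc_transformed = np.mean(one_nn(F_train, y_train, F_test) == y_test)
print(f"1-NN accuracy, raw data:         {acc_raw:.2f}")
print(f"1-NN accuracy, transformed data: {acc_transformed:.2f}")
```

In this assumed setup, distance concentration makes every point look closest to the low-variance class in the original space, so the raw nearest neighbor rule tends to misclassify the high-variance class, while the two-dimensional distance features keep the classes apart. The sketch uses only the l2 norm; an lp variant with p∈(0,1] would replace the squared differences in `scaled_dist`.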
First Page
57
Last Page
83
DOI
10.1007/s10994-015-5495-y
Publication Date
1-1-2016
Recommended Citation
Dutta, Subhajit and Ghosh, Anil K., "On some transformations of high dimension, low sample size data for nearest neighbor classification" (2016). Journal Articles. 4349.
https://digitalcommons.isical.ac.in/journal-articles/4349
Comments
Open Access; Bronze Open Access