On some transformations of high dimension, low sample size data for nearest neighbor classification

Article Type

Research Article

Publication Title

Machine Learning

Abstract

For data with more variables than the sample size, phenomena like concentration of pairwise distances, violation of cluster assumptions and presence of hubness often have adverse effects on the performance of the classic nearest neighbor classifier. To cope with such problems, some dimension reduction techniques like those based on random linear projections and principal component directions have been proposed in the literature. In this article, we construct nonlinear transformations of the data based on inter-point distances, which also lead to reduction in data dimension. More importantly, for such high dimension low sample size data, they enhance separability among the competing classes in the transformed space. When the classic nearest neighbor classifier is used on the transformed data, it usually yields lower misclassification rates. Under appropriate regularity conditions, we derive asymptotic results on misclassification probabilities of nearest neighbor classifiers based on the l2 norm and the lp norms (with p∈(0,1]) in the transformed space, when the training sample size remains fixed and the dimension of the data grows to infinity. Strength of the proposed transformations in the classification context is demonstrated by analyzing several simulated and benchmark data sets.

First Page

57

Last Page

83

DOI

10.1007/s10994-015-5495-y

Publication Date

1-1-2016

Comments

Open Access; Bronze Open Access

Share

COinS