Nearest Neighbour Classification Based on Imbalanced Data: A Statistical Approach
Article Type
Research Article
Publication Title
Stat
Abstract
When the competing classes in a classification problem are not of comparable sizes, many popular classifiers exhibit a bias towards the larger classes, and the nearest neighbour classifier is no exception. To address this problem, we develop a statistical method for nearest neighbour classification based on imbalanced datasets. We first construct a classifier for the binary classification problem and then extend it to problems involving more than two classes. Unlike existing oversampling and undersampling methods, our proposed classifiers neither generate pseudo-observations nor remove existing observations; hence, the results are exactly reproducible. We establish the Bayes risk consistency of these classifiers under appropriate regularity conditions. Their superior performance over existing methods is demonstrated by analysing several simulated and benchmark datasets.
DOI
10.1002/sta4.70110
Publication Date
12-1-2025
Recommended Citation
Garg, Anvit; Ghosh, Anil K; and Sarkar, Soham, "Nearest Neighbour Classification Based on Imbalanced Data: A Statistical Approach" (2025). Journal Articles. 5472.
https://digitalcommons.isical.ac.in/journal-articles/5472