Nearest Neighbour Classification Based on Imbalanced Data: A Statistical Approach

Article Type

Research Article

Publication Title

Stat

Abstract

When the competing classes in a classification problem are not of comparable sizes, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbour classifier is no exception. To take care of this problem, we develop a statistical method for nearest neighbour classification based on such imbalanced datasets. We first construct a classifier for the binary classification problem and then extend it to classification problems involving more than two classes. Unlike the existing oversampling or undersampling methods, our proposed classifiers do not need to generate pseudo observations or remove existing observations; hence, the results are exactly reproducible. We establish the Bayes risk consistency of these classifiers under appropriate regularity conditions. Their superior performance over the existing methods is amply demonstrated by analysing several simulated and benchmark datasets.

DOI

10.1002/sta4.70110

Publication Date

12-1-2025
