Reverse-nearest neighborhood based oversampling for imbalanced, multi-label datasets
Pattern Recognition Letters
In this article, we present a novel reverse-nearest-neighborhood-based oversampling scheme for the imbalanced labels of a multi-label dataset. The reverse nearest neighborhood of a query point consists of all points that have the query point among their own nearest neighbors. This lets us identify an adaptive number of neighbors, determined by the density and distribution of the points, instead of a fixed number. We add label-specific synthetic minority instances in the reverse nearest neighborhood of the minority points of each label. The reverse nearest neighbor configuration also detects singular minority points, which we exclude as seed points in the oversampling phase. On the oversampled data of each label, we train a linear Support Vector Machine and apply it at test time. Results against competing methods on class-imbalance-focused metrics indicate that the proposed method is competitive in handling multi-label datasets with differing degrees of imbalance.
Sadhukhan, P. and Palit, Sarbani, "Reverse-nearest neighborhood based oversampling for imbalanced, multi-label datasets" (2019). Journal Articles. 779.