Reverse-nearest neighborhood based oversampling for imbalanced, multi-label datasets

Article Type

Research Article

Publication Title

Pattern Recognition Letters

Abstract

In this article, we present a novel reverse-nearest-neighborhood-based oversampling scheme for the imbalanced labels of a multi-label dataset. The reverse nearest neighborhood of a query point consists of all points that have the query point among their own nearest neighbors. This lets us identify an adaptive number of neighbors (according to the density and distribution of points) instead of a fixed number of neighbors. We add label-specific synthetic minority instances in the reverse nearest neighborhood of the minority points of each label. The reverse nearest neighbor configuration also detects singular minority points, which we avoid using as seed points in the oversampling phase. On the oversampled data of each label, we train and invoke a Linear Support Vector Machine to complete the learning and testing. Results of the proposed method against competing methods on class-imbalance-focused metrics indicate its competence in handling multi-label datasets with differing degrees of imbalance.
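The reverse nearest neighborhood idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`k_nearest`, `reverse_nearest`, `oversample_label`), the value of `k`, the toy data, the reading of "singular" as "empty reverse neighborhood", and the SMOTE-style linear interpolation used to generate synthetic points are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math
import random


def k_nearest(points, k):
    """For each point index, return the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(others[:k])
    return neighbors


def reverse_nearest(points, k):
    """For each point q, return the points that list q among their k nearest neighbors."""
    rnn = [[] for _ in points]
    for i, nbrs in enumerate(k_nearest(points, k)):
        for j in nbrs:
            rnn[j].append(i)  # i counts j as a neighbor, so i is in j's reverse neighborhood
    return rnn


def oversample_label(points, n_new, k=2, seed=0):
    """Generate n_new synthetic points for the minority set of a single label.

    Assumption: a synthetic point is a random linear interpolation between a
    seed point and one member of its reverse nearest neighborhood.
    """
    rng = random.Random(seed)
    rnn = reverse_nearest(points, k)
    # Assumption: a "singular" point has an empty reverse neighborhood; skip it as a seed.
    seeds = [i for i, members in enumerate(rnn) if members]
    if not seeds:
        return []
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        j = rng.choice(rnn[i])
        t = rng.random()
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
        )
    return synthetic


# Toy minority set for one label: three clustered points and one isolated point.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rnn = reverse_nearest(minority, k=2)   # [[1, 2], [0, 2, 3], [0, 1, 3], []]
new_points = oversample_label(minority, n_new=5)
```

In this toy set the reverse neighborhoods have sizes 2, 3, 3 and 0, so the neighbor count adapts per point even though `k` is fixed, and the isolated point at index 3 is never chosen as a seed. In a multi-label setting the same routine would be applied separately to the minority instances of each label before training that label's classifier.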

First Page

813

Last Page

820

DOI

10.1016/j.patrec.2019.08.009

Publication Date

7-1-2019
