Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data

Document Type

Conference Article

Publication Title

International Conference on Agents and Artificial Intelligence

Abstract

Multi-label datasets often contain a mixture of very frequent and very infrequent labels. This variation in label frequency, a type of class imbalance, creates a significant challenge for building efficient multi-label classification algorithms. In this paper, we tackle this problem by proposing a minority class oversampling scheme, UCLSO, which integrates Unsupervised Clustering and Label-Specific data Oversampling. Clustering is performed to identify the key distinct and locally connected regions of a multi-label dataset, irrespective of the label information. Next, for each label, we explore the distribution of minority points across the clusters. Only intra-cluster minority points are used to generate the synthetic minority points. Although the same cluster set is shared across all labels, we use the label-specific class information to vary the distribution of the synthetic minority points from label to label, in congruence with the label-specific class memberships within the clusters. The training dataset is augmented with the set of label-specific synthetic minority points, and classifiers are trained to predict the relevance of each label independently. Experiments using 12 multi-label datasets and several multi-label algorithms show the competency of the proposed method over other competing algorithms in the given context.
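
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes plain k-means for the unsupervised clustering step, SMOTE-style linear interpolation between two minority points of the same cluster, and a per-cluster synthetic budget proportional to the number of minority points in that cluster. The names `kmeans`, `oversample_label` and `n_new`, and the toy data, are hypothetical and do not come from the paper.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def oversample_label(points, labels, assign, k, n_new, seed=0):
    """Generate synthetic minority points for one binary label column.

    Only pairs of minority points from the same cluster are interpolated,
    so the synthetic points follow the label's own minority distribution
    inside the shared cluster set.
    """
    rng = random.Random(seed)
    # The rarer of the two classes is treated as the minority for this label.
    minority = 1 if sum(labels) * 2 <= len(labels) else 0
    groups = [
        [points[i] for i in range(len(points))
         if assign[i] == c and labels[i] == minority]
        for c in range(k)
    ]
    # A cluster needs at least two minority points to interpolate between.
    usable = [g for g in groups if len(g) >= 2]
    total = sum(len(g) for g in usable)
    synthetic = []
    for g in usable:
        # Assumed budget rule: share of n_new proportional to minority count.
        for _ in range(round(n_new * len(g) / total)):
            a, b = rng.sample(g, 2)
            t = rng.random()
            synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return minority, synthetic

# Toy data: two well-separated blobs and two imbalanced labels.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(30)]
X += [[data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)] for _ in range(30)]
Y = [[1 if i < 5 else 0, 1 if i >= 52 else 0] for i in range(60)]

# Clustering ignores the labels and is computed once for all labels.
assign = kmeans(X, k=2)

# One augmented training set per label, for independent per-label classifiers.
augmented = {}
for j in range(2):
    col = [row[j] for row in Y]
    minority, synth = oversample_label(X, col, assign, k=2, n_new=10)
    augmented[j] = (X + synth, col + [minority] * len(synth))
```

Each entry of `augmented` could then be passed to any binary classifier to predict the relevance of that label on its own, which mirrors the independent per-label training mentioned in the abstract.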

First Page

489

Last Page

498

DOI

10.5220/0011901200003393

Publication Date

1-1-2023

Comments

Open Access, Hybrid Gold, Green
