Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data
Document Type
Conference Article
Publication Title
International Conference on Agents and Artificial Intelligence
Abstract
Multi-label datasets often contain a mixture of very frequent and very infrequent labels. This variation in label frequency, a type of class imbalance, creates a significant challenge for building efficient multi-label classification algorithms. In this paper, we tackle this problem by proposing a minority class oversampling scheme, UCLSO, which integrates Unsupervised Clustering and Label-Specific data Oversampling. Clustering is performed to identify the key distinct, locally connected regions of a multi-label dataset, irrespective of the label information. Next, for each label, we explore the distribution of minority points across the clusters. Only intra-cluster minority points are used to generate the synthetic minority points. Although the same cluster set is shared across all labels, we use the label-specific class information so that the distribution of synthetic minority points varies across labels, in congruence with the label-specific class memberships within the clusters. The training dataset is augmented with the set of label-specific synthetic minority points, and classifiers are trained to predict the relevance of each label independently. Experiments using 12 multi-label datasets and several multi-label algorithms show the competency of the proposed method relative to competing algorithms in the given context.
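The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans`, `oversample_label`, `augment_per_label`), the choice of k-means as the clustering step, the SMOTE-style interpolation between two minority points, and the rule that each cluster receives synthetic points in proportion to its minority count are all assumptions made for illustration. The abstract fixes only the overall structure: one label-agnostic clustering, then label-specific synthesis restricted to minority points within the same cluster.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); label information is never used."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def oversample_label(points, y, assign, seed=0):
    """Synthesize minority points for one binary label, using intra-cluster pairs only."""
    rng = random.Random(seed)
    positives = sum(y)
    minority = 1 if 2 * positives <= len(y) else 0
    n_needed = abs(len(y) - 2 * positives)  # gap between majority and minority counts
    by_cluster = {}
    for p, lab, a in zip(points, y, assign):
        if lab == minority:
            by_cluster.setdefault(a, []).append(p)
    # A cluster needs at least two minority points to interpolate between.
    usable = [m for m in by_cluster.values() if len(m) >= 2]
    total = sum(len(m) for m in usable)
    synthetic = []
    for members in usable:
        # Assumed allocation rule: proportional to the cluster's minority count.
        share = round(n_needed * len(members) / total)
        for _ in range(share):
            p, q = rng.sample(members, 2)
            t = rng.random()
            synthetic.append([a + t * (b - a) for a, b in zip(p, q)])
    return synthetic, minority

def augment_per_label(points, label_columns, k, seed=0):
    """Cluster once, then build one augmented training set per label."""
    assign = kmeans(points, k, seed=seed)  # same cluster set for every label
    augmented = []
    for y in label_columns:
        synthetic, minority = oversample_label(points, y, assign, seed=seed)
        augmented.append((points + synthetic, list(y) + [minority] * len(synthetic)))
    return augmented
```

Each `(features, targets)` pair returned by `augment_per_label` would then train an independent binary classifier for the corresponding label, mirroring the per-label relevance prediction described above.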
First Page
489
Last Page
498
DOI
10.5220/0011901200003393
Publication Date
1-1-2023
Recommended Citation
Sadhukhan, Payel; Pakrashi, Arjun; Palit, Sarbani; and Namee, Brian Mac, "Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data" (2023). Conference Articles. 535.
https://digitalcommons.isical.ac.in/conf-articles/535
Comments
Open Access, Hybrid Gold, Green