Conference Articles

Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data

Payel Sadhukhan, Institute for Advancing Intelligence
Arjun Pakrashi, University College Dublin
Sarbani Palit, Indian Statistical Institute, Kolkata
Brian Mac Namee, University College Dublin

Document Type

Conference Article

Publication Title

International Conference on Agents and Artificial Intelligence

Abstract

Multi-label datasets often contain a mixture of very frequent and very infrequent labels. This variation in label frequency, a type of class imbalance, creates a significant challenge for building efficient multi-label classification algorithms. In this paper, we tackle this problem by proposing a minority class oversampling scheme, UCLSO, which integrates Unsupervised Clustering and Label-Specific data Oversampling. Clustering is performed to identify the key distinct and locally connected regions of a multi-label dataset, irrespective of the label information. Next, for each label, we explore the distribution of minority points across the clusters. Only intra-cluster minority points are used to generate the synthetic minority points. Although the same set of clusters is used for all labels, we use the label-specific class information so that the distribution of the synthetic minority points varies across labels, in congruence with the label-specific class memberships within the clusters. The training dataset is augmented with the set of label-specific synthetic minority points, and classifiers are trained to predict the relevance of each label independently. Experiments using 12 multi-label datasets and several multi-label algorithms show the competency of the proposed method relative to competing algorithms in the given context.
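The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, how synthetic points are generated, or how many are created, so the choices below (plain k-means, SMOTE-style interpolation between two minority points of the same cluster, cluster sampling weighted by minority count, and oversampling up to class balance) are assumptions made for illustration. The function names `kmeans`, `oversample_label`, and `uclso_sketch` are hypothetical.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on the feature vectors; label information is never used."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Move each center to the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def oversample_label(points, y, assign, n_new, seed=0):
    """Create up to n_new synthetic minority points for one label.

    Each synthetic point interpolates between two minority points that share
    a cluster (assumption: SMOTE-style interpolation).
    """
    rng = random.Random(seed)
    minority = 1 if sum(y) * 2 <= len(y) else 0
    by_cluster = {}
    for p, label, c in zip(points, y, assign):
        if label == minority:
            by_cluster.setdefault(c, []).append(p)
    # Only clusters holding at least two minority points can supply a pair.
    usable = [pts for pts in by_cluster.values() if len(pts) >= 2]
    if not usable:
        return [], minority
    # Assumption: clusters are sampled in proportion to their minority count.
    weights = [len(pts) for pts in usable]
    synthetic = []
    for _ in range(n_new):
        pts = rng.choices(usable, weights=weights, k=1)[0]
        a, b = rng.sample(pts, 2)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic, minority

def uclso_sketch(X, Y, k=2, seed=0):
    """Cluster once, then build one augmented training set per label."""
    assign = kmeans(X, k, seed=seed)
    augmented = []
    for j in range(len(Y[0])):
        y = [row[j] for row in Y]
        n_min = min(sum(y), len(y) - sum(y))
        n_new = len(y) - 2 * n_min  # assumption: oversample up to class balance
        synth, minority = oversample_label(X, y, assign, n_new, seed=seed + j)
        augmented.append((X + synth, y + [minority] * len(synth)))
    return augmented
```

Each entry of the returned list is a `(features, targets)` pair on which an independent binary classifier for that label could be trained. The clustering is computed once and shared, while the synthetic points differ per label because they depend on where that label's minority points fall within the clusters.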

First Page

489

Last Page

498

DOI

10.5220/0011901200003393

Publication Date

1-1-2023

Comments

Open Access, Hybrid Gold, Green

Recommended Citation

Sadhukhan, Payel; Pakrashi, Arjun; Palit, Sarbani; and Mac Namee, Brian, "Integrating Unsupervised Clustering and Label-Specific Oversampling to Tackle Imbalanced Multi-Label Data" (2023). Conference Articles. 535.
https://digitalcommons.isical.ac.in/conf-articles/535

This document is currently not available here.
