Doctoral Theses

On Class Imbalanced Learning: Design of Non-parametric Classifiers, Performance Indices, and Deep Oversampling Strategies.

Sankha Mullick Dr., Indian Statistical Institute

Date of Submission

1-28-2021

Date of Award

1-28-2022

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Doctoral Thesis

Degree Name

Doctor of Philosophy

Subject Name

Mathematics

Department

Electronics and Communication Sciences Unit (ECSU-Kolkata)

Supervisor

Das, Swagatam (ECSU-Kolkata; ISI)

Abstract (Summary of the Work)

Classification is central to the everyday application of machine learning. However, a classifier performs well only to the extent that the assumptions it makes about the training examples are satisfied. For example, to facilitate unbiased learning, a classifier is expected to be trained with an equal number of labeled data instances from each class. In many practical applications, such as anomaly detection, semantic segmentation, and disease prediction, it may not be possible to gather an equal number of diverse training points for all the classes. This results in class imbalance in the training set, where some majority classes contain significantly more examples than the remaining minority classes (which usually correspond to rare and important events). Consequently, a classifier trained in the presence of class imbalance is likely to achieve better accuracy on the majority classes than on the minority ones.
Class imbalance not only adversely affects the performance of a classifier but also leads to improper validation of its merit by biasing the performance evaluation indices. We start by proposing a couple of fundamental conditions, the violation of which makes an index susceptible to a changing extent of imbalance and a varying number of classes in the test set. In light of these conditions, we present a theoretical study of the applicability of different indices commonly used to evaluate a classifier in the presence of class imbalance.
Over the past couple of decades, a large body of research has attempted to modify the classifier through algorithm-level approaches and the training set through data-level approaches, so that the bias induced by class imbalance can be mitigated. We follow this direction of research by focusing on the popular Fuzzy k-Nearest Neighbor (FkNN) classifier. We first validate theoretically the quality of the class membership of a test point estimated by FkNN. We further demonstrate that our analysis can explain the susceptibility of FkNN to class imbalance, and we propose a point-specific, locally adaptive class weighting strategy as a remedy. Moreover, we show that class-specific feature weights, in addition to global class weights, can significantly improve the immunity of FkNN against class imbalance when both types of weights are optimized using a self-adaptive variant of Differential Evolution.
The advent of deep learning opened another direction of research, which seeks to understand the extent to which the commendable efficacy of deep learning systems can be compromised in the presence of class imbalance and to propose remedial measures. We contribute to this direction by proposing an adaptive artificial oversampling technique that is applicable to an end-to-end deep image classifier. Our model is constructed using three networks: a classifier, a convex generator, and a discriminator. An adversarial game between the classifier and the convex generator leads the latter to generate difficult artificial minority instances in the distributed feature space, while the discriminator adversarially guides the convex generator to follow the intended class distribution. As concluding remarks, we discuss the future scope of research on combating the effects of class imbalance, especially in emerging applications.
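As a small illustration of the bias in performance indices noted in the abstract, the following Python sketch compares plain accuracy with the mean of per-class recalls on an imbalanced test set. The data, the degenerate classifier, and the function names `accuracy` and `mean_class_recall` are assumptions made for this example and are not taken from the thesis. Mean per-class recall is used only as a familiar imbalance-aware index, not necessarily one the thesis recommends.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of test points that are labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_class_recall(y_true, y_pred):
    """Average of the per-class recalls, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for classes that were never predicted correctly.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical test set: 95 majority points (label 0), 5 minority points (label 1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)           # 95 of 100 points are correct
bal = mean_class_recall(y_true, y_pred)  # recall is 1.0 on class 0, 0.0 on class 1

print(f"accuracy = {acc:.2f}, mean class recall = {bal:.2f}")
```

In this toy setting the classifier never detects the minority class, yet its accuracy is high because the majority class dominates the test set. The per-class average exposes the failure.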

Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843865

Control Number

ISILib-TH491

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/2146

Recommended Citation

Mullick, Sankha Dr., "On Class Imbalanced Learning: Design of Non-parametric Classifiers, Performance Indices, and Deep Oversampling Strategies." (2022). Doctoral Theses. 445.
https://digitalcommons.isical.ac.in/doctoral-theses/445

Included in

Mathematics Commons
