Date of Submission
1-28-2021
Date of Award
1-28-2022
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Doctoral Thesis
Degree Name
Doctor of Philosophy
Subject Name
Mathematics
Department
Electronics and Communication Sciences Unit (ECSU-Kolkata)
Supervisor
Das, Swagatam (ECSU-Kolkata; ISI)
Abstract (Summary of the Work)
The relevance of classification is almost endless in the everyday application of machine learning. However, the performance of a classifier is limited by how well the inherent assumptions it makes about the training examples are fulfilled. For example, to facilitate unbiased learning, a classifier is expected to be trained with an equal number of labeled data instances from all of the classes. However, in a large number of practical applications, such as anomaly detection, semantic segmentation, and disease prediction, it may not be possible to gather an equal number of diverse training points for all the classes. This results in a class imbalance in the training set, where the majority classes contain a significantly larger number of examples than the minority classes (which usually correspond to rare yet important events). Consequently, a classifier trained in the presence of class imbalance is likely to achieve better accuracy on the majority classes than on the minority ones.

Class imbalance not only adversely affects the performance of a classifier but also leads to improper validation of its merit by inducing bias in the performance evaluation indices. We start by proposing two fundamental conditions whose violation makes an index susceptible to a varying extent of imbalance and a varying number of classes in the test set. In light of these conditions, we present a theoretical study of the applicability of different indices commonly used to evaluate a classifier in the presence of class imbalance. Over the past couple of decades, a vast body of research has attempted to mitigate the bias induced by class imbalance by modifying the classifier (algorithm-level approaches) or the training set (data-level approaches). We follow this direction of research by focusing on the popular Fuzzy k-Nearest Neighbor (FkNN) classifier.
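For orientation, the membership estimate at the heart of FkNN can be sketched as follows. This is a minimal NumPy rendering of the classical FkNN rule of Keller et al. with crisp training labels (not the thesis's adaptive weighting scheme); the function name and parameters are illustrative:

```python
import numpy as np

def fknn_memberships(X_train, y_train, x, k=3, m=2.0, n_classes=2):
    """Fuzzy class memberships of a test point x from its k nearest
    neighbours, following the classical FkNN rule (crisp labels)."""
    d = np.linalg.norm(X_train - x, axis=1)        # distances to all training points
    nn = np.argsort(d)[:k]                         # indices of the k nearest neighbours
    # inverse-distance weights; guard against a zero distance
    w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1.0))
    u = np.zeros(n_classes)
    for idx, weight in zip(nn, w):                 # accumulate weighted votes per class
        u[y_train[idx]] += weight
    return u / u.sum()                             # memberships sum to 1
```

Because every neighbour votes with an inverse-distance weight, a minority test point surrounded by majority examples receives a diluted membership in its own class, which is one intuition behind the susceptibility of FkNN to class imbalance discussed below.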
We start by theoretically validating the quality of the class membership of a test point estimated by FkNN. We further demonstrate that our analysis can explain the susceptibility of FkNN to class imbalance, and we propose a point-specific, locally adaptive class weighting strategy as a remedy. Moreover, we show that class-specific feature weights, in addition to global class weights, can significantly improve the immunity of FkNN against class imbalance when both types of weights are optimized using a self-adaptive variant of Differential Evolution. The advent of deep learning introduced another direction of research, where attempts were made to understand the extent to which the commendable efficacy of deep learning systems can be compromised in the presence of class imbalance, and to propose remedial measures. We contribute in this direction by proposing an adaptive artificial oversampling technique that is applicable to an end-to-end deep image classifier. Our model is constructed from three networks: a classifier, a convex generator, and a discriminator. An adversarial game between the classifier and the convex generator leads the latter to generate difficult artificial minority instances in the distributed feature space, while the discriminator adversarially guides the convex generator to follow the intended class distribution. As concluding remarks, we discuss the future scope of research in combating the effects of class imbalance, especially in emerging applications.
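The geometric idea behind a convex generator — producing artificial minority points as convex combinations of real minority feature vectors — can be sketched in a few lines. In the thesis the mixing weights come from a trained network playing an adversarial game; here they are simply sampled from a Dirichlet distribution, so the function and its parameters are purely illustrative:

```python
import numpy as np

def convex_oversample(F_min, n_new, rng=None):
    """Generate n_new artificial minority points as random convex
    combinations of the rows of F_min (minority feature vectors)."""
    rng = np.random.default_rng(rng)
    n, d = F_min.shape
    # rows of W are non-negative and sum to 1 (Dirichlet samples),
    # so each generated point lies in the convex hull of F_min
    W = rng.dirichlet(np.ones(n), size=n_new)
    return W @ F_min
```

Restricting the generator to the convex hull of the observed minority points keeps the artificial samples close to the minority manifold, in contrast to unconstrained generation that may place points in majority regions.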
Control Number
ISILib-TH491
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/2146
Recommended Citation
Mullick, Sankha Dr., "On Class Imbalanced Learning: Design of Non-parametric Classifiers, Performance Indices, and Deep Oversampling Strategies." (2022). Doctoral Theses. 445.
https://digitalcommons.isical.ac.in/doctoral-theses/445
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843865