Doctoral Theses

Adaptation-Based Classifiers for Handling Some Problems with Multi-Label Data

Anwesha Law Dr., Indian Statistical Institute

Date of Submission

June 2022

Date of Award

6-1-2023

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Doctoral Thesis

Degree Name

Doctor of Philosophy

Subject Name

Computer Science

Department

Machine Intelligence Unit (MIU-Kolkata)

Supervisor

Ghosh, Ashish (MIU-Kolkata; ISI)

Abstract (Summary of the Work)

Multi-label (ML) data generalizes the association of instances to classes by allowing each sample to carry more than one class label simultaneously. Instances that are multi-label in nature should not be forced into a single label; they need to be handled in their original form. However, various problems arise when dealing with multi-label data, and this thesis highlights and addresses four of them. The first is the large input dimension that sometimes occurs in multi-label data. Reducing the dimensionality of the features helps to strike a balance between the feature size, the number of samples and the output dimension. The second is a complex decision space with overlapping class boundaries, which arises because instances belong to multiple classes simultaneously. Approaches such as improving the feature-to-class mapping, increasing class separability and simplifying the decision space have been implemented to deal with it. The third arises from the large number of classes and label-sets in multi-label data, most of which are under-represented. This accentuates the class imbalance that is widespread in multi-label data, which has been handled through customized classifiers suited to the data at hand. The fourth is class correlation. Because multiple classes are assigned to an instance simultaneously, some classes may co-occur on numerous occasions. Correlations among these frequently co-occurring classes have been identified and utilized to improve multi-label classification performance. This thesis addresses these issues to perform efficient multi-label classification. Smaller components that target the individual issues have been combined to build larger classification models.
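As a rough illustration of the under-representation described above, the following sketch computes per-label counts, the average number of labels per instance, and a simple per-label imbalance ratio. The toy label names, the data, and the function names are assumptions made for this example only; they are not taken from the thesis.

```python
# Toy multi-label dataset: each instance carries a set of class labels.
# Label names and data are invented purely for illustration.
label_sets = [
    {"beach", "sunset"},
    {"beach"},
    {"beach", "sea"},
    {"urban"},
    {"beach", "sunset", "sea"},
]


def label_counts(label_sets):
    """Count how many instances carry each label."""
    counts = {}
    for labels in label_sets:
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    return counts


def label_cardinality(label_sets):
    """Average number of labels assigned to an instance."""
    return sum(len(labels) for labels in label_sets) / len(label_sets)


def imbalance_ratios(label_sets):
    """Ratio of the most frequent label's count to each label's count.

    A ratio of 1.0 marks the most frequent label; larger values mark
    labels that are more under-represented.
    """
    counts = label_counts(label_sets)
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


counts = label_counts(label_sets)
cardinality = label_cardinality(label_sets)
ratios = imbalance_ratios(label_sets)
```

On this toy data, a label that appears in only one of five instances receives a much larger ratio than the dominant label, which is the kind of skew an imbalance-aware classifier would need to account for.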
The first work aims to reduce feature dimensions and learn a better feature-to-class mapping for the complex decision space. A shallow but fast network, the extreme learning machine (ELM), has been cascaded with autoencoders (AEs) to form a network that can handle both issues, and two variations of this network have been proposed. To further explore the overlapping boundaries of ML data, the second contribution increases the separability of the complex decision space and also incorporates dimensionality reduction. The functional link artificial neural network (FLANN) has been adopted here for its functional expansion capability, which transforms the features to a higher dimension and thereby makes the data considerably more separable. After the best configuration of the network was identified, it was integrated with autoencoders to reduce the functionally expanded feature dimension and bring additional transformation into the multi-label data. While these classifiers display improved performance, they do not consider the problems of class imbalance or label correlation. Hence, the third work builds a tree of classifiers that handles class imbalance, simplifies the decision space for ease of learning and preserves label correlations. A novel label-set proximity-based technique has been devised that simplifies boundaries and splits the data while preserving label correlations. Every split is learned by a classifier suited to the balanced or imbalanced data at hand. Although this classifier tree model handles multiple issues together successfully and preserves label correlations, it does not explicitly use them to improve classification performance. In this regard, the final contribution explicitly extracts underlying label correlations from the data and associates them with the predictions of existing multi-label classifiers to improve the overall performance. A novel frequent label-set mining technique generates rules that help to improve the scores predicted by existing multi-label algorithms. This thesis incorporates various elements to handle the problems of multi-label data and brings them together to create cohesive models for multi-label classification.
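The general idea of mining frequently co-occurring label-sets and using them to adjust classifier scores can be sketched as follows. This is a generic support-counting illustration, not the technique proposed in the thesis: the functions `frequent_label_sets` and `boost_scores`, the thresholds, the bonus rule, and the toy data are all hypothetical choices made for this example.

```python
from itertools import combinations

# Toy training label-sets, invented for illustration.
label_sets = [
    {"beach", "sunset"},
    {"beach"},
    {"beach", "sea"},
    {"urban"},
    {"beach", "sunset", "sea"},
]


def frequent_label_sets(label_sets, min_support, max_size=3):
    """Return label combinations (size >= 2) with their support.

    Support is the fraction of instances whose label-set contains the
    whole combination. Plain exhaustive counting, an assumption made
    for this sketch rather than the thesis's mining procedure.
    """
    n = len(label_sets)
    counts = {}
    for labels in label_sets:
        ordered = sorted(labels)
        for size in range(2, min(max_size, len(ordered)) + 1):
            for combo in combinations(ordered, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {c: k / n for c, k in counts.items() if k / n >= min_support}


def boost_scores(scores, frequent, threshold=0.5, bonus=0.1):
    """Hypothetical rule: if a label is confidently predicted, raise the
    scores of labels that frequently co-occur with it, scaled by support."""
    confident = {label for label, s in scores.items() if s >= threshold}
    adjusted = dict(scores)
    for combo, support in frequent.items():
        if confident.intersection(combo):
            for label in combo:
                if label not in confident and label in adjusted:
                    adjusted[label] = min(1.0, adjusted[label] + bonus * support)
    return adjusted


frequent = frequent_label_sets(label_sets, min_support=0.4)

# Scores as they might come from some existing multi-label classifier.
scores = {"beach": 0.9, "sunset": 0.45, "sea": 0.2, "urban": 0.1}
adjusted = boost_scores(scores, frequent)
```

In this toy run only the pairs containing "beach" reach the support threshold, so the scores of "sunset" and "sea" are nudged upward when "beach" is confidently predicted, while "urban" is left unchanged.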

Comments

ProQuest Collection ID: https://www.proquest.com/pqdtlocal1010185/dissertations/fromDatabasesLayer?accountid=27563

Control Number

ISILib-TH548

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/2146

Recommended Citation

Law, Anwesha Dr., "Adaptation-Based Classifiers for Handling Some Problems with Multi-Label Data" (2023). Doctoral Theses. 534.
https://digitalcommons.isical.ac.in/doctoral-theses/534
