Date of Submission

6-22-2017

Date of Award

11-20-2017

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Doctoral Thesis

Degree Name

Doctor of Philosophy

Subject Name

Computer Science

Department

Machine Intelligence Unit (MIU-Kolkata)

Supervisor

Maji, Pradipta (MIU; Indian Statistical Institute)

Abstract (Summary of the Work)

A huge amount of data is being generated continuously as a result of recent advances in, and wide use of, high-throughput technologies. With the rapid increase in the size of data distributed worldwide, understanding the data has become critical. In this regard, dimensionality reduction and clustering have become necessary preprocessing steps in many research areas and applications. One of the important problems of large real life data sets is uncertainty. Some of the sources of this uncertainty include imprecision in computation and vagueness in class definitions. The uncertainty may also be present in the definition of the class membership function. Against this background, the thesis addresses the problem of dimensionality reduction and clustering of real life data sets in the presence of noise and uncertainty. The thesis first presents the problem of feature selection using both type-1 and interval type-2 fuzzy-rough sets, which are effective for dimensionality reduction of real life data sets when uncertainty is present in the data set. The properties of fuzzy-rough sets allow greater flexibility in handling noisy and real-valued data. While the concept of lower approximation and boundary region of rough sets deals with uncertainty, incompleteness, and vagueness in class definition, the use of either type-1 or interval type-2 fuzzy sets enables efficient handling of overlapping classes in an uncertain environment. Moreover, a new concept of "simultaneous attribute selection and feature extraction" is introduced for dimensionality reduction, judiciously integrating the merits of both feature selection and extraction. A scalable rough-fuzzy clustering algorithm is introduced for large real life data sets, where the theory of the rough hypercuboid approach, interval type-2 fuzzy sets, and the c-means algorithm are integrated judiciously to handle the uncertainty present in a data set.
While the concept of the rough hypercuboid approach deals with uncertainty, incompleteness, and vagueness in cluster definition, the use of fuzzy membership of interval type-2 fuzzy sets in the boundary region of a cluster enables efficient handling of overlapping partitions in an uncertain environment. Finally, the application of both clustering and feature selection algorithms is demonstrated by grouping functionally similar microRNAs from microarray data. The proposed approach can automatically select the optimum set of features while clustering the microRNAs, which lowers the complexity of the algorithm.

DSpace Identifier

http://hdl.handle.net/10263/7657

Recommended Citation

Garai, Partha, "Development of Some Scalable Pattern Recognition Algorithms for Real Life Data Analysis" (2017). Doctoral Theses. 630.
https://digitalcommons.isical.ac.in/doctoral-theses/630

Included in

Other Computer Engineering Commons

Doctoral Theses

Development of Some Scalable Pattern Recognition Algorithms for Real Life Data Analysis
