Clustering of fuzzy data and simultaneous feature selection: A model selection approach
Fuzzy Sets and Systems
Fuzzy data occurs frequently in the fields of decision making, social sciences, and control theory. We consider the problem of clustering fuzzy data along with automatic component number detection and feature selection. A model selection criterion called minimum message length is used to address the problem of component number selection. The Bayesian framework can be adopted here, by applying an explicit prior distribution over the parameter values. We discuss both uninformative and informative priors. For the latter, a gradient descent algorithm for automatic optimization of the prior hyper-parameters is presented. The problem of simultaneous feature selection involves ordering the discriminative features according to their relative importance, and at the same time eliminating non-discriminative features. The feature selection problem is also formulated as a parameter estimation problem by extending the concept of feature saliency. Then the estimation can be computed simultaneously with the clustering steps. By combining the clustering, the cluster number detection and the feature selection into one estimation problem, we modified the fuzzy Expectation–Maximization (EM) algorithm to perform all of the estimation. Evaluation criteria are proposed and empirical study results are reported to showcase the efficacy of our proposals.
Saha, Arkajyoti and Das, Swagatam, "Clustering of fuzzy data and simultaneous feature selection: A model selection approach" (2018). Journal Articles. 1379.