Sparse Fuzzy Switching Regression Model.

Date of Submission

December 2016

Date of Award

Winter 12-12-2017

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Department

Electronics and Communication Sciences Unit (ECSU-Kolkata)


Supervisor

Pal, Nikhil Ranjan (ECSU-Kolkata; ISI)

Abstract (Summary of the Work)

Unlike multiple regression, in switching regression the data are assumed to have come from more than one regression model, but the association between the sample points and the models is not known. One approach to obtaining the parameters of the switching regression model is to formulate the problem using a mixture distribution; the estimators for such a distribution can be obtained by an iterative maximum likelihood method. A second approach is to obtain a fuzzy partition of the data using the fuzzy c-regression model (FCRM) algorithm, where the cluster prototypes take the form of regression models. In switching regression, although there is evidence to believe that the data are generated by more than one model, usually it is not known whether all predictors are important for all regimes. This work focuses on identifying useful predictors (independent variables) and eliminating the irrelevant ones in the fuzzy switching regression setup. We employ two different regularizers in the FCRM objective function to induce sparsity in the models and thereby select useful features. In the first case, the ordinary FCRM objective function is regularized using the least absolute shrinkage and selection operator (lasso) penalty, i.e., using the l1 norm of the parameters of the regression models as the regularizer. To deal with the l1 norm, each parameter is modelled using two non-negative variables; for a given partition matrix, this leads to a bound-constrained quadratic optimization problem. In the second case, we formulate the non-negative garrotte penalty for the fuzzy c-regression model, associating a non-negative weight (importance) with each variable. We consider two versions of this problem: (1) every model uses a different set of weights, and (2) a single common set of weights is used for all models. We test both approaches on synthetic as well as real datasets.
Comparing the results of the two approaches on these datasets, we conclude that the garrotte is more effective in inducing sparsity while maintaining the same level of root mean square error. Lastly, we discuss a method to evaluate the goodness of feature selection methods; this evaluation affirms that the features selected by the non-negative garrotte penalty are useful.
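To illustrate the alternating scheme that FCRM-style methods follow, here is a minimal NumPy sketch of a fuzzy c-regression fit with an l1 (lasso) penalty. It is an illustrative assumption, not the dissertation's implementation: it solves each per-model weighted lasso by simple coordinate descent rather than the bound-constrained quadratic formulation described above, and the function names (`sparse_fcrm`, `weighted_lasso`), fuzzifier `m`, and penalty strength `lam` are all choices made for this sketch.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, n_sweeps=50):
    """Coordinate descent for min_b  sum_i w_i (y_i - x_i.b)^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]        # partial residual excluding feature j
            z = np.sum(w * X[:, j] * r)
            a = np.sum(w * X[:, j] ** 2)
            b[j] = soft(z, lam / 2.0) / a
    return b

def sparse_fcrm(X, y, c=2, m=2.0, lam=0.1, n_iter=30, seed=0):
    """Alternate fuzzy membership updates with per-model weighted lasso fits.

    Returns B (c x p coefficient matrix, one row per regression model)
    and U (n x c fuzzy partition matrix, rows summing to 1).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)             # random fuzzy partition to start
    B = np.zeros((c, p))
    for _ in range(n_iter):
        # Model update: weighted lasso per regime, weights u_ik^m.
        for k in range(c):
            B[k] = weighted_lasso(X, y, U[:, k] ** m, lam)
        # Membership update: standard FCRM rule on squared residuals.
        D = (y[:, None] - X @ B.T) ** 2 + 1e-12   # floor avoids division by zero
        U = 1.0 / (D ** (1.0 / (m - 1)) *
                   np.sum(D ** (-1.0 / (m - 1)), axis=1, keepdims=True))
    return B, U
```

With two synthetic regimes (say y = 2x + 1 and y = -2x - 1) plus one irrelevant noise feature, the l1 penalty shrinks the irrelevant coefficient toward zero in each fitted model while the memberships separate the two regimes.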


Creative Commons License

Creative Commons Attribution 4.0 International License

