Support Vector Regression for Outlier Removal

Date of Submission

December 2010

Date of Award

Winter 12-12-2011

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Machine Intelligence Unit (MIU-Kolkata)


Murthy, C. A. (MIU-Kolkata; ISI)

Abstract (Summary of the Work)

1.1 Introduction to the Problem

The Support Vector Machine (SVM) is a general approach to multidimensional function estimation, grounded in Vapnik–Chervonenkis (VC) theory. It was initially designed for pattern recognition problems, where a small subset of the training data, called the support vectors, is selected in order to find a decision rule with good generalization capability. Experiments showed that high-dimensional entities can be recognized using a small basis constructed from the selected support vectors. SVM has since been applied successfully to a variety of tasks, such as classification, time-series prediction, and regression. When SVM is employed for function approximation and regression, the approach is referred to as Support Vector Regression (SVR). SVR is a very effective form of function approximation, especially when the input space is high-dimensional.

In real-world applications, observations are always subject to noise or outliers. An intuitive definition of an outlier is "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism". Outliers may arise for various reasons, such as erroneous measurements or noise drawn from the tail of some noise distribution. However, traditional SVR is not effective in dealing with outliers in the training data, which are commonly encountered in practical applications; even a few outliers can result in a poor regression.
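The effect of outliers on SVR, and the iterative inlier/outlier partitioning idea described above, can be sketched as follows. This is a minimal illustration, not the dissertation's actual algorithm: it assumes scikit-learn's `SVR`, and the residual-based cutoff (2.5 times the median inlier residual) is an arbitrary choice made for the example.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)
y[::20] += 3.0  # inject a few large outliers

model = SVR(kernel="rbf", C=10.0, epsilon=0.1)

# Iterative refinement: fit, flag points with large residuals as
# outliers, then refit on the remaining inliers.
inliers = np.ones(len(y), dtype=bool)
for _ in range(3):
    model.fit(X[inliers], y[inliers])
    residuals = np.abs(y - model.predict(X))
    # Robust cutoff (an assumption for this sketch, not from the source)
    threshold = 2.5 * np.median(residuals[inliers])
    inliers = residuals <= threshold

print(f"points kept as inliers: {inliers.sum()} / {len(y)}")
```

The injected outliers have residuals far above the cutoff after the first fit, so they are excluded from subsequent fits and the regression curve relaxes back toward the underlying sine function.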
The basic idea of the proposed method is to gradually partition the data into outliers and inliers, refining the estimate using only the inliers.

1.2 Brief Overview: Linear Regression

Regression analysis comprises techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed.

Regression analysis is widely used for prediction and forecasting, where its use overlaps substantially with the field of machine learning. It is also used to determine which of the independent variables are related to the dependent variable, and to explore the forms of these relationships.

A regression model involves the following variables:
- the unknown parameters, denoted β, which may be a scalar or a vector of length k;
- the independent variables X;
- the dependent variable Y.

A regression model relates Y to a function of X and β:

Y ≈ f(X, β)

In linear regression, the model specification is that the dependent variable y_i is a linear combination of the parameters (but need not be linear in the independent variables). Given a data set {y_i, x_i1, ..., x_ip}, i = 1, ..., n, of n statistical units, a linear regression model assumes that the relationship between the dependent variable y_i and the p-vector of regressors x_i is approximately linear.
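The linear model Y ≈ f(X, β) = Xβ described above can be fit by ordinary least squares. The sketch below uses NumPy on synthetic data; the sample size, coefficients, and noise level are assumptions made for the example.

```python
import numpy as np

# Toy data: n = 50 observations, p = 2 regressors plus an intercept column.
rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix
beta_true = np.array([1.0, 2.0, -0.5])                      # assumed coefficients
y = X @ beta_true + rng.normal(0, 0.1, n)                   # Y = X beta + noise

# Ordinary least squares: minimize ||y - X beta||^2 over beta.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```

With only Gaussian noise the estimate recovers β closely; replacing a few y values with gross outliers would pull the least-squares fit badly, which is the motivation for the robust, inlier-refining approach this dissertation pursues with SVR.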


Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

