Learning with a Reject Option.

Date of Submission

December 2017

Date of Award

Winter 12-12-2018

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Electronics and Communication Sciences Unit (ECSU-Kolkata)

Supervisor

Pal, Nikhil Ranjan (ECSU-Kolkata; ISI)

Abstract (Summary of the Work)

A major assumption made by traditional machine learning algorithms is that the classes encountered during the testing phase are always a subset of the classes encountered during the training phase. However, in real-world applications such as biometric recognition, this assumption is violated most of the time: the test data may contain patterns that lie far from the data used to train the classifier. For such a pattern, the best option is to reject it rather than assign it to one of the known classes. An algorithm therefore needs a mechanism to reject patterns, when appropriate, instead of forcing them into a known class.

In this thesis, we propose two algorithms with a reject option. The first is an unsupervised algorithm based on a Self-Organizing Map (SOM). SOMs are known to preserve topological properties of the input data, such as neighbourhood distances and density. We set a rejection threshold based on the distances of the points mapped to each SOM node. The second is a two-stage rejection algorithm based on Extreme Value Theory. In the first stage, we model the data of each class with a Gaussian Mixture Model and set a rejection threshold based on the extreme value distribution of the Mahalanobis distances of the points from the mixture components. In the second stage, we train Support Vector Machines (SVMs) for all the classes in a one-vs-all fashion and set a rejection threshold based on the extreme value distribution of the SVM scores.

To test our algorithms, we simulate an open-set scenario in which the model is trained on only a subset of the classes present in the dataset. At test time there are therefore data from known classes, which the algorithm should classify, and data from unknown classes, which it should reject. We also analyze the pros and cons of our algorithms and suggest ideas for improving their performance further.
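The second algorithm turns the extreme value statistics of training distances into a rejection threshold. The Python sketch below illustrates one way this step could look for a single class. It rests on assumptions that are not taken from the thesis: the extreme value model is a two-parameter Weibull fitted by maximum likelihood to the `tail_size` largest training distances (for example, Mahalanobis distances to a mixture component), and the threshold is the fitted distribution's `quantile` point. The helper names `fit_weibull`, `evt_rejection_threshold` and `accept` are hypothetical, and the thesis may use a different extreme value family or fitting procedure.

```python
import math
from typing import List, Sequence, Tuple


def fit_weibull(samples: Sequence[float], tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood fit of a two-parameter Weibull; returns (shape k, scale lam)."""
    xs: List[float] = [float(x) for x in samples]
    if len(xs) < 2 or min(xs) <= 0.0:
        raise ValueError("need at least two positive samples")
    if max(xs) == min(xs):
        raise ValueError("samples must not all be equal")
    top = max(xs)
    zs = [x / top for x in xs]  # rescale into (0, 1] so z ** k cannot overflow
    logs = [math.log(z) for z in zs]
    mean_log = sum(logs) / len(logs)

    def profile_score(k: float) -> float:
        # MLE condition for k: sum(z^k ln z) / sum(z^k) - 1/k - mean(ln z) = 0.
        # It is strictly increasing in k, so bisection finds its unique root.
        w = [z ** k for z in zs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while profile_score(hi) <= 0.0:  # grow the bracket until the sign changes
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if profile_score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Closed-form MLE of the scale given k: lam^k = mean(x^k).
    lam = top * (sum(z ** k for z in zs) / len(zs)) ** (1.0 / k)
    return k, lam


def evt_rejection_threshold(train_distances: Sequence[float], tail_size: int = 20,
                            quantile: float = 0.99) -> float:
    """Hypothetical helper: fit a Weibull to the largest training distances of one
    class and return its `quantile` point as that class's rejection threshold."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie in (0, 1)")
    tail = sorted(train_distances)[-tail_size:]
    k, lam = fit_weibull(tail)
    # Weibull inverse CDF: F^-1(p) = lam * (-ln(1 - p))^(1/k).
    return lam * (-math.log(1.0 - quantile)) ** (1.0 / k)


def accept(distance: float, threshold: float) -> bool:
    """Keep the pattern for this class if it is no farther than the threshold; else reject."""
    return distance <= threshold


if __name__ == "__main__":
    # Hypothetical training distances for one class; real ones would come from the model.
    train = [0.5 + 0.02 * i for i in range(100)]
    threshold = evt_rejection_threshold(train, tail_size=20, quantile=0.99)
    for d in (1.0, 2.4, 10.0):
        print(d, "accept" if accept(d, threshold) else "reject")
```

Fitting only the tail, rather than all distances, reflects the usual extreme value idea that the largest observations carry the information about where a class ends. For SVM scores, where a higher score means greater confidence, the comparison would presumably have to be reversed.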

Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843333

Control Number

ISI-DISS-2009-371

Creative Commons License

Creative Commons Attribution 4.0 International License

Handle

http://dspace.isical.ac.in:8080/jspui/handle/10263/6830

This document is currently not available here.