Classification of incomplete data integrating neural networks and evidential reasoning

Article Type

Research Article

Publication Title

Neural Computing and Applications

Abstract

When missing data are imputed by any method, the imputed value carries some uncertainty. Consequently, when such imputed data are classified, part of that uncertainty propagates to the classifier output. This raises two issues: first, reducing the uncertainty in the imputed value; second, modeling and processing the uncertainty associated with the classifier output to arrive at a better decision. To deal with the first issue, we use a latent space representation, while for the second we use Dempster-Shafer evidence theory. First, we train a neural network on the data without any missing values to generate a latent space representation of the input. The complete dataset is then extended by deleting each feature once, and the resulting missing values are estimated using a nearest neighbor-based scheme. The network is then refined on this extended dataset to obtain a better latent space. This mechanism is expected to reduce the effect of the missing data on the latent space representation. Using the latent space representation of the complete data, we train two classifiers: support vector machines and evidential t-nearest neighbors. To classify an input with a missing value, we make a rough estimate of the missing value using the nearest neighbor rule and generate its latent space representation, which is then passed to the classifiers. From each classifier output we generate a basic probability assignment (BPA), and all BPAs are combined to obtain an overall BPA. The final classification is made using pignistic probabilities computed from the overall BPA. We use three different ways of defining BPAs. To avoid some problems of Dempster's rule of aggregation, we also use several alternative aggregations, including some t-norm-based methods. Note that t-norms have been used for the combination of belief functions in Pichon and Denœux (in: NAFIPS 2008: 2008 annual meeting of the North American fuzzy information processing society, pp 1–6, 2008). To demonstrate the superiority of the proposed method, we compare its performance with four state-of-the-art techniques on both artificial and real datasets.
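
The combination step described in the abstract can be illustrated with a minimal sketch of Dempster's rule followed by the pignistic transformation. This is not the authors' implementation: the abstract does not say how the BPAs are derived from the classifier outputs, so the mass values below are invented purely for illustration, and the names `dempster_combine`, `pignistic`, `m_svm` and `m_knn` are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two BPAs (dict: frozenset of classes -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Mass of agreeing focal elements goes to their intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to the empty set is the conflict between sources.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; Dempster's rule is undefined.")
    # Normalise by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def pignistic(m):
    """Pignistic transformation: split each focal mass equally among its classes."""
    betp = {}
    for s, w in m.items():
        for c in s:
            betp[c] = betp.get(c, 0.0) + w / len(s)
    return betp


# Illustrative (made-up) BPAs for a two-class problem; {"A", "B"} models ignorance.
A, B, AB = frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})
m_svm = {A: 0.6, B: 0.1, AB: 0.3}  # hypothetical BPA from the first classifier
m_knn = {A: 0.5, B: 0.2, AB: 0.3}  # hypothetical BPA from the second classifier

combined = dempster_combine(m_svm, m_knn)
betp = pignistic(combined)
decision = max(betp, key=betp.get)  # class with the largest pignistic probability
```

The normalisation by the non-conflicting mass is the step that can behave counter-intuitively when the sources disagree strongly, which is presumably one of the problems that motivates the alternative aggregations mentioned above.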

DOI

10.1007/s00521-021-06267-1

Publication Date

1-1-2021

