Conditional Masking to Numerical Data
Journal of Statistical Theory and Practice
Protecting the privacy of datasets has become increasingly important. Many real-life datasets, such as income data and medical data, need to be secured before they are made public. However, security comes at the cost of losing some useful statistical information about the dataset. Data obfuscation addresses this problem by masking a dataset so that the utility of the data is maximized while the risk of disclosing sensitive information is minimized. Two popular approaches to obfuscating numerical data are (i) data swapping and (ii) adding noise to the data. The former masks well but sacrifices all of the correlation information. The latter yields estimates of most popular statistics, such as the mean, variance, quantiles and correlation, but fails to give an unbiased estimate of the distribution curve of the original data. In this paper, we propose a mixed method of obfuscation that combines these two approaches, and we discuss how it gives an unbiased estimate of the distribution curve while also giving reliable estimates of other well-known statistics such as moments and correlation.
Ghatak, Debolina and Roy, Bimal K., "Conditional Masking to Numerical Data" (2019). Journal Articles. 712.