Finding optimum width of discretization for gene expressions using functional annotations

Article Type

Research Article

Publication Title

Computers in Biology and Medicine

Abstract

Discretizing gene expression values is an important preprocessing step because it helps reduce noise and experimental error, which in turn improves results in tasks such as gene regulatory network analysis and disease prediction. A supervised discretization method for gene expressions that uses gene annotations is developed. The method, called “Gene Annotation Based Discretization” (GABD), determines the discretization width by maximizing the positive predictive value (PPV), computed from gene annotations, for the top 20,000 gene pairs. The discretized values capture gene similarity better than the original expression values do. The performance of GABD is compared, in terms of PPV, with existing discretization methods such as equal width discretization, equal frequency discretization and k-means discretization. The utility of GABD is also shown by clustering genes with the k-medoid algorithm and thereby predicting the function of 23 unclassified Saccharomyces cerevisiae genes using a p-value cutoff of 10^-10. The source code for GABD is available at http://www.sampa.droppages.com/GABD.html.
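
The width selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the function names, the fixed-width binning rule, negative Euclidean distance as the pair similarity, "sharing at least one annotation term" as the criterion for a true positive, and a small hand-picked grid of candidate widths.

```python
import numpy as np
from itertools import combinations


def discretize(expr, width):
    """Map each expression value to the index of its fixed-width bin (assumed rule)."""
    return np.floor(np.asarray(expr, dtype=float) / width).astype(int)


def ppv_of_top_pairs(values, annotations, top_k):
    """Fraction of the top_k most similar gene pairs that share an annotation term.

    values      : 2-D array, one row per gene
    annotations : list of sets of annotation terms, one set per gene
    """
    pairs = list(combinations(range(values.shape[0]), 2))
    # Assumed similarity: negative Euclidean distance between gene profiles.
    scores = np.array([-np.linalg.norm(values[i] - values[j]) for i, j in pairs])
    top = np.argsort(scores)[::-1][:top_k]
    hits = sum(1 for idx in top if annotations[pairs[idx][0]] & annotations[pairs[idx][1]])
    return hits / len(top)


def best_width(expr, annotations, candidate_widths, top_k):
    """Return (width, ppv) for the candidate width with the highest PPV."""
    scored = [
        (w, ppv_of_top_pairs(discretize(expr, w), annotations, top_k))
        for w in candidate_widths
    ]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy data: three genes, two conditions; the annotation terms are made up.
    expr = np.array([[0.1, 0.2], [0.3, 0.4], [5.0, 5.1]])
    annotations = [{"term_a"}, {"term_a"}, {"term_b"}]
    width, ppv = best_width(expr, annotations, [0.5, 1.0], top_k=1)
    print(width, ppv)
```

On real data, `top_k` would be set to 20,000 as in the abstract. The all-pairs loop is quadratic in the number of genes, so a vectorized distance computation would be needed beyond toy sizes.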

First Page

59

Last Page

67

DOI

10.1016/j.compbiomed.2017.09.010

Publication Date

11-1-2017

