A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Knowledge of protein-protein interactions (PPIs) plays a key role in understanding cellular mechanisms in different organisms. Many supervised classifiers, such as Random Forest (RF) and Support Vector Machine (SVM), have been used for intra- or inter-species interaction prediction. To improve prediction performance, we propose in this paper a novel set of features that represents a protein pair using the Gene Ontology (GO) terms annotating the two proteins, including the ancestors of those terms. In our approach, a protein pair is treated as a document (bag of words), where the terms annotating the two proteins represent the words. The feature value of each word is calculated as the information content of the corresponding term multiplied by a coefficient that represents the weight of that term inside a document (i.e., a protein pair). We have tested the performance of the classifier using the proposed features on well-known data sets from different species, such as S. cerevisiae, H. sapiens, E. coli, and D. melanogaster. We compare it with an existing GO-based feature representation technique and demonstrate its competitive performance.
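The bag-of-words representation described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy ontology `PARENTS`, the toy annotations `ANNOTATIONS`, the definition of information content as the negative log of a term's annotation frequency, and the coefficient, which is assumed to be the fraction of the two proteins that carry the term. The abstract does not define the coefficient.

```python
import math
from collections import Counter

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "GO:A": [],
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B"],
}

# Hypothetical direct annotations for four proteins.
ANNOTATIONS = {
    "P1": {"GO:D"},
    "P2": {"GO:C"},
    "P3": {"GO:B"},
    "P4": {"GO:C", "GO:D"},
}

VOCABULARY = sorted(PARENTS)  # fixed term order defines the feature positions


def with_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def information_content():
    """Assumed definition: IC(t) = -log(fraction of proteins annotated with t)."""
    counts = Counter()
    for terms in ANNOTATIONS.values():
        counts.update(with_ancestors(terms))
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items()}


def pair_features(protein_a, protein_b):
    """Feature vector for a protein pair treated as a bag of GO terms."""
    ic = information_content()
    terms_a = with_ancestors(ANNOTATIONS[protein_a])
    terms_b = with_ancestors(ANNOTATIONS[protein_b])
    features = []
    for term in VOCABULARY:
        # Assumed coefficient: 0, 0.5, or 1 depending on how many of the
        # two proteins carry the term (directly or through a descendant).
        weight = ((term in terms_a) + (term in terms_b)) / 2
        features.append(ic.get(term, 0.0) * weight)
    return features


vector = pair_features("P1", "P2")
```

In this sketch the root term `GO:A` annotates every protein, so its information content is zero and it contributes nothing to the vector, while more specific terms receive larger values. The resulting vectors could then be passed to any supervised classifier, such as RF or SVM.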
Bandyopadhyay, Sanghamitra and Mallick, Koushik, "A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction" (2017). Journal Articles. 2500.