A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction
Article Type
Research Article
Publication Title
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract
Protein-protein interactions (PPIs) play a key role in understanding cellular mechanisms in different organisms. Many supervised classifiers, such as Random Forest (RF) and Support Vector Machine (SVM), have been used for intra- or inter-species interaction prediction. To improve prediction performance, in this paper we propose a novel set of features that represent a protein pair using the proteins' annotated Gene Ontology (GO) terms, including their ancestors. In our approach, a protein pair is treated as a document (bag of words), where the terms annotating the two proteins represent the words. The feature value of each word is calculated as the information content of the corresponding term multiplied by a coefficient that represents the weight of that term inside a document (i.e., a protein pair). We have tested the performance of the classifier using the proposed features on different well-known data sets from species including S. cerevisiae, H. sapiens, E. coli, and D. melanogaster. We compare it with the other GO-based feature representation technique, and demonstrate its competitive performance.
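The bag-of-words idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the coefficient, so the weight used here (the fraction of the two proteins annotated with a term) is an assumption, as is estimating information content as the log of the inverse annotation frequency over a toy corpus. The protein names, the GO term labels, and the helper functions `information_content` and `pair_feature_vector` are hypothetical, and each term set is assumed to already include ancestor terms.

```python
import math
from collections import Counter

# Toy annotation table (hypothetical proteins and placeholder GO labels).
# Each set is assumed to already contain the ancestors of the annotated terms.
annotations = {
    "P1": {"GO:root", "GO:x"},
    "P2": {"GO:root", "GO:y"},
    "P3": {"GO:root", "GO:x", "GO:y"},
}

def information_content(annotations):
    """Estimate IC(term) = log(N / n_term) over the annotation corpus (assumed definition)."""
    total = len(annotations)
    counts = Counter(term for terms in annotations.values() for term in terms)
    return {term: math.log(total / count) for term, count in counts.items()}

def pair_feature_vector(terms_a, terms_b, ic, vocabulary):
    """Treat a protein pair as a document; one feature per term in the vocabulary."""
    features = []
    for term in vocabulary:
        # Assumed coefficient: 0.0, 0.5, or 1.0 depending on how many of the
        # two proteins carry the term. The paper's coefficient may differ.
        weight = ((term in terms_a) + (term in terms_b)) / 2
        features.append(weight * ic.get(term, 0.0))
    return features

ic = information_content(annotations)
vocabulary = sorted(ic)  # fixed term order so every pair gets the same layout
features = pair_feature_vector(annotations["P1"], annotations["P2"], ic, vocabulary)
print(dict(zip(vocabulary, features)))
```

Vectors built this way for labelled interacting and non-interacting pairs could then be passed to a classifier such as RF or SVM. A term shared by every protein, like the root term in the toy data, has zero information content and therefore contributes nothing to the feature vector.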
First Page
762
Last Page
770
DOI
10.1109/TCBB.2016.2555304
Publication Date
7-1-2017
Recommended Citation
Bandyopadhyay, Sanghamitra and Mallick, Koushik, "A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction" (2017). Journal Articles. 2500.
https://digitalcommons.isical.ac.in/journal-articles/2500