Institute Name (Publisher)

Indian Statistical Institute

Document Type

Doctoral Thesis

Degree Name

Doctor of Philosophy

Subject Name

Computer Science


Machine Intelligence Unit (MIU-Kolkata)


De, Rajat Kumar (MIU-Kolkata; ISI)

Abstract (Summary of the Work)

Identification of toxins, which are either proteins or small molecules, from pathogens is of paramount importance because they act as first-line invaders that infiltrate a host, often leading to infection. These toxins can affect specific proteins, such as the enzymes that catalyze metabolic pathways; they can affect the metabolites that form the basis of metabolic reactions and thereby prevent the progression of those pathways; or, more generally, they may disrupt the regular functioning of other proteins in the signaling pathways of the host. In this regard, the thesis addresses the identification of toxins, and the effect of perturbations by toxins on host pathways, through three tasks: feature extraction, classification and pathway prediction. The thesis starts with in silico identification of such toxins in pathogens. This is followed by an analysis of the effect of toxins on various metabolic and signaling pathways of the host.
Identification of effector proteins has been achieved using feature extraction and classification techniques. Considerable work has been done on the prediction of Type III and Type IV effector proteins based on their primary structure. However, this is not the case for Type VI effector proteins. In this regard, the thesis first introduces a novel framework for fast and accurate identification of Type VI effector proteins based on their primary and secondary structures. While working on Type VI effectors, it came to our attention that no attempts had been made to predict effectors based on their three-dimensional structure. This thesis introduces a unique set of three-dimensional structural features and builds a novel predictor using them. Since the effector protein dataset was unbalanced, we have introduced a novel algorithm for oversampling an unbalanced biological dataset, which does not eliminate samples as noise and ensures that synthetic samples are generated strictly in the vicinity of the minority class samples. Integrating the unique feature set and the oversampling algorithm, a novel effector protein predictor has been developed. Due to the unavailability of three-dimensional structures of Type VII effector proteins and their importance in spreading pathogenesis in hosts, we have developed a deep neural network-based system to uniquely identify Type VII effectors. The system identifies effectors based on the primary and secondary structure of Type VII effectors.
Identification of toxins remains incomplete if their effect on the host is not investigated. In this regard, along with the identification of toxins, the thesis provides an analysis of the effect of perturbations on various pathways using novel algorithms. A new structure-based automated metabolic pathway prediction algorithm has been introduced, which predicts a probable pathway from a given set of metabolites. This algorithm has been applied to metabolic pathways of the hosts to study the effect of toxins on them. Apart from metabolic pathways, toxins also affect signaling pathways. This perturbation has been studied, and a novel algorithm has been developed to quantify its effect on these signaling pathways. Overall, this thesis is dedicated to the design of computational algorithms to identify the toxins secreted by pathogens and the effect of these toxins on the host pathways.


Creative Commons License

Creative Commons Attribution 4.0 International License


Included in

Mathematics Commons