Finding 3D Structure of Proteins Using Characteristics of Short Sequences.

Date of Submission

December 2005

Date of Award

Winter 12-12-2006

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Electronics and Communication Sciences Unit (ECSU-Kolkata)


Pal, Nikhil Ranjan (ECSU-Kolkata; ISI)

Abstract (Summary of the Work)

1.1 Introduction

Proteins are the most structurally complex macromolecules known. They are long chain molecules that can be regarded as necklaces built from 20 different amino acids, arranged in different orders to make chains of up to thousands of amino acids. The result is an extreme variety of proteins, each type with its own unique structure and function. To carry out its function, each protein must take a particular shape, known as its fold. When a protein is put into a solvent, it takes a particular 3D shape within a very short time. This self-assembling process is called folding.

Sometimes proteins misfold (i.e., do not fold correctly) and can aggregate. Aggregation of misfolded proteins is believed to be the cause of some disorders, such as Alzheimer's disease, Parkinson's disease, prion disease (e.g., mad cow disease) and some cancers. The diverse range of diseases that result from protein misfolding has made this a subject of intense investigation: learning how proteins fold will teach us how to design protein-sized nano-machines that can do similar tasks, and it will help us to prevent or reverse diseases in which proteins have departed from the correct folding route.

However, it is very time consuming to find the 3D structure of a protein using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. Hence, researchers are working on computational methods for protein fold prediction. In this thesis we propose some methods to predict the 3D structures of proteins from their amino acid sequences by exploiting statistical information available in proteins with known 3D structures. In particular, we made the following contributions.

1. We proposed a mechanism for generating a self-organizing map for structures, called the Structural Self-Organizing Map (SSOM). This method can also be applied in areas other than protein folding.

2. We proposed a modified form of mountain clustering, called the Structural Mountain Clustering Method (SMCM), that is simpler and very effective for the problem under study.

3. The Structural Self-Organizing Map is then augmented by two subclustering methods, resulting in two schemes for building block generation.

4. We applied these three new methods to find representative hexamers from a given database and compared the performance of the proposed schemes to an existing method.

5. We then used the extracted hexamers to reconstruct some proteins. The results are quite good.


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

