Knowledge Discovery from Gene Expression Data in a Computational Intelligent Framework: Identifying marker genes and Cancer Subtypes.
Date of Submission
December 2005
Date of Award
Winter 12-12-2006
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Electronics and Communication Sciences Unit (ECSU-Kolkata)
Supervisor
Pal, Nikhil Ranjan (ECSU-Kolkata; ISI)
Abstract (Summary of the Work)
There has been a substantial improvement in cancer classification over the last decades. However, there is no general approach for identifying new cancer types (class discovery) or for assigning a tumor to known classes (class prediction), and there is no well-accepted method for identifying "marker genes". Traditional histological classification of cancer subtypes is informative but incomplete. Recent studies of gene expression suggest that molecular classification can be used for effective diagnosis and for prediction of the cancer type and treatment outcome. Here, we study microarray gene expression data of lung cancer with a view to discovering two types of knowledge: finding cancer subtypes (class discovery) and finding marker genes. In the context of the first problem, the effect of various normalization schemes is studied in conjunction with different clustering algorithms. Experimentally, we found that, because of the high dimensionality of the data, c-means type clustering algorithms and their variants are not very effective. So we applied a feature selection algorithm to reduce the dimension. Typically, researchers use unsupervised feature selection for class discovery. Such features do not ensure class-discriminating power. So we take a different route. We first find the marker genes. This reduces the dimensionality while retaining the class-discriminating power of the genes. These marker genes are then used to discover cancer subtypes. We could find just nine informative features (genes) that preserve the discriminating power. However, for class discovery we considered fifteen important features. Application of various clustering algorithms (c-means, fuzzy c-means and the Gustafson-Kessel method) and the self-organizing feature map (SOFM) finds clusters that are generally consistent with the histological classification. The analysis reveals previously defined types, subtypes and many additional details of lung cancer.
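The two-stage route described in the abstract (select discriminating genes first, then cluster in the reduced space) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the feature selection algorithm, so a simple Fisher-style score (between-class over within-class variance) stands in for it; the data are synthetic rather than the lung cancer microarray data; and the function names `fisher_scores` and `fuzzy_c_means`, the choice of two selected features, and the fuzzifier m = 2 are all hypothetical choices made for the example.

```python
import random

def fisher_scores(X, y):
    """Score each feature by between-class over within-class variance.
    Assumption: a stand-in for the dissertation's unnamed selection method."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return scores

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means with a fixed number of iterations (no convergence test)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(iters):
        # Centres are membership-weighted means of the samples.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            tot = sum(w)
            centers.append([sum(w[i] * X[i][j] for i in range(n)) / tot
                            for j in range(d)])
        # Memberships are proportional to squared distance ** (-1 / (m - 1)).
        for i in range(n):
            dist = [max(sum((X[i][j] - centers[k][j]) ** 2 for j in range(d)), 1e-12)
                    for k in range(c)]
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in dist]
            total = sum(inv)
            U[i] = [v / total for v in inv]
    return centers, U

# Synthetic data (assumption): 40 samples, 6 "genes", only genes 0 and 3 differ by class.
rng = random.Random(42)
informative = {0, 3}
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        X.append([rng.gauss(5.0 * label if j in informative else 0.0, 1.0)
                  for j in range(6)])
        y.append(label)

# Stage 1: keep the two highest-scoring features.
scores = fisher_scores(X, y)
selected = sorted(range(6), key=lambda j: scores[j], reverse=True)[:2]
X_reduced = [[row[j] for j in selected] for row in X]

# Stage 2: cluster in the reduced space and compare with the known labels.
centers, U = fuzzy_c_means(X_reduced, c=2)
hard = [max(range(2), key=lambda k: u[k]) for u in U]
matches = sum(h == label for h, label in zip(hard, y))
agreement = max(matches, len(y) - matches) / len(y)  # cluster ids are arbitrary
print("selected features:", sorted(selected), "agreement:", agreement)
```

The agreement is taken up to a relabelling of the clusters, since cluster indices carry no class meaning. The Gustafson-Kessel variant and the SOFM mentioned in the abstract are not covered by this sketch.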
Control Number
ISI-DISS-2005-147
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6316
Recommended Citation
Chaturvedi, Shyam Sudhakar, "Knowledge Discovery from Gene Expression Data in a Computational Intelligent Framework: Identifying marker genes and Cancer Subtypes." (2006). Master’s Dissertations. 213.
https://digitalcommons.isical.ac.in/masters-dissertations/213
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843236