Master’s Dissertations

Knowledge Discovery from Gene Expression Data in a Computational Intelligent Framework :Identifying marker genes and Cancer Subtypes.

Shyam Sudhakar Chaturvedi, Indian Statistical Institute

Date of Submission

December 2005

Date of Award

Winter 12-12-2006

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Electronics and Communication Sciences Unit (ECSU-Kolkata)

Supervisor

Pal, Nikhil Ranjan (ECSU-Kolkata; ISI)

Abstract (Summary of the Work)

Cancer classification has improved substantially over the last few decades. However, there is no general approach for identifying new cancer types (class discovery) or for assigning a tumor to known classes (class prediction), and there is no well-accepted method for identifying "marker genes". Traditional histological classification of cancer subtypes is informative but incomplete. Recent studies of gene expression suggest that molecular classification can be used for effective diagnosis and for prediction of cancer type and treatment outcome. Here, we study microarray gene expression data of lung cancer with a view to discovering two types of knowledge: finding cancer subtypes (class discovery) and finding marker genes. In the context of the first problem, the effect of various normalization schemes is studied in conjunction with different clustering algorithms. Experimentally, we found that, because of the high dimensionality of the data, c-means type clustering algorithms and their variants are not very effective. So we applied a feature selection algorithm to reduce the dimension. Typically, researchers use unsupervised feature selection for class discovery. Such features do not ensure class-discriminating power. So we take a different route. We first find the marker genes. These reduce the dimensionality while retaining the class-discriminating power of the genes. These marker genes are then used to discover cancer subtypes. We found that just nine informative features (genes) suffice to preserve the discriminating power. However, for class discovery we considered fifteen important features. Application of various clustering algorithms (c-means, fuzzy c-means and the Gustafson-Kessel method) and the self-organizing feature map (SOFM) finds clusters that are generally consistent with the histological classification. The analysis reveals previously defined types, subtypes and many additional details of lung cancer.
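As a rough illustration of the clustering step described in the abstract, the following is a minimal sketch of standard fuzzy c-means applied to a small, already reduced feature matrix. It is not the dissertation's implementation: the function names `zscore` and `fuzzy_c_means`, the z-score normalization, the fuzzifier `m=2.0`, and the synthetic 15-feature data are all assumptions made for this example, and the marker-gene selection step is assumed to have been done beforehand.

```python
import numpy as np


def zscore(X):
    """Standardize each feature (gene) to zero mean and unit variance.

    One possible normalization scheme; the abstract does not say which
    schemes were actually compared.
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - X.mean(axis=0)) / std


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance.

    X: (n_samples, n_features) array, c: number of clusters,
    m: fuzzifier (> 1). Returns (centers, U), where U[i, k] is the
    membership of sample i in cluster k and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every sample to every center, shape (n_samples, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a sample sitting on a center
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in for 60 samples described by 15 selected genes
# (hypothetical data, two well-separated groups).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 1.0, (30, 15)),
    rng.normal(5.0, 1.0, (30, 15)),
])
centers, U = fuzzy_c_means(zscore(X), c=2)
labels = U.argmax(axis=1)  # hard assignment from the fuzzy memberships
```

Hard labels are taken here as the cluster with the largest membership; on real expression data the memberships themselves may be more informative, since samples with no dominant membership can indicate ambiguous or mixed subtypes.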

Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843236

Control Number

ISI-DISS-2005-147

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/6316

Recommended Citation

Chaturvedi, Shyam Sudhakar, "Knowledge Discovery from Gene Expression Data in a Computational Intelligent Framework :Identifying marker genes and Cancer Subtypes." (2006). Master’s Dissertations. 213.
https://digitalcommons.isical.ac.in/masters-dissertations/213

This document is currently not available here.
