Low-Rank Joint Subspace Construction for Cancer Subtype Discovery

Article Type

Research Article

Publication Title

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Abstract

Multimodal data integration is an important framework for cancer subtype discovery, as it can combine the inherent properties of individual modalities with their cross-platform correlations to infer clinically relevant subtypes. One main problem is the appropriate selection of relevant and complementary modalities. Another is the 'high dimension-low sample size' nature of each modality. This work proposes a novel algorithm to construct a low-rank joint subspace from the low-rank subspaces of individual high-dimensional modalities. Statistical hypothesis testing is introduced to estimate the rank of each modality by separating its signal component from its noise component. Two quantitative indices are proposed to evaluate the quality of different modalities: the first assesses the relevance of the cluster structure embedded within each modality, while the second evaluates the amount of cluster information shared between two modalities. To construct the joint subspace, the algorithm selects the most relevant modalities with maximum shared information. During data integration, the intersection between two subspaces is also considered in order to select cluster information and filter out noise from the different subspaces. Clustering on the joint subspace extracted by the proposed algorithm is compared with several existing integrative clustering approaches on real-life multimodal data sets. Experimental results show that the identified subtypes resemble the clinically established subtypes more closely than the subtypes identified by the existing approaches. Survival analysis reveals significant differences between the survival profiles of the identified subtypes, while robustness analysis shows that the identified subtypes are not sensitive to perturbation of the data sets.
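
The general idea of building a joint subspace from per-modality low-rank subspaces can be illustrated with a minimal sketch. This is not the algorithm of the article: the function names `low_rank_basis` and `joint_subspace`, the fixed ranks, the tolerance `tol`, and the synthetic data are all assumptions made for illustration. The article estimates each rank by hypothesis testing and selects modalities using its two indices, and neither step is reproduced here. The sketch only shows how directions already present in the joint subspace (the intersection) can be dropped when a new modality is merged.

```python
import numpy as np

def low_rank_basis(X, rank):
    """Orthonormal basis (n_samples x rank) of the leading left singular
    subspace of the column-centered matrix X. The rank is supplied by the
    caller (an assumption; the article estimates it statistically)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :rank]

def joint_subspace(bases, tol=1e-8):
    """Merge orthonormal bases one at a time, appending only the directions
    of each new basis that are not already spanned by the joint basis."""
    joint = bases[0]
    for U in bases[1:]:
        # Component of U lying outside span(joint).
        residual = U - joint @ (joint.T @ U)
        Q, s, _ = np.linalg.svd(residual, full_matrices=False)
        # Keep only directions with a non-negligible residual.
        joint = np.hstack([joint, Q[:, s > tol]])
    return joint

# Noise-free synthetic example (hypothetical data): two modalities on the
# same 60 samples that share one latent factor.
rng = np.random.default_rng(0)
n = 60
Z = rng.normal(size=(n, 2))          # latent factors of modality 1
w = rng.normal(size=n)               # extra factor seen only in modality 2
X1 = Z @ rng.normal(size=(2, 500))
X2 = np.column_stack([Z[:, 0], w]) @ rng.normal(size=(2, 300))

bases = [low_rank_basis(X1, 2), low_rank_basis(X2, 2)]
joint = joint_subspace(bases)
print(joint.shape)
```

On noisy real data the residual singular values are never exactly zero, so the fixed `tol` used here would have to be replaced by a data-driven criterion.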

First Page

1290

Last Page

1302

DOI

10.1109/TCBB.2019.2894635

Publication Date

7-1-2020
