Two-stage copula-driven, structure-aware gene selection for single-cell analysis

Article Type

Research Article

Publication Title

Array

Abstract

Single-cell RNA sequencing (scRNA-seq) data are inherently sparse due to low starting RNA quantities and limited per-cell sequencing depth. This sparsity, combined with substantial cell-to-cell heterogeneity (arising from differences in cell cycle stage, cellular morphology, and reagent conditions) and pervasive technical noise, makes the selection of informative genes for downstream clustering extremely challenging. Copula-based modeling provides a powerful means to capture complex dependencies in such high-dimensional, sparse data. Motivated by these challenges, we propose a two-stage Structure-Aware, Copula-Guided Feature Selection (SCopFS) method for single-cell analysis. In the first stage, SCopFS employs a structure-aware sampling strategy via locality-sensitive hashing (LSH) to identify a suboptimal subset of highly relevant genes. In the second stage, it refines this subset using a copula-based multivariate dependency measure with iterative forward selection to ensure minimal redundancy. The resulting approach can identify stable and informative gene sets even in small scRNA-seq datasets while preserving intrinsic gene–gene dependency structures. By retaining expression information across a large number of genes, SCopFS outperforms state-of-the-art feature selection methods in downstream clustering tasks. Availability: All code and supporting information are available at https://github.com/Snehalikalall/UCFS.
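
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only assumes a cells-by-genes expression matrix `X`. The function names `lsh_candidates`, `average_ranks`, and `forward_select`, the random-hyperplane hashing scheme, the variance-based choice within buckets, and the use of pairwise Spearman's rho as a rank-based (copula-derived) stand-in for the multivariate dependency measure are all assumptions made for illustration.

```python
import numpy as np

def lsh_candidates(X, n_bits=8, per_bucket=1, seed=0):
    """Stage 1 (assumed scheme): hash genes with random hyperplanes and keep
    the highest-variance gene(s) from each bucket."""
    rng = np.random.default_rng(seed)
    var = X.var(axis=0)
    centered = (X - X.mean(axis=0)).T                    # genes x cells
    planes = rng.standard_normal((X.shape[0], n_bits))   # cells x n_bits
    bits = (centered @ planes > 0).astype(np.int64)      # sign signature per gene
    keys = bits @ (1 << np.arange(n_bits))               # integer bucket id per gene
    keep = []
    for key in np.unique(keys):
        # Ignore zero-variance genes; they carry no information.
        members = np.flatnonzero((keys == key) & (var > 0))
        best = members[np.argsort(var[members])[::-1][:per_bucket]]
        keep.extend(best.tolist())
    return np.array(sorted(keep), dtype=int)

def average_ranks(x):
    """1-based ranks with ties averaged (important for zero-inflated counts)."""
    _, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)                 # last rank within each tie group
    return (ends - (counts - 1) / 2.0)[inv]

def forward_select(X, candidates, k):
    """Stage 2 (assumed criterion): greedily add the candidate with the lowest
    mean |Spearman's rho| to the genes already chosen."""
    R = np.column_stack([average_ranks(X[:, g]) for g in candidates])
    rho = np.abs(np.atleast_2d(np.corrcoef(R, rowvar=False)))
    var = X[:, candidates].var(axis=0)
    chosen = [int(np.argmax(var))]           # seed with the most variable gene
    while len(chosen) < min(k, len(candidates)):
        redundancy = rho[:, chosen].mean(axis=1)
        redundancy[chosen] = np.inf          # never re-pick a chosen gene
        chosen.append(int(np.argmin(redundancy)))
    return candidates[chosen]

# Synthetic sparse counts standing in for a small scRNA-seq matrix.
rng = np.random.default_rng(1)
X = rng.poisson(0.5, size=(60, 200)).astype(float)
candidates = lsh_candidates(X)
selected = forward_select(X, candidates, k=10)
```

Here `selected` holds column indices of `X`. Working on ranks rather than raw counts means the redundancy score depends only on the dependence structure between genes and not on their marginal distributions, which is the property that motivates copula-based measures.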

DOI

10.1016/j.array.2025.100623

Publication Date

12-1-2025
