COTAN v2: a Comprehensive and Versatile Framework for Single-Cell Gene Co-Expression Studies and Cell Type Identification
Author(s): Silvia Giulia Galfrè, Marco Fantozzi, Daniel Puttini, Corrado Priami, Francesco Morandin
Affiliation(s): University of Pisa
Estimating gene co-expression is a critical step in the analysis of single-cell RNA sequencing (scRNA-seq) data. Because scRNA-seq methodologies have low efficiency, sensitive computational approaches are crucial to accurately infer transcription profiles in a cell population. COTAN is a statistical and computational method that analyzes the co-expression of gene pairs at the single-cell level. It employs an innovative mathematical model that leads to a generalized contingency table framework. Instead of focusing on positive counts, COTAN relies on the distribution of zero unique molecular identifier (UMI) counts to derive scores and other information for gene correlation studies and for gene or cell clustering. COTAN assesses whether gene pairs are correlated or anti-correlated, providing a new correlation index with an approximate p-value for the associated test of independence. It also checks whether single genes are differentially expressed, scoring them with a newly defined global differentiation index (GDI). COTAN plots and clusters genes according to their co-expression pattern with other genes to study gene interactions and identify cell-identity markers. Through the GDI, COTAN assesses whether a cell cluster is homogeneous, making it a valuable tool for cell clustering and assignment. COTAN v2 introduces a novel feature that uses gene GDI values to assess the biological uniformity of a cell cluster. This feature allows researchers to apply an iterative cell clustering pipeline and achieve a finer resolution of uniform clusters. COTAN shows high sensitivity in extracting information from small clusters and lowly expressed genes. Furthermore, COTAN leverages its contingency table framework to directly identify genes that are over-represented or under-represented in a cluster with respect to the rest of the dataset.
COTAN also computes an enrichment score for a given list of marker genes, which can be used to identify and merge small uniform clusters and to check a final cluster identification. The latest version of COTAN includes new functions and plots to check and clean the dataset, along with several visualization tools to help users explore and interpret their data. COTAN has a user-friendly interface that does not require extensive programming skills. The strength of COTAN is its ability to help researchers better understand scRNA-seq data: by identifying gene modules, cell types, and new marker genes, researchers gain insights into the underlying biology of their samples, which can support disease diagnosis, drug discovery, and other applications. In summary, COTAN is a powerful and versatile tool for the analysis of scRNA-seq data, with the potential to facilitate the discovery of new cell types and biological insights.
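
The idea of testing gene-pair independence on zero versus non-zero UMI counts can be illustrated with a deliberately simplified sketch. The code below is not COTAN's model or API: it builds a plain 2x2 contingency table for two genes, takes the expected counts from the table margins (an assumption made here for illustration; COTAN derives its expected values from its own mathematical model), and returns a signed association index (the classical phi coefficient) together with an approximate chi-square p-value on one degree of freedom. The function name `zero_cooccurrence_test` and the toy count vectors are hypothetical.

```python
import math


def zero_cooccurrence_test(counts_a, counts_b):
    """Chi-square test of independence on zero/non-zero UMI counts of two genes.

    Simplified illustration only: expected counts come from the table margins,
    not from a fitted expression model.
    Returns (signed phi coefficient, approximate p-value).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both genes must be measured on the same cells")
    n = len(counts_a)

    # Observed 2x2 table, keyed by (gene A expressed?, gene B expressed?).
    observed = {(i, j): 0 for i in (False, True) for j in (False, True)}
    for a, b in zip(counts_a, counts_b):
        observed[(a > 0, b > 0)] += 1

    # Marginal totals for each gene.
    row = {i: observed[(i, False)] + observed[(i, True)] for i in (False, True)}
    col = {j: observed[(False, j)] + observed[(True, j)] for j in (False, True)}
    if 0 in row.values() or 0 in col.values():
        raise ValueError("a gene is expressed in all cells or in none")

    # Pearson chi-square statistic against the independence expectation.
    chi2 = 0.0
    for (i, j), obs in observed.items():
        expected = row[i] * col[j] / n
        chi2 += (obs - expected) ** 2 / expected

    # Positive sign when the genes are co-expressed more often than expected.
    expected_both = row[True] * col[True] / n
    sign = 1.0 if observed[(True, True)] >= expected_both else -1.0

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return sign * math.sqrt(chi2 / n), p_value


# Toy UMI counts for three hypothetical genes across eight cells.
gene_a = [0, 0, 0, 0, 3, 1, 2, 5]
gene_b = [0, 0, 0, 0, 1, 4, 2, 2]  # expressed in the same cells as gene_a
gene_c = [1, 2, 1, 3, 0, 0, 0, 0]  # expressed only where gene_a is silent

print(zero_cooccurrence_test(gene_a, gene_b))
print(zero_cooccurrence_test(gene_a, gene_c))
```

In this toy example the first pair yields a positive index and the second a negative one, mirroring the correlated versus anti-correlated distinction described above. With real data, very small expected counts make the chi-square approximation unreliable, which is one reason a model-based framework is preferable to this naive table.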