Analysis of multi-condition single-cell data with latent embedding multivariate regression
Author(s): Constantin Ahlmann-Eltze, Wolfgang Huber
Affiliation(s): EMBL Heidelberg
Social media: https://twitter.com/const_ae
Single-cell RNA sequencing of samples from multiple biological conditions makes it possible to study how heterogeneously a complex tissue responds to a treatment. Current approaches divide the cells into discrete groups and identify differentially expressed genes between corresponding groups. Here, we propose a method that operates without such grouping. Latent embedding multivariate regression (LEMUR) factorizes the log-transformed count matrix, as principal component analysis (PCA) does, while simultaneously accounting for the known covariates of each cell. The method combines ideas from differential geometry on Grassmann manifolds with linear regression. We use LEMUR to study the effects of panobinostat, a non-selective HDAC inhibitor, on brain tumor biopsies. LEMUR regresses out the treatment-induced heterogeneity and uncovers the heterogeneity of the tumor and microenvironment cells that is shared across patients. Because it works in a continuous latent space, LEMUR can detect drug-induced gene expression responses that affect only subsets of cells, without clustering the cells first. We find that the upregulation of TCEAL2 is limited to transcriptionally active tumor cells, whereas TRIM47 is downregulated across all tumor cells, including cells that show high expression of stress-response genes. We implemented LEMUR as an R package with a user-friendly interface for analyzing multi-condition single-cell data; the package also provides a toolbox for differential geometry on some common manifolds.
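To make the idea of "PCA while accounting for known covariates" concrete, here is a minimal NumPy sketch of a much simpler baseline: regress each gene on the covariates, then run PCA on the residuals. This is explicitly not LEMUR, which according to the abstract combines regression with geometry on Grassmann manifolds; the sketch uses one fixed subspace for all conditions. The function name `regress_then_pca`, the toy data, and all parameter names are assumptions made for illustration and are not part of the LEMUR R package.

```python
import numpy as np

def regress_then_pca(Y, design, n_components):
    """Simplified baseline (not LEMUR): linear regression, then PCA of residuals.

    Y:      genes x cells matrix of log-transformed expression values
    design: cells x covariates design matrix (should include an intercept column)
    """
    # Least-squares fit of every gene on the covariates: Y.T ~ design @ B
    B, *_ = np.linalg.lstsq(design, Y.T, rcond=None)
    residuals = Y - (design @ B).T
    # PCA via SVD; with an intercept in the design the residuals are already centered
    U, _, _ = np.linalg.svd(residuals, full_matrices=False)
    basis = U[:, :n_components]        # genes x k, orthonormal columns
    embedding = basis.T @ residuals    # k x cells latent coordinates
    return B, basis, embedding

# Toy data (hypothetical): 50 genes, 200 cells, one binary treatment covariate
rng = np.random.default_rng(0)
n_genes, n_cells = 50, 200
condition = rng.integers(0, 2, size=n_cells)
design = np.column_stack([np.ones(n_cells), condition])
Y = rng.normal(size=(n_genes, n_cells)) + 2.0 * condition  # uniform treatment shift

B, basis, embedding = regress_then_pca(Y, design, n_components=5)
```

In this baseline the embedding is uncorrelated with the covariates by construction, but the treatment effect is a single shift per gene. The point of a method such as LEMUR is to go beyond that and allow the effect to differ between cells along the latent space.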