Semi-supervised probabilistic Factor Analysis (spFA) to uncover novel axes of variation in multi-omics data sets

Author(s): Tümay Capraz, Wolfgang Huber

Affiliation(s): EMBL Heidelberg



High-throughput multi-omics techniques have revolutionised our understanding of how cells work at the molecular level. These tools enable large-scale, comprehensive analysis of genes, proteins, metabolites and other biological molecules, offering new opportunities in precision medicine and biomarker discovery. However, the high dimensionality and complexity of the resulting data make interpretation challenging. Here we present semi-supervised probabilistic Factor Analysis (spFA), a multi-omics integration method that infers a small set of latent factors capturing the main sources of biological and technical variability. spFA discovers the primary sources of variation while adjusting for known covariates, and it simultaneously separates variation that is shared between multiple omics modalities from variation that is specific to a single modality. We applied spFA to breast cancer and chronic lymphocytic leukaemia multi-omics studies, showing that it adjusts for known covariates while finding factors that capture novel sources of variation. The inferred factors are predictive of treatment outcomes but orthogonal to known biomarkers. We anticipate that spFA will simplify the discovery of novel axes of variation in multi-omics data sets, independent of established biomarkers.
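The abstract does not specify spFA's model or inference, so the following is only a minimal, non-probabilistic stand-in that illustrates the two ideas it names: adjusting for known covariates, and telling shared from modality-specific factors. It assumes NumPy and works in two stages. First, known covariates are regressed out of each omics view by least squares. Second, a truncated SVD of the concatenated, per-view-scaled residuals yields factor scores, and the fraction of each view's residual variance explained by each factor indicates whether a factor is shared (it explains variance in several views) or view-specific. The function names `residualize` and `fit_factors` and the synthetic data are hypothetical. spFA itself fits a probabilistic model and adjusts for covariates within that model rather than in a separate first stage.

```python
import numpy as np

# Hypothetical two-stage stand-in, NOT spFA's probabilistic model or inference.


def residualize(X, C):
    """Regress each column of X (samples x features) on an intercept plus
    covariates C (samples x k) and return the least-squares residuals."""
    design = np.column_stack([np.ones(C.shape[0]), C])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ beta


def fit_factors(views, C, n_factors):
    """Covariate-adjusted factors from several omics views.

    views: list of (samples x features_m) arrays, one per modality.
    Returns factor scores Z (samples x n_factors) and r2, an
    (n_views x n_factors) array giving the fraction of each view's
    residual variance explained by each factor.
    """
    resid = [residualize(X, C) for X in views]
    # Scale each view to unit Frobenius norm so that modalities with
    # many features do not dominate the decomposition.
    resid = [R / np.linalg.norm(R) for R in resid]
    U, s, Vt = np.linalg.svd(np.hstack(resid), full_matrices=False)
    Z = U[:, :n_factors] * s[:n_factors]
    r2 = np.zeros((len(views), n_factors))
    start = 0
    for m, R in enumerate(resid):
        stop = start + R.shape[1]
        total = np.sum(R ** 2)  # equals 1 after the scaling above
        for k in range(n_factors):
            # Rank-1 reconstruction of view m from factor k alone.
            W_k = Vt[k, start:stop]
            r2[m, k] = np.sum(np.outer(Z[:, k], W_k) ** 2) / total
        start = stop
    return Z, r2


# Synthetic example: one factor shared by both views, one specific to
# view 2, plus a strong effect of a known covariate in both views.
rng = np.random.default_rng(0)
n = 200
C = rng.normal(size=(n, 1))
shared = rng.normal(size=n)
specific = rng.normal(size=n)
X1 = (3 * np.outer(shared, rng.normal(size=50))
      + 5 * np.outer(C[:, 0], rng.normal(size=50))
      + 0.1 * rng.normal(size=(n, 50)))
X2 = (3 * np.outer(shared, rng.normal(size=40))
      + 3 * np.outer(specific, rng.normal(size=40))
      + 5 * np.outer(C[:, 0], rng.normal(size=40))
      + 0.1 * rng.normal(size=(n, 40)))

Z, r2 = fit_factors([X1, X2], C, n_factors=2)
print(np.round(r2, 2))  # rows: views, columns: factors
```

Because the covariate is regressed out first, the factor scores are orthogonal to it, so they can only capture variation that the covariate does not already explain. Note that an unrotated SVD recovers the signal subspace but not necessarily the individual generating factors, so in this toy setting the two columns of `Z` may mix `shared` and `specific`.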