consICA: multimodal data deconvolution, integration and elucidation of biological processes in cancer research
Author(s): Maryna Chepeleva, Tony Kaoma, Arnaud Muller, Sang-Yoon Kim, Vladimir Despotovic, Reka Toth, Petr V Nazarov
Affiliation(s): Luxembourg Institute of Health
Analysing cancer-related multiomics data requires overcoming data complexity, tumor heterogeneity and technical biases, all of which mask important biological signals. This complexity is caused by natural variability in cell type proportions and clonal heterogeneity among tumor cells. Additionally, technical biases between experimental platforms may limit the direct comparison of patient data coming from different sources, especially when mapping to large public datasets. We developed consICA, a Bioconductor package (release 3.16). It is based on consensus independent component analysis (ICA) and allows new samples to be projected into the space defined by biological signals in larger reference datasets, while simultaneously correcting for technical biases. consICA is applicable to both bulk-sample and single-cell datasets and, in addition to a stable reference-free decomposition, includes tools for explaining the extracted signals. It allows users to estimate variance for each component, run functional enrichment analysis, and link components to clinical factors and patient survival. Different gene name nomenclatures are processed automatically in the functional annotation. consICA can be used to engineer features for machine learning models aimed at patient stratification, and it also provides PDF reports with explanations of the independent components for easier interpretation of the extracted signals. The approach was validated on in-house and public datasets (TCGA, GTEx, various single-cell RNA-seq) that included transcriptomics, epigenomics, miRNAs, proteomics and whole slide H&E-stained images (preprocessed using deep-learning models). The independent molecular signals were stronger predictive features than individual genes and could be better linked to image-based features.
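To make the idea of a consensus decomposition followed by projection more concrete, here is a minimal NumPy sketch. It is not the consICA API and does not reproduce the package's algorithm in detail. The assumptions are ours: a symmetric FastICA with a tanh nonlinearity, runs that differ only in their random initialisation, components matched to the first run by absolute correlation, and new samples projected by ordinary least squares. All function names (`fastica`, `consensus_ica`, `project`) and the synthetic data are hypothetical and chosen for illustration only.

```python
import numpy as np

def _sym_decorrelate(W):
    # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fastica(X, k, seed, n_iter=200, tol=1e-6):
    """One FastICA run on a features x samples matrix X.
    Returns S (features x k), independent along the feature axis."""
    n_features = X.shape[0]
    Xc = X - X.mean(axis=0)                      # centre each sample
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :k] * np.sqrt(n_features)           # PCA-whitened data
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.normal(size=(k, k)))
    for _ in range(n_iter):
        Y = np.tanh(Z @ W.T)
        W_new = _sym_decorrelate(
            Y.T @ Z / n_features - (1 - Y**2).mean(axis=0)[:, None] * W
        )
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T

def project(S, X_new):
    """Least-squares weights M (k x samples) such that S @ M ~ centred X_new."""
    Xc = X_new - X_new.mean(axis=0)
    return np.linalg.lstsq(S, Xc, rcond=None)[0]

def consensus_ica(X, k, n_runs=10):
    """Average sign-aligned components over several runs (simplified consensus)."""
    runs = [fastica(X, k, seed) for seed in range(n_runs)]
    ref = runs[0]
    total = np.zeros_like(ref)
    for S in runs:
        corr = np.corrcoef(ref.T, S.T)[:k, k:]   # reference vs. current run
        best = np.abs(corr).argmax(axis=1)       # greedy match, may repeat
        signs = np.sign(corr[np.arange(k), best])
        total += S[:, best] * signs
    S_cons = total / n_runs
    return S_cons, project(S_cons, X)

# Synthetic example: 3 hidden signals, 20 reference samples, 5 new samples.
rng = np.random.default_rng(0)
S_true = rng.laplace(size=(2000, 3))
X_ref = S_true @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(2000, 20))
X_new = S_true @ rng.normal(size=(3, 5)) + 0.01 * rng.normal(size=(2000, 5))

S_cons, M_ref = consensus_ica(X_ref, k=3)        # reference decomposition
M_new = project(S_cons, X_new)                   # new samples in the same space
```

In this sketch the rows of `M_new` play the role of per-sample component weights, which is the kind of quantity that could serve as engineered features for a downstream model. The greedy matching step is a deliberate simplification; a one-to-one assignment between runs would be more robust when components are similar.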