Multi-omics integration: a regression-based approach


Author(s): Angelo Velle, Nicolò Gnoato, Ilaria Billato, Stefania Pirrotta, Enrica Calura, Chiara Romualdi

Affiliation(s): Department of Biology, University of Padova

Social media: https://twitter.com/angelo_velle

In recent years, a growing number of techniques for omics data acquisition have been developed, and large datasets that combine several omics, such as gene expression, methylation, copy number variation and miRNA expression, are now available. For example, TCGA gives access to all these data types for thousands of tumor samples. The major challenge today is to capture all the information these data contain. Most standard approaches to omics data modeling compare a single data type across groups of samples. Because the different omics are biologically related, it is essential to detect their interplay statistically. This requires statistical models that consider all the omics together and identify the key molecular players involved. To address this need, we provide an easy-to-use R package for omics data integration. The package will provide integration models for gene expression, miRNA expression, DNA methylation, copy number variations (CNVs) and transcription factors (TFs). Moreover, any variable representing a regulator of gene expression can be passed to the package as a TF. The package is designed to detect the association between the expression of a target and that of its regulators, while also taking into account genomic modifications such as CNVs and methylation. For RNA sequencing data, counts will be fitted with a negative binomial model, whereas for microarray or other data types a linear model will be applied. Because the number of regulators for a given target can be very high, we also provide a penalized linear model that automatically keeps only the most important regulators. We are evaluating the possibility of adding further integration models and of extending the package to single-cell data.
To make the models straightforward to interpret, we provide plots and tables that highlight the most important interactions and patterns across the analyzed omics. The package presented in this work offers a solid and easy-to-use solution to the problem of multi-omics integration, allowing users to detect and visualize the interplay among omics.
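
The idea of a penalized linear model that keeps only the most important regulators can be illustrated with a small sketch. The code below is not the package's implementation or API (the package is written in R). It is a minimal pure-Python lasso fitted by coordinate descent on simulated data. The regulator names, effect sizes, penalty value and the function `lasso_coordinate_descent` are all assumptions made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n) * ||y - Xb||^2 + penalty * ||b||_1.

    X is a list of rows with centred columns; y is a centred list.
    """
    n, p = len(y), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - Xb, and b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes j's own fit
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)) / n
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Simulated example (hypothetical): one target gene and six candidate regulators,
# of which only copy number and methylation truly affect expression.
rng = random.Random(0)
n_samples = 200
names = ["cnv", "methylation", "mirna_a", "mirna_b", "tf_a", "tf_b"]
columns = [centre([rng.gauss(0, 1) for _ in range(n_samples)]) for _ in names]
true_effects = {"cnv": 1.5, "methylation": -1.0}

expression = centre([
    sum(true_effects.get(name, 0.0) * columns[j][i] for j, name in enumerate(names))
    + rng.gauss(0, 0.5)
    for i in range(n_samples)
])
X = [[columns[j][i] for j in range(len(names))] for i in range(n_samples)]

coef = lasso_coordinate_descent(X, expression, penalty=0.2)
selected = [name for name, b in zip(names, coef) if b != 0.0]
print(dict(zip(names, (round(b, 3) for b in coef))))
print("selected regulators:", selected)
```

In this sketch the L1 penalty shrinks coefficients towards zero, so regulators with little association to the target tend to be dropped while the retained coefficients are biased slightly towards zero. For RNA sequencing counts, the abstract describes a negative binomial model instead, which this linear sketch does not cover.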