De novo functional transcriptomics with RNA-seq and Ribo-seq


Author(s): Roberto Albanese

Affiliation(s): Human Technopole



Our genome sequencing and assembly capabilities have improved greatly, yet annotating the human genome remains challenging. Genes can produce multiple isoforms, which may differ in expression level and function. Transcript functions can be elucidated by profiling ribosome positions with ribosome profiling (Ribo-seq). Ribosome profiling data make it possible to study the fates of cytoplasmic transcripts and to quantify translation levels. Moreover, data from proteomics technologies can support the functional characterization of coding transcript isoforms. The main goal of this project is the detection and characterization of sample-specific and condition-specific transcripts, ORFs, and proteins. We implemented a computational pipeline for de novo transcriptome assembly and de novo detection of Open Reading Frames (ORFs). We use a proteogenomic approach to identify and quantify peptides and proteins, and we perform differential expression analyses at the peptide and protein levels. We demonstrate that our computational workflow can be used to study the effects of DUX4 activation on gene expression in human skeletal muscle cells. DUX4 misexpression in skeletal muscle is known to cause facioscapulohumeral muscular dystrophy (FSHD). It also inhibits nonsense-mediated decay (NMD), leading to the accumulation of incomplete transcripts and truncated proteins. We use RNA-seq data to assemble known and novel transcripts, and we exploit Ribo-seq data to identify unannotated translated transcript regions. Finally, by using a custom protein database together with a proteomic dataset matching the time points of the transcriptomic dataset, we identify differentially expressed novel proteins that are otherwise invisible to classical analysis methods.
In conclusion, this work shows that integrating data from different steps of the regulatory cascade aids the identification, quantification, and functional characterization of human transcripts and proteins.
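
As an illustration of what de novo ORF detection on an assembled transcript involves, the following minimal Python sketch scans the three forward reading frames for ATG-to-stop intervals. It is not the pipeline described above: the function name `find_orfs`, the `min_codons` threshold, and the restriction to the canonical ATG start codon are assumptions made for this example, and a Ribo-seq-based approach would additionally require evidence of translation (for instance, ribosome footprints over the candidate region) before calling an ORF.

```python
# Illustrative only: a naive sequence-based ORF scan, not the authors' method.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here is the first ATG in a frame up to the next in-frame stop
    codon. `start` is 0-based, `end` is exclusive and includes the stop
    codon. `min_codons` (hypothetical parameter) counts codons before
    the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in the current frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Toy transcript: ATG AAA TAG embedded in frame 2.
    print(find_orfs("CCATGAAATAGGG"))
```

In practice, candidate intervals such as these would be filtered using Ribo-seq signal and translated into protein sequences to build the custom database used for the proteomic search.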