Identification and analysis of gene and genome duplications with the doubletrouble Bioconductor package
Author(s): Fabrício Almeida-Silva, Yves Van de Peer
Affiliation(s): VIB-UGent Center for Plant Systems Biology
Social media: https://twitter.com/almeidasilvaf
Gene and genome duplications are a source of raw genetic material for evolution. However, whole-genome duplications (WGD) and small-scale duplications (SSD) contribute to genome evolution in different ways. Here, we present doubletrouble, an R/Bioconductor package for identifying and classifying duplicated genes from whole-genome protein sequences. The software provides classification schemes that can identify genes derived from WGD, tandem duplications, proximal duplications, transposon-derived duplications, and dispersed duplications. In addition, users can calculate substitution rates per site (i.e., Ka, the number of nonsynonymous substitutions per nonsynonymous site, and Ks, the number of synonymous substitutions per synonymous site) for duplicate pairs, find peaks in Ks distributions with Gaussian mixture models (GMMs), and classify gene pairs into age groups based on Ks peaks. We demonstrate the effectiveness of doubletrouble by identifying and classifying duplicate gene pairs for all species in Ensembl Genomes instances. Finally, to facilitate data reuse, we created a Shiny app that allows easy access to the sets of duplicate gene pairs for Ensembl Genomes species. doubletrouble offers a valuable toolkit for studying the contribution of WGD and SSD to genome evolution.
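
To illustrate the idea of classifying gene pairs into age groups based on Ks peaks, here is a minimal sketch in Python. It is not the doubletrouble implementation, and the package's own rule for drawing boundaries between peaks may differ. It assumes the peaks have already been estimated (for example, by a GMM) and assigns each pair to the peak with the highest weighted Gaussian density. The function names, peak parameters, and Ks values are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def assign_age_groups(ks_values, peaks):
    """Assign each Ks value to the peak with the highest weighted density.

    peaks: list of (mean, sd, weight) tuples, one per mixture component.
    Returns a list of 1-based peak indices, one per Ks value.
    """
    groups = []
    for ks in ks_values:
        scores = [w * gaussian_pdf(ks, m, s) for (m, s, w) in peaks]
        groups.append(scores.index(max(scores)) + 1)
    return groups

# Hypothetical peaks (mean, sd, mixture weight): a recent and an older event.
peaks = [(0.1, 0.05, 0.4), (0.8, 0.2, 0.6)]

# Hypothetical Ks values for four duplicate pairs.
ks_values = [0.05, 0.12, 0.75, 1.1]

groups = assign_age_groups(ks_values, peaks)
print(groups)
```

Under these assumed parameters, the first two pairs fall in the younger group and the last two in the older one. Pairs with similar Ks values are grouped as plausibly originating from the same duplication event.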