msqrob2PTM: differential abundance and differential usage analysis of MS-based proteomics data at the post-translational modification and peptidoform level
Author(s): Nina Demeulemeester,Lennart Martens,Lieven Clement
Affiliation(s): Ghent University - VIB
Multiple novel open-modification search engines developed in the proteomics community boost the identification of post-translational modifications (PTMs) with mass spectrometry (MS)-based technologies. These developments can shift proteomics research into a higher gear, as PTMs are key switches in many cellular pathways that play vital roles in cell proliferation, migration, metastasis and ageing. However, despite the advances in PTM identification, statistical methods for sensitive PTM-level quantification and differential analysis are lagging behind. The lack of such methods can partly be explained by the inherently low abundance of PTMs in the sample and the confounding of PTM intensities with the abundance of their parent proteins. Therefore, we have developed msqrob2PTM, a new workflow in the msqrob2 universe capable of differential abundance analysis at the PTM level as well as at the peptidoform (unique peptide-PTM combination) level. We argue that analysis and data exploration at the peptidoform level are important for validating significant PTMs. Indeed, when assessing individual PTMs, one typically summarizes all peptidoforms carrying a particular PTM into one PTM expression value, e.g. by averaging or taking the median. However, peptidoforms can carry multiple PTMs. It is therefore important to evaluate and visualize all peptidoforms that contribute to a PTM, in order to rule out PTMs whose significance stems only from a subset of significant peptidoforms that also carry another PTM, which would hint that the other PTM is actually driving the differential abundance. Our workflows can flag both Differential Peptidoform (PTM) Abundance (DPA) and Differential Peptidoform (PTM) Usage (DPU). This enables a clear distinction between directly assessing the differential abundance of peptidoforms (DPA) and assessing differences in the usage of peptidoforms relative to the overall abundance of the corresponding protein (DPU).
For DPA, we directly model the log2-transformed peptidoform (PTM) intensities. For DPU, we correct for parent protein abundance with an intermediate normalization step that calculates the log2-ratio of the peptidoform (PTM) intensities to their summarized parent protein intensities before the statistical analysis. We demonstrated the utility and performance of msqrob2PTM by applying it to simulated datasets, spike-in datasets with known ground truth, and biological PTM-rich datasets. Our results show that msqrob2PTM performs on par with or better than the current state-of-the-art method, MSstatsPTM. Moreover, our workflows have the advantage over MSstatsPTM that they also provide output at the peptidoform level, which is key to validating the relevance of the returned PTMs and provides a more fine-grained resolution.
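The difference between the DPA and DPU inputs can be illustrated with a minimal Python sketch. It is not the msqrob2PTM implementation (msqrob2 is an R package). The toy intensities, the function names, the choice of the median as protein-level summary, and the choice to summarize over all peptidoforms of the protein are assumptions made purely for illustration.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical toy data: (protein, peptidoform, sample) -> raw intensity.
raw_intensities = {
    ("P1", "PEPT[Phospho]IDE", "ctrl"): 1024.0,
    ("P1", "PEPT[Phospho]IDE", "treat"): 4096.0,
    ("P1", "PEPTIDE", "ctrl"): 2048.0,
    ("P1", "PEPTIDE", "treat"): 4096.0,
    ("P1", "OTHERPEP", "ctrl"): 4096.0,
    ("P1", "OTHERPEP", "treat"): 8192.0,
}


def log2_transform(raw):
    """DPA input: log2-transformed peptidoform intensities."""
    return {key: math.log2(value) for key, value in raw.items()}


def summarize_protein(log_int):
    """Summarize each parent protein per sample.

    Assumption: the median over all peptidoforms of the protein is used
    here; the actual workflow may use a different (robust) summarization.
    """
    grouped = defaultdict(list)
    for (protein, _peptidoform, sample), value in log_int.items():
        grouped[(protein, sample)].append(value)
    return {key: median(values) for key, values in grouped.items()}


def usage(log_int, protein_summary):
    """DPU input: log2-ratio of peptidoform to parent protein intensity.

    On the log2 scale the ratio becomes a subtraction.
    """
    return {
        (protein, peptidoform, sample): value - protein_summary[(protein, sample)]
        for (protein, peptidoform, sample), value in log_int.items()
    }


log_int = log2_transform(raw_intensities)
protein_summary = summarize_protein(log_int)
usage_values = usage(log_int, protein_summary)

phospho = "PEPT[Phospho]IDE"
# Log2 fold change (treat vs ctrl) on the abundance scale and the usage scale.
dpa_lfc = log_int[("P1", phospho, "treat")] - log_int[("P1", phospho, "ctrl")]
dpu_lfc = usage_values[("P1", phospho, "treat")] - usage_values[("P1", phospho, "ctrl")]
print("abundance log2 fold change:", dpa_lfc)
print("usage log2 fold change:", dpu_lfc)
```

In this toy example the phosphorylated peptidoform has a log2 fold change of 2 on the abundance scale, but only 1 on the usage scale, because the summarized parent protein itself increases by 1 log2 unit between conditions. A real analysis would then fit a statistical model to either set of values instead of taking simple differences.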