T.A.R.D.I.S.: Targeted Analysis and Raw Data Integration in Mass Spectrometry
Author(s): Pablo Vangeenderhuysen, Beata Pomian, Marilyn De Graeve, Lieselot Y. Hemeryck, Lynn Vanhaecke
Affiliation(s): Laboratory of Integrative Metabolomics (LIMET), Department of Translational Physiology, Infectiology and Public Health, Faculty of Veterinary Medicine, Ghent University, Merelbeke, Belgium
Social media: https://github.com/pablovgd
In recent years, ultra-high-performance liquid chromatography coupled to high-resolution mass spectrometry (UHPLC-HRMS) has become the main method for measuring small molecules that directly reflect the outcome of complex biochemical reactions in biological systems. These biomolecules, collectively referred to as the metabolome, are regarded as the endpoint of the biological cascade and are acknowledged to be the best reflection of the biological phenotype. Computational methods have become of paramount importance for processing, analyzing, and interpreting these complex data. This research area is now known as computational metabolomics, an interdisciplinary science at the intersection of computer science, bioinformatics, chemistry, medicine, and biology. Within the field, significant effort has gone into developing tools to acquire and process spectral data, annotate spectra to identify molecules, perform pathway analysis, and integrate multi-omics data. Processing of spectral data is often referred to as peak picking and can be performed in a targeted fashion (detecting known features of interest) or an untargeted fashion (detecting all possible known and unknown features in the data). Many tools have been developed for untargeted processing, including MZmine 3, MS-DIAL 5, and XCMS. For targeted processing, however, efficient tools that circumvent manual processing with vendor-specific software are lacking. Here we present the development of T.A.R.D.I.S., an open-source R package for Targeted Analysis and Raw Data Integration in Spectrometry. In previous work (preprint at https://doi.org/10.26434/chemrxiv-2023-5f252-v2), we applied an in-house targeted peak extraction algorithm (TaPEx), developed in R. TaPEx automatically processes a list of targeted compounds, each defined by its retention time and mass-to-charge ratio, and corrects for retention time drift using quality control (QC) samples.
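To make the idea of targeted extraction with QC-based retention time correction concrete, the following is a minimal sketch under stated assumptions. It is written in Python for illustration only and is not the TaPEx or T.A.R.D.I.S. implementation (which is developed in R). All names (Target, extract_eic, apex_and_area, process_target), the ppm tolerance, and the window widths are hypothetical, and the drift correction shown (re-centering the integration window on the apex observed in a QC sample) is just one simple way such a correction could work.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """A targeted compound, defined by m/z and expected retention time (hypothetical structure)."""
    name: str
    mz: float  # expected mass-to-charge ratio
    rt: float  # expected retention time in seconds


# A run is modeled as a list of scans: (retention_time_s, [(mz, intensity), ...]).

def extract_eic(scans, mz, ppm=5.0):
    """Build an extracted ion chromatogram: summed intensity within +/- ppm of mz, per scan."""
    tol = mz * ppm / 1e6
    return [
        (rt, sum(i for m, i in peaks if abs(m - mz) <= tol))
        for rt, peaks in scans
    ]


def apex_and_area(eic, rt_center, rt_window):
    """Return (apex_rt, area) inside rt_center +/- rt_window, or None if no signal is found."""
    pts = [(rt, i) for rt, i in eic if abs(rt - rt_center) <= rt_window]
    if len(pts) < 2 or max(i for _, i in pts) <= 0:
        return None
    apex_rt = max(pts, key=lambda p: p[1])[0]
    # Trapezoidal integration over the points inside the window.
    area = sum(
        (t1 - t0) * (i0 + i1) / 2
        for (t0, i0), (t1, i1) in zip(pts, pts[1:])
    )
    return apex_rt, area


def process_target(target, qc_scans, sample_scans,
                   ppm=5.0, search_window=30.0, integration_window=10.0):
    """Locate the target in a QC run, then integrate the sample around the QC-observed RT."""
    qc_hit = apex_and_area(extract_eic(qc_scans, target.mz, ppm),
                           target.rt, search_window)
    if qc_hit is None:
        return None  # target not detectable in the QC run
    corrected_rt = qc_hit[0]  # assumed drift correction: use the QC apex as the new center
    return apex_and_area(extract_eic(sample_scans, target.mz, ppm),
                         corrected_rt, integration_window)
```

In this sketch the wide search window is applied only to the QC run, so that the narrower integration window used for study samples can follow the drift observed in the QC run. A real pipeline would additionally need raw-file parsing, peak boundary detection, and handling of multiple QC injections across a batch.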
TaPEx successfully retrieved 80.4% of consistently detected compounds in a set of validation samples (manually verified by an experienced analyst using vendor software). TaPEx was benchmarked against vendor-specific software and an IPO/XCMS-based pipeline using Lifelines Deep cohort samples (n=97) and outperformed both approaches in detecting targeted compounds (81.3% vs. 56.7-66.0%), especially lipids, organic acids, nucleosides, organic nitrogen compounds, and heterocyclic compounds. TaPEx has also been applied to samples from the Flemish Gut Flora Project cohort (n=292). Combined with an optimized extraction protocol, TaPEx reduced the sample-to-result time by 60%. The presented workflow offers an efficient, high-throughput methodology for comprehensive gut phenotyping, exemplified by its application in two cohorts. We now aim to improve TaPEx significantly by adding peak quality control metrics to the algorithm, increasing its computational efficiency, and benchmarking it more rigorously. TaPEx will be incorporated into T.A.R.D.I.S., to which we also aim to add elements of widely used peak picking algorithms such as centWave and ADAP, along with an easy-to-use GUI for users less experienced in R. Furthermore, we are exploring tools to predict the retention time of compounds, which would improve retrieval rates significantly. T.A.R.D.I.S. is being developed so that it is suitable for different applications of UHPLC-HRMS, such as metabolomics, lipidomics, steroidomics, and DNA adductomics. Development of the package can be followed at https://github.com/pablovgd/T.A.R.D.I.S. Finally, we aim to benchmark the performance of the package, gather feedback from the community, and submit it to Bioconductor.