Linear models for single-cell proteomics
Author(s): Christophe Vanderaa, Laurent Gatto
Affiliation(s): UCLouvain
Social media: https://twitter.com/c_vanderaa
Mass spectrometry (MS)-based single-cell proteomics (SCP) has become a credible player in the single-cell biology arena [1,2]. Continuous technical improvements have pushed the boundaries of sensitivity and throughput. However, computational methods to support the analysis of these complex data have lagged behind. Strong batch effects coupled with high proportions of missing values complicate the analysis and cause strong entanglement between biological and technical variability [3,4]. We propose a simple yet powerful approach to address this need: linear models. We use linear regression to model and remove undesired technical factors while retaining the biological variability, even in the presence of high proportions of missing values. The key advantage of linear models lies in the interpretability of the results they generate. Inspired by previous research [5], we streamlined the modelling and exploration of the patterns induced by known technical and biological factors. This exploration enables a thorough assessment of the model coefficients and highlights key factors influencing SCP experiments. Further exploration of the unmodelled variance recovers unknown but biologically relevant patterns in the data, leveraging the power of single-cell proteomics technologies. We successfully applied our approach to a diverse collection of SCP datasets [6] and demonstrated that it is also amenable to integrating datasets acquired using different technologies. We implemented and documented this approach in our Bioconductor package scp [7]. In summary, our approach represents a turning point for principled SCP data analysis, shifting the focus from how to perform the analysis to result generation and interpretation.

[1] “Single-Cell Proteomics: Challenges and Prospects.” 2023. Nature Methods 20 (3): 317–18.
[2] Bennett HM, Stephenson W, Rose CM, and Darmanis S. 2023. “Single-Cell Proteomics Enabled by Next-Generation Sequencing or Mass Spectrometry.” Nature Methods, March.
[3] Vanderaa C, and Gatto L. 2021. “Replication of Single-Cell Proteomics Data Reveals Important Computational Challenges.” Expert Review of Proteomics, October, 1–9.
[4] Vanderaa C, and Gatto L. 2023. “The Current State of Single-Cell Proteomics Data Analysis.” Current Protocols 3 (1): e658.
[5] Thiel M, Féraud B, and Govaerts B. 2017. “ASCA+ and APCA+: Extensions of ASCA and APCA in the Analysis of Unbalanced Multifactorial Designs.” Journal of Chemometrics 31 (6): e2895.
[6] Vanderaa C, and Gatto L. scpdata: Single-Cell Proteomics Data Package. R package version 1.6.0, <https://bioconductor.org/packages/release/data/experiment/html/scpdata.html>.
[7] Vanderaa C, and Gatto L. scp: Mass Spectrometry-Based Single-Cell Proteomics Data Analysis. R package version 1.8.0, <https://bioconductor.org/packages/release/bioc/html/scp.html>.
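The linear-model correction described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy and is not the scp implementation (scp is an R/Bioconductor package). The function names `fit_protein` and `remove_batch_effect`, the treatment (0/1) coding of the design matrix, and the choice to return all-NaN rows for proteins whose observed cells cannot estimate every coefficient are illustrative assumptions. Each protein is fitted by ordinary least squares on its observed cells only, and only the fitted batch contribution is subtracted, so missing values remain missing rather than being imputed.

```python
import numpy as np


def fit_protein(y, X):
    """Ordinary least squares fit of one protein on the design matrix X.

    y: log-intensities across cells, with np.nan marking missing values.
    X: cells x coefficients design matrix (hypothetical treatment coding).
    Only observed cells enter the fit; if they cannot identify every
    coefficient (rank-deficient design), all coefficients are returned as NaN.
    """
    observed = ~np.isnan(y)
    X_obs, y_obs = X[observed], y[observed]
    if np.linalg.matrix_rank(X_obs) < X.shape[1]:
        return np.full(X.shape[1], np.nan)
    beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    return beta


def remove_batch_effect(Y, X, batch_cols):
    """Subtract the fitted batch contribution from every protein.

    Y: proteins x cells matrix of log-intensities (np.nan = missing).
    batch_cols: indices of the design columns encoding technical factors.
    Biological effects and residuals are kept; missing values stay NaN, and
    proteins whose model is not estimable are returned as all-NaN rows.
    """
    corrected = np.full(Y.shape, np.nan)
    for i, y in enumerate(Y):
        beta = fit_protein(y, X)
        if np.isnan(beta).any():
            continue  # batch effect not estimable for this protein
        corrected[i] = y - X[:, batch_cols] @ beta[batch_cols]
    return corrected


# Toy example: 8 cells, two cell types crossed with two batches.
cell_type = np.array([0, 1, 0, 1, 0, 1, 0, 1])
batch = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X = np.column_stack([np.ones(8), cell_type, batch])  # intercept, biology, batch

Y = np.vstack([
    10 + 2 * cell_type + 3 * batch,                 # protein measured everywhere...
    [1, 2, np.nan, np.nan, 1, 2, np.nan, np.nan],    # protein seen in batch 0 only
]).astype(float)
Y[0, 3] = np.nan                                     # ...except one missing cell

print(remove_batch_effect(Y, X, batch_cols=[2]))
```

In this toy example, the first protein's batch-1 cells are shifted back onto the batch-0 level (10 or 12 depending on cell type), and its missing cell stays missing. The second protein is observed in only one batch, so its batch effect cannot be estimated and its row is returned as NaN. Subtracting only the batch term keeps both the cell-type effect and the residuals, which is the unmodelled variance the abstract proposes to explore further.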