CytoPipeline: Building and visualizing automated pre-processing and quality control pipelines for flow cytometry data



Author(s): Philippe Hauchamps, Dan Lin, Laurent Gatto

Affiliation(s): Computational Biology and Bioinformatics (CBIO) Unit, de Duve Institute, UCLouvain, Belgium



With the increase in dimensionality of conventional flow cytometry data over the past years, there is a growing need to replace or complement traditional manual analysis (i.e. iterative 2D gating) with automated data analysis pipelines. Examples of such pipelines have been documented in the recent literature (e.g. [1], [2], [3]). A crucial part of these pipelines consists of pre-processing the raw data and applying quality control filtering to it, so that high-quality events are used in the downstream statistical analysis. This part can in turn be split into a number of elementary steps: margin event removal, signal compensation, scale transformation, debris and dead cell removal, batch effect correction, etc. However, when designing automated flow cytometry data analysis pipelines, assembling and assessing the pre-processing part can be challenging for several reasons. First, each of these elementary steps can be implemented using various methods and R packages. Second, the order of the steps can affect the downstream analysis results. Finally, each method typically comes with its own unstandardized diagnostics and visualizations, making objective comparison difficult for the end user. Here, we present CytoPipeline, an R package suite for building, assessing and comparing pre-processing pipelines for flow cytometry data. To illustrate our tool, we walk through the steps involved in designing a pre-processing pipeline on a real-life dataset and demonstrate the visualization utilities. We also show how CytoPipeline can complement benchmarking tools such as pipeComp [4] by providing intuitive insight into benchmarking results.

References:

[1] Quintelier, Katrien, Artuur Couckuyt, Annelies Emmaneel, Joachim Aerts, Yvan Saeys, and Sofie Van Gassen. 2021. “Analyzing High-Dimensional Cytometry Data Using FlowSOM.” Nature Protocols 16 (8): 3775–3801.
[2] Ashhurst, Thomas Myles, Felix Marsh-Wakefield, Givanna Haryono Putri, Alanna Gabrielle Spiteri, Diana Shinko, Mark Norman Read, Adrian Lloyd Smith, and Nicholas Jonathan Cole King. 2021. “Integration, Exploration, and Analysis of High-Dimensional Single-Cell Cytometry Data Using Spectre.” Cytometry. Part A: The Journal of the International Society for Analytical Cytology, no. cyto.a.24350 (April). https://doi.org/10.1002/cyto.a.24350.

[3] Nowicka, Malgorzata, Carsten Krieg, Helena L. Crowell, Lukas M. Weber, Felix J. Hartmann, Silvia Guglietta, Burkhard Becher, Mitchell P. Levesque, and Mark D. Robinson. 2017. “CyTOF Workflow: Differential Discovery in High-Throughput High-Dimensional Cytometry Datasets.” F1000Research 6 (May): 748.

[4] Germain, Pierre-Luc, Anthony Sonrel, and Mark D. Robinson. 2020. “pipeComp, a General Framework for the Evaluation of Computational Pipelines, Reveals Performant Single Cell RNA-Seq Preprocessing Tools.” Genome Biology 21 (1): 227.