ViScoreR: label-based evaluation of dimensionality reduction by detecting local distortions

Author(s): David Novak, Sofie Van Gassen, Yvan Saeys

Affiliation(s): FWO & Inflammation Research Center, VIB-UGent

Social media: https://twitter.com/dwdnvwk

Dimensionality reduction (DR) of single-cell data (flow cytometry, CyTOF, scRNA-seq, CITE-seq) is important for visualisation and, increasingly, for downstream structure learning. A growing number of non-linear DR techniques (t-SNE, UMAP, PHATE, TriMap) can quickly transform data into informative low-dimensional embeddings, each placing different emphasis on preserving local and global structure. In addition to unsupervised evaluation criteria for DR (LCMC, RNX curves), supervised scoring metrics can be used to inspect sources of error in embeddings of data for which (some) cell labels are available. We propose new evaluation methods that characterise these errors in terms of shape distortion and positional error of populations. We build on the previously proposed Neighbourhood Proportion Error (NPE) to provide population-level, rather than embedding-level, scores, and devise an algorithm for creating ‘neighbourhood composition plots’, which allow the user to inspect both the ground-truth local neighbourhoods of cell populations and the corresponding neighbourhoods in embeddings of the data. These metrics are implemented in a new R package, ViScoreR.
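
To make the idea of a population-level neighbourhood score concrete, the following is a minimal sketch in Python. It is not the ViScoreR API, nor the exact NPE definition. It assumes a simplified score: for each labelled population, the label composition of its points' k nearest neighbours is computed in the original data and in the embedding, and the two compositions are compared by total variation distance. All function names (`knn_indices`, `neighbourhood_composition`, `composition_error`) and the choice of distance are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def knn_indices(points, k):
    """Brute-force k nearest neighbours (Euclidean) of every point, excluding itself."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def neighbourhood_composition(points, labels, k):
    """Per population: average share of each label among the k-NN of its points."""
    totals = defaultdict(Counter)  # totals[population][neighbour label] = count
    sizes = Counter(labels)        # number of points in each population
    for i, neighbours in enumerate(knn_indices(points, k)):
        for j in neighbours:
            totals[labels[i]][labels[j]] += 1
    return {
        pop: {lab: totals[pop][lab] / (k * sizes[pop]) for lab in sizes}
        for pop in sizes
    }


def composition_error(high_dim, low_dim, labels, k=10):
    """Hypothetical per-population score: total variation distance between the
    neighbourhood composition in the original data and in the embedding."""
    hd = neighbourhood_composition(high_dim, labels, k)
    ld = neighbourhood_composition(low_dim, labels, k)
    return {
        pop: 0.5 * sum(abs(hd[pop][lab] - ld[pop][lab]) for lab in hd[pop])
        for pop in hd
    }
```

Under these assumptions, a population whose neighbourhoods contain the same mix of labels before and after embedding scores 0, while a population that is well separated in the original data but interleaved with another in the embedding scores close to 1. The per-population compositions returned by `neighbourhood_composition` are the kind of quantity a neighbourhood composition plot could display side by side for the original data and the embedding.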