SpaceTrooper is an R/Bioconductor package
for Quality Control (QC) of imaging-based spatial transcriptomics and
proteomics data. It provides multi-platform data harmonization,
cell-level QC, and visualization utilities. The package leverages SpatialExperiment
objects to support data from CosMx,
Xenium, and MERFISH technologies.
In this section, we show how to run the standard
SpaceTrooper workflow on
Spatial Proteomics data obtained with CosMx Protein Assay, to perform QC
by computing the Quality Score (QS).
For demonstration purposes, we showcase a small subset of the CosMx human tonsil dataset analyzed also in the paper.
SpaceTrooper provides a reading function to load data
into a SpatialExperiment object. For more details on how to
correctly specify the inputs, please refer to the SpaceTrooper
utilities vignette.
library(SpaceTrooper)
library(ggplot2)
protfolder <- system.file( "extdata", "S01_prot", package="SpaceTrooper")
spe <- readCosmxProteinSPE(protfolder, sampleName = "CosMx_Protein_Tonsil")
spe## class: SpatialExperiment
## dim: 69 1793
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(69): 4-1BB B7-H3 ... Ms IgG1 Rb IgG
## rowData names(0):
## colnames(1793): f344_c1 f344_c10 ... f344_c998 f344_c999
## colData names(58): fov cellID ... cell sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id
In this section, we show how to plot each cell position onto the map of the Fields of View (FOVs), whose coordinates are provided only for CosMx technology.
IMPORTANT: we have noticed that according to CosMx version, there could be a misalignment of FOVs and cell centroids, that can be easily corrected with a single line of code. Therefore, it is crucial to generate this plot to check for any spatial shift, as such misalignment directly affects the QS computation.
If any misalignment is observed, it can be corrected by adjusting the FoV coordinates directly. And this is exactly the case. Indeed, FoVs are shifted upward by one FoV height (which, for CosMx technology, corresponds to 4,256 pixels). You can correct this by subtracting that value from the FoVs y coordinates:
# code line for shift correction
metadata(spe)$fov_positions$y_global_px <- metadata(spe)$fov_positions$y_global_px - 4256Rerun the FoV plot to check whether the shift was corrected.
Cell centroids, shown in dark red, now are all contained by FoV boundaries, hence FoVs and cell centroids are aligned after shift correction.
The dataset is a subset with just a single FoVs, whose number is displayed at the center.
When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their numbers.
In this section we show how to load polygons after
SpatialExperiment creation, only for visualization
purposes. They are stored as an sf object within
colData.
This step is not mandatory for CosMx, because the pipeline can be executed even without them.
Please pay attention to any warnings. Cell polygons with fewer than
four vertices cannot be handled by the geometry packages used by
SpaceTrooper. Therefore, the corresponding cells are
discarded from the SpatialExperiment object.
The following sections will work the same way for all types of data.
The spatialPerCellQC function computes additional
metrics per each cell, that are saved inside the
SpatialExperimentand accessible through
colData(spe). It is mandatory to run this function before
computing QS.
The negProbList parameter is, by default, a vector
containing all control probe patterns encountered so far across the
supported technologies. Because these patterns continue to evolve, some
may not yet be included. If you find that your control probe patterns
are missing from the default list, you can define a custom vector, as
shown below.
By default, the function automatically removes 0 count cells, but
this can be handled with the rmZeros parameter.
spe <- spatialPerCellQC(spe, rmZeros=TRUE,
negProbList=c("Ms IgG1", "Rb IgG"))
colnames(colData(spe))## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "polygons" "sum"
## [61] "detected" "subsets_Ms IgG1_sum"
## [63] "subsets_Ms IgG1_detected" "subsets_Ms IgG1_percent"
## [65] "subsets_Rb IgG_sum" "subsets_Rb IgG_detected"
## [67] "subsets_Rb IgG_percent" "total"
## [69] "control_sum" "control_detected"
## [71] "target_sum" "target_detected"
## [73] "CenterX_global_px" "CenterY_global_px"
## [75] "ctrl_total_ratio" "log2Ctrl_total_ratio"
## [77] "CenterX_global_um" "CenterY_global_um"
## [79] "dist_border_x" "dist_border_y"
## [81] "dist_border" "log2AspectRatio"
## [83] "SignalDensity" "log2SignalDensity"
Several metrics are added, both derived from protein assay and cell morphology. Some of them are directly used to compute QS:
For a better detailed explanation of the other metrics, please refer to the SpaceTrooper utilities vignette.
computeQCScore function calculates QS per each cell. QS
combines several metrics into a formula:
log2SignalDensity, corresponding to signal density,
Area_um, i.e. cell size,
log2Ctrl_total_ratio, namely background signal and
log2AspectRatio combined with
dist_border, which jointly correspond to the border
effect (only for CosMx technology).
glmnet package is used to estimate the coefficients of
the formula, resulting in a robust score that captures low-quality
cells. During model training, the selected cells are used as good or bad
examples to learn how each term in the formula contributes to cell
quality.
IMPORTANT: please, pay attention to any warning. If a term has too few bad examples (fewer than 0.1% of the total number of cells), it is excluded from the formula and therefore not used in the QS computation.
The QS (stored as QC_score in coldata)
ranges from 0 to 1, with 0 meaning low-quality and 1 high-quality.
We are setting a seed to ensure reproducibility in this tutorial, because there are stochastic processes underlying QS computation.
set.seed(1713)
spe <- computeQCScore(spe)
format(summary(spe$QC_score), scientific=FALSE, digits = 4)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## "0.0000" "0.6901" "0.8816" "0.7753" "0.9519" "0.9975"
In this case, all the terms were used as no warning appeared.
Subsequently, it is possible also to assess which cells have a QS
lower than a certain threshold (default is 0.5) with the following
function. It creates a new column called low_qcscore inside
of colData.
##
## FALSE TRUE
## 1519 274
Using this threshold, 274 cells are flagged as low-quality. We do not suggest a fixed default threshold, but it is advisable to check the QS distribution before setting any threshold.
SpaceTrooper comes with several functions to view cells and metrics.
To view the distribution of whatever quantitative metric,
plotMetricHist comes in handy.
The QS distribution exhibits a left tail starting around 0.75.
Cell visualization can be obtained by using either centroids (recommended when the dataset has a large number of cells) or polygons.
plotCentroids plots cell centroids that can be colored
by a certain metric contained in the colData slot, by using
the colour_by parameter.
Additionally, if you have a palette column in colData,
containing colors for each cell, it can be given to palette
parameter, so that it automatically matches the column passed in
colour_by. As an example, we are using the cell types
obtained as described in the paper
with their own color palette.
labf <- system.file(file.path("extdata", "S01_prot",
"labels_tiny.tsv"), package="SpaceTrooper")
labs <- read.table(file=labf, sep="\t", header=TRUE, comment.char = "")
spe$labels <- as.factor(labs[match(spe$cell_id, labs$cell_id),]$label)
spe$labels_colors <- as.factor(labs[match(spe$cell_id, labs$cell_id),]$lab_color)
plotCentroids(spe, colourBy="labels", size=3, palette="labels_colors")Cell centroids colored by cell types allow to view the spatial distribution of cell populations in the sample.
When possible, polygon visualization gives a better overview of the
cells’ spatial organization and morphological characteristics. Polygons
are stored as an sf object within colData, so
they can be viewed using standard ggplot2 functions. For
CosMx technology, the polygons must be explicitly loaded as described in
Load polygons. SpaceTrooper provides
plotPolygons function that works just like
plotCentroids but takes polygons instead of centroids.
We can see in yellow and darkviolet that
there are few cells with extreme values of either
log2SignalDensity, Area_um,
log2Ctrl_total_ratio and log2AspectRatio.
Since all plotting functions are based on ggplot2, you
can easily customize the graphical outputs by adding standard ggplot2
components.
We can see that the QS is able to detect both the aspects highlighted
by log2SignalDensity,Area_um,
log2Ctrl_total_ratio or log2AspectRatio. Cells
that showed either lower signal density, bigger size, higher background
signal or border effect also display low QS (darker color).
It’s up to the user to choose an appropriate threshold to flag cells according to the observed QS distribution.
plotPolygons(spe, colourBy="low_qcscore") +
scale_fill_manual(values=c("TRUE"="red", "FALSE" = "#c0c8cf"))You can reruncomputeQCScoreFlags to check how many cells
would be flagged using another threshold.
##
## FALSE TRUE
## 1266 527
Using this threshold, 527 cells are flagged as low-quality.
plotPolygons(spe, colourBy="low_qcscore") +
scale_fill_manual(values=c("TRUE"="red", "FALSE" = "#c0c8cf"))The threshold is more stringent and more cells are flagged compared to the previous one. It’s up to you to choose a threshold that better suits your analysis needs.
In this vignette, we explored the main functionalities of the
SpaceTrooper package for imaging-based spatial omics data
QC.
For further insights on SpaceTrooper package usage, please refer to the SpaceTrooper utilities vignette.
Main steps shown are:
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_4.0.2 SpaceTrooper_1.1.6
## [3] SpatialExperiment_1.21.0 SingleCellExperiment_1.33.0
## [5] SummarizedExperiment_1.41.1 Biobase_2.71.0
## [7] GenomicRanges_1.63.1 Seqinfo_1.1.0
## [9] IRanges_2.45.0 S4Vectors_0.49.0
## [11] BiocGenerics_0.57.0 generics_0.1.4
## [13] MatrixGenerics_1.23.0 matrixStats_1.5.0
## [15] BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3
## [3] rlang_1.1.7 magrittr_2.0.4
## [5] scater_1.39.2 e1071_1.7-17
## [7] compiler_4.5.2 DelayedMatrixStats_1.33.0
## [9] SpatialExperimentIO_1.3.0 sfheaders_0.4.5
## [11] vctrs_0.7.1 shape_1.4.6.1
## [13] pkgconfig_2.0.3 fastmap_1.2.0
## [15] backports_1.5.0 magick_2.9.0
## [17] XVector_0.51.0 labeling_0.4.3
## [19] scuttle_1.21.0 rmarkdown_2.30
## [21] ggbeeswarm_0.7.3 purrr_1.2.1
## [23] bit_4.6.0 xfun_0.56
## [25] glmnet_4.1-10 cachem_1.1.0
## [27] beachmat_2.27.2 jsonlite_2.0.0
## [29] rhdf5filters_1.23.3 DelayedArray_0.37.0
## [31] Rhdf5lib_1.33.0 BiocParallel_1.45.0
## [33] irlba_2.3.7 broom_1.0.12
## [35] parallel_4.5.2 R6_2.6.1
## [37] bslib_0.10.0 RColorBrewer_1.1-3
## [39] limma_3.67.0 car_3.1-5
## [41] jquerylib_0.1.4 iterators_1.0.14
## [43] Rcpp_1.1.1 assertthat_0.2.1
## [45] knitr_1.51 R.utils_2.13.0
## [47] splines_4.5.2 Matrix_1.7-4
## [49] tidyselect_1.2.1 viridis_0.6.5
## [51] abind_1.4-8 yaml_2.3.12
## [53] codetools_0.2-20 lattice_0.22-7
## [55] tibble_3.3.1 withr_3.0.2
## [57] S7_0.2.1 evaluate_1.0.5
## [59] sf_1.0-24 survival_3.8-6
## [61] units_1.0-0 proxy_0.4-29
## [63] pillar_1.11.1 BiocManager_1.30.27
## [65] ggpubr_0.6.2 carData_3.0-6
## [67] KernSmooth_2.23-26 foreach_1.5.2
## [69] sparseMatrixStats_1.23.0 scales_1.4.0
## [71] class_7.3-23 glue_1.8.0
## [73] maketools_1.3.2 tools_4.5.2
## [75] BiocNeighbors_2.5.3 robustbase_0.99-7
## [77] sys_3.4.3 data.table_1.18.2.1
## [79] ScaledMatrix_1.19.0 locfit_1.5-9.12
## [81] ggsignif_0.6.4 buildtools_1.0.0
## [83] cowplot_1.2.0 rhdf5_2.55.13
## [85] grid_4.5.2 tidyr_1.3.2
## [87] DropletUtils_1.31.0 edgeR_4.9.2
## [89] beeswarm_0.4.0 BiocSingular_1.27.1
## [91] HDF5Array_1.39.0 vipor_0.4.7
## [93] rsvd_1.0.5 Formula_1.2-5
## [95] cli_3.6.5 viridisLite_0.4.3
## [97] S4Arrays_1.11.1 arrow_23.0.0.1
## [99] dplyr_1.2.0 DEoptimR_1.1-4
## [101] gtable_0.3.6 R.methodsS3_1.8.2
## [103] rstatix_0.7.3 sass_0.4.10
## [105] digest_0.6.39 classInt_0.4-11
## [107] ggrepel_0.9.6 SparseArray_1.11.10
## [109] dqrng_0.4.1 rjson_0.2.23
## [111] farver_2.1.2 htmltools_0.5.9
## [113] R.oo_1.27.1 lifecycle_1.0.5
## [115] h5mread_1.3.1 statmod_1.5.1
## [117] bit64_4.6.0-1