scater
: Single-cell analysis toolkit for expression in Rscater 1.6.2
This document gives an introduction to and overview of the functionality of the scater
package.
The scater
package is contains tools to help with the analysis of single-cell transcriptomic data, with the focus on RNA-seq data. The package features:
SingleCellExperiment
class as a data container for interoperability with a wide range of other Bioconductor packages;kallisto
and ‘Salmon’ for rapid quantification of transcript abundance and tight integration with scater
;To get up and running as quickly as possible, see the Quick Start section below. For see the various in-depth sections on various aspects of the functionality that follow.
NB: as of July 2017, scater
has switched from the SCESet
class previously defined within the package to the more widely applicable SingleCellExperiment
class. The functions toSingleCellExperiment
and updateSCESet
(for backwards compatibility) can be used to convert an old SCESet
object to a SingleCellExperiment
object.
Assuming you have a matrix containing expression count data summarised at the level of some features (gene, exon, region, etc.), then we first need to form a SingleCellExperiment
object containing the data. A SingleCellExperiment
object is the basic data container used in scater
and many other Bioconductor packages for single-cell data analysis.
Here we use the example data provided with the package, which gives us two objects, a matrix of counts and a dataframe with information about the cells we are studying:
suppressPackageStartupMessages(library(scater))
data("sc_example_counts")
data("sc_example_cell_info")
We use these objects to form a SingleCellExperiment
object containing all of the necessary information for our analysis:
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts), colData = sc_example_cell_info)
We always expect to have (raw) count data in a SingleCellExperiment
object. In almost all cases we will also want to have a log2-scale representation of the data. We expect this to be stored as the exprs
assay.
Here we use log2-counts-per-million with an offset of 1 as the exprs
values.
exprs(example_sce) <- log2(
calculateCPM(example_sce, use.size.factors = FALSE) + 1)
Subsetting is very convenient with this class. For example, we can filter out features (genes) that are not expressed in any cells:
keep_feature <- rowSums(exprs(example_sce) > 0) > 0
example_sce <- example_sce[keep_feature,]
Now we have the expression data neatly stored in a structure that can be used for lots of exciting analyses.
It is straight-forward to compute many quality control metrics. We typically provide one or more sets of “feature controls”, that is sets of genes or features that represent technical features of the expression data or are not of primary biological interest. QC metrics are computed especially for these feature sets are used to assess the quality of cells. Spike-in genes (such as the commonly-used ERCC set) and mitochondrial genes are typically useful as “feature controls”. Here, for demonstration, we just use the first 40 features.
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(eg = 1:40))
Now you can play around with your data using the graphical user interface (GUI), which opens an interactive dashboard in your browser!
scater_gui(example_sce)
Many plotting functions are available for visualising the data:
plotScater
: a plot method exists for SingleCellExperiment
objects, which gives an overview of expression across cells.plotQC
: various methods are available for producing QC diagnostic plots.plotPCA
: produce a principal components plot for the cells.plotTSNE
: produce a t-distributed stochastic neighbour embedding (reduced dimension) plot for the cells.plotDiffusionMap
: produce a diffusion map (reduced dimension) plot for the cells.plotMDS
: produce a multi-dimensional scaling plot for the cells.plotReducedDim
: plot a reduced-dimension representation of the cells.plotExpression
: plot expression levels for a defined set of features.plotPlatePosition
: plot cells in their position on a plate, coloured by cell metadata and QC metrics or feature expression level.plotColData
: plot cell metadata and QC metrics.plotRowData
: plot feature metadata and QC metrics.More detail on all plotting methods is provided in the data visualisation vignette.
Visualisations can highlight features and cells to be filtered out, which can be done easily with the subsetting capabilities of scater.
The QC plotting functions also enable the identification of important experimental variables, which can be conditioned out in the data normalisation step.
After QC and data normalisation (methods are available in scater
), the dataset is ready for downstream statistical modeling.
SingleCellExperiment
class, including transitioning from SCESet
objects in previous versions of scater
, see the "Transitioning from SCESet to SingleCellExperiment"
vignette;scater
for quality control, see the Quality control with scater
vignette;scater
, see the Data visualisation methods in scater
vignette;scater
, see the Expression quantification and import
vignette.