library(sesame)
sesameDataCache()
The main function to calculate the quality metrics is sesameQC_calcStats. This function takes a SigDF, calculates the QC statistics, and returns a single S4 sesameQC object, which can be printed directly to the console. To calculate QC metrics on a given list of samples or all IDATs in a folder, one can use sesameQC_calcStats within the standard openSesame pipeline. When used with openSesame, a list of sesameQC objects will be returned. Note that one should turn off preprocessing using prep="":
## calculate metrics on all IDATs in a specific folder
qcs = openSesame(idat_dir, prep="", func=sesameQC_calcStats)
SeSAMe divides sample quality metrics into multiple groups. These groups are listed below and can be referred to by short keys. For example, “intensity” generates signal intensity-related quality metrics.
Short Key | Description |
---|---|
detection | Signal Detection |
numProbes | Number of Probes |
intensity | Signal Intensity |
channel | Color Channel |
dyeBias | Dye Bias |
betas | Beta Value |
By default, sesameQC_calcStats calculates all QC groups. To save time, one can compute a specific QC group by specifying one or multiple short keys in the funs= argument:
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] # get two examples
## only compute signal detection stats
qcs = openSesame(sdfs, prep="", func=sesameQC_calcStats, funs="detection")
qcs[[1]]
##
## =====================
## | Detection
## =====================
## N. Probes w/ Missing Raw Intensity : 0 (num_dtna)
## % Probes w/ Missing Raw Intensity : 0.0 % (frac_dtna)
## N. Probes w/ Detection Success : 838020 (num_dt)
## % Detection Success : 96.7 % (frac_dt)
## N. Detection Succ. (after masking) : 838020 (num_dt_mk)
## % Detection Succ. (after masking) : 96.7 % (frac_dt_mk)
## N. Probes w/ Detection Success (cg) : 835491 (num_dt_cg)
## % Detection Success (cg) : 96.7 % (frac_dt_cg)
## N. Probes w/ Detection Success (ch) : 2471 (num_dt_ch)
## % Detection Success (ch) : 84.3 % (frac_dt_ch)
## N. Probes w/ Detection Success (rs) : 58 (num_dt_rs)
## % Detection Success (rs) : 98.3 % (frac_dt_rs)
We consider signal detection the most important QC metric.
One can retrieve the actual stat numbers from a sesameQC object using sesameQC_getStats (the following returns the fraction of probes with detection success):
sesameQC_getStats(qcs[[1]], "frac_dt")
## [1] 0.9666915
After computing the QCs, one can optionally combine the sesameQC objects into a data frame for easy comparison.
## combine a list of sesameQC into a data frame
head(do.call(rbind, lapply(qcs, as.data.frame)))
Sample | num_dtna <int> | frac_dtna <dbl> | num_dt <int> | frac_dt <dbl> | num_dt_mk <int> | frac_dt_mk <dbl> | num_dt_cg <int> |
---|---|---|---|---|---|---|---|
PrEC.Rep1 | 0 | 0 | 838020 | 0.9666915 | 838020 | 0.9666915 | 835491 |
PrEC.Rep2 | 0 | 0 | 838961 | 0.9677770 | 838961 | 0.9677770 | 836403 |
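Once the metrics are in a data frame, ordinary data-frame operations can be used to screen samples. The following is a minimal base-R sketch, not part of SeSAMe: it rebuilds two columns of the table above by hand (so it runs without the package) and flags samples whose detection success falls below a cutoff. The names qc_df, min_frac_dt, and low_detection, as well as the 0.9 cutoff, are illustrative assumptions rather than recommended settings.

```r
## Rebuild two columns of the table above by hand (values copied from it)
qc_df <- data.frame(
    num_dt  = c(838020L, 838961L),
    frac_dt = c(0.9666915, 0.9677770),
    row.names = c("PrEC.Rep1", "PrEC.Rep2"))

## Hypothetical cutoff for illustration only; choose one suited to your data
min_frac_dt <- 0.9

## Flag samples whose detection success rate is below the cutoff
qc_df$low_detection <- qc_df$frac_dt < min_frac_dt

## Names of samples that would be flagged (none for these two samples)
rownames(qc_df)[qc_df$low_detection]
```

With a real analysis, qc_df would instead come from the do.call(rbind, lapply(qcs, as.data.frame)) call shown earlier.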
Note that when the input is a SigDF object, calling sesameQC_calcStats within openSesame and calling it as a standalone function are equivalent.
sdf <- sesameDataGet('EPIC.1.SigDF')
qc = openSesame(sdf, prep="", func=sesameQC_calcStats, funs=c("detection"))
## equivalent direct call
qc = sesameQC_calcStats(sdf, c("detection"))
qc
##
## =====================
## | Detection
## =====================
## N. Probes w/ Missing Raw Intensity : 0 (num_dtna)
## % Probes w/ Missing Raw Intensity : 0.0 % (frac_dtna)
## N. Probes w/ Detection Success : 834922 (num_dt)
## % Detection Success : 96.3 % (frac_dt)
## N. Detection Succ. (after masking) : 834922 (num_dt_mk)
## % Detection Succ. (after masking) : 96.3 % (frac_dt_mk)
## N. Probes w/ Detection Success (cg) : 832046 (num_dt_cg)
## % Detection Success (cg) : 96.4 % (frac_dt_cg)
## N. Probes w/ Detection Success (ch) : 2616 (num_dt_ch)
## % Detection Success (ch) : 89.2 % (frac_dt_ch)
## N. Probes w/ Detection Success (rs) : 58 (num_dt_rs)
## % Detection Success (rs) : 98.3 % (frac_dt_rs)
SeSAMe can compare your sample with public datasets. The sesameQC_rankStats() function ranks the input sesameQC object against sesameQC objects calculated from public datasets. It shows the rank percentage of the input sample as well as the number of datasets compared.
sdf <- sesameDataGet('EPIC.1.SigDF')
qc <- sesameQC_calcStats(sdf, "intensity")
qc
##
## =====================
## | Signal Intensity
## =====================
## Mean sig. intensity : 3171.21 (mean_intensity)
## Mean sig. intensity (M+U) : 6342.41 (mean_intensity_MU)
## Mean sig. intensity (Inf.II) : 2991.85 (mean_ii)
## Mean sig. intens.(I.Grn IB) : 3004.33 (mean_inb_grn)
## Mean sig. intens.(I.Red IB) : 4670.97 (mean_inb_red)
## Mean sig. intens.(I.Grn OOB) : 318.55 (mean_oob_grn)
## Mean sig. intens.(I.Red OOB) : 606.99 (mean_oob_red)
## N. NA in M (all probes) : 0 (na_intensity_M)
## N. NA in U (all probes) : 0 (na_intensity_U)
## N. NA in raw intensity (IG) : 0 (na_intensity_ig)
## N. NA in raw intensity (IR) : 0 (na_intensity_ir)
## N. NA in raw intensity (II) : 0 (na_intensity_ii)
sesameQC_rankStats(qc, platform="EPIC")
##
## =====================
## | Signal Intensity
## =====================
## Mean sig. intensity : 3171.21 (mean_intensity) - Rank 15.7% (N=636)
## Mean sig. intensity (M+U) : 6342.41 (mean_intensity_MU)
## Mean sig. intensity (Inf.II) : 2991.85 (mean_ii) - Rank 15.6% (N=636)
## Mean sig. intens.(I.Grn IB) : 3004.33 (mean_inb_grn) - Rank 7.5% (N=636)
## Mean sig. intens.(I.Red IB) : 4670.97 (mean_inb_red) - Rank 21.2% (N=636)
## Mean sig. intens.(I.Grn OOB) : 318.55 (mean_oob_grn) - Rank 4.2% (N=636)
## Mean sig. intens.(I.Red OOB) : 606.99 (mean_oob_red) - Rank 3.6% (N=636)
## N. NA in M (all probes) : 0 (na_intensity_M)
## N. NA in U (all probes) : 0 (na_intensity_U)
## N. NA in raw intensity (IG) : 0 (na_intensity_ig)
## N. NA in raw intensity (IR) : 0 (na_intensity_ir)
## N. NA in raw intensity (II) : 0 (na_intensity_ii)
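One way to read the "Rank" figures is as the share of reference datasets whose value falls below that of the input sample. The base-R sketch below illustrates that reading only. It is an assumption about the interpretation, not SeSAMe's internal implementation, and the reference values in ref_intensity are made up for the example (the comparison above uses N=636 public datasets).

```r
## Mean intensity of the input sample (taken from the output above)
sample_intensity <- 3171.21

## Made-up reference values standing in for public datasets (illustration only)
ref_intensity <- c(1500, 2800, 3500, 4200, 5100, 6000, 7300, 8800)

## Rank percentage: share of reference values below the sample's value
rank_pct <- 100 * mean(ref_intensity < sample_intensity)
rank_pct
## [1] 25
```

Under this reading, a low rank would indicate that the sample's signal intensity is lower than that of most of the datasets it was compared with.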
SeSAMe provides functions to create QC plots. Some functions take sesameQC objects as input, while others plot SigDF objects directly. Here are some examples:
- sesameQC_plotBar() takes a list of sesameQC objects and creates a bar plot for each metric calculated.
- sesameQC_plotRedGrnQQ() graphs the dye bias between the two color channels.
- sesameQC_plotIntensVsBetas() plots the relationship between β values and signal intensity and can be used to diagnose artificial readout and the influence of signal background.
- sesameQC_plotHeatSNPs() plots SNP probes and can be used to detect sample swaps.
More about quality control plots can be found in the Supplemental Vignette.
sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.4.2 tibble_3.2.1
## [3] SummarizedExperiment_1.30.1 Biobase_2.60.0
## [5] GenomicRanges_1.52.0 GenomeInfoDb_1.36.0
## [7] IRanges_2.34.0 S4Vectors_0.38.1
## [9] MatrixGenerics_1.12.0 matrixStats_0.63.0
## [11] knitr_1.42 sesame_1.18.4
## [13] sesameData_1.18.0 ExperimentHub_2.8.0
## [15] AnnotationHub_3.8.0 BiocFileCache_2.8.0
## [17] dbplyr_2.3.2 BiocGenerics_0.46.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.1.3 bitops_1.0-7
## [3] rlang_1.1.1 magrittr_2.0.3
## [5] compiler_4.3.0 RSQLite_2.3.1
## [7] png_0.1-8 vctrs_0.6.2
## [9] reshape2_1.4.4 stringr_1.5.0
## [11] pkgconfig_2.0.3 crayon_1.5.2
## [13] fastmap_1.1.1 XVector_0.40.0
## [15] ellipsis_0.3.2 fontawesome_0.5.1
## [17] labeling_0.4.2 utf8_1.2.3
## [19] promises_1.2.0.1 rmarkdown_2.21
## [21] tzdb_0.3.0 preprocessCore_1.62.1
## [23] purrr_1.0.1 bit_4.0.5
## [25] xfun_0.39 zlibbioc_1.46.0
## [27] cachem_1.0.8 jsonlite_1.8.4
## [29] blob_1.2.4 highr_0.10
## [31] later_1.3.1 DelayedArray_0.26.2
## [33] BiocParallel_1.34.1 interactiveDisplayBase_1.38.0
## [35] parallel_4.3.0 R6_2.5.1
## [37] bslib_0.4.2 stringi_1.7.12
## [39] RColorBrewer_1.1-3 jquerylib_0.1.4
## [41] Rcpp_1.0.10 wheatmap_0.2.0
## [43] readr_2.1.4 httpuv_1.6.9
## [45] Matrix_1.5-4 tidyselect_1.2.0
## [47] yaml_2.3.7 codetools_0.2-19
## [49] curl_5.0.0 lattice_0.21-8
## [51] plyr_1.8.8 shiny_1.7.4
## [53] withr_2.5.0 KEGGREST_1.40.0
## [55] evaluate_0.20 Biostrings_2.68.0
## [57] pillar_1.9.0 BiocManager_1.30.20
## [59] filelock_1.0.2 generics_0.1.3
## [61] RCurl_1.98-1.12 BiocVersion_3.17.1
## [63] hms_1.1.3 munsell_0.5.0
## [65] scales_1.2.1 BiocStyle_2.28.0
## [67] xtable_1.8-4 glue_1.6.2
## [69] tools_4.3.0 grid_4.3.0
## [71] AnnotationDbi_1.62.1 colorspace_2.1-0
## [73] GenomeInfoDbData_1.2.10 cli_3.6.1
## [75] rappdirs_0.3.3 fansi_1.0.4
## [77] S4Arrays_1.0.1 dplyr_1.1.2
## [79] gtable_0.3.3 sass_0.4.6
## [81] digest_0.6.31 ggrepel_0.9.3
## [83] farver_2.1.1 memoise_2.0.1
## [85] htmltools_0.5.5 lifecycle_1.0.3
## [87] httr_1.4.5 mime_0.12
## [89] bit64_4.0.5