STADyUM is an R package for simulating and analyzing transcription dynamics. It provides tools for:
STADyUM seamlessly integrates with the Bioconductor ecosystem by leveraging standard Bioconductor data structures such as GRanges for genomic interval handling and S4Vectors for custom Bioconductor-style data structures. While existing Bioconductor packages like INSPEcT focuses primarily on modeling RNA synthesis, processing, and degradation using both nascent and mature RNA-seq data through systems of differential equations, STADyUM takes a different modeling approach. STADyUM provides a probabilistic framework tailored specifically for nascent RNA data alone (such as PRO-seq, GRO-seq, 4sU-DRB-seq) including functionality for modeling steric hindrance between RNA polymerases and modeling pause-site kinetics explicitly.
The focus of this tutorial will be on using our simulator, SimPol,
for simulating RNA Pol II dynamics on a DNA template. SimPol tracks the
independent movement of RNAPs along the DNA templates of a large number
of cells. It accepts several key user-specified parameters, including
the initiation rate, pause-escape rate, a constant or variable
elongation rate, the mean and variance of pause sites across cells, as
well as the center-to-center spacing constraint between RNAPs, the
number of cells being simulated, the gene length, and the total time of
transcription. After running for the specified time, SimPol outputs a
SimulatePolymerase object that stores matrices at different
time points recording the RNAP positions. This is stored in slots
positionMatrices and finalPositionMatrix. It
also stores an estimated vector of read counts per nucleotide in slot
readCounts. More information on the simulator can be found
in the sections A simple probabilistic model for transcription
initiation, promoter-proximal pausing, and elongation, SimPol
Simulator, Generation of synthetic NRS data of Zhao et al. (2023)
simpol <- simulatePolymerase(k=50, ksd=25, kMin=17, kMax=200, geneLen=1950,
alpha=2, beta=0.5, zeta=2000, zetaSd=1000, zetaMin=1500, zetaMax=2500,
cellNum=100, polSize=33, addSpace=17, time=39, timesToRecord=NULL)
#> Total time used for the simulation is 0 mins.
show(simpol)
#> A SimulatePolymerase object with:
#> - gene length = 1950
#> - number of cells = 100
#> - pause sites mean = 55.22
#> - pause sites std dev = 22.38
#> - top 10 most occupied sites across all cells:
#> position 0: 100 polymerases
#> position 42: 5 polymerases
#> position 30: 4 polymerases
#> position 51: 4 polymerases
#> position 59: 4 polymerases
#> position 1: 3 polymerases
#> position 6: 3 polymerases
#> position 9: 3 polymerases
#> position 21: 3 polymerases
#> position 28: 3 polymerases
#> - gene body average read counts = 0.05
#> - pause region average read counts = 2.33
#> - percent of nucleotides with zero reads = 94.5 %
#>
#> To access the full simulation parameters, use: parameters(object)
#>
#> To access the all sampled read counts, use: readCounts(object)
readCounts <- readCounts(simpol)
plotCombinedCells(simpol, file="combinedCellsLollipopPlot.png")Estimates of transcription rates, such as chi and beta, are calculated given Expectation Maximization with likelihood formulas derived in Stationary Distribution and Inference at Steady-State section of Siepel (2022) and stated in Inference at Steady State section of Zhao et al. (2023). Note that chi is the estimator for the average read depth, excluding the pause peak and beta is the estimator for the ratio of chi to the read depth in the pause peak region. The estimator for beta is also given for a model adapted for varying pause sites across cells and is denoted by variable betaAdp whereas the fixed pause site estimate is denoted by betaOrg, refer to Allowing for variation in pause site section of Zhao et al. (2023).
simRates <- estimateTranscriptionRates(simpol, name="simRatesb0.5k50",
stericHindrance=TRUE)
#> Starting EM algorithm...
show(simRates)
#> A SimulationTranscriptionRates object with:
#> - Steric hindrance: TRUE
#> - Number of Trials with 5000 cells: 50
#>
#> Summary statistics for rate estimates:
#> - chi (gene body RNAP density): 0.28 RNAPs/bp
#> - betaOrg (ratio of gene body RNAP density to pause region
#> RNAP density, fixed sites): 0.4874
#> - betaAdp (ratio of gene body RNAP density to pause region
#> RNAP density, adapted model): 0.0036
#> - TSS Read Counts Averaged Across Trials: ~ 114 read
#> counts
#> - Gene Body Read Counts Averaged Across Trials: ~ 482 read
#> counts
#> - Landing Pad Read Counts Averaged Across Trials: ~ 3747
#> read counts
#> - phi (landing pad occupancy): 0.62
#>
#> EM Algorithm Convergence:
#> - Trials converged normally: 50/50 (100.0%)
#> - Trials converged to single site: 0/50 (0.0%)
#> - Trials reached max iterations without convergence: 0/50
#> (0.0%)Likelihood ratio tests can also be applied to simulated data. In this example, the simulated rates for the first simpol object is compared to the simulated rates from the second simpol object. As expected given the simpol parameters, the plots show that the second simulated rates have higher beta and higher mean pause sites.
simpol2 <- simulatePolymerase(k=100, ksd=25, kMin=75, kMax=200, geneLen=1950,
alpha=2, beta=10, zeta=2000, zetaSd=1000, zetaMin=1500, zetaMax=2500,
cellNum=100, polSize=33, addSpace=17, time=30, timesToRecord=NULL)
#> Total time used for the simulation is 0 mins.
simRates2 <- estimateTranscriptionRates(simpol2, name="simRatesb10k100",
stericHindrance=TRUE)
#> Starting EM algorithm...
simLRT <- likelihoodRatioTest(simRates, simRates2)
#> Warning: Unknown or uninitialised column: `geneId`.
#> Unknown or uninitialised column: `geneId`.
plotPauseSiteContourMapTwoConditions(simLRT, file="simPauseSitesContourMap.png",
width=7, height=5, dpi=150)sessionInfo()
#> R version 4.6.0 Patched (2026-05-01 r89994)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Tahoe 26.3.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BiocFileCache_3.3.0 dbplyr_2.5.2 pracma_2.4.6
#> [4] ggplot2_4.0.3 plyranges_1.33.0 dplyr_1.2.1
#> [7] GenomicRanges_1.65.0 Seqinfo_1.3.0 IRanges_2.47.2
#> [10] S4Vectors_0.51.3 BiocGenerics_0.59.7 generics_0.1.4
#> [13] STADyUM_1.3.1
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.3.0 bitops_1.0-9
#> [3] httr2_1.2.2 rlang_1.2.0
#> [5] magrittr_2.0.5 otel_0.2.0
#> [7] matrixStats_1.5.0 compiler_4.6.0
#> [9] RSQLite_3.53.1 systemfonts_1.3.2
#> [11] vctrs_0.7.3 pkgconfig_2.0.3
#> [13] crayon_1.5.3 fastmap_1.2.0
#> [15] backports_1.5.1 XVector_0.53.0
#> [17] labeling_0.4.3 utf8_1.2.6
#> [19] Rsamtools_2.29.0 rmarkdown_2.31
#> [21] UCSC.utils_1.9.0 ragg_1.5.2
#> [23] purrr_1.2.2 bit_4.6.0
#> [25] xfun_0.58 cachem_1.1.0
#> [27] cigarillo_1.3.0 GenomeInfoDb_1.49.1
#> [29] jsonlite_2.0.0 progress_1.2.3
#> [31] blob_1.3.0 DelayedArray_0.39.3
#> [33] BiocParallel_1.47.0 prettyunits_1.2.0
#> [35] broom_1.0.13 parallel_4.6.0
#> [37] R6_2.6.1 bslib_0.11.0
#> [39] RColorBrewer_1.1-3 rtracklayer_1.73.0
#> [41] car_3.1-5 jquerylib_0.1.4
#> [43] Rcpp_1.1.1-1.1 SummarizedExperiment_1.43.0
#> [45] knitr_1.51 BiocBaseUtils_1.15.1
#> [47] Matrix_1.7-5 tidyselect_1.2.1
#> [49] dichromat_2.0-0.1 abind_1.4-8
#> [51] yaml_2.3.12 codetools_0.2-20
#> [53] curl_7.1.0 lattice_0.22-9
#> [55] tibble_3.3.1 Biobase_2.73.1
#> [57] withr_3.0.2 S7_0.2.2
#> [59] evaluate_1.0.5 isoband_0.3.0
#> [61] Biostrings_2.81.3 pillar_1.11.1
#> [63] ggpubr_0.6.3 filelock_1.0.3
#> [65] MatrixGenerics_1.25.0 carData_3.0-6
#> [67] ggpointdensity_0.2.1 RCurl_1.98-1.19
#> [69] hms_1.1.4 scales_1.4.0
#> [71] glue_1.8.1 tools_4.6.0
#> [73] BiocIO_1.23.3 data.table_1.18.4
#> [75] ggsignif_0.6.4 GenomicAlignments_1.49.0
#> [77] XML_3.99-0.23 cowplot_1.2.0
#> [79] grid_4.6.0 tidyr_1.3.2
#> [81] restfulr_0.0.17 Formula_1.2-5
#> [83] cli_3.6.6 rappdirs_0.3.4
#> [85] textshaping_1.0.5 S4Arrays_1.13.0
#> [87] viridisLite_0.4.3 gtable_0.3.6
#> [89] rstatix_0.7.3 sass_0.4.10
#> [91] digest_0.6.39 SparseArray_1.13.2
#> [93] rjson_0.2.23 farver_2.1.2
#> [95] memoise_2.0.1 htmltools_0.5.9
#> [97] lifecycle_1.0.5 httr_1.4.8
#> [99] bit64_4.8.2 MASS_7.3-65