This tutorial demonstrates how to use the STADyUM package to perform comparative analysis of pause-escape kinetics and pause-site distributions across cell types using the likelihood-ratio tests (LRTs) developed in Zeng et al. (2026)
In this workflow, we will:
TranscriptionRates for human CD4+ T cells and human CD14
monocytes. To learn how to estimate transcription rates from PRO-seq
data, refer to STADyUM_Rate_Estimation_Tutorial.Rmd.likelihoodRatioTestThis example uses human PRO-seq data from two immunologically distinct cell types to demonstrate comparative transcription rate estimation.
knitr::opts_chunk$set(dpi = 72, fig.width = 5, fig.height = 4)
library(STADyUM) # For transcription rate estimation
library(GenomicRanges) # For genomic data manipulation
library(dplyr)
library(plyranges)
#library(tidyverse) # For data manipulation and visualization
library(ggplot2) # For plotting
library(pracma)
library(BiocFileCache)# Required by Bioconductor to check for connection
if (!requireNamespace("BiocFileCache", quietly = TRUE) ||
!curl::has_internet()) {
knitr::opts_chunk$set(eval = FALSE)
message("No internet connection; vignette code not evaluated.")
}
bfc <- BiocFileCache(ask = FALSE)
zip_path <- bfcrpath(bfc, "https://zenodo.org/records/20618059/files/lrt_vignette_data.zip")## adding rname 'https://zenodo.org/records/20618059/files/lrt_vignette_data.zip'
unzip(zip_path, exdir = tempdir())
data_dir <- file.path(tempdir(), "vignettes", "lrt_test_data")
# STADyUM `ExperimentTranscriptionRates` objects containing transcription rates estimated from human CD4+ T cell and human CD14+ monocyte PRO-seq data.
cd4_rate <- readRDS(file.path(data_dir, 'cd4_rate.RDS'))
cd14_rate <- readRDS(file.path(data_dir, 'cd14_rate.RDS'))
scale_factor <- 38359083 / 33294862The likelihoodRatioTest() function compares
promoter-proximal pausing kinetics between two
TranscriptionRates objects from
estimateTranscriptionRates(). For experimental data, genes
present in both conditions are matched, then three complementary
likelihood-ratio tests are run per transcription unit:
spikeInFile argument).The scaleFactor argument accounts for sequencing-depth
differences between samples when comparing pause-related statistics.
Here we use the ratio of total mapped reads between CD4+ T cells and
CD14 monocytes.
For each gene, the function returns log2 fold changes, LRT test
statistics, and Benjamini-Hochberg adjusted p-values. Results are stored
in a TranscriptionRatesLRT object with three per-gene
tables (chiTbl, betaTbl, and
fkTbl) and a merged summary table (lrtTbl)
used by the plotting functions below.
lrt <- likelihoodRatioTest(cd4_rate, cd14_rate, scale_factor)
# The merged per-gene LRT summary table is built automatically and stored in
# the lrtTbl slot, ready to be passed straight to the plotting methods below.
head(lrtTbl(lrt))## # A tibble: 6 × 29
## geneId beta1 beta2 lfc fkMean1 fkMean2 fkVar1 fkVar2 tStats p
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ENSG0000… 2.34e-4 1.44e-4 -0.700 40.5 55.9 583. 606. 18.7 9.97e-10
## 2 ENSG0000… 2.21e-4 9.66e-5 -1.19 68.5 78.2 1585. 1874. 18.3 1.50e- 9
## 3 ENSG0000… 2.11e-4 5.89e-4 1.48 81.8 86.3 1636. 2659. 19.2 5.84e-10
## 4 ENSG0000… 2.42e-4 1.76e-4 -0.461 96.3 129. 2993. 1990. 4.98 1.60e- 3
## 5 ENSG0000… 4.46e-5 2.09e-5 -1.10 63.0 57.0 680. 548. 23.5 7.04e-12
## 6 ENSG0000… 6.95e-5 8.27e-5 0.251 40.5 54.6 588. 2846. 0.773 2.14e- 1
## # ℹ 19 more variables: padj <dbl>, CD4_chi <dbl>, CD4_fkSD <dbl>,
## # CD4_betaGroup <fct>, CD4_chiGroup <fct>, CD4_sdGroup <chr>, CD14_chi <dbl>,
## # CD14_fkSD <dbl>, CD14_betaGroup <fct>, CD14_chiGroup <fct>,
## # CD14_sdGroup <chr>, betaCategory <fct>, chi_mean <dbl>,
## # chi_mean_group <fct>, chi_lfc <dbl>, chi_lfc_group <fct>,
## # pause_change <chr>, deltaSD <dbl>, deltaMean <dbl>
# Each plotting method takes the TranscriptionRatesLRT object directly and
# reads the merged table from its lrtTbl slot internally.
p0 <- plotBetaQuantileHeatmap(lrt)
print(p0)Each cell shows the number of genes that fall into a given β quintile in CD4+ T cells (x-axis) and CD14+ monocytes (y-axis). Strong enrichment along the diagonal indicates that the relative ordering of genes by pause-escape rate is substantially preserved across cell types, even though widespread gene-level changes in β are detected by the LRT.
p1 <- plotBetaScatter(lrt,
label_x = -6,
label_y = -1,
x_lim = c(-6, -1),
y_lim = c(-6, -1))
print(p1)
LRT analysis identified widespread gene-level changes in β between human
CD4+ T cells and CD14+ monocytes, with 25.8% of genes showing
significantly increased β values and 36.3% showing significantly
decreased β values.
To compare kinetic and distributional changes directly, we mapped shifts in β against shifts in σ. Although individual genes display changes in both parameters, their joint distribution shows minimal overall association, indicating weak coupling between kinetic and distributional changes.
## Warning in geom_label(data = label_df, aes(x = .data$x, y = .data$y, label =
## .data$label), : Ignoring unknown parameters: `label.colour`
Although σ values show global alignment between the two cell types, substantial gene-level deviations from the diagonal are evident. Stratifying genes by sharp and broad pausing groups in each cell type reveals four classes of change. Notably, approximately 30.3% (1606 out of 5300) of genes switch between sharp and broad pausing groups between the two cell types, indicating that variation in pausing dispersion frequently involves changes between sharper and broader pause-site distributions.
Together, these analyses suggest that pause-escape kinetics and pause-site distribution exhibit distinct modes of variation across cellular contexts. While the relative ordering of genes by pause-escape rate remains largely preserved across cell types, pause-escape kinetics retain substantial regulatory plasticity. Variation in pausing dispersion more frequently involves changes between sharper and broader pause-site distributions. The weak association between changes in β and σ further supports the view that kinetic and distributional aspects of pausing can vary differently across cellular contexts.
## R version 4.6.0 Patched (2026-05-01 r89994)
## Platform: aarch64-apple-darwin23
## Running under: macOS Tahoe 26.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BiocFileCache_3.3.0 dbplyr_2.5.2 pracma_2.4.6
## [4] ggplot2_4.0.3 plyranges_1.33.0 dplyr_1.2.1
## [7] GenomicRanges_1.65.0 Seqinfo_1.3.0 IRanges_2.47.2
## [10] S4Vectors_0.51.3 BiocGenerics_0.59.7 generics_0.1.4
## [13] STADyUM_1.3.1
##
## loaded via a namespace (and not attached):
## [1] DBI_1.3.0 bitops_1.0-9
## [3] httr2_1.2.2 rlang_1.2.0
## [5] magrittr_2.0.5 otel_0.2.0
## [7] matrixStats_1.5.0 compiler_4.6.0
## [9] RSQLite_3.53.1 vctrs_0.7.3
## [11] pkgconfig_2.0.3 crayon_1.5.3
## [13] fastmap_1.2.0 backports_1.5.1
## [15] XVector_0.53.0 labeling_0.4.3
## [17] utf8_1.2.6 Rsamtools_2.29.0
## [19] rmarkdown_2.31 UCSC.utils_1.9.0
## [21] purrr_1.2.2 bit_4.6.0
## [23] xfun_0.58 cachem_1.1.0
## [25] cigarillo_1.3.0 GenomeInfoDb_1.49.1
## [27] jsonlite_2.0.0 blob_1.3.0
## [29] DelayedArray_0.39.3 BiocParallel_1.47.0
## [31] broom_1.0.13 parallel_4.6.0
## [33] R6_2.6.1 bslib_0.11.0
## [35] RColorBrewer_1.1-3 rtracklayer_1.73.0
## [37] car_3.1-5 jquerylib_0.1.4
## [39] Rcpp_1.1.1-1.1 SummarizedExperiment_1.43.0
## [41] knitr_1.51 BiocBaseUtils_1.15.1
## [43] Matrix_1.7-5 tidyselect_1.2.1
## [45] dichromat_2.0-0.1 abind_1.4-8
## [47] yaml_2.3.12 codetools_0.2-20
## [49] curl_7.1.0 lattice_0.22-9
## [51] tibble_3.3.1 Biobase_2.73.1
## [53] withr_3.0.2 S7_0.2.2
## [55] evaluate_1.0.5 isoband_0.3.0
## [57] Biostrings_2.81.3 pillar_1.11.1
## [59] ggpubr_0.6.3 filelock_1.0.3
## [61] MatrixGenerics_1.25.0 carData_3.0-6
## [63] ggpointdensity_0.2.1 RCurl_1.98-1.19
## [65] scales_1.4.0 glue_1.8.1
## [67] tools_4.6.0 BiocIO_1.23.3
## [69] data.table_1.18.4 ggsignif_0.6.4
## [71] GenomicAlignments_1.49.0 XML_3.99-0.23
## [73] cowplot_1.2.0 grid_4.6.0
## [75] tidyr_1.3.2 restfulr_0.0.17
## [77] Formula_1.2-5 cli_3.6.6
## [79] rappdirs_0.3.4 S4Arrays_1.13.0
## [81] viridisLite_0.4.3 gtable_0.3.6
## [83] rstatix_0.7.3 sass_0.4.10
## [85] digest_0.6.39 SparseArray_1.13.2
## [87] rjson_0.2.23 farver_2.1.2
## [89] memoise_2.0.1 htmltools_0.5.9
## [91] lifecycle_1.0.5 httr_1.4.8
## [93] bit64_4.8.2 MASS_7.3-65