Introduction

This tutorial demonstrates how to use the STADyUM package to perform comparative analysis of pause-escape kinetics and pause-site distributions across cell types using the likelihood-ratio tests (LRTs) developed in Zeng et al. (2026)

In this workflow, we will:

  1. Load pre-prepared estimated transcription rates objects TranscriptionRates for human CD4+ T cells and human CD14 monocytes. To learn how to estimate transcription rates from PRO-seq data, refer to STADyUM_Rate_Estimation_Tutorial.Rmd.
  2. Run likelihoodRatioTest
  3. Perform visualizations such as scatter plots, density plots, and heatmaps built into STADyUM to analyze the differences in pause-escape kinetics and pause site distribution

This example uses human PRO-seq data from two immunologically distinct cell types to demonstrate comparative transcription rate estimation.

Overview of the comparative probabilistic framework for promoter-proximal pausing analysis across conditions, cell types, and species. The framework uses likelihood-ratio tests to quantify changes in pause-escape kinetics and pausing distributions from nascent RNA sequencing data
Overview of the comparative probabilistic framework for promoter-proximal pausing analysis across conditions, cell types, and species. The framework uses likelihood-ratio tests to quantify changes in pause-escape kinetics and pausing distributions from nascent RNA sequencing data

Setup

Load Required Libraries and Data

knitr::opts_chunk$set(dpi = 72, fig.width = 5, fig.height = 4)

library(STADyUM)           # For transcription rate estimation
library(GenomicRanges)     # For genomic data manipulation
library(dplyr) 
library(plyranges)        
#library(tidyverse)         # For data manipulation and visualization
library(ggplot2)           # For plotting
library(pracma)
library(BiocFileCache)
# Required by Bioconductor to check for connection
if (!requireNamespace("BiocFileCache", quietly = TRUE) ||
    !curl::has_internet()) {
  knitr::opts_chunk$set(eval = FALSE)
  message("No internet connection; vignette code not evaluated.")
}

bfc <- BiocFileCache(ask = FALSE)
zip_path <- bfcrpath(bfc, "https://zenodo.org/records/20618059/files/lrt_vignette_data.zip")
## adding rname 'https://zenodo.org/records/20618059/files/lrt_vignette_data.zip'
unzip(zip_path, exdir = tempdir())
data_dir <- file.path(tempdir(), "vignettes", "lrt_test_data")

# STADyUM `ExperimentTranscriptionRates` objects containing transcription rates estimated from human CD4+ T cell and human CD14+ monocyte PRO-seq data.
cd4_rate <- readRDS(file.path(data_dir, 'cd4_rate.RDS'))
cd14_rate <- readRDS(file.path(data_dir, 'cd14_rate.RDS'))
scale_factor <- 38359083 / 33294862

Running STADyUM

Likelihood Ratio Test Calculations

The likelihoodRatioTest() function compares promoter-proximal pausing kinetics between two TranscriptionRates objects from estimateTranscriptionRates(). For experimental data, genes present in both conditions are matched, then three complementary likelihood-ratio tests are run per transcription unit:

  1. Chi (χ) LRT — tests whether elongation activity differs between conditions using gene-body read counts. Chi estimates from condition 2 can be scaled by library-size or spike-in factors (via the optional spikeInFile argument).
  2. Beta (β) LRT — tests whether pause-release rate differs between conditions using promoter-proximal pause-site read counts. The null model (H0) assumes a shared β across conditions; the alternative (H1) allows separate β per condition. Both models are fit with expectation-maximization (EM).
  3. Pause-site distribution (Fk) LRT — tests whether the shape of the pause-site distribution differs between conditions, again comparing shared versus condition-specific models fit with EM.

The scaleFactor argument accounts for sequencing-depth differences between samples when comparing pause-related statistics. Here we use the ratio of total mapped reads between CD4+ T cells and CD14 monocytes.

For each gene, the function returns log2 fold changes, LRT test statistics, and Benjamini-Hochberg adjusted p-values. Results are stored in a TranscriptionRatesLRT object with three per-gene tables (chiTbl, betaTbl, and fkTbl) and a merged summary table (lrtTbl) used by the plotting functions below.

lrt <- likelihoodRatioTest(cd4_rate, cd14_rate, scale_factor)

# The merged per-gene LRT summary table is built automatically and stored in
# the lrtTbl slot, ready to be passed straight to the plotting methods below.
head(lrtTbl(lrt))
## # A tibble: 6 × 29
##   geneId      beta1   beta2    lfc fkMean1 fkMean2 fkVar1 fkVar2 tStats        p
##   <chr>       <dbl>   <dbl>  <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>    <dbl>
## 1 ENSG0000… 2.34e-4 1.44e-4 -0.700    40.5    55.9   583.   606. 18.7   9.97e-10
## 2 ENSG0000… 2.21e-4 9.66e-5 -1.19     68.5    78.2  1585.  1874. 18.3   1.50e- 9
## 3 ENSG0000… 2.11e-4 5.89e-4  1.48     81.8    86.3  1636.  2659. 19.2   5.84e-10
## 4 ENSG0000… 2.42e-4 1.76e-4 -0.461    96.3   129.   2993.  1990.  4.98  1.60e- 3
## 5 ENSG0000… 4.46e-5 2.09e-5 -1.10     63.0    57.0   680.   548. 23.5   7.04e-12
## 6 ENSG0000… 6.95e-5 8.27e-5  0.251    40.5    54.6   588.  2846.  0.773 2.14e- 1
## # ℹ 19 more variables: padj <dbl>, CD4_chi <dbl>, CD4_fkSD <dbl>,
## #   CD4_betaGroup <fct>, CD4_chiGroup <fct>, CD4_sdGroup <chr>, CD14_chi <dbl>,
## #   CD14_fkSD <dbl>, CD14_betaGroup <fct>, CD14_chiGroup <fct>,
## #   CD14_sdGroup <chr>, betaCategory <fct>, chi_mean <dbl>,
## #   chi_mean_group <fct>, chi_lfc <dbl>, chi_lfc_group <fct>,
## #   pause_change <chr>, deltaSD <dbl>, deltaMean <dbl>

Visualization

Pause-escape Rate Quantile Heatmap

# Each plotting method takes the TranscriptionRatesLRT object directly and
# reads the merged table from its lrtTbl slot internally.
p0 <- plotBetaQuantileHeatmap(lrt)

print(p0)

Each cell shows the number of genes that fall into a given β quintile in CD4+ T cells (x-axis) and CD14+ monocytes (y-axis). Strong enrichment along the diagonal indicates that the relative ordering of genes by pause-escape rate is substantially preserved across cell types, even though widespread gene-level changes in β are detected by the LRT.

Pause-escape Rate Scatter Plot

p1 <- plotBetaScatter(lrt,
                      label_x = -6,
                      label_y = -1,
                      x_lim   = c(-6, -1),
                      y_lim   = c(-6, -1))

print(p1)

LRT analysis identified widespread gene-level changes in β between human CD4+ T cells and CD14+ monocytes, with 25.8% of genes showing significantly increased β values and 36.3% showing significantly decreased β values.

Pause-escape Rate and Pausing Dispersion Contour Plot

p2 <- plotDeltaBetaSigma(lrt, filter_lfc = TRUE)

print(p2)

To compare kinetic and distributional changes directly, we mapped shifts in β against shifts in σ. Although individual genes display changes in both parameters, their joint distribution shows minimal overall association, indicating weak coupling between kinetic and distributional changes.

Pausing Dispersion Contour Plot

p3 <- plotFksdDensity(lrt)
## Warning in geom_label(data = label_df, aes(x = .data$x, y = .data$y, label =
## .data$label), : Ignoring unknown parameters: `label.colour`
print(p3)

Although σ values show global alignment between the two cell types, substantial gene-level deviations from the diagonal are evident. Stratifying genes by sharp and broad pausing groups in each cell type reveals four classes of change. Notably, approximately 30.3% (1606 out of 5300) of genes switch between sharp and broad pausing groups between the two cell types, indicating that variation in pausing dispersion frequently involves changes between sharper and broader pause-site distributions.

Summary

Together, these analyses suggest that pause-escape kinetics and pause-site distribution exhibit distinct modes of variation across cellular contexts. While the relative ordering of genes by pause-escape rate remains largely preserved across cell types, pause-escape kinetics retain substantial regulatory plasticity. Variation in pausing dispersion more frequently involves changes between sharper and broader pause-site distributions. The weak association between changes in β and σ further supports the view that kinetic and distributional aspects of pausing can vary differently across cellular contexts.

Session Information

sessionInfo()
## R version 4.6.0 Patched (2026-05-01 r89994)
## Platform: aarch64-apple-darwin23
## Running under: macOS Tahoe 26.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BiocFileCache_3.3.0  dbplyr_2.5.2         pracma_2.4.6        
##  [4] ggplot2_4.0.3        plyranges_1.33.0     dplyr_1.2.1         
##  [7] GenomicRanges_1.65.0 Seqinfo_1.3.0        IRanges_2.47.2      
## [10] S4Vectors_0.51.3     BiocGenerics_0.59.7  generics_0.1.4      
## [13] STADyUM_1.3.1       
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.3.0                   bitops_1.0-9               
##  [3] httr2_1.2.2                 rlang_1.2.0                
##  [5] magrittr_2.0.5              otel_0.2.0                 
##  [7] matrixStats_1.5.0           compiler_4.6.0             
##  [9] RSQLite_3.53.1              vctrs_0.7.3                
## [11] pkgconfig_2.0.3             crayon_1.5.3               
## [13] fastmap_1.2.0               backports_1.5.1            
## [15] XVector_0.53.0              labeling_0.4.3             
## [17] utf8_1.2.6                  Rsamtools_2.29.0           
## [19] rmarkdown_2.31              UCSC.utils_1.9.0           
## [21] purrr_1.2.2                 bit_4.6.0                  
## [23] xfun_0.58                   cachem_1.1.0               
## [25] cigarillo_1.3.0             GenomeInfoDb_1.49.1        
## [27] jsonlite_2.0.0              blob_1.3.0                 
## [29] DelayedArray_0.39.3         BiocParallel_1.47.0        
## [31] broom_1.0.13                parallel_4.6.0             
## [33] R6_2.6.1                    bslib_0.11.0               
## [35] RColorBrewer_1.1-3          rtracklayer_1.73.0         
## [37] car_3.1-5                   jquerylib_0.1.4            
## [39] Rcpp_1.1.1-1.1              SummarizedExperiment_1.43.0
## [41] knitr_1.51                  BiocBaseUtils_1.15.1       
## [43] Matrix_1.7-5                tidyselect_1.2.1           
## [45] dichromat_2.0-0.1           abind_1.4-8                
## [47] yaml_2.3.12                 codetools_0.2-20           
## [49] curl_7.1.0                  lattice_0.22-9             
## [51] tibble_3.3.1                Biobase_2.73.1             
## [53] withr_3.0.2                 S7_0.2.2                   
## [55] evaluate_1.0.5              isoband_0.3.0              
## [57] Biostrings_2.81.3           pillar_1.11.1              
## [59] ggpubr_0.6.3                filelock_1.0.3             
## [61] MatrixGenerics_1.25.0       carData_3.0-6              
## [63] ggpointdensity_0.2.1        RCurl_1.98-1.19            
## [65] scales_1.4.0                glue_1.8.1                 
## [67] tools_4.6.0                 BiocIO_1.23.3              
## [69] data.table_1.18.4           ggsignif_0.6.4             
## [71] GenomicAlignments_1.49.0    XML_3.99-0.23              
## [73] cowplot_1.2.0               grid_4.6.0                 
## [75] tidyr_1.3.2                 restfulr_0.0.17            
## [77] Formula_1.2-5               cli_3.6.6                  
## [79] rappdirs_0.3.4              S4Arrays_1.13.0            
## [81] viridisLite_0.4.3           gtable_0.3.6               
## [83] rstatix_0.7.3               sass_0.4.10                
## [85] digest_0.6.39               SparseArray_1.13.2         
## [87] rjson_0.2.23                farver_2.1.2               
## [89] memoise_2.0.1               htmltools_0.5.9            
## [91] lifecycle_1.0.5             httr_1.4.8                 
## [93] bit64_4.8.2                 MASS_7.3-65
Zeng, Xin, Gilad Barshad, Rebecca Hassett, et al. 2026. “A Comparative Analysis of Promoter-Proximal Pausing Reveals Kinetic and Distributional Dimensions of Variation.” bioRxiv, ahead of print. https://doi.org/10.64898/2026.06.01.729264.