1 Introduction

LimROTS extends its reproducibility-optimized testing framework to survival analysis via the LimROTS_survival() function. It fits a Cox proportional hazards model Therneau (2024) — or, optionally, a competing risks model via the Fine–Gray subdistribution approach Gray (2024) — per feature across bootstrap resamples. The reproducibility-optimization step then selects the parameters \(\alpha_1\) and \(\alpha_2\) that maximize the overlap of top-ranked features over those resamples, yielding:

\[t_{\alpha(p)} = \frac{\beta_{(p)}}{\alpha_1 + \alpha_2 \times SE_{(p)}}\]

where \(\beta_{(p)}\) is the Cox model coefficient (log hazard ratio) and \(SE_{(p)}\) is its standard error for feature \(p\).

P-values are derived from permutation-based null distributions using the qvalue package Storey et al. (2024). This vignette demonstrates both models using a Crohn’s disease gut microbiome dataset.

2 Citation

If you use LimROTS in your research, please cite:

Anwar, A. M., Jeba, A., Lahti, L., & Coffey, E. (2025). LimROTS: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust Differential Expression Analysis. Bioinformatics, btaf570. https://doi.org/10.1093/bioinformatics/btaf570

3 Dataset Description

We use the crohn_survival dataset from the miaTime package Lahti, Borman, and Simsek (2024). This is a simulated dataset based on a real Crohn’s disease microbiome study (Gevers and others 2014; Calle and others 2023). It contains 150 individuals and 48 gut microbial taxa stored as a TreeSummarizedExperiment object.

The relevant colData columns are:

Column Description
diagnosis Group label ("CD" = Crohn’s disease, "Control")
event Binary event indicator (1 = Crohn’s disease, 0 = censored)
event_time Time to event or censoring (unitless)

The analysis goal is to identify taxa whose abundance is associated with time to Crohn’s disease diagnosis.

3.1 Loading the Data

library(LimROTS, quietly = TRUE)
library(mia, quietly = TRUE)
library(miaTime, quietly = TRUE)
library(TreeSummarizedExperiment, quietly = TRUE)
library(BiocParallel, quietly = TRUE)
library(SummarizedExperiment, quietly = TRUE)
library(ggplot2, quietly = TRUE)
data("crohn_survival")
print(crohn_survival)
#> class: TreeSummarizedExperiment 
#> dim: 48 150 
#> metadata(0):
#> assays(1): counts
#> rownames(48): g__Turicibacter g__Parabacteroides ...
#>   f__Rikenellaceae_g__ g__Bilophila
#> rowData names(4): order family genus taxonomy_id
#> colnames(150): 1939.SKBTI.0678 1939.SKBTI.0516 ... 1939.SKBTI.0237
#>   1939.SKBTI.0253.a
#> colData names(4): sample_id event event_time diagnosis
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL

crohn_survival is a TreeSummarizedExperiment. We first apply the centered log-ratio (CLR) transformation using mia::transformAssay(), which stores the result as a named assay ("clr") alongside the original counts. We then coerce to a plain SummarizedExperiment as required by LimROTS_survival():

crohn_survival <- transformAssay(
    crohn_survival,
    assay.type = "counts",
    method     = "clr",
    pseudocount = TRUE,
    name       = "clr"
)
#> The assay contains already only strictly positive values. It is not recommended to add pseudocountin such case. To force adding pseudocount you canprovide a numeric value for the pseudocount argument.
assayNames(crohn_survival)
#> [1] "counts" "clr"
taxa_names <- rownames(crohn_survival)
crohn_survival <- as(crohn_survival, "SummarizedExperiment")
# as() coercion can drop rownames; restore them
if (is.null(rownames(crohn_survival))) {
    rownames(crohn_survival) <- taxa_names
}
print(crohn_survival)
#> class: SummarizedExperiment 
#> dim: 48 150 
#> metadata(0):
#> assays(2): counts clr
#> rownames(48): g__Turicibacter g__Parabacteroides ...
#>   f__Rikenellaceae_g__ g__Bilophila
#> rowData names(0):
#> colnames(150): 1939.SKBTI.0678 1939.SKBTI.0516 ... 1939.SKBTI.0237
#>   1939.SKBTI.0253.a
#> colData names(4): sample_id event event_time diagnosis

Sample names in crohn_survival contain . characters, which can cause issues in formula parsing. We replace them with _ in both the assay column names and the colData row names:

clean_names <- gsub("\\.", "_", colnames(crohn_survival))
colnames(crohn_survival) <- clean_names
rownames(colData(crohn_survival)) <- clean_names
# Rename event_time -> time as required by LimROTS_survival()
colnames(colData(crohn_survival))[
    colnames(colData(crohn_survival)) == "event_time"] <- "time"

Inspect the survival outcome distribution:

table(colData(crohn_survival)$diagnosis,
      colData(crohn_survival)$event)
#>          
#>            0  1
#>   CD       0 75
#>   control 75  0

4 Cox Proportional Hazards Analysis

The standard Cox model tests each taxon for an association with time-to-event independently of any other covariates.

4.1 Running LimROTS_survival

set.seed(1234, kind = "default", sample.kind = "default")
crohn_cox <- LimROTS_survival(
    x           = crohn_survival,
    assay.type  = "clr",
    niter       = 30,    # use >= 1000 for real analyses
    K           = 10,     # number of top taxa
    meta.info   = c("time", "event"),
    formula.str = "~ Surv(time, event)",
    competing_risks   = FALSE,
    verbose           = FALSE
)
#> Data is SummarizedExperiment object
#> Assay: clr will be used
#> Sanity check completed successfully!
#> Using MulticoreParam (Unix-like OS) with two workers.
#> qvalue() failed (return NULL): missing values and NaN's not allowed if 'na.rm' is FALSE

Note: niter = 30is used here to reduce build time. For production analyses, use niter = 1000 or higher and increase the number of parallel workers via the BPPARAM argument:

# Windows
BPPARAM <- SnowParam(workers = 4)
# Linux / macOS
BPPARAM <- MulticoreParam(workers = 4)
# Pass to LimROTS_survival(..., BPPARAM = BPPARAM)

4.2 Results

The results are appended to the rowData and metadata slots of the returned SummarizedExperiment.

cox_res <- data.frame(
    rowData(crohn_cox),
    row.names = rownames(crohn_cox)
)
head(cox_res[order(cox_res$pvalue), ])
#>                      statistics       pvalue qvalue FDR  exp_coef   BH.pvalue
#> g__Actinomyces        0.4097402 0.0003472222     NA   0 1.4211011 0.003333333
#> f__Gemellaceae_g__    0.3184780 0.0003472222     NA   0 1.3033357 0.003333333
#> g__Lachnospira        0.3245194 0.0003472222     NA   0 0.7613151 0.003333333
#> g__Granulicatella     0.3266076 0.0003472222     NA   0 1.3154138 0.003333333
#> g__Streptococcus      0.3690089 0.0003472222     NA   0 1.3544171 0.003333333
#> o__Clostridiales_g__  0.2836266 0.0010416667     NA   0 0.7889692 0.008333333

Volcano plot of log hazard ratios versus \(-\log_{10}\)(q-value):

cox_res$significant <- cox_res$FDR < 0.05

ggplot(cox_res, aes(
    x     = log(exp_coef),
    y     = -log10(pvalue),
    color = significant
)) +
    geom_point(alpha = 0.7, size = 2) +
    geom_hline(
        yintercept = -log10(0.05),
        linetype   = "dashed",
        color      = "steelblue"
    ) +
    scale_color_manual(
        values = c("FALSE" = "grey60", "TRUE" = "firebrick"),
        labels = c("FALSE" = "FDR \u2265 0.05", "TRUE" = "FDR < 0.05")
    ) +
    labs(
        title = "LimROTS Survival \u2014 Cox Model",
        x     = "Log Hazard Ratio",
        y     = expression(-log[10](pvalue)),
        color = NULL
    ) +
    theme_bw(base_size = 12) +
    theme(legend.position = "top")

5 Competing Risks Analysis

When a competing event (e.g., death from another cause, remission) prevents the event of interest from occurring, a competing risks model is more appropriate than the standard Cox model. LimROTS_survival() supports the Fine–Gray subdistribution hazard model Gray (2024) when competing_risks = TRUE.

The event variable must be coded as:

  • 0 — censored
  • 1 — event of interest
  • 2 — competing event

5.1 Preparing the Data

The crohn_survival dataset has a binary event column (0/1). For illustration, we recode half of the censored observations as having experienced a competing event:

crohn_cr <- crohn_survival

event_vec <- colData(crohn_cr)$event
censored_idx <- which(event_vec == 0)

set.seed(42)
competing_idx <- sample(censored_idx, floor(length(censored_idx) / 2))
event_vec[competing_idx] <- 2L

colData(crohn_cr)$event <- event_vec
table(colData(crohn_cr)$event)
#> 
#>  0  1  2 
#> 38 75 37

Note: In a real study, the competing event column should reflect a genuine competing risk recorded in the original data (e.g., surgical intervention, withdrawal). The recoding above is purely for demonstration purposes.

5.2 Running LimROTS_survival

set.seed(1234, kind = "default", sample.kind = "default")
crohn_cr <- LimROTS_survival(
    x           = crohn_cr,
    assay.type  = "clr",
    niter       = 30,
    K           = 10,
    meta.info   = c("time", "event"),
    formula.str = "~ Surv(time, event)",
    competing_risks   = TRUE,
    verbose           = FALSE
)
#> Data is SummarizedExperiment object
#> Assay: clr will be used
#> Sanity check completed successfully!
#> Using MulticoreParam (Unix-like OS) with two workers.

5.3 Results

cr_res <- data.frame(
    rowData(crohn_cr),
    row.names = rownames(crohn_cr)
)
head(cr_res[order(cr_res$pvalue), ])
#>                     statistics       pvalue       qvalue FDR  exp_coef
#> g__Bacteroides        4.019771 0.0003472222 0.0006430041   0 0.8111478
#> g__Faecalibacterium   3.753486 0.0003472222 0.0006430041   0 0.8339016
#> g__Dialister          3.815381 0.0003472222 0.0006430041   0 1.1853826
#> g__Actinomyces        3.580579 0.0003472222 0.0006430041   0 1.4457244
#> f__Gemellaceae_g__    4.347935 0.0003472222 0.0006430041   0 1.3092273
#> g__Roseburia          3.941297 0.0003472222 0.0006430041   0 0.7931892
#>                       BH.pvalue
#> g__Bacteroides      0.001851852
#> g__Faecalibacterium 0.001851852
#> g__Dialister        0.001851852
#> g__Actinomyces      0.001851852
#> f__Gemellaceae_g__  0.001851852
#> g__Roseburia        0.001851852

Volcano plot for the competing risks model:

cr_res$significant <- cr_res$FDR < 0.05

ggplot(cr_res, aes(
    x     = log(exp_coef),
    y     = -log10(pvalue),
    color = significant
)) +
    geom_point(alpha = 0.7, size = 2) +
    geom_hline(
        yintercept = -log10(0.05),
        linetype   = "dashed",
        color      = "steelblue"
    ) +
    scale_color_manual(
        values = c("FALSE" = "grey60", "TRUE" = "firebrick"),
        labels = c("FALSE" = "FDR \u2265 0.05", "TRUE" = "FDR < 0.05")
    ) +
    labs(
        title = "LimROTS Survival \u2014 Competing Risks Model",
        x     = "Log Subdistribution Hazard Ratio",
        y     = expression(-log[10](pvalue)),
        color = NULL
    ) +
    theme_bw(base_size = 12) +
    theme(legend.position = "top")

6 Quality Control

plot(metadata(crohn_cr)[["q_values"]])

hist(metadata(crohn_cr)[["q_values"]])

print(summary(metadata(crohn_cr)[["q_values"]]))
sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] miaTime_1.1.0                   mia_1.19.7                     
#>  [3] TreeSummarizedExperiment_2.19.0 Biostrings_2.79.5              
#>  [5] XVector_0.51.0                  SingleCellExperiment_1.33.2    
#>  [7] MultiAssayExperiment_1.37.4     ggplot2_4.0.2                  
#>  [9] BiocParallel_1.45.0             LimROTS_1.3.23                 
#> [11] SummarizedExperiment_1.41.1     Biobase_2.71.0                 
#> [13] GenomicRanges_1.63.2            Seqinfo_1.1.0                  
#> [15] IRanges_2.45.0                  S4Vectors_0.49.1               
#> [17] BiocGenerics_0.57.0             generics_0.1.4                 
#> [19] MatrixGenerics_1.23.0           matrixStats_1.5.0              
#> [21] BiocStyle_2.39.0               
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.3.0                   gridExtra_2.3              
#>   [3] permute_0.9-10              rlang_1.2.0                
#>   [5] magrittr_2.0.5              scater_1.39.4              
#>   [7] otel_0.2.0                  compiler_4.6.0             
#>   [9] mgcv_1.9-4                  DelayedMatrixStats_1.33.0  
#>  [11] vctrs_0.7.2                 reshape2_1.4.5             
#>  [13] stringr_1.6.0               pkgconfig_2.0.3            
#>  [15] crayon_1.5.3                fastmap_1.2.0              
#>  [17] magick_2.9.1                scuttle_1.21.2             
#>  [19] labeling_0.4.3              rmarkdown_2.31             
#>  [21] ggbeeswarm_0.7.3            DirichletMultinomial_1.53.0
#>  [23] tinytex_0.59                purrr_1.2.1                
#>  [25] bluster_1.21.1              xfun_0.57                  
#>  [27] cachem_1.1.0                beachmat_2.27.3            
#>  [29] jsonlite_2.0.0              DelayedArray_0.37.1        
#>  [31] cluster_2.1.8.2             irlba_2.3.7                
#>  [33] parallel_4.6.0              R6_2.6.1                   
#>  [35] bslib_0.10.0                stringi_1.8.7              
#>  [37] RColorBrewer_1.1-3          limma_3.67.0               
#>  [39] jquerylib_0.1.4             Rcpp_1.1.1                 
#>  [41] bookdown_0.46               knitr_1.51                 
#>  [43] DECIPHER_3.7.1              igraph_2.2.2               
#>  [45] Matrix_1.7-5                splines_4.6.0              
#>  [47] tidyselect_1.2.1            qvalue_2.43.0              
#>  [49] dichromat_2.0-0.1           abind_1.4-8                
#>  [51] yaml_2.3.12                 viridis_0.6.5              
#>  [53] vegan_2.7-3                 codetools_0.2-20           
#>  [55] lattice_0.22-9              tibble_3.3.1               
#>  [57] plyr_1.8.9                  treeio_1.35.0              
#>  [59] withr_3.0.2                 S7_0.2.1                   
#>  [61] evaluate_1.0.5              survival_3.8-6             
#>  [63] pillar_1.11.1               BiocManager_1.30.27        
#>  [65] sparseMatrixStats_1.23.0    scales_1.4.0               
#>  [67] tidytree_0.4.7              glue_1.8.0                 
#>  [69] lazyeval_0.2.3              tools_4.6.0                
#>  [71] BiocNeighbors_2.5.4         ScaledMatrix_1.19.0        
#>  [73] fs_2.0.1                    grid_4.6.0                 
#>  [75] ecodive_2.2.2               tidyr_1.3.2                
#>  [77] ape_5.8-1                   nlme_3.1-169               
#>  [79] beeswarm_0.4.0              BiocSingular_1.27.1        
#>  [81] vipor_0.4.7                 cli_3.6.5                  
#>  [83] rsvd_1.0.5                  rappdirs_0.3.4             
#>  [85] S4Arrays_1.11.1             viridisLite_0.4.3          
#>  [87] dplyr_1.2.1                 gtable_0.3.6               
#>  [89] yulab.utils_0.2.4           sass_0.4.10                
#>  [91] digest_0.6.39               SparseArray_1.11.13        
#>  [93] ggrepel_0.9.8               decontam_1.31.0            
#>  [95] farver_2.1.2                htmltools_0.5.9            
#>  [97] cmprsk_2.2-12               lifecycle_1.0.5            
#>  [99] statmod_1.5.1               MASS_7.3-65

References

Calle, M. Luz, and others. 2023. “Coda4microbiome: Compositional Data Analysis for Microbiome Cross-Sectional and Longitudinal Studies.” BMC Bioinformatics 24: 82. https://doi.org/10.1186/s12859-023-05205-3.

Gevers, Dirk, and others. 2014. “The Treatment-Naive Microbiome in New-Onset Crohn’s Disease.” Cell Host & Microbe 15 (4): 382–92. https://doi.org/10.1016/j.chom.2014.02.005.

Gray, Bob. 2024. Cmprsk: Subdistribution Analysis of Competing Risks. https://CRAN.R-project.org/package=cmprsk.

Lahti, Leo, Tuomas Borman, and Yagmur Simsek. 2024. MiaTime: Microbiome Time Series Analysis. https://microbiome.github.io/miaTime/.

Storey, John D., Andrew J. Bass, Alan Dabney, and David Robinson. 2024. Qvalue: Q-Value Estimation for False Discovery Rate Control. https://doi.org/10.18129/B9.bioc.qvalue.

Therneau, Terry M. 2024. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival.