LimROTS extends its reproducibility-optimized testing framework to
survival analysis via the LimROTS_survival() function. It fits a Cox
proportional hazards model Therneau (2024) — or, optionally, a competing risks
model via the Fine–Gray subdistribution approach Gray (2024) — per feature
across bootstrap resamples. The reproducibility-optimization step then
selects the parameters \(\alpha_1\) and \(\alpha_2\) that maximize the
overlap of top-ranked features over those resamples, yielding:
\[t_{\alpha(p)} = \frac{\beta_{(p)}}{\alpha_1 + \alpha_2 \times SE_{(p)}}\]
where \(\beta_{(p)}\) is the Cox model coefficient (log hazard ratio) and \(SE_{(p)}\) is its standard error for feature \(p\).
P-values are derived from permutation-based null distributions using the
qvalue package Storey et al. (2024). This vignette demonstrates both models using a
Crohn’s disease gut microbiome dataset.
If you use LimROTS in your research, please cite:
Anwar, A. M., Jeba, A., Lahti, L., & Coffey, E. (2025). LimROTS: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust Differential Expression Analysis. Bioinformatics, btaf570. https://doi.org/10.1093/bioinformatics/btaf570
We use the crohn_survival dataset from the miaTime package Lahti, Borman, and Simsek (2024).
This is a simulated dataset based on a real Crohn’s disease microbiome
study (Gevers and others 2014; Calle and others 2023). It contains 150 individuals
and 48 gut microbial taxa stored as a TreeSummarizedExperiment
object.
The relevant colData columns are:
| Column | Description |
|---|---|
diagnosis |
Group label ("CD" = Crohn’s disease, "Control") |
event |
Binary event indicator (1 = Crohn’s disease, 0 = censored) |
event_time |
Time to event or censoring (unitless) |
The analysis goal is to identify taxa whose abundance is associated with time to Crohn’s disease diagnosis.
library(LimROTS, quietly = TRUE)
library(mia, quietly = TRUE)
library(miaTime, quietly = TRUE)
library(TreeSummarizedExperiment, quietly = TRUE)
library(BiocParallel, quietly = TRUE)
library(SummarizedExperiment, quietly = TRUE)
library(ggplot2, quietly = TRUE)
data("crohn_survival")
print(crohn_survival)
#> class: TreeSummarizedExperiment
#> dim: 48 150
#> metadata(0):
#> assays(1): counts
#> rownames(48): g__Turicibacter g__Parabacteroides ...
#> f__Rikenellaceae_g__ g__Bilophila
#> rowData names(4): order family genus taxonomy_id
#> colnames(150): 1939.SKBTI.0678 1939.SKBTI.0516 ... 1939.SKBTI.0237
#> 1939.SKBTI.0253.a
#> colData names(4): sample_id event event_time diagnosis
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL
crohn_survival is a TreeSummarizedExperiment. We first apply the
centered log-ratio (CLR) transformation using mia::transformAssay(),
which stores the result as a named assay ("clr") alongside the
original counts. We then coerce to a plain SummarizedExperiment as
required by LimROTS_survival():
crohn_survival <- transformAssay(
crohn_survival,
assay.type = "counts",
method = "clr",
pseudocount = TRUE,
name = "clr"
)
#> The assay contains already only strictly positive values. It is not recommended to add pseudocountin such case. To force adding pseudocount you canprovide a numeric value for the pseudocount argument.
assayNames(crohn_survival)
#> [1] "counts" "clr"
taxa_names <- rownames(crohn_survival)
crohn_survival <- as(crohn_survival, "SummarizedExperiment")
# as() coercion can drop rownames; restore them
if (is.null(rownames(crohn_survival))) {
rownames(crohn_survival) <- taxa_names
}
print(crohn_survival)
#> class: SummarizedExperiment
#> dim: 48 150
#> metadata(0):
#> assays(2): counts clr
#> rownames(48): g__Turicibacter g__Parabacteroides ...
#> f__Rikenellaceae_g__ g__Bilophila
#> rowData names(0):
#> colnames(150): 1939.SKBTI.0678 1939.SKBTI.0516 ... 1939.SKBTI.0237
#> 1939.SKBTI.0253.a
#> colData names(4): sample_id event event_time diagnosis
Sample names in crohn_survival contain . characters, which can cause
issues in formula parsing. We replace them with _ in both the assay
column names and the colData row names:
clean_names <- gsub("\\.", "_", colnames(crohn_survival))
colnames(crohn_survival) <- clean_names
rownames(colData(crohn_survival)) <- clean_names
# Rename event_time -> time as required by LimROTS_survival()
colnames(colData(crohn_survival))[
colnames(colData(crohn_survival)) == "event_time"] <- "time"
Inspect the survival outcome distribution:
table(colData(crohn_survival)$diagnosis,
colData(crohn_survival)$event)
#>
#> 0 1
#> CD 0 75
#> control 75 0
The standard Cox model tests each taxon for an association with time-to-event independently of any other covariates.
LimROTS_survivalset.seed(1234, kind = "default", sample.kind = "default")
crohn_cox <- LimROTS_survival(
x = crohn_survival,
assay.type = "clr",
niter = 30, # use >= 1000 for real analyses
K = 10, # number of top taxa
meta.info = c("time", "event"),
formula.str = "~ Surv(time, event)",
competing_risks = FALSE,
verbose = FALSE
)
#> Data is SummarizedExperiment object
#> Assay: clr will be used
#> Sanity check completed successfully!
#> Using MulticoreParam (Unix-like OS) with two workers.
#> qvalue() failed (return NULL): missing values and NaN's not allowed if 'na.rm' is FALSE
Note:
niter = 30is used here to reduce build time. For production analyses, useniter = 1000or higher and increase the number of parallel workers via theBPPARAMargument:# Windows BPPARAM <- SnowParam(workers = 4) # Linux / macOS BPPARAM <- MulticoreParam(workers = 4) # Pass to LimROTS_survival(..., BPPARAM = BPPARAM)
The results are appended to the rowData and metadata slots of the
returned SummarizedExperiment.
cox_res <- data.frame(
rowData(crohn_cox),
row.names = rownames(crohn_cox)
)
head(cox_res[order(cox_res$pvalue), ])
#> statistics pvalue qvalue FDR exp_coef BH.pvalue
#> g__Actinomyces 0.4097402 0.0003472222 NA 0 1.4211011 0.003333333
#> f__Gemellaceae_g__ 0.3184780 0.0003472222 NA 0 1.3033357 0.003333333
#> g__Lachnospira 0.3245194 0.0003472222 NA 0 0.7613151 0.003333333
#> g__Granulicatella 0.3266076 0.0003472222 NA 0 1.3154138 0.003333333
#> g__Streptococcus 0.3690089 0.0003472222 NA 0 1.3544171 0.003333333
#> o__Clostridiales_g__ 0.2836266 0.0010416667 NA 0 0.7889692 0.008333333
Volcano plot of log hazard ratios versus \(-\log_{10}\)(q-value):
cox_res$significant <- cox_res$FDR < 0.05
ggplot(cox_res, aes(
x = log(exp_coef),
y = -log10(pvalue),
color = significant
)) +
geom_point(alpha = 0.7, size = 2) +
geom_hline(
yintercept = -log10(0.05),
linetype = "dashed",
color = "steelblue"
) +
scale_color_manual(
values = c("FALSE" = "grey60", "TRUE" = "firebrick"),
labels = c("FALSE" = "FDR \u2265 0.05", "TRUE" = "FDR < 0.05")
) +
labs(
title = "LimROTS Survival \u2014 Cox Model",
x = "Log Hazard Ratio",
y = expression(-log[10](pvalue)),
color = NULL
) +
theme_bw(base_size = 12) +
theme(legend.position = "top")
When a competing event (e.g., death from another cause, remission)
prevents the event of interest from occurring, a competing risks model
is more appropriate than the standard Cox model. LimROTS_survival()
supports the Fine–Gray subdistribution hazard model Gray (2024) when
competing_risks = TRUE.
The event variable must be coded as:
The crohn_survival dataset has a binary event column (0/1). For
illustration, we recode half of the censored observations as having
experienced a competing event:
crohn_cr <- crohn_survival
event_vec <- colData(crohn_cr)$event
censored_idx <- which(event_vec == 0)
set.seed(42)
competing_idx <- sample(censored_idx, floor(length(censored_idx) / 2))
event_vec[competing_idx] <- 2L
colData(crohn_cr)$event <- event_vec
table(colData(crohn_cr)$event)
#>
#> 0 1 2
#> 38 75 37
Note: In a real study, the competing event column should reflect a genuine competing risk recorded in the original data (e.g., surgical intervention, withdrawal). The recoding above is purely for demonstration purposes.
LimROTS_survivalset.seed(1234, kind = "default", sample.kind = "default")
crohn_cr <- LimROTS_survival(
x = crohn_cr,
assay.type = "clr",
niter = 30,
K = 10,
meta.info = c("time", "event"),
formula.str = "~ Surv(time, event)",
competing_risks = TRUE,
verbose = FALSE
)
#> Data is SummarizedExperiment object
#> Assay: clr will be used
#> Sanity check completed successfully!
#> Using MulticoreParam (Unix-like OS) with two workers.
cr_res <- data.frame(
rowData(crohn_cr),
row.names = rownames(crohn_cr)
)
head(cr_res[order(cr_res$pvalue), ])
#> statistics pvalue qvalue FDR exp_coef
#> g__Bacteroides 4.019771 0.0003472222 0.0006430041 0 0.8111478
#> g__Faecalibacterium 3.753486 0.0003472222 0.0006430041 0 0.8339016
#> g__Dialister 3.815381 0.0003472222 0.0006430041 0 1.1853826
#> g__Actinomyces 3.580579 0.0003472222 0.0006430041 0 1.4457244
#> f__Gemellaceae_g__ 4.347935 0.0003472222 0.0006430041 0 1.3092273
#> g__Roseburia 3.941297 0.0003472222 0.0006430041 0 0.7931892
#> BH.pvalue
#> g__Bacteroides 0.001851852
#> g__Faecalibacterium 0.001851852
#> g__Dialister 0.001851852
#> g__Actinomyces 0.001851852
#> f__Gemellaceae_g__ 0.001851852
#> g__Roseburia 0.001851852
Volcano plot for the competing risks model:
cr_res$significant <- cr_res$FDR < 0.05
ggplot(cr_res, aes(
x = log(exp_coef),
y = -log10(pvalue),
color = significant
)) +
geom_point(alpha = 0.7, size = 2) +
geom_hline(
yintercept = -log10(0.05),
linetype = "dashed",
color = "steelblue"
) +
scale_color_manual(
values = c("FALSE" = "grey60", "TRUE" = "firebrick"),
labels = c("FALSE" = "FDR \u2265 0.05", "TRUE" = "FDR < 0.05")
) +
labs(
title = "LimROTS Survival \u2014 Competing Risks Model",
x = "Log Subdistribution Hazard Ratio",
y = expression(-log[10](pvalue)),
color = NULL
) +
theme_bw(base_size = 12) +
theme(legend.position = "top")
plot(metadata(crohn_cr)[["q_values"]])
hist(metadata(crohn_cr)[["q_values"]])
print(summary(metadata(crohn_cr)[["q_values"]]))
sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] miaTime_1.1.0 mia_1.19.7
#> [3] TreeSummarizedExperiment_2.19.0 Biostrings_2.79.5
#> [5] XVector_0.51.0 SingleCellExperiment_1.33.2
#> [7] MultiAssayExperiment_1.37.4 ggplot2_4.0.2
#> [9] BiocParallel_1.45.0 LimROTS_1.3.23
#> [11] SummarizedExperiment_1.41.1 Biobase_2.71.0
#> [13] GenomicRanges_1.63.2 Seqinfo_1.1.0
#> [15] IRanges_2.45.0 S4Vectors_0.49.1
#> [17] BiocGenerics_0.57.0 generics_0.1.4
#> [19] MatrixGenerics_1.23.0 matrixStats_1.5.0
#> [21] BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.3.0 gridExtra_2.3
#> [3] permute_0.9-10 rlang_1.2.0
#> [5] magrittr_2.0.5 scater_1.39.4
#> [7] otel_0.2.0 compiler_4.6.0
#> [9] mgcv_1.9-4 DelayedMatrixStats_1.33.0
#> [11] vctrs_0.7.2 reshape2_1.4.5
#> [13] stringr_1.6.0 pkgconfig_2.0.3
#> [15] crayon_1.5.3 fastmap_1.2.0
#> [17] magick_2.9.1 scuttle_1.21.2
#> [19] labeling_0.4.3 rmarkdown_2.31
#> [21] ggbeeswarm_0.7.3 DirichletMultinomial_1.53.0
#> [23] tinytex_0.59 purrr_1.2.1
#> [25] bluster_1.21.1 xfun_0.57
#> [27] cachem_1.1.0 beachmat_2.27.3
#> [29] jsonlite_2.0.0 DelayedArray_0.37.1
#> [31] cluster_2.1.8.2 irlba_2.3.7
#> [33] parallel_4.6.0 R6_2.6.1
#> [35] bslib_0.10.0 stringi_1.8.7
#> [37] RColorBrewer_1.1-3 limma_3.67.0
#> [39] jquerylib_0.1.4 Rcpp_1.1.1
#> [41] bookdown_0.46 knitr_1.51
#> [43] DECIPHER_3.7.1 igraph_2.2.2
#> [45] Matrix_1.7-5 splines_4.6.0
#> [47] tidyselect_1.2.1 qvalue_2.43.0
#> [49] dichromat_2.0-0.1 abind_1.4-8
#> [51] yaml_2.3.12 viridis_0.6.5
#> [53] vegan_2.7-3 codetools_0.2-20
#> [55] lattice_0.22-9 tibble_3.3.1
#> [57] plyr_1.8.9 treeio_1.35.0
#> [59] withr_3.0.2 S7_0.2.1
#> [61] evaluate_1.0.5 survival_3.8-6
#> [63] pillar_1.11.1 BiocManager_1.30.27
#> [65] sparseMatrixStats_1.23.0 scales_1.4.0
#> [67] tidytree_0.4.7 glue_1.8.0
#> [69] lazyeval_0.2.3 tools_4.6.0
#> [71] BiocNeighbors_2.5.4 ScaledMatrix_1.19.0
#> [73] fs_2.0.1 grid_4.6.0
#> [75] ecodive_2.2.2 tidyr_1.3.2
#> [77] ape_5.8-1 nlme_3.1-169
#> [79] beeswarm_0.4.0 BiocSingular_1.27.1
#> [81] vipor_0.4.7 cli_3.6.5
#> [83] rsvd_1.0.5 rappdirs_0.3.4
#> [85] S4Arrays_1.11.1 viridisLite_0.4.3
#> [87] dplyr_1.2.1 gtable_0.3.6
#> [89] yulab.utils_0.2.4 sass_0.4.10
#> [91] digest_0.6.39 SparseArray_1.11.13
#> [93] ggrepel_0.9.8 decontam_1.31.0
#> [95] farver_2.1.2 htmltools_0.5.9
#> [97] cmprsk_2.2-12 lifecycle_1.0.5
#> [99] statmod_1.5.1 MASS_7.3-65
Calle, M. Luz, and others. 2023. “Coda4microbiome: Compositional Data Analysis for Microbiome Cross-Sectional and Longitudinal Studies.” BMC Bioinformatics 24: 82. https://doi.org/10.1186/s12859-023-05205-3.
Gevers, Dirk, and others. 2014. “The Treatment-Naive Microbiome in New-Onset Crohn’s Disease.” Cell Host & Microbe 15 (4): 382–92. https://doi.org/10.1016/j.chom.2014.02.005.
Gray, Bob. 2024. Cmprsk: Subdistribution Analysis of Competing Risks. https://CRAN.R-project.org/package=cmprsk.
Lahti, Leo, Tuomas Borman, and Yagmur Simsek. 2024. MiaTime: Microbiome Time Series Analysis. https://microbiome.github.io/miaTime/.
Storey, John D., Andrew J. Bass, Alan Dabney, and David Robinson. 2024. Qvalue: Q-Value Estimation for False Discovery Rate Control. https://doi.org/10.18129/B9.bioc.qvalue.
Therneau, Terry M. 2024. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival.