Environmental and lifestyle exposures play a central role in shaping host-associated biological systems, including the microbiome and metabolome. Integrative analysis of multi-omics data together with exposure information is therefore essential for understanding disease risk and environmental health mechanisms. However, many existing multi-omics integration tools focus on correlation or latent factor discovery without explicitly accounting for exposure variables.
ExpoRiskR is a Bioconductor package designed for exposure-aware multi-omics integration, providing standardized workflows for aligning, preprocessing, and analyzing microbiome, metabolomics, and exposure data. The package emphasizes interoperability with Bioconductor data structures, particularly the SummarizedExperiment class, enabling seamless integration with existing Bioconductor workflows. ExpoRiskR supports reproducible and interpretable analysis of exposure-adjusted cross-omics associations, making it especially suitable for exposome, environmental epidemiology, and disease onset studies.
High-throughput profiling technologies have enabled the simultaneous measurement of multiple molecular layers, such as the microbiome and metabolome, across large cohorts. In parallel, advances in exposure science have made it possible to quantify environmental and lifestyle factors that influence biological systems. Integrating these heterogeneous data types remains challenging due to differences in scale, structure, and confounding by exposures.
Most multi-omics integration approaches aim to identify shared variation or discriminative features across omics layers. While powerful, these methods often do not explicitly incorporate exposure variables into the integration framework, making it difficult to disentangle intrinsic biological relationships from exposure-driven effects.
ExpoRiskR was developed to address this gap by providing a coherent workflow to align samples across multiple omics layers and exposure metadata, preprocess heterogeneous data in a consistent manner, and enable downstream analysis of cross-omics associations in the context of measured exposures.
ExpoRiskR is implemented within the Bioconductor ecosystem to build on its established infrastructure for high-throughput biological data analysis. Bioconductor provides standardized data containers, a formal package review process, and conventions that promote interoperability and reproducibility.
A key design principle of ExpoRiskR is native support for the SummarizedExperiment class. By operating directly on SummarizedExperiment objects, ExpoRiskR integrates naturally with existing Bioconductor workflows and can be combined with other Bioconductor packages without ad-hoc data transformations.
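ExpoRiskR works with SummarizedExperiment objects through dedicated helper functions. In the example below, generate_dummy_exporisk() simulates a small dataset, align_omics_se() matches samples across the microbiome, metabolome, exposure, and metadata tables by sample_id, and the result holds one SummarizedExperiment per omics layer, with the outcome and exposures stored in colData.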
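library(ExpoRiskR)

# Simulate a small dataset: 12 samples, 5 microbes, 6 metabolites, 3 exposures.
set.seed(7)
d <- generate_dummy_exporisk(
  n = 12, p_micro = 5, p_metab = 6, p_expo = 3
)

# Align all layers on the shared sample identifier.
aligned <- align_omics_se(
  microbiome = d$microbiome,
  metabolome = d$metabolome,
  exposures = d$exposures,
  meta = d$meta,
  id_col = "sample_id",
  strict = TRUE
)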
aligned$se_microbiome
## class: SummarizedExperiment
## dim: 5 12
## metadata(0):
## assays(1): abundance
## rownames(5): micro_1 micro_2 micro_3 micro_4 micro_5
## rowData names(0):
## colnames(12): S1 S10 ... S8 S9
## colData names(5): sample_id outcome expo_1 expo_2 expo_3
aligned$se_metabolome
## class: SummarizedExperiment
## dim: 6 12
## metadata(0):
## assays(1): intensity
## rownames(6): metab_1 metab_2 ... metab_5 metab_6
## rowData names(0):
## colnames(12): S1 S10 ... S8 S9
## colData names(5): sample_id outcome expo_1 expo_2 expo_3
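The idea behind sample alignment can be pictured with base R alone. The sketch below is not the implementation of align_omics_se(); it only assumes that alignment means keeping the sample IDs present in every table and putting them in one common order. All object names in it are hypothetical toy data.

```r
# Hypothetical toy inputs: samples in rows, in different orders and
# with partially overlapping sample IDs.
micro_toy <- matrix(1:6, nrow = 3,
                    dimnames = list(c("S2", "S1", "S3"), c("m1", "m2")))
metab_toy <- matrix(1:4, nrow = 2,
                    dimnames = list(c("S3", "S1"), c("k1", "k2")))
expo_toy <- data.frame(sample_id = c("S1", "S3", "S4"),
                       expo_1 = c(0.1, 0.2, 0.3))

# Keep only the samples found in all three tables, in sorted order.
shared_ids <- sort(Reduce(intersect, list(
  rownames(micro_toy), rownames(metab_toy), expo_toy$sample_id
)))

# Subset and reorder every table with the same ID vector.
micro_aligned <- micro_toy[shared_ids, , drop = FALSE]
metab_aligned <- metab_toy[shared_ids, , drop = FALSE]
expo_aligned <- expo_toy[match(shared_ids, expo_toy$sample_id), , drop = FALSE]
```

After this step the three toy tables contain the same samples in the same order, which is the property the later matrix-based steps rely on.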
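# Preprocess the aligned layers into sample-by-feature matrices:
# X (microbiome), Y (metabolome), and E (exposures).
prepped <- prep_omics_se(aligned)
prepped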
## $X
## micro_1 micro_2 micro_3 micro_4 micro_5
## S1 0.96280547 0.49814473 0.13225423 -0.19812021 1.985941121
## S10 0.51873745 0.88795075 0.28888014 0.46846818 -0.987800961
## S11 0.35726331 0.41016554 0.37517057 1.44914418 1.254101882
## S12 -1.15611767 -1.96343160 -0.06759319 -0.42001847 1.221623144
## S2 0.26460300 -1.56626266 -0.77608060 -0.80542213 -1.174711261
## S3 -0.80698984 -0.14138258 -0.87774006 -0.49430148 -0.565836250
## S4 -0.54817202 0.09251204 -0.79635845 -0.50057990 -0.399492091
## S5 -0.17120643 -0.86465355 -0.66862520 -1.31349826 0.085060252
## S6 1.84115243 0.17546235 -0.63470412 0.01619189 -0.092259816
## S7 0.67636537 0.31542631 -0.06096963 2.13570484 -0.314834932
## S8 -0.08532825 0.72016071 2.79432566 -0.86788420 -1.020051451
## S9 -1.85311281 1.43590796 0.29144063 0.53031557 0.008260362
##
## $Y
## metab_1 metab_2 metab_3 metab_4 metab_5 metab_6
## S1 1.5712429 -1.16289061 2.15807796 1.87140658 -1.5015621 0.15117522
## S10 1.1861832 1.47381448 -0.92195103 0.04002660 -0.6141845 1.58543227
## S11 1.1114371 -0.71964871 0.27921407 1.42698701 0.5957128 -0.04847733
## S12 0.1317138 1.12156629 0.24852391 -0.56284777 1.2029108 -0.73344698
## S2 -0.8690632 0.92947408 -0.58859908 -0.68363917 0.4450536 -1.35460560
## S3 -0.8916094 0.35332375 1.10502953 -0.90031879 1.5946824 -1.44861422
## S4 -0.1592589 -0.78102443 -1.13079326 0.40320431 -0.7554514 0.35249979
## S5 -0.0927472 0.15149007 -0.13783062 0.05774713 -0.4172125 1.35151675
## S6 -0.4272096 0.93604755 0.08503764 -1.24286251 -1.1505514 -1.13738301
## S7 -1.1768566 0.07661987 -0.14576299 -0.49809242 -0.6689004 0.52663318
## S8 0.9583355 -0.76118737 0.52630631 1.01953836 0.1330800 0.07534041
## S9 -1.3421677 -1.61758496 -1.47725244 -0.93114934 1.1364226 0.67992951
##
## $E
## expo_1 expo_2 expo_3
## S1 1.41930250 1.64714633 1.7672908
## S10 1.34830361 0.17509081 -0.1645179
## S11 0.01036126 0.73606065 -0.6534422
## S12 1.73280766 -1.78004535 -0.8559107
## S2 -1.12376188 -0.18105570 0.2109550
## S3 -0.75699093 1.28720452 1.0230380
## S4 -0.55115306 -0.04688015 0.7935528
## S5 -0.95872757 -1.31847657 -1.4576272
## S6 -0.94165220 -0.77072298 -0.4469839
## S7 0.29587279 -0.48818850 -1.2972346
## S8 -0.33557936 0.43924123 0.9750505
## S9 -0.13878282 0.30062570 0.1058294
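The printed columns have mean 0 and sample standard deviation 1 (this can be checked by hand for micro_1), which is what column-wise standardization produces. The sketch below shows that operation with base R's scale() on a hypothetical toy matrix. Whether prep_omics_se() applies further transformations before scaling is not stated here, so this should be read as an illustration of the final scaling step only.

```r
# Hypothetical toy matrix: 4 samples (rows) by 2 features (columns).
raw_toy <- matrix(
  c(10, 12, 14, 16,
     1,  3,  5,  7),
  nrow = 4,
  dimnames = list(paste0("S", 1:4), c("feat_1", "feat_2"))
)

# Column-wise z-scoring: subtract each column mean, divide by its SD.
z_toy <- scale(raw_toy, center = TRUE, scale = TRUE)

colMeans(z_toy)       # column means are 0 up to floating-point error
apply(z_toy, 2, sd)   # column standard deviations are 1
```

Putting all features on a common scale keeps features with large raw magnitudes from dominating correlation-based or regression-based steps.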
Below we construct an exposure-adjusted microbe–metabolite network from the preprocessed matrices and visualize the resulting graph.
X <- prepped$X
Y <- prepped$Y
E <- prepped$E

net <- build_exposure_network(
  X = X, Y = Y, E = E,
  fdr = 0.8, # relaxed so this small example yields a non-empty network; too permissive for real analyses
  max_pairs = 1500,
  seed = 1
)
plot_exposure_network(net)
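One common way to obtain exposure-adjusted associations is to regress the exposures out of every feature and then correlate the residuals, controlling the false discovery rate across pairs. The sketch below illustrates that general technique with base R on simulated data. It is an assumption about what "exposure-adjusted" can mean, not a description of how build_exposure_network() works internally, and every object name ending in _sim is hypothetical.

```r
set.seed(1)
n <- 40

# Simulated stand-ins: one exposure drives both micro_1 and metab_1.
E_sim <- cbind(expo_1 = rnorm(n))
X_sim <- cbind(micro_1 = 2 * E_sim[, 1] + rnorm(n), micro_2 = rnorm(n))
Y_sim <- cbind(metab_1 = 2 * E_sim[, 1] + rnorm(n), metab_2 = rnorm(n))

# Remove the linear exposure effect from every column of a matrix.
residualize <- function(M, E) residuals(lm(M ~ E))
RX <- residualize(X_sim, E_sim)
RY <- residualize(Y_sim, E_sim)

# Test every microbe-metabolite pair on the residuals.
# Note: cor.test() does not account for the degrees of freedom
# spent on the exposure adjustment.
pairs <- expand.grid(microbe = colnames(X_sim), metabolite = colnames(Y_sim),
                     stringsAsFactors = FALSE)
pairs$r <- NA_real_
pairs$p <- NA_real_
for (i in seq_len(nrow(pairs))) {
  ct <- cor.test(RX[, pairs$microbe[i]], RY[, pairs$metabolite[i]])
  pairs$r[i] <- unname(ct$estimate)
  pairs$p[i] <- ct$p.value
}

# Benjamini-Hochberg adjustment, then keep pairs below the threshold.
pairs$q <- p.adjust(pairs$p, method = "BH")
edges_sim <- pairs[pairs$q <= 0.05, ]

# Compare the unadjusted and adjusted correlation for the confounded pair.
raw_r <- cor(X_sim[, "micro_1"], Y_sim[, "metab_1"])
adj_r <- cor(RX[, "micro_1"], RY[, "metab_1"])
```

In this simulation the unadjusted correlation between micro_1 and metab_1 is induced entirely by the shared exposure, so it shrinks once the exposure is regressed out.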
ExpoRiskR summarizes how strongly each exposure perturbs the estimated cross-omics associations and ranks exposures accordingly.
scores <- exposure_perturbation_score(
  X = X, Y = Y, E = E,
  fdr = 0.8,
  max_pairs = 1500,
  seed = 1
)
print(plot_exposure_ranking(scores, top_n = 15))
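To make the notion of a perturbation score concrete, the sketch below uses one plausible definition: for each exposure, the mean absolute change in the microbe-metabolite correlations after adjusting for that exposure alone. This definition is an assumption made for illustration and may differ from what exposure_perturbation_score() computes. The data are simulated and all names ending in _sim are hypothetical.

```r
set.seed(2)
n <- 60

# Simulated data: expo_1 drives one microbe and one metabolite,
# expo_2 is unrelated to every feature.
E_sim <- cbind(expo_1 = rnorm(n), expo_2 = rnorm(n))
X_sim <- cbind(micro_1 = E_sim[, "expo_1"] + rnorm(n), micro_2 = rnorm(n))
Y_sim <- cbind(metab_1 = E_sim[, "expo_1"] + rnorm(n), metab_2 = rnorm(n))

# Unadjusted cross-correlation matrix (microbes x metabolites).
raw_cor <- cor(X_sim, Y_sim)

# For each exposure, adjust both layers for it and measure how far
# the correlations move on average.
perturbation <- vapply(colnames(E_sim), function(k) {
  ek <- E_sim[, k]
  adj_cor <- cor(residuals(lm(X_sim ~ ek)), residuals(lm(Y_sim ~ ek)))
  mean(abs(raw_cor - adj_cor))
}, numeric(1))

# Rank exposures from most to least perturbing.
ranking_sim <- sort(perturbation, decreasing = TRUE)
```

Under this definition an exposure that drives features in both layers ranks high, because removing it changes the associations it induced, while an unrelated exposure leaves the correlations almost unchanged.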
This example uses the simulated outcome included in the dummy metadata to illustrate simple exposure feature importance summaries.
outcome <- d$meta$outcome
names(outcome) <- d$meta$sample_id

print(plot_feature_importance(
  E = E,
  outcome = outcome,
  top_n = 15
))
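A simple way to summarize exposure importance for a binary outcome is to fit a logistic regression on standardized exposures and rank them by absolute coefficient. The sketch below shows this with base R's glm() on simulated data. It assumes a 0/1 outcome and is only one of many possible importance measures; the measure used by plot_feature_importance() is not specified here and may differ. Names ending in _sim are hypothetical.

```r
set.seed(3)
n <- 200

# Simulated exposures; only expo_1 influences the binary outcome.
E_sim <- cbind(expo_1 = rnorm(n), expo_2 = rnorm(n), expo_3 = rnorm(n))
outcome_sim <- rbinom(n, size = 1, prob = plogis(1.5 * E_sim[, "expo_1"]))

# Standardize the exposures so that coefficients are comparable.
dat <- data.frame(outcome = outcome_sim, scale(E_sim))

# Logistic regression of the outcome on all exposures.
fit <- glm(outcome ~ ., data = dat, family = binomial())

# Importance: absolute standardized coefficient, intercept dropped.
importance_sim <- sort(abs(coef(fit)[-1]), decreasing = TRUE)
```

Standardizing first matters because raw coefficients depend on each exposure's measurement units and cannot be compared directly.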
As an illustrative end-to-end example, ExpoRiskR can visualize discrimination performance for a simple risk model derived from the exposure-adjusted network.
print(plot_risk_roc(
  X = X, Y = Y, E = E,
  outcome = outcome,
  edges = net$edges,
  top_edges = 80
))
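The area under a ROC curve has a closed form in terms of ranks: it equals the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it with base R for a 0/1 outcome. The function name auc_rank and the toy vectors are hypothetical and are not part of ExpoRiskR.

```r
# AUC from ranks (equivalent to the Mann-Whitney U statistic divided
# by the number of case-control pairs). Assumes a 0/1 outcome.
auc_rank <- function(score, outcome) {
  stopifnot(length(score) == length(outcome), all(outcome %in% c(0, 1)))
  n_pos <- sum(outcome == 1)
  n_neg <- sum(outcome == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(score)  # average ranks, so tied scores count as half
  (sum(r[outcome == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 case-control pairs are ordered correctly.
score_toy <- c(0.1, 0.4, 0.35, 0.8)
outcome_toy <- c(0, 0, 1, 1)
auc_rank(score_toy, outcome_toy)  # 0.75
```

A value of 0.5 corresponds to scores that do not separate cases from controls, and 1 to perfect separation.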
Because ExpoRiskR operates on SummarizedExperiment objects, users can seamlessly integrate it with other Bioconductor packages for downstream analysis, including differential testing, visualization, or additional statistical modeling.
ExpoRiskR provides a Bioconductor-native framework for exposure-aware multi-omics integration with a strong emphasis on interpretability and reproducibility. By supporting SummarizedExperiment objects and aligning with existing Bioconductor workflows, the package enables robust investigation of exposure-driven biological mechanisms in environmental health and disease studies.
sessionInfo()
## R version 4.6.0 alpha (2026-03-30 r89742)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ExpoRiskR_0.99.4 BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.10 generics_0.1.4
## [3] SparseArray_1.11.13 lattice_0.22-9
## [5] digest_0.6.39 magrittr_2.0.4
## [7] RColorBrewer_1.1-3 evaluate_1.0.5
## [9] grid_4.6.0 bookdown_0.46
## [11] fastmap_1.2.0 jsonlite_2.0.0
## [13] Matrix_1.7-5 tinytex_0.59
## [15] BiocManager_1.30.27 scales_1.4.0
## [17] jquerylib_0.1.4 abind_1.4-8
## [19] cli_3.6.5 rlang_1.1.7
## [21] XVector_0.51.0 Biobase_2.71.0
## [23] withr_3.0.2 cachem_1.1.0
## [25] DelayedArray_0.37.1 yaml_2.3.12
## [27] otel_0.2.0 S4Arrays_1.11.1
## [29] tools_4.6.0 dplyr_1.2.0
## [31] ggplot2_4.0.2 SummarizedExperiment_1.41.1
## [33] BiocGenerics_0.57.0 vctrs_0.7.2
## [35] R6_2.6.1 matrixStats_1.5.0
## [37] stats4_4.6.0 lifecycle_1.0.5
## [39] magick_2.9.1 Seqinfo_1.1.0
## [41] S4Vectors_0.49.0 IRanges_2.45.0
## [43] pkgconfig_2.0.3 pillar_1.11.1
## [45] bslib_0.10.0 gtable_0.3.6
## [47] glue_1.8.0 Rcpp_1.1.1
## [49] tidyselect_1.2.1 tibble_3.3.1
## [51] xfun_0.57 GenomicRanges_1.63.1
## [53] dichromat_2.0-0.1 MatrixGenerics_1.23.0
## [55] knitr_1.51 farver_2.1.2
## [57] htmltools_0.5.9 igraph_2.2.2
## [59] labeling_0.4.3 rmarkdown_2.31
## [61] compiler_4.6.0 S7_0.2.1