1 Introduction

This vignette provides a visual overview of the roles of the main dnaEPICO functions. It shows how each major function organises its inputs, internal processing stages, and outputs.

The diagrams are intended as orientation maps. They are useful when deciding which function to run and when checking how outputs move between stages. For detailed arguments and executable examples, use the local-use and pipeline-use vignettes.

Read each overview as follows:

The arrows show the expected flow from inputs to derived objects and files.
The grouped regions show the main analysis tasks performed inside a function.
The final nodes show the outputs that are returned to R or written to disk.
Each function can be run interactively or as part of a file-based pipeline, depending on the value of saveOutputs.
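The dual interactive and file-based behaviour can be pictured with a small sketch. `exampleStage()` and its `outputDir` argument are hypothetical names invented for illustration, not dnaEPICO functions, and the sketch assumes that a stage always returns its result to R and additionally writes it to disk when saveOutputs = TRUE.

```r
# Hypothetical stage function; not part of the dnaEPICO API.
exampleStage <- function(x, saveOutputs = FALSE, outputDir = tempdir()) {
  result <- list(mean = mean(x), n = length(x))
  if (saveOutputs) {
    # File-based mode (assumed): persist the result so a later stage can read it.
    path <- file.path(outputDir, "exampleStage.rds")
    saveRDS(result, path)
    attr(result, "path") <- path
  }
  # Interactive mode (assumed): the result is always returned to the R session.
  result
}

interactiveRes <- exampleStage(c(0.2, 0.5, 0.8))
pipelineRes    <- exampleStage(c(0.2, 0.5, 0.8), saveOutputs = TRUE)
fromDisk       <- readRDS(attr(pipelineRes, "path"))
```

In the file-based case, a downstream stage only needs the saved path, which is what lets the same function slot into a pipeline.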

2 Required knowledge

dnaEPICO is built on core Bioconductor infrastructure for high-dimensional genomic data, with a focus on Illumina DNA methylation arrays. This vignette assumes that you are already familiar with the general DNA methylation analysis pipeline. If not, we recommend first reading this tutorial, which provides a practical introduction to the main concepts and analysis steps.

Preprocessing and quality control are performed using established Bioconductor tools, including minfi, ENmix, and wateRmelon. Downstream statistical modelling relies on base R and CRAN frameworks, including generalised linear models and linear mixed-effects models. Users are expected to have basic familiarity with R, Bioconductor pipelines, command-line execution, and Illumina IDAT file structures.

If you are asking yourself the question “Where do I start using Bioconductor?”, you might be interested in this blog post.

2.1 Probe-exclusion reference files

The lists of excluded probes depend on the Illumina methylation-array platform.

For Illumina HumanMethylationEPIC v2.0, use the cross-reactive probe-exclusion file from Peters et al. (2024).
For Illumina MethylationEPIC, also known as the 850k array, use the probe-exclusion resources from Pidsley et al. (2016). The supplementary files commonly used together are 13059_2016_1066_MOESM1_ESM.csv, 13059_2016_1066_MOESM4_ESM.csv, 13059_2016_1066_MOESM5_ESM.csv, and 13059_2016_1066_MOESM6_ESM.csv.
For Illumina HumanMethylation450k, use the cross-reactive and polymorphic probe resources from Chen et al. (2013).
Multiple probe-exclusion files can be supplied as a semicolon-separated value in probeExclusionPath. dnaEPICO reads probe IDs from each file, uses probeExclusionIdColumn when supplied, or otherwise auto-detects common probe-ID columns such as ProbeID, TargetID, IlmnID, and Name. The unique union of all probe IDs is then used to filter the normalised object.
For EPICv2, setting useEpicV2Manifest = TRUE also retrieves the expanded Peters et al. manifest from AnnotationHub resource AH116484. Probes flagged in selected manifest columns are added to the same exclusion set. By default, probes flagged by CH_WGBS_evidence, CH_BLAT, or MissingPos are removed, while MismatchPos is retained unless explicitly enabled.
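The file-based and manifest-based exclusion rules above can be sketched in base R. `buildExclusionSet()` and `flagManifestProbes()` are hypothetical helpers, not dnaEPICO's internal code. The sketch assumes CSV exclusion files, and the toy manifest assumes the flag columns are logical, which may not match how the AH116484 resource actually encodes them.

```r
# Hypothetical helper: union of probe IDs from semicolon-separated CSV paths.
buildExclusionSet <- function(probeExclusionPath, probeExclusionIdColumn = NULL,
                              candidates = c("ProbeID", "TargetID", "IlmnID", "Name")) {
  paths <- trimws(strsplit(probeExclusionPath, ";", fixed = TRUE)[[1]])
  paths <- paths[nzchar(paths)]
  ids <- lapply(paths, function(p) {
    tab <- read.csv(p, stringsAsFactors = FALSE, check.names = FALSE)
    col <- probeExclusionIdColumn
    # Fall back to the first recognised probe-ID column when none is supplied.
    if (is.null(col)) col <- intersect(candidates, names(tab))[1]
    if (is.na(col) || !col %in% names(tab)) stop("No probe-ID column found in ", p)
    as.character(tab[[col]])
  })
  unique(unlist(ids))
}

# Hypothetical helper: probes flagged in any selected manifest column.
flagManifestProbes <- function(manifest,
                               flagColumns = c("CH_WGBS_evidence", "CH_BLAT", "MissingPos")) {
  flagged <- Reduce(`|`, lapply(flagColumns, function(cl) as.logical(manifest[[cl]]) %in% TRUE))
  manifest$IlmnID[flagged]
}

# Toy inputs: two exclusion files that use different ID column names.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(TargetID = c("cg01", "cg02")), f1, row.names = FALSE)
write.csv(data.frame(IlmnID = c("cg02", "cg03")), f2, row.names = FALSE)
fromFiles <- buildExclusionSet(paste(f1, f2, sep = ";"))

# Toy manifest: cg05 is flagged only by MismatchPos, which is not selected by default.
manifest <- data.frame(IlmnID = sprintf("cg%02d", 4:6),
                       CH_WGBS_evidence = c(TRUE, FALSE, FALSE),
                       CH_BLAT = FALSE, MissingPos = FALSE,
                       MismatchPos = c(FALSE, TRUE, FALSE))
exclusion <- union(fromFiles, flagManifestProbes(manifest))

# Filter a toy beta matrix (CpGs in rows) against the combined exclusion set.
set.seed(5)
betas <- matrix(runif(12), nrow = 6,
                dimnames = list(sprintf("cg%02d", 1:6), c("S1", "S2")))
filtered <- betas[!rownames(betas) %in% exclusion, , drop = FALSE]
rownames(filtered)
```

Here cg01 to cg03 come from the two files, cg04 from the manifest flags, and cg05 survives because MismatchPos is retained unless explicitly enabled.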

3 Functions

The sections below describe the role of each main function in the package dnaEPICO. They are ordered according to the analysis path: preprocessing, surrogate-variable estimation, phenotype preparation, cross-sectional modelling, longitudinal modelling, and report generation.

3.1 preprocessingMinfiEwasWater()

preprocessingMinfiEwasWater() reads the phenotype table and IDAT files, builds the methylation objects, performs quality control and normalisation, filters probes, and estimates cell composition.

Its role in the package is to create analysis-ready methylation data:

Inputs: phenotype metadata, IDAT files, array annotation, and filtering settings.
Processing tasks: import, quality control, normalisation, probe filtering, metric extraction, and cell-composition estimation.
Outputs: filtered phenotype data, RGSet, beta values, M-values, copy-number values, quality-control figures, and phenoLC.
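The beta, M-value, and copy-number outputs follow the standard definitions computed from methylated and unmethylated intensities, as in minfi's getBeta(), getM(), and getCN(). The base-R sketch below uses made-up intensities instead of an RGSet and is not dnaEPICO's code; the offset of 100 follows the Illumina convention for beta values.

```r
# Toy intensities for 3 CpGs (rows) x 2 samples (columns).
meth <- matrix(c(3000, 500, 1500, 2800, 450, 1600), nrow = 3,
               dimnames = list(c("cg01", "cg02", "cg03"), c("S1", "S2")))
unmeth <- matrix(c(1000, 4500, 1500, 1200, 4000, 1400), nrow = 3,
                 dimnames = dimnames(meth))

offset <- 100                               # Illumina-style offset for beta values
beta   <- meth / (meth + unmeth + offset)   # proportion methylated, bounded in [0, 1)
mval   <- log2(meth / unmeth)               # log2 ratio, unbounded
cn     <- log2(meth + unmeth)               # total signal used as a copy-number proxy

# With no offset, the M-value is the base-2 logit of the beta value.
beta0 <- meth / (meth + unmeth)
all.equal(mval, log2(beta0 / (1 - beta0)))  # TRUE
```

Beta values are easier to interpret biologically, while M-values are better behaved for linear modelling, which is why both are kept.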


preprocessingMinfiEwasWater() overview

3.2 svaEnmix()

svaEnmix() estimates surrogate variables from control-probe information and adds them to the phenotype table. This step helps represent technical variation that may otherwise influence downstream association models.

Its role is to prepare covariates for batch and technical adjustment:

Inputs: the phenotype table from preprocessing and the saved RGSet.
Processing tasks: control-probe extraction, surrogate-variable estimation, association checks with array-position metadata, and phenotype merging.
Outputs: surrogate-variable matrices, an updated phenotype table, and diagnostic summaries.
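Control-probe surrogate variables are commonly derived from a principal component analysis of control-probe intensities, the idea behind ENmix's ctrlsva(). The sketch below uses simulated data and base R prcomp() to walk through the three processing steps listed above. The fixed number of components, the chip labels, and the helper objects are illustrative assumptions, and this is not svaEnmix()'s implementation.

```r
set.seed(1)
nSamples <- 24
nCtrl <- 40
# Hypothetical array layout: three chips of eight samples each.
batch <- rep(c("chipA", "chipB", "chipC"), each = 8)
# Simulated control-probe intensities (probes in rows) with a chip-level technical shift.
ctrl <- matrix(rnorm(nCtrl * nSamples), nrow = nCtrl)
ctrl <- ctrl + outer(runif(nCtrl, 0.5, 1.5), as.numeric(factor(batch)))

# 1. Surrogate variables: top principal components of the scaled control-probe matrix.
pca <- prcomp(t(ctrl), center = TRUE, scale. = TRUE)
k <- 2  # fixed for illustration; real tools typically choose k from variance explained
sv <- pca$x[, seq_len(k), drop = FALSE]
colnames(sv) <- paste0("ctrlSV", seq_len(k))

# 2. Association check: does each surrogate variable track the chip layout?
pvals <- sapply(colnames(sv), function(s) anova(lm(sv[, s] ~ batch))[["Pr(>F)"]][1])

# 3. Merge the surrogate variables into the phenotype table by sample ID.
pheno <- data.frame(SampleID = sprintf("S%02d", seq_len(nSamples)), batch = batch)
phenoSV <- merge(pheno, data.frame(SampleID = pheno$SampleID, sv), by = "SampleID")
signif(pvals, 2)
```

A small p-value for ctrlSV1 indicates that it captures the simulated chip effect, which is the kind of technical variation these covariates are meant to absorb in downstream models.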


svaEnmix() overview

3.3 preprocessingPheno()

preprocessingPheno() aligns phenotype information with methylation metrics. It prepares timepoint-specific data, combines longitudinal records, and creates export-ready files for external methylation-age tools.

Its role is to organise samples and methylation matrices for modelling:

Inputs: phenotype data, beta values, M-values, copy-number values, and timepoint settings.
Processing tasks: sample alignment, timepoint splitting, longitudinal merging, and Clock Foundation export preparation.
Outputs: timepoint-specific phenotype-methylation tables, combined longitudinal tables, and Clock Foundation input files.
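Sample alignment, timepoint splitting, and longitudinal merging can be sketched with base R data frames. The column names SampleID, ParticipantID, and Timepoint, and the toy values, are assumptions for illustration and need not match the columns preprocessingPheno() expects; the Clock Foundation export is not shown.

```r
# Hypothetical phenotype table: 3 participants at 2 timepoints, rows out of order.
pheno <- data.frame(
  SampleID      = c("S4", "S1", "S6", "S2", "S3", "S5"),
  ParticipantID = c("P1", "P1", "P3", "P2", "P3", "P2"),
  Timepoint     = c(2, 1, 2, 1, 1, 2),
  Age           = c(41, 40, 52, 35, 51, 36)
)
# Hypothetical beta matrix: CpGs in rows, samples in columns.
set.seed(2)
betas <- matrix(runif(2 * 6), nrow = 2,
                dimnames = list(c("cg01", "cg02"), paste0("S", 1:6)))

# 1. Sample alignment: keep shared samples, ordered like the matrix columns.
shared <- intersect(colnames(betas), pheno$SampleID)
pheno  <- pheno[match(shared, pheno$SampleID), ]
betas  <- betas[, shared, drop = FALSE]
stopifnot(identical(pheno$SampleID, colnames(betas)))

# 2. Join: one row per sample, phenotype columns followed by CpG columns.
merged <- cbind(pheno, t(betas))

# 3. Timepoint splitting: one phenotype-methylation table per timepoint.
byTime <- split(merged, merged$Timepoint)

# 4. Longitudinal merging: stack the timepoints, ordered by participant and time.
long <- do.call(rbind, byTime)
long <- long[order(long$ParticipantID, long$Timepoint), ]
rownames(long) <- NULL
```

Aligning by identifier with match(), rather than trusting row order, is what prevents phenotypes from being silently paired with the wrong methylation profile.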


preprocessingPheno() overview

3.4 methylationGLM()

methylationGLM() fits cross-sectional methylation association models. It is designed for analyses where one phenotype is tested against CpG-level methylation while adjusting for selected covariates.

Its role is to run single-timepoint association testing:

Inputs: a phenotype-methylation table, phenotype variables, covariates, optional PRS mappings, and annotation settings.
Processing tasks: model preparation, per-CpG GLM fitting, diagnostic plotting, result summarisation, multiple-testing adjustment, and annotation.
Outputs: model objects, summary tables, diagnostic figures, significant-CpG exports, and annotated workbooks.
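Per-CpG association testing with multiple-testing adjustment can be sketched in base R. The example simulates M-values, fits a Gaussian glm() per CpG with methylation as the outcome and hypothetical covariates age and sex, then applies Benjamini-Hochberg adjustment with p.adjust(). The model orientation, covariate names, adjustment method, and simulated data are assumptions for illustration; methylationGLM() may parameterise its models differently.

```r
set.seed(3)
n <- 60
nCpG <- 50
# Hypothetical phenotype table with one exposure and two covariates.
pheno <- data.frame(phenotype = rnorm(n),
                    age = runif(n, 20, 70),
                    sex = factor(sample(c("F", "M"), n, replace = TRUE)))
# Simulated M-values (CpGs in rows); the first 5 CpGs are associated with the phenotype.
mvals <- matrix(rnorm(nCpG * n), nrow = nCpG,
                dimnames = list(sprintf("cg%03d", seq_len(nCpG)), NULL))
mvals[1:5, ] <- mvals[1:5, ] + outer(rep(0.8, 5), pheno$phenotype)

# Fit one Gaussian GLM per CpG and keep the phenotype coefficient.
fitOne <- function(y) {
  fit <- glm(y ~ phenotype + age + sex, data = pheno, family = gaussian())
  coef(summary(fit))["phenotype", c("Estimate", "Std. Error", "Pr(>|t|)")]
}
res <- as.data.frame(t(apply(mvals, 1, fitOne)))
names(res) <- c("estimate", "se", "p")
res$fdr <- p.adjust(res$p, method = "BH")   # Benjamini-Hochberg adjustment
head(res[order(res$p), ])
```

Fitting one small model per CpG keeps each test independent and easy to inspect, at the cost of a loop over hundreds of thousands of probes on real arrays.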


methylationGLM() overview

3.5 methylationLME()

methylationLME() fits longitudinal mixed-effects models. It supports repeated-measures designs by including participant-level random effects and timepoint-related terms.

Its role is to model methylation change across repeated observations:

Inputs: combined longitudinal phenotype-methylation data, participant identifiers, timepoint variables, phenotypes, covariates, and annotation settings.
Processing tasks: longitudinal model preparation, per-CpG LME fitting, interaction testing, diagnostic plotting, summarisation, and annotation.
Outputs: mixed-model objects, interaction summaries, diagnostic figures, significant-interaction exports, and annotated longitudinal workbooks.
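A per-CpG mixed model with a participant-level random intercept and a phenotype-by-time interaction can be sketched with nlme::lme(), from nlme, a recommended package distributed with R. The simulated design, the column names participant, time, and phenotype, and the choice of nlme are assumptions for illustration; methylationLME() may use a different modelling backend or formula.

```r
library(nlme)  # recommended package distributed with R

set.seed(4)
nPart <- 20
long <- expand.grid(time = 0:2, participant = sprintf("P%02d", seq_len(nPart)))
partPheno <- setNames(rnorm(nPart), levels(long$participant))            # hypothetical phenotype
partInt   <- setNames(rnorm(nPart, sd = 0.5), levels(long$participant))  # random intercepts
long$phenotype <- unname(partPheno[as.character(long$participant)])

# Simulated M-values: CpGs 1-2 carry a phenotype x time interaction, the rest do not.
nCpG <- 10
mvals <- sapply(seq_len(nCpG), function(j) {
  slope <- if (j <= 2) 0.5 else 0
  unname(partInt[as.character(long$participant)]) +
    slope * long$phenotype * long$time + rnorm(nrow(long), sd = 0.3)
})
colnames(mvals) <- sprintf("cg%02d", seq_len(nCpG))

# One mixed model per CpG: random intercept per participant, fixed phenotype x time.
fitOne <- function(y) {
  d <- cbind(long, y = y)
  fit <- lme(y ~ phenotype * time, random = ~ 1 | participant, data = d)
  summary(fit)$tTable["phenotype:time", c("Value", "p-value")]
}
res <- as.data.frame(t(apply(mvals, 2, fitOne)))
names(res) <- c("interaction", "p")
res$fdr <- p.adjust(res$p, method = "BH")
res
```

The random intercept absorbs stable between-participant differences, so the interaction term tests whether methylation changes over time differently depending on the phenotype.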


methylationLME() overview

3.6 dnamReport()

dnamReport() assembles the main tables, figures, logs, and model summaries into a report website. It can be run after preprocessing and modelling outputs have been written to disk.

Its role is to make the package outputs easier to inspect and share:

Inputs: phenotype tables, QC figures, ENmix figures, SVA figures, metric figures, model annotation tables, detection P-value data, and logs.
Processing tasks: input validation, report file preparation, Quarto rendering, and post-processing.
Outputs: a rendered report site with organised tabs for data, preprocessing, modelling, logs, and supporting diagnostics.


dnamReport() overview

4 Summary

The preprocessing functions create quality-controlled methylation data and analysis-ready phenotype tables. The modelling functions fit cross-sectional and longitudinal association models. The report function gathers the resulting tables, figures, and logs into a browsable output.

In practice, a typical workflow is to:

run preprocessingMinfiEwasWater() to prepare methylation objects and QC outputs,
run svaEnmix() when control-probe surrogate variables are needed,
run preprocessingPheno() to prepare modelling tables,
run methylationGLM() or methylationLME() for association testing, and
run dnamReport() to review the completed outputs.

5 Basics

Date the vignette was generated.

#> [1] "2026-07-19 17:33:13 EDT"

Wallclock time spent generating the vignette.

#> Time difference of 0.247 secs

R session information.

#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
#>  [2] IlluminaHumanMethylation450kmanifest_0.4.0        
#>  [3] minfi_1.59.1                                      
#>  [4] bumphunter_1.55.1                                 
#>  [5] locfit_1.5-9.12                                   
#>  [6] iterators_1.0.14                                  
#>  [7] foreach_1.5.2                                     
#>  [8] Biostrings_2.81.5                                 
#>  [9] XVector_0.53.0                                    
#> [10] SummarizedExperiment_1.43.0                       
#> [11] Biobase_2.73.1                                    
#> [12] MatrixGenerics_1.25.0                             
#> [13] matrixStats_1.5.0                                 
#> [14] GenomicRanges_1.65.1                              
#> [15] Seqinfo_1.3.0                                     
#> [16] IRanges_2.47.2                                    
#> [17] S4Vectors_0.51.5                                  
#> [18] BiocGenerics_0.59.10                              
#> [19] generics_0.1.4                                    
#> [20] dnaEPICO_0.99.34                                  
#> [21] BiocStyle_2.41.0                                  
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3        jsonlite_2.0.0           
#>   [3] magrittr_2.0.5            GenomicFeatures_1.65.0   
#>   [5] rmarkdown_2.31            BiocIO_1.23.3            
#>   [7] vctrs_0.7.3               multtest_2.69.0          
#>   [9] memoise_2.0.1             Rsamtools_2.29.0         
#>  [11] DelayedMatrixStats_1.35.0 RCurl_1.98-1.19          
#>  [13] ENmix_1.49.3              askpass_1.2.1            
#>  [15] htmltools_0.5.9           S4Arrays_1.13.0          
#>  [17] BiocBaseUtils_1.15.1      AnnotationHub_4.3.2      
#>  [19] dynamicTreeCut_1.63-1     curl_7.1.0               
#>  [21] Rhdf5lib_2.1.0            RPMM_1.25                
#>  [23] SparseArray_1.13.2        rhdf5_2.57.1             
#>  [25] sass_0.4.10               KernSmooth_2.23-26       
#>  [27] nor1mix_1.3-3             bslib_0.11.0             
#>  [29] httr2_1.3.0               plyr_1.8.9               
#>  [31] impute_1.87.0             cachem_1.1.0             
#>  [33] GenomicAlignments_1.49.1  lifecycle_1.0.5          
#>  [35] pkgconfig_2.0.3           Matrix_1.7-5             
#>  [37] R6_2.6.1                  fastmap_1.2.0            
#>  [39] digest_0.6.39             siggenes_1.87.0          
#>  [41] reshape_0.8.10            minfiData_0.59.0         
#>  [43] AnnotationDbi_1.75.0      irlba_2.3.7              
#>  [45] ExperimentHub_3.3.1       geneplotter_1.91.0       
#>  [47] RSQLite_3.53.3            base64_2.0.2             
#>  [49] filelock_1.0.3            httr_1.4.8               
#>  [51] abind_1.4-8               compiler_4.6.1           
#>  [53] beanplot_1.3.1            rngtools_1.5.2           
#>  [55] bit64_4.8.2               doParallel_1.0.17        
#>  [57] BiocParallel_1.47.0       DBI_1.3.0                
#>  [59] gplots_3.3.0              HDF5Array_1.41.0         
#>  [61] MASS_7.3-66               openssl_2.4.2            
#>  [63] rappdirs_0.3.4            DelayedArray_0.39.3      
#>  [65] rjson_0.2.23              caTools_1.18.3           
#>  [67] gtools_3.9.5              tools_4.6.1              
#>  [69] otel_0.2.0                rentrez_1.2.4            
#>  [71] glue_1.8.1                quadprog_1.5-8           
#>  [73] h5mread_1.5.0             restfulr_0.0.17          
#>  [75] nlme_3.1-170              rhdf5filters_1.25.0      
#>  [77] grid_4.6.1                cluster_2.1.8.2          
#>  [79] tzdb_0.5.0                preprocessCore_1.75.0    
#>  [81] tidyr_1.3.2               data.table_1.18.4        
#>  [83] hms_1.1.4                 xml2_1.6.0               
#>  [85] BiocVersion_3.24.0        pillar_1.11.1            
#>  [87] limma_3.69.2              genefilter_1.95.0        
#>  [89] splines_4.6.1             dplyr_1.2.1              
#>  [91] BiocFileCache_3.3.0       lattice_0.22-9           
#>  [93] survival_3.8-9            rtracklayer_1.73.0       
#>  [95] bit_4.6.0                 GEOquery_2.81.28         
#>  [97] annotate_1.91.0           tidyselect_1.2.1         
#>  [99] knitr_1.51                bookdown_0.47            
#> [101] xfun_0.60                 scrime_1.3.7             
#> [103] statmod_1.5.2             yaml_2.3.12              
#> [105] evaluate_1.0.5            codetools_0.2-20         
#> [107] cigarillo_1.3.1           tibble_3.3.1             
#> [109] BiocManager_1.30.27       cli_3.6.6                
#> [111] xtable_1.8-8              jquerylib_0.1.4          
#> [113] Rcpp_1.1.2                dbplyr_2.6.0             
#> [115] png_0.1-9                 XML_3.99-0.23            
#> [117] readr_2.2.0               blob_1.3.0               
#> [119] mclust_6.1.3              doRNG_1.8.6.3            
#> [121] sparseMatrixStats_1.25.0  bitops_1.0-9             
#> [123] illuminaio_0.55.0         purrr_1.2.2              
#> [125] crayon_1.5.3              rlang_1.3.0              
#> [127] KEGGREST_1.53.5

5.1 Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R and Bioconductor have a steep learning curve, so it is critical to learn where to ask for help. We would like to highlight the Bioconductor support site as the main resource for getting help. Please remember to use the dnaEPICO tag and check the older posts. If you want to receive help, please provide a small reproducible example and your session information so the source of the problem can be tracked efficiently.

dnaEPICO Overview

2026-07-19
