In general, I recommend against interpreting the fraction of variance explained by residuals. This fraction is driven by:
If you have additional variables that explain variation in measured gene expression, you should include them in the model to avoid confounding with your variable of interest. But a particular residual fraction is not ‘good’ or ‘bad’, and it is not a good metric for deciding whether more variables should be included.
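To illustrate why the residual fraction alone says little about model adequacy, here is a minimal simulation sketch in base R. It is not the package's own procedure: the covariate `batch`, the helper `residual_fraction`, and the use of ANOVA sums of squares as a stand-in for a variance fraction are all assumptions made for this example. The same correctly specified model is fit twice, and only the measurement noise differs.

```r
# Minimal sketch (base R only). All names here are hypothetical.
set.seed(1)
n <- 200
batch <- factor(rep(c("A", "B"), each = n / 2))  # assumed covariate
signal <- ifelse(batch == "A", 0, 1)             # true effect of batch

# Fraction of the total sum of squares left in the residuals
# after fitting expr ~ batch (a simple stand-in for a variance fraction)
residual_fraction <- function(noise_sd) {
  expr <- signal + rnorm(n, sd = noise_sd)
  tab <- anova(lm(expr ~ batch))
  tab["Residuals", "Sum Sq"] / sum(tab[["Sum Sq"]])
}

frac_precise <- residual_fraction(noise_sd = 0.2)  # precise measurements
frac_noisy   <- residual_fraction(noise_sd = 2)    # noisy measurements
c(precise = frac_precise, noisy = frac_noisy)
```

In both fits the model contains every variable that generated the data, yet the residual fraction is small in one case and large in the other. A large residual fraction therefore does not by itself indicate that a variable is missing.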
See the GitHub page for up-to-date responses to users’ questions.
## R version 4.6.0 Patched (2026-05-01 r89994)
## Platform: aarch64-apple-darwin23
## Running under: macOS Tahoe 26.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.57 cachem_1.1.0 knitr_1.51
## [7] htmltools_0.5.9 rmarkdown_2.31 lifecycle_1.0.5 cli_3.6.6 sass_0.4.10 jquerylib_0.1.4
## [13] compiler_4.6.0 tools_4.6.0 evaluate_1.0.5 bslib_0.11.0 yaml_2.3.12 otel_0.2.0
## [19] rlang_1.2.0 jsonlite_2.0.0