GEOfastq can be installed from Bioconductor as
follows:
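The installation chunk is not shown here. A minimal sketch, assuming the standard Bioconductor workflow through the BiocManager package, would be:

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install GEOfastq from the current Bioconductor release
BiocManager::install("GEOfastq")
```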
The NCBI Gene Expression
Omnibus (GEO) offers a convenient interface for exploring
high-throughput experimental data such as RNA-seq. GEO deposits RNA-seq
data as sra files in the Sequence Read Archive (SRA), and these can be
converted to fastq files using fastq-dump. This conversion
can be quite slow, so it is usually more convenient to download
the fastq files that the European Nucleotide Archive (ENA) generates
for a GEO accession. GEOfastq crawls GEO to retrieve metadata and
ENA fastq URLs, and then downloads the fastq files.
To get fastq data for a GEO series, we first retrieve the metadata for a GEO accession:
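The code for this step is not shown, so the following is a sketch rather than the vignette's own chunk. It assumes that `crawl_gse()` takes a GEO series accession and returns the text of the series page, which is what `extract_gsms()` consumes in the next step. The accession `GSE133758` is an assumption chosen for illustration; substitute the series you are interested in, and note that the sample index used later must exist in that series.

```r
library(GEOfastq)

# Assumption: example series accession, not confirmed by the text above
gse_name <- "GSE133758"

# Crawl the GEO page for the series (requires network access)
gse_text <- crawl_gse(gse_name)
```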
Next, we extract the sample accessions for this study and retrieve the GEO metadata and ENA fastq URL for one example sample:
gsm_names <- extract_gsms(gse_text)
gsm_name <- gsm_names[182]
srp_meta <- crawl_gsms(gsm_name)
#> 1 GSMs to process
Now that we have retrieved the necessary metadata, we are ready to download the fastq files for this sample:
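The download chunk is not shown. In the session above, the step amounts to passing `srp_meta` and a destination directory to `get_fastqs()`. The sketch below wraps the whole sequence in a helper so that it can be reused for other samples. The helper name `fetch_gsm_fastqs`, its arguments, and the use of `tempdir()` as the default destination are hypothetical and not part of GEOfastq.

```r
# Hypothetical convenience wrapper (not part of GEOfastq) that chains the
# steps described above for a single sample of a GEO series
fetch_gsm_fastqs <- function(gse_name, gsm_index, data_dir = tempdir()) {
    # Crawl the series page and pull out its sample (GSM) accessions
    gse_text  <- GEOfastq::crawl_gse(gse_name)
    gsm_names <- GEOfastq::extract_gsms(gse_text)

    # Guard against an index that does not exist in this series
    stopifnot(gsm_index >= 1, gsm_index <= length(gsm_names))

    # Retrieve GEO metadata and the ENA fastq location for the chosen sample
    srp_meta <- GEOfastq::crawl_gsms(gsm_names[gsm_index])

    # Download the fastq files into data_dir
    GEOfastq::get_fastqs(srp_meta, data_dir)
}

# Example call (not run: needs network access and downloads data)
# res <- fetch_gsm_fastqs("GSE133758", gsm_index = 182)
```

For real analyses a persistent directory is a better choice than `tempdir()`, since temporary directories are removed when the R session ends.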
The following packages and versions were used in the production of this vignette.
#> R version 4.6.0 Patched (2026-05-01 r89994)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Tahoe 26.3.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] GEOfastq_1.21.0
#>
#> loaded via a namespace (and not attached):
#> [1] doParallel_1.0.17 cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [5] xfun_0.57 otel_0.2.0 jsonlite_2.0.0 RCurl_1.98-1.18
#> [9] plyr_1.8.9 htmltools_0.5.9 sass_0.4.10 rmarkdown_2.31
#> [13] evaluate_1.0.5 jquerylib_0.1.4 bitops_1.0-9 fastmap_1.2.0
#> [17] yaml_2.3.12 foreach_1.5.2 lifecycle_1.0.5 compiler_4.6.0
#> [21] codetools_0.2-20 Rcpp_1.1.1-1.1 digest_0.6.39 R6_2.6.1
#> [25] parallel_4.6.0 bslib_0.11.0 tools_4.6.0 iterators_1.0.14
#> [29] cachem_1.1.0