1 Processing Hi-C sequencing libraries with HiCool

The HiCool R/Bioconductor package provides an end-to-end interface to process and normalize Hi-C paired-end fastq reads into .(m)cool files.

  1. The heavy lifting (fastq mapping, pairs parsing and pairs filtering) is performed by the underlying lightweight hicstuff Python library (https://github.com/koszullab/hicstuff).
  2. Pairs filtering is done using the approach described in Cournac et al., 2012 and implemented in hicstuff.
  3. The cooler library (https://github.com/open2c/cooler) is used to parse pairs into a multi-resolution, balanced .mcool file. .(m)cool is a compact, indexed HDF5 file format specifically tailored for efficiently storing Hi-C data. The .(m)cool file format was developed by Abdennur and Mirny and published in 2019.
  4. Internally, all these external dependencies are automatically installed and managed in R by a basilisk environment.

The main processing function offered in this package is HiCool(). To process .fastq reads into .pairs and .mcool files, one needs to provide the following five arguments:

x <- HiCool(
    r1 = '<PATH-TO-R1.fq.gz>', 
    r2 = '<PATH-TO-R2.fq.gz>', 
    restriction = '<RE1(,RE2)>', 
    binning = "<minimum resolution>", 
    genome = '<GENOME_ID>'
)
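
Because processing can take a while, it may be worth checking the five required arguments before launching it. The sketch below is only an illustration: check_hicool_inputs() is a hypothetical helper, not a function exported by HiCool, and the rules it enforces (one or two comma-separated enzyme names, a positive bin size, a single genome ID) are assumptions read off the template above.

```r
# Hypothetical helper (NOT part of HiCool): sanity-check the required
# arguments before starting a long-running job.
check_hicool_inputs <- function(r1, r2, restriction, binning, genome) {
    # Both fastq files must exist on disk
    stopifnot(file.exists(r1), file.exists(r2))
    # `restriction` is assumed to be one enzyme name, or two separated by a comma
    enzymes <- strsplit(restriction, ",", fixed = TRUE)[[1]]
    stopifnot(length(enzymes) %in% c(1L, 2L), all(nzchar(enzymes)))
    # `binning` is the minimum resolution, assumed to be a positive number
    binning <- as.numeric(binning)
    stopifnot(!is.na(binning), binning > 0)
    # `genome` is assumed to be a single genome ID such as 'R64-1-1'
    stopifnot(is.character(genome), length(genome) == 1L, nzchar(genome))
    list(r1 = r1, r2 = r2, enzymes = enzymes, binning = binning, genome = genome)
}

# Usage with two empty placeholder files standing in for real fastq files
r1 <- tempfile(fileext = ".fq.gz")
r2 <- tempfile(fileext = ".fq.gz")
invisible(file.create(r1, r2))
checked <- check_hicool_inputs(r1, r2, "DpnII,HinfI", 1000, "R64-1-1")
checked$enzymes
```

The helper stops with an error at the first failed check, so a typo in a path or an empty enzyme name is caught before any mapping starts.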

Here is a concrete example of Hi-C data processing.

library(HiCool)
hcf <- HiCool(
    r1 = HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R1'), 
    r2 = HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R2'), 
    restriction = 'DpnII,HinfI', 
    binning = 1000, 
    genome = 'R64-1-1', 
    output = './HiCool/'
)
#> HiCool :: Parsing HiCool arguments and checking files.
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> HiCool :: Fetching bowtie genome index files from AWS iGenomes S3 bucket...
#> HiCool :: Recovering bowtie2 genome index from AWS iGenomes...
#> HiCool :: Processing .fastq files to .mcool format. This might take a while.
#> HiCool :: Initializing processing of fastq files [tmp folder: /var/folders/r0/l4fjk6cj5xj0j3brt4bplpl40000gt/T//Rtmpjttliy/7ZFE8Y]...
#> HiCool :: Mapping fastq files...
#> HiCool :: Tidying up everything for you...
#> HiCool :: .fastq to .mcool processing done!
#> HiCool :: Check ./HiCool/folder to find the generated files
#> HiCool :: Generating HiCool report. This might take a while.
#> HiCool :: Report generated and available @ /private/var/folders/r0/l4fjk6cj5xj0j3brt4bplpl40000gt/T/RtmpAp013E/Rbuilda9697a1c9a69/HiCool/vignettes/HiCool/aced343e016e_7833^mapped-R64-1-1^7ZFE8Y.html
#> HiCool :: All processing successfully achieved. Congrats!
hcf
#> CoolFile object
#> .mcool file: ./HiCool//matrices/aced343e016e_7833^mapped-R64-1-1^7ZFE8Y.mcool 
#> resolution: 1000 
#> pairs file: ./HiCool//pairs/aced343e016e_7833^mapped-R64-1-1^7ZFE8Y.pairs 
#> metadata(3): log args stats
S4Vectors::metadata(hcf)
#> $log
#> [1] "./HiCool//logs/aced343e016e_7833^mapped-R64-1-1^7ZFE8Y.log"
#> 
#> $args
#> $args$r1
#> [1] "/Users/biocbuild/Library/Caches/org.R-project.R/R/ExperimentHub/aced343e016e_7833"
#> 
#> $args$r2
#> [1] "/Users/biocbuild/Library/Caches/org.R-project.R/R/ExperimentHub/aced52cff78d_7834"
#> 
#> $args$genome
#> [1] "/private/var/folders/r0/l4fjk6cj5xj0j3brt4bplpl40000gt/T/Rtmpjttliy/R64-1-1"
#> 
#> $args$binning
#> [1] "1000"
#> 
#> $args$restriction
#> [1] "DpnII,HinfI"
#> 
#> $args$iterative
#> [1] TRUE
#> 
#> $args$balancing_args
#> [1] " --min-nnz 10 --mad-max 5 "
#> 
#> $args$threads
#> [1] 1
#> 
#> $args$output
#> [1] "./HiCool/"
#> 
#> $args$exclude_chr
#> [1] "Mito|chrM|MT"
#> 
#> $args$keep_bam
#> [1] FALSE
#> 
#> $args$scratch
#> [1] "/var/folders/r0/l4fjk6cj5xj0j3brt4bplpl40000gt/T//Rtmpjttliy"
#> 
#> $args$wd
#> [1] "/private/var/folders/r0/l4fjk6cj5xj0j3brt4bplpl40000gt/T/RtmpAp013E/Rbuilda9697a1c9a69/HiCool/vignettes"
#> 
#> 
#> $stats
#> $stats$nFragments
#> [1] 1e+05
#> 
#> $stats$nPairs
#> [1] 64761
#> 
#> $stats$nDangling
#> [1] 9266
#> 
#> $stats$nSelf
#> [1] 1910
#> 
#> $stats$nDumped
#> [1] 32
#> 
#> $stats$nFiltered
#> [1] 53553
#> 
#> $stats$nDups
#> [1] 613
#> 
#> $stats$nUnique
#> [1] 52940
#> 
#> $stats$threshold_uncut
#> [1] 7
#> 
#> $stats$threshold_self
#> [1] 7
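
The counts stored in $stats can be cross-checked with a little arithmetic. The sketch below copies the values printed above into a plain list (in a live session they would come from S4Vectors::metadata(hcf)$stats) and verifies that, for this run, nFiltered equals nPairs minus the dangling, self and dumped pairs, and that nUnique equals nFiltered minus nDups. Whether these relations hold for every library is an assumption; they are only confirmed here for the numbers shown.

```r
# Values copied from the $stats output above
stats <- list(
    nPairs = 64761, nDangling = 9266, nSelf = 1910, nDumped = 32,
    nFiltered = 53553, nDups = 613, nUnique = 52940
)

# Pairs left once dangling, self and dumped pairs are removed
after_filtering <- with(stats, nPairs - nDangling - nSelf - nDumped)

# Pairs left once duplicates are removed as well
after_dedup <- with(stats, nFiltered - nDups)

# Both relations hold for this example run
stopifnot(
    after_filtering == stats$nFiltered,
    after_dedup == stats$nUnique
)

# Fraction of the initial pairs retained as unique, filtered pairs
kept_fraction <- stats$nUnique / stats$nPairs
round(kept_fraction, 3)
```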

2 Optional parameters

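Extra optional arguments can be passed to the hicstuff workhorse library. The $args metadata of the example above records, among others, iterative, balancing_args, threads, output, exclude_chr, keep_bam and scratch.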
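
As an illustration, the sketch below spells out the optional settings recorded in the $args metadata of the example run. It assumes that each of these names is accepted by HiCool() as an argument of the same name; the values are simply those printed above (with tempdir() standing in for the scratch path), not recommendations. The call is guarded so that nothing runs unless HiCool is installed and the placeholder fastq paths have been replaced by real files.

```r
# Optional settings, with the values recorded in $args above
optional_args <- list(
    iterative = TRUE,
    balancing_args = " --min-nnz 10 --mad-max 5 ",
    threads = 1,
    exclude_chr = "Mito|chrM|MT",
    keep_bam = FALSE,
    scratch = tempdir(),
    output = "./HiCool/"
)

# Required arguments (the fastq paths are placeholders to be replaced)
required_args <- list(
    r1 = "<PATH-TO-R1.fq.gz>",
    r2 = "<PATH-TO-R2.fq.gz>",
    restriction = "DpnII,HinfI",
    binning = 1000,
    genome = "R64-1-1"
)

# Only launch the run when HiCool is installed and both fastq files exist
can_run <- requireNamespace("HiCool", quietly = TRUE) &&
    all(file.exists(required_args$r1, required_args$r2))
if (can_run) {
    hcf <- do.call(HiCool::HiCool, c(required_args, optional_args))
}
```

Keeping the settings in a list makes it easy to reuse the same configuration across several samples by changing only r1 and r2.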

3 Output files

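The important files generated by HiCool, as seen in the example output above, are the .mcool contact matrix file (matrices/), the .pairs file (pairs/), the processing log (logs/), an HTML report and diagnosis plots.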

The diagnosis plots illustrate how pairs were filtered during the processing, using a strategy described in Cournac et al., BMC Genomics 2012. The event_distance chart represents the frequency of ++, +-, -+ and -- pairs in the library, as a function of the number of restriction sites between each end of the pairs, and shows the inferred filtering threshold. The event_distribution chart indicates the proportion of each type of pairs (e.g. dangling, uncut, abnormal, …) and the total number of pairs retained (3D intra + 3D inter).
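
To see which files a run actually produced, the output directory can be listed and grouped by sub-folder. The helper below is a hypothetical convenience function written with base R only, not part of HiCool; the sub-folder names it returns are whatever exists on disk.

```r
# Hypothetical helper (NOT part of HiCool): group the files found in an
# output directory by the sub-folder they were written to.
list_hicool_outputs <- function(output_dir) {
    files <- list.files(output_dir, recursive = TRUE)
    split(basename(files), dirname(files))
}

# Usage on the output directory of the example above
# (returns an empty list if the directory does not exist yet)
outputs <- list_hicool_outputs("./HiCool/")
names(outputs)
```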

Notes:

4 System dependencies

Processing Hi-C sequencing libraries into .pairs and .mcool files requires several dependencies, to (1) align reads to a reference genome, (2) manage alignment files (SAM), (3) filter pairs, (4) bin them to a specific resolution and (5)

All system dependencies are internally managed by basilisk.utils. HiCool maintains a conda environment containing:

The first time HiCool() is executed, a fresh conda environment will be created and required dependencies automatically installed. This ensures compatibility between the different system dependencies needed to process Hi-C fastq files.

5 Session info

sessionInfo()
#> R version 4.6.0 Patched (2026-05-01 r89994)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Tahoe 26.3.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] HiContactsData_1.15.0 ExperimentHub_3.3.0   AnnotationHub_4.3.0  
#>  [4] BiocFileCache_3.3.0   dbplyr_2.5.2          BiocGenerics_0.59.2  
#>  [7] generics_0.1.4        HiCool_1.13.0         HiCExperiment_1.13.0 
#> [10] BiocStyle_2.41.0     
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.3.0                   httr2_1.2.2                
#>   [3] rlang_1.2.0                 magrittr_2.0.5             
#>   [5] otel_0.2.0                  matrixStats_1.5.0          
#>   [7] compiler_4.6.0              RSQLite_3.52.0             
#>   [9] dir.expiry_1.21.0           png_0.1-9                  
#>  [11] vctrs_0.7.3                 stringr_1.6.0              
#>  [13] pkgconfig_2.0.3             crayon_1.5.3               
#>  [15] fastmap_1.2.0               XVector_0.53.0             
#>  [17] rmdformats_1.0.4            rmarkdown_2.31             
#>  [19] sessioninfo_1.2.3           tzdb_0.5.0                 
#>  [21] strawr_0.0.92               purrr_1.2.2                
#>  [23] bit_4.6.0                   xfun_0.57                  
#>  [25] cachem_1.1.0                jsonlite_2.0.0             
#>  [27] blob_1.3.0                  rhdf5filters_1.25.0        
#>  [29] DelayedArray_0.39.2         Rhdf5lib_2.1.0             
#>  [31] BiocParallel_1.47.0         parallel_4.6.0             
#>  [33] R6_2.6.1                    bslib_0.11.0               
#>  [35] stringi_1.8.7               RColorBrewer_1.1-3         
#>  [37] reticulate_1.46.0           GenomicRanges_1.65.0       
#>  [39] jquerylib_0.1.4             Rcpp_1.1.1-1.1             
#>  [41] Seqinfo_1.3.0               bookdown_0.46              
#>  [43] SummarizedExperiment_1.43.0 knitr_1.51                 
#>  [45] IRanges_2.47.1              BiocBaseUtils_1.15.1       
#>  [47] Matrix_1.7-5                tidyselect_1.2.1           
#>  [49] dichromat_2.0-0.1           abind_1.4-8                
#>  [51] yaml_2.3.12                 codetools_0.2-20           
#>  [53] curl_7.1.0                  lattice_0.22-9             
#>  [55] tibble_3.3.1                withr_3.0.2                
#>  [57] InteractionSet_1.41.0       Biobase_2.73.1             
#>  [59] basilisk.utils_1.25.0       KEGGREST_1.53.0            
#>  [61] S7_0.2.2                    evaluate_1.0.5             
#>  [63] Biostrings_2.81.1           pillar_1.11.1              
#>  [65] BiocManager_1.30.27         filelock_1.0.3             
#>  [67] MatrixGenerics_1.25.0       stats4_4.6.0               
#>  [69] plotly_4.12.0               vroom_1.7.1                
#>  [71] BiocVersion_3.24.0          S4Vectors_0.51.2           
#>  [73] ggplot2_4.0.3               scales_1.4.0               
#>  [75] glue_1.8.1                  lazyeval_0.2.3             
#>  [77] tools_4.6.0                 BiocIO_1.23.3              
#>  [79] data.table_1.18.4           rhdf5_2.57.0               
#>  [81] grid_4.6.0                  tidyr_1.3.2                
#>  [83] crosstalk_1.2.2             AnnotationDbi_1.75.0       
#>  [85] basilisk_1.25.0             cli_3.6.6                  
#>  [87] rappdirs_0.3.4              S4Arrays_1.13.0            
#>  [89] viridisLite_0.4.3           dplyr_1.2.1                
#>  [91] gtable_0.3.6                sass_0.4.10                
#>  [93] digest_0.6.39               SparseArray_1.13.2         
#>  [95] htmlwidgets_1.6.4           farver_2.1.2               
#>  [97] memoise_2.0.1               htmltools_0.5.9            
#>  [99] lifecycle_1.0.5             httr_1.4.8                 
#> [101] bit64_4.8.0