library(sangeranalyseR)

1 Introduction

sangeranalyseR is an R package that provides fast, flexible, and reproducible workflows for assembling Sanger sequencing data into contigs. It is a free, open-source alternative to Geneious, CodonCode Aligner, and Phred-Phrap-Consed. The full reference manual is on the ReadTheDocs site; this vignette focuses on the recipes most users actually need: how to call the constructors, what each parameter does, and how to interpret the output.

The package is built around three S4 classes that form a containment hierarchy:

SangerAlignment   ← a set of contigs aligned to each other
└── SangerContig  ← one assembled contig (forward + reverse reads)
    └── SangerRead ← one ABIF or FASTA read
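
The containment relationship can be illustrated with a toy S4 sketch. The class and slot names below (ToyRead, ToyContig, ToyAlignment, reads, contigs) are hypothetical and chosen only for illustration; they are not the package's real class definitions, which carry many more slots.

```r
library(methods)

# Hypothetical toy classes that mirror the three-level hierarchy only.
setClass("ToyRead", slots = c(name = "character", direction = "character"))
setClass("ToyContig", slots = c(name = "character", reads = "list"))
setClass("ToyAlignment", slots = c(contigs = "list"))

# One contig holding a forward and a reverse read.
fwd <- new("ToyRead", name = "sampleA_F.ab1", direction = "F")
rev_read <- new("ToyRead", name = "sampleA_R.ab1", direction = "R")
contig <- new("ToyContig", name = "sampleA", reads = list(fwd, rev_read))

# The alignment holds contigs; reads are reached by walking down the levels.
aln <- new("ToyAlignment", contigs = list(contig))
n_reads <- sum(vapply(aln@contigs, function(ct) length(ct@reads), integer(1)))
```

Walking from the alignment down to its reads, as in the last line, is the access pattern the hierarchy implies.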

3 Constructor parameter reference

Every recipe above is built on three S4 constructors. The full parameter list is in ?SangerAlignment / ?SangerContig / ?SangerRead; the most-asked-about groups are summarised below.

3.1 Required parameters (REGEX path)

| Parameter | What it controls |
|---|---|
| inputSource | "ABIF" (raw chromatograms) or "FASTA" (pre-called sequences). |
| processMethod | "REGEX" (group reads by filename suffix) or "CSV" (explicit reads, direction, contig mapping). |
| ABIF_Directory | Path to the directory of .ab1 files. Required for inputSource = "ABIF". |
| FASTA_File | Path to a single FASTA file. Required for inputSource = "FASTA". |
| REGEX_SuffixForward | A regex matched against forward-read filenames, e.g. "_F\\.ab1$". Pass NULL for reverse-only. |
| REGEX_SuffixReverse | A regex matched against reverse-read filenames, e.g. "_R\\.ab1$". Pass NULL for forward-only. |
| contigName | (SangerContig only) The label / prefix shared by reads in this contig. |
| CSV_NamesConversion | Path to a CSV with three columns: reads, direction (F/R), contig. Required for processMethod = "CSV". |
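
To see how the two suffix patterns partition a directory listing, the base-R sketch below applies them with grepl() and then writes the equivalent three-column mapping for the CSV route. The filenames, the contig-label rule (strip the suffix), and the output path are assumptions made for illustration; the sketch does not call the package and may not match its internal matching logic exactly.

```r
# Hypothetical filenames; only the suffix patterns come from the table above.
files <- c("sampleA_F.ab1", "sampleA_R.ab1", "sampleB_F.ab1", "notes.txt")

REGEX_SuffixForward <- "_F\\.ab1$"
REGEX_SuffixReverse <- "_R\\.ab1$"

# Classify each file by which suffix pattern it matches.
is_fwd <- grepl(REGEX_SuffixForward, files)
is_rev <- grepl(REGEX_SuffixReverse, files)

# Build the explicit reads / direction / contig mapping for matched files.
# Deriving the contig label by stripping the suffix is an assumption.
reads <- files[is_fwd | is_rev]
mapping <- data.frame(
  reads     = reads,
  direction = ifelse(grepl(REGEX_SuffixForward, reads), "F", "R"),
  contig    = sub("_[FR]\\.ab1$", "", reads)
)

# Write it out in the three-column layout described for CSV_NamesConversion.
csv_path <- file.path(tempdir(), "names_conversion.csv")
write.csv(mapping, csv_path, row.names = FALSE)
```

Files that match neither pattern (notes.txt here) are left out of the mapping.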

3.2 Trimming parameters

| Parameter | When used | Default | Notes |
|---|---|---|---|
| TrimmingMethod | always | "M1" | "M1" (modified Mott) or "M2" (sliding window). |
| M1TrimmingCutoff | TrimmingMethod = "M1" | 0.0001 | Cumulative error-probability cutoff. Smaller values trim more aggressively. |
| M2CutoffQualityScore | TrimmingMethod = "M2" | 20 | Mean Phred threshold within the sliding window. |
| M2SlidingWindowSize | TrimmingMethod = "M2" | 10 | Width of the sliding window in bp. |
| minReadLength | always | 20L | Reads shorter than this after trimming are dropped from the contig. |
| signalRatioCutoff | inputSource = "ABIF" | 0.33 | Secondary peaks below this fraction of the primary peak are dropped. |
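
The effect of the M1 cutoff is easier to see on a toy quality vector. The sketch below is a generic Mott-style trimmer: it scores each base as cutoff minus its Phred error probability and keeps the highest-scoring contiguous segment. The helper name mott_trim, the Phred values, and the two cutoffs are made up for illustration, and the package's own implementation may differ in detail.

```r
# Hypothetical helper: returns the 1-based start and end of the kept segment.
mott_trim <- function(phred, cutoff = 1e-4) {
  # Per-base score: positive when the error probability is below the cutoff.
  score <- cutoff - 10^(-phred / 10)
  best_sum <- 0
  best_start <- NA_integer_
  best_end <- NA_integer_
  run_sum <- 0
  run_start <- 1L
  for (i in seq_along(score)) {
    run_sum <- run_sum + score[i]
    if (run_sum <= 0) {
      # The running segment is no longer worth keeping; restart after i.
      run_sum <- 0
      run_start <- i + 1L
    } else if (run_sum > best_sum) {
      best_sum <- run_sum
      best_start <- run_start
      best_end <- i
    }
  }
  c(start = best_start, end = best_end)
}

# Toy read: poor ends, good middle.
phred <- c(5, 8, 25, 35, 45, 38, 28, 6, 4)
loose <- mott_trim(phred, cutoff = 0.01)   # keeps bases 3 to 7
tight <- mott_trim(phred, cutoff = 0.001)  # keeps bases 4 to 6
```

Lowering the cutoff from 0.01 to 0.001 shrinks the kept segment from five bases to three, which is why smaller values produce a more aggressive trim. A read whose kept segment falls below minReadLength would then be dropped.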

3.3 Consensus parameters

| Parameter | Default | Notes |
|---|---|---|
| consensusMethod | "strict" | "strict" (DECIPHER+IUPAC), "majority" (plurality vote, no IUPAC), "quality_weighted" (Phred-weighted). |
| qualityAware | FALSE | Setting TRUE is shorthand for consensusMethod = "quality_weighted". |
| minFractionCall | 0.5 | Passed as DECIPHER's minInformation in "strict" mode. |
| maxFractionLost | 0.5 | Passed as DECIPHER's threshold in "strict" mode. |
| minOverlapBases | 0L | If > 0, logs LOW_OVERLAP_WARN when the smallest pairwise non-gap overlap is below this value. |
| minOverlapFraction | 0.0 | The same check in fractional terms (overlap as a fraction of the shorter read). |
| alignSeqsParams | list() | Extra named args forwarded to DECIPHER::AlignSeqs (e.g. list(iterations = 1L, refinements = 1L)). |
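
As a rough illustration of what a plurality vote and the pairwise overlap check compute, the base-R sketch below works on three toy aligned reads. The sequences and the helper name plurality_call are invented for this example; it is not the package's code, and its tie-breaking (first base in alphabetical order) is an arbitrary choice of the sketch.

```r
# Toy gapped alignment; "-" marks positions a read does not cover.
aligned <- c(read1 = "ACGTTAC---",
             read2 = "--GTTACGG-",
             read3 = "----AACGGA")
mat <- do.call(rbind, strsplit(aligned, ""))

# Plurality vote per column, ignoring gaps; ties go to the first base
# in alphabetical order (an arbitrary choice for this sketch).
plurality_call <- function(column) {
  calls <- column[column != "-"]
  if (length(calls) == 0L) return("-")
  counts <- table(calls)
  names(counts)[which.max(counts)]
}
consensus <- paste(apply(mat, 2, plurality_call), collapse = "")

# Pairwise non-gap overlap in bases, and as a fraction of the shorter read.
pairs <- combn(nrow(mat), 2)
overlap <- apply(pairs, 2, function(p) sum(mat[p[1], ] != "-" & mat[p[2], ] != "-"))
read_len <- rowSums(mat != "-")
frac <- overlap / pmin(read_len[pairs[1, ]], read_len[pairs[2, ]])
```

Here the smallest overlap is 3 bases (read1 against read3), or 0.5 of the shorter read, so thresholds such as minOverlapBases = 4L or minOverlapFraction = 0.6 would flag this toy contig.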

3.4 Performance / parallelism parameters

| Parameter | Default | Notes |
|---|---|---|
| processorsNum | 1 | Legacy integer worker count. Honoured for backwards compatibility. |
| BPPARAM | NULL | Any BiocParallelParam. Auto-derived from processorsNum if NULL (SerialParam for 1, Multicore/Snow for ≥2). |
| lazyAA | TRUE | Skip eager 3-frame AA translation (Phase-6 default; ~35% wall-time saving). Use the primaryAASeqS1(), primaryAASeqS2(), and primaryAASeqS3() accessors. |
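
The fallback from processorsNum to a BiocParallel backend could look roughly like the sketch below. The helper name derive_bpparam is hypothetical, and choosing SnowParam on Windows and MulticoreParam elsewhere is an assumption about how "Multicore/Snow" is resolved; the package may decide differently.

```r
library(BiocParallel)

# Hypothetical helper mirroring the rule in the table above.
derive_bpparam <- function(processorsNum = 1L, BPPARAM = NULL) {
  # An explicit BPPARAM always wins.
  if (!is.null(BPPARAM)) return(BPPARAM)
  # A single worker runs serially.
  if (processorsNum <= 1L) return(SerialParam())
  # Assumption: forked workers are unavailable on Windows, so use sockets there.
  if (.Platform$OS.type == "windows") {
    SnowParam(workers = processorsNum)
  } else {
    MulticoreParam(workers = processorsNum)
  }
}

serial_backend <- derive_bpparam(1L)
parallel_backend <- derive_bpparam(2L)
```

Passing the result as BPPARAM keeps backend selection in one place, while older scripts that only set processorsNum keep working.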

4 Troubleshooting

| Symptom | Cause / fix |
|---|---|
| 'qualityPhredScores' length cannot be zero | ABIF has empty PCON.2 quality block (older 3500/Beckman firmware). Phase-15 fix: synthesises Phred 30 with MISSING_QUALITY_SCORES_WARN. Update to the devel branch. |
| 'REGEX_SuffixReverse' must be character type on forward-only data | Phase-15 fix: pass REGEX_SuffixReverse = NULL (or NA_character_) for forward-only datasets, and set minReadsNum = 1. |
| CONTIG_NUMBER_ZERO_ERROR even though each SangerContig() works individually | Phase-15 fix: the CSV+ABIF aggregator no longer requires contig labels to be substrings of filenames; the reads column drives the lookup. |
| 'x' must be an XStringSet object from writeFasta on a single-read contig | Phase-15 fix: writeFastaSC detects the empty alignment and writes a single-record FASTA from @contigSeq. |
| Consensus is full of IUPAC ambiguity codes | Phase-17: try consensusMethod = "majority" or "quality_weighted". Also check pairwise overlap with minOverlapBases = 50L to detect spurious merges. |
| Reports render with empty AA tables | Phase-8 fix: the RMD templates were reading @primaryAASeqS* slots directly under lazyAA = TRUE. Fixed in the devel branch. If you customise the templates, use the primaryAASeqS1/S2/S3() accessors. |

5 Citation

Please cite the package via:

Kuan-Hao Chao, Kirston Barton, Sarah Palmer, Robert Lanfear (2021). sangeranalyseR: simple and interactive processing of Sanger sequencing data in R. Genome Biology and Evolution. doi:10.1093/gbe/evab028.

6 Session info

sessionInfo()
## R version 4.6.0 Patched (2026-05-01 r89994)
## Platform: aarch64-apple-darwin23
## Running under: macOS Tahoe 26.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] sangeranalyseR_1.23.0 sangerseqR_1.49.0     stringr_1.6.0        
##  [4] pwalign_1.9.0         DECIPHER_3.9.0        Biostrings_2.81.1    
##  [7] Seqinfo_1.3.0         XVector_0.53.0        IRanges_2.47.1       
## [10] S4Vectors_0.51.2      BiocGenerics_0.59.2   generics_0.1.4       
## [13] BiocStyle_2.41.0     
## 
## loaded via a namespace (and not attached):
##  [1] ade4_1.7-24           tidyselect_1.2.1      viridisLite_0.4.3    
##  [4] dplyr_1.2.1           farver_2.1.2          S7_0.2.2             
##  [7] fastmap_1.2.0         lazyeval_0.2.3        promises_1.5.0       
## [10] shinyjs_2.1.1         digest_0.6.39         mime_0.13            
## [13] lifecycle_1.0.5       magrittr_2.0.5        compiler_4.6.0       
## [16] rlang_1.2.0           sass_0.4.10           tools_4.6.0          
## [19] yaml_2.3.12           data.table_1.18.4     excelR_0.4.0         
## [22] knitr_1.51            htmlwidgets_1.6.4     RColorBrewer_1.1-3   
## [25] BiocParallel_1.47.0   purrr_1.2.2           shinyWidgets_0.9.1   
## [28] grid_4.6.0            xtable_1.8-8          ggplot2_4.0.3        
## [31] scales_1.4.0          MASS_7.3-65           dichromat_2.0-0.1    
## [34] cli_3.6.6             rmarkdown_2.31        crayon_1.5.3         
## [37] otel_0.2.0            httr_1.4.8            DBI_1.3.0            
## [40] ape_5.8-1             cachem_1.1.0          parallel_4.6.0       
## [43] BiocManager_1.30.27   vctrs_0.7.3           jsonlite_2.0.0       
## [46] bookdown_0.46         seqinr_4.2-44         plotly_4.12.0        
## [49] jquerylib_0.1.4       tidyr_1.3.2           ggdendro_0.2.0       
## [52] glue_1.8.1            codetools_0.2-20      DT_0.34.0            
## [55] stringi_1.8.7         gtable_0.3.6          later_1.4.8          
## [58] shinycssloaders_1.1.0 shinydashboard_0.7.3  tibble_3.3.1         
## [61] logger_0.4.2          pillar_1.11.1         htmltools_0.5.9      
## [64] R6_2.6.1              evaluate_1.0.5        shiny_1.13.0         
## [67] lattice_0.22-9        openxlsx_4.2.8.1      httpuv_1.6.17        
## [70] bslib_0.11.0          Rcpp_1.1.1-1.1        zip_2.3.3            
## [73] gridExtra_2.3         nlme_3.1-169          xfun_0.57            
## [76] pkgconfig_2.0.3