Random DAGs

Zuguang Gu (z.gu@dkfz.de)

2023-11-28

simona provides functions for generating random DAGs. A random tree is generated first; more links can then be added at random to form a more general DAG.

Random trees

dag_random_tree() generates a random tree. By default it generates a binary tree in which all leaf terms have depth 9.

library(simona)
set.seed(123)
tree1 = dag_random_tree()
tree1
## An ontology_DAG object:
##   Source: dag_random_tree 
##   1023 terms / 1022 relations / a tree
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Aspect ratio: 56.89:1
dag_circular_viz(tree1)

Strictly speaking, tree1 is not random: with the default settings every term gets exactly two children. The tree grows from the root, and dag_random_tree() has several arguments (n_children, p_stop and max) that control how it grows.

The tree stops growing when the total number of terms exceeds max.

So the default call of dag_random_tree() is identical to:

dag_random_tree(n_children = 2, p_stop = 0, max = 2^10 - 1)
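
The growth process controlled by these three arguments can be sketched in base R. This is a minimal sketch under stated assumptions, not simona's implementation: grow_tree_sketch() is a hypothetical helper. It assumes that n_children is either a single number or a range c(min, max) from which the number of children is drawn uniformly, that p_stop is the probability that a term gets no children, and that growth halts as soon as adding children would push the total number of terms above max.

```r
# Hypothetical helper (not part of simona): grow a tree level by level.
# Returns a vector where parent[i] is the parent of term i; term 1 is the root.
grow_tree_sketch = function(n_children = 2, p_stop = 0, max = 2^10 - 1) {
    n_children = rep(n_children, length.out = 2)  # a single value becomes a fixed range
    parent = NA_integer_                          # the root has no parent
    frontier = 1L                                 # terms that may still grow
    while(length(frontier)) {
        next_frontier = integer(0)
        for(i in frontier) {
            if(runif(1) < p_stop) next            # assumption: term i stops growing
            # draw the number of children uniformly from the range
            k = n_children[1] + sample.int(n_children[2] - n_children[1] + 1, 1) - 1
            if(length(parent) + k > max) return(parent)  # adding would exceed max
            new_terms = length(parent) + seq_len(k)
            parent[new_terms] = i
            next_frontier = c(next_frontier, new_terms)
        }
        frontier = next_frontier
    }
    parent
}

parent_default = grow_tree_sketch()
length(parent_default)   # number of terms
sum(2^(0:9))             # terms in a complete binary tree with leaves at depth 9
```

With the default values the sketch yields 2^0 + 2^1 + ... + 2^9 = 1023 terms, which matches the term count printed for tree1.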

We can set these arguments to other values, such as:

tree2 = dag_random_tree(n_children = c(2, 6), p_stop = 0.5, max = 2000)
tree2
## An ontology_DAG object:
##   Source: dag_random_tree 
##   1999 terms / 1998 relations / a tree
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 7 
##   Aspect ratio: 105.71:1
dag_circular_viz(tree2)

Random DAGs

A more general random DAG is generated from a random tree. Taking tree1, which was generated above, the function dag_add_random_children() adds more random children to its terms.

dag1 = dag_add_random_children(tree1)
dag1
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 1115 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 1.09
##   Avg number of children: 1.03
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 52.78:1 (based on the shortest distance from root)
dag_circular_viz(dag1)

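Three arguments control how new child terms are added. We first introduce two of them: p_add, the probability that a term receives new children, and new_children, the number (or range of numbers) of new children for such a term.

Let’s try to generate a denser DAG: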

dag2 = dag_add_random_children(tree1, p_add = 0.6, new_children = c(2, 8))
dag2
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 2550 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 2.50
##   Avg number of children: 1.59
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 32.22:1 (based on the shortest distance from root)
dag_circular_viz(dag2)
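
The printed summaries can be cross-checked with a little arithmetic. tree1 has 1022 relations, so the difference in relation counts gives the number of links added on top of the tree. Reading "Avg number of parents" as the number of relations divided by the number of non-root terms is an inference from the printed numbers, not a definition taken from simona.

```r
# All numbers are copied from the printed summaries of tree1, dag1 and dag2
tree_relations = 1022
dag_relations = c(dag1 = 1115, dag2 = 2550)
n_terms = 1023

# links added on top of the tree
added = dag_relations - tree_relations
added

# relations per non-root term (assumed reading of "Avg number of parents")
avg_parents = sprintf("%.2f", dag_relations / (n_terms - 1))
avg_parents
```

Under that reading, the values come out as 1.09 and 2.50, in agreement with the summaries above.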

By default, once a term t is going to add more child terms, it only selects new child terms from the terms that are:

  1. lower than t, i.e. with depths greater than t’s depth in the DAG.
  2. not the child terms that t already has.

Then, from this subset of candidate child terms, new child terms are randomly picked according to the numbers set in new_children.
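
The default selection described above can be sketched in base R on a toy tree. This is a hedged illustration, not simona's code: pick_new_children_sketch(), its arguments and its default range are hypothetical, and it assumes that "lower than t" means farther from the root, so that a new link can never point back towards the root and create a cycle.

```r
# Hypothetical helper (not part of simona).
# depth: depth of every term; children: list of child indices per term;
# t: index of the current term; new_children: range of how many to pick.
pick_new_children_sketch = function(depth, children, t, new_children = c(1, 4)) {
    is_candidate = depth > depth[t]          # assumption: "lower" = farther from root
    is_candidate[children[[t]]] = FALSE      # rule 2: skip existing children
    candidates = which(is_candidate)
    n = length(candidates)
    if(n == 0) return(integer(0))
    # draw how many children to add, capped by the number of candidates
    k = new_children[1] + sample.int(new_children[2] - new_children[1] + 1, 1) - 1
    candidates[sample.int(n, min(n, k))]
}

# Toy tree: 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6}
depth = c(0, 1, 1, 2, 2, 2)
children = list(c(2, 3), c(4, 5), 6, integer(0), integer(0), integer(0))

set.seed(1)
pick_new_children_sketch(depth, children, t = 3)  # drawn from terms 4 and 5 only
```

For term 3 the deeper terms are 4, 5 and 6; term 6 is already a child, so only 4 and 5 remain as candidates.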

The way new child terms are randomly picked can be customized with a user-defined function. This function accepts two arguments: the dag object and the integer index of the “current term”. In the following example, we implement a function that only picks new child terms from term t’s offspring terms.

add_new_children_from_offspring = function(dag, i, new_children = c(1, 8)) {

    # candidates: offspring of term i that are not already its children
    l = rep(FALSE, dag_n_terms(dag))
    offspring = dag_offspring(dag, i, in_labels = FALSE)
    if(length(offspring)) {
        l[offspring] = TRUE

        l[dag_children(dag, i, in_labels = FALSE)] = FALSE
    }

    candidates = which(l)
    n_candidates = length(candidates)
    if(n_candidates) {
        if(n_candidates < new_children[1]) {
            integer(0)
        } else {
            # draw the number of new children, then index with sample.int():
            # sample(x, k) would sample from 1:x when x is a single number
            k = new_children[1] + sample.int(new_children[2] - new_children[1] + 1, 1) - 1
            candidates[sample.int(n_candidates, min(n_candidates, k))]
        }
    } else {
        integer(0)
    }
}

dag3 = dag_add_random_children(tree1, p_add = 0.6,
    add_random_children_fun = add_new_children_from_offspring)
dag3
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 1583 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 1.55
##   Avg number of children: 1.25
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 32.22:1 (based on the shortest distance from root)
dag_circular_viz(dag3)

Session info

sessionInfo()
## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] org.Hs.eg.db_3.18.0  AnnotationDbi_1.64.1 IRanges_2.36.0      
## [4] S4Vectors_0.40.2     Biobase_2.62.0       BiocGenerics_0.48.1 
## [7] igraph_1.5.1         simona_1.0.2         knitr_1.45          
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.42.0         circlize_0.4.15         shape_1.4.6            
##  [4] rjson_0.2.21            xfun_0.41               bslib_0.6.0            
##  [7] visNetwork_2.1.2        htmlwidgets_1.6.3       GlobalOptions_0.1.2    
## [10] vctrs_0.6.4             tools_4.3.2             bitops_1.0-7           
## [13] curl_5.1.0              parallel_4.3.2          Polychrome_1.5.1       
## [16] RSQLite_2.3.3           highr_0.10              cluster_2.1.5          
## [19] blob_1.2.4              pkgconfig_2.0.3         RColorBrewer_1.1-3     
## [22] scatterplot3d_0.3-44    lifecycle_1.0.4         GenomeInfoDbData_1.2.11
## [25] compiler_4.3.2          textshaping_0.3.7       Biostrings_2.70.1      
## [28] codetools_0.2-19        ComplexHeatmap_2.18.0   clue_0.3-65            
## [31] GenomeInfoDb_1.38.1     htmltools_0.5.7         sass_0.4.7             
## [34] RCurl_1.98-1.13         yaml_2.3.7              crayon_1.5.2           
## [37] jquerylib_0.1.4         GO.db_3.18.0            ellipsis_0.3.2         
## [40] cachem_1.0.8            iterators_1.0.14        foreach_1.5.2          
## [43] digest_0.6.33           fastmap_1.1.1           grid_4.3.2             
## [46] colorspace_2.1-0        cli_3.6.1               DiagrammeR_1.0.10      
## [49] magrittr_2.0.3          bit64_4.0.5             rmarkdown_2.25         
## [52] XVector_0.42.0          httr_1.4.7              matrixStats_1.1.0      
## [55] bit_4.0.5               ragg_1.2.6              png_0.1-8              
## [58] GetoptLong_1.0.5        memoise_2.0.1           evaluate_0.23          
## [61] doParallel_1.0.17       rlang_1.1.2             Rcpp_1.0.11            
## [64] glue_1.6.2              DBI_1.1.3               xml2_1.3.5             
## [67] rstudioapi_0.15.0       jsonlite_1.8.7          R6_2.5.1               
## [70] systemfonts_1.0.5       zlibbioc_1.48.0