TileDBArray 1.18.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.33345751 0.01979869 -0.98004485 . -0.1617091 -0.8072959
## [2,] 1.16934280 -0.73548607 0.63727956 . 0.9418340 0.2111292
## [3,] 1.13736675 -0.58285918 0.88876682 . 0.1021233 -0.1899638
## [4,] 0.54990146 1.56330809 -0.73857339 . 0.9321317 -0.9747606
## [5,] 0.88315638 1.26195726 0.28166922 . 0.3753228 0.7533079
## ... . . . . . .
## [96,] 0.8824238 1.8221026 0.5008869 . -0.4643112 0.1775068
## [97,] 0.4055284 0.6823308 0.1487348 . 0.8506922 -0.2448860
## [98,] -1.1124455 0.5965398 0.2352240 . 0.5360998 -1.3407869
## [99,] -1.4059162 -1.0057115 0.4299390 . -0.2648462 0.4179076
## [100,] 1.0007959 0.3128641 0.3855044 . -0.5958133 0.1909563
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.33345751 0.01979869 -0.98004485 . -0.1617091 -0.8072959
## [2,] 1.16934280 -0.73548607 0.63727956 . 0.9418340 0.2111292
## [3,] 1.13736675 -0.58285918 0.88876682 . 0.1021233 -0.1899638
## [4,] 0.54990146 1.56330809 -0.73857339 . 0.9321317 -0.9747606
## [5,] 0.88315638 1.26195726 0.28166922 . 0.3753228 0.7533079
## ... . . . . . .
## [96,] 0.8824238 1.8221026 0.5008869 . -0.4643112 0.1775068
## [97,] 0.4055284 0.6823308 0.1487348 . 0.8506922 -0.2448860
## [98,] -1.1124455 0.5965398 0.2352240 . 0.5360998 -1.3407869
## [99,] -1.4059162 -1.0057115 0.4299390 . -0.2648462 0.4179076
## [100,] 1.0007959 0.3128641 0.3855044 . -0.5958133 0.1909563
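As a quick sanity check on the two routes above, the following sketch writes the same matrix both ways and realizes the result back into memory with `as.matrix()`. The seed and the variable names (`written`, `coerced`, `roundtrip`) are illustrative choices, not part of the package.

```r
library(TileDBArray)

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
X <- matrix(rnorm(1000), ncol=10)

# Two routes to the same kind of object.
written <- writeTileDBArray(X)
coerced <- as(X, "TileDBArray")
class(written)
class(coerced)

# Realize the stored values back into an ordinary matrix and compare
# them with the original input (attributes such as dimnames are ignored).
roundtrip <- as.matrix(coerced)
all.equal(roundtrip, X, check.attributes=FALSE)
```

Realizing the whole array in memory is only sensible for small examples like this one; for large arrays it defeats the purpose of a file-backed representation.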
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.00 0.00 0.00 . 0 0
## [2,] 0.16 0.00 0.00 . 0 0
## [3,] 0.00 0.00 0.00 . 0 0
## [4,] 0.00 0.00 0.00 . 0 0
## [5,] 0.00 0.00 0.00 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0.00 0.00
## [997,] 0 0 0 . 0.00 0.00
## [998,] 0 0 0 . 0.00 0.00
## [999,] 0 0 0 . -0.27 0.00
## [1000,] 0 0 0 . 0.00 0.00
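To confirm that sparsity is retained rather than silently densified, one option is to query the result with `is_sparse()`, the generic that the DelayedArray framework uses to report sparsity. This is a minimal sketch; the variable names and seed are illustrative.

```r
library(TileDBArray)

set.seed(2)  # arbitrary seed, used only to make the sketch reproducible
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)

# A sparse input should be reported as sparse after writing ...
sparse_out <- writeTileDBArray(Y)
is_sparse(sparse_out)

# ... whereas an ordinary dense matrix should not.
dense_out <- writeTileDBArray(matrix(rnorm(100), ncol=10))
is_sparse(dense_out)
```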
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] TRUE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
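The stored type can be inspected with `type()`. The sketch below writes a logical sparse matrix and a small dense integer matrix and reports the type of each; the integer matrix is a made-up example, and the variable names are illustrative.

```r
library(TileDBArray)

set.seed(3)  # arbitrary seed, used only to make the sketch reproducible
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)

# Logical sparse matrix, as in the example above.
lgl <- writeTileDBArray(Y > 0)
type(lgl)

# A small dense integer matrix (hypothetical data for illustration).
int <- writeTileDBArray(matrix(1:20, ncol=4))
type(int)
```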
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.33345751 0.01979869 -0.98004485 . -0.1617091 -0.8072959
## GENE_2 1.16934280 -0.73548607 0.63727956 . 0.9418340 0.2111292
## GENE_3 1.13736675 -0.58285918 0.88876682 . 0.1021233 -0.1899638
## GENE_4 0.54990146 1.56330809 -0.73857339 . 0.9321317 -0.9747606
## GENE_5 0.88315638 1.26195726 0.28166922 . 0.3753228 0.7533079
## ... . . . . . .
## GENE_96 0.8824238 1.8221026 0.5008869 . -0.4643112 0.1775068
## GENE_97 0.4055284 0.6823308 0.1487348 . 0.8506922 -0.2448860
## GENE_98 -1.1124455 0.5965398 0.2352240 . 0.5360998 -1.3407869
## GENE_99 -1.4059162 -1.0057115 0.4299390 . -0.2648462 0.4179076
## GENE_100 1.0007959 0.3128641 0.3855044 . -0.5958133 0.1909563
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 1.3334575 1.1693428 1.1373668 0.5499015 0.8831564 -0.4436069
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 1.33345751 0.01979869 -0.98004485 -0.42413030 -1.07246161
## GENE_2 1.16934280 -0.73548607 0.63727956 0.10957010 1.72199149
## GENE_3 1.13736675 -0.58285918 0.88876682 -0.41575085 0.74791710
## GENE_4 0.54990146 1.56330809 -0.73857339 -1.76332812 1.19784126
## GENE_5 0.88315638 1.26195726 0.28166922 0.34288648 0.15413474
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 2.66691501 0.03959737 -1.96008970 . -0.3234182 -1.6145918
## GENE_2 2.33868560 -1.47097214 1.27455912 . 1.8836680 0.4222583
## GENE_3 2.27473351 -1.16571836 1.77753363 . 0.2042466 -0.3799275
## GENE_4 1.09980293 3.12661619 -1.47714678 . 1.8642634 -1.9495211
## GENE_5 1.76631277 2.52391452 0.56333845 . 0.7506456 1.5066159
## ... . . . . . .
## GENE_96 1.7648476 3.6442053 1.0017739 . -0.9286224 0.3550137
## GENE_97 0.8110567 1.3646615 0.2974697 . 1.7013844 -0.4897719
## GENE_98 -2.2248911 1.1930796 0.4704480 . 1.0721997 -2.6815738
## GENE_99 -2.8118324 -2.0114230 0.8598780 . -0.5296923 0.8358152
## GENE_100 2.0015918 0.6257282 0.7710089 . -1.1916266 0.3819126
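The delayed behaviour can be checked directly. In this sketch, which uses illustrative variable names and an arbitrary seed, the arithmetic yields a plain DelayedMatrix, the values read through the original object are unchanged, and the doubled values appear only when the delayed object is realized.

```r
library(TileDBArray)

set.seed(4)  # arbitrary seed, used only to make the sketch reproducible
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# The operation is recorded, not executed against the backend.
doubled <- out * 2
class(doubled)

# The original object still yields the original values ...
all.equal(as.matrix(out), X, check.attributes=FALSE)

# ... and the doubling is applied when the delayed object is realized.
all.equal(as.matrix(doubled), X * 2, check.attributes=FALSE)
```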
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 6.6566121 -0.2656282 -7.2893219 0.4340305 7.4712686 26.0867363 21.0125206
## SAMP_8 SAMP_9 SAMP_10
## 5.2392382 4.7959130 -4.0724442
out %*% runif(ncol(out))
## [,1]
## GENE_1 -0.16198066
## GENE_2 1.64607956
## GENE_3 1.30304377
## GENE_4 0.44518923
## GENE_5 2.69263132
## GENE_6 0.02749748
## GENE_7 1.44106300
## GENE_8 -1.39593788
## GENE_9 0.08311081
## GENE_10 0.92257045
## GENE_11 -0.78839999
## GENE_12 -0.86803708
## GENE_13 0.46985776
## GENE_14 -0.05028901
## GENE_15 0.51383679
## GENE_16 1.36766653
## GENE_17 2.47518845
## GENE_18 2.97855271
## GENE_19 2.13215495
## GENE_20 -4.25258007
## GENE_21 1.31677690
## GENE_22 -0.25050927
## GENE_23 -0.45351248
## GENE_24 -1.55238181
## GENE_25 -0.81096144
## GENE_26 1.86246703
## GENE_27 -0.39838621
## GENE_28 -2.63400528
## GENE_29 -1.01058021
## GENE_30 4.74078405
## GENE_31 1.49424884
## GENE_32 -0.14313620
## GENE_33 2.44753783
## GENE_34 -0.39412769
## GENE_35 1.10037199
## GENE_36 -0.67672362
## GENE_37 -1.31904075
## GENE_38 1.35659777
## GENE_39 -2.77333984
## GENE_40 -0.63189978
## GENE_41 1.73968475
## GENE_42 1.73457706
## GENE_43 3.12840254
## GENE_44 0.88118268
## GENE_45 1.00535270
## GENE_46 2.48729457
## GENE_47 2.17593442
## GENE_48 -3.16250605
## GENE_49 1.46167179
## GENE_50 2.80513345
## GENE_51 2.05579518
## GENE_52 1.13882615
## GENE_53 3.56975195
## GENE_54 3.55206697
## GENE_55 -2.07794172
## GENE_56 -2.19287147
## GENE_57 -1.25468506
## GENE_58 0.36897145
## GENE_59 1.34198561
## GENE_60 -1.26393551
## GENE_61 -2.39657941
## GENE_62 -0.64268921
## GENE_63 3.70425218
## GENE_64 -1.68537185
## GENE_65 -0.41741428
## GENE_66 1.12003137
## GENE_67 1.03714990
## GENE_68 -0.67239495
## GENE_69 -0.66761662
## GENE_70 -0.30440320
## GENE_71 -1.86817585
## GENE_72 0.04924704
## GENE_73 -1.06637565
## GENE_74 1.51586688
## GENE_75 -0.01749785
## GENE_76 1.58926370
## GENE_77 -1.62518878
## GENE_78 -1.40110724
## GENE_79 -1.00580758
## GENE_80 -0.13269643
## GENE_81 -1.20182440
## GENE_82 -1.10538925
## GENE_83 2.13594989
## GENE_84 -1.64363906
## GENE_85 2.27753116
## GENE_86 -0.25684702
## GENE_87 -0.07904561
## GENE_88 0.92552474
## GENE_89 0.59094619
## GENE_90 0.12033589
## GENE_91 2.11892161
## GENE_92 3.31312093
## GENE_93 -3.04415874
## GENE_94 -0.05425111
## GENE_95 0.75139053
## GENE_96 2.28340025
## GENE_97 0.45124535
## GENE_98 -1.79767305
## GENE_99 0.11815132
## GENE_100 1.48711198
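These results should agree with the same operations on the in-memory matrix, up to floating-point tolerance. A minimal sketch of that comparison, with illustrative variable names and an arbitrary seed:

```r
library(TileDBArray)

set.seed(5)  # arbitrary seed, used only to make the sketch reproducible
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Column sums computed from the TileDB-backed object versus in memory.
all.equal(colSums(out), colSums(X))

# Matrix-vector product, reusing the same random vector on both sides.
v <- runif(ncol(out))
backed <- as.matrix(out %*% v)
in_memory <- X %*% v
all.equal(backed, in_memory, check.attributes=FALSE)
```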
We can adjust some parameters for creating the backend by passing the appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.1092763 -0.7077284 0.4161023 . 0.59305421 -2.22780971
## [2,] 0.5358421 -2.4648280 -1.1752983 . -1.25897616 0.91819777
## [3,] -0.9789670 0.7757083 -0.2713795 . 0.05450334 0.55718128
## [4,] -0.5900174 1.1207118 -0.9881171 . 0.18314763 -0.03210485
## [5,] -0.6819769 -0.6274079 -1.6778280 . -2.01718393 -1.25315264
## ... . . . . . .
## [96,] -0.294621789 -0.312473326 -2.216667112 . -1.62822276 -0.06777314
## [97,] -1.613081383 1.114818880 0.556402603 . 1.00656224 0.35481390
## [98,] -0.507683400 0.488396183 -0.925076289 . 0.39501461 0.32625378
## [99,] -0.894304072 0.521825008 1.298977515 . 1.06695475 0.38587367
## [100,] -0.005133199 -1.393789354 -1.147119517 . 0.44528568 1.54176133
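One reason to choose the path explicitly is to reopen the same array later. The sketch below assumes that the `TileDBArray()` constructor accepts a path to an existing array together with an `attr` argument naming the attribute to read; the variable names are illustrative.

```r
library(TileDBArray)

set.seed(6)  # arbitrary seed, used only to make the sketch reproducible
X <- matrix(rnorm(1000), ncol=10)

# Write to a known location under a custom attribute name.
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the array from its path (assumption: the constructor takes
# the path and the attribute name).
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)
all.equal(as.matrix(reopened), X, check.attributes=FALSE)
```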
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.1092763 -0.7077284 0.4161023 . 0.59305421 -2.22780971
## [2,] 0.5358421 -2.4648280 -1.1752983 . -1.25897616 0.91819777
## [3,] -0.9789670 0.7757083 -0.2713795 . 0.05450334 0.55718128
## [4,] -0.5900174 1.1207118 -0.9881171 . 0.18314763 -0.03210485
## [5,] -0.6819769 -0.6274079 -1.6778280 . -2.01718393 -1.25315264
## ... . . . . . .
## [96,] -0.294621789 -0.312473326 -2.216667112 . -1.62822276 -0.06777314
## [97,] -1.613081383 1.114818880 0.556402603 . 1.00656224 0.35481390
## [98,] -0.507683400 0.488396183 -0.925076289 . 0.39501461 0.32625378
## [99,] -0.894304072 0.521825008 1.298977515 . 1.06695475 0.38587367
## [100,] -0.005133199 -1.393789354 -1.147119517 . 0.44528568 1.54176133
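A sketch of the set and unset cycle follows. It assumes that `getTileDBPath()` reports the currently configured value and that calling `setTileDBPath(NULL)` restores the default behaviour; `path3` and `current` are illustrative names.

```r
library(TileDBArray)

set.seed(7)  # arbitrary seed, used only to make the sketch reproducible
X <- matrix(rnorm(1000), ncol=10)

# Set the global path and confirm that it has been recorded.
path3 <- tempfile()
setTileDBPath(path3)
current <- getTileDBPath()
current

# Coercion now writes the backend to the configured location.
out <- as(X, "TileDBArray")
file.exists(path3)

# Unset the global so that later writes do not target the same path.
setTileDBPath(NULL)
```

Unsetting matters because the configured path stays in effect for every later write in the session, and a second array cannot be created at a location that is already occupied.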
sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-apple-darwin20
## Running under: macOS Monterey 12.7.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.21 TileDBArray_1.18.0 DelayedArray_0.34.0
## [4] SparseArray_1.8.0 S4Arrays_1.8.0 IRanges_2.42.0
## [7] abind_1.4-8 S4Vectors_0.46.0 MatrixGenerics_1.20.0
## [10] matrixStats_1.5.0 BiocGenerics_0.54.0 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.48.0 tiledb_0.30.2
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.4
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.0 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1