DelayedTensor 1.12.0
Authors: Koki Tsuyuzaki [aut, cre]
Last modified: 2024-10-29 16:15:12.703724
Compiled: Tue Oct 29 20:20:56 2024
suppressPackageStartupMessages(library("DelayedTensor"))
suppressPackageStartupMessages(library("DelayedArray"))
suppressPackageStartupMessages(library("HDF5Array"))
suppressPackageStartupMessages(library("DelayedRandomArray"))
darr1 <- RandomUnifArray(c(2,3,4))
darr2 <- RandomUnifArray(c(2,3,4))
There are several settings in DelayedTensor.
First, the sparsity of the intermediate DelayedArray objects
calculated inside DelayedTensor is set by setSparse.
Note that the sparse mode is experimental.
Whether it improves speed and memory usage depends heavily on the sparsity of the DelayedArray. Moreover, the current implementation does not take the block size into account, which may cause out-of-memory errors when the data are extremely large.
Here, we set as.sparse to FALSE (this is also the current default).
DelayedTensor::setSparse(as.sparse=FALSE)
Next, verbose messages are controlled by setVerbose.
Enabling them is useful when we want to monitor the calculation process.
Here, we set as.verbose to FALSE, which suppresses the messages (this is also the current default).
DelayedTensor::setVerbose(as.verbose=FALSE)
The block size used in block processing is specified by setAutoBlockSize.
When the sparse mode is off, all the functions of DelayedTensor
are performed as block processing.
In block processing, each block vector/matrix/tensor is loaded into memory
from the on-disk file incrementally, so that the specified size is not exceeded.
Here, we set the block size to 1E+8 bytes.
setAutoBlockSize(size=1E+8)
## automatic block size set to 1e+08 bytes (was 1e+08)
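The following base-R arithmetic illustrates what the 1E+8-byte limit means. It converts the block size into an approximate number of double-precision elements per block. It assumes 8 bytes per double and ignores any per-object overhead that DelayedArray's block processing may add, so treat the result as a rough upper bound.

```r
# Block size set above, in bytes
block_bytes <- 1e8
# A double-precision value occupies 8 bytes in R
bytes_per_double <- 8
# Approximate number of doubles that fit in one block
max_elems <- block_bytes / bytes_per_double
max_elems
# darr1 (2 x 3 x 4 = 24 doubles) therefore fits easily into a single block
n_darr1 <- prod(c(2, 3, 4))
n_darr1 <= max_elems
```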
Finally, the temporary directory used to store the intermediate HDF5 files while DelayedTensor is running is specified by setHDF5DumpDir.
Note that on many systems the /var directory has limited storage. If there is not enough space there, users should specify another directory.
# Alternatively, create a uniquely named directory on a volume with enough space:
# tmpdir <- paste(sample(c(letters,1:9), 10), collapse="")
# dir.create(tmpdir, recursive=TRUE)
tmpdir <- tempdir()
setHDF5DumpDir(tmpdir)
The current values can be retrieved with the corresponding getter functions.
DelayedTensor::getSparse()
## $delayedtensor.sparse
## [1] FALSE
DelayedTensor::getVerbose()
## $delayedtensor.verbose
## [1] FALSE
getAutoBlockSize()
## [1] 1e+08
getHDF5DumpDir()
## [1] "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpnZjnFQ"
Unfold (a.k.a. matricization) operations reshape a tensor into a matrix.
In unfold, row_idx and col_idx specify which modes are used
as the rows and the columns, respectively.
dmat1 <- DelayedTensor::unfold(darr1, row_idx=c(1,2), col_idx=3)
dmat1
## <6 x 4> HDF5Matrix object of type "double":
## [,1] [,2] [,3] [,4]
## [1,] 0.37268954 0.23070368 0.52094553 0.21342588
## [2,] 0.22935061 0.64118710 0.25782725 0.22290763
## [3,] 0.27138673 0.09924609 0.75624767 0.31642839
## [4,] 0.44581097 0.69191234 0.89248589 0.61861881
## [5,] 0.75288945 0.03899091 0.88588288 0.33481013
## [6,] 0.75134079 0.66556809 0.85359927 0.93574799
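For intuition, here is a base-R sketch of the same reshaping on an ordinary in-memory array. The helper unfold_base is hypothetical and does not call DelayedTensor. It assumes the ordering visible in the output above: the row modes vary with the earlier mode fastest, as in R's column-major layout.

```r
# Hypothetical helper: base-R analogue of unfold for ordinary arrays
unfold_base <- function(arr, row_idx, col_idx) {
  d <- dim(arr)
  # Bring the row modes first and the column modes last,
  # then read the permuted array in column-major order
  perm <- aperm(arr, c(row_idx, col_idx))
  matrix(perm, nrow = prod(d[row_idx]), ncol = prod(d[col_idx]))
}

set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
mat <- unfold_base(arr, row_idx = c(1, 2), col_idx = 3)
dim(mat)
# Row index runs over mode 1 fastest, then mode 2: row 3 is (i = 1, j = 2)
mat[3, 2] == arr[1, 2, 2]
```

Swapping the roles (row_idx = 3, col_idx = c(1, 2)) simply yields the transpose of this matrix.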
fold is the inverse operation of unfold; it reshapes
a matrix into a tensor.
In fold, row_idx and col_idx specify which modes of the output tensor correspond
to the rows and the columns of the input matrix, and modes
specifies the dimensions of the output tensor.
dmat1_to_darr1 <- DelayedTensor::fold(dmat1,
row_idx=c(1,2), col_idx=3, modes=dim(darr1))
dmat1_to_darr1
## <2 x 3 x 4> DelayedArray object of type "double":
## ,,1
## [,1] [,2] [,3]
## [1,] 0.3726895 0.2713867 0.7528894
## [2,] 0.2293506 0.4458110 0.7513408
##
## ,,2
## [,1] [,2] [,3]
## [1,] 0.23070368 0.09924609 0.03899091
## [2,] 0.64118710 0.69191234 0.66556809
##
## ,,3
## [,1] [,2] [,3]
## [1,] 0.5209455 0.7562477 0.8858829
## [2,] 0.2578273 0.8924859 0.8535993
##
## ,,4
## [,1] [,2] [,3]
## [1,] 0.2134259 0.3164284 0.3348101
## [2,] 0.2229076 0.6186188 0.9357480
identical(as.array(darr1), as.array(dmat1_to_darr1))
## [1] TRUE
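The inverse can be sketched in base R by refilling an array in the permuted mode order and then undoing the permutation with aperm. The helpers unfold_base and fold_base are hypothetical illustrations on ordinary arrays, not the DelayedTensor implementation.

```r
# Hypothetical helpers: base-R unfold and its inverse fold for ordinary arrays
unfold_base <- function(arr, row_idx, col_idx) {
  d <- dim(arr)
  matrix(aperm(arr, c(row_idx, col_idx)),
         nrow = prod(d[row_idx]), ncol = prod(d[col_idx]))
}
fold_base <- function(mat, row_idx, col_idx, modes) {
  perm <- c(row_idx, col_idx)
  # Refill an array whose dimensions follow the permuted mode order ...
  tmp <- array(mat, dim = modes[perm])
  # ... and undo the permutation to recover the original mode order
  aperm(tmp, order(perm))
}

set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
mat <- unfold_base(arr, row_idx = 3, col_idx = c(1, 2))
back <- fold_base(mat, row_idx = 3, col_idx = c(1, 2), modes = dim(arr))
identical(back, arr)
```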
There are some wrapper functions of unfold and fold.
For example, in k_unfold, mode m is used as the row, and the other modes
are used as the column.
k_fold is the inverse operation of k_unfold.
dmat2 <- DelayedTensor::k_unfold(darr1, m=1)
dmat2_to_darr1 <- k_fold(dmat2, m=1, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat2_to_darr1))
## [1] TRUE
dmat3 <- DelayedTensor::k_unfold(darr1, m=2)
dmat3_to_darr1 <- k_fold(dmat3, m=2, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat3_to_darr1))
## [1] TRUE
dmat4 <- DelayedTensor::k_unfold(darr1, m=3)
dmat4_to_darr1 <- k_fold(dmat4, m=3, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat4_to_darr1))
## [1] TRUE
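A base-R sketch of the mode-m unfolding and its inverse is shown below. It assumes k_unfold behaves like unfold with row_idx = m and col_idx set to the remaining modes in increasing order. The helper names are hypothetical.

```r
# Hypothetical helper: mode m as rows, remaining modes (in order) as columns
k_unfold_base <- function(arr, m) {
  d <- dim(arr)
  others <- setdiff(seq_along(d), m)
  matrix(aperm(arr, c(m, others)), nrow = d[m], ncol = prod(d[others]))
}
# Hypothetical inverse: refill in permuted order, then undo the permutation
k_fold_base <- function(mat, m, modes) {
  perm <- c(m, setdiff(seq_along(modes), m))
  aperm(array(mat, dim = modes[perm]), order(perm))
}

set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
for (m in 1:3) {
  mat <- k_unfold_base(arr, m)
  cat("mode", m, ":", dim(mat), "\n")   # 2 x 12, 3 x 8, 4 x 6
  stopifnot(isTRUE(all.equal(k_fold_base(mat, m, dim(arr)), arr)))
}
```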
In rs_unfold, mode m is used as the row, and the other modes
are used as the column.
Thus, rs_unfold and rs_fold perform the same operations as k_unfold and k_fold.
On the other hand, cs_unfold uses mode m
as the column
and the other modes as the row.
cs_fold is the inverse operation of cs_unfold.
dmat8 <- DelayedTensor::cs_unfold(darr1, m=1)
dmat8_to_darr1 <- DelayedTensor::cs_fold(dmat8, m=1, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat8_to_darr1))
## [1] TRUE
dmat9 <- DelayedTensor::cs_unfold(darr1, m=2)
dmat9_to_darr1 <- DelayedTensor::cs_fold(dmat9, m=2, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat9_to_darr1))
## [1] TRUE
dmat10 <- DelayedTensor::cs_unfold(darr1, m=3)
dmat10_to_darr1 <- DelayedTensor::cs_fold(dmat10, m=3, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat10_to_darr1))
## [1] TRUE
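Because cs_unfold swaps the roles of rows and columns, its result should be the transpose of the mode-m unfolding, provided the remaining modes are ordered the same way in both. The base-R sketch below checks this relation under that assumption. The helper names are hypothetical.

```r
# Hypothetical helpers on ordinary arrays
k_unfold_base <- function(arr, m) {
  d <- dim(arr)
  others <- setdiff(seq_along(d), m)
  matrix(aperm(arr, c(m, others)), nrow = d[m], ncol = prod(d[others]))
}
cs_unfold_base <- function(arr, m) {
  d <- dim(arr)
  others <- setdiff(seq_along(d), m)
  # Remaining modes as rows, mode m as the column
  matrix(aperm(arr, c(others, m)), nrow = prod(d[others]), ncol = d[m])
}

set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
mat_cs <- cs_unfold_base(arr, m = 2)
dim(mat_cs)   # 8 rows (modes 1 and 3), 3 columns (mode 2)
all.equal(mat_cs, t(k_unfold_base(arr, m = 2)))
```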
matvec performs the unfolding with m=2 fixed, so no mode needs to be specified.
unmatvec is the inverse operation of matvec.
dmat11 <- DelayedTensor::matvec(darr1)
dmat11_darr1 <- DelayedTensor::unmatvec(dmat11, modes=dim(darr1))
identical(as.array(darr1), as.array(dmat11_darr1))
## [1] TRUE
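ttm multiplies a tensor by a matrix (the mode-m product).
m specifies the mode along which the matrix is multiplied. The number of columns of the matrix must equal the size of that mode, which is replaced by the number of rows of the matrix in the output (here, mode 3 changes from 4 to 10).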
dmatZ <- RandomUnifArray(c(10,4))
DelayedTensor::ttm(darr1, dmatZ, m=3)
## <2 x 3 x 10> DelayedArray object of type "double":
## ,,1
## [,1] [,2] [,3]
## [1,] 0.3216056 0.3957505 0.5466124
## [2,] 0.2881927 0.7352537 1.1043620
##
## ,,2
## [,1] [,2] [,3]
## [1,] 0.6357032 0.6344012 0.8686476
## [2,] 0.8163952 1.4508602 1.8942176
##
## ,,3
## [,1] [,2] [,3]
## [1,] 0.4872393 0.4700048 0.8573140
## [2,] 0.3845081 0.7977850 1.0936370
##
## ...
##
## ,,8
## [,1] [,2] [,3]
## [1,] 0.6503983 0.7490948 0.9885128
## [2,] 0.6036312 1.2599582 1.4441460
##
## ,,9
## [,1] [,2] [,3]
## [1,] 0.6162641 0.7211606 1.0989754
## [2,] 0.3780567 0.9988717 1.2163381
##
## ,,10
## [,1] [,2] [,3]
## [1,] 0.6872269 0.6436075 1.0473110
## [2,] 0.7002473 1.2189649 1.4425412
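The mode-m product can be sketched in base R in three steps: unfold along mode m, left-multiply by the matrix, and fold back with mode m resized. The helper ttm_base is hypothetical and works on ordinary arrays. The final line checks one entry against the definition Y[i, j, k] = sum over l of X[i, j, l] * U[k, l].

```r
# Hypothetical helper: mode-m product via unfold -> multiply -> fold
ttm_base <- function(arr, mat, m) {
  d <- dim(arr)
  perm <- c(m, setdiff(seq_along(d), m))
  unfolded <- matrix(aperm(arr, perm), nrow = d[m])   # mode-m unfolding
  prod_mat <- mat %*% unfolded                          # nrow(mat) x prod(other modes)
  new_d <- d
  new_d[m] <- nrow(mat)                                 # mode m takes the matrix's row count
  aperm(array(prod_mat, dim = new_d[perm]), order(perm))
}

set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
U <- matrix(runif(40), nrow = 10, ncol = 4)   # ncol must equal dim(arr)[3]
res <- ttm_base(arr, U, m = 3)
dim(res)
all.equal(res[2, 3, 7], sum(arr[2, 3, ] * U[7, ]))
```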
ttl multiplies a tensor by multiple matrices.
ms specifies the modes along which these matrices are multiplied, in the same order as the list.
dmatX <- RandomUnifArray(c(10,2))
dmatY <- RandomUnifArray(c(10,3))
dlizt <- list(dmatX = dmatX, dmatY = dmatY)
DelayedTensor::ttl(darr1, dlizt, ms=c(1,2))
## <10 x 10 x 4> DelayedArray object of type "double":
## ,,1
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.8321728 1.0965127 0.5574265 . 0.4347946 0.2268336
## [2,] 0.9338396 1.2846868 0.6276177 . 0.5146882 0.2823798
## ... . . . . . .
## [9,] 0.2119105 0.2806916 0.1420037 . 0.11144412 0.05851608
## [10,] 0.8276906 1.1231607 0.5556793 . 0.44853050 0.24232568
##
## ...
##
## ,,4
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.6520063 0.9621121 0.4845834 . 0.4077687 0.2226368
## [2,] 0.9899596 1.4466917 0.6842965 . 0.5933091 0.3400382
## ... . . . . . .
## [9,] 0.1730243 0.2549357 0.1272018 . 0.10751150 0.05913582
## [10,] 0.8035982 1.1773307 0.5663483 . 0.48707204 0.27560242
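Conceptually, ttl can be seen as a sequence of mode products, one per matrix. The base-R sketch below assumes that reading and reuses a hypothetical ttm_base helper. Because the matrices act on distinct modes, the order in which they are applied does not change the result.

```r
# Hypothetical helper: mode-m product on ordinary arrays
ttm_base <- function(arr, mat, m) {
  d <- dim(arr)
  perm <- c(m, setdiff(seq_along(d), m))
  prod_mat <- mat %*% matrix(aperm(arr, perm), nrow = d[m])
  new_d <- d
  new_d[m] <- nrow(mat)
  aperm(array(prod_mat, dim = new_d[perm]), order(perm))
}
# Hypothetical helper: apply each matrix to its mode in turn
ttl_base <- function(arr, mats, ms) {
  for (i in seq_along(mats)) arr <- ttm_base(arr, mats[[i]], ms[i])
  arr
}

set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
X <- matrix(runif(20), 10, 2)
Y <- matrix(runif(30), 10, 3)
res <- ttl_base(arr, list(X, Y), ms = c(1, 2))
dim(res)
# Definition: res[i, j, k] = sum over a, b of X[i, a] * Y[j, b] * arr[a, b, k]
all.equal(res[4, 5, 2], sum(outer(X[4, ], Y[5, ]) * arr[, , 2]))
```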
vec collapses a DelayedArray into a 1D DelayedArray (vector).
DelayedTensor::vec(darr1)
## <24> HDF5Array object of type "double":
## [1] [2] [3] . [23] [24]
## 0.3726895 0.2293506 0.2713867 . 0.3348101 0.9357480
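The printed values follow R's column-major order: they match darr1[1,1,1], darr1[2,1,1] and darr1[1,2,1] in the folded output shown earlier. On an ordinary array, base R's as.vector gives the same ordering. The sketch below illustrates this; it does not call DelayedTensor.

```r
set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
v <- as.vector(arr)   # drops the dim attribute, keeping column-major order
length(v)
# Mode 1 varies fastest: v[2] is arr[2, 1, 1] and v[3] is arr[1, 2, 1]
c(v[2] == arr[2, 1, 1], v[3] == arr[1, 2, 1])
```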
fnorm calculates the Frobenius norm of a DelayedArray.
DelayedTensor::fnorm(darr1)
## [1] 2.788941
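The Frobenius norm is the square root of the sum of squared entries over all indices. The base-R sketch below computes it on an ordinary array. It also checks that unfolding does not change the value, by comparing against base R's norm(type = "F") on a matricized copy.

```r
set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
# ||X||_F = sqrt(sum of x_ijk^2 over all indices)
fnorm_base <- sqrt(sum(arr^2))
# Unfolding only rearranges entries, so the matrix Frobenius norm agrees
mat <- matrix(arr, nrow = 6)
all.equal(fnorm_base, norm(mat, type = "F"))
```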
innerProd calculates the inner product of two DelayedArray objects.
DelayedTensor::innerProd(darr1, darr2)
## [1] 6.13551
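The inner product of two same-shaped tensors is the sum of their element-wise products. This equals the dot product of the vectorized tensors. The base-R sketch below shows this on ordinary arrays.

```r
set.seed(1)
a <- array(runif(24), dim = c(2, 3, 4))
b <- array(runif(24), dim = c(2, 3, 4))
ip <- sum(a * b)   # <A, B> = sum over all indices of a_ijk * b_ijk
# Same value as the dot product of the vectorized tensors
all.equal(ip, drop(crossprod(as.vector(a), as.vector(b))))
# The inner product of a tensor with itself is its squared Frobenius norm
all.equal(sum(a * a), sqrt(sum(a^2))^2)
```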
The inner product multiplies two tensors element-wise and sums the result into a 0D tensor (a scalar). In contrast, the outer product keeps all indices of both inputs, so the output has the modes of both tensors.
DelayedTensor::outerProd(darr1[,,1], darr2[,,1])
## <2 x 3 x 2 x 3> HDF5Array object of type "double":
## ,,1,1
## [,1] [,2] [,3]
## [1,] 0.2377675 0.1731386 0.4803265
## [2,] 0.1463205 0.2844173 0.4793385
##
## ,,2,1
## [,1] [,2] [,3]
## [1,] 0.3427251 0.2495671 0.6923567
## [2,] 0.2109107 0.4099675 0.6909326
##
## ,,1,2
## [,1] [,2] [,3]
## [1,] 0.07911037 0.05760695 0.15981496
## [2,] 0.04868399 0.09463177 0.15948623
##
## ,,2,2
## [,1] [,2] [,3]
## [1,] 0.01638981 0.01193480 0.03310990
## [2,] 0.01008618 0.01960548 0.03304180
##
## ,,1,3
## [,1] [,2] [,3]
## [1,] 0.10470566 0.07624503 0.21152133
## [2,] 0.06443515 0.12524884 0.21108624
##
## ,,2,3
## [,1] [,2] [,3]
## [1,] 0.11955753 0.08705994 0.24152436
## [2,] 0.07357489 0.14301464 0.24102756
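On ordinary arrays, base R's outer computes the same kind of product. The result has dimension c(dim(A), dim(B)), and op[i, j, k, l] = A[i, j] * B[k, l]. Each printed slice ,,k,l above is therefore the first matrix scaled by one entry of the second. This is an illustrative sketch, not the DelayedTensor code path.

```r
set.seed(1)
A <- matrix(runif(6), 2, 3)
B <- matrix(runif(6), 2, 3)
op <- outer(A, B)   # dim = c(dim(A), dim(B)), i.e. 2 x 3 x 2 x 3
dim(op)
# Each element is one element of A times one element of B
all.equal(op[2, 3, 1, 2], A[2, 3] * B[1, 2])
# Each slice op[, , k, l] is A scaled by B[k, l]
all.equal(op[, , 2, 1], A * B[2, 1])
```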
Using DelayedDiagonalArray, we can create a diagonal DelayedArray from scratch
by specifying the dimensions (modes) and the diagonal values.
dgdarr <- DelayedTensor::DelayedDiagonalArray(c(5,6,7), 1:5)
dgdarr
## <5 x 6 x 7> sparse DelayedArray object of type "integer":
## ,,1
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 0 0 0 0 0
## [2,] 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0
##
## ...
##
## ,,7
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0
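For comparison, a dense base-R version places the i-th value at position (i, i, ..., i) using matrix indexing. The helper diag_array_base is hypothetical. Unlike the sparse DelayedArray above, it stores every zero explicitly and always produces a double array.

```r
# Hypothetical helper: dense superdiagonal array on an ordinary R array
diag_array_base <- function(dims, values) {
  stopifnot(length(values) <= min(dims))
  arr <- array(0, dim = dims)
  i <- seq_along(values)
  # Each row of idx is (i, i, ..., i), one column per mode
  idx <- matrix(i, nrow = length(i), ncol = length(dims))
  arr[idx] <- values
  arr
}

dg <- diag_array_base(c(5, 6, 7), 1:5)
dg[3, 3, 3]
sum(dg != 0)
```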
Similar to diag in the base package,
diag in DelayedTensor extracts
and assigns the diagonal values of a DelayedArray.
DelayedTensor::diag(dgdarr)
## <5> DelayedArray object of type "integer":
## [1] [2] [3] [4] [5]
## 1 2 3 4 5
DelayedTensor::diag(dgdarr) <- c(1111, 2222, 3333, 4444, 5555)
DelayedTensor::diag(dgdarr)
## <5> DelayedArray object of type "double":
## [1] [2] [3] [4] [5]
## 1111 2222 3333 4444 5555
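The same extract/assign pattern can be expressed on an ordinary array with matrix indexing over the superdiagonal positions (i, i, i). This is a base-R illustration only; it does not use DelayedTensor::diag.

```r
arr <- array(0, dim = c(5, 6, 7))
n <- min(dim(arr))                                            # length of the superdiagonal
idx <- matrix(seq_len(n), nrow = n, ncol = length(dim(arr)))  # rows are (i, i, i)
arr[idx] <- 1:5
arr[idx]                                                      # extract: 1 2 3 4 5
arr[idx] <- c(1111, 2222, 3333, 4444, 5555)                   # assign new diagonal values
arr[idx]
```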
modeSum calculates the sum over a given mode m of a DelayedArray.
The mode specified as m is collapsed to size 1, as follows.
DelayedTensor::modeSum(darr1, m=1)
## <1 x 3 x 4> DelayedArray object of type "double":
## ,,1
## [,1] [,2] [,3]
## [1,] 0.6020402 0.7171977 1.5042302
##
## ,,2
## [,1] [,2] [,3]
## [1,] 0.8718908 0.7911584 0.7045590
##
## ,,3
## [,1] [,2] [,3]
## [1,] 0.7787728 1.6487336 1.7394821
##
## ,,4
## [,1] [,2] [,3]
## [1,] 0.4363335 0.9350472 1.2705581
DelayedTensor::modeSum(darr1, m=2)
## <2 x 1 x 4> DelayedArray object of type "double":
## ,,1
## [,1]
## [1,] 1.396966
## [2,] 1.426502
##
## ,,2
## [,1]
## [1,] 0.3689407
## [2,] 1.9986675
##
## ,,3
## [,1]
## [1,] 2.163076
## [2,] 2.003912
##
## ,,4
## [,1]
## [1,] 0.8646644
## [2,] 1.7772744
DelayedTensor::modeSum(darr1, m=3)
## <2 x 3 x 1> DelayedArray object of type "double":
## ,,1
## [,1] [,2] [,3]
## [1,] 1.337765 1.443309 2.012573
## [2,] 1.351273 2.648828 3.206256
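For example, the first entry of modeSum(darr1, m=1) above, 0.6020402, is darr1[1,1,1] + darr1[2,1,1] (0.3726895 + 0.2293506, up to rounding). The base-R sketch below reproduces this behaviour on an ordinary array. It sums out mode m with apply and reshapes so that mode m remains as a singleton dimension. The helper name is hypothetical.

```r
# Hypothetical helper: sum over mode m, keeping that mode with size 1
mode_sum_base <- function(arr, m) {
  d <- dim(arr)
  s <- apply(arr, setdiff(seq_along(d), m), sum)   # sums out mode m
  d[m] <- 1L
  array(s, dim = d)
}

set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
dim(mode_sum_base(arr, m = 2))
all.equal(mode_sum_base(arr, 2)[1, 1, 3], sum(arr[1, , 3]))
```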
Similar to modeSum, modeMean calculates the average value
for a given mode m of a DelayedArray.
DelayedTensor::modeMean(darr1, m=1)
## <1 x 3 x 4> DelayedArray object of type "double":
## ,,1
## [,1] [,2] [,3]
## [1,] 0.3010201 0.3585989 0.7521151
##
## ,,2
## [,1] [,2] [,3]
## [1,] 0.4359454 0.3955792 0.3522795
##
## ,,3
## [,1] [,2] [,3]
## [1,] 0.3893864 0.8243668 0.8697411
##
## ,,4
## [,1] [,2] [,3]
## [1,] 0.2181668 0.4675236 0.6352791
DelayedTensor::modeMean(darr1, m=2)
## <2 x 1 x 4> DelayedArray object of type "double":
## ,,1
## [,1]
## [1,] 0.4656552
## [2,] 0.4755008
##
## ,,2
## [,1]
## [1,] 0.1229802
## [2,] 0.6662225
##
## ,,3
## [,1]
## [1,] 0.7210254
## [2,] 0.6679708
##
## ,,4
## [,1]
## [1,] 0.2882215
## [2,] 0.5924248
DelayedTensor::modeMean(darr1, m=3)
## <2 x 3 x 1> DelayedArray object of type "double":
## ,,1
## [,1] [,2] [,3]
## [1,] 0.3344412 0.3608272 0.5031433
## [2,] 0.3378181 0.6622070 0.8015640
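The printed results are consistent with modeMean being modeSum divided by the size of mode m (for example, 1.337765 / 4 = 0.3344412 for m=3). The base-R sketch below checks that relation on an ordinary array.

```r
set.seed(1)
arr <- array(runif(24), dim = c(2, 3, 4))
m <- 3
keep <- setdiff(seq_along(dim(arr)), m)
new_dim <- replace(dim(arr), m, 1L)   # mode m kept as a singleton dimension
mode_mean <- array(apply(arr, keep, mean), dim = new_dim)
mode_sum <- array(apply(arr, keep, sum), dim = new_dim)
# The mean over mode m is the sum over mode m divided by its size
all.equal(mode_mean, mode_sum / dim(arr)[m])
```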
There are several tensor-specific products, such as the Hadamard product, the Kronecker product, and the Khatri-Rao product.
Suppose matrices \(A \in \Re ^{I \times J}\) and \(B \in \Re ^{I \times J}\).
The Hadamard product is defined as the element-wise product of \(A\) and \(B\).
The Hadamard product can be extended to higher-order tensors.
\[ A \circ B = \begin{bmatrix} a_{11}b_{11} & a_{12}b_{12} & \cdots & a_{1J}b_{1J} \\ a_{21}b_{21} & a_{22}b_{22} & \cdots & a_{2J}b_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1}b_{I1} & a_{I2}b_{I2} & \cdots & a_{IJ}b_{IJ} \\ \end{bmatrix} \]
hadamard calculates the Hadamard product of two DelayedArray objects.
prod_h <- DelayedTensor::hadamard(darr1, darr2)
dim(prod_h)
## [1] 2 3 4
hadamard_list calculates the Hadamard product of multiple DelayedArray objects.
prod_hl <- DelayedTensor::hadamard_list(list(darr1, darr2))
dim(prod_hl)
## [1] 2 3 4
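On ordinary arrays of identical dimensions, the Hadamard product is R's element-wise `*`. A list version can be sketched by folding `*` over the list with Reduce. This is a base-R illustration, not the DelayedTensor implementation.

```r
set.seed(1)
a <- array(runif(24), dim = c(2, 3, 4))
b <- array(runif(24), dim = c(2, 3, 4))
c3 <- array(runif(24), dim = c(2, 3, 4))
# Hadamard product of two arrays: element-wise product, same dimensions
h2 <- a * b
# Hadamard product of a list of arrays, folded pairwise from left to right
h3 <- Reduce(`*`, list(a, b, c3))
dim(h3)
all.equal(h3[2, 3, 4], a[2, 3, 4] * b[2, 3, 4] * c3[2, 3, 4])
```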
Suppose matrices \(A \in \Re ^{I \times J}\) and \(B \in \Re ^{K \times L}\).
The Kronecker product is defined as the products of all possible pairs of elements of \(A\) and \(B\), arranged so that the output has dimensions \({IK \times JL}\).
The Kronecker product can be extended to higher-order tensors.
\[ A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1J}B \\ a_{21}B & a_{22}B & \cdots & a_{2J}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1}B & a_{I2}B & \cdots & a_{IJ}B \\ \end{bmatrix} \]
kronecker calculates the Kronecker product of two DelayedArray objects.
prod_kron <- DelayedTensor::kronecker(darr1, darr2)
dim(prod_kron)
## [1] 4 9 16
kronecker_list calculates the Kronecker product of multiple DelayedArray objects.
prod_kronl <- DelayedTensor::kronecker_list(list(darr1, darr2))
dim(prod_kronl)
## [1] 4 9 16
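Base R's kronecker also accepts arrays with the same number of dimensions, which makes the index layout easy to inspect. Element (i, j, l) of a times element (p, q, r) of b lands at ((i - 1) * 2 + p, (j - 1) * 3 + q, (l - 1) * 4 + r) for 2 x 3 x 4 inputs. This sketch uses base R only. The list version is assumed to behave like folding the binary product over the list.

```r
set.seed(1)
a <- array(runif(24), dim = c(2, 3, 4))
b <- array(runif(24), dim = c(2, 3, 4))
k <- kronecker(a, b)   # dimensions multiply mode by mode: 4 x 9 x 16
dim(k)
# a[2, 3, 4] * b[1, 2, 3] sits at ((2-1)*2+1, (3-1)*3+2, (4-1)*4+3) = (3, 8, 15)
all.equal(k[3, 8, 15], a[2, 3, 4] * b[1, 2, 3])
# Folding kronecker over a list of matrices: 2x3, 2x3, 2x3 -> 8 x 27
k3 <- Reduce(kronecker, list(a[, , 1], b[, , 1], a[, , 2]))
dim(k3)
```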
Suppose matrices \(A = [a_{1}, a_{2}, \ldots, a_{J}] \in \Re ^{I \times J}\) and \(B = [b_{1}, b_{2}, \ldots, b_{J}] \in \Re ^{K \times J}\), where \(a_{j}\) and \(b_{j}\) denote the \(j\)-th columns.
The Khatri-Rao product is defined as the column-wise Kronecker product, and the output has dimensions \({IK \times J}\).
\[ A \odot B = \begin{bmatrix} a_{1} \otimes b_{1} & a_{2} \otimes b_{2} & \cdots & a_{J} \otimes b_{J} \\ \end{bmatrix} \]
The Khatri-Rao product can only be used for 2D tensors (matrices).
khatri_rao calculates the Khatri-Rao product of two DelayedArray objects.
prod_kr <- DelayedTensor::khatri_rao(darr1[,,1], darr2[,,1])
dim(prod_kr)
## [1] 4 3
khatri_rao_list calculates the Khatri-Rao product of multiple DelayedArray objects.
prod_krl <- DelayedTensor::khatri_rao_list(list(darr1[,,1], darr2[,,1]))
dim(prod_krl)
## [1] 4 3
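Following the definition, the Khatri-Rao product can be sketched in base R by taking the Kronecker product of matching columns. The helper khatri_rao_base is hypothetical and works on ordinary matrices. The list version is assumed to fold the binary product over the list.

```r
# Hypothetical helper: column-wise Kronecker product of two matrices with equal ncol
khatri_rao_base <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  sapply(seq_len(ncol(A)), function(j) kronecker(A[, j], B[, j]))
}

set.seed(1)
A <- matrix(runif(6), 2, 3)
B <- matrix(runif(6), 2, 3)
kr <- khatri_rao_base(A, B)
dim(kr)   # (2 * 2) x 3
# Column j is a_j (x) b_j: entries a_1 b_1, a_1 b_2, a_2 b_1, a_2 b_2
all.equal(kr[, 2], as.vector(outer(B[, 2], A[, 2])))
# Folding over a list: (2*2*2) x 3
kr3 <- Reduce(khatri_rao_base, list(A, B, A))
dim(kr3)
```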
list_rep replicates any R object an arbitrary number of times and returns the copies as a list.
str(DelayedTensor::list_rep(darr1, 3))
## List of 3
## $ :Formal class 'RandomUnifArray' [package "DelayedRandomArray"] with 1 slot
## .. ..@ seed:Formal class 'RandomUnifArraySeed' [package "DelayedRandomArray"] with 6 slots
## .. .. .. ..@ min : num 0
## .. .. .. ..@ max : num 1
## .. .. .. ..@ dim : int [1:3] 2 3 4
## .. .. .. ..@ chunkdim: int [1:3] 2 3 4
## .. .. .. ..@ seeds :List of 1
## .. .. .. .. ..$ : int [1:2] 1910848067 473666656
## .. .. .. ..@ sparse : logi FALSE
## $ :Formal class 'RandomUnifArray' [package "DelayedRandomArray"] with 1 slot
## .. ..@ seed:Formal class 'RandomUnifArraySeed' [package "DelayedRandomArray"] with 6 slots
## .. .. .. ..@ min : num 0
## .. .. .. ..@ max : num 1
## .. .. .. ..@ dim : int [1:3] 2 3 4
## .. .. .. ..@ chunkdim: int [1:3] 2 3 4
## .. .. .. ..@ seeds :List of 1
## .. .. .. .. ..$ : int [1:2] 1910848067 473666656
## .. .. .. ..@ sparse : logi FALSE
## $ :Formal class 'RandomUnifArray' [package "DelayedRandomArray"] with 1 slot
## .. ..@ seed:Formal class 'RandomUnifArraySeed' [package "DelayedRandomArray"] with 6 slots
## .. .. .. ..@ min : num 0
## .. .. .. ..@ max : num 1
## .. .. .. ..@ dim : int [1:3] 2 3 4
## .. .. .. ..@ chunkdim: int [1:3] 2 3 4
## .. .. .. ..@ seeds :List of 1
## .. .. .. .. ..$ : int [1:2] 1910848067 473666656
## .. .. .. ..@ sparse : logi FALSE
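In base R, a comparable list of copies can be built with rep(list(x), n) or replicate(n, x, simplify = FALSE). The sketch below shows both on an ordinary array, for comparison with the str() output above.

```r
x <- array(1:24, dim = c(2, 3, 4))
reps <- rep(list(x), 3)                          # a list holding 3 copies of x
reps2 <- replicate(3, x, simplify = FALSE)        # equivalent construction
length(reps)
identical(reps[[3]], x)
identical(reps, reps2)
```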
modebind_list binds multiple DelayedArray objects
into a single DelayedArray object.
m specifies the mode along which the objects are bound.
dim(DelayedTensor::modebind_list(list(darr1, darr2), m=1))
## [1] 4 3 4
dim(DelayedTensor::modebind_list(list(darr1, darr2), m=2))
## [1] 2 6 4
dim(DelayedTensor::modebind_list(list(darr1, darr2), m=3))
## [1] 2 3 8
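The base-R sketch below binds ordinary arrays along mode m, reproducing the dimensions printed above. It moves mode m to the last (slowest-varying) position, so that binding becomes plain concatenation, and then restores the mode order. The helper modebind_base is hypothetical.

```r
# Hypothetical helper: bind arrays along mode m (all other modes must match)
modebind_base <- function(arrs, m) {
  n <- length(dim(arrs[[1]]))
  perm <- c(setdiff(seq_len(n), m), m)        # move mode m to the last position
  permuted <- lapply(arrs, aperm, perm)
  d <- dim(permuted[[1]])
  d[n] <- sum(vapply(permuted, function(a) dim(a)[n], integer(1)))
  # With mode m last (slowest-varying), binding is plain concatenation
  out <- array(unlist(permuted), dim = d)
  aperm(out, order(perm))                     # restore the original mode order
}

set.seed(1)
a <- array(runif(24), dim = c(2, 3, 4))
b <- array(runif(24), dim = c(2, 3, 4))
dim(modebind_base(list(a, b), m = 1))
dim(modebind_base(list(a, b), m = 2))
dim(modebind_base(list(a, b), m = 3))
```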
rbind_list is the row-wise version of modebind_list and
binds multiple 2D DelayedArray objects
into a single DelayedArray object.
dim(DelayedTensor::rbind_list(list(darr1[,,1], darr2[,,1])))
## [1] 4 3
cbind_list is the column-wise version of modebind_list and
binds multiple 2D DelayedArray objects
into a single DelayedArray object.
dim(DelayedTensor::cbind_list(list(darr1[,,1], darr2[,,1])))
## [1] 2 6
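For ordinary matrices, the base-R analogues are do.call(rbind, ...) and do.call(cbind, ...) applied to a list. The sketch below mirrors the two calls above.

```r
set.seed(1)
a <- matrix(runif(6), 2, 3)
b <- matrix(runif(6), 2, 3)
# Bind a list of matrices row-wise (stack) or column-wise (side by side)
rb <- do.call(rbind, list(a, b))
cb <- do.call(cbind, list(a, b))
dim(rb)
dim(cb)
```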
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DelayedRandomArray_1.14.0 HDF5Array_1.34.0
## [3] rhdf5_2.50.0 DelayedArray_0.32.0
## [5] SparseArray_1.6.0 S4Arrays_1.6.0
## [7] abind_1.4-8 IRanges_2.40.0
## [9] S4Vectors_0.44.0 MatrixGenerics_1.18.0
## [11] matrixStats_1.4.1 BiocGenerics_0.52.0
## [13] Matrix_1.7-1 DelayedTensor_1.12.0
## [15] BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 compiler_4.4.1 BiocManager_1.30.25
## [4] crayon_1.5.3 rsvd_1.0.5 Rcpp_1.0.13
## [7] rhdf5filters_1.18.0 parallel_4.4.1 jquerylib_0.1.4
## [10] BiocParallel_1.40.0 yaml_2.3.10 fastmap_1.2.0
## [13] lattice_0.22-6 R6_2.5.1 XVector_0.46.0
## [16] ScaledMatrix_1.14.0 knitr_1.48 einsum_0.1.2
## [19] bookdown_0.41 bslib_0.8.0 rlang_1.1.4
## [22] cachem_1.1.0 xfun_0.48 sass_0.4.9
## [25] cli_3.6.3 Rhdf5lib_1.28.0 BiocSingular_1.22.0
## [28] zlibbioc_1.52.0 digest_0.6.37 grid_4.4.1
## [31] irlba_2.3.5.1 rTensor_1.4.8 dqrng_0.4.1
## [34] lifecycle_1.0.4 evaluate_1.0.1 codetools_0.2-20
## [37] beachmat_2.22.0 rmarkdown_2.28 tools_4.4.1
## [40] htmltools_0.5.8.1