Details the changes made to limma’s diffSplice method.
Here we describe the improvements to limma’s
diffSplice
method. diffSplice
works on the bin-wise coefficient of the
linear model which corresponds to the log2 fold changes between conditions. It
compares the log2(fold change) ˆβk,g of a bin k belonging to
gene g, to a weighted average of log2(fold change) of all the other bins of
the same gene combined ˆBk,g (the subscript g will be henceforth
omitted for ease of reading). The weighted average of all the other bins in the
same gene is calculated by
ˆBk=∑Ni,i≠kwiˆβi∑Ni,i≠kwi
where wi=1u2i and ui refers to the diagonal elements of
the unscaled covariance matrix (XTVX)−1. X is the design matrix and V
corresponds to the weight matrix estimated by voom
. The difference of log2
fold changes, which is also the coefficient returned by diffSplice()
is then
calculated by ˆCk=ˆβk−ˆBk. Instead of
calculating the t-statistic with ˆCk, this value is scaled again in
the original code:
ˆDk=ˆCk√1−wk∑Niwi
and the t-statistic is calculated as:
tk=ˆDkuksg
s2g refers to the posterior residual variance of gene g, which is calculated by averaging the sample values of the residual variances of all the bins in the gene, and then squeezing these residual variances of all genes using empirical Bayes method. This assumes that the residual variance is constant across all bins of the same gene.
In , we applied three changes to the above method. First, the residual variances are not assumed to be constant across all bins of the same gene. This results in the sample values of the residual variances of every bin now being squeezed using empirical Bayes method, resulting in posterior variances s2i for every individual bin i. Second, the weights wi, used to calculate ˆBk, now incorporate the individual variances by wi=1s2iu2i. Third, the ˆCk value is directly used to calculate the t-statistic, which after all these changes now corresponds to
tk=ˆCkuksi.
Using a simulation from a previous DEU benchmark Soneson et al. 2016, we show that our modifications improve the accuracy of the original limma method:
Figure 1: diffSplice2 improves upon the original method in a DEU simulation
Of note, however, in most settings the accuracy of diffSplice2
is still
inferior to that of DEXSeq, which therefore remains
the method of choice for DEU. However, as the computationally-heavy
DEXSeq does not scale well and is not applicable in
all situations, we recommend diffSplice2
as the next best method.