R/EarlyConservationTest.R
EarlyConservationTest.Rd
The Reductive Early Conservation Test statistically evaluates whether a phylotranscriptomic pattern based on TAI or TDI computations is monotonically increasing.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomic pattern)
does not follow an early-conservation-like pattern. A p-value < 0.05 indicates that the corresponding phylotranscriptomic pattern does
indeed follow an early conservation (low-high-high) shape.
EarlyConservationTest(
  ExpressionSet,
  modules = NULL,
  permutations = 1000,
  lillie.test = FALSE,
  plotHistogram = FALSE,
  runs = 10,
  parallel = FALSE,
  gof.warning = FALSE,
  custom.perm.matrix = NULL
)
Argument | Description |
---|---|
ExpressionSet | a standard PhyloExpressionSet or DivergenceExpressionSet object. |
modules | a list storing three elements: early, mid, and late. Each element expects a numeric vector specifying the developmental stages or experiments that correspond to each module. For example, |
permutations | a numeric value specifying the number of permutations to be performed for the |
lillie.test | a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov test shall be performed to quantify the goodness of fit. |
plotHistogram | a boolean value specifying whether a Lilliefors Kolmogorov-Smirnov test shall be performed to test the goodness of fit of the approximated distribution, together with additional plots quantifying the significance of the observed phylotranscriptomic pattern. |
runs | specify the number of runs to be performed for goodness of fit computations, in case |
parallel | performing |
gof.warning | a logical value indicating whether non-significant goodness of fit results should be printed as a warning. Default is |
custom.perm.matrix | a custom |
a list object containing the list elements:
p.value
: the p-value quantifying the statistical significance (low-high-high pattern) of the given phylotranscriptomics pattern.
std.dev
: the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
lillie.test
: a boolean value specifying whether the Lilliefors KS-Test returned a p-value > 0.05,
which indicates that fitting the permuted scores with a normal distribution seems plausible.
The reductive early conservation test is a permutation test based on the following test statistic.
(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
(2) The mean TAI or TDI value for each of the three modules, T_early, T_mid, and T_late, is computed.
(3) The two differences D1 = T_mid - T_early and D2 = T_late - T_early are calculated.
(4) The minimum D_min of D1 and D2 is computed as the final test statistic of the reductive early conservation test.
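Steps (1) to (4) can be sketched in a few lines of base R. This is a minimal illustration and not the package's internal code: the helper name ec_score and the seven-stage profile are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Hypothetical helper (not part of myTAI): computes D_min from a
# TAI/TDI profile and a modules list with elements early, mid, late.
ec_score <- function(profile, modules) {
  t_early <- mean(profile[modules$early])  # step (2): module means
  t_mid   <- mean(profile[modules$mid])
  t_late  <- mean(profile[modules$late])
  d1 <- t_mid  - t_early                   # step (3): differences
  d2 <- t_late - t_early
  min(d1, d2)                              # step (4): test statistic
}

# Made-up profile with a low-high-high shape over 7 stages
profile <- c(3.0, 3.1, 3.6, 3.5, 3.7, 3.4, 3.6)
modules <- list(early = 1:2, mid = 3:5, late = 6:7)

d_min <- ec_score(profile, modules)
print(d_min)  # T_early = 3.05, T_mid = 3.6, T_late = 3.5, so D_min = 0.45
```

A clearly positive D_min means both the mid and the late module lie above the early module, which is the shape the test looks for.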
To determine the statistical significance of an observed minimum difference D_min,
the following permutation test is performed. Based on the bootMatrix, D_min is calculated from each of the permuted TAI or TDI profiles,
approximated by a Gaussian distribution with method-of-moments parameter estimates returned by fitdist,
and the corresponding p-value is computed by pnorm given the estimated parameters of the Gaussian distribution.
The goodness of fit for the random vector D_min is statistically quantified by a Lilliefors (Kolmogorov-Smirnov) test
for normality.
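The permutation procedure described above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package implementation: the toy data are simulated, permutation is done by shuffling the phylostratum labels by hand instead of calling bootMatrix, the Gaussian parameters are computed directly as moment estimates instead of calling fitdist, and the upper tail of pnorm is assumed because a large positive D_min is what supports early conservation.

```r
set.seed(42)  # reproducible toy example

# Simulated toy data (assumption, not from myTAI): 200 genes, 7 stages
n_genes <- 200
ps      <- sample(1:12, n_genes, replace = TRUE)                  # phylostratum per gene
expr    <- matrix(rexp(n_genes * 7, rate = 0.01), nrow = n_genes) # expression levels
modules <- list(early = 1:2, mid = 3:5, late = 6:7)

# D_min for a given phylostratum assignment
d_min_for <- function(ps_vec) {
  tai <- colSums(ps_vec * expr) / colSums(expr)  # expression-weighted mean age per stage
  t_early <- mean(tai[modules$early])
  min(mean(tai[modules$mid]) - t_early, mean(tai[modules$late]) - t_early)
}

observed <- d_min_for(ps)

# Null distribution: recompute D_min after shuffling phylostratum labels
null_scores <- replicate(1000, d_min_for(sample(ps)))

# Moment estimates of the Gaussian parameters (n-denominator variance)
mu    <- mean(null_scores)
sigma <- sqrt(mean((null_scores - mu)^2))

# Assumed upper-tail p-value: small when the observed D_min is unusually large
p_value <- pnorm(observed, mean = mu, sd = sigma, lower.tail = FALSE)
```

Because the toy data carry no real signal, the resulting p_value is not expected to be small; the point is only the order of operations.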
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist.
This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min.
In the case of the reductive early conservation test a normal distribution seemed plausible.
(2) A histogram of D_min combined with the density plot. D_min is then fitted by a normal distribution.
The corresponding parameters are estimated by moment matching estimation using the fitdist function.
(3) A plot showing the p-values for N independent runs, to verify that a specific p-value is not biased by a specific permutation order.
(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by the Lilliefors (Kolmogorov-Smirnov) test
for normality) was significant (TRUE) or not significant (FALSE).
This makes it possible to quantify the permutation bias and its implications for the goodness of fit.
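The idea behind plot (3), repeating the whole permutation procedure several times and comparing the resulting p-values, can be sketched in base R. The toy data, the helper names d_min_for and one_run, and the hand-rolled moment estimates are all assumptions made for illustration; the package itself relies on bootMatrix and fitdist.

```r
set.seed(1)  # reproducible toy example

# Simulated toy data (assumption, not from myTAI): 150 genes, 7 stages
ps      <- sample(1:10, 150, replace = TRUE)
expr    <- matrix(rexp(150 * 7, rate = 0.01), nrow = 150)
modules <- list(early = 1:2, mid = 3:5, late = 6:7)

# D_min for one phylostratum assignment
d_min_for <- function(ps_vec) {
  tai <- colSums(ps_vec * expr) / colSums(expr)
  t_early <- mean(tai[modules$early])
  min(mean(tai[modules$mid]) - t_early, mean(tai[modules$late]) - t_early)
}

# One run: fresh permutations, Gaussian approximation, upper-tail p-value
one_run <- function(permutations = 500) {
  null_scores <- replicate(permutations, d_min_for(sample(ps)))
  mu    <- mean(null_scores)
  sigma <- sqrt(mean((null_scores - mu)^2))
  pnorm(d_min_for(ps), mean = mu, sd = sigma, lower.tail = FALSE)
}

# 10 independent runs, mirroring the default runs = 10
p_values <- replicate(10, one_run())
range(p_values)  # a narrow range suggests little permutation bias
```

If the p-values from independent runs spread widely, increasing the number of permutations per run is the obvious first thing to try.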
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Piasecka B, Lichocki P, Moretti S, et al. (2013) The hourglass and the early conservation models co-existing patterns of developmental constraints in vertebrates. PLoS Genet. 9(4): e1003476.
Hajk-Georg Drost
data(PhyloExpressionSetExample)

# perform the early conservation test for a PhyloExpressionSet
# here the prior biological knowledge is that stages 1-2 correspond to module 1 = early,
# stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to
# module 3 = late
EarlyConservationTest(PhyloExpressionSetExample,
                      modules = list(early = 1:2, mid = 3:5, late = 6:7),
                      permutations = 1000)
#> $p.value
#> [1] 0.999896
#>
#> $std.dev
#> [1] 0.05501641 0.05399180 0.05227801 0.05082699 0.04933829 0.05165833 0.05448006
#>
#> $lillie.test
#> [1] NA
#>

# use your own permutation matrix based on which p-values (EarlyConservationTest)
# shall be computed
custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample, 100)
EarlyConservationTest(PhyloExpressionSetExample,
                      modules = list(early = 1:2, mid = 3:5, late = 6:7),
                      custom.perm.matrix = custom_perm_matrix)
#> $p.value
#> [1] 0.9999673
#>
#> $std.dev
#>     Zygote   Quadrant   Globular      Heart    Torpedo       Bent     Mature
#> 0.05774599 0.05798849 0.05853098 0.05678427 0.05377584 0.05354774 0.05674971
#>
#> $lillie.test
#> [1] NA
#>