The Reverse Hourglass Test aims to statistically evaluate the
existence of a reverse hourglass pattern based on TAI
or TDI
computations.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does follow an hourglass like shape. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
rather follow a reverse hourglass (low-high-low) shape.
Usage
ReverseHourglassTest(
ExpressionSet,
modules = NULL,
permutations = 1000,
lillie.test = FALSE,
plotHistogram = FALSE,
runs = 10,
parallel = FALSE,
gof.warning = FALSE,
custom.perm.matrix = NULL
)
Arguments
- ExpressionSet
a standard PhyloExpressionSet or DivergenceExpressionSet object.
- modules
a list storing three elements: early, mid, and late. Each element expects a numeric vector specifying the developmental stages or experiments that correspond to each module. For example,
module
= list(early = 1:2, mid = 3:5, late = 6:7) devides a dataset storing seven developmental stages into 3 modules.- permutations
a numeric value specifying the number of permutations to be performed for the
ReverseHourglassTest
.- lillie.test
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit.
- plotHistogram
a boolean value specifying whether a Lillifor's Kolmogorov-Smirnov-Test shall be performed to test the goodness of fit of the approximated distribution, as well as additional plots quantifying the significance of the observed phylotranscriptomic pattern.
- runs
specify the number of runs to be performed for goodness of fit computations, in case
plotHistogram
=TRUE
. In most casesruns
= 100 is a reasonable choice. Default isruns
= 10 (because it takes less computation time for demonstration purposes).- parallel
performing
runs
in parallel (takes all cores of your multicore machine).- gof.warning
a logical value indicating whether non significant goodness of fit results should be printed as warning. Default is
gof.warning = FALSE
.- custom.perm.matrix
a custom
bootMatrix
(permutation matrix) to perform the underlying test statistic. Default iscustom.perm.matrix = NULL
.
Value
a list object containing the list elements:
p.value
: the p-value quantifying the statistical significance (low-high-low pattern) of the given phylotranscriptomics pattern.
std.dev
: the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
lillie.test
: a boolean value specifying whether the Lillifors KS-Test returned a p-value > 0.05,
which indicates that fitting the permuted scores with a normal distribution seems plausible.
Details
The reverse hourglass test is a permutation test based on the following test statistic.
(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
(2) The mean TAI
or TDI
value for each of the three modules T_early, T_mid, and T_late are computed.
(3) The two differences D1 = T_mid - T_early
and D2 = T_mid - T_late
are calculated.
(4) The minimum D_min
of D1
and D2
is computed as final test statistic of the reductive hourglass test.
In order to determine the statistical significance of an observed minimum difference D_min
the following permutation test was performed. Based on the bootMatrix
D_min
is calculated from each of the permuted TAI
or TDI
profiles,
approximated by a Gaussian distribution with method of moments estimated parameters returned by fitdist
,
and the corresponding p-value is computed by pnorm
given the estimated parameters of the Gaussian distribution.
The goodness of fit for the random vector D_min is statistically quantified by an Lilliefors (Kolmogorov-Smirnov) test for normality.
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist
.
This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min.
In the case of the Reverse Hourglass Test a normal distribution seemed plausible.
(2) A histogram of D_min
combined with the density plot is plotted. D_min is then fitted by a normal distribution.
The corresponding parameters are estimated by moment matching estimation using the fitdist
function.
(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.
(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by Lilliefors (Kolmogorov-Smirnov) test
for normality) has shown to be significant (TRUE
) or not significant (FALSE
).
This allows to quantify the permutation bias and their implications on the goodness of fit.
References
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
M. L. Delignette-Muller, R. Pouillot, J.-B. Denis and C. Dutang (2014), fitdistrplus: help to fit of a parametric distribution to non-censored or censored data.
Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-159.
Evans M, Hastings N and Peacock B (2000) Statistical distributions. John Wiley and Sons Inc.
Sokal RR and Rohlf FJ (1995) Biometry. W.H. Freeman and Company, USA, pp. 111-115.
Juergen Gross and bug fixes by Uwe Ligges (2012). nortest: Tests for Normality. R package version 1.0-2.
http://CRAN.R-project.org/package=nortest
Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors' test for normality. The American Statistician, 40, 294-296.
Stephens, M.A. (1974): EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730-737.
http://stackoverflow.com/questions/4290081/fitting-data-to-distributions?rq=1
http://stats.stackexchange.com/questions/45033/can-i-use-kolmogorov-smirnov-test-and-estimate-distribution-parameters
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
Examples
data(PhyloExpressionSetExample)
# perform the reductive hourglass test for a PhyloExpressionSet
# here the prior biological knowledge is that stages 1-2 correspond to module 1 = early,
# stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to
# module 3 = late
ReverseHourglassTest(PhyloExpressionSetExample,
modules = list(early = 1:2, mid = 3:5, late = 6:7),
permutations = 1000)
#> The phylotranscriptomic pattern may not follow a reverse hourglass pattern (low-high-low).
#>
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#>
#> [ Computing age assignment permutations for test statistic ... ]
#>
[=========================================] 100%
#> [ Computing variances of permuted transcriptome signatures ... ]
#>
#> $p.value
#> [1] 1
#>
#> $std.dev
#> [1] 0.05543708 0.05423243 0.05240352 0.05121000 0.05051120 0.05210643 0.05712842
#>
#> $lillie.test
#> [1] NA
#>
# use your own permutation matrix based on which p-values (ReverseHourglassTest)
# shall be computed
custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100)
#>
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#>
#> [ Computing age assignment permutations for test statistic ... ]
#>
[=========================================] 100%
#> [ Computing variances of permuted transcriptome signatures ... ]
#>
ReverseHourglassTest(PhyloExpressionSetExample,
modules = list(early = 1:2, mid = 3:5, late = 6:7),
custom.perm.matrix = custom_perm_matrix)
#> The phylotranscriptomic pattern may not follow a reverse hourglass pattern (low-high-low).
#> $p.value
#> [1] 1
#>
#> $std.dev
#> Zygote Quadrant Globular Heart Torpedo Bent Mature
#> 0.05179833 0.04870956 0.05064932 0.04873651 0.04951337 0.04931970 0.05426546
#>
#> $lillie.test
#> [1] NA
#>