Perform the Reverse Hourglass Test

The Reverse Hourglass Test aims to statistically evaluate the existence of a reverse hourglass pattern based on TAI or TDI computations. The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern) does follow an hourglass like shape. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does rather follow a reverse hourglass (low-high-low) shape.

Usage

ReverseHourglassTest(
  ExpressionSet,
  modules = NULL,
  permutations = 1000,
  lillie.test = FALSE,
  plotHistogram = FALSE,
  runs = 10,
  parallel = FALSE,
  gof.warning = FALSE,
  custom.perm.matrix = NULL
)

Arguments

ExpressionSet: a standard PhyloExpressionSet or DivergenceExpressionSet object.
modules: a list storing three elements: early, mid, and late. Each element expects a numeric vector specifying the developmental stages or experiments that correspond to each module. For example, module = list(early = 1:2, mid = 3:5, late = 6:7) devides a dataset storing seven developmental stages into 3 modules.
permutations: a numeric value specifying the number of permutations to be performed for the ReverseHourglassTest.
lillie.test: a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit.
plotHistogram: a boolean value specifying whether a Lillifor's Kolmogorov-Smirnov-Test shall be performed to test the goodness of fit of the approximated distribution, as well as additional plots quantifying the significance of the observed phylotranscriptomic pattern.
runs: specify the number of runs to be performed for goodness of fit computations, in case plotHistogram = TRUE. In most cases runs = 100 is a reasonable choice. Default is runs = 10 (because it takes less computation time for demonstration purposes).
parallel: performing runs in parallel (takes all cores of your multicore machine).
gof.warning: a logical value indicating whether non significant goodness of fit results should be printed as warning. Default is gof.warning = FALSE.
custom.perm.matrix: a custom bootMatrix (permutation matrix) to perform the underlying test statistic. Default is custom.perm.matrix = NULL.

Value

a list object containing the list elements:

p.value : the p-value quantifying the statistical significance (low-high-low pattern) of the given phylotranscriptomics pattern.

std.dev : the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.

lillie.test : a boolean value specifying whether the Lillifors KS-Test returned a p-value > 0.05, which indicates that fitting the permuted scores with a normal distribution seems plausible.

Details

The reverse hourglass test is a permutation test based on the following test statistic.

(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.

(2) The mean TAI or TDI value for each of the three modules T_early, T_mid, and T_late are computed.

(3) The two differences D1 = T_mid - T_early and D2 = T_mid - T_late are calculated.

(4) The minimum D_min of D1 and D2 is computed as final test statistic of the reductive hourglass test.

In order to determine the statistical significance of an observed minimum difference D_min the following permutation test was performed. Based on the bootMatrix D_min is calculated from each of the permuted TAI or TDI profiles, approximated by a Gaussian distribution with method of moments estimated parameters returned by fitdist, and the corresponding p-value is computed by pnorm given the estimated parameters of the Gaussian distribution. The goodness of fit for the random vector D_min is statistically quantified by an Lilliefors (Kolmogorov-Smirnov) test for normality.

In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:

(1) A Cullen and Frey skewness-kurtosis plot generated by descdist. This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min. In the case of the Reverse Hourglass Test a normal distribution seemed plausible.

(2) A histogram of D_min combined with the density plot is plotted. D_min is then fitted by a normal distribution. The corresponding parameters are estimated by moment matching estimation using the fitdist function.

(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.

(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by Lilliefors (Kolmogorov-Smirnov) test for normality) has shown to be significant (TRUE) or not significant (FALSE). This allows to quantify the permutation bias and their implications on the goodness of fit.

References

Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012

Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.

M. L. Delignette-Muller, R. Pouillot, J.-B. Denis and C. Dutang (2014), fitdistrplus: help to fit of a parametric distribution to non-censored or censored data.

Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-159.

Evans M, Hastings N and Peacock B (2000) Statistical distributions. John Wiley and Sons Inc.

Sokal RR and Rohlf FJ (1995) Biometry. W.H. Freeman and Company, USA, pp. 111-115.

Juergen Gross and bug fixes by Uwe Ligges (2012). nortest: Tests for Normality. R package version 1.0-2.

http://CRAN.R-project.org/package=nortest

Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors' test for normality. The American Statistician, 40, 294-296.

Stephens, M.A. (1974): EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730-737.

http://stackoverflow.com/questions/4290081/fitting-data-to-distributions?rq=1

http://stats.stackexchange.com/questions/45033/can-i-use-kolmogorov-smirnov-test-and-estimate-distribution-parameters

http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf

Author

Hajk-Georg Drost

Examples


data(PhyloExpressionSetExample)

# perform the reductive hourglass test for a PhyloExpressionSet
# here the prior biological knowledge is that stages 1-2 correspond to module 1 = early,
# stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to
# module 3 = late
ReverseHourglassTest(PhyloExpressionSetExample,
                       modules = list(early = 1:2, mid = 3:5, late = 6:7),
                       permutations = 1000)
#> The phylotranscriptomic pattern may not follow a reverse hourglass pattern (low-high-low).
#> 
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#> 
#> [ Computing age assignment permutations for test statistic ... ]
#> 
[=========================================] 100%   
#> [ Computing variances of permuted transcriptome signatures ... ]
#> 
#> $p.value
#> [1] 1
#> 
#> $std.dev
#> [1] 0.05580531 0.05451119 0.05339218 0.05238450 0.05150209 0.05386725 0.05819421
#> 
#> $lillie.test
#> [1] NA
#> 


# use your own permutation matrix based on which p-values (ReverseHourglassTest)
# shall be computed
custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100)
#> 
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#> 
#> [ Computing age assignment permutations for test statistic ... ]
#> 
[=========================================] 100%   
#> [ Computing variances of permuted transcriptome signatures ... ]
#> 

ReverseHourglassTest(PhyloExpressionSetExample,
                     modules = list(early = 1:2, mid = 3:5, late = 6:7),
                     custom.perm.matrix = custom_perm_matrix)
#> The phylotranscriptomic pattern may not follow a reverse hourglass pattern (low-high-low).
#> $p.value
#> [1] 1
#> 
#> $std.dev
#>     Zygote   Quadrant   Globular      Heart    Torpedo       Bent     Mature 
#> 0.05299800 0.05326480 0.05441365 0.05308304 0.05398674 0.05469395 0.05343400 
#> 
#> $lillie.test
#> [1] NA
#>