The Reductive Late Conservation Test aims to statistically evaluate the existence of a monotonically decreasing phylotranscriptomic pattern based on TAI or TDI computations. The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomic pattern) does not follow a late conservation-like pattern. A p-value < 0.05 indicates that the corresponding phylotranscriptomic pattern does indeed follow a late conservation (high-high-low) shape.

Usage

LateConservationTest(
  ExpressionSet,
  modules = NULL,
  permutations = 1000,
  lillie.test = FALSE,
  plotHistogram = FALSE,
  runs = 10,
  parallel = FALSE,
  gof.warning = FALSE,
  custom.perm.matrix = NULL
)

Arguments

ExpressionSet

a standard PhyloExpressionSet or DivergenceExpressionSet object.

modules

a list storing three elements: early, mid, and late. Each element expects a numeric vector specifying the developmental stages or experiments that correspond to each module. For example, modules = list(early = 1:2, mid = 3:5, late = 6:7) divides a dataset storing seven developmental stages into three modules.

permutations

a numeric value specifying the number of permutations to be performed for the LateConservationTest.

lillie.test

a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit.

plotHistogram

a boolean value specifying whether a Lilliefors (Kolmogorov-Smirnov) test shall be performed to test the goodness of fit of the approximated distribution, and whether additional plots quantifying the significance of the observed phylotranscriptomic pattern shall be drawn.

runs

a numeric value specifying the number of runs to be performed for the goodness of fit computations when plotHistogram = TRUE. In most cases runs = 100 is a reasonable choice. Default is runs = 10 (because it takes less computation time for demonstration purposes).

parallel

a boolean value specifying whether the runs shall be performed in parallel (uses all cores of your multicore machine).

gof.warning

a logical value indicating whether non-significant goodness of fit results should be printed as a warning. Default is gof.warning = FALSE.

custom.perm.matrix

a custom bootMatrix (permutation matrix) on which the underlying test statistic is computed. Default is custom.perm.matrix = NULL.

Value

a list object containing the list elements:

p.value : the p-value quantifying the statistical significance (high-high-low pattern) of the given phylotranscriptomic pattern.

std.dev : the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.

lillie.test : a boolean value specifying whether the Lilliefors KS-Test returned a p-value > 0.05, which indicates that fitting the permuted scores with a normal distribution seems plausible.

Details

The reductive late conservation test is a permutation test based on the following test statistic.

(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.

(2) The mean TAI or TDI value of each of the three modules, T_early, T_mid, and T_late, is computed.

(3) The two differences D1 = T_early - T_late and D2 = T_mid - T_late are calculated.

(4) The minimum D_min of D1 and D2 is computed as the final test statistic of the reductive late conservation test.
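Steps (1) to (4) can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper name late_conservation_score and the toy profile are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of myTAI): compute D_min for one TAI/TDI profile.
late_conservation_score <- function(profile, modules) {
  # (2) mean value per module
  t_early <- mean(profile[modules$early])
  t_mid   <- mean(profile[modules$mid])
  t_late  <- mean(profile[modules$late])
  # (3) differences relative to the late module
  d1 <- t_early - t_late
  d2 <- t_mid - t_late
  # (4) the smaller difference is the test statistic
  min(d1, d2)
}

# Made-up profile over seven stages with a high-high-low shape.
toy_profile <- c(3.6, 3.5, 3.5, 3.4, 3.5, 3.1, 3.0)
# (1) partition of the seven stages into three modules
toy_modules <- list(early = 1:2, mid = 3:5, late = 6:7)

d_min <- late_conservation_score(toy_profile, toy_modules)
```

A clearly positive d_min means that both the early and the mid module lie above the late module, which is the shape the test looks for.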

In order to determine the statistical significance of an observed minimum difference D_min, the following permutation test is performed. Based on the bootMatrix, D_min is calculated from each of the permuted TAI or TDI profiles. The resulting values are approximated by a Gaussian distribution whose parameters are estimated by the method of moments using fitdist, and the corresponding p-value is computed by pnorm given the estimated parameters of the Gaussian distribution. The goodness of fit for the random vector D_min is statistically quantified by a Lilliefors (Kolmogorov-Smirnov) test for normality.
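The permutation step can be illustrated with a self-contained base R sketch. Several things here are assumptions made for illustration: the simulated ages and expression values are made up, the helpers tai and score are hypothetical, the permuted profiles are generated directly instead of via bootMatrix, and the moment estimates are computed by hand instead of by fitdist. Using the upper tail of the fitted Gaussian is also an assumption, chosen because it agrees with the example output, where a pattern without late conservation yields a p-value close to 1.

```r
set.seed(1)

# Made-up data: 200 genes, 7 stages, age classes 1..12.
n_genes <- 200
ages <- sample(1:12, n_genes, replace = TRUE)
expr <- matrix(rexp(n_genes * 7), nrow = n_genes)
modules <- list(early = 1:2, mid = 3:5, late = 6:7)

# Expression-weighted mean age per stage (hypothetical helper).
tai <- function(ages, expr) colSums(ages * expr) / colSums(expr)

# D_min = min(T_early - T_late, T_mid - T_late) (hypothetical helper).
score <- function(profile, m) {
  t_late <- mean(profile[m$late])
  min(mean(profile[m$early]) - t_late, mean(profile[m$mid]) - t_late)
}

observed <- score(tai(ages, expr), modules)

# Null distribution: recompute D_min after shuffling the age assignment.
null_scores <- replicate(1000, score(tai(sample(ages), expr), modules))

# Method of moments estimates for a Gaussian (variance divides by N).
mu <- mean(null_scores)
sigma <- sqrt(mean((null_scores - mu)^2))

# Assumed upper-tail p-value: probability of a D_min at least as large.
p_value <- pnorm(observed, mean = mu, sd = sigma, lower.tail = FALSE)
```

Because the simulated data carry no real signal, the resulting p_value is not expected to be small; the sketch only shows how the pieces fit together.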

In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:

(1) A Cullen and Frey skewness-kurtosis plot generated by descdist. This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min. In the case of the reductive late conservation test a normal distribution seemed plausible.

(2) A histogram of D_min combined with the density plot is plotted. D_min is then fitted by a normal distribution. The corresponding parameters are estimated by moment matching estimation using the fitdist function.

(3) A plot showing the p-values for N independent runs to verify that a specific p-value is not biased by a specific permutation order.

(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by the Lilliefors (Kolmogorov-Smirnov) test for normality) has been shown to be significant (TRUE) or not significant (FALSE). This allows quantifying the permutation bias and its implications for the goodness of fit.
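The idea behind plots (3) and (4), repeating the permutation test over several independent runs and checking how stable the p-value and the normality assessment are, can be sketched as follows. This is a rough illustration and not the package's implementation: the data are simulated, the helpers are hypothetical, and base R's shapiro.test is swapped in as a stand-in for the Lilliefors test so that the sketch runs without extra packages.

```r
set.seed(2)

# Made-up data: 200 genes, 7 stages, age classes 1..12.
n_genes <- 200
ages <- sample(1:12, n_genes, replace = TRUE)
expr <- matrix(rexp(n_genes * 7), nrow = n_genes)
modules <- list(early = 1:2, mid = 3:5, late = 6:7)

# Hypothetical helpers: weighted mean age per stage and D_min.
tai <- function(ages, expr) colSums(ages * expr) / colSums(expr)
score <- function(profile, m) {
  t_late <- mean(profile[m$late])
  min(mean(profile[m$early]) - t_late, mean(profile[m$mid]) - t_late)
}
observed <- score(tai(ages, expr), modules)

# One run: fresh permutations, Gaussian p-value, normality check.
one_run <- function(n_perm = 500) {
  null_scores <- replicate(n_perm, score(tai(sample(ages), expr), modules))
  p <- pnorm(observed, mean(null_scores), sd(null_scores), lower.tail = FALSE)
  # Stand-in normality test (NOT the Lilliefors test used by the package).
  normal_ok <- shapiro.test(null_scores)$p.value > 0.05
  c(p_value = p, normal_ok = normal_ok)
}

# Ten independent runs, one row per run.
runs <- t(replicate(10, one_run()))

p_range <- range(runs[, "p_value"])        # spread of p-values across runs
fit_counts <- table(runs[, "normal_ok"] == 1)  # how often normality seemed plausible
```

A narrow p_range suggests that the p-value does not depend much on the particular permutations drawn, which is what plot (3) is meant to show.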

References

Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012

Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.

Piasecka B, Lichocki P, Moretti S, et al. (2013) The hourglass and the early conservation models: co-existing patterns of developmental constraints in vertebrates. PLoS Genet. 9(4): e1003476.

Author

Hajk-Georg Drost and Jaruwatana Sodai Lotharukpong

Examples


data(PhyloExpressionSetExample)

# perform the late conservation test for a PhyloExpressionSet
# here the prior biological knowledge is that stages 1-2 correspond to module 1 = early,
# stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to
# module 3 = late
LateConservationTest(PhyloExpressionSetExample,
                       modules = list(early = 1:2, mid = 3:5, late = 6:7), 
                       permutations = 1000)
#> The phylotranscriptomic pattern may not follow a late conservation pattern (high-mid-low or high-high-low).
#> 
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#> 
#> [ Computing age assignment permutations for test statistic ... ]
#> 
[=========================================] 100%   
#> [ Computing variances of permuted transcriptome signatures ... ]
#> 
#> $p.value
#> [1] 0.9999978
#> 
#> $std.dev
#> [1] 0.05554746 0.05429588 0.05269882 0.05206943 0.05079230 0.05281953 0.05594038
#> 
#> $lillie.test
#> [1] NA
#> 


# use your own permutation matrix based on which p-values (LateConservationTest)
# shall be computed
custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100)
#> 
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#> 
#> [ Computing age assignment permutations for test statistic ... ]
#> 
[=========================================] 100%   
#> [ Computing variances of permuted transcriptome signatures ... ]
#> 

LateConservationTest(PhyloExpressionSetExample,
                       modules = list(early = 1:2, mid = 3:5, late = 6:7), 
                       custom.perm.matrix = custom_perm_matrix)
#> The phylotranscriptomic pattern may not follow a late conservation pattern (high-mid-low or high-high-low).
#> $p.value
#> [1] 0.9999946
#> 
#> $std.dev
#>     Zygote   Quadrant   Globular      Heart    Torpedo       Bent     Mature 
#> 0.05410165 0.05188825 0.04945213 0.04701504 0.04954394 0.05426393 0.06388890 
#> 
#> $lillie.test
#> [1] NA
#>