Perform Flat Line Test

This function quantifies the statistical significance of an observed phylotranscriptomic pattern. Specifically, the Flat Line Test assesses whether the observed pattern deviates significantly from a flat line.

Usage

FlatLineTest(
  ExpressionSet,
  permutations = 10000,
  plotHistogram = FALSE,
  runs = 10,
  parallel = FALSE,
  custom.perm.matrix = NULL
)

Arguments

ExpressionSet: a standard PhyloExpressionSet or DivergenceExpressionSet object.
permutations: a numeric value specifying the number of permutations to perform for the FlatLineTest.
plotHistogram: a logical value indicating whether a detailed statistical analysis of the goodness of fit should be performed.
runs: a numeric value specifying the number of independent runs to perform for the goodness-of-fit computations. In most cases runs = 100 is a reasonable choice.
parallel: a logical value indicating whether the runs should be performed in parallel (uses all cores of a multicore machine).
custom.perm.matrix: a custom bootMatrix (permutation matrix) to perform the underlying test statistic. Default is custom.perm.matrix = NULL.

Value

a list object containing the list elements:

p.value the p-value quantifying the statistical significance (deviation from a flat line) of the given phylotranscriptomics pattern.
std.dev the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
ks.test the Kolmogorov-Smirnov test statistics for fitting a gamma distribution to the variances of the dataset with permuted phylostrata.

Details

Internally, the function performs N phylotranscriptomics pattern computations (TAI or TDI) based on sampled PhyloExpressionSets or DivergenceExpressionSets (see bootMatrix). The test statistic is derived as follows:

The variance V_pattern of the S phylotranscriptomics values defines the test statistic for the FlatLineTest. The underlying assumption is that the variance of a perfectly flat line is zero, so any deviation from a flat line is reflected in a variance value > 0.
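As a minimal illustration of this test statistic, the following base R sketch computes an expression-weighted mean age per stage and its variance across stages. The toy values and the helper name tai are assumptions made for this example; it is not the package's internal implementation.

```r
# Toy data (hypothetical): 4 genes (rows) x 3 developmental stages (columns).
ps   <- c(1, 1, 2, 3)  # assumed phylostratum (age class) of each gene
expr <- matrix(c(10, 10, 10,
                 20, 20, 20,
                  5, 10,  5,
                  5, 20,  5),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL, c("S1", "S2", "S3")))

# Hypothetical helper: expression-weighted mean phylostratum per stage.
# ps is recycled down each column, so every stage is weighted by the same ages.
tai <- function(ps, expr) colSums(ps * expr) / colSums(expr)

pattern   <- tai(ps, expr)  # one value per stage (S values in total)
v_pattern <- var(pattern)   # sample variance across stages; 0 for a flat line

pattern
v_pattern
```

In this toy example the two younger genes are upregulated in stage S2, so the pattern is not flat and v_pattern is greater than zero.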

To determine the null distribution of V_p, all PS or DS values within each developmental stage s are randomly permuted, S surrogate phylotranscriptomics values are computed from this permuted dataset, and a surrogate value of V_p is computed from these S phylotranscriptomics values. This permutation process is repeated N times, yielding a histogram of V_p.

After applying a Lilliefors Kolmogorov-Smirnov Test for gamma distribution, V_p is approximated by a gamma distribution. The two parameters of the gamma distribution are estimated by the function fitdist from the fitdistrplus package by moment matching estimation. The fitted gamma distribution is considered the null distribution of V_pattern, and the p-value of the observed value of V_p is computed from this null distribution.
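The permutation and moment-matching steps can be sketched in base R as follows. This is a simplified illustration on simulated data, assuming that age assignments are shuffled across genes once per permutation. It replaces the bootMatrix and fitdist calls with hand-written equivalents, so the parameter estimates may differ slightly from those returned by fitdistrplus (for instance in how the variance is normalized). All object names are hypothetical.

```r
set.seed(42)  # for a reproducible illustration only

# Simulated stand-in for a PhyloExpressionSet: 500 genes, 7 stages.
n_genes  <- 500
n_stages <- 7
ps   <- sample(1:12, n_genes, replace = TRUE)             # assumed age classes
expr <- matrix(rexp(n_genes * n_stages, rate = 1 / 100),  # assumed expression levels
               nrow = n_genes)

# Hypothetical helper: expression-weighted mean age per stage.
tai <- function(ps, expr) colSums(ps * expr) / colSums(expr)

# Observed test statistic: variance of the S pattern values.
v_obs <- var(tai(ps, expr))

# Null distribution: shuffle the age assignments N times and recompute the variance.
n_perm <- 1000
v_null <- replicate(n_perm, var(tai(sample(ps), expr)))

# Moment matching for a gamma distribution:
# mean = shape / rate and variance = shape / rate^2.
m     <- mean(v_null)
s2    <- var(v_null)
shape <- m^2 / s2
rate  <- m / s2

# p-value: upper-tail probability of the observed variance under the fitted gamma.
p_value <- pgamma(v_obs, shape = shape, rate = rate, lower.tail = FALSE)
p_value
```

Because the simulated ages are unrelated to expression, the observed variance here behaves like one more draw from the null distribution, so no particular p-value should be expected from this sketch.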

In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:

(1) A Cullen and Frey skewness-kurtosis plot generated by descdist.

(2) A histogram of V_p combined with the density plot using the Method of Moments estimated parameters returned by the fitdist function using a gamma distribution.

(3) A plot showing the p-values for N independent runs to check whether a specific p-value is biased by a specific permutation order.

The goodness of fit for the random vector V_p is quantified statistically by an adapted Lilliefors (Kolmogorov-Smirnov) test for gamma distributions.
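A goodness-of-fit check of this kind can be approximated with the standard ks.test function from the stats package, as in the sketch below. It uses simulated gamma values as a stand-in for the permuted variances and is not the adapted Lilliefors procedure itself. Because the gamma parameters are estimated from the same data that is then tested, the p-value of a plain Kolmogorov-Smirnov test should be read as approximate.

```r
set.seed(1)  # for a reproducible illustration only

# Stand-in for the vector of permuted variances (hypothetical values).
v_null <- rgamma(1000, shape = 3, rate = 50)

# Moment-matching estimates of the gamma parameters.
m     <- mean(v_null)
s2    <- var(v_null)
shape <- m^2 / s2
rate  <- m / s2

# One-sample Kolmogorov-Smirnov test against the fitted gamma distribution.
fit_check <- ks.test(v_null, "pgamma", shape = shape, rate = rate)

fit_check$statistic  # D: largest distance between empirical and fitted CDF
fit_check$p.value    # a large value gives no evidence against the gamma fit
```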

Note

If the dataset (PhyloExpressionSet or DivergenceExpressionSet) contains extreme outlier expression values, the internal fitdist call, which operates on the bootMatrix output, might return the warning: "In densfun(x, parm[1], parm[2], ...) : NaNs were produced". This indicates that extreme outlier expression values produced permutation results that could not be fitted properly. The warning no longer appears once the corresponding outlier values are removed from the dataset.

References

Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012

Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.

M. L. Delignette-Muller, R. Pouillot, J.-B. Denis and C. Dutang (2014), fitdistrplus: help to fit of a parametric distribution to non-censored or censored data.

Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-159.

Evans M, Hastings N and Peacock B (2000) Statistical distributions. John Wiley and Sons Inc.

Sokal RR and Rohlf FJ (1995) Biometry. W.H. Freeman and Company, USA, pp. 111-115.

Juergen Gross and bug fixes by Uwe Ligges (2012). nortest: Tests for Normality. R package version 1.0-2.

http://CRAN.R-project.org/package=nortest

Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors test for normality. The American Statistician, 40, 294-296.

Stephens, M.A. (1974): EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730-737.

http://stackoverflow.com/questions/4290081/fitting-data-to-distributions?rq=1

http://stats.stackexchange.com/questions/45033/can-i-use-kolmogorov-smirnov-test-and-estimate-distribution-parameters

http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf

Author

Hajk-Georg Drost

Examples


# read standard phylotranscriptomics data
data(PhyloExpressionSetExample)

# example PhyloExpressionSet using 100 permutations
FlatLineTest(PhyloExpressionSetExample,
             permutations  = 100,
             plotHistogram = FALSE)
#> 
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#> 
#> [ Computing age assignment permutations for test statistic ... ]
#> 
#> [=========================================] 100%
#> [ Computing variances of permuted transcriptome signatures ... ]
#> 
#> 
#> Total runtime of your permutation test: 0.017  seconds.
#> 
#> -> We recommended using at least 20000 permutations to achieve a sufficient permutation test.
#> $p.value
#> [1] 3.838297e-11
#> 
#> $std.dev
#> [1] 0.05100253 0.04925052 0.05095288 0.05056541 0.04906568 0.05528996 0.06561879
#> 
#> $ks.test
#> 
#> 	Asymptotic one-sample Kolmogorov-Smirnov test
#> 
#> data:  filtered_vars
#> D = 0.061236, p-value = 0.8475
#> alternative hypothesis: two-sided
#> 
#> 

# supply a custom permutation matrix from which FlatLineTest
# computes the p-value
custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample, 100)
#> 
#> [ Number of Eigen threads that are employed on your machine: 12 ]
#> 
#> [ Computing age assignment permutations for test statistic ... ]
#> 
#> [=========================================] 100%
#> [ Computing variances of permuted transcriptome signatures ... ]
#> 

FlatLineTest(PhyloExpressionSetExample,
             custom.perm.matrix = custom_perm_matrix)
#> 
#> -> We recommended using at least 20000 permutations to achieve a sufficient permutation test.
#> $p.value
#> [1] 1.02452e-28
#> 
#> $std.dev
#>     Zygote   Quadrant   Globular      Heart    Torpedo       Bent     Mature 
#> 0.05392123 0.05238076 0.05029294 0.04977498 0.04796508 0.04937687 0.06023252 
#> 
#> $ks.test
#> 
#> 	Exact one-sample Kolmogorov-Smirnov test
#> 
#> data:  filtered_vars
#> D = 0.063046, p-value = 0.9044
#> alternative hypothesis: two-sided
#> 
#>