This function transforms the gene expression set stored in an input PhloExpressionSet or DivergenceExpressionSet object and returns a PhloExpressionSet or DivergenceExpressionSet object with transformed expression levels. The resulting transformed PhloExpressionSet or DivergenceExpressionSet object can then be used for subsequent analyses based on transformed expression levels.
tf(ExpressionSet, FUN, pseudocount = 0, integerise = FALSE)
a standard PhloExpressionSet or DivergenceExpressionSet object.
any valid function that transforms gene expression levels.
a numeric value to be added to the expression matrix prior to transformation.
a boolean specifying whether the expression data should be rounded to the nearest integer.
a standard PhloExpressionSet or DivergenceExpressionSet object storing transformed gene expression levels.
Motivated by the dicussion raised by Piasecka et al., 2013, the influence of
gene expression transformation on the global phylotranscriptomics pattern does not seem negligible.
Hence, different transformations can result in qualitatively different TAI
or TDI
patterns.
Initially, the TAI
and TDI
formulas were defined for absolute expression levels.
So using the initial TAI
and TDI
formulas with transformed expression levels
might turn out in qualitatively different patterns when compared with non-transformed expression levels,
but might also belong to a different class of models, since different valid expression level transformation functions result in different patterns.
The purpose of this function is to allow the user to study the qualitative impact of different transformation functions on
the global TAI
and TDI
pattern, or on any subsequent phylotranscriptomics analysis.
The examples using the PhyloExpressionSetExample data set show that using common gene expression
transformation functions: log2
(Quackenbush, 2001 and 2002), sqrt
(Yeung et al., 2001),
boxcox
, or inverse hyperbolic sine transformation, each transformation results
in qualitatively different patterns.
Nevertheless, for each resulting pattern the statistical significance can be tested
using either the FlatLineTest
or ReductiveHourglassTest
(Drost et al., 2014)
to quantify the significance of interest.
Piasecka B, Lichocki P, Moretti S, et al. (2013) The hourglass and the early conservation models--co-existing patterns of developmental constraints in vertebrates. PLoS Genet. 9(4): e1003476.
Quint M., Drost H.G., Gabel A., Ullrich K.K., Boenn M., Grosse I. (2012) A transcriptomic hourglass in plant embryogenesis. Nature 490: 98-101.
Domazet-Loso T., Tautz D. (2010) A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature 468: 815-8.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
K.Y. Yeung et al.: Model-based clustering and data transformations for gene expression data. Bioinformatics 2001, 17:977-987
K.Y. Yeung et al.: Supplement to Model-based clustering and data transformations for gene expression data - Data Transformations and the Gaussian mixture assumption. Bioinformatics 2001, 17:977-987
P.A.C. Hoen et al.: Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Research 2008, Vol. 36, No. 21
H.H. Thygesen et al.: Comparing transformation methods for DNA microarray data. BMC Bioinformatics 2004, 5:77
John Quackenbush: Microarray data normalization and transformation. Nature Genetics 2002, 32:496-501
John Quackenbush: Computational Analysis of Microarray Data. Nature Reviews 2001, 2:418-427
R. Nadon and J. Shoemaker: Statistical issues with microarrays: processing and analysis. TRENDS in Genetics 2002, Vol. 18 No. 5:265-271
B.P. Durbin et al.: A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 2002, 18:S105-S110
J. M. Bland et al.: Transforming data. BMJ 1996, 312:770
John B. Burbidge, Lonnie Magee and A. Leslie Robb (1988) Alternative Transformations to Handle Extreme Values of the Dependent Variable. Journal of the American Statistical Association, 83(401): 123-127.
G. E. P. Box and D. R. Cox (1964) An Analysis of Transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26(2): 211-252.
data(PhyloExpressionSetExample)
# a simple example is to transform the gene expression levels
# of a given PhyloExpressionSet using a sqrt or log2 transformation
PES.sqrt <- tf(PhyloExpressionSetExample, sqrt)
PES.log2 <- tf(PhyloExpressionSetExample, log2)
# in case a given PhyloExpressionSet already stores gene expression levels
# that are log2 transformed and need to be re-transformed to absolute
# expression levels, to perform subsequent phylotranscriptomics analyses
# (that are defined for absolute expression levels), one can re-transform
# a PhyloExpressionSet like this:
PES.absolute <- tf(PES.log2 , function(x) 2^x)
# which should be the same as PhyloExpressionSetExample :
head(PhyloExpressionSetExample)
#> Phylostratum GeneID Zygote Quadrant Globular Heart
#> 1 1 at1g01040.2 2173.6352 1911.2001 1152.5553 1291.4224
#> 2 1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612
#> 3 1 at1g01070.1 1212.7927 1233.0023 939.2000 929.6195
#> 4 1 at1g01080.2 1016.9203 936.3837 1181.3381 1329.4734
#> 5 1 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405
#> 6 1 at1g01120.1 844.0414 787.5929 859.6267 931.6180
#> Torpedo Bent Mature
#> 1 1000.2529 962.9772 1696.4274
#> 2 1496.3207 1114.6435 1071.6555
#> 3 864.2180 877.2060 894.8189
#> 4 1392.6429 1287.9746 861.2605
#> 5 56231.5689 66980.3673 7772.5617
#> 6 942.8453 870.2625 792.7542
head(PES.absolute)
#> Phylostratum GeneID Zygote Quadrant Globular Heart
#> 1 1 at1g01040.2 2173.6352 1911.2001 1152.5553 1291.4224
#> 2 1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612
#> 3 1 at1g01070.1 1212.7927 1233.0023 939.2000 929.6195
#> 4 1 at1g01080.2 1016.9203 936.3837 1181.3381 1329.4734
#> 5 1 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405
#> 6 1 at1g01120.1 844.0414 787.5929 859.6267 931.6180
#> Torpedo Bent Mature
#> 1 1000.2529 962.9772 1696.4274
#> 2 1496.3207 1114.6435 1071.6555
#> 3 864.2180 877.2060 894.8189
#> 4 1392.6429 1287.9746 861.2605
#> 5 56231.5689 66980.3673 7772.5617
#> 6 942.8453 870.2625 792.7542
# plotting the TAI using log2 transformed expression levels
# and performing the Flat Line Test to obtain the p-value
PlotPattern(ExpressionSet = tf(PhyloExpressionSetExample, log2),
type = "l",
lwd = 5,
TestStatistic = "FlatLineTest")
#>
#> Total runtime of your permutation test: 0.413 seconds.
#>
#> -> We recommended using at least 20000 permutations to achieve a sufficient permutation test.
# for square-transformation (not square-root), we recommend using
# function(x) x*x rather than function(x) x^2, due to speed.
PES.absolute <- tf(PES.sqrt , function(x) x*x)
# and the result should be the same as PhyloExpressionSetExample :
head(PhyloExpressionSetExample)
#> Phylostratum GeneID Zygote Quadrant Globular Heart
#> 1 1 at1g01040.2 2173.6352 1911.2001 1152.5553 1291.4224
#> 2 1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612
#> 3 1 at1g01070.1 1212.7927 1233.0023 939.2000 929.6195
#> 4 1 at1g01080.2 1016.9203 936.3837 1181.3381 1329.4734
#> 5 1 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405
#> 6 1 at1g01120.1 844.0414 787.5929 859.6267 931.6180
#> Torpedo Bent Mature
#> 1 1000.2529 962.9772 1696.4274
#> 2 1496.3207 1114.6435 1071.6555
#> 3 864.2180 877.2060 894.8189
#> 4 1392.6429 1287.9746 861.2605
#> 5 56231.5689 66980.3673 7772.5617
#> 6 942.8453 870.2625 792.7542
head(PES.absolute)
#> Phylostratum GeneID Zygote Quadrant Globular Heart
#> 1 1 at1g01040.2 2173.6352 1911.2001 1152.5553 1291.4224
#> 2 1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612
#> 3 1 at1g01070.1 1212.7927 1233.0023 939.2000 929.6195
#> 4 1 at1g01080.2 1016.9203 936.3837 1181.3381 1329.4734
#> 5 1 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405
#> 6 1 at1g01120.1 844.0414 787.5929 859.6267 931.6180
#> Torpedo Bent Mature
#> 1 1000.2529 962.9772 1696.4274
#> 2 1496.3207 1114.6435 1071.6555
#> 3 864.2180 877.2060 894.8189
#> 4 1392.6429 1287.9746 861.2605
#> 5 56231.5689 66980.3673 7772.5617
#> 6 942.8453 870.2625 792.7542
# in case the expression matrix contains 0s, a pseudocount can be added prior
# to certain transformations, e.g. log2(x+1) where 1 is the pseudocount.
PhyloExpressionSetExample[4,3] = 0
PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 0)
# this should return -Inf at PES.log2[4,3] the issue here is that
# -Inf cannot be used to compute the phylotranscriptomic profile.
PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 1)
# log2 transformed expression levels can now be used in downstream analyses.
# to perform rank transformation
PES.rank <- tf(PhyloExpressionSetExample, FUN = function(x) apply(x, 2, base::rank))
# rlog and vst transformations are now also possible by loading the DESeq2 package
# and transforming the data with the parameter integerise = TRUE.
library(DESeq2) # make sure the DESeq2 version >= 1.29.15 for rlog
#> Error in library(DESeq2): there is no package called ‘DESeq2’
PES.vst <- tf(PhyloExpressionSetExample, vst, integerise = TRUE)
#> Error in eval(expr, envir, enclos): object 'vst' not found