This function transforms the gene expression set stored in an input PhyloExpressionSet or DivergenceExpressionSet object and returns an object of the same type with transformed expression levels. The resulting object can then be used for subsequent analyses based on transformed expression levels.
Arguments
- ExpressionSet
a standard PhyloExpressionSet or DivergenceExpressionSet object.
- FUN
any valid function that transforms gene expression levels.
- pseudocount
a numeric value to be added to the expression matrix prior to transformation.
- integerise
a boolean specifying whether the expression data should be rounded to the nearest integer.
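How the arguments interact can be sketched in base R. This is a hypothetical re-implementation for illustration only (`tf_sketch` and the `toy` data frame are invented here and are not the package source). It assumes that the first two columns hold the stratum and the gene id, that the pseudocount is added first, and that rounding happens before `FUN` is applied so that count-based transformations receive integer input.

```r
# Hypothetical sketch, not the package implementation.
tf_sketch <- function(ExpressionSet, FUN, pseudocount = 0, integerise = FALSE) {
  FUN <- match.fun(FUN)
  # Assumption: columns 1 and 2 are the stratum and the gene id,
  # all remaining columns are expression levels per stage.
  stage_cols <- 3:ncol(ExpressionSet)
  expr <- as.matrix(ExpressionSet[, stage_cols, drop = FALSE]) + pseudocount
  # Assumption: rounding is applied before the transformation.
  if (integerise) expr <- round(expr)
  res <- ExpressionSet
  res[stage_cols] <- as.data.frame(FUN(expr))
  res
}

# Toy expression set (made-up values).
toy <- data.frame(
  Phylostratum = c(1, 1, 2, 3),
  GeneID = c("g1", "g2", "g3", "g4"),
  Stage1 = c(0, 10.4, 3, 7),
  Stage2 = c(1, 20.6, 15, 63),
  stringsAsFactors = FALSE
)

toy.log2 <- tf_sketch(toy, log2, pseudocount = 1)       # log2(x + 1)
toy.int  <- tf_sketch(toy, identity, integerise = TRUE) # rounding only
```

The stratum and gene id columns are passed through unchanged, so the result keeps the same layout as the input.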
Value
a standard PhyloExpressionSet or DivergenceExpressionSet object storing transformed gene expression levels.
Details
As the discussion raised by Piasecka et al. (2013) suggests, the influence of gene expression transformation on the global phylotranscriptomics pattern does not seem negligible.
Hence, different transformations can result in qualitatively different TAI or TDI patterns.
Initially, the TAI and TDI formulas were defined for absolute expression levels.
Using these formulas with transformed expression levels might therefore yield patterns that differ qualitatively from those obtained with non-transformed expression levels,
and might also correspond to a different class of models, since different valid expression level transformation functions result in different patterns.
The purpose of this function is to allow the user to study the qualitative impact of different transformation functions on
the global TAI and TDI pattern, or on any subsequent phylotranscriptomics analysis.
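The effect described above can be illustrated with a small base R sketch. It uses the TAI as an expression-weighted mean of phylostrata per stage (Domazet-Loso and Tautz, 2010); `tai_sketch`, `ps` and `expr` are hypothetical names with made-up values and are not part of the package.

```r
# Hypothetical sketch: TAI per stage as the expression-weighted mean phylostratum.
tai_sketch <- function(ps, expr) {
  # ps is recycled down the rows, so each gene's row is weighted by its stratum.
  colSums(ps * expr) / colSums(expr)
}

ps <- c(1, 1, 2, 3)  # made-up phylostrata for four genes
expr <- cbind(
  Stage1 = c(100, 50, 10, 1),
  Stage2 = c(100, 50, 40, 64),
  Stage3 = c(100, 50, 10, 4)
)

tai.raw  <- tai_sketch(ps, expr)            # absolute expression levels
tai.log2 <- tai_sketch(ps, log2(expr + 1))  # log2 with pseudocount 1
tai.sqrt <- tai_sketch(ps, sqrt(expr))      # square root

round(rbind(raw = tai.raw, log2 = tai.log2, sqrt = tai.sqrt), 3)
```

Because the transformations compress highly expressed genes to different degrees, the weights given to each stratum change, and with them the shape of the profile across stages.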
The examples using the PhyloExpressionSetExample data set show that common gene expression
transformation functions, namely log2 (Quackenbush, 2001 and 2002), sqrt (Yeung et al., 2001),
boxcox, and the inverse hyperbolic sine transformation, each result
in qualitatively different patterns.
Nevertheless, the statistical significance of each resulting pattern can be tested
using either the FlatLineTest or the ReductiveHourglassTest (Drost et al., 2015).
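The Box-Cox and inverse hyperbolic sine transformations mentioned above can be sketched in base R as follows. `boxcox_fixed` is a hypothetical helper with a fixed, user-chosen lambda; it is an assumption for illustration and not necessarily the Box-Cox variant the package authors used. `asinh` is the base R inverse hyperbolic sine.

```r
# Hypothetical helper: Box-Cox transformation (Box and Cox, 1964) with fixed lambda.
boxcox_fixed <- function(x, lambda = 0.5) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Made-up expression matrix containing a zero.
expr <- cbind(Stage1 = c(0, 10, 1000), Stage2 = c(5, 50, 5000))

expr.asinh  <- asinh(expr)                           # defined at 0, no pseudocount needed
expr.log2   <- log2(expr + 1)                        # pseudocount avoids -Inf
expr.boxcox <- boxcox_fixed(expr + 1, lambda = 0.5)  # pseudocount keeps input positive
```

Since FUN accepts any valid function that transforms gene expression levels, a function such as asinh or a wrapper like `function(x) boxcox_fixed(x, lambda = 0.5)` could be passed in the same way as sqrt or log2.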
References
Piasecka B, Lichocki P, Moretti S, et al. (2013) The hourglass and the early conservation models–co-existing patterns of developmental constraints in vertebrates. PLoS Genet. 9(4): e1003476.
Quint M., Drost H.G., Gabel A., Ullrich K.K., Boenn M., Grosse I. (2012) A transcriptomic hourglass in plant embryogenesis. Nature 490: 98-101.
Domazet-Loso T., Tautz D. (2010) A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature 468: 815-8.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
K.Y. Yeung et al.: Model-based clustering and data transformations for gene expression data. Bioinformatics 2001, 17:977-987
K.Y. Yeung et al.: Supplement to Model-based clustering and data transformations for gene expression data - Data Transformations and the Gaussian mixture assumption. Bioinformatics 2001, 17:977-987
P.A.C. Hoen et al.: Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Research 2008, Vol. 36, No. 21
H.H. Thygesen et al.: Comparing transformation methods for DNA microarray data. BMC Bioinformatics 2004, 5:77
John Quackenbush: Microarray data normalization and transformation. Nature Genetics 2002, 32:496-501
John Quackenbush: Computational Analysis of Microarray Data. Nature Reviews 2001, 2:418-427
R. Nadon and J. Shoemaker: Statistical issues with microarrays: processing and analysis. TRENDS in Genetics 2002, Vol. 18 No. 5:265-271
B.P. Durbin et al.: A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 2002, 18:S105-S110
J. M. Bland et al.: Transforming data. BMJ 1996, 312:770
John B. Burbidge, Lonnie Magee and A. Leslie Robb (1988) Alternative Transformations to Handle Extreme Values of the Dependent Variable. Journal of the American Statistical Association, 83(401): 123-127.
G. E. P. Box and D. R. Cox (1964) An Analysis of Transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26(2): 211-252.
Examples
if (FALSE) { # \dontrun{
data(PhyloExpressionSetExample)
# a simple example is to transform the gene expression levels
# of a given PhyloExpressionSet using a sqrt or log2 transformation
PES.sqrt <- tf(PhyloExpressionSetExample, sqrt)
PES.log2 <- tf(PhyloExpressionSetExample, log2)
# plotting the TAI using log2 transformed expression levels
# and performing the Flat Line Test to obtain the p-value
PlotSignature(ExpressionSet = tf(PhyloExpressionSetExample, log2),
permutations = 1000)
# in case the expression matrix contains 0s, a pseudocount can be added prior
# to certain transformations, e.g. log2(x+1) where 1 is the pseudocount.
PhyloExpressionSetExample[4, 3] <- 0
PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 0)
# this should return -Inf at PES.log2[4, 3]; the issue here is that
# -Inf cannot be used to compute the phylotranscriptomic profile.
PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 1)
# log2 transformed expression levels can now be used in downstream analyses.
# to perform rank transformation
PES.rank <- tf(PhyloExpressionSetExample, FUN = function(x) apply(x, 2, base::rank))
# rlog and vst transformations are now also possible by loading the DESeq2 package
# and transforming the data with the parameter integerise = TRUE.
library(DESeq2) # make sure the DESeq2 version >= 1.29.15 for rlog
PES.vst <- tf(PhyloExpressionSetExample, vst, integerise = TRUE)
} # }