Overview
In the Introduction vignette users
learned how to perform the most fundamental computations and
visualizations for phylotranscriptomics analyses using the
myTAI
package. Especially in the Introduction
vignette we demonstrated how to perform Phylostratigraphy
and Divergence Stratigraphy
as well as the construction of
PhyloExpressionSets
and
DivergenceExpressionSets
using the MatchMap()
function.
In the Intermediate vignette users
learned about more detailed analyses and more specialized techniques to
investigate the observed phylotranscriptomics patterns
(TAI
, TDI
, RE
, etc.).
Investigating Age or Divergence Category Specific Expression Level Distributions
Gene expression levels are a fundamental aspect of phylotranscriptomics studies. In detail, phylotranscriptomic measures aim to quantify the expression intensity of genes deriving from common age or divergence categories to detect stages of evolutionary constraints. Hence, the gene expression distribution of age or divergence categories as well as their differences within and between stages or categories allow us to investigate the age (PS) or divergence (DS) category specific contribution to the corresponding transcriptome.
For this purpose, the PlotCategoryExpr()
aims to
visualize the expression level distribution of each phylostratum during
each time point or experiment as barplot, dot plot, or violin plot
enabling users to quantify the age (PS) or divergence (DS) category
specific contribution to the corresponding transcriptome.
This way of visualizing the gene expression distribution of each age (PS) or divergence (DS) category during all developmental stages or experiments allows users to detect specific age or divergence categories contributing significant levels of gene expression to the underlying biological process (transcriptome).
library(myTAI)
data(PhyloExpressionSetExample)
# category-centered visualization of PS specific expression level distributions (log-scale)
PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample,
legendName = "PS",
test.stat = TRUE,
type = "category-centered",
distr.type = "boxplot",
log.expr = TRUE)
Zygote Quadrant Globular Heart Torpedo Bent Mature
category-centered "***" "***" "***" "***" "***" "***" "***"
The resulting boxplot illustrates the log expression levels of each
phylostratum during each developmental stage. Additionally, a
Kruskal-Wallis Rank Sum Test as well as a Benjamini & Hochberg
p-value adjustment for multiple comparisons is performed
(test.stat = TRUE
) to statistically quantify the
differences between expression levels of different age or divergence
categories. This type of analysis allows users to detect stages or
experiments that show high deviation between age or divergence category
contributions to the overall transcriptome or no significant deviations
of age or divergence categories, suggesting equal age or divergence
category contributions to the overall transcriptome. The corresponding
P-values are printed to the console using the following notation:
’*’ = P-Value 0.05
’**’ = P-Value 0.005
’***’ = P-Value 0.0005
‘n.s.’ = not significant = P-Value > 0.05
In this case all developmental stages show significant differences in phylostratum specific gene expression.
Please notice that users need to define the
legendName
argument as PS
or DS
to specify whether the input ExpressionSet
is a
PhyloExpressionSet
(legendName = 'PS'
) or
DivergenceExpressionSet
(legendName = 'DS'
).
Alternatively, users can investigate the differences of gene
expression between all stages or experiments for
each age or divergence category by specifying
type = 'stage-centered'
.
library(myTAI)
data(PhyloExpressionSetExample)
# stage-centered visualization of PS specific expression level distributions (log-scale)
PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample,
legendName = "PS",
test.stat = TRUE,
type = "stage-centered",
distr.type = "boxplot",
log.expr = TRUE)
PS1 PS2 PS3 PS4 PS5 PS6 PS7 PS8 PS9 PS10 PS11 PS12
stage-centered "***" "***" "n.s." "*" "n.s." "*" "n.s." "n.s." "n.s." "***" "n.s." "***"
Here, the Kruskal-Wallis Rank Sum Test (with Benjamini & Hochberg p-value adjustment) quantifies whether or not the gene expression distribution of a single age or divergence category significantly changes throughout development or experiments. This type of analysis allows users to detect specific age or divergence categories that significantly change their expression levels throughout development or experiments.
In this case, users will observe that PS3,5,7-9,11 do not show significant differences of gene expression between developmental stages suggesting that their contribution to the overall transcriptome remains constant throughout development.
Finally, users can choose the following plot types to visualize expression distributions:
Argument: distr.type
distr.type = "boxplot"
This specification allows users to visualize the expression distribution of all PS or DS as boxplot.distr.type = "violin"
This specification allows users to visualize the expression distribution of all PS or DS as violin plot.distr.type = "dotplot"
This specification allows users to visualize the expression distribution of all PS or DS as dot plot.
Together, studies performed with PlotCategoryExpr()
allow users to conclude that genes originating in specific PS or DS
contribute significantly more to the overall transcriptome than other
genes originating from different PS or DS categories. More specialized
analyses such as PlotMeans()
, PlotRE()
,
PlotBarRE()
, TAI()
, TDI()
, etc.
will then allow them to study the exact mean expression patterns of
these age or divergence categories.
Users will notice that so far all examples shown above specified
log.expr = TRUE
illustrating boxplots based on log2
expression levels. This way of visualization allows better visual
comparisons between age or divergence categories. However, when
specifying log.expr = FALSE
absolute expression levels will
be visualized in the corresponding boxplot.
Alternatively, instead of specifying log.expr = TRUE
users can directly pass log2 transformed expression levels to
PlotCategoryExpr()
via
tf(PhyloExpressionSetExample,log2)
(when
log.expr = FALSE
):
data(PhyloExpressionSetExample)
# category-centered visualization of PS specific expression level distributions (log-scale)
PlotCategoryExpr(ExpressionSet = tf(PhyloExpressionSetExample, log2),
legendName = "PS",
test.stat = TRUE,
type = "category-centered",
distr.type = "boxplot",
log.expr = FALSE)
Zygote Quadrant Globular Heart Torpedo Bent Mature
category-centered "***" "***" "***" "***" "***" "***" "***"
Or any other expression level transformation,
e.g. sqrt
.
data(PhyloExpressionSetExample)
# category-centered visualization of PS specific expression level distributions (sqrt-scale)
PlotCategoryExpr(ExpressionSet = tf(PhyloExpressionSetExample, sqrt),
legendName = "PS",
test.stat = TRUE,
type = "category-centered",
distr.type = "boxplot",
log.expr = FALSE)
Zygote Quadrant Globular Heart Torpedo Bent Mature
category-centered "***" "***" "***" "***" "***" "***" "***"
Gene Subset Age or Divergence Category Specific Expression Level Distributions
In some cases, users wish to visualize the gene expression
distributions for a subset of genes in contrast to the entire
transcriptome. For this purpose, the gene.set
argument
allows users to specify the gene ids of a subset of genes that shall be
matched in the input ExpressionSet
and for which expression
level distributions shall be visualized.
library(myTAI)
data(PhyloExpressionSetExample)
# define an example gene subset (500 genes) which
# can be found in the input ExpressionSet
set.seed(234)
example.gene.set <- PhyloExpressionSetExample[sample(1:25260,500) , 2]
# visualize the gene expression distributions for these 500 genes (category-centered)
PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample,
legendName = "PS",
test.stat = TRUE,
type = "category-centered",
distr.type = "boxplot",
log.expr = TRUE,
gene.set = example.gene.set)
Zygote Quadrant Globular Heart Torpedo Bent Mature
category-centered "*" "*" "*" "*" "*" "*" "n.s."
Or analogously stage-centered
:
library(myTAI)
data(PhyloExpressionSetExample)
# define an example gene subset (500 genes) which
# can be found in the input ExpressionSet
set.seed(234)
example.gene.set <- PhyloExpressionSetExample[sample(1:25260,500) , 2]
# visualize the gene expression distributions for these 500 genes (stage-centered)
PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample,
legendName = "PS",
test.stat = TRUE,
type = "stage-centered",
distr.type = "boxplot",
log.expr = TRUE,
gene.set = example.gene.set)
PS1 PS2 PS3 PS4 PS5 PS6 PS7 PS8 PS9 PS10 PS11
stage-centered "n.s." "n.s." "n.s." "n.s." "n.s." "n.s." "n.s." "n.s." "n.s." "n.s." "n.s."
PS12
stage-centered "n.s."
For example, users interested in the enrichment of PS or DS values in
D. rerio brain genes (see Enrichment
Vignette for details) could also visualize their gene expression
distributions throughout development with
PlotCategoryExpr()
in cases where expression data is
available.
Computing the significant differences between gene expression distributions of PS or DS groups
As proposed by Quint et al., 2012 in some cases users wish to compare the difference of group specific expression levels using a statistical test.
For this purpose, the PlotGroupDiffs()
function performs
a test to quantify the statistical significance between the global
expression level distributions of groups of PS or DS. It therefore
allows users to investigate significant groups of PS or DS that
significantly differ in their gene expression level distribution within
specific developmental stages or experiments.
Analogous to the PlotRE()
or PlotMeans()
function (see Introduction for details),
users need to pass the Groups
to
PlotGroupDiffs()
specifying the groups that shall be
compared.
library(myTAI)
data(PhyloExpressionSetExample)
PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample,
Groups = list(group_1 = 1:3,group_2 = 4:12),
legendName = "PS",
plot.type = "p-vals",
type = "b",
lwd = 6,
xlab = "Ontogeny")
In cases where no plot shall be drawn and only the resulting p-value
shall be returned users can specify the plot.type = NULL
argument to receive only p-values returned by the underlying test
statistic.
library(myTAI)
data(PhyloExpressionSetExample)
# only receive the p-values without the corresponding plot
PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample,
Groups = list(group_1 = 1:3,group_2 = 4:12),
legendName = "PS",
plot.type = NULL)
Optionally, users can also visualize the difference in expression
level distributions of groups of PS/DS during each developmental stage
by specifying the plot.type = "boxplot"
argument.
library(myTAI)
data(PhyloExpressionSetExample)
# visualize difference as boxplot
PlotGroupDiffs(ExpressionSet = tf(PhyloExpressionSetExample,log2),
Groups = list(group_1 = 1:3,group_2 = 4:12),
legendName = "PS",
plot.type = "boxplot")
Here, we use log2 transformed expression levels for better
visualization (tf(PhyloExpressionSetExample,log2)
).
Internally, the PlotGroupDiffs()
function performs a
Wilcoxon Rank Sum test to quantify the statistical significance of PS/DS
group expression. This quantification allows users to detect
developmental stages of significant expression level differences between
PS/DS groups. In this example we chose genes originated before the
evolution of embryogenesis evolved in plants (Group1 = PS1-3) versus
genes originated after the evolution of embryogenesis evolved in plants
(Group2 = PS4-12). As a result, we observe that indeed the difference in
total gene expression between these groups is significant throughout
embryogenesis. In terms of the P-value quantification we observe that
the P-value is minimized towards the phylotypic period. Hence, the
expression level difference between the studied PS groups is maximized
during the phylotypic period.