Plot the Phylostratum or Divergence Stratum Enrichment of a given Gene Set
Source: R/PlotEnrichment.R
PlotEnrichment.Rd
This function computes and visualizes the significance of enriched (over- or underrepresented) Phylostrata or Divergence Strata within an input test.set.
Usage
PlotEnrichment(
  ExpressionSet,
  test.set,
  use.only.map = FALSE,
  measure = "log-foldchange",
  complete.bg = TRUE,
  legendName = "",
  over.col = "steelblue",
  under.col = "midnightblue",
  epsilon = 1e-05,
  cex.legend = 1,
  cex.asterisk = 1,
  plot.bars = TRUE,
  p.adjust.method = NULL,
  ...
)
Arguments
- ExpressionSet
a standard PhyloExpressionSet or DivergenceExpressionSet object (in case use.only.map = FALSE).
- test.set
a character vector storing the gene ids for which PS/DS enrichment analyses should be performed.
- use.only.map
a logical value indicating whether, instead of a standard ExpressionSet, only a Phylostratigraphic Map or Divergence Map is passed to this function.
- measure
a character string specifying the measure that should be used to quantify over- and underrepresentation of PS/DS. Measures can either be measure = "foldchange" (odds) or measure = "log-foldchange" (log-odds).
- complete.bg
a logical value indicating whether the entire background set of the input ExpressionSet should be considered when performing Fisher's exact test (complete.bg = TRUE) or whether genes that are stored in test.set should be excluded from the background set before performing Fisher's exact test (complete.bg = FALSE).
- legendName
a character string specifying whether "PS" or "DS" are used to compute relative expression profiles.
- over.col
color of the overrepresentation bars.
- under.col
color of the underrepresentation bars.
- epsilon
a small value added to shift values away from zero and so avoid log(0) computations.
- cex.legend
the cex value for the legend.
- cex.asterisk
the cex value for the asterisk.
- plot.bars
a logical value specifying whether bars should be visualized or whether only p.values and enrichment.matrix should be returned.
- p.adjust.method
correction method to adjust p-values for multiple comparisons (see p.adjust for possible methods). E.g., p.adjust.method = "BH" (Benjamini & Hochberg (1995)) or p.adjust.method = "bonferroni" (Bonferroni correction).
- ...
default graphics parameters.
Details
This Phylostratum or Divergence Stratum enrichment analysis is motivated by Sestak and Domazet-Loso (2015), who perform Phylostratum or Divergence Stratum enrichment analyses to correlate organ evolution with the origin of organ-specific genes.
In detail, this function takes the Phylostratum or Divergence Stratum distribution of all genes stored in the input ExpressionSet as the background set, together with the Phylostratum or Divergence Stratum distribution of the test.set, and performs a fisher.test for each Phylostratum or Divergence Stratum to quantify the statistical significance of over- or underrepresented Phylostrata or Divergence Strata among the test.set genes.
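The per-stratum test described above can be sketched in base R. This is only an illustration under stated assumptions: the toy vectors strata, gene_ids and test_ids are hypothetical, and the way the 2x2 table is laid out here (test set vs. remaining genes, in stratum vs. not in stratum) is an assumption about one reasonable construction, not a copy of the internals of PlotEnrichment. It corresponds to the complete.bg = FALSE setting, where test.set genes are excluded from the background.

```r
# Hypothetical toy map: 12 genes, each assigned to one of 3 strata (not real data)
strata   <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
gene_ids <- paste0("gene", seq_along(strata))
test_ids <- c("gene1", "gene2", "gene3", "gene5")

# TRUE for genes in the test set; the remaining genes act as background
in_test <- gene_ids %in% test_ids

stratum_levels <- sort(unique(strata))

# One Fisher's exact test per stratum on a 2x2 contingency table
p_values <- sapply(stratum_levels, function(s) {
  in_stratum <- strata == s
  tab <- matrix(
    c(sum(in_test & in_stratum),  sum(in_test & !in_stratum),
      sum(!in_test & in_stratum), sum(!in_test & !in_stratum)),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("test", "background"), c("in_stratum", "other"))
  )
  fisher.test(tab)$p.value
})
names(p_values) <- paste0("PS", stratum_levels)

print(p_values)
```

Each entry of p_values is the two-sided p-value for one stratum, mirroring the shape of the $p.values element shown in the Examples.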
To visualize the odds or log-odds of over- or underrepresented genes within the test.set, the following procedure is performed:
- N_ij denotes the number of genes in group j and deriving from PS i, with i = 1, ..., n, where j = 1 denotes the background set and j = 2 denotes the test.set
- N_i. denotes the total number of genes within PS i
- N_.j denotes the total number of genes within group j
- N_.. is the total number of genes within all groups j and all PS i
- f_ij = N_ij / N_.. and g_ij = f_ij / f_.j denote relative frequencies between groups
- f_i. denotes the between-group sum of f_ij
The result is the fold-change value (odds) denoted as C = g_i2 / f_i. which is visualized above and below zero.
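Written out, the definitions above reduce to a ratio of two proportions. The following restatement assumes that f_.j, which is not defined explicitly in the list, is the sum of f_ij over all PS i (by analogy with N_.j); everything else follows from the listed definitions.

```latex
\begin{align*}
f_{ij} &= \frac{N_{ij}}{N_{..}}, &
f_{.j} &= \sum_{i=1}^{n} f_{ij} = \frac{N_{.j}}{N_{..}}, \\
g_{ij} &= \frac{f_{ij}}{f_{.j}} = \frac{N_{ij}}{N_{.j}}, &
f_{i.} &= \sum_{j=1}^{2} f_{ij} = \frac{N_{i.}}{N_{..}}, \\
C &= \frac{g_{i2}}{f_{i.}} = \frac{N_{i2} / N_{.2}}{N_{i.} / N_{..}}.
\end{align*}
```

Read this way, C > 1 means that PS i makes up a larger share of the test.set than of the pooled counts, and C < 1 means a smaller share.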
If the input ExpressionSet includes a large number of Phylostrata or Divergence Strata, the p-values returned by PlotEnrichment should be adjusted for multiple comparisons, which can be done by specifying the p.adjust.method argument.
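To illustrate what such an adjustment does, the sketch below applies p.adjust from the stats package to the twelve p-values printed in the first example of the Examples section (complete.bg = TRUE). It assumes that p.adjust.method corresponds to the method argument of p.adjust, as the argument description suggests; the variable names p_raw, p_bh and p_bonf are chosen here for illustration.

```r
# Raw p-values copied from the first example output (complete.bg = TRUE)
p_raw <- c(PS1 = 0.4693540, PS2 = 0.2894449, PS3 = 0.9407912,
           PS4 = 0.7844042, PS5 = 0.6941342, PS6 = 0.3376881,
           PS7 = 0.7599214, PS8 = 0.3534216, PS9 = 0.2070409,
           PS10 = 0.5247578, PS11 = 0.8226034, PS12 = 0.4035404)

# Benjamini & Hochberg (1995) correction, controlling the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

# Bonferroni correction, controlling the family-wise error rate
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Side-by-side comparison of raw and adjusted values
print(cbind(raw = p_raw, BH = p_bh, bonferroni = p_bonf))
```

Adjusted p-values are never smaller than the raw ones, and the Bonferroni values are never smaller than the BH values, so the choice of method determines how conservative the enrichment calls become.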
References
Sestak and Domazet-Loso (2015). Phylostratigraphic Profiles in Zebrafish Uncover Chordate Origins of the Vertebrate Brain. Mol. Biol. Evol. 32(2): 299-312.
Examples
data(PhyloExpressionSetExample)
set.seed(123)
test_set <- sample(PhyloExpressionSetExample[ , 2], 10000)
## Examples with complete.bg = TRUE
## Hence: the entire background set of the input ExpressionSet is considered
## when performing Fisher's exact test
# measure: log-foldchange
PlotEnrichment(ExpressionSet = PhyloExpressionSetExample,
               test.set      = test_set,
               legendName    = "PS",
               measure       = "log-foldchange")
#> $p.values
#> PS1 PS2 PS3 PS4 PS5 PS6 PS7 PS8
#> 0.4693540 0.2894449 0.9407912 0.7844042 0.6941342 0.3376881 0.7599214 0.3534216
#> PS9 PS10 PS11 PS12
#> 0.2070409 0.5247578 0.8226034 0.4035404
#>
#> $enrichment.matrix
#> BG_Set Test_Set
#> PS1 -0.007173974 0.010879188
#> PS2 0.013314604 -0.020557911
#> PS3 0.004816575 -0.007381227
#> PS4 -0.006219847 0.009440103
#> PS5 0.016601366 -0.025707696
#> PS6 -0.027725930 0.041308721
#> PS7 0.022156857 -0.034481130
#> PS8 -0.068477640 0.098607672
#> PS9 -0.081220293 0.115747944
#> PS10 0.024420954 -0.038081789
#> PS11 -0.020107047 0.030153208
#> PS12 0.022430353 -0.034915301
#>
# measure: foldchange
PlotEnrichment(ExpressionSet = PhyloExpressionSetExample,
               test.set      = test_set,
               legendName    = "PS",
               measure       = "foldchange")
#> $p.values
#> PS1 PS2 PS3 PS4 PS5 PS6 PS7 PS8
#> 0.4693540 0.2894449 0.9407912 0.7844042 0.6941342 0.3376881 0.7599214 0.3534216
#> PS9 PS10 PS11 PS12
#> 0.2070409 0.5247578 0.8226034 0.4035404
#>
#> $enrichment.matrix
#> BG_Set Test_Set
#> PS1 0.9950396 1.007570
#> PS2 1.0092721 -1.014352
#> PS3 1.0033453 -1.005131
#> PS4 0.9956976 1.006565
#> PS5 1.0115771 -1.017984
#> PS6 0.9809623 1.029052
#> PS7 1.0154904 -1.024211
#> PS8 0.9536013 1.070804
#> PS9 0.9452185 1.083597
#> PS10 1.0170758 -1.026755
#> PS11 0.9861409 1.021149
#> PS12 1.0156713 -1.024500
#>
## Examples with complete.bg = FALSE
## Hence: the test.set genes are excluded from the background set before
## Fisher's exact test is performed
# measure: log-foldchange
PlotEnrichment(ExpressionSet = PhyloExpressionSetExample,
               test.set      = test_set,
               complete.bg   = FALSE,
               legendName    = "PS",
               measure       = "log-foldchange")
#> $p.values
#> PS1 PS2 PS3 PS4 PS5 PS6 PS7 PS8
#> 0.6315000 0.4817838 0.9727738 0.8469003 0.7924055 0.5264504 0.8657001 0.5361027
#> PS9 PS10 PS11 PS12
#> 0.3975804 0.6883696 0.8910157 0.5916625
#>
#> $enrichment.matrix
#> BG_Set Test_Set
#> PS1 -0.003093762 0.007785426
#> PS2 0.005800680 -0.014757232
#> PS3 0.002089538 -0.005291689
#> PS4 -0.002683565 0.006756538
#> PS5 0.007244479 -0.018463217
#> PS6 -0.011836114 0.029472607
#> PS7 0.009695676 -0.024785453
#> PS8 -0.028657241 0.069950431
#> PS9 -0.033781176 0.081966769
#> PS10 0.010698553 -0.027383236
#> PS11 -0.008615890 0.021537317
#> PS12 0.009816700 -0.025098601
#>
# measure: foldchange
PlotEnrichment(ExpressionSet = PhyloExpressionSetExample,
               test.set      = test_set,
               complete.bg   = FALSE,
               legendName    = "PS",
               measure       = "foldchange")
#> $p.values
#> PS1 PS2 PS3 PS4 PS5 PS6 PS7 PS8
#> 0.6315000 0.4817838 0.9727738 0.8469003 0.7924055 0.5264504 0.8657001 0.5361027
#> PS9 PS10 PS11 PS12
#> 0.3975804 0.6883696 0.8910157 0.5916625
#>
#> $enrichment.matrix
#> BG_Set Test_Set
#> PS1 0.9978578 1.005411
#> PS2 1.0040290 -1.010282
#> PS3 1.0014499 -1.003676
#> PS4 0.9981414 1.004695
#> PS5 1.0050356 -1.012884
#> PS6 0.9918281 1.020642
#> PS7 1.0067492 -1.017344
#> PS8 0.9803147 1.049725
#> PS9 0.9768405 1.058501
#> PS10 1.0074452 -1.019167
#> PS11 0.9940378 1.015061
#> PS12 1.0068286 -1.017552
#>