Differential Gene Expression Analysis

Detect differentially expressed genes (DEGs) in a standard ExpressionSet object.

Usage

DiffGenes(
  ExpressionSet,
  nrep,
  method = "foldchange",
  p.adjust.method = NULL,
  comparison = NULL,
  alpha = NULL,
  filter.method = NULL,
  n = NULL,
  stage.names = NULL
)

Arguments

ExpressionSet

a standard PhyloExpressionSet or DivergenceExpressionSet object.

nrep

either a numeric value specifying the constant number of replicates per stage or a numeric vector specifying the variable number of replicates for each stage position.

method

method used to detect differentially expressed genes. Options are "foldchange", "log-foldchange", "t.test", and "wilcox.test" (see Details).

p.adjust.method

p-value correction method that is passed to p.adjust. Available options are:

p.adjust.method = "BH" (Benjamini-Hochberg correction)
p.adjust.method = "bonferroni" (Bonferroni correction)
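p.adjust.method = "holm" (Holm correction)
p.adjust.method = "hochberg" (Hochberg correction)
p.adjust.method = "hommel" (Hommel correction)
p.adjust.method = "BY" (Benjamini-Yekutieli correction)
p.adjust.method = "fdr" (alias for "BH")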

If p.adjust.method = NULL (Default) then no p-value correction is performed.
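
To illustrate what the correction does, here is a minimal sketch that calls base R's p.adjust directly, outside of DiffGenes. The p-values are made up for illustration only.

```r
# Hypothetical raw p-values for five genes (illustrative only)
p.raw <- c(0.01, 0.02, 0.03, 0.04, 0.05)

# Benjamini-Hochberg: controls the false discovery rate
p.bh <- p.adjust(p.raw, method = "BH")

# Bonferroni: multiplies each p-value by the number of tests (capped at 1)
p.bonf <- p.adjust(p.raw, method = "bonferroni")

# "fdr" is an alias for "BH" in p.adjust, so both give the same result
same.as.fdr <- identical(p.bh, p.adjust(p.raw, method = "fdr"))

print(rbind(raw = p.raw, BH = p.bh, bonferroni = p.bonf))
```

Bonferroni is the more conservative of the two, so its adjusted values are never smaller than the BH-adjusted ones.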

comparison

a character string specifying whether genes having fold-change values or p-values below ("below"), above ("above"), or below AND above ("both") the alpha value should be excluded from the dataset. If comparison = "both" is chosen, the alpha argument must be a vector of length two, with the lower alpha value at the first position and the upper alpha value at the second position.

alpha

a numeric value specifying the cut-off applied to the fold-change, log-fold-change, or p-value of each gene. Together with comparison, it determines which genes are retained and returned by DiffGenes.

filter.method

a method specifying how the alpha cut-off is applied across multiple stages. Options are "const", "min-set", and "n-set".

n

a numeric value for filter.method = "n-set".

stage.names

a character vector specifying the new names of collapsed stages.
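
As a rough sketch of how nrep and stage.names relate, the snippet below assigns replicate columns to stages. It assumes that the replicates of a stage sit in adjacent columns; the variable names are hypothetical and this is not the internal DiffGenes code.

```r
# Hypothetical layout: names of the collapsed stages
stage.names <- c("S1", "S2", "S3")

# Constant number of replicates: 2 columns per stage (6 columns in total)
nrep.const <- 2
stage.of.column <- rep(stage.names, each = nrep.const)

# Variable number of replicates: 2, 3, and 1 columns for S1, S2, and S3
nrep.var <- c(2, 3, 1)
stage.of.column.var <- rep(stage.names, times = nrep.var)

print(stage.of.column)
print(stage.of.column.var)
```

In both cases the number of expression columns must equal the sum of the replicate counts.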

Details

All methods to perform detection of differentially expressed genes assume that your input dataset has been normalized before passing it to DiffGenes. For RNA-Seq data DiffGenes assumes that the libraries have been normalized to have the same size, i.e., to have the same expected column sum under the null hypothesis.
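
The "same library size" assumption can be pictured with a simple total-count scaling. This is only an illustrative sketch on made-up counts; DiffGenes does not perform this step, and dedicated normalization methods are usually preferable for real RNA-Seq data.

```r
# Hypothetical raw count matrix: 3 genes (rows) x 4 libraries (columns)
counts <- matrix(c(10, 20,  70,
                   20, 40, 140,
                    5, 10,  35,
                   30, 60, 210),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("lib", 1:4)))

# Library sizes are the column sums of the raw counts
lib.size <- colSums(counts)

# Scale every library to the mean library size
target.size <- mean(lib.size)
scaled <- sweep(counts, 2, lib.size / target.size, FUN = "/")

# After scaling, all columns have the same sum
print(colSums(scaled))
```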

Available methods for the detection of differentially expressed genes:

method = "foldchange": ratio of replicate geometric means between developmental stages. Here, the DiffGenes function assumes that absolute expression levels are stored in your input ExpressionSet.
method = "log-foldchange": difference of replicate arithmetic means between developmental stages. Here, the DiffGenes function assumes that log2-transformed expression levels are stored in your input ExpressionSet.
method = "t.test": Welch t.test between replicate expression levels of two samples.
method = "wilcox.test": Wilcoxon Rank Sum Test between replicate expression levels of two samples.
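
The four measures can be sketched for a single gene with base R. The expression values and the geom.mean helper are hypothetical, and the orientation of the ratio (S2 relative to S1) is chosen here for illustration rather than taken from DiffGenes.

```r
# Hypothetical absolute expression levels of one gene:
# three replicates in each of two stages
s1 <- c(10, 12, 11)
s2 <- c(20, 25, 22)

# Geometric mean of a vector of positive values (helper for this sketch)
geom.mean <- function(x) exp(mean(log(x)))

# Fold-change: ratio of replicate geometric means
fc <- geom.mean(s2) / geom.mean(s1)

# Log-fold-change: difference of replicate arithmetic means of log2 values
lfc <- mean(log2(s2)) - mean(log2(s1))

# Welch t-test (t.test uses var.equal = FALSE by default)
p.t <- t.test(s1, s2)$p.value

# Wilcoxon rank sum test
p.w <- wilcox.test(s1, s2)$p.value

print(c(foldchange = fc, log.foldchange = lfc, t.test = p.t, wilcox.test = p.w))
```

With only three replicates per stage the Wilcoxon test cannot produce a very small p-value, which is worth keeping in mind when choosing alpha.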

Exclude non differentially expressed genes from the result dataset:

When specifying the alpha argument, you also need to specify the filter.method argument to decide how non differentially expressed genes should be classified in multiple sample comparisons and which genes should be retained in the final dataset returned by DiffGenes. In other words, genes that fail the alpha cut-off according to the chosen filter.method are removed from the result dataset.

The following extraction criteria are implemented in this function:

const: all genes that have at least one sample comparison that undercuts or exceeds the alpha cut-off will be excluded from the ExpressionSet. Hence, for a 7-stage ExpressionSet, genes passing the alpha threshold in 6 stages will be retained in the ExpressionSet.
min-set: genes passing the alpha value in ceiling(n/2) stages will be retained in the ExpressionSet, where n is the number of stages in the ExpressionSet.
n-set: genes passing the alpha value in n stages will be retained in the ExpressionSet. Here, the argument n needs to be specified.
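
The counting logic behind "min-set" and "n-set" can be sketched as follows. The p-values are made up, "passing" is taken to mean p <= alpha, and the threshold is read as "at least" that many comparisons; both are assumptions of this sketch, not a copy of the DiffGenes internals. The "const" rule is left out here.

```r
# Hypothetical p-values: 4 genes (rows) x 3 sample comparisons (columns)
p <- matrix(c(0.01, 0.02, 0.03,
              0.01, 0.20, 0.03,
              0.01, 0.20, 0.30,
              0.40, 0.20, 0.30),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4),
                            c("S1<->S2", "S1<->S3", "S2<->S3")))
alpha    <- 0.05
n.stages <- 3

# Number of comparisons in which each gene passes the threshold
n.pass <- rowSums(p <= alpha)

# "n-set"-like rule with n = 1: keep genes passing at least once
keep.n.set <- n.pass >= 1

# "min-set"-like rule: keep genes passing in ceiling(n.stages / 2) comparisons
keep.min.set <- n.pass >= ceiling(n.stages / 2)

print(cbind(n.pass, keep.n.set, keep.min.set))
```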

Note

If the input ExpressionSet object stores 0 values, all expression levels are internally shifted by +1 to allow fold-change and p-value computations. A warning is printed to the console whenever expression levels have been shifted automatically.
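
The reason for the shift is that a geometric mean over values containing 0 is 0, so ratios and logarithms break down. A minimal sketch of such a shift is shown below; shift.zeros is a hypothetical helper and the warning text is invented for illustration.

```r
# Hypothetical helper: shift all values by +1 if any value is 0
shift.zeros <- function(x) {
  if (any(x == 0)) {
    warning("expression levels contain 0 values and were shifted by +1")
    x <- x + 1
  }
  x
}

# Toy expression matrix containing one 0 value
expr <- matrix(c(0, 4, 2, 8), nrow = 2)

# The warning is suppressed here only to keep the example output clean
shifted <- suppressWarnings(shift.zeros(expr))
print(shifted)
```

Note that the whole matrix is shifted, not only the 0 entries, so relative differences between genes are preserved as far as possible.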

Author

Hajk-Georg Drost

Examples


data(PhyloExpressionSetExample)

# Detection of DEGs using the fold-change measure
DEGs <- DiffGenes(ExpressionSet = PhyloExpressionSetExample[ ,1:8],
                  nrep          = 2,
                  comparison    = "below",
                  method        = "foldchange",
                  stage.names   = c("S1","S2","S3"))


head(DEGs)
#>   Phylostratum      GeneID    S1->S2    S1->S3    S2->S1    S2->S3    S3->S1
#> 1            1 at1g01040.2 1.6706342 2.0767465 0.5985751 1.2430887 0.4815224
#> 2            1 at1g01050.1 1.0231405 1.2788699 0.9773829 1.2499456 0.7819403
#> 3            1 at1g01070.1 1.3087104 1.4044710 0.7641110 1.0731717 0.7120119
#> 4            1 at1g01080.2 0.7786521 0.7286129 1.2842706 0.9357362 1.3724709
#> 5            1 at1g01090.1 0.3744682 0.2255944 2.6704536 0.6024393 4.4327345
#> 6            1 at1g01120.1 0.9110850 0.9000931 1.0975925 0.9879354 1.1109962
#>      S3->S2
#> 1 0.8044478
#> 2 0.8000348
#> 3 0.9318173
#> 4 1.0686773
#> 5 1.6599182
#> 6 1.0122119


# Detection of DEGs using the log-fold-change measure
# when choosing method = "log-foldchange" it is assumed that
# your input expression matrix stores log2 expression levels 
log.DEGs <- DiffGenes(ExpressionSet = tf(PhyloExpressionSetExample[1:5,1:8],log2),
                      nrep          = 2,
                      comparison    = "below",
                      method        = "log-foldchange",
                      stage.names   = c("S1","S2","S3"))


head(log.DEGs)
#>   Phylostratum      GeneID      S1->S2     S1->S3      S2->S1      S2->S3
#> 1            1 at1g01040.2  0.74039587  1.0543251 -0.74039587  0.31392926
#> 2            1 at1g01050.1  0.03300429  0.3548696 -0.03300429  0.32186526
#> 3            1 at1g01070.1  0.38814592  0.4900268 -0.38814592  0.10188092
#> 4            1 at1g01080.2 -0.36094927 -0.4567755  0.36094927 -0.09582626
#> 5            1 at1g01090.1 -1.41708484 -2.1481970  1.41708484 -0.73111211
#>       S3->S1      S3->S2
#> 1 -1.0543251 -0.31392926
#> 2 -0.3548696 -0.32186526
#> 3 -0.4900268 -0.10188092
#> 4  0.4567755  0.09582626
#> 5  2.1481970  0.73111211
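
As a plausibility check, the two example outputs can be compared. Because the fold-change uses geometric means, its log2 should match the log-fold-change computed from log2 data. The sketch below uses the values printed above for gene at1g01040.2 and only base R.

```r
# Fold-change values of at1g01040.2, copied from head(DEGs) above
fc <- c("S1->S2" = 1.6706342, "S1->S3" = 2.0767465, "S2->S1" = 0.5985751,
        "S2->S3" = 1.2430887, "S3->S1" = 0.4815224, "S3->S2" = 0.8044478)

# Log-fold-change values of the same gene, copied from head(log.DEGs) above
lfc <- c("S1->S2" = 0.74039587, "S1->S3" = 1.0543251, "S2->S1" = -0.74039587,
         "S2->S3" = 0.31392926, "S3->S1" = -1.0543251, "S3->S2" = -0.31392926)

# Largest absolute deviation between log2(fold-change) and log-fold-change;
# small differences are expected because the printed values are rounded
max.dev <- max(abs(log2(fc) - lfc))
print(max.dev)
```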


# Filter genes by an alpha cut-off:

## first have a look at the range of fold-change values of all genes 
apply(DEGs[ , 3:8],2,range)
#>          S1->S2      S1->S3     S2->S1      S2->S3     S3->S1      S3->S2
#> [1,]  0.1081031  0.04752967 0.06022388  0.06115996  0.0244683  0.06450549
#> [2,] 16.6047090 40.86920354 9.25042630 15.50255551 21.0394884 16.35056753

# now filter genes using the alpha = 0.05 threshold
# hence, retain genes having p-values <= 0.05 in at
# least one sample comparison
DEGs.alpha <- DiffGenes(ExpressionSet = PhyloExpressionSetExample[1:250 ,1:8],
                        nrep          = 2,
                        method        = "t.test",
                        alpha         = 0.05,
                        comparison    = "above",
                        filter.method = "n-set",
                        n             = 1,
                        stage.names   = c("S1","S2","S3"))

# now again have a look at the range and find
# that each sample comparison has a minimum p-value below 0.05
apply(DEGs.alpha[ , 3:5],2,range)
#>           S1<->S2     S1<->S3     S2<->S3
#> [1,] 0.0004200748 0.002629415 0.005515936
#> [2,] 0.2405307541 0.046734476 0.375736681

# now check whether each example has at least one stage with a p-value <= 0.05
head(DEGs.alpha)
#>    Phylostratum      GeneID     S1<->S2     S1<->S3    S2<->S3
#> 3             1 at1g01070.1 0.007037106 0.002629415 0.01981887
#> 5             1 at1g01090.1 0.026369854 0.035285481 0.08957681
#> 31            1 at1g01770.1 0.240530754 0.012040556 0.04951844
#> 45            1 at1g02130.1 0.008556501 0.046734476 0.32347638
#> 47            1 at1g02205.3 0.022021429 0.015537681 0.22278162
#> 93            1 at1g03190.1 0.016635354 0.012251604 0.14721584
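
The visual check above can also be done programmatically. The sketch below rebuilds the six printed rows as a plain matrix so that it runs without the package; with the real object, the same test would be applied to the p-value columns of DEGs.alpha.

```r
# p-values copied from the head(DEGs.alpha) output above
p <- matrix(c(0.007037106, 0.002629415, 0.01981887,
              0.026369854, 0.035285481, 0.08957681,
              0.240530754, 0.012040556, 0.04951844,
              0.008556501, 0.046734476, 0.32347638,
              0.022021429, 0.015537681, 0.22278162,
              0.016635354, 0.012251604, 0.14721584),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("at1g01070.1", "at1g01090.1", "at1g01770.1",
                              "at1g02130.1", "at1g02205.3", "at1g03190.1"),
                            c("S1<->S2", "S1<->S3", "S2<->S3")))

# TRUE for every gene with at least one p-value <= 0.05
has.sig <- apply(p <= 0.05, 1, any)
print(has.sig)

# With the package loaded, the analogous call on the full result would be:
# all(apply(DEGs.alpha[ , 3:5] <= 0.05, 1, any))
```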