Compares the number of motif occurrences in a set of non-random (real) sequences with the number in a set of random sequences. The resulting counts are then tested for enrichment of certain motifs in the real sequences relative to the random sequences. Several test statistics and approaches are available to quantify significant motif enrichment.

motif_enrichment(
  real_seqs,
  random_seqs,
  motifs,
  test = "fisher",
  alternative = "less",
  max.mismatch = 0,
  min.mismatch = 0,
  ...
)

Arguments

real_seqs

a file path to the fasta file storing the non-random set of sequences.

random_seqs

a file path to the fasta file storing the random set of sequences, e.g. generated with extract_random_seqs_from_genome.

motifs

a character vector storing the set of motifs to be counted within the respective sequences.

test

test statistic and model used to quantify significant motif enrichment. Options are:

  • test = "fisher": Fisher's Exact Test for Count Data (see stats::fisher.test for details).
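As a rough illustration of what a Fisher's exact test on motif counts looks like, the sketch below builds a 2 by 2 table by hand and passes it to stats::fisher.test. The counts and the table layout (sequences with and without a motif, in the real and the random set) are assumptions made for this example only; the table that motif_enrichment builds internally may be laid out differently.

```r
# Hypothetical counts (not produced by motif_enrichment):
# number of sequences that contain a given motif at least once.
real_with    <- 40
real_total   <- 100
random_with  <- 15
random_total <- 100

# Rows: sequence set, columns: motif present / absent (assumed layout).
counts <- matrix(
  c(real_with,   real_total   - real_with,
    random_with, random_total - random_with),
  nrow = 2, byrow = TRUE,
  dimnames = list(set   = c("real", "random"),
                  motif = c("present", "absent"))
)

# With this layout, alternative = "greater" asks whether the odds of
# finding the motif are higher in the real set than in the random set.
res <- fisher.test(counts, alternative = "greater")
res$p.value   # p-value of the one-sided test
res$estimate  # conditional maximum likelihood estimate of the odds ratio
```

An odds ratio above 1 together with a small p-value would point towards enrichment in the real set under this assumed layout.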

alternative

indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. Only used in the 2 by 2 case.
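To show how the choice of alternative changes the outcome, the following sketch runs stats::fisher.test on one hypothetical 2 by 2 table under all three settings. The numbers are invented for illustration, and which direction counts as "enrichment" depends on how the rows of the table are ordered, which is an assumption here.

```r
# Hypothetical table: row 1 = real set, row 2 = random set;
# column 1 = motif present, column 2 = motif absent.
counts <- matrix(c(40, 60,
                   15, 85), nrow = 2, byrow = TRUE)

# Run the same test under each alternative hypothesis.
p_values <- sapply(
  c("two.sided", "greater", "less"),
  function(alt) fisher.test(counts, alternative = alt)$p.value
)
p_values

# The initial letter is enough, as stated above.
p_greater_short <- fisher.test(counts, alternative = "g")$p.value
```

With row 1 holding the larger share of motif-containing sequences, "greater" gives a small p-value and "less" a p-value close to 1; swapping the rows would reverse the two.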

max.mismatch

the maximum number of mismatching letters allowed (see matchPattern for details).

min.mismatch

the minimum number of mismatching letters allowed (see vcountPattern for details).
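The effect of max.mismatch and min.mismatch can be sketched in base R with a sliding window that counts mismatching letters at every position. The helper count_with_mismatch below is hypothetical and written only for illustration; it is not the matching routine that matchPattern or vcountPattern use, and it assumes that a window counts as a hit when its number of mismatches lies between min.mismatch and max.mismatch.

```r
# Hypothetical helper: count windows of `subject` that match `motif`
# with a number of mismatches between min.mismatch and max.mismatch.
count_with_mismatch <- function(motif, subject,
                                max.mismatch = 0, min.mismatch = 0) {
  m <- strsplit(motif, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  n_windows <- length(s) - length(m) + 1
  if (n_windows < 1) return(0L)

  hits <- 0L
  for (start in seq_len(n_windows)) {
    window <- s[start:(start + length(m) - 1)]
    n_mis  <- sum(window != m)  # mismatching letters in this window
    if (n_mis >= min.mismatch && n_mis <= max.mismatch) {
      hits <- hits + 1L
    }
  }
  hits
}

subject <- "ACGTACCACG"
exact_hits   <- count_with_mismatch("ACG", subject)                    # exact matches only
relaxed_hits <- count_with_mismatch("ACG", subject, max.mismatch = 1)  # up to one mismatch
one_off_hits <- count_with_mismatch("ACG", subject,
                                    max.mismatch = 1, min.mismatch = 1) # exactly one mismatch
```

In this toy sequence, "ACG" occurs exactly at two positions, and the window "ACC" differs from it by a single letter, so allowing one mismatch adds one more hit.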

...

additional arguments passed to matchPattern.

Author

Hajk-Georg Drost