Statistical assessment of motif enrichment in non-random versus randomly sampled gene promoter sequences across multiple species

Compare the number of motif occurrences in a set of non-random gene promoter sequences with the number found in randomly sampled gene promoter sequences, separately for each subject genome. The resulting counts are then used to test statistically whether certain motifs are enriched in the real promoter sequences relative to the randomly sampled ones.
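A minimal base-R sketch of the statistical step, assuming that for each subject genome the motif hits are summarised in a 2 x 2 table (motif present / absent by real / random promoter sequences) and tested with Fisher's exact test. The table layout, the helper name enrichment_per_genome and the counts are hypothetical illustrations, not the internals of motif_enrichment_multi_promotor_seqs.

```r
# Hypothetical sketch (not metablastr code): one Fisher's exact test per
# subject genome. Each element of `counts` holds assumed numbers of real and
# randomly sampled promoter sequences, and how many of each contain the motif.
enrichment_per_genome <- function(counts, alternative = "two.sided") {
  p_values <- vapply(counts, function(x) {
    # matrix() fills column-wise:
    # column 1 = real sequences, column 2 = random sequences;
    # row 1 = motif present, row 2 = motif absent
    tbl <- matrix(c(x$real_hit, x$real_total - x$real_hit,
                    x$random_hit, x$random_total - x$random_hit),
                  nrow = 2,
                  dimnames = list(motif = c("present", "absent"),
                                  set = c("real", "random")))
    stats::fisher.test(tbl, alternative = alternative)$p.value
  }, numeric(1))
  data.frame(genome = names(counts), p_value = p_values, row.names = NULL)
}

# Made-up counts for two hypothetical genomes
counts <- list(
  genome_A = list(real_hit = 40, real_total = 100, random_hit = 15, random_total = 100),
  genome_B = list(real_hit = 12, real_total = 100, random_hit = 10, random_total = 100)
)
res <- enrichment_per_genome(counts)
res
```

With these illustrative counts, genome_A shows a clear difference between real and random sequences, while genome_B does not.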

motif_enrichment_multi_promotor_seqs(
  blast_tbl,
  subject_genomes,
  annotation_files,
  annotation_format = "gff",
  test = "fisher",
  alternative = "two.sided",
  interval_width,
  motifs,
  max.mismatch = 0,
  min.mismatch = 0,
  ...
)

Arguments

blast_tbl

a blast_table.

subject_genomes

a character vector storing the file paths to the subject genomes that shall be used as references.

annotation_files

a character vector storing the file paths to the subject annotation files in .gff format that match the subject genomes.

annotation_format

the annotation format. Options are:

annotation_format = "gff"

test

the statistical test used to assess motif enrichment. Options are:

test = "fisher": Fisher's Exact Test for Count Data (see stats::fisher.test for details).

alternative

indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. Only used in the 2 by 2 case.
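The effect of the alternative argument can be seen directly with stats::fisher.test on a 2 x 2 table. The sketch below uses made-up counts and assumes that the first column holds the real and the second the randomly sampled promoter sequences. Under that layout, "greater" corresponds to testing for motif enrichment in the real sequences.

```r
# Made-up 2 x 2 table: rows = motif present/absent,
# columns = real/random promoter sequences (assumed layout)
tbl <- matrix(c(40, 60, 15, 85), nrow = 2,
              dimnames = list(motif = c("present", "absent"),
                              set = c("real", "random")))

# "greater": odds ratio > 1 (enrichment in real sequences)
# "less":    odds ratio < 1 (depletion in real sequences)
# "two.sided": any difference
p_two     <- fisher.test(tbl, alternative = "two.sided")$p.value
p_greater <- fisher.test(tbl, alternative = "g")$p.value  # initial letter suffices
p_less    <- fisher.test(tbl, alternative = "less")$p.value
c(two.sided = p_two, greater = p_greater, less = p_less)
```

Here the motif is more frequent in the real sequences, so the "greater" p-value is small while the "less" p-value is close to 1.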

interval_width

total number of sequences that shall be sampled per subject genome.

motifs

a character vector of (case-sensitive) motif sequences whose abundance in the sampled sequences shall be assessed.

max.mismatch

maximum number of mismatches that are allowed between the sequence motif and the matching region in the sampled sequence.

min.mismatch

minimum number of mismatches required between the sequence motif and the matching region in the sampled sequence.
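Together, min.mismatch and max.mismatch define a range of permitted mismatches per match. The sketch below illustrates this with a hypothetical helper, count_motif_mismatch (not part of metablastr). It counts overlapping windows whose Hamming distance to the motif lies within that range. It assumes substitutions only (no insertions or deletions) and case-sensitive comparison. The actual matching is carried out by the package via motif_compare and may differ in detail.

```r
# Hypothetical helper (not metablastr code): count windows of `subject` whose
# Hamming distance to `motif` lies in [min.mismatch, max.mismatch].
# Substitutions only, overlapping windows counted, case-sensitive.
count_motif_mismatch <- function(motif, subject,
                                 max.mismatch = 0, min.mismatch = 0) {
  k <- nchar(motif)
  n <- nchar(subject)
  if (n < k) return(0L)
  motif_chars <- strsplit(motif, "")[[1]]
  starts <- seq_len(n - k + 1)
  # All windows of length k along the subject sequence
  windows <- substring(subject, starts, starts + k - 1)
  # Number of mismatching letters per window
  mismatches <- vapply(strsplit(windows, ""),
                       function(w) sum(w != motif_chars),
                       integer(1))
  sum(mismatches >= min.mismatch & mismatches <= max.mismatch)
}

count_motif_mismatch("TATA", "GGTATAAATATA")                    # exact matches: 2
count_motif_mismatch("TATA", "GGTATAAATATA", max.mismatch = 1)  # up to 1 mismatch: 4
```

Setting min.mismatch = 1 together with max.mismatch = 1 would count only the two windows that differ from the motif at exactly one position.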

...

additional arguments passed to motif_compare.

Author

Hajk-Georg Drost
