Compare the number of motifs in a set of non-random versus random promoter sequences. Internally, promoter sequences are extracted upstream of the transcription start site (TSS) and have the length specified in interval_width. The resulting motif counts can then be used to test whether certain motifs are enriched in real sequences compared to randomly drawn gene promoter sequences. The enrichment analysis is carried out across a set of different species or genomes.
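One possible way to test enrichment from such counts is a one-sided Fisher's exact test on a 2x2 table. The following base R sketch is only an illustration: the counts are toy numbers made up for the example, and the table layout is an assumption, not the format returned by this function.

```r
# Toy counts (made up for illustration): promoter sequences with and
# without at least one motif hit, for real versus randomly drawn promoters.
motif_table <- rbind(
  real   = c(with_motif = 40, without_motif = 60),
  random = c(with_motif = 20, without_motif = 80)
)

# One-sided test: is the motif more frequent in real promoters?
enrichment <- fisher.test(motif_table, alternative = "greater")

print(enrichment$p.value)   # probability of such an excess under the null
print(enrichment$estimate)  # conditional estimate of the odds ratio
```

When several motifs or genomes are tested, the resulting p-values would usually be adjusted for multiple testing, for example with p.adjust().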

motif_compare_multi_promotor_seqs(
  blast_tbl,
  subject_genomes,
  annotation_files,
  annotation_format = "gff",
  interval_width,
  motifs,
  max.mismatch = 0,
  min.mismatch = 0,
  ...
)

Arguments

blast_tbl

a blast_table.

subject_genomes

a character vector storing the file paths to the subject genomes that shall be used as subject references.

annotation_files

a character vector storing the file paths to the subject annotation files in .gff format that match the subject genomes.

annotation_format

the annotation format. Options are:

  • annotation_format = "gff"

interval_width

length of the promoter sequence that shall be extracted upstream of each transcription start site (TSS).
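As a rough illustration of how an interval of a given width upstream of a TSS can be derived, the following base R sketch computes promoter coordinates for genes on both strands. The helper promoter_interval() and the example coordinates are hypothetical and not part of the package; the extraction performed internally may differ.

```r
# Hypothetical helper (not part of the package): compute the coordinates of
# the interval of length `interval_width` directly upstream of each TSS.
promoter_interval <- function(tss, strand, interval_width) {
  # On the plus strand, upstream means smaller coordinates;
  # on the minus strand, upstream means larger coordinates.
  start <- ifelse(strand == "+", tss - interval_width, tss + 1)
  end   <- ifelse(strand == "+", tss - 1, tss + interval_width)
  # Clip at the start of the chromosome so coordinates stay >= 1.
  data.frame(start = pmax(start, 1), end = end)
}

promoters <- promoter_interval(
  tss = c(1500, 4200),
  strand = c("+", "-"),
  interval_width = 1000
)
print(promoters)
```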

motifs

a character vector storing (case sensitive) motif sequences for which abundance in the sampled sequences shall be assessed.

max.mismatch

maximum number of mismatches that are allowed between the sequence motif and the matching region in the sampled sequence.

min.mismatch

minimum number of mismatches that are allowed between the sequence motif and the matching region in the sampled sequence.
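To make the effect of the two mismatch bounds concrete, the following base R sketch counts motif occurrences by sliding a window over a sequence and comparing it to the motif position by position. The helper count_motif() is hypothetical and only illustrates the idea; it is not how the package performs the matching.

```r
# Hypothetical helper (not part of the package): count windows of `sequence`
# whose number of mismatches to `motif` lies in [min.mismatch, max.mismatch].
count_motif <- function(sequence, motif, max.mismatch = 0, min.mismatch = 0) {
  n <- nchar(sequence)
  k <- nchar(motif)
  if (n < k) return(0L)
  motif_chars <- strsplit(motif, "")[[1]]
  starts <- seq_len(n - k + 1)
  # Number of mismatching positions for every window (case sensitive).
  mismatches <- vapply(starts, function(i) {
    window <- strsplit(substr(sequence, i, i + k - 1), "")[[1]]
    sum(window != motif_chars)
  }, integer(1))
  sum(mismatches >= min.mismatch & mismatches <= max.mismatch)
}

# Exact matches only (the defaults).
exact <- count_motif("ACGTACGAACGT", "ACGT")
# Allow up to one mismatch, which additionally accepts the window "ACGA".
fuzzy <- count_motif("ACGTACGAACGT", "ACGT", max.mismatch = 1)
```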

...

additional arguments passed to motif_compare.

Author

Hajk-Georg Drost