Given a genome assembly file and a corresponding annotation file, users can retrieve the upstream promoter sequences of all genes in the genome.

extract_promotor_seqs_from_genome(
  annotation_file,
  genome_file,
  promotor_length = 500,
  annotation_format = "gtf",
  file_name = NULL,
  path = NULL,
  update = TRUE
)

Arguments

annotation_file

file path to the annotation file of the genome assembly in gtf or gff format.

genome_file

file path to the genome assembly file.

promotor_length

width of the upstream promoter region in bp. The region covers promotor_length bp upstream of the transcription start site (TSS) of each gene. Default is promotor_length = 500.
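To make the meaning of an upstream window concrete, here is a minimal sketch in base R. It is not the function's actual implementation: upstream_window() is a hypothetical helper, and the strand handling (TSS at the gene start on the "+" strand and at the gene end on the "-" strand, with 1-based inclusive coordinates as in GTF/GFF) is an assumption that this page does not confirm.

```r
# Hypothetical helper for illustration only; not part of the documented package.
# Assumes 1-based, inclusive coordinates as used in GTF/GFF files.
upstream_window <- function(start, end, strand, promotor_length = 500) {
  stopifnot(strand %in% c("+", "-"), promotor_length > 0)
  if (strand == "+") {
    # Assumed: the TSS is the gene start, so the window ends one base before it.
    # The lower bound is clamped to the first base of the sequence.
    c(from = max(1, start - promotor_length), to = start - 1)
  } else {
    # Assumed: the TSS is the gene end, so "upstream" runs towards higher
    # coordinates. The upper bound is not clamped to the sequence length here.
    c(from = end + 1, to = end + promotor_length)
  }
}

# A gene spanning positions 1000 to 2000 on either strand
plus_window <- upstream_window(1000, 2000, "+")
minus_window <- upstream_window(1000, 2000, "-")
```

Under these assumptions the "+" strand window spans positions 500 to 999 and the "-" strand window spans positions 2001 to 2500. A sequence extracted for a "-" strand gene would normally also be reverse-complemented, which this sketch leaves out.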

annotation_format

format of the annotation file. Options are:

  • annotation_format = "gtf"

  • annotation_format = "gff"

file_name

file path to the output file storing the promoter sequences.

path

file path to the output folder in which the promoter sequences are stored.

update

should previously generated promoter sequences be overwritten when they are generated again for the same genome assembly? Default is update = TRUE.
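A call using the arguments described above might look like the following sketch. The input paths are hypothetical placeholders, and the code assumes that the package providing extract_promotor_seqs_from_genome() is already loaded. The guard skips the call when the function or the input files are not available, so the snippet can be run as is.

```r
# Hypothetical input paths; replace them with real files.
annotation_file <- "annotation.gtf"
genome_file <- "genome.fa"

# Run only when the function and both input files are available.
if (exists("extract_promotor_seqs_from_genome") &&
    file.exists(annotation_file) && file.exists(genome_file)) {
  extract_promotor_seqs_from_genome(
    annotation_file   = annotation_file,
    genome_file       = genome_file,
    promotor_length   = 1000,   # 1000 bp instead of the default 500 bp
    annotation_format = "gtf",
    update            = FALSE   # do not overwrite previously generated output
  )
}
```

Here file_name and path are left at their default of NULL, so the output location is whatever the function chooses in that case.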

Author

Hajk-Georg Drost