Given an input folder of dNdS tables generated by dNdS, together with the annotation file of the query species (a single file) and the annotation files of the subject species (stored in an annotation folder) in gtf or gff format, this function selects either the best BLAST hit to represent each gene locus (e.g. the splice variant of the gene locus with the lowest e-value) or the best BLAST hit for each splice variant.

generate_ortholog_tables_all(
  dNdS_folder,
  annotation_file_query,
  annotation_folder_subject,
  output_folder,
  output_type = "gene_locus",
  format = c("gtf", "gtf")
)

Arguments

dNdS_folder

file path to a folder storing dNdS tables that were generated with dNdS and stored in a format readable by read.dnds.tbl.

annotation_file_query

file path to the annotation file of the query species in gtf or gff file format.

annotation_folder_subject

file path to a folder storing the annotation files of the subject species in gtf or gff file format.

output_folder

file path to a folder in which the ortholog tables will be stored.

output_type

type of ortholog table to be printed (or stored in a variable). Available options are:

  • output_type = "gene_locus" (default): for each gene locus, find a representative splice variant that maximizes sequence homology to the subject gene locus and its representative splice variant (smallest e-value; if e-values are equal, the longest splice variant). The output table contains only one representative splice variant per gene locus.

  • output_type = "splice_variant": for each homologous gene locus, determine the homologous splice variant of each of its splice variants. The output table contains several splice variants per gene locus, each paired with its homologous splice variant.
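The selection rule behind the two output_type options can be illustrated with a minimal base-R sketch. This is not the orthologr implementation: the helper best_hits() and the column names query_gene, query_id, subject_id, evalue and query_len are hypothetical, and the toy hit table is invented purely for illustration.

```r
# Hypothetical toy BLAST hit table: one row per (query splice variant, subject hit).
# Column names are assumptions for illustration, not orthologr's internal columns.
hits <- data.frame(
  query_gene = c("geneA",   "geneA",   "geneA",   "geneB",   "geneB"),
  query_id   = c("geneA.1", "geneA.1", "geneA.2", "geneB.1", "geneB.2"),
  subject_id = c("subjA.1", "subjA.2", "subjA.1", "subjB.1", "subjB.1"),
  evalue     = c(1e-50,     1e-60,     1e-80,     1e-30,     1e-30),
  query_len  = c(430,       430,       410,       245,       260),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one hit per group.
# "gene_locus":     group by gene locus, so one splice variant represents each locus.
# "splice_variant": group by splice variant, so each variant keeps its own best hit.
# Within a group, rank by smallest e-value, then by longest splice variant.
best_hits <- function(hits, output_type = c("gene_locus", "splice_variant")) {
  output_type <- match.arg(output_type)
  key <- if (output_type == "gene_locus") hits$query_gene else hits$query_id
  ord <- order(key, hits$evalue, -hits$query_len)
  ranked <- hits[ord, ]
  ranked[!duplicated(key[ord]), ]   # first row per group is the best-ranked hit
}

loci <- best_hits(hits, "gene_locus")
sv   <- best_hits(hits, "splice_variant")
```

In this toy table, geneA is represented by geneA.2 (lowest e-value), and geneB by geneB.2, because both of its variants tie on e-value and geneB.2 is longer. In "splice_variant" mode all four query variants are kept, each with its own best subject hit.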

format

a vector of length 2 storing the annotation file formats of the query annotation file and the subject annotation files, each either "gtf" or "gff". E.g. format = c("gtf","gtf").
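To make the shape of the format argument concrete, here is a small sketch of how such a length-2 vector could be checked and mapped to the query and subject inputs. The helper validate_annotation_format() is hypothetical and not part of orthologr; it only encodes the rule stated above (first entry for the query, second for the subject, each "gtf" or "gff").

```r
# Hypothetical helper: validate a `format` argument of the form c(query, subject).
validate_annotation_format <- function(format = c("gtf", "gtf")) {
  if (length(format) != 2L)
    stop("`format` must have length 2: c(query_format, subject_format).")
  if (!all(format %in% c("gtf", "gff")))
    stop("Each entry of `format` must be either \"gtf\" or \"gff\".")
  # Label which entry applies to which annotation input.
  c(query = format[1], subject = format[2])
}

fmt <- validate_annotation_format(c("gtf", "gff"))
```

Here fmt is a named character vector with the query format "gtf" and the subject format "gff"; a vector of the wrong length or with an unknown format stops with an error.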

Author

Hajk-Georg Drost