Given an input folder storing dnds tables generated by dnds_across_multiple_species, an annotation file for the query species, and an annotation folder storing the annotation files of the subject species (all in gtf or gff file format), this function selects either the best DIAMOND hit to represent a gene locus (e.g. the splice variant of the gene locus with the lowest e-value) or the best DIAMOND hit for a splice variant.

import_dnds_across_multiple_species(
  dnds_output_folder,
  annotation_file_query,
  annotation_folder_subject,
  output_folder,
  output_type = "gene_locus",
  format = c("gtf", "gtf")
)

Arguments

dnds_output_folder

file path to the folder storing dnds tables generated with dnds_across_multiple_species and stored in a form compatible with import_dnds_tbl.

annotation_file_query

file path to the annotation file of the query species in gtf or gff file format.

annotation_folder_subject

file path to a folder storing the annotation files of the subject species in gtf or gff file format.

output_folder

file path to a folder in which orthologs tables shall be stored.

output_type

type of ortholog table that shall be printed out (or stored in a variable). Available options are:

  • output_type = "gene_locus" (default): for each gene locus, find a representative splice variant that maximizes the sequence homology to the subject gene locus and its representative splice variant. Homology is ranked by smallest e-value; if e-values are identical, the longest splice variant is chosen. The output table contains only one representative splice variant per gene locus.

  • output_type = "splice_variant": for each homologous gene locus, determine for each splice variant its respective splice variant homolog. The output table contains several splice variants and their homologous splice variants per gene locus.
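The selection rule described for output_type = "gene_locus" can be sketched in base R as follows. This is only an illustration of the rule (smallest e-value, ties broken by the longest splice variant), not the package's internal implementation. The toy table and its column names (gene_locus, splice_variant, evalue, variant_length) are assumptions made for this example and need not match the columns of the real dnds tables.

```r
# Hypothetical toy hit table; column names are illustrative assumptions,
# not the columns produced by homologr.
hits <- data.frame(
  gene_locus     = c("G1", "G1", "G1", "G2", "G2"),
  splice_variant = c("G1.1", "G1.2", "G1.3", "G2.1", "G2.2"),
  evalue         = c(1e-50, 1e-80, 1e-80, 1e-10, 1e-30),
  variant_length = c(900, 1200, 1500, 600, 450),
  stringsAsFactors = FALSE
)

# Sort by locus, then ascending e-value, then descending variant length,
# so the preferred splice variant comes first within each locus.
ordered_hits <- hits[order(hits$gene_locus, hits$evalue, -hits$variant_length), ]

# Keep only the first (best) row per gene locus.
representatives <- ordered_hits[!duplicated(ordered_hits$gene_locus), ]

print(representatives)
```

In this toy data, G1.2 and G1.3 share the smallest e-value for locus G1, so the longer variant G1.3 is kept; for locus G2 the variant with the smaller e-value, G2.2, is kept even though it is shorter.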

format

a vector of length 2 storing the annotation file formats of the query annotation file and subject annotation file: either gtf or gff format. E.g. format = c("gtf","gtf").
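When the query and subject annotations come in different formats, the format vector has to list the query format first and the subject format second. A minimal sketch of building that vector from file names is shown below. The helper guess_annotation_format and the example file names are hypothetical and not part of homologr; treating a .gff3 extension as "gff" is likewise an assumption of this sketch.

```r
# Hypothetical helper (not part of homologr): guess "gtf" or "gff"
# from a file name, ignoring a trailing ".gz".
guess_annotation_format <- function(path) {
  ext <- tolower(tools::file_ext(sub("\\.gz$", "", path)))
  if (ext == "gtf") return("gtf")
  if (ext %in% c("gff", "gff3")) return("gff")
  stop("Unrecognised annotation file extension: ", path)
}

# First element: query annotation format; second element: subject annotation format.
format_arg <- c(
  guess_annotation_format("query_species.gtf"),
  guess_annotation_format("subject_species.gff3")
)

print(format_arg)
```

The resulting vector could then be passed as format = format_arg.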

See also

Author

Hajk-Georg Drost

Examples

if (FALSE) {
# running dnds across several species using DIAMOND executable
# path '/opt/miniconda3/bin/'
dnds_across_multiple_species(
  query = system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'),
  subjects_folder = system.file('seqs/map_gen_example', package = 'homologr'),
  diamond_exec_path = "/opt/miniconda3/bin/",
  aa_aln_type = "pairwise",
  aa_aln_tool = "NW",
  codon_aln_tool = "pal2nal",
  dnds_estimation = "Li",
  output_folder = file.path(tempdir(), "homologr_dnds_maps"),
  quiet = TRUE,
  cores = 1
)

# Import dnds tables by gene locus and splice variant for a set of species
# (dnds_output_folder points to the output_folder used in the call above)
import_dnds_tables <- import_dnds_across_multiple_species(
  dnds_output_folder = file.path(tempdir(), "homologr_dnds_maps"),
  annotation_file_query = system.file('seqs/ortho_thal.gtf', package = 'homologr'),
  annotation_folder_subject = system.file('seqs/plants_subject_files', package = 'homologr'),
  output_folder = file.path(tempdir(), "homologr_dnds_maps", "orthologs_tables"),
  output_type = "gene_locus",
  format = c("gtf", "gtf")
)

# look at results
lapply(import_dnds_tables, head)
}