This function allows you to compute dnds maps between a query organism and a set subject organisms stored in the same folder. The corresponding dnds maps are then stored in an output folder.

dnds_across_multiple_species(
  query,
  subjects_folder,
  output_folder,
  format = "fasta",
  ortho_detection = "DIAMOND-RBH",
  delete_corrupt_cds = FALSE,
  store_locally = FALSE,
  sensitivity_mode = "ultra-sensitive",
  out_format = "csv",
  evalue = "1E-5",
  max_target_seqs = 5000,
  cores = 1,
  hard_mask = TRUE,
  diamond_exec_path = NULL,
  add_makedb_options = NULL,
  add_diamond_options = NULL,
  min_qry_coverage_hsp = 50,
  min_qry_perc_identity = 10,
  aa_aln_type = "pairwise",
  aa_aln_tool = "NW",
  codon_aln_tool = "pal2nal",
  dnds_estimation = "Li",
  progress_bar = TRUE,
  diamond_hit_tables = file.path(output_folder, "diamond_hit_tables"),
  sep = ";",
  ...
)

Arguments

query

a character string specifying the path to the CDS file of the query organism.

subjects_folder

a character string specifying the path to the folder where CDS files of the subject organisms are stored.

output_folder

a character string specifying the path to the folder where output dnds maps should be stored.

format

a character string specifying the file format of the sequence file, e.g. format = "fasta", format = "gbk". See import_proteome for more details.

ortho_detection

a character string specifying the orthology inference method that shall be performed to detect orthologous genes. Options are:

  • ortho_detection ="DIAMOND-BH": DIAMOND unidirectional best hit.

  • ortho_detection = "DIAMOND-RBH": DIAMOND reciprocal/bidirectional best hit (default).

delete_corrupt_cds

a logical value indicating whether sequences with corrupt base triplets should be removed from the input file. This is the case when the length of coding sequences cannot be divided by 3 and thus the coding sequence contains at least one corrupt base triplet.

store_locally

a logical value indicating whether or not alignment files shall be stored locally rather than in tempdir().

sensitivity_mode

specify the level of alignment sensitivity. The higher the sensitivity level, the more deep homologs can be found, but at the cost of reduced computational speed.

  • sensitivity_mode = "fast" : fastest alignment mode, but least sensitive (default). Designed for finding hits of >70

  • sensitivity_mode = "mid-sensitive" : fast alignments between the fast mode and the sensitive mode in sensitivity.

  • sensitivity_mode = "sensitive" : fast alignments, but full sensitivity for hits >40

  • sensitivity_mode = "more-sensitive" : more sensitive than the sensitive mode.

  • sensitivity_mode = "very-sensitive" : sensitive alignment mode.

  • sensitivity_mode = "ultra-sensitive" : most sensitive alignment mode (sensitivity as high as BLASTP).

out_format

a character string specifying the format of the file in which the DIAMOND results shall be stored. Available options are:

  • out_format = "pair" : Pairwise

  • out_format = "xml" : XML

  • out_format = "csv" : Comma-separated file

evalue

a character string specifying the e-value for DIAMOND based orthology inference that is performed in the process of dnds computations. Please use the scientific notation.

max_target_seqs

maximum number of aligned sequences that shall be retained. Please be aware that max_target_seqs selects best hits based on the database entry and not by the best e-value. See details here: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty833/5106166 .

cores

number of computing cores that shall be used to perform parallelized computations.

hard_mask

shall low complexity regions be hard masked with TANTAN? Default is db_hard_mask = TRUE.

diamond_exec_path

a path to the DIAMOND executable or conda/miniconda folder.

add_makedb_options

a character string specifying additional makedb options that shall be passed on to the diamond makedb command line call, e.g. add_make_options = "--taxonnames" (Default is add_diamond_options = NULL).

add_diamond_options

a character string specifying additional diamond options that shall be passed on to the diamond command line call, e.g. add_diamond_options = "--block-size 4.0 --compress 1 --no-self-hits" (Default is add_diamond_options = NULL).

min_qry_coverage_hsp

minimum qcovhsp (= query coverage of the HSP) of an orthologous hit (a value between 1 and 100).

min_qry_perc_identity

minimum perc_identity (= percent sequence identity between query and selected HSP) of an orthologous hit (a value between 1 and 100).

aa_aln_type

a character string specifying the amino acid alignement type: aa_aln_type = "multiple" or aa_aln_type = "pairwise". Default is aa_aln_type = "pairwise".

aa_aln_tool

a character string specifying the program that should be used e.g. "clustalw".

codon_aln_tool

a character string specifying the codon alignment tool that shall be used. Default is codon_aln_tool = "pal2nal". Right now only "pal2nal" can be selected as codon alignment tool.

dnds_estimation

a character string specifying the dnds estimation method, e.g. dnds_estimation = "Li" (default).

progress_bar

should a progress bar be shown. Default is progress_bar = TRUE.

diamond_hit_tables

a file path to a folder where DIAMOND hit tables internally generated with diamond_best_hits or diamond_reciprocal_best_hits will be stored.

sep

a file separator that is used to store maps as csv file.

...

additional parameters that shall be passed to dnds.

Details

Given a query organism and a set of subject organisms that are stored in the same folder, this function crawls through all subject organism and as a first step computes the pairwise dnds Maps between query and subject organism and as a second step stores the corresponding Map in an output folder.

See also

Author

Hajk-Georg Drost

Examples

if (FALSE) { # running dnds across several species dnds_across_multiple_species( query = system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'), subjects_folder = system.file('seqs/map_gen_example', package = 'homologr'), aa_aln_type = "pairwise", aa_aln_tool = "NW", codon_aln_tool = "pal2nal", dnds_estimation = "Li", output_folder = "homologr_dnds_maps", quiet = TRUE, cores = 1 ) # running dnds across several species using DIAMOND executable # path '/opt/miniconda3/bin/' dnds_across_multiple_species( query = system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'), subjects_folder = system.file('seqs/map_gen_example', package = 'homologr'), diamond_exec_path = "/opt/miniconda3/bin/", aa_aln_type = "pairwise", aa_aln_tool = "NW", codon_aln_tool = "pal2nal", dnds_estimation = "Li", output_folder = "homologr_dnds_maps", quiet = TRUE, cores = 1 ) }