Genome-wide pairwise inference of synonymous versus non-synonymous substitution rates (dnds) across multiple subject organisms

This function allows you to compute dnds maps between a query organism and a set subject organisms stored in the same folder. The corresponding dnds maps are then stored in an output folder.

dnds_across_multiple_species(
  query,
  subjects_folder,
  output_folder,
  format = "fasta",
  ortho_detection = "DIAMOND-RBH",
  delete_corrupt_cds = FALSE,
  store_locally = FALSE,
  sensitivity_mode = "ultra-sensitive",
  out_format = "csv",
  evalue = "1E-5",
  max_target_seqs = 5000,
  cores = 1,
  hard_mask = TRUE,
  diamond_exec_path = NULL,
  add_makedb_options = NULL,
  add_diamond_options = NULL,
  min_qry_coverage_hsp = 50,
  min_qry_perc_identity = 10,
  aa_aln_type = "pairwise",
  aa_aln_tool = "NW",
  codon_aln_tool = "pal2nal",
  dnds_estimation = "Li",
  progress_bar = TRUE,
  diamond_hit_tables = file.path(output_folder, "diamond_hit_tables"),
  sep = ";",
  ...
)

Arguments

query	a character string specifying the path to the CDS file of the query organism.
subjects_folder	a character string specifying the path to the folder where CDS files of the subject organisms are stored.
output_folder	a character string specifying the path to the folder where output dnds maps should be stored.
format	a character string specifying the file format of the sequence file, e.g. `format = "fasta"`, `format = "gbk"`. See `import_proteome` for more details.
ortho_detection	a character string specifying the orthology inference method that shall be performed to detect orthologous genes. Options are: `ortho_detection ="DIAMOND-BH"`: DIAMOND unidirectional best hit. `ortho_detection = "DIAMOND-RBH"`: DIAMOND reciprocal/bidirectional best hit (default).
delete_corrupt_cds	a logical value indicating whether sequences with corrupt base triplets should be removed from the input `file`. This is the case when the length of coding sequences cannot be divided by 3 and thus the coding sequence contains at least one corrupt base triplet.
store_locally	a logical value indicating whether or not alignment files shall be stored locally rather than in `tempdir()`.
sensitivity_mode	specify the level of alignment sensitivity. The higher the sensitivity level, the more deep homologs can be found, but at the cost of reduced computational speed. `sensitivity_mode = "fast"` : fastest alignment mode, but least sensitive (default). Designed for finding hits of >70 `sensitivity_mode = "mid-sensitive"` : fast alignments between the `fast` mode and the sensitive mode in sensitivity. `sensitivity_mode = "sensitive"` : fast alignments, but full sensitivity for hits >40 `sensitivity_mode = "more-sensitive"` : more sensitive than the `sensitive` mode. `sensitivity_mode = "very-sensitive"` : sensitive alignment mode. `sensitivity_mode = "ultra-sensitive"` : most sensitive alignment mode (sensitivity as high as BLASTP).
out_format	a character string specifying the format of the file in which the DIAMOND results shall be stored. Available options are: `out_format = "pair"` : Pairwise `out_format = "xml"` : XML `out_format = "csv"` : Comma-separated file
evalue	a character string specifying the e-value for DIAMOND based orthology inference that is performed in the process of dnds computations. Please use the scientific notation.
max_target_seqs	maximum number of aligned sequences that shall be retained. Please be aware that `max_target_seqs` selects best hits based on the database entry and not by the best e-value. See details here: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty833/5106166 .
cores	number of computing cores that shall be used to perform parallelized computations.
hard_mask	shall low complexity regions be hard masked with TANTAN? Default is `db_hard_mask = TRUE`.
diamond_exec_path	a path to the DIAMOND executable or `conda/miniconda` folder.
add_makedb_options	a character string specifying additional makedb options that shall be passed on to the diamond makedb command line call, e.g. `add_make_options = "--taxonnames"` (Default is `add_diamond_options = NULL`).
add_diamond_options	a character string specifying additional diamond options that shall be passed on to the diamond command line call, e.g. `add_diamond_options = "--block-size 4.0 --compress 1 --no-self-hits"` (Default is `add_diamond_options = NULL`).
min_qry_coverage_hsp	minimum `qcovhsp` (= query coverage of the HSP) of an orthologous hit (a value between 1 and 100).
min_qry_perc_identity	minimum `perc_identity` (= percent sequence identity between query and selected HSP) of an orthologous hit (a value between 1 and 100).
aa_aln_type	a character string specifying the amino acid alignement type: `aa_aln_type = "multiple"` or `aa_aln_type = "pairwise"`. Default is `aa_aln_type = "pairwise"`.
aa_aln_tool	a character string specifying the program that should be used e.g. "clustalw".
codon_aln_tool	a character string specifying the codon alignment tool that shall be used. Default is `codon_aln_tool = "pal2nal"`. Right now only "pal2nal" can be selected as codon alignment tool.
dnds_estimation	a character string specifying the dnds estimation method, e.g. `dnds_estimation = "Li"` (default).
progress_bar	should a progress bar be shown. Default is `progress_bar = TRUE`.
diamond_hit_tables	a file path to a folder where DIAMOND hit tables internally generated with `diamond_best_hits` or `diamond_reciprocal_best_hits` will be stored.
sep	a file separator that is used to store maps as csv file.
...	additional parameters that shall be passed to `dnds`.

Details

Given a query organism and a set of subject organisms that are stored in the same folder, this function crawls through all subject organism and as a first step computes the pairwise dnds Maps between query and subject organism and as a second step stores the corresponding Map in an output folder.

Author

Hajk-Georg Drost

Examples

if (FALSE) {
# running dnds across several species
dnds_across_multiple_species(
   query      = system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'),
   subjects_folder = system.file('seqs/map_gen_example', package = 'homologr'),
   aa_aln_type      = "pairwise",
   aa_aln_tool      = "NW",
   codon_aln_tool   = "pal2nal",
   dnds_estimation  = "Li",
   output_folder   = "homologr_dnds_maps",
   quiet           = TRUE,
   cores      = 1
)

# running dnds across several species using DIAMOND executable
# path '/opt/miniconda3/bin/'
dnds_across_multiple_species(
   query      = system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'),
   subjects_folder = system.file('seqs/map_gen_example', package = 'homologr'),
   diamond_exec_path = "/opt/miniconda3/bin/",
   aa_aln_type      = "pairwise",
   aa_aln_tool      = "NW",
   codon_aln_tool   = "pal2nal",
   dnds_estimation  = "Li",
   output_folder   = "homologr_dnds_maps",
   quiet           = TRUE,
   cores      = 1
)
}

Genome-wide pairwise inference of synonymous versus non-synonymous substitution rates (dnds) across multiple subject organisms

Arguments

Details

See also

Author

Examples