dnds.Rd
This function takes the CDS files of two organisms of interest (query and subject) and computes the dNdS estimation values for orthologous gene pairs between these organisms.
dnds( query, subject, format = "fasta", ortho_detection = "DIAMOND-RBH", delete_corrupt_cds = FALSE, store_locally = FALSE, sensitivity_mode = "ultra-sensitive", out_format = "csv", evalue = "1E-5", max_target_seqs = 5000, cores = 1, hard_mask = TRUE, diamond_exec_path = NULL, add_makedb_options = NULL, add_diamond_options = NULL, aa_aln_type = "pairwise", aa_aln_tool = "NW", codon_aln_tool = "pal2nal", dnds_estimation = "Li", output_path = NULL, quiet = TRUE, clean_folders = FALSE, print_citation = TRUE )
query | a character string specifying the path to the CDS file of interest (query organism). |
---|---|
subject | a character string specifying the path to the CDS file of interest (subject organism). |
format | a character string specifying the file format of the sequence file, e.g. |
ortho_detection | a character string specifying the orthology inference method that shall be performed to detect orthologous genes. Options are:
|
delete_corrupt_cds | a logical value indicating whether sequences with corrupt base triplets should be removed from the input |
store_locally | a logical value indicating whether or not alignment files shall be stored locally rather than in |
sensitivity_mode | specify the level of alignment sensitivity. The higher the sensitivity level, the more deep homologs can be found, but at the cost of reduced computational speed.
|
out_format | a character string specifying the format of the file in which the DIAMOND results shall be stored. Available options are:
|
evalue | Expectation value (E) threshold for saving hits (default: |
max_target_seqs | maximum number of aligned sequences that shall be retained. Please be aware that |
cores | number of cores for parallel DIAMOND searches. |
hard_mask | shall low complexity regions be hard masked with TANTAN? Default is |
diamond_exec_path | a path to the DIAMOND executable or |
add_makedb_options | a character string specifying additional makedb options that shall be passed on to the diamond makedb command line call, e.g. |
add_diamond_options | a character string specifying additional diamond options that shall be passed on to the diamond command line call, e.g. |
aa_aln_type | a character string specifying the amino acid alignement type:
|
aa_aln_tool | a character string specifying the program that should be used e.g. "NW" (Needleman-Wunsch Global Alignments). |
codon_aln_tool | a character string specifying the codon alignment tool that shall be used. Default is |
dnds_estimation | the dNdS estimation method that shall be used. Options are:
|
output_path | a path to the location were the DIAMOND best hit output shall be stored. E.g. |
quiet | a logical value specifying whether the output of the corresponding alignment tool shall be printed out to the console.
Default is |
clean_folders | a logical value specifying whether all internal folders storing the output of used programs
shall be removed. Default is |
print_citation | a logical value indicating whether or not the citation message shall be printed. |
A tibble storing the dnds values of orthologous genes.
The dN/dS ratio quantifies the mode and strength of selection acting on a pair of orthologous genes. This selection pressure can be quantified by comparing synonymous substitution rates (dS) that are assumed to be neutral with nonsynonymous substitution rates (dN), which are exposed to selection as they change the amino acid composition of a protein (Mugal et al., 2013 http://mbe.oxfordjournals.org/content/31/1/212).
The homologr package provides the dnds
function to perform dNdS estimation on pairs of orthologous genes.
This function takes the CDS files of two organisms of interest (query
and subject
)
and computes the dN/dS estimation values for orthologous gene pairs between these organisms.
The following pipeline resembles the dN/dS estimation process:
1) Orthology Inference: e.g. DIAMOND reciprocal best hit (DIAMOND-RBH)
2) Pairwise sequence alignment: e.g. global pairwise alignments with Needleman-Wunsch
3) Codon Alignment: e.g. pal2nal program
4) dNdS estimation: e.g. Li's method
Note: it is assumed that when using dnds()
all corresponding multiple sequence alignment programs you
want to use are already installed on your machine and are executable via either
the default execution PATH
or you specifically define the location of the executable file
via the aa_aln_path
or blast_path
argument that can be passed to dnds()
.
The dnds()
function can be used choosing the following options:
ortho_detection
:
"DIAMOND-RBH"
(DIAMOND best reciprocal hit)
"DIAMOND-BH"
(DIAMOND best hit)
aa_aln_type
:
"pairwise"
aa_aln_tool
:
"NW"
(in case aa_aln_type = "pairwise"
)
codon_aln_tool
:
"pal2nal"
dnds_estimation
:
"Li" : Li's method
seqinr: http://seqinr.r-forge.r-project.org/
Li, W.-H., Wu, C.-I., Luo, C.-C. (1985) A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol, 2:150-174
Li, W.-H. (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol., 36:96-99.
Pamilo, P., Bianchi, N.O. (1993) Evolution of the Zfx and Zfy genes: Rates and interdependence between genes. Mol. Biol. Evol, 10:271-281
Hajk-Georg Drost
if (FALSE) { # get a dnds table using: # 1) reciprocal best hit for orthology inference (DIAMOND-RBH) # 2) Needleman-Wunsch for pairwise amino acid alignments # 3) pal2nal for codon alignments # 4) Li for dNdS estimation # 5) single core processing 'cores = 1' result <- dnds(quer = system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'), subject = system.file('seqs/ortho_lyra_cds.fasta', package = 'homologr'), ortho_detection = "DIAMOND-RBH", aa_aln_type = "pairwise", aa_aln_tool = "NW", codon_aln_tool = "pal2nal", dnds_estimation = "Li", cores = 1 ) # look at output result # specify the path to the DIAMOND executable if it is not in the system path # using the 'diamond_exec_path' argument, e.g. diamond_exec_path = "/opt/miniconda3/bin/" result <- dnds(query = system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'), subject = system.file('seqs/ortho_lyra_cds.fasta', package = 'homologr'), diamond_exec_path = "/opt/miniconda3/bin/", ortho_detection = "DIAMOND-RBH", aa_aln_type = "pairwise", aa_aln_tool = "NW", codon_aln_tool = "pal2nal", dnds_estimation = "Li", cores = 1 ) # look at output result }