diamond_reciprocal_best_hits.Rd
This function performs a DIAMOND reciprocal best hit search of a given set of protein sequences (query) against a given set of protein sequences or DIAMOND database (subject).
diamond_reciprocal_best_hits( query, subject, is_subject_db = FALSE, format = "fasta", sensitivity_mode = "ultra-sensitive", out_format = "csv", evalue = "1E-5", max_target_seqs = 5000, cores = 1, hard_mask = TRUE, diamond_exec_path = NULL, add_makedb_options = NULL, add_diamond_options = NULL, output_path = NULL )
Argument | Description |
---|---|
query | a character string specifying the path to the protein sequence file of interest (query organism). |
subject | a character string specifying the path to the protein sequence file of interest (subject organism). |
is_subject_db | logical specifying whether or not the subject is already a DIAMOND database (default: FALSE). |
format | a character string specifying the file format of the sequence file, e.g. "fasta" (default). |
sensitivity_mode | the level of alignment sensitivity (default: "ultra-sensitive"). Higher sensitivity finds more distant (deep) homologs at the cost of slower searches. |
out_format | a character string specifying the format of the file in which the DIAMOND results are stored (default: "csv"). |
evalue | expectation value (E-value) threshold for saving hits (default: "1E-5"). |
max_target_seqs | maximum number of aligned sequences to retain (default: 5000). Please be aware that … |
cores | number of cores used for parallel DIAMOND searches (default: 1). |
hard_mask | shall low-complexity regions be hard-masked with TANTAN? Default is TRUE. |
diamond_exec_path | a path to the DIAMOND executable or … (default: NULL). The last example below passes the directory containing the executable, "/opt/miniconda3/bin/". |
add_makedb_options | a character string specifying additional options passed on to the diamond makedb command line call (default: NULL), e.g. … |
add_diamond_options | a character string specifying additional options passed on to the diamond command line call (default: NULL), e.g. … |
output_path | a path to the location where the DIAMOND reciprocal best hit output is stored (default: NULL), e.g. output_path = getwd() to store it in the current working directory. |
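To illustrate how several of these parameters relate to the DIAMOND command line, the sketch below uses a hypothetical helper, build_diamond_blastp_args(), which is not part of this package. Mapping sensitivity_mode, evalue, max_target_seqs and cores onto the DIAMOND flags --<mode> (e.g. --ultra-sensitive), --evalue, --max-target-seqs and --threads is an assumption about how such a wrapper could work, not a description of the actual implementation; hard_mask and the add_*_options arguments are left out. A reciprocal search would need two such runs with query and subject swapped, each against a database built with diamond makedb --in <fasta> --db <name>.

```r
# Hypothetical helper (NOT part of the package): assemble the argument
# vector for one `diamond blastp` run from parameters documented above.
# The parameter-to-flag mapping is an assumption for illustration only.
build_diamond_blastp_args <- function(query, db, out,
                                      sensitivity_mode = "ultra-sensitive",
                                      evalue = "1E-5",
                                      max_target_seqs = 5000,
                                      cores = 1) {
  c("blastp",
    "--query", query,
    "--db", db,
    "--out", out,
    "--outfmt", "6",                        # tabular BLAST-style output
    paste0("--", sensitivity_mode),         # e.g. "--ultra-sensitive"
    "--evalue", as.character(evalue),
    "--max-target-seqs", as.character(max_target_seqs),
    "--threads", as.character(cores))
}

# Forward search: query proteins against a (hypothetical) subject database
blastp_args <- build_diamond_blastp_args("query.fa", "subject.dmnd", "fwd_hits.tsv")
print(blastp_args)

# To actually run it (requires DIAMOND on the PATH):
# system2("diamond", blastp_args)
```

Building a character vector and handing it to system2() avoids hand-quoting one long shell string.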
A tibble storing the query_ids in the first column and the subject_ids (reciprocal best hit homologs) in the second column.
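A quick sanity check one might run on the returned table is whether the pairs are one-to-one, i.e. no query ID and no subject ID occurs more than once. The helper is_one_to_one() and the toy IDs below are hypothetical; the check indexes columns by position because the value is documented by position (first column query, second column subject), and it works on a tibble or a plain data.frame. If best-hit ties are kept, an ID may legitimately repeat, so FALSE is a prompt to inspect the table rather than proof of an error.

```r
# Hypothetical helper: TRUE if neither the query column (1) nor the
# subject column (2) contains a duplicated ID.
is_one_to_one <- function(rbh) {
  anyDuplicated(rbh[[1]]) == 0 && anyDuplicated(rbh[[2]]) == 0
}

# Toy table with made-up IDs and illustrative column names,
# standing in for a real result
toy <- data.frame(query_id   = c("q1", "q2", "q3"),
                  subject_id = c("s7", "s2", "s9"))
is_one_to_one(toy)
```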
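Given a set of protein sequences (query sequences), a DIAMOND reciprocal best hit (DRBH) search is performed: the query sequences are searched against the subject sequences and vice versa, and only pairs in which each sequence is the other's best hit are retained.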
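The reciprocal best hit step itself can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the forward (query vs. subject) and reverse (subject vs. query) hit tables use the BLAST/DIAMOND tabular column names qseqid, sseqid and bitscore, and it ranks hits by bit score; the toy tables are made up.

```r
# Minimal RBH sketch; NOT the package's implementation.
# Assumes hit tables with tabular-output columns qseqid, sseqid, bitscore.

# Keep the single highest-scoring subject for every query.
# order() is stable, so among tied bitscores the earlier row wins.
best_hits <- function(hits) {
  hits <- hits[order(hits$qseqid, -hits$bitscore), ]
  hits[!duplicated(hits$qseqid), c("qseqid", "sseqid")]
}

# Keep pairs (q, s) where s is q's best hit AND q is s's best hit.
reciprocal_best_hits <- function(fwd_hits, rev_hits) {
  bf <- best_hits(fwd_hits)  # query   -> subject search
  br <- best_hits(rev_hits)  # subject -> query search
  # inner join: forward pair (q, s) must match reverse pair (s, q)
  merge(bf, br, by.x = c("qseqid", "sseqid"), by.y = c("sseqid", "qseqid"))
}

# Toy hit tables with made-up IDs and scores
fwd_hits <- data.frame(qseqid   = c("A1", "A1", "A2", "A3"),
                       sseqid   = c("B1", "B2", "B2", "B3"),
                       bitscore = c(250, 120, 300, 90))
rev_hits <- data.frame(qseqid   = c("B1", "B2", "B3", "B3"),
                       sseqid   = c("A1", "A2", "A1", "A3"),
                       bitscore = c(240, 310, 95, 80))

reciprocal_best_hits(fwd_hits, rev_hits)
# A1-B1 and A2-B2 are kept; A3-B3 is dropped because B3's best hit is A1
```

The result keeps the query ID in the first column and the subject ID in the second, mirroring the layout of the documented return value.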
Hajk-Georg Drost
if (FALSE) {
  # performing homology inference using the DIAMOND reciprocal best hit (DRBH) method using protein sequences
  rec_best_hits <- diamond_reciprocal_best_hits(
    query   = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
    subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'))
  # look at results
  rec_best_hits

  # store the DIAMOND output file in the current working directory
  rec_best_hits <- diamond_reciprocal_best_hits(
    query       = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
    subject     = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'),
    output_path = getwd())
  # look at results
  rec_best_hits

  # run diamond_reciprocal_best_hits() with multiple cores
  rec_best_hits <- diamond_reciprocal_best_hits(
    query   = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
    subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'),
    cores   = 2)
  # look at results
  rec_best_hits

  # performing homology inference using the DIAMOND reciprocal best hit (DRBH) method and
  # specifying the path to the DIAMOND executable (here a miniconda path)
  rec_best_hits <- diamond_reciprocal_best_hits(
    query             = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
    subject           = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'),
    diamond_exec_path = "/opt/miniconda3/bin/")
  # look at results
  rec_best_hits
}