Run CDS-to-CDS BLAST searches to detect homologous coding sequences (CDS) in a set of subject CDS files.

detect_homologs_cds_to_cds(
  query,
  subject_cds,
  task = "blastn",
  blast_output_path = "blast_output",
  min_alig_length = 60,
  evalue = 1e-05,
  max.target.seqs = 5000,
  cores = 1,
  update = FALSE,
  ...
)

Arguments

query

path to the query CDS file in FASTA format.

subject_cds

a character vector containing paths to the subject CDS files in FASTA format.

task

nucleotide search task option. Options are:

  • task = "blastn" : Standard nucleotide-nucleotide comparisons (default). Traditional BLASTN, which requires an exact seed match of 11 nucleotides (word size 11).

  • task = "blastn-short" : Optimized nucleotide-nucleotide comparisons for query sequences shorter than 50 nucleotides.

  • task = "dc-megablast" : Discontiguous megablast used to find more distant (e.g., interspecies) sequences.

  • task = "megablast" : Traditional megablast used to find very similar (e.g., intraspecies or closely related species) sequences.

  • task = "rmblastn"

blast_output_path

a path to a folder that will be created to store BLAST output tables for each individual query-cds search.

min_alig_length

minimum alignment length a hit must have to be retained in the result dataset. Hits with a shorter alignment length are removed automatically.
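
As a minimal sketch of what this filter amounts to, the following base R snippet drops short alignments from a toy hit table. The data frame and its column names (subject_id, alig_length, evalue) are assumptions made for illustration; they are not taken from the function's actual output.

```r
# Toy hit table; column names are hypothetical, chosen for illustration only.
hits <- data.frame(
  subject_id  = c("cds_1", "cds_2", "cds_3", "cds_4"),
  alig_length = c(45L, 60L, 312L, 59L),
  evalue      = c(1e-08, 3e-12, 1e-50, 2e-06),
  stringsAsFactors = FALSE
)

min_alig_length <- 60

# Keep only hits whose alignment length reaches the threshold.
# The comparison is inclusive (>=) because the description only says that
# hits with a smaller alignment length are removed.
retained <- hits[hits$alig_length >= min_alig_length, , drop = FALSE]

print(retained$subject_id)
```

In this toy table only cds_2 and cds_3 survive, since the other two alignments are shorter than 60.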

evalue

Expectation value (E) threshold for saving hits (default: evalue = 1E-5).

max.target.seqs

maximum number of aligned sequences that shall be kept. Default is max.target.seqs = 5000.

cores

number of cores for parallel BLAST searches.

update

a logical value indicating whether pre-computed BLAST tables should be removed and re-computed (update = TRUE) or imported from the existing files (update = FALSE, the default).
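
The behaviour described for update can be pictured as a simple compute-or-reuse pattern. The sketch below is not the package's implementation: load_or_compute, fake_search, and the output file name are hypothetical, and fake_search merely stands in for an expensive BLAST run.

```r
# Hypothetical helper illustrating the `update` semantics; not the package's code.
# `compute_fun` stands in for an expensive BLAST search.
load_or_compute <- function(output_file, compute_fun, update = FALSE) {
  if (update && file.exists(output_file)) {
    # update = TRUE: discard the pre-computed table so it is rebuilt below.
    file.remove(output_file)
  }
  if (!file.exists(output_file)) {
    result <- compute_fun()
    write.table(result, output_file, sep = "\t", row.names = FALSE, quote = FALSE)
  }
  # In both cases the table is read back from disk.
  read.delim(output_file, stringsAsFactors = FALSE)
}

# A counter shows when the stand-in "search" actually runs.
n_runs <- 0
fake_search <- function() {
  n_runs <<- n_runs + 1
  data.frame(subject_id = "cds_1", alig_length = 120L)
}

out_file <- file.path(tempdir(), "query_vs_subject_demo.tsv")
if (file.exists(out_file)) file.remove(out_file)

first  <- load_or_compute(out_file, fake_search)                 # computes
second <- load_or_compute(out_file, fake_search)                 # reuses the file
third  <- load_or_compute(out_file, fake_search, update = TRUE)  # recomputes
```

Here the stand-in search runs twice: once for the first call and once for the call with update = TRUE. The second call only reads the existing file.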

...

additional arguments passed to blast_nucleotide_to_nucleotide.
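
A call using the arguments above might look like the following sketch. It assumes that detect_homologs_cds_to_cds is already available in the session and that BLAST+ is installed; the FASTA file names are placeholders, not files shipped with any package. The call is guarded so that the snippet does nothing when those assumptions do not hold.

```r
# Placeholder paths; replace with real FASTA files of coding sequences.
args <- list(
  query             = "query_cds.fasta",
  subject_cds       = c("species_A_cds.fasta", "species_B_cds.fasta"),
  task              = "dc-megablast",  # aimed at more distant (e.g. interspecies) sequences
  blast_output_path = "blast_output",
  min_alig_length   = 60,
  evalue            = 1e-05,
  max.target.seqs   = 5000,
  cores             = 2,
  update            = FALSE
)

# Only run the search when the function and all input files are present,
# so the snippet can be sourced safely in any session.
can_run <- exists("detect_homologs_cds_to_cds", mode = "function") &&
  all(file.exists(c(args$query, args$subject_cds)))

if (can_run) {
  hits <- do.call("detect_homologs_cds_to_cds", args)
}
```

Passing the arguments as a named list keeps the parameter choices in one place and makes it easy to rerun the same search with, for example, update = TRUE.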

Author

Hajk-Georg Drost