Run a nucleotide-to-protein BLAST (blastx) search of reference sequences against a blast-able database or fasta file. Internally, BLAST translates each nucleotide query sequence into protein sequences and then searches the subject for hits.

blast_nucleotide_to_protein(
  query,
  subject,
  strand = "both",
  output.path = NULL,
  is.subject.db = FALSE,
  task = "blastx",
  db.import = FALSE,
  postgres.user = NULL,
  evalue = 0.001,
  out.format = "csv",
  cores = 1,
  max.target.seqs = 10000,
  db.soft.mask = FALSE,
  db.hard.mask = FALSE,
  blast.path = NULL
)



path to input file in fasta format.


path to subject file in fasta format or blast-able database.


Query DNA strand(s) to search against database/subject. Options are:

  • strand = "both" : query against both DNA strands.

  • strand = "minus" : query against minus DNA strand.

  • strand = "plus" : query against plus DNA strand.
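
To make the strand options concrete, the following base-R sketch translates a toy query in the three reading frames of the plus strand, of the minus strand (via the reverse complement), or of both. It only illustrates the concept and is not how BLAST or metablastr implement translation; the helper names rev_comp, translate_strand and translate_frames and the toy sequence are hypothetical.

```r
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codon_table <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(grid$first, grid$second, grid$third)
)

# Reverse complement of an upper-case A/C/G/T string (hypothetical helper).
rev_comp <- function(dna) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", dna), "")[[1]]), collapse = "")
}

# Translate one strand in reading frames 1 to 3 (hypothetical helper).
translate_strand <- function(dna) {
  vapply(1:3, function(offset) {
    n_codons <- (nchar(dna) - offset + 1) %/% 3
    if (n_codons < 1) return("")
    starts <- offset + 3 * (seq_len(n_codons) - 1)
    codons <- substring(dna, starts, starts + 2)
    paste(codon_table[codons], collapse = "")
  }, character(1))
}

# Pick the frames to translate from the strand option (hypothetical helper).
translate_frames <- function(dna, strand = c("both", "plus", "minus")) {
  strand <- match.arg(strand)
  plus  <- setNames(translate_strand(dna), paste0("+", 1:3))
  minus <- setNames(translate_strand(rev_comp(dna)), paste0("-", 1:3))
  switch(strand, plus = plus, minus = minus, both = c(plus, minus))
}

frames <- translate_frames("ATGGCCTAA", strand = "both")
frames
```

With strand = "both" the sketch returns six translations, with "plus" or "minus" only three, which is why restricting the strand halves the translated search space.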


path to the folder in which the BLAST output table shall be stored. Default is output.path = NULL, in which case getwd() is used.


logical specifying whether or not the subject file is a file in fasta format (is.subject.db = FALSE; default) or a fasta file that was previously converted into a blast-able database using makeblastdb (is.subject.db = TRUE).
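
As a rough sketch of the conversion step mentioned above, the snippet below assembles a makeblastdb call for a protein subject file, since a nucleotide-to-protein search needs a protein database. The options -in, -dbtype and -out are standard makeblastdb options; the helper name makeblastdb_args and the file name sbj_aa.fa are hypothetical, and the call is only executed if makeblastdb is found on the PATH and the file exists.

```r
# Build the makeblastdb arguments for a protein subject file
# (hypothetical helper; paths are placeholders).
makeblastdb_args <- function(subject, db.name = subject) {
  c("-in", shQuote(subject), "-dbtype", "prot", "-out", shQuote(db.name))
}

args <- makeblastdb_args("sbj_aa.fa")

# Run only when makeblastdb is installed and the input file is present.
if (nzchar(Sys.which("makeblastdb")) && file.exists("sbj_aa.fa")) {
  system2("makeblastdb", args)
}
```

The resulting database could then be passed as subject together with is.subject.db = TRUE.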


nucleotide search task option. Options are:

  • task = "blastx" : Standard nucleotide-protein comparisons (default).

  • task = "blastx-fast" : Optimized nucleotide-protein comparisons.


shall the BLAST output be stored in a PostgreSQL database and shall a connection be established to this database? Default is db.import = FALSE. Users who wish to generate only a BLAST output file, without importing it into the current R session, can specify db.import = NULL.


when db.import = TRUE and out.format = "postgres" is selected, the BLAST output is imported and stored in a PostgreSQL database. In that case, users need to have PostgreSQL installed and initialized on their system. Please consult the Installation Vignette for details.


Expectation value (E) threshold for saving hits (default: evalue = 0.001).
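
The effect of the threshold can be pictured as a filter on a hit table. The following sketch applies the default cutoff to a toy data frame; the column names and values are hypothetical and do not reflect the actual output columns, and treating the boundary as inclusive is an assumption.

```r
# Toy hit table; column names and values are made up for illustration.
hits <- data.frame(
  query_id   = c("q1", "q1", "q2"),
  subject_id = c("s1", "s7", "s3"),
  evalue     = c(1e-30, 0.5, 1e-4),
  stringsAsFactors = FALSE
)

evalue_cutoff <- 0.001

# Keep only hits whose e-value does not exceed the cutoff
# (inclusive boundary is an assumption).
kept <- hits[hits$evalue <= evalue_cutoff, ]
kept
```

Lowering evalue makes the search more stringent and retains fewer, more significant hits.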


a character string specifying the format of the file in which the BLAST results shall be stored. Available options are:

  • out.format = "pair" : Pairwise

  • out.format = "qa.ident" : Query-anchored showing identities

  • out.format = "qa.nonident" : Query-anchored no identities

  • out.format = "fq.ident" : Flat query-anchored showing identities

  • out.format = "fq.nonident" : Flat query-anchored no identities

  • out.format = "xml" : XML

  • out.format = "tab" : Tabular separated file

  • out.format = "tab.comment" : Tabular separated file with comment lines

  • out.format = "ASN.1.text" : Seqalign (Text ASN.1)

  • out.format = "ASN.1.binary" : Seqalign (Binary ASN.1)

  • out.format = "csv" : Comma-separated values

  • out.format = "ASN.1" : BLAST archive (ASN.1)

  • out.format = "json.seq.aln" : Seqalign (JSON)

  • out.format = "json.blast.multi" : Multiple-file BLAST JSON

  • out.format = "xml2.blast.multi" : Multiple-file BLAST XML2

  • out.format = "json.blast.single" : Single-file BLAST JSON

  • out.format = "xml2.blast.single" : Single-file BLAST XML2

  • out.format = "SAM" : Sequence Alignment/Map (SAM)

  • out.format = "report" : Organism Report
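
The options above are listed in the same order as the numeric -outfmt codes 0 to 18 of the BLAST+ command line tools. The lookup below is a minimal sketch of that correspondence; whether the function performs the mapping in exactly this way internally is an assumption, and the helper name outfmt_code is hypothetical.

```r
# out.format labels in the order of BLAST+ -outfmt codes 0..18.
out_formats <- c("pair", "qa.ident", "qa.nonident", "fq.ident", "fq.nonident",
                 "xml", "tab", "tab.comment", "ASN.1.text", "ASN.1.binary",
                 "csv", "ASN.1", "json.seq.aln", "json.blast.multi",
                 "xml2.blast.multi", "json.blast.single", "xml2.blast.single",
                 "SAM", "report")
outfmt_codes <- setNames(seq_along(out_formats) - 1L, out_formats)

# Look up the numeric code for a label (hypothetical helper).
outfmt_code <- function(out.format) {
  if (!out.format %in% names(outfmt_codes)) {
    stop("unknown out.format: ", out.format)
  }
  outfmt_codes[[out.format]]
}

outfmt_code("csv")  # 10
outfmt_code("tab")  # 6
```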


number of cores for parallel BLAST searches.

maximum number of aligned sequences that shall be retained. Please be aware that this option selects best hits based on the database entry and not by the best e-value. See details here: .


shall low complexity regions be soft masked? Default is db.soft.mask = FALSE.


shall low complexity regions be hard masked? Default is db.hard.mask = FALSE.


path to BLAST executables.
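
Before running a search it can help to check that the blastx executable is actually reachable. The sketch below looks for it either on the PATH or in a user-supplied folder; the helper name resolve_blastx is hypothetical, the logic is not taken from the package, and the executable name blastx (without a platform-specific extension) is an assumption.

```r
# Return the path to blastx, or NA if it cannot be found
# (hypothetical helper, not part of the package).
resolve_blastx <- function(blast.path = NULL) {
  exe <- if (is.null(blast.path)) {
    unname(Sys.which("blastx"))
  } else {
    file.path(blast.path, "blastx")
  }
  if (!nzchar(exe) || !file.exists(exe)) {
    return(NA_character_)
  }
  exe
}

resolve_blastx()
```

If the result is NA, passing the folder that contains the BLAST binaries as blast.path is the natural next step.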



if (FALSE) {
  blast_test <- blast_nucleotide_to_protein(
    query = system.file('seqs/qry_nn.fa', package = 'metablastr'),
    subject = system.file('seqs/sbj_aa.fa', package = 'metablastr'),
    output.path = tempdir(),
    db.import = FALSE
  )

  # look at results
  blast_test
}