This function takes a FASTA file containing DNA or amino acid sequences that shall be aligned and computes a multiple alignment using a defined multiple alignment tool.

multi_aln(
  file,
  tool,
  get_aln = FALSE,
  path = NULL,
  multi_aln_name = NULL,
  params = NULL,
  quiet = FALSE,
  clean_folders = FALSE
)

Arguments

file

a character string specifying the path to the file storing the sequences in FASTA format.

tool

a character string specifying the program that should be used: "clustalw", "t_coffee", "muscle", "clustalo", and "mafft".

get_aln

a logical value indicating whether the produced alignment should be returned.

path

a character string specifying the path to the multiple alignment program (in case you don't use the default path).

multi_aln_name

a character string specifying the name of the stored alignment file. Default is multi_aln_name = NULL denoting a default name: 'toolname.aln' .

params

a character string listing the input paramters that shall be passed to corresponding alignment tool specified in the tool argument. Default is NULL, implicating that a set of default parameters is used when running the correponding tool, e.g. clustalw. Example: params = "-PWMATRIX=BLOSUM -TYPE=PROTEIN" for tool = "clustalw.

quiet

a logical value specifying whether the output of the corresponding multiple alignment tool shall be printed out to the console. Default is quiet = FALSE.

clean_folders

a boolean value spefiying whether all internall folders storing the output of used programs shall be removed. Default is clean_folders = FALSE.

Value

In case the argument get_aln is set TRUE, an object of class alignment of the seqinr package is returned.

Details

This function provides an interface between R and common multiple alignment programs such as "clustalw", "t_coffee", "muscle", "clustalo", and "mafft".

  • CLUSTALW :

    Different operating systems perform different execution calls to the clustalw program:

    • MacOS: 'clustalw2'

    • Linux: 'clustalw'

    • Windows: 'clustalw2.exe'

    In case you use the default path to the clustalw program, depending on your operating system, the following calls to clustalw should work properly on your system:

    • MacOS: system("clustalw2 -help")

    • Linux: system("clustalw -help")

    • Windows: system("clustalw2.exe -help")

    In case these procedures don't work properly, please use the path argument to specify the 'clustalw' execution path on your system:

    • MacOS: system("path/to/clustalw/clustalw2 -help")

    • Linux: system("path/to/clustalw/clustalw -help")

    • Windows: system("path/to/clustalw/clustalw2.exe -help")

  • T_COFFEE :

    In case you use the default path to the t_coffee program, the following calls to clustalw should work properly on your system:

    system("t_coffee -version")

    In case this procedures doesn't work properly, please use the path argument to specify the 't_coffee' execution path on your system:

    system("path/to/t_coffee/t_coffee -version")

  • MUSCLE :

    In case you use the default path to the muscle program, the following calls to muscle should work properly on your system:

    system("muscle -help")

    In case this procedures doesn't work properly, please use the path argument to specify the 'muscle' execution path on your system:

    system("path/to/muscle/muscle -help")

  • CLUSTALO :

    In case you use the default path to the clustalo program, the following calls to clustalo should work properly on your system:

    system("clustalo --help")

    In case this procedures doesn't work properly, please use the path argument to specify the 'clustalo' execution path on your system:

    system("path/to/clustalo/clustalo --help")

  • MAFFT :

    In case you use the default path to the mafft program, the following calls to mafft should work properly on your system:

    system("mafft -help")

    In case this procedures doesn't work properly, please use the path argument to specify the 'mafft' execution path on your system:

    system("path/to/mafft/mafft -help")

Note

Note that when using the clustalw.params, ... parameters, make sure the corresponding alignment tool returns a file in clustal format *.aln. This is only important when get_aln = TRUE.

References

Author

Hajk-Georg Drost and Sarah Scharfenberg

Examples

if (FALSE) {

# CLUSTALW Example:

# in case the default execution path of clustalw runs properly on your system
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "clustalw", 
          get_aln = TRUE)

# in case the default execution path of clustalw is not set within the default path
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'), 
          tool    = "clustalw", 
          get_aln = TRUE, 
          path    = "path/to/clustalw/")

# running clustalw using additional parameters
# details: system("clustalw2 -help")
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "clustalw", 
          get_aln = TRUE, 
          params  = "-PWMATRIX=BLOSUM -TYPE=PROTEIN")
          
          
# T_COFFEE Example:

# in case the default execution path of t_coffee runs properly on your system
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "t_coffee", 
          get_aln = TRUE)

# in case the default execution path of t_coffee is not set within the default path
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "t_coffee", 
          get_aln = TRUE, 
          path    = "path/to/t_coffee/")

# running t_coffee using additional parameters
# details: http://www.tcoffee.org/Projects/tcoffee/#DOCUMENTATION
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "t_coffee", 
          get_aln = TRUE,
          params  = "-mode expresso")
          


# MUSCLE Example:  

# in case the default execution path of muscle runs properly on your system
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "muscle", 
          get_aln = TRUE)

# in case the default execution path of muscle is not set within the default path
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "muscle", 
          get_aln = TRUE, 
          path    = "path/to/muscle/")

# running muscle using additional parameters
# details: http://www.drive5.com/muscle/manual/
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "muscle", 
          get_aln = TRUE,
          params  = "-diags -clwstrict") 
          

# CLUSTALO Example:  

# in case the default execution path of clustalo runs properly on your system
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "clustalo", 
          get_aln = TRUE)

# in case the default execution path of clustalo is not set within the default path
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "clustalo", 
          get_aln = TRUE, 
          path    = "path/to/clustalo/")

# running clustalo using additional parameters
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "clustalo", 
          get_aln = TRUE,
          params  = "--outfmt clu")         
          
          
                                            
# MAFFT Example:  

# in case the default execution path of mafft runs properly on your system
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "mafft", 
          get_aln = TRUE)

# in case the default execution path of mafft is not set within the default path
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "mafft", 
          get_aln = TRUE, 
          path    = "path/to/mafft/")

# running mafft using additional parameters
# details: http://www.drive5.com/mafft/manual/
multi_aln(file    = system.file('seqs/aa_seqs.fasta', package = 'orthologr'),
          tool    = "mafft", 
          get_aln = TRUE,
          params  = "--maxiterate 1 --clustalout")         
          
                                                                                  
}