Installing Prerequisite Tools

Most orthologr functions are interface functions: they pass data to common bioinformatics tools, call the corresponding tool internally, and read its output back into R as an R object. Users of these interface functions therefore need to install the underlying bioinformatics tools to obtain accurate results.

The following sections provide step-by-step instructions or guidance on installing all bioinformatics tools for which orthologr implements R interface functions.

Some tools are not trivial to install, so please read the corresponding sections carefully and run the test cases presented in each section.

Programming Languages

The bioinformatics tools you are going to install are based on the following programming languages:

Please make sure these programming languages are installed and executable on the machines you are going to run orthologr on.

Pairwise Sequence Alignment Tools

The orthologr package provides interfaces to the pairwise alignment tools, BLAST and DIAMOND v2. We recommend the use of DIAMOND v2 as it saves time whilst being as sensitive as BLAST.

Install BLAST

BLAST (= Basic Local Alignment Search Tool) finds regions of similarity between biological sequences and is the underlying paradigm of most orthology inference methods.

  1. Go to ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and download the system specific BLAST program.

  2. Install BLAST:

  • On a Windows machine (see installation manual: Windows) -> Please carefully read the Environment Variables section of the installation manual and make sure the execution PATH variable is set correctly.
  • On a Unix machine (see installation manual: Unix) -> Please carefully read the Configuration section of the installation manual and make sure the execution PATH variable is set correctly, for example so that it includes /usr/local/bin.

For example, on Linux systems open the Terminal application and run (thanks to Alexander Gabel):

# download BLAST+ version 2.2.31
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.31/ncbi-blast-2.2.31+-x64-linux.tar.gz

# extract the compiled version of BLAST
# (the archive name must match the downloaded file)
tar zxvpf ncbi-blast-2.2.31+-x64-linux.tar.gz

# copy the BLAST executables to `/usr/local/bin`
# (writing to this folder usually requires administrator rights)
sudo cp ncbi-blast-2.2.31+/bin/* /usr/local/bin

Alternatively, users can make the BLAST programs callable by adding the BLAST folder to the PATH variable. This is useful because updating BLAST then only requires changing the PATH entry instead of deleting all BLAST programs from /usr/local/bin:

# open the file .bash_profile in your home directory with the vim text editor
vi ~/.bash_profile

# press 'Shift' + 'I' to start editing the file .bash_profile
# and add the following export line
export PATH=${PATH}:/path/to/downloaded/blast/folder/ncbi-blast-2.2.31+/bin

# press 'ESC', then type ':wq' and press 'Enter' to save and quit the .bash_profile file

# log out from your server with
exit

# log in again and type
blastp -version

Now users should see the version of the installed BLAST program.
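If the export line is added more than once (for example after re-running a setup script), PATH accumulates duplicate entries. The following is a minimal sketch of one way to avoid this, assuming a POSIX-compatible shell. The helper name path_append is hypothetical and not part of BLAST or orthologr, and the folder is the same placeholder used above.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not listed yet.
path_append() {
    case ":${PATH}:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="${PATH}:$1" ;;     # append at the end, like the export line above
    esac
    export PATH
}

# Placeholder location; replace it with the folder where BLAST+ was unpacked.
BLAST_BIN="/path/to/downloaded/blast/folder/ncbi-blast-2.2.31+/bin"
path_append "$BLAST_BIN"
path_append "$BLAST_BIN"   # the second call leaves PATH unchanged
```

Placing the function and the call in .bash_profile instead of the plain export line should have the same effect, while keeping PATH free of repeated entries.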

Some tips

Based on our personal experience, the installation of BLAST works best when copying the BLAST executables to the path /usr/local/bin. The following steps copy the BLAST executables to /usr/local/bin (on Unix systems). Note, however, that updating BLAST will then require manually deleting all previous BLAST programs from /usr/local/bin:

Open the Terminal application on your system and type (the open command is available on macOS):

open /usr/local/bin 

Next, copy/paste the BLAST executables (blastp, makeblastdb, etc.) from your BLAST folder to /usr/local/bin. You will need to enter the system password to allow the copy process.

After installing the BLAST program you can open an R session and type the following command to check whether or not BLAST can be executed from R.

# test whether blastp is correctly installed on your machine
system("blastp -version")
blastp: 2.2.31+
Package: blast 2.2.31, build Oct 27 2014 17:10:51

You should see this output if BLAST was installed correctly.

In case you find the following output:

sh: blastp: command not found

You should return to step 2) and install BLAST so that it can be executed from the default execution PATH.

These interface functions to BLAST+ are implemented in orthologr:

  • blast() : Interface function to BLAST+
  • blast_best() : Perform a BLAST+ best hit search
  • blast_rec() : Perform a BLAST+ reciprocal best hit (RBH) search
  • set_blast() : Preparing the parameters and databases for subsequent BLAST+ searches
  • blast.nr() : Perform a BLASTp search against NCBI nr
  • delta.blast() : Perform a DELTA-BLAST Search
  • advanced_blast() : Advanced interface function to BLAST+
  • advanced_makedb() : Advanced interface function to makeblastdb

Install DIAMOND2

DIAMOND2 (= Double Index alignment of Next-generation sequencing data) finds, like BLAST, regions of similarity between biological sequences. Unlike BLAST, it is much faster: up to 10,000x faster in the default fast mode and over 80x faster in the ultra-sensitive mode, which is as sensitive as BLAST. Thus, DIAMOND2 facilitates even faster orthology inference.

  1. Go to the download site in the DIAMOND2 wiki and follow the instructions for installation. DIAMOND2 is supported on Linux, macOS and Windows.

  2. Check the installation of DIAMOND2 by running the command

diamond --version

  3. After installing the DIAMOND2 program you can open an R session and type the following command to check whether or not DIAMOND2 can be executed from R.

# test whether diamond is correctly installed on your machine
system("diamond --version")
diamond version 2.1.8

You should see this output if DIAMOND2 was installed correctly.

In case you find the following output:

sh: diamond: command not found

You should return to step 1) and install DIAMOND2 so that it can be executed from the default execution PATH.
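Because orthologr targets DIAMOND v2, it can be useful to confirm the major version as well as the presence of the executable. The sketch below assumes a POSIX-compatible shell and the version line format shown above ("diamond version 2.1.8"). The function names diamond_major and check_diamond are hypothetical helpers, not part of DIAMOND or orthologr.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a line such as
# "diamond version 2.1.8". Prints nothing if the line has another format.
diamond_major() {
    printf '%s\n' "$1" | sed -n 's/^diamond version \([0-9][0-9]*\)\..*$/\1/p'
}

# Hypothetical helper: report whether diamond is on PATH and is version 2 or later.
check_diamond() {
    if ! command -v diamond >/dev/null 2>&1; then
        echo "diamond not found on PATH"
        return 1
    fi
    major=$(diamond_major "$(diamond --version 2>/dev/null)")
    if [ "${major:-0}" -ge 2 ]; then
        echo "DIAMOND major version ${major} found"
    else
        echo "diamond is installed, but version 2 or later could not be confirmed"
        return 1
    fi
}

check_diamond || true
```

If the version line printed by your DIAMOND build differs from the assumed format, the pattern in diamond_major needs to be adapted accordingly.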

These interface functions to DIAMOND2 are implemented in orthologr, akin to the interface functions to BLAST+:

  • diamond() : Interface function to DIAMOND2
  • diamond_best() : Perform a diamond best hit search
  • diamond_rec() : Perform a diamond reciprocal best hit (RBH) search
  • set_diamond() : Preparing the parameters and databases for subsequent diamond searches

Furthermore, the following functions use DIAMOND2 by default, though the use of BLAST can be specified through the parameter aligner = "blast":

Multiple Sequence Alignment Tools

The orthologr package also provides interfaces to the following multiple sequence alignment tools. None of them needs to be installed, however, if the corresponding interface functions are not used.

Install ClustalW2

To install ClustalW2 please go to the ClustalW homepage and download the corresponding clustalw2 program matching your operating system.

After downloading and unpacking the clustalw2 program, go to the clustalw-2.1 folder, open a Terminal application there, and type (in this example for Mac OS X):

# copy the clustalw2 executable to `/usr/local/bin`
cp clustalw2 /usr/local/bin

Install T-Coffee

To install T-Coffee please go to the T-Coffee homepage and download the corresponding T-Coffee program matching your operating system.

Install MUSCLE

  • MUSCLE : Fast and accurate multiple alignment tool of nucleic acid and protein sequences

Install ClustalO

  1. Download the argtable program.

  2. Unzip the file.

  3. Run within the argtable folder:

./configure

make

make check

sudo make install

  4. Download ClustalO.

  5. Unzip the folder and run within the folder:

./configure

make

sudo make install

Install MAFFT

  • MAFFT : A tool for multiple sequence alignment and phylogeny

In orthologr the function multi_aln() provides interfaces to all of these multiple alignment tools, as well as a pairwise alignment interface to the Biostrings package, which performs a Needleman-Wunsch alignment.
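Since only the alignment tools you actually intend to use need to be installed, a quick overview of which ones are reachable from the PATH can save time. The following is a minimal sketch assuming a POSIX-compatible shell. The function name report_tools is hypothetical, and the executable names passed to it (clustalw2, t_coffee, muscle, clustalo, mafft) are assumptions that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given executables are on the PATH.
# Prints one line per tool and returns the number of missing tools.
report_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Assumed executable names; adjust them to match your installation.
report_tools clustalw2 t_coffee muscle clustalo mafft || true
```

A tool reported as missing only matters if you plan to call the corresponding interface through multi_aln().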

Codon Alignment Tools

The codon alignment tool PAL2NAL is already included in the orthologr package, so you don’t need to download or install it. The corresponding function codon_aln() takes a protein alignment and the corresponding coding sequences and returns a codon alignment by calling PAL2NAL from inside the orthologr package.

dNdS Estimation Methods

dNdS estimation quantifies the selection pressure acting on a protein-coding sequence as the ratio of the nonsynonymous substitution rate (dN) to the synonymous substitution rate (dS). Both rates are determined from pairwise comparisons of two protein sequences and their corresponding codon alignments. Different models have been proposed to estimate this ratio. The orthologr package includes the most common dNdS estimation methods.

Starting with a codon alignment returned by codon_aln(), the function dNdS() computes the dN, dS, and dNdS values of pairs of proteins.

Based on implementations provided by gestimator, ape, and KaKs_Calculator, the following dNdS Estimation Methods are available in orthologr:

  • Li : Li’s method (1993) -> provided by the ape package

  • Comeron : Comeron’s method (1995)

  • NG : Nei, M. and Gojobori, T. (1986)

  • LWL : Li, W.H., et al. (1985)

  • MLWL (Modified LWL), MLPB (Modified LPB): Tzeng, Y.H., et al. (2004)

  • YN : Yang, Z. and Nielsen, R. (2000)

  • MYN (Modified YN): Zhang, Z., et al. (2006)

To use these methods you need to have KaKs_Calculator installed on your system and executable from your default PATH, e.g. /usr/local/bin/.

Install KaKs_Calculator (For Linux/Unix/OS)

Please go to the KaKs_Calculator homepage and download KaKs_Calculator.

For example:

# download KaKs_Calculator
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/kaks-calculator/KaKs_Calculator1.2.tar.gz
# decompress and unpack the archive
gzip -d KaKs_Calculator1.2.tar.gz
tar -xf KaKs_Calculator1.2.tar
# compile (administrator rights are not needed for this step)
cd KaKs_Calculator1.2/src
make
# install the executable to a folder on the default PATH
sudo cp KaKs_Calculator /usr/local/bin/

Now you should be able to run KaKs_Calculator by typing KaKs_Calculator -h in your shell or system("KaKs_Calculator -h") in R.

The most recent version KaKs_Calculator2.0 can be found here.