## Installing Prerequisite Tools

Most orthologr functions are interface functions: they pass data to common bioinformatics tools, call the corresponding tool internally, and read its output as an R object. Users who call interface functions in orthologr therefore need to install the underlying bioinformatics tools first.

The following sections provide step-by-step instructions or guidance on installing all bioinformatics tools for which R interface functions are implemented in orthologr.

Some tools are not trivial to install, so please read the corresponding sections carefully and execute test cases that are presented in each section.

## Programming Languages

The bioinformatics tools you are going to install are based on these programming languages:

Please make sure these programming languages are installed and executable on the machines you are going to run orthologr on.

### Install BLAST

BLAST (Basic Local Alignment Search Tool) finds regions of similarity between biological sequences and also serves as the underlying paradigm of most fast orthology inference methods.

2. Install BLAST :

• On a Windows machine (see installation manual: Windows) -> Please carefully read the Environment Variables section of the Windows installation manual and make sure the execution PATH variable is set correctly.
• On a Unix machine (see installation manual: Unix) -> Please carefully read the Configuration section of the Unix installation manual and make sure the execution PATH variable is set correctly to /usr/local/bin.

For example, on Linux systems, open the Terminal application and run (thanks to Alexander Gabel):

# download BLAST+ version 2.2.31
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.31/ncbi-blast-2.2.31+-x64-linux.tar.gz

# extract the compiled version of BLAST
tar zxvpf ncbi-blast-2.2.31+-x64-linux.tar.gz

# copy BLAST files to /usr/local/bin (may require sudo)
cp ncbi-blast-2.2.31+/bin/* /usr/local/bin

Alternatively, users can make the BLAST programs callable by adding the BLAST folder to the PATH variable (this is useful because it makes updating BLAST versions easier than deleting all BLAST programs from /usr/local/bin):

# open vim text editor
vi .bash_profile

# press 'Shift' + 'I' to enter insert mode in the file .bash_profile
# and specify the export PATH

# press 'ESC', then type ':wq' and press 'Enter' to save and quit the .bash_profile file

# log out from your server with
exit

# log in again and test whether BLAST can be found
blastp -version

Now users should see the BLAST version information.
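
The `export PATH` line mentioned in the editing steps above is not spelled out, so here is a minimal sketch of what it might look like. The folder `$HOME/ncbi-blast-2.2.31+` is a hypothetical location chosen for illustration; replace it with the folder where you actually unpacked BLAST.

```shell
# Hypothetical location of the unpacked BLAST+ folder; adjust to your system.
BLAST_HOME="$HOME/ncbi-blast-2.2.31+"

# Line to place in ~/.bash_profile: prepend the BLAST bin directory to PATH.
export PATH="$BLAST_HOME/bin:$PATH"

# Show the first PATH entry to confirm the change took effect.
echo "$PATH" | cut -d ':' -f 1
```

With this approach, updating BLAST only requires pointing `BLAST_HOME` at the new folder instead of replacing files in /usr/local/bin.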

### Some tips

Based on our personal experience, installing BLAST works best when the BLAST executables are copied to /usr/local/bin. The following steps copy the BLAST executables to /usr/local/bin (on Unix systems). Note, however, that updating BLAST then requires manually deleting all previous BLAST programs from /usr/local/bin:

Open the Terminal application on your system and type (the open command shown here is the Mac OS X way to open a folder in a file browser):

open /usr/local/bin


Next, copy/paste the BLAST executables (blastp, makeblastdb, etc.) from your BLAST folder to /usr/local/bin. You will need to enter the system password to allow the copy process.

After installing the BLAST program you can open an R session and type the following command to check whether or not BLAST can be executed from R.

# test whether blastp is correctly installed on your machine
system("blastp -version")
blastp: 2.2.31+
Package: blast 2.2.31, build Oct 27 2014 17:10:51

You should see this output if BLAST was installed correctly.
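
If you want to compare the installed version against the one used in this guide, the version string can be pulled out of that output on the command line. The following is a small sketch; the helper name `blast_version` is made up for illustration and is not part of BLAST or orthologr.

```shell
# Extract the version string from the first line of `blastp -version` output.
blast_version() {
  sed -n 's/^blastp: *//p' | head -n 1
}

# Demonstration on the sample output shown above (no BLAST installation needed).
printf 'blastp: 2.2.31+\nPackage: blast 2.2.31, build Oct 27 2014 17:10:51\n' | blast_version
```

On a machine with BLAST installed, `blastp -version | blast_version` would apply the same helper to the real output.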

In case you find the following output:

sh: blastp: command not found

You should return to step 2) and install BLAST so that it can be executed from the default execution PATH.
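
Before returning to the installation step, it can help to see which BLAST programs the shell can actually resolve. The sketch below uses the POSIX `command -v` builtin; the function name `check_tools` is a hypothetical helper, not something shipped with BLAST or orthologr.

```shell
# Report whether each named executable can be found on the current PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Return non-zero when at least one tool is missing.
  [ "$missing" -eq 0 ]
}

check_tools blastp makeblastdb || echo "Some BLAST programs are not on PATH"
```

A program reported as missing here will also fail when called through system() in R, as long as R inherits the same PATH as your shell.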

These interface functions to BLAST+ are implemented in orthologr:

• blast() : Interface function to BLAST+
• blast_best() : Perform a BLAST+ best hit search
• blast_rec() : Perform a BLAST+ reciprocal best hit (RBH) search
• set_blast() : Preparing the parameters and databases for subsequent BLAST+ searches
• blast.nr() : Perform a BLASTp search against NCBI nr
• delta.blast() : Perform a DELTA-BLAST Search
• advanced_blast() : Advanced interface function to BLAST+
• advanced_makedb() : Advanced interface function to makeblastdb

### Multiple Sequence Alignment Tools

The orthologr package also provides interfaces to the following multiple alignment tools. None of them has to be installed if the corresponding interface functions are not used.

### Install ClustalW2

To install ClustalW2 please go to the ClustalW homepage and download the corresponding clustalw2 program matching your operating system.

After downloading and unpacking the clustalw2 program, please go to the clustalw-2.1 folder and open a Terminal application to type (in this example for Mac OS X):

# copy clustalw2 files to /usr/local/bin (may require sudo)
cp clustalw2 /usr/local/bin
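
The copy step can also be wrapped in a small helper that makes sure the copied file is executable. This is only a sketch: `install_tool` is a hypothetical function name, and writing to /usr/local/bin may require sudo on your system.

```shell
# Copy an executable into a target directory and mark it executable.
# Usage: install_tool <file> <target-dir>
install_tool() {
  src=$1
  dest=$2
  mkdir -p "$dest" && cp "$src" "$dest/" && chmod 755 "$dest/$(basename "$src")"
}

# Example (commented out because it needs the downloaded clustalw2 binary and
# write permission for /usr/local/bin, which may require sudo):
# install_tool clustalw2 /usr/local/bin
```

The same helper works for any of the other executables described in this guide.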

### Install T-Coffee

To install T-Coffee please go to the T-Coffee homepage and download the corresponding T-Coffee program matching your operating system.

### Install MUSCLE

• MUSCLE : Fast and accurate multiple alignment tool of nucleic acid and protein sequences

### Install ClustalO

2. Unzip the file.

3. Run within the argtable folder:

./configure

make

make check

sudo make install

2. Unzip the folder and run within the folder:

./configure

make

sudo make install

### Install MAFFT

• MAFFT : A tool for multiple sequence alignment and phylogeny

In orthologr, the function multi_aln() provides interfaces to all of these multiple alignment tools, as well as a pairwise alignment interface to the Biostrings package, which performs a Needleman-Wunsch alignment.
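
Since only the tools you actually call through multi_aln() need to be present, a quick overview of what is installed can be useful. The sketch below assumes the usual executable names (`clustalw2`, `t_coffee`, `muscle`, `clustalo`, `mafft`); these names are an assumption and may differ depending on how each tool was installed.

```shell
# Assumed executable names; they may differ depending on how each tool was installed.
MSA_TOOLS="clustalw2 t_coffee muscle clustalo mafft"

# Print one line per tool: its name and either its resolved path or "not installed".
msa_status() {
  for tool in $MSA_TOOLS; do
    location=$(command -v "$tool" 2>/dev/null) || location="not installed"
    printf '%s\t%s\n' "$tool" "$location"
  done
}

msa_status
```

Tools listed as "not installed" only matter if you plan to use the corresponding interface.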

### Codon Alignment Tools

The codon alignment tool PAL2NAL is already included in the orthologr package and does not need to be downloaded or installed. The corresponding function codon_aln() takes a protein alignment and the corresponding coding sequences and returns a codon alignment by calling PAL2NAL from inside the orthologr package.

### dNdS Estimation Methods

dNdS estimation quantifies the selection pressure acting on a specific protein sequence as the ratio of nonsynonymous (dN) to synonymous (dS) substitution rates, determined by pairwise comparisons of two protein sequences and their corresponding codon alignments. Different models have been proposed to estimate this ratio. The orthologr package includes the most common dNdS estimation methods.

Starting with a codon alignment returned by codon_aln(), the function dNdS() computes the dN, dS, and dNdS values of pairs of proteins.

Based on implementations provided by gestimator, ape, and KaKs_Calculator, the following dNdS Estimation Methods are available in orthologr:

• Li : Li’s method (1993) -> provided by the ape package

• Comeron : Comeron’s method (1995)

• NG : Nei, M. and Gojobori, T. (1986)

• LWL : Li, W.H., et al. (1985)

• MLWL (Modified LWL), MLPB (Modified LPB): Tzeng, Y.H., et al. (2004)

• YN : Yang, Z. and Nielsen, R. (2000)

• MYN (Modified YN): Zhang, Z., et al. (2006)

For this purpose, KaKs_Calculator needs to be installed on your system and executable from your default PATH, e.g. /usr/local/bin/.

### Install KaKs_Calculator (For Linux/Unix/OS)

E.g.

# after downloading KaKs_Calculator, copy the executable to /usr/local/bin
sudo cp KaKs_Calculator /usr/local/bin/

Now you should be able to run KaKs_Calculator via KaKs_Calculator -h in your shell or via system("KaKs_Calculator -h") in R.

The most recent version, KaKs_Calculator2.0, can be found here.