Most orthologr functions are interface functions that pass data to common bioinformatics tools, call the corresponding tool internally, and read its output back into R as an R object. Users of the interface functions in orthologr therefore need to install the underlying bioinformatics tools to obtain accurate results.
The following sections provide step-by-step instructions or guidance on installing all bioinformatics tools for which R interface functions are implemented in orthologr.
Some tools are not trivial to install, so please read the corresponding sections carefully and execute the test cases presented in each section.
The bioinformatics tools you are going to install are based on the following programming languages:
Please make sure these programming languages are installed and executable on the machines on which you are going to run orthologr.
The orthologr package provides interfaces to the pairwise alignment tools BLAST and DIAMOND v2. We recommend DIAMOND v2, as it saves time while being as sensitive as BLAST.
BLAST
BLAST (Basic Local Alignment Search Tool) finds regions of similarity between biological sequences and is the underlying paradigm of most orthology inference methods.
1) Go to ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and download the system-specific BLAST program.
2) Install BLAST:
Windows: follow the Environment Variables section of the installation manual for Windows and make sure the execution PATH variable is set correctly.
Unix: follow the Configuration section of the installation manual for Unix and make sure the execution PATH variable is set correctly to /usr/local/bin.
For example, on Linux systems open the Terminal application and run (thanks to Alexander Gabel):
# download BLAST+ version 2.2.31
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.31/ncbi-blast-2.2.31+-x64-linux.tar.gz
# extract the compiled version of BLAST (the archive name must match the downloaded file)
tar zxvpf ncbi-blast-2.2.31+-x64-linux.tar.gz
# copy BLAST files to `/usr/local/bin` (may require sudo)
cp ncbi-blast-2.2.31+/bin/* /usr/local/bin
Alternatively, users can make the BLAST programs callable by adding the BLAST folder to the PATH variable. This is useful because BLAST versions can then be updated without deleting all BLAST programs from /usr/local/bin:
# open vim text editor
vi .bash_profile
# type 'Shift' then I to edit the file .bash_profile
# and specify the export PATH
export PATH=${PATH}:/path/to/downloaded/blast/folder/ncbi-blast-2.2.31+/bin
# type 'ESC' then ':' then 'w' then 'q' to save and quit the .bash_profile file
# log out from your server with
exit
# log in again and type
blastp -version
Now users should see the BLAST version information.
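If the export line is added by a script rather than by hand, it can help to avoid appending the same folder twice. The following is a minimal sketch of that idea, not part of orthologr; the helper name path_append and the variable BLAST_BIN are hypothetical, and the folder is the placeholder path used above.

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already listed.
# path_append is a hypothetical helper name used for illustration.
path_append() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present: do nothing
        *) PATH="${PATH}:$1" ;;      # otherwise append at the end
    esac
}

# Placeholder location; replace it with your own BLAST folder.
BLAST_BIN="/path/to/downloaded/blast/folder/ncbi-blast-2.2.31+/bin"

path_append "$BLAST_BIN"
path_append "$BLAST_BIN"   # the second call leaves PATH unchanged
export PATH
```

Placing these lines in .bash_profile instead of a bare export keeps PATH from growing each time the profile is sourced again.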
In our experience, installing BLAST works best when the BLAST executables are copied to /usr/local/bin. The following steps copy the BLAST executables to /usr/local/bin on Unix systems. Note, however, that updating BLAST then requires manually deleting all previous BLAST programs from /usr/local/bin:
Open the Terminal application on your system and type:
open /usr/local/bin
Next, copy the blastp, makeblastdb, etc. files (BLAST executables) from your BLAST folder to /usr/local/bin. You will need to enter the system password to authorize the copy process.
After installing the BLAST program you can open an R session and type the following command to check whether or not BLAST can be executed from R.
# test whether blastp is correctly installed on your machine
system("blastp -version")
blastp: 2.2.31+
Package: blast 2.2.31, build Oct 27 2014 17:10:51
You should see this output if BLAST was installed correctly.
In case you find the following output:
sh: blastp: command not found
You should return to step 2) and install BLAST so that it can be executed from the default execution PATH.
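The same check can be run from the shell for several executables at once. This is a minimal sketch that assumes only the executable names mentioned in this section (blastp and makeblastdb); the function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Print the name of every given program that cannot be found on PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# blastp and makeblastdb are the executables named in this section.
missing=$(missing_tools blastp makeblastdb)
if [ -z "$missing" ]; then
    echo "All listed BLAST executables were found on PATH."
else
    echo "Not found on PATH:" $missing
fi
```

If any name is reported as missing, the PATH setup described in step 2) is the first thing to revisit.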
These interface functions to BLAST+ are implemented in orthologr:
blast() : Interface function to BLAST+
blast_best() : Perform a BLAST+ best hit search
blast_rec() : Perform a BLAST+ reciprocal best hit (RBH) search
set_blast() : Preparing the parameters and databases for subsequent BLAST+ searches
blast.nr() : Perform a BLASTp search against NCBI nr
delta.blast() : Perform a DELTA-BLAST search
advanced_blast() : Advanced interface function to BLAST+
advanced_makedb() : Advanced interface function to makeblastdb
DIAMOND2
DIAMOND2 (Double Index alignment of Next-generation sequencing data) finds, like BLAST, regions of similarity between biological sequences. Unlike BLAST, it is much faster (up to 10,000x faster in the default fast mode and over 80x faster in the ultra-sensitive mode, which is as sensitive as BLAST). Thus, DIAMOND2 facilitates even faster orthology inference.
1) Go to the download site in the DIAMOND2 wiki and follow the instructions for installation. DIAMOND2 is supported on Linux, macOS and Windows.
2) Check the installation of DIAMOND2 by running the command
diamond --version
After installing the DIAMOND2 program you can open an R session and type the following command to check whether or not DIAMOND2 can be executed from R.
# test whether diamond is correctly installed on your machine
system("diamond --version")
diamond version 2.1.8
You should see this output if DIAMOND2 was installed correctly.
In case you find the following output:
sh: diamond: command not found
You should return to step 1) and install DIAMOND2 so that it can be executed from the default execution PATH.
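Because this section targets DIAMOND2 specifically, it can also be worth confirming the major version and not only that the program runs. The sketch below assumes the version line has the form shown above ("diamond version 2.1.8"); the function name diamond_major is hypothetical.

```shell
#!/bin/sh
# Extract the major version from a line such as "diamond version 2.1.8".
# Prints nothing if the line does not have that form.
# diamond_major is a hypothetical helper name used for illustration.
diamond_major() {
    printf '%s\n' "$1" | sed -n 's/^diamond version \([0-9][0-9]*\)\..*$/\1/p'
}

if command -v diamond >/dev/null 2>&1; then
    major=$(diamond_major "$(diamond --version 2>/dev/null)")
    if [ "${major:-0}" -ge 2 ]; then
        echo "DIAMOND major version ${major} found."
    else
        echo "diamond was found, but it does not report version 2 or later."
    fi
else
    echo "diamond was not found on PATH."
fi
```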
These interface functions to DIAMOND2 are implemented in orthologr, akin to the interface functions to BLAST+:
diamond() : Interface function to DIAMOND2
diamond_best() : Perform a diamond best hit search
diamond_rec() : Perform a diamond reciprocal best hit (RBH) search
set_diamond() : Preparing the parameters and databases for subsequent diamond searches
Furthermore, the following functions use DIAMOND2 by default, though the use of BLAST can be specified through the parameter aligner = "blast":
dNdS() : Compute dNdS values for two organisms
divergence_stratigraphy() : Perform ‘Divergence Stratigraphy’
The orthologr package also provides interfaces to the following multiple alignment tools. None of them has to be installed unless the corresponding interface functions are used.
ClustalW2
To install ClustalW2, please go to the ClustalW homepage and download the clustalw2 program matching your operating system.
After downloading and unpacking the clustalw2 program, go to the clustalw-2.1 folder and open a Terminal application to type (in this example for Mac OS X):
# copy clustalw2 files to `/usr/local/bin` (may require sudo)
cp clustalw2 /usr/local/bin
T-Coffee
To install T-Coffee, please go to the T-Coffee homepage and download the T-Coffee program matching your operating system.
MUSCLE
ClustalO
Download the argtable program.
Unzip the file.
Run within the argtable folder:
./configure
make
make check
sudo make install
Download ClustalO.
Unzip the folder and run within the folder:
./configure
make
sudo make install
MAFFT
In orthologr, the function multi_aln() provides interfaces to all of these multiple alignment tools as well as a pairwise alignment interface to the Biostrings package, which performs a Needleman-Wunsch alignment.
The codon alignment tool Pal2Nal is already integrated in the orthologr package and doesn’t need to be downloaded or installed. The corresponding function codon_aln() takes a protein alignment and the corresponding coding sequences and returns a codon alignment by calling Pal2Nal from inside the orthologr package.
dNdS estimation quantifies the selection pressure acting on a protein-coding sequence. It compares, in the pairwise codon alignment of two sequences, the rate of nonsynonymous (amino acid changing) substitutions, dN, with the rate of synonymous substitutions, dS. Different models have been proposed to estimate this ratio. The orthologr package includes the most common dNdS estimation methods.
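For orientation, the ratio and its conventional reading can be written as follows. This is the standard textbook interpretation rather than anything specific to orthologr; dN denotes nonsynonymous substitutions per nonsynonymous site and dS denotes synonymous substitutions per synonymous site.

```latex
\omega = \frac{dN}{dS}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

The estimation methods listed in this section differ in how dN and dS are estimated, not in how the ratio is read.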
Starting with a codon alignment returned by codon_aln(), the function dNdS() computes the dN, dS, and dNdS values of pairs of proteins.
Based on implementations provided by gestimator, ape, and KaKs_Calculator, the following dNdS estimation methods are available in orthologr:
Li : Li’s method (1993), provided by the ape package
Comeron : Comeron’s method (1995)
NG : Nei, M. and Gojobori, T. (1986)
LWL : Li, W.H., et al. (1985)
MLWL (Modified LWL), MLPB (Modified LPB): Tzeng, Y.H., et al. (2004)
YN : Yang, Z. and Nielsen, R. (2000)
MYN (Modified YN): Zhang, Z., et al. (2006)
For this purpose you need to have KaKs_Calculator installed on your system and executable from your default PATH, e.g. /usr/local/bin/.
Please go to the KaKs_Calculator homepage and download KaKs_Calculator.
E.g.
# download KaKs_Calculator
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/kaks-calculator/KaKs_Calculator1.2.tar.gz
# unzip
gzip -d KaKs_Calculator1.2.tar.gz
tar -xf KaKs_Calculator1.2.tar
# install
cd KaKs_Calculator1.2/src
sudo make
sudo cp KaKs_Calculator /usr/local/bin/
Now you should be able to run KaKs_Calculator via
KaKs_Calculator -h
in your bash shell or as
system("KaKs_Calculator -h")
in R.
The most recent version, KaKs_Calculator2.0, can be found here.