Validate or annotate putative LTR transposons that have been predicted using LTRharvest or LTRdigest.
blast_repbase(
query,
repbase.path,
output = "RepbaseOutput.txt",
max.target.seqs = 10000,
eval = 1e-10,
cores = 1
)
query
file path to the putative LTR transposon sequences in fasta format.
repbase.path
file path to the RepBase file in fasta format.
output
file name of the BLAST output.
max.target.seqs
maximum number of hits to retrieve that still fulfill the e-value criterion. Default is max.target.seqs = 10000.
eval
e-value threshold for BLAST hit detection. Default is eval = 1E-10.
cores
number of cores to use for parallel computations.
The RepBase database provides a collection of curated transposable element annotations.
This function allows users to validate or annotate putative LTR transposons that have been predicted using LTRharvest or LTRdigest by blasting the predicted LTR transposons against transposons that are already annotated in other species (e.g. Arabidopsis thaliana).
Internally, this function performs a blastn search of the putative LTR transposons predicted by LTRharvest or LTRdigest against the RepBase fasta file that is specified by the user.
For this purpose, the user needs a working installation of BLAST+ on their machine.
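The kind of blastn call described above can be sketched as follows. This is a minimal illustration, not the function's actual implementation: the helper build_blastn_args() is hypothetical, and the sketch assumes that the RepBase fasta file was first turned into a nucleotide BLAST database with makeblastdb and that tabular output (-outfmt 6) is wanted. The search itself is only attempted when a blastn executable is found on the PATH.

```r
# Hypothetical helper (not part of the package): assemble the argument
# vector for a blastn call from the parameters documented above.
# `db` is assumed to name a nucleotide database built with makeblastdb.
build_blastn_args <- function(query, db, output = "RepbaseOutput.txt",
                              max.target.seqs = 10000, eval = 1e-10,
                              cores = 1) {
  c("-query", query,
    "-db", db,
    "-out", output,
    "-evalue", format(eval, scientific = TRUE),
    "-max_target_seqs", format(max.target.seqs, scientific = FALSE),
    "-num_threads", as.character(cores),
    "-outfmt", "6")
}

blast_args <- build_blastn_args("LTRtransposonSeqs.fasta",
                                "Athaliana_repbase", cores = 4)

# Run the search only if BLAST+ is installed and the query file exists.
blastn <- Sys.which("blastn")
if (nzchar(blastn) && file.exists("LTRtransposonSeqs.fasta")) {
  system2(blastn, blast_args)
}
```

Checking Sys.which("blastn") before calling blast_repbase() is a quick way to confirm the BLAST+ requirement is met.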
http://www.girinst.org/repbase/
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.
Gish, W. & States, D.J. (1993) "Identification of protein coding regions by database similarity search." Nature Genet. 3:266-272.
Madden, T.L., Tatusov, R.L. & Zhang, J. (1996) "Applications of network BLAST server." Meth. Enzymol. 266:131-141.
Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.
Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. (2000) "A greedy algorithm for aligning DNA sequences." J. Comput. Biol. 7:203-214.
if (FALSE) {
# Example annotation run against the A. thaliana RepBase using 4 cores
q <- blast_repbase(query = "path/to/LTRtransposonSeqs.fasta",
                   repbase.path = "path/to/Athaliana_repbase.ref",
                   cores = 4)
# For each query, keep the hit(s) with the highest bit score
Annot <- dplyr::select(dplyr::filter(dplyr::group_by(q, query_id),
                                     bit_score == max(bit_score)),
                       query_id:q_len, evalue, bit_score, scope)
# Select only hits with a scope >= 0.1
Annot.HighMatches <- dplyr::filter(Annot, scope >= 0.1)
# Plot the number of distinct matched subjects per family,
# taking the second "_"-separated field of subject_id as the family label
barplot(sort(table(unlist(lapply(stringr::str_split(
  names(table(Annot.HighMatches$subject_id)), "_"),
  function(x) x[2]))), decreasing = TRUE))
}
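The post-processing steps in the example above (best hit per query, scope filter, family label taken from subject_id) can be tried without BLAST+ or RepBase on a toy table. The sketch below uses base R only; the hit table, its values and the subject_id naming pattern are invented for illustration and are assumptions about what a real result looks like. Unlike the example, it counts hits rather than distinct subjects per family.

```r
# Toy stand-in for a BLAST hit table; all values are invented.
hits <- data.frame(
  query_id   = c("ltr_1", "ltr_1", "ltr_2", "ltr_3"),
  subject_id = c("ATCOPIA1_Copia_Athaliana", "ATGP1_Gypsy_Athaliana",
                 "ATGP2_Gypsy_Athaliana", "ATCOPIA5_Copia_Athaliana"),
  bit_score  = c(950, 410, 1200, 80),
  scope      = c(0.85, 0.30, 0.92, 0.05),
  stringsAsFactors = FALSE
)

# Keep, for each query, the hit(s) with the highest bit score.
is_best <- hits$bit_score == ave(hits$bit_score, hits$query_id, FUN = max)
best <- hits[is_best, ]

# Retain only best hits with scope >= 0.1.
high <- best[best$scope >= 0.1, ]

# Use the second "_"-separated field of subject_id as the family label.
family <- vapply(strsplit(high$subject_id, "_", fixed = TRUE),
                 function(x) x[2], character(1))

# Count retained hits per family, most frequent first.
family_counts <- sort(table(family), decreasing = TRUE)
```

Passing family_counts to barplot() gives the same kind of plot as the example.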