Validate or annotate putative LTR transposons that have been predicted using LTRharvest or LTRdigest.

blast_repbase(
  query,
  repbase.path,
  output = "RepbaseOutput.txt",
  max.target.seqs = 10000,
  eval = 1e-10,
  cores = 1
)

Arguments

query

file path to the putative LTR transposon sequences in fasta format.

repbase.path

file path to the Repbase file in fasta format.

output

file name of the BLAST output. Default is output = "RepbaseOutput.txt".

max.target.seqs

maximum number of hits to retrieve that still fulfill the e-value criterion. Default is max.target.seqs = 10000.

eval

e-value threshold for BLAST hit detection. Default is eval = 1E-10.

cores

number of cores to use for parallel computations. Default is cores = 1.

Details

The Repbase database provides a collection of curated transposable element annotations.

This function allows users to validate or annotate putative LTR transposons predicted by LTRharvest or LTRdigest by running BLAST searches against transposons that are already known (annotated) in other species (e.g. Arabidopsis thaliana).

Internally, this function performs a blastn search of the putative LTR transposons predicted by LTRharvest or LTRdigest against the user-specified Repbase fasta file.
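As a rough illustration of how the function arguments might translate into a blastn call, the sketch below assembles a command-line argument vector. The flags used (-query, -db, -out, -evalue, -max_target_seqs, -num_threads) are standard BLAST+ options, but the helper build_blastn_args() is hypothetical and not part of the package. That the search runs against a database built beforehand with makeblastdb is also an assumption; the actual internals of blast_repbase() may differ.

```r
# Hypothetical helper, NOT part of the package: it only illustrates how the
# arguments of blast_repbase() could map onto standard blastn options.
# Assumption: `db` points to a nucleotide database created with makeblastdb.
build_blastn_args <- function(query,
                              db,
                              output = "RepbaseOutput.txt",
                              max.target.seqs = 10000,
                              eval = 1e-10,
                              cores = 1) {
  c("-query", query,
    "-db", db,
    "-out", output,
    "-evalue", format(eval, scientific = TRUE),
    "-max_target_seqs", format(max.target.seqs, scientific = FALSE),
    "-num_threads", as.character(cores))
}

blastn_args <- build_blastn_args(
  query = "path/to/LTRtransposonSeqs.fasta",
  db    = "path/to/Athaliana_repbase.ref",
  cores = 4
)

# Show the command line that would be executed (nothing is run here)
cat("blastn", blastn_args, "\n")
```

Assembling the arguments as a character vector rather than one pasted string makes it straightforward to hand them to system2() later, should one wish to run the command.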

This requires a working installation of BLAST+ on the user's machine.
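One way to check the BLAST+ requirement before starting a search is to look for the blastn executable on the PATH. The following is a minimal sketch using base R only; it assumes that blastn is installed under its usual name and is not something the package itself is known to run.

```r
# Sys.which() returns an empty string when the executable is not on the PATH
blastn_path <- unname(Sys.which("blastn"))

if (nzchar(blastn_path)) {
  message("blastn found at: ", blastn_path)
  # Ask blastn to report its version
  print(system2(blastn_path, "-version", stdout = TRUE))
} else {
  message("blastn was not found on the PATH; please install BLAST+ first.")
}
```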

References

http://www.girinst.org/repbase/

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

Gish, W. & States, D.J. (1993) "Identification of protein coding regions by database similarity search." Nature Genet. 3:266-272.

Madden, T.L., Tatusov, R.L. & Zhang, J. (1996) "Applications of network BLAST server." Meth. Enzymol. 266:131-141.

Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.

Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. (2000) "A greedy algorithm for aligning DNA sequences." J. Comput. Biol. 7:203-214.

Author

Hajk-Georg Drost

Examples

if (FALSE) {
# Example annotation run against the A. thaliana Repbase using 4 cores
q <- blast_repbase(query        = "path/to/LTRtransposonSeqs.fasta",
                   repbase.path = "path/to/Athaliana_repbase.ref",
                   cores        = 4)

# For each query, keep only the hit(s) with the highest bit score
Annot <- dplyr::select(
  dplyr::filter(dplyr::group_by(q, query_id), bit_score == max(bit_score)),
  query_id:q_len, evalue, bit_score, scope
)
# select only hits with a scope >= 0.1
Annot.HighMatches <- dplyr::filter(Annot, scope >= 0.1)
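# Plot how many distinct matched subjects fall into each category, where the
# category is the second "_"-separated field of subject_id
barplot(sort(table(unlist(lapply(
  stringr::str_split(names(table(Annot.HighMatches$subject_id)), "_"),
  function(x) x[2]))), decreasing = TRUE))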
}
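The filtering steps in the example above can also be tried without running BLAST. The sketch below uses base R on a small, invented hit table; the column names query_id, subject_id, bit_score and scope follow the example, while all identifiers and values are made up purely for illustration.

```r
# Toy stand-in for a hit table (all values are invented for illustration;
# a real result table has more columns).
hits <- data.frame(
  query_id   = c("ltr_1", "ltr_1", "ltr_2", "ltr_2", "ltr_3", "ltr_4"),
  subject_id = c("TE1_Copia", "TE2_Gypsy", "TE3_Gypsy",
                 "TE4_Copia", "TE5_Gypsy", "TE6_Gypsy"),
  bit_score  = c(850, 420, 990, 310, 55, 700),
  scope      = c(0.62, 0.30, 0.81, 0.25, 0.04, 0.50),
  stringsAsFactors = FALSE
)

# Keep, for each query, the hit(s) with the highest bit score
best <- hits[hits$bit_score == ave(hits$bit_score, hits$query_id, FUN = max), ]

# Keep only hits with a scope >= 0.1
high <- best[best$scope >= 0.1, ]

# Take the second "_"-separated field of each distinct subject_id
category <- vapply(strsplit(unique(high$subject_id), "_", fixed = TRUE),
                   function(x) x[2], character(1))

# Count distinct subjects per category, most frequent first
counts <- sort(table(category), decreasing = TRUE)
print(counts)
```

Passing counts to barplot() would then give the same kind of plot as in the example.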