quality.filter.meta for several different ltr similarity thresholds
Source: R/generate.multi.quality.filter.meta.R
A helper function that applies the quality.filter function to diverse LTRpred annotations while probing different ltr similarity thresholds.
generate.multi.quality.filter.meta( kingdom, genome.folder, ltrpred.meta.folder, sim.options, cut.range.options, n.orfs = 0, strategy = "default", update = FALSE )
| Argument | Description |
|---|---|
| kingdom | the taxonomic kingdom of the species for which |
| genome.folder | a file path to a folder storing the genome assembly files in fasta format that were used to generate |
| ltrpred.meta.folder | a file path to a folder storing |
| sim.options | a numeric vector storing the ltr similarity thresholds that shall be probed. |
| cut.range.options | a numeric vector storing the similarity cut range thresholds that shall be probed. |
| n.orfs | minimum number of open reading frames a predicted retroelement shall possess. |
| strategy | quality filter strategy. Options are |
| update | shall already existing |
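The arguments above could be assembled into a call along the following lines. This is a minimal sketch, not a tested recipe: the threshold values, the folder paths, the `kingdom` value and the assumption that the function is exported from a package named `LTRpred` are all placeholders chosen for illustration. The call is guarded so that it is skipped when the package or the folders are not available.

```r
# Thresholds to probe. The concrete values are illustrative assumptions.
sim.options       <- seq(70, 95, by = 5)
cut.range.options <- c(2, 3)

# Hypothetical folder paths; replace them with real locations.
genome.folder       <- "path/to/genomes"
ltrpred.meta.folder <- "path/to/ltrpred_meta"

# Only attempt the call when the (assumed) package and both folders exist.
can_run <- requireNamespace("LTRpred", quietly = TRUE) &&
  dir.exists(genome.folder) &&
  dir.exists(ltrpred.meta.folder)

if (can_run) {
  res <- LTRpred::generate.multi.quality.filter.meta(
    kingdom             = "Plants",  # placeholder value, an assumption
    genome.folder       = genome.folder,
    ltrpred.meta.folder = ltrpred.meta.folder,
    sim.options         = sim.options,
    cut.range.options   = cut.range.options,
    n.orfs              = 0,
    strategy            = "default",
    update              = FALSE
  )
  # Inspect the two data.frames named in the return value description.
  str(res$sim_file)
  str(res$gm_file)
}
```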
A list with two list elements, sim_file and gm_file. Each list element stores a data.frame:
sim_file (similarity file)
gm_file (genome metrics file)
Quality Control
ltr.similarity: minimum similarity between LTRs. All TEs not matching this criterion are discarded.
n.orfs: minimum number of open reading frames that must be found between the LTRs. All TEs not matching this criterion are discarded.
PBS or protein match: elements must have either a predicted primer binding site or a match to at least one protein (Gag, Pol, Rve, ...) between their LTRs. All TEs not matching this criterion are discarded.
Relative number of N's (= nucleotide not known) in the TE <= 0.1. The relative number of N's is computed as: absolute number of N's in TE / width of TE.
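The four criteria above can be illustrated on a toy table in base R. This is only a sketch of the filtering logic as described here, not the package's implementation: the function `quality_filter_sketch`, the column names and the data are hypothetical, and "minimum" is read as greater than or equal to the threshold.

```r
# Toy annotation table. Column names are hypothetical, not the LTRpred schema.
te <- data.frame(
  id             = c("TE1", "TE2", "TE3", "TE4", "TE5"),
  ltr_similarity = c(95, 60, 88, 92, 99),   # similarity between the two LTRs
  orfs           = c(2, 1, 0, 1, 3),        # ORFs found between the LTRs
  has_pbs        = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  has_protein    = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  n_count        = c(10, 0, 5, 0, 900),     # absolute number of N's in the TE
  width          = c(5000, 4000, 6000, 7000, 6000),
  stringsAsFactors = FALSE
)

# Hypothetical helper applying the four criteria to a table like `te`.
quality_filter_sketch <- function(x, sim = 70, n.orfs = 0, max.n.fraction = 0.1) {
  # Relative number of N's = absolute number of N's / width of TE.
  n_fraction <- x$n_count / x$width
  keep <- x$ltr_similarity >= sim &
    x$orfs >= n.orfs &
    (x$has_pbs | x$has_protein) &
    n_fraction <= max.n.fraction
  x[keep, , drop = FALSE]
}

# One threshold: only TE1 passes all four criteria.
kept <- quality_filter_sketch(te, sim = 70, n.orfs = 1)

# Probing several similarity thresholds and counting the retained elements.
thresholds <- c(50, 70, 90, 96)
counts <- sapply(thresholds, function(s) {
  nrow(quality_filter_sketch(te, sim = s, n.orfs = 1))
})
```

In this toy data TE2 fails the similarity threshold at 70, TE3 has too few ORFs, TE4 has neither a PBS nor a protein match, and TE5 exceeds the N fraction limit (900 / 6000 = 0.15).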
Hajk-Georg Drost