R/extract_random_seqs_from_multiple_genomes.R
extract_random_seqs_from_multiple_genomes.Rd
In some cases, users may wish to extract sequences from randomly sampled loci of a particular length from a set of genomes. This function allows users to specify the number of sequences of a given length that shall be randomly sampled from the genome. The sampling rule is as follows. For each locus, independently:
1) choose randomly (equal probability: see sample.int for details) the chromosome from which the locus shall be sampled (replace = TRUE).
2) choose randomly (equal probability: see sample.int for details) the strand (plus or minus) from which the locus shall be sampled (replace = TRUE).
3) choose randomly (equal probability: see sample.int for details) the starting position of the locus in the sampled chromosome and strand (replace = TRUE).
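The three sampling steps can be sketched in base R as follows. This is only an illustration of the rule as described, not the function's actual implementation: the helper sample_random_loci and the chromosome lengths are hypothetical, and restricting the start position so that the whole locus fits inside the chromosome is an assumption.

```r
# Hypothetical chromosome lengths; a real run would derive them from the genome file.
chr_lengths <- c(chr1 = 1000L, chr2 = 500L, chr3 = 250L)

# Hypothetical helper illustrating the three-step sampling rule.
sample_random_loci <- function(chr_lengths, sample_size, interval_width) {
  stopifnot(all(chr_lengths >= interval_width))
  # 1) chromosome: equal probability, with replacement
  chr_idx <- sample.int(length(chr_lengths), sample_size, replace = TRUE)
  # 2) strand: equal probability, with replacement
  strand <- c("plus", "minus")[sample.int(2L, sample_size, replace = TRUE)]
  # 3) start position: uniform over all starts for which the locus still fits
  #    (assumption: loci may not run past the chromosome end)
  max_start <- unname(chr_lengths[chr_idx]) - interval_width + 1L
  start <- vapply(max_start, function(m) sample.int(m, 1L), integer(1))
  data.frame(
    chromosome = names(chr_lengths)[chr_idx],
    strand = strand,
    start = start,
    end = start + interval_width - 1L,
    stringsAsFactors = FALSE
  )
}

set.seed(1)
loci <- sample_random_loci(chr_lengths, sample_size = 10L, interval_width = 50L)
```

Because every step draws with replace = TRUE, the same locus can appear more than once in the result.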
extract_random_seqs_from_multiple_genomes(
sample_size,
replace = TRUE,
prob = NULL,
interval_width,
subject_genomes,
file_name = NULL,
separated_by_genome = FALSE,
update = TRUE,
path = NULL
)
sample_size
a non-negative integer giving the number of loci that shall be sampled.
replace
a logical value indicating whether sampling should be with replacement. Default: replace = TRUE.
prob
a vector of probability weights for obtaining the elements of the vector being sampled. Default: prob = NULL.
interval_width
the length of the locus that shall be sampled.
subject_genomes
a vector containing file paths to the reference genomes that shall be queried (e.g. file paths returned by meta.retrieval).
file_name
name of the fasta file that stores the sampled sequences. This name is only used when separated_by_genome = FALSE.
separated_by_genome
a logical value indicating whether sequences sampled from different genomes should be stored in the same output fasta file (separated_by_genome = FALSE; default) or in separate fasta files (separated_by_genome = TRUE).
update
a logical value indicating whether an existing file_name file shall be overwritten (update = TRUE; default) or whether sampled sequences shall be appended to the existing file (update = FALSE).
path
a folder path in which the corresponding fasta output files shall be stored.
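A call might look like the sketch below. The genome paths and the output file name are hypothetical placeholders, and the call only runs if the function is available in the session and the files exist; the argument names are taken from the usage shown above.

```r
# Hypothetical genome files; replace with real paths,
# e.g. those returned by meta.retrieval.
genomes <- c("genomes/species_a.fa", "genomes/species_b.fa")

# Arguments for sampling 100 loci of length 1000 into one shared fasta file.
args <- list(
  sample_size = 100L,
  interval_width = 1000L,
  subject_genomes = genomes,
  file_name = "random_loci.fa",   # hypothetical output name
  separated_by_genome = FALSE,
  update = TRUE,
  path = tempdir()
)

# Guarded call: skipped when the function or the genome files are missing.
if (exists("extract_random_seqs_from_multiple_genomes", mode = "function") &&
    all(file.exists(genomes))) {
  do.call(extract_random_seqs_from_multiple_genomes, args)
}
```

Setting separated_by_genome = TRUE instead would write one fasta file per genome, in which case file_name is not used.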