R/extract_random_seqs_from_genome.R
extract_random_seqs_from_genome.Rd
This function samples a user-specified number of sequences of a specified length at random from a genome. Each locus is sampled independently as follows:
1) Choose at random the chromosome from which the locus is sampled (equal probability across the given chromosomes, replace = TRUE; see sample.int for details).
2) Choose at random the strand (plus or minus) from which the locus is sampled (equal probability, replace = TRUE; see sample.int for details).
3) Choose at random the starting position of the locus within the sampled chromosome and strand (equal probability, replace = TRUE; see sample.int for details).
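The three-step rule above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the genome is assumed to be already loaded as a named character vector (one string per chromosome), `sample_random_loci()` and `rev_comp()` are hypothetical helpers, start positions are assumed to be plus-strand coordinates restricted to the positions where the whole locus fits (1 to chromosome length - interval_width + 1), minus-strand loci are returned as reverse complements, and chromosomes shorter than interval_width are assumed to be skipped.

```r
# Hypothetical helper: reverse complement of an uppercase DNA string (A, C, G, T, N).
rev_comp <- function(x) {
  comp <- chartr("ACGTN", "TGCAN", x)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical sketch of the sampling rule; `genome` is a named character vector.
sample_random_loci <- function(genome, size, interval_width) {
  chr_len <- unname(nchar(genome))
  # Assumption: only chromosomes long enough to hold a full locus are eligible.
  eligible <- which(chr_len >= interval_width)
  if (length(eligible) == 0) stop("no chromosome is at least interval_width long")

  # 1) chromosome for every locus: equal probability, with replacement
  chr <- eligible[sample.int(length(eligible), size, replace = TRUE)]
  # 2) strand for every locus: equal probability, with replacement
  strand <- c("+", "-")[sample.int(2, size, replace = TRUE)]
  # 3) start position: uniform over all positions where the locus fits
  start <- vapply(chr, function(k) sample.int(chr_len[k] - interval_width + 1, 1),
                  integer(1))
  end <- start + interval_width - 1

  # Extract plus-strand substrings, then reverse-complement minus-strand loci
  seqs <- unname(substring(genome[chr], start, end))
  minus <- strand == "-"
  seqs[minus] <- vapply(seqs[minus], rev_comp, character(1), USE.NAMES = FALSE)

  data.frame(chr = names(genome)[chr], strand = strand, start = start,
             end = end, seq = seqs, stringsAsFactors = FALSE)
}
```

For example, `sample_random_loci(c(chr1 = "ACGTACGTACGTAAAA", chr2 = "GGGGCCCCTTTTAAAA"), size = 5, interval_width = 4)` returns a data frame with one row per sampled locus. Drawing all chromosome and strand indices in one vectorized `sample.int()` call with `replace = TRUE` mirrors the independence of the loci described above.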
extract_random_seqs_from_genome(
size,
replace = TRUE,
prob = NULL,
interval_width,
subject_genome,
file_name = NULL,
append = FALSE
)
size: a non-negative integer giving the number of loci to be sampled.
replace: logical value indicating whether sampling should be with replacement. Default: replace = TRUE.
prob: a vector of probability weights for obtaining the elements of the vector being sampled. Default: prob = NULL.
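The wording of replace and prob mirrors the arguments of base R's sample.int(); how this function forwards them internally is not stated here. As a hedged illustration of what the two arguments mean in sample.int() itself, the following base R snippet contrasts draws with and without replacement and with probability weights (the variable names are illustrative only).

```r
set.seed(123)

# With replacement: the same index may be drawn more than once.
draws <- sample.int(3, size = 10, replace = TRUE)

# Weighted draws: index 1 has probability 0.8, indices 2 and 3 have 0.1 each.
weighted <- sample.int(3, size = 10, replace = TRUE, prob = c(0.8, 0.1, 0.1))

# Without replacement, size may not exceed n; this call signals an error.
too_many <- tryCatch(sample.int(3, size = 5, replace = FALSE),
                     error = function(e) "error")
```

With prob = NULL (the default), every index is equally likely, which corresponds to the "equal probability" wording in the sampling rule.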
interval_width: the length (in nucleotides) of each locus to be sampled.
subject_genome: file path to the FASTA file storing the subject genome.
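A FASTA file stores each chromosome as a header line starting with `>` followed by one or more sequence lines. As an illustration of the format only, here is a minimal base R reader; `read_fasta_simple()` is a hypothetical helper, not part of this package, which may rely on a dedicated sequence reader instead. It assumes the first word of each header is the chromosome name and returns a named character vector.

```r
# Hypothetical helper: read a FASTA file into a named character vector.
read_fasta_simple <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                     # drop empty lines
  is_header <- startsWith(lines, ">")
  if (length(lines) == 0 || !is_header[1]) {
    stop("not a FASTA file: first line must start with '>'")
  }
  record <- cumsum(is_header)                       # record index for every line
  # First word of each header (without '>') becomes the record name
  ids <- sub("^>([^[:space:]]*).*$", "\\1", lines[is_header])
  # Concatenate the sequence lines of each record; records without sequence become ""
  groups <- split(lines[!is_header],
                  factor(record[!is_header], levels = seq_along(ids)))
  seqs <- vapply(groups, paste, character(1), collapse = "")
  setNames(unname(seqs), ids)
}
```

Multi-line records are joined into a single string per chromosome, so chromosome lengths can be read directly with `nchar()`.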
file_name: name of the output FASTA file that will store the sequences of the randomly sampled loci.
append: logical value. If append = TRUE, new random sequences are added to an existing file_name; if append = FALSE (default), an existing file_name is removed before the new random sequences are stored.
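The append behaviour can be sketched in base R as below. This is a hedged illustration of the documented semantics, not the package's writer: `write_fasta_simple()` is a hypothetical helper, and the sequences are assumed to be a named character vector whose names become the FASTA headers.

```r
# Hypothetical helper sketching the documented `append` semantics.
write_fasta_simple <- function(seqs, file_name, append = FALSE) {
  # append = FALSE: delete an existing file so only the new records remain
  if (!append && file.exists(file_name)) file.remove(file_name)
  # Interleave headers and sequences: >name1, seq1, >name2, seq2, ...
  records <- as.vector(rbind(paste0(">", names(seqs)), unname(seqs)))
  con <- file(file_name, open = "a")  # "a" creates the file if it does not exist
  on.exit(close(con))
  writeLines(records, con)            # writeLines ends every record with a newline
  invisible(file_name)
}
```

Using `writeLines()` rather than `cat(..., sep = "\n")` guarantees a trailing newline, so a later call with append = TRUE starts on a fresh line instead of fusing two records.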