Extract random loci from a genome of interest — extract_random_seqs_from

This function randomly samples a user-specified number of sequences of a given length from a genome. Each locus is sampled independently according to the following rule:

1) Choose at random (with equal probability; see sample.int for details) the chromosome from which the locus shall be sampled (replace = TRUE).
2) Choose at random (with equal probability; see sample.int for details) the strand (plus or minus) from which the locus shall be sampled (replace = TRUE).
3) Choose at random (with equal probability; see sample.int for details) the starting position of the locus on the sampled chromosome and strand (replace = TRUE).
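
The three-step rule can be illustrated with a minimal base R sketch. This is not the function's actual implementation: the helper name sample_random_loci and the chr_lengths input (a named vector of chromosome lengths) are hypothetical, and restricting the start position so that the whole interval fits on the chromosome is an assumption. Strand-specific coordinate handling is not modeled.

```r
# Hypothetical helper illustrating the three-step sampling rule.
# `chr_lengths` is a named vector of chromosome lengths (an assumption;
# the documented function takes a fasta file path instead).
sample_random_loci <- function(size, interval_width, chr_lengths) {
  stopifnot(size >= 0, interval_width >= 1,
            all(chr_lengths >= interval_width))

  # Step 1: choose a chromosome for each locus with equal probability.
  chr_idx <- sample.int(length(chr_lengths), size, replace = TRUE)

  # Step 2: choose a strand for each locus with equal probability.
  strand <- c("plus", "minus")[sample.int(2, size, replace = TRUE)]

  # Step 3: choose a start position with equal probability
  # (assumed here to be limited so the interval fits on the chromosome).
  max_start <- unname(chr_lengths)[chr_idx] - interval_width + 1
  start <- vapply(max_start, function(m) sample.int(m, 1L), integer(1))

  data.frame(
    chromosome = names(chr_lengths)[chr_idx],
    strand = strand,
    start = start,
    end = start + interval_width - 1,
    stringsAsFactors = FALSE
  )
}

# Example with two made-up chromosome lengths.
set.seed(42)
loci <- sample_random_loci(size = 50, interval_width = 100,
                           chr_lengths = c(chr1 = 5000L, chr2 = 300L))
head(loci)
```

Because each step draws independently with replacement, the same chromosome, strand, or start position can occur more than once across the sampled loci.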

extract_random_seqs_from_genome(
  size,
  replace = TRUE,
  prob = NULL,
  interval_width,
  subject_genome,
  file_name = NULL,
  append = FALSE
)

Arguments

size: a non-negative integer giving the number of loci that shall be sampled.
replace: logical value indicating whether sampling should be with replacement. Default: replace = TRUE.
prob: a vector of probability weights for obtaining the elements of the vector being sampled. Default is prob = NULL.
interval_width: the length of the locus that shall be sampled.
subject_genome: file path to the fasta file storing the subject genome.
file_name: name of the output fasta file in which the sequences of the randomly sampled loci will be stored.
append: logical value indicating whether new random sequences shall be added to an existing file_name (append = TRUE) or whether an existing file_name shall be removed before the new random sequences are stored (append = FALSE; default).
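
A call using the arguments above might look like the following sketch. The file paths are hypothetical placeholders, and the assumption that the function is exported by the metablastr package is not stated on this page; the call is guarded so the snippet is skipped when the package or the genome file is unavailable.

```r
# Hypothetical paths; replace them with real files.
genome_path <- "genome.fa"
out_file <- "random_loci.fa"

# Assumption: the function is provided by the 'metablastr' package.
if (requireNamespace("metablastr", quietly = TRUE) && file.exists(genome_path)) {
  # Sample 10 random loci of length 1000 and write them to out_file,
  # replacing any existing file of that name (append = FALSE).
  metablastr::extract_random_seqs_from_genome(
    size = 10,
    interval_width = 1000,
    subject_genome = genome_path,
    file_name = out_file,
    append = FALSE
  )
}
```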

Author

Hajk-Georg Drost