This function randomly samples a user-specified number of sequences of a given length from the genome. Each locus is sampled independently according to the following rule:

  • 1) choose randomly (equal probability: see sample.int for details) from which of the given chromosomes the locus shall be sampled (replace = TRUE).

  • 2) choose randomly (equal probability: see sample.int for details) from which strand (plus or minus) the locus shall be sampled (replace = TRUE).

  • 3) choose randomly (equal probability: see sample.int for details) the starting position of the locus on the sampled chromosome and strand (replace = TRUE).
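
The three steps above can be sketched in base R as follows. This is only an illustration of the sampling rule, not the package's implementation: the helper sample_random_loci() and the chr_lengths vector are hypothetical, and the sketch assumes that a start position is drawn so that the whole locus fits inside the chosen chromosome.

```r
# Hypothetical chromosome lengths (assumption; not read from a real genome).
chr_lengths <- c(chr1 = 1000L, chr2 = 500L, chr3 = 250L)

# Hypothetical helper sketching the three-step sampling rule.
sample_random_loci <- function(size, interval_width, chr_lengths) {
  # Assumption: every chromosome is at least as long as the locus.
  stopifnot(all(chr_lengths >= interval_width))

  # 1) chromosome: equal probability, with replacement
  chr_idx <- sample.int(length(chr_lengths), size, replace = TRUE)

  # 2) strand: equal probability, with replacement
  strand <- c("plus", "minus")[sample.int(2L, size, replace = TRUE)]

  # 3) start position: uniform over all starts that keep the locus
  #    inside the sampled chromosome (assumption of this sketch)
  max_start <- unname(chr_lengths[chr_idx]) - interval_width + 1L
  start <- vapply(max_start, function(m) sample.int(m, 1L), integer(1))

  data.frame(
    chr    = names(chr_lengths)[chr_idx],
    strand = strand,
    start  = start,
    end    = start + interval_width - 1L,
    stringsAsFactors = FALSE
  )
}

set.seed(1)
loci <- sample_random_loci(size = 5L, interval_width = 100L,
                           chr_lengths = chr_lengths)
```

Each row of loci describes one sampled locus. Because every step samples with replacement, the same chromosome, strand, or start position can occur more than once.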

extract_random_seqs_from_genome(
  size,
  replace = TRUE,
  prob = NULL,
  interval_width,
  subject_genome,
  file_name = NULL,
  append = FALSE
)

Arguments

size

a non-negative integer giving the number of loci that shall be sampled.

replace

logical value indicating whether sampling should be with replacement. Default: replace = TRUE.

prob

a vector of probability weights for obtaining the elements of the vector being sampled. Default is prob = NULL.

interval_width

the length of the locus that shall be sampled.

subject_genome

file path to the fasta file storing the subject genome.

file_name

the name of the output fasta file in which the sequences of the randomly sampled loci will be stored.

append

logical value indicating whether new random sequences shall be appended to an existing file_name (append = TRUE) or whether an existing file_name shall be removed before the new random sequences are stored (append = FALSE; default).
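
The difference between the two append settings can be illustrated with a small base R sketch. The helper write_seqs() is hypothetical and only mimics the behaviour described for append; it is not how the package writes its output.

```r
# Hypothetical helper that mimics the described append semantics.
write_seqs <- function(seqs, file_name, append = FALSE) {
  # append = FALSE: remove an existing file before storing new sequences
  if (!append && file.exists(file_name)) file.remove(file_name)

  # Write each sequence as a two-line fasta record (header, sequence).
  records <- paste0(">", names(seqs), "\n", seqs, "\n")
  cat(records, file = file_name, sep = "", append = TRUE)
  invisible(file_name)
}

out <- tempfile(fileext = ".fa")

write_seqs(c(locus_1 = "ACGT"), out)                   # creates the file
write_seqs(c(locus_2 = "TTGA"), out, append = TRUE)    # adds a second record
n_appended <- length(readLines(out))

write_seqs(c(locus_3 = "GGCC"), out, append = FALSE)   # replaces the file
n_replaced <- length(readLines(out))
```

After the appending call the file holds both records (four lines); after the final call with append = FALSE only the newest record remains (two lines).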

Author

Hajk-Georg Drost