Extraction
The package provides several resampling models for molecular counts that can be assembled into observation models mimicking experimental protocols for extracting and measuring material from cells. A small set of such extraction schemes is predefined. Although this is not required, the predefined schemes are Instant and directly change the system state. For an example, see examples/specification/extraction.schedule.json.
More realistic extraction schemes need to be defined by hand.
Predefined extraction schemes
GeneRegulatorySystems.Models.Extraction — Module
Contains predefined Instant observation models to simulate extraction of -omics data from the system state.
These recipes assemble some of the primitives defined in Resampling into Schedules, additionally wrapping the corresponding specifications to suppress intermediate output.
GeneRegulatorySystems.Models.Extraction.simple_transcriptome — Function
simple_transcriptome(specification::AbstractDict{Symbol})
Construct an Instant naive extraction model that independently samples transcripts (premrnas and mrnas) to achieve a specified target of the expected total count.
Specifically, this constructs a Schedule that just drops all non-transcript molecular species and then applies ResampleTargetMeanEachBinomial. Intermediate output is suppressed.
Specification
In JSON, simple_transcriptome is specified as a JSON object
{"{extract-transcriptome-simple}": {"target": <target>}}
where <target> is a JSON number specifying the expected total count to aim for.
The result will be equivalent to
{"step": [
{"{filter}": "\\.(pre)?mrnas$"},
{"{resample-target-mean-each-binomial}": <target>}
]}
with intermediate output suppressed.
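The two steps above can be illustrated outside the package. The following is a minimal Python sketch, not the package's implementation: the function name, the species names in the example state, and the capping of the retain probability at 1 are assumptions made for illustration.

```python
import random
import re


def simple_transcriptome(counts, target, rng):
    """Keep transcript species only, then thin all molecules with one shared p."""
    # Step 1: drop every species whose name does not match the transcript filter.
    pattern = re.compile(r"\.(pre)?mrnas$")
    kept = {name: n for name, n in counts.items() if pattern.search(name)}
    total = sum(kept.values())
    if total == 0:
        return kept
    # Step 2: one retain probability for all molecules, chosen so that the
    # expected total is `target`. Capping at 1.0 is an assumption of this sketch.
    p = min(1.0, target / total)
    # A Binomial(n, p) draw, written as a sum of n Bernoulli trials.
    return {name: sum(rng.random() < p for _ in range(n)) for name, n in kept.items()}


rng = random.Random(0)
# Hypothetical species names; real names depend on the simulated system.
state = {"1.premrnas": 40, "1.mrnas": 160, "1.proteins": 900, "2.mrnas": 200}
extracted = simple_transcriptome(state, target=100, rng=rng)
```

In this example the protein species is removed by the filter, and the remaining 400 transcript molecules are each kept with probability 100 / 400 = 0.25.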
GeneRegulatorySystems.Models.Extraction.amplified_transcriptome — Function
amplified_transcriptome(specification::AbstractDict{Symbol})
Construct an Instant extraction model that simulates multiple rounds of amplification by PCR. The procedure is as follows:
- Drop all non-transcript molecular species.
- Retain each molecule independently with probability collect (applying ResampleEachBinomial).
- Repeat the following resampling procedure (ResampleEachAccumulate) cycles many times: for each molecule independently, either remove it with probability dropout, or copy it with probability efficiency, or otherwise leave it as is.
- Retain each molecule independently with the same probability such that the expected total count is target (applying ResampleTargetMeanEachBinomial).
Intermediate output is suppressed.
Specification
In JSON, amplified_transcriptome is specified as a JSON object
{"{extract-transcriptome-amplified}": {
"collect": <collect>,
"cycles": <cycles>,
"efficiency": <efficiency>,
"dropout": <dropout>,
"target": <target>
}}
where <cycles> is a JSON (integer) number and <collect>, <efficiency>, <dropout> and <target> are JSON numbers specifying the extraction parameters as defined above.
The result will be equivalent to
{"step": [
{"{filter}": "\\.(pre)?mrnas$"},
{"{resample-each-binomial}": <collect>},
{"each": {"length": <cycles>}, "step": {
"{resample-each-accumulate}": [<dropout>, <...>, <efficiency>]
}},
{"{resample-target-mean-each-binomial}": <target>}
]}
with intermediate output suppressed, where <...> = 1.0 - <efficiency> - <dropout>.
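As a rough illustration of the pipeline described above, here is a Python sketch using only the standard library. It is not the package's code: the helper names (thin, amplify_once), the example species names, and the capping of the final retain probability at 1 are assumptions.

```python
import random
import re

TRANSCRIPTS = re.compile(r"\.(pre)?mrnas$")  # same pattern as the {filter} step


def thin(counts, p, rng):
    """Binomial thinning: keep each molecule independently with probability p."""
    return {k: sum(rng.random() < p for _ in range(n)) for k, n in counts.items()}


def amplify_once(counts, efficiency, dropout, rng):
    """One cycle: each molecule is replaced by 0, 1 or 2 copies."""
    weights = [dropout, 1.0 - efficiency - dropout, efficiency]
    return {k: sum(rng.choices([0, 1, 2], weights=weights, k=n)) for k, n in counts.items()}


def amplified_transcriptome(counts, collect, cycles, efficiency, dropout, target, rng):
    # Drop non-transcript species, then collect a fraction of the molecules.
    state = {k: n for k, n in counts.items() if TRANSCRIPTS.search(k)}
    state = thin(state, collect, rng)
    # Amplify for the requested number of cycles.
    for _ in range(cycles):
        state = amplify_once(state, efficiency, dropout, rng)
    # Thin to the target expected total; capping p at 1.0 is an assumption here.
    total = sum(state.values())
    p = min(1.0, target / total) if total else 0.0
    return thin(state, p, rng)


rng = random.Random(0)
# Hypothetical species names and parameter values, chosen for illustration only.
state = {"1.premrnas": 40, "1.mrnas": 160, "1.proteins": 900}
reads = amplified_transcriptome(
    state, collect=0.5, cycles=5, efficiency=0.8, dropout=0.05, target=500, rng=rng
)
```

In each cycle a molecule is replaced by 1 + efficiency - dropout copies on average, so the expected total grows by that factor per cycle before the final thinning step.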
GeneRegulatorySystems.Models.Extraction.simple_proteome — Function
simple_proteome(specification::AbstractDict{Symbol})
Construct an Instant naive extraction model that independently samples proteins to achieve a specified target of the expected total count.
Specifically, this constructs a Schedule that just drops all non-protein molecular species and then applies ResampleTargetMeanEachBinomial. Intermediate output is suppressed.
Specification
In JSON, simple_proteome is specified as a JSON object
{"{extract-proteome-simple}": {"target": <target>}}
where <target> is a JSON number specifying the expected total count to aim for.
The result will be equivalent to
{"step": [
{"{filter}": "\\.proteins$"},
{"{resample-target-mean-each-binomial}": <target>}
]}
with intermediate output suppressed.
These models are implemented as Schedules, which normally call back trace to produce output for each simulation segment. Here, however, the extraction should behave as a single step and its intermediate steps are of no interest, so the specification is wrapped to suppress that output:
GeneRegulatorySystems.Models.Extraction.with_intermediate_output_suppressed — Function
with_intermediate_output_suppressed(specifications...)
Transform an extraction specification such that the corresponding Schedule will only emit output once at the end.
This is used by the extraction schemes to behave more like a unit (instead of the composite Schedule they actually are).
Non-destructive extraction
Invoking an extraction scheme will directly modify the current state of the simulated system. This corresponds to the assumption that extraction from a real system (like in scRNAseq) would physically destroy it. Extraction therefore typically ends a sampled trajectory.
To model non-destructive observation that does not affect the trajectory, a simulation schedule may be instructed to branch before the application of the observation. For example,
[
<pre>,
{"branch": true, "step": [
<extraction>
]},
<post>
]
will run the <pre> step(s), then branch off to apply an instant <extraction> step, and then return to the stem and proceed with the <post> step(s). The branched model may itself be a Schedule and thus, for example, continue regulation for a while before the extraction. It is also possible to simulate multiple extractions from the same state, or to regularly branch-and-extract until the simulation time budget is exhausted.
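The idea behind branching can be pictured as running the destructive extraction on a copy of the state while the stem keeps the original. The Python sketch below is purely conceptual and does not reflect how the package's scheduler is implemented; run_with_branch and the toy step functions are hypothetical names.

```python
import copy


def run_with_branch(state, pre, extraction, post):
    """Apply pre, observe on a copy of the state, then continue the stem with post."""
    state = pre(state)
    # Branch: the extraction may modify its input freely, so it gets a deep copy.
    observed = extraction(copy.deepcopy(state))
    # Stem: continues from the state as it was before the extraction.
    state = post(state)
    return state, observed


# Toy steps standing in for <pre>, <extraction> and <post>.
def pre(s):
    s["1.mrnas"] += 5
    return s


def extraction(s):
    s["1.mrnas"] //= 2  # "destroys" half of the molecules
    return s


def post(s):
    s["1.mrnas"] += 1
    return s


stem, observed = run_with_branch({"1.mrnas": 10}, pre, extraction, post)
```

The stem ends with 16 molecules (10 + 5 + 1), unaffected by the extraction, while the observation sees 7 (15 halved, rounded down).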
Combining multiple modalities
Similarly, it is possible to invoke multiple distinct extraction schemes on separate branches and to then merge the results back together. This is written as
[
{"branch": true, "step": [
<extraction1>,
<extraction2>
]},
{"{merge}": "+"}
]
and it can, for example, be used to simulate a naive multi-omics protocol; see also Models.Plumbing.Merge.
Counts resampling primitives
For reference, these are the resampling primitives that are used to construct extraction schemes:
GeneRegulatorySystems.Models.Resampling — Module
Contains Instant models to be used in extraction schemes (observation models).
GeneRegulatorySystems.Models.Resampling.ResampleEachAccumulate — Type
ResampleEachAccumulate <: Model{FlatState}
Drop, retain or multiply each molecule independently with specified probabilities ps.
The species are treated as exchangeable. For each molecule, the number of copies it should be replaced by is sampled independently with probabilities given by the Vector ps, where each ps[i] defines the probability of resulting in i - 1 copies.
Specification
In JSON, ResampleEachAccumulate is specified as a JSON object
{"{resample-each-accumulate}": <ps>}
where <ps> is a JSON array of unit-range JSON numbers that sum to 1 and specify the per-molecule copy probabilities as defined above.
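The sampling rule can be sketched in Python as follows. This is an illustration of the semantics described above, not the package's implementation. Because Python lists are 0-based, ps[j] here is the probability of j copies, which corresponds to ps[i] with i = j + 1 in the description above.

```python
import random


def resample_each_accumulate(counts, ps, rng):
    """Replace each molecule by j copies, where j is drawn with probability ps[j]."""
    outcomes = list(range(len(ps)))  # possible copy numbers 0, 1, 2, ...
    return {
        name: sum(rng.choices(outcomes, weights=ps, k=n))
        for name, n in counts.items()
    }


rng = random.Random(0)
counts = {"1.mrnas": 50, "2.mrnas": 20}  # hypothetical species names
# 10 % of molecules are lost, 30 % stay single, 60 % are duplicated.
amplified = resample_each_accumulate(counts, [0.1, 0.3, 0.6], rng)
```

With ps = [0.1, 0.3, 0.6], each molecule yields 0.3 + 2 × 0.6 = 1.5 copies on average.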
GeneRegulatorySystems.Models.Resampling.ResampleEachBinomial — Type
ResampleEachBinomial <: Model{FlatState}
Retain each molecule independently with probability p, and drop the rest.
This replaces each species' count n with a value sampled from a binomial distribution with parameters n and p.
Specification
In JSON, ResampleEachBinomial is specified as a JSON object
{"{resample-each-binomial}": <p>}
where <p> is a unit-range JSON number specifying the per-molecule retain probability.
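A minimal Python sketch of this thinning step, using only the standard library and written for illustration rather than taken from the package:

```python
import random


def resample_each_binomial(counts, p, rng):
    """Keep each molecule independently with probability p."""
    # Summing n Bernoulli(p) trials gives one Binomial(n, p) draw per species.
    return {name: sum(rng.random() < p for _ in range(n)) for name, n in counts.items()}


rng = random.Random(0)
counts = {"1.mrnas": 100, "1.proteins": 1000}  # hypothetical species names
kept = resample_each_binomial(counts, 0.2, rng)
```

Unlike the hypergeometric variant, the total number of retained molecules is itself random; only its expectation (here 0.2 times the original total) is fixed.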
GeneRegulatorySystems.Models.Resampling.ResampleHypergeometric — Type
ResampleHypergeometric <: Model{FlatState}
Retain (sample without replacement) n molecules with equal probability, and drop the rest.
This replaces the per-species counts with values sampled from a multivariate hypergeometric distribution parametrized by n and the current counts. If n exceeds the total count, the per-species counts are left unchanged.
Specification
In JSON, ResampleHypergeometric is specified as a JSON object
{"{resample-hypergeometric}": <n>}
where <n> is a JSON number specifying the total count of molecules to retain.
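Sampling without replacement can be sketched by expanding the counts into one label per molecule and drawing n of them. This Python sketch is an illustration only; a real implementation would more likely sample the multivariate hypergeometric distribution directly instead of materializing every molecule.

```python
import random


def resample_hypergeometric(counts, n, rng):
    """Retain n molecules drawn uniformly without replacement."""
    total = sum(counts.values())
    if n >= total:
        # Nothing to drop: the counts are left unchanged.
        return dict(counts)
    # One list entry per molecule, labelled with its species.
    population = [name for name, c in counts.items() for _ in range(c)]
    result = {name: 0 for name in counts}
    for name in rng.sample(population, n):
        result[name] += 1
    return result


rng = random.Random(0)
counts = {"1.mrnas": 30, "2.mrnas": 70}  # hypothetical species names
kept = resample_hypergeometric(counts, 25, rng)
```

Here the retained total is exactly n, and no species can end up with more molecules than it started with.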
GeneRegulatorySystems.Models.Resampling.ResampleMultinomial — Type
ResampleMultinomial <: Model{FlatState}
Sample n molecules (with replacement) with equal probability.
This replaces the per-species counts with values sampled from a multinomial distribution parametrized by n and the current counts. If the per-species counts are all zero, they are left unchanged.
Specification
In JSON, ResampleMultinomial is specified as a JSON object
{"{resample-multinomial}": <n>}
where <n> is a JSON number specifying the total count of molecules to sample.
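Sampling with replacement amounts to drawing n species labels with probabilities proportional to the current counts. The Python sketch below illustrates this with the standard library; it is not the package's implementation.

```python
import random


def resample_multinomial(counts, n, rng):
    """Draw n molecules with replacement, weighted by the current counts."""
    names = list(counts)
    weights = [counts[name] for name in names]
    if sum(weights) == 0:
        # No molecules to sample from: the counts are left unchanged.
        return dict(counts)
    result = {name: 0 for name in names}
    for name in rng.choices(names, weights=weights, k=n):
        result[name] += 1
    return result


rng = random.Random(0)
counts = {"1.mrnas": 3, "2.mrnas": 7, "3.mrnas": 0}  # hypothetical species names
sampled = resample_multinomial(counts, 50, rng)
```

Because molecules are drawn with replacement, the resulting total is exactly n even when n exceeds the original total, as in this example. A species with a count of zero always stays at zero.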
GeneRegulatorySystems.Models.Resampling.ResampleTargetMeanEachBinomial — Type
ResampleTargetMeanEachBinomial <: Model{FlatState}
Retain each molecule independently with the same probability such that the resulting expected total count is target.
This replaces each species $i$'s count $n_i$ with a value sampled from a binomial distribution $B(n_i, p)$ with $p = \texttt{target} / \sum_j n_j$.
Specification
In JSON, ResampleTargetMeanEachBinomial is specified as a JSON object
{"{resample-target-mean-each-binomial}": <target>}
where <target> is a JSON number specifying the target expected total molecule count.
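That this choice of $p$ yields the intended expected total follows from linearity of expectation. Writing $X_i \sim B(n_i, p)$ for the resampled count of species $i$ (the symbol $X_i$ is introduced here only for this check):

```latex
\mathbb{E}\Big[\sum_i X_i\Big]
  = \sum_i n_i \, p
  = \frac{\texttt{target}}{\sum_j n_j} \sum_i n_i
  = \texttt{target}.
```

For example, with counts $n_1 = 300$ and $n_2 = 700$ and a target of 100, we get $p = 0.1$ and expected resampled counts of 30 and 70. If target exceeds the current total, the formula gives $p > 1$, which is not a valid probability. How that case is handled is not stated in this section.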