Extraction

The package provides various molecular-count resampling models. These can be assembled into observation models that mimic experimental protocols for extracting and measuring materials from cells. A small set of such extraction schemes is predefined. All of them are Instant and directly change the system state, although this is not a requirement for extraction models in general. For an example, see examples/specification/extraction.schedule.json.

More realistic extraction schemes need to be defined by hand.

Predefined extraction schemes

GeneRegulatorySystems.Models.Extraction (Module)

Contains predefined Instant observation models to simulate extraction of -omics data from the system state.

These recipes assemble some of the primitives defined in Resampling into Schedules, additionally wrapping the corresponding specifications to suppress intermediate output.

GeneRegulatorySystems.Models.Extraction.simple_transcriptome (Function)
simple_transcriptome(specification::AbstractDict{Symbol})

Construct an Instant naive extraction model that independently samples transcripts (premrnas and mrnas) to achieve a specified target of the expected total count.

Specifically, this constructs a Schedule that just drops all non-transcript molecular species and then applies ResampleTargetMeanEachBinomial. Intermediate output is suppressed.

Specification

In JSON, simple_transcriptome is specified as a JSON object

{"{extract-transcriptome-simple}": {"target": <target>}}

where <target> is a JSON number specifying the expected total count to aim for.

The result will be equivalent to

{"step": [
    {"{filter}": "\\.(pre)?mrnas$"},
    {"{resample-target-mean-each-binomial}": <target>}
]}

with intermediate output suppressed.
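The two steps of simple_transcriptome can be illustrated with a minimal Python sketch. It is not the package's Julia implementation, and it rests on several assumptions. The state is a plain dict mapping hypothetical species names of the form "<gene>.<kind>" to counts. Binomial draws are done as Bernoulli sums for clarity. Guarding against a zero total and clamping p to 1 are choices made here, not documented behavior.

```python
import random
import re

# Hypothetical naming scheme "<gene>.<kind>"; mirrors the {filter} pattern above.
TRANSCRIPT_PATTERN = re.compile(r"\.(pre)?mrnas$")


def binomial(n, p, rng):
    """Draw from B(n, p) as n independent Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simple_transcriptome(state, target, rng):
    """Sketch: keep transcripts, then resample toward an expected total of `target`."""
    # Step 1 ({filter}): drop every species that is not a premrna or mrna.
    kept = {s: n for s, n in state.items() if TRANSCRIPT_PATTERN.search(s)}
    total = sum(kept.values())
    if total == 0:
        return kept  # assumption: nothing left to resample
    # Step 2: one shared retain probability; clamping to 1 is an assumption.
    p = min(1.0, target / total)
    return {s: binomial(n, p, rng) for s, n in kept.items()}


state = {"g1.premrnas": 40, "g1.mrnas": 160, "g1.proteins": 900}
print(simple_transcriptome(state, target=50, rng=random.Random(1)))
```

Because every transcript shares the same retain probability, the relative abundances are preserved only in expectation. Each individual draw fluctuates around them.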

GeneRegulatorySystems.Models.Extraction.amplified_transcriptome (Function)
amplified_transcriptome(specification::AbstractDict{Symbol})

Construct an Instant extraction model that simulates multiple rounds of amplification by PCR. The procedure is fairly simple:

  1. Drop all non-transcript molecular species.
  2. Retain each molecule independently with probability collect (applying ResampleEachBinomial).
  3. Repeat the following resampling procedure cycles times (applying ResampleEachAccumulate): for each molecule independently, either remove it with probability dropout, copy it with probability efficiency, or otherwise leave it as is.
  4. Retain each molecule independently with the same probability such that the expected total count is target (applying ResampleTargetMeanEachBinomial).

Intermediate output is suppressed.
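The four steps above can be traced in a minimal Python sketch. It is an illustration rather than the package's Julia code, and it makes the following assumptions. The state is a dict of hypothetical "<gene>.<kind>" species names to counts. Sampling is done with plain Bernoulli trials. The zero-total guard and the clamping of the final retain probability to 1 are choices made here.

```python
import random
import re

# Hypothetical naming scheme "<gene>.<kind>"; mirrors the {filter} pattern.
TRANSCRIPT_PATTERN = re.compile(r"\.(pre)?mrnas$")


def binomial(n, p, rng):
    """Draw from B(n, p) as n independent Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def amplify_cycle(n, dropout, efficiency, rng):
    """One cycle: each of n molecules independently becomes 0, 1 or 2 copies."""
    copies = 0
    for _ in range(n):
        u = rng.random()
        if u < dropout:
            continue                 # removed
        if u < dropout + efficiency:
            copies += 2              # copied
        else:
            copies += 1              # left as is
    return copies


def amplified_transcriptome(state, collect, cycles, efficiency, dropout, target, rng):
    # 1. Drop all non-transcript species.
    counts = {s: n for s, n in state.items() if TRANSCRIPT_PATTERN.search(s)}
    # 2. Retain each molecule independently with probability `collect`.
    counts = {s: binomial(n, collect, rng) for s, n in counts.items()}
    # 3. Apply `cycles` rounds of dropout/copy resampling.
    for _ in range(cycles):
        counts = {s: amplify_cycle(n, dropout, efficiency, rng) for s, n in counts.items()}
    # 4. Shared retain probability for an expected total of `target`
    #    (the zero-total guard and clamping to 1 are assumptions).
    total = sum(counts.values())
    if total == 0:
        return counts
    p = min(1.0, target / total)
    return {s: binomial(n, p, rng) for s, n in counts.items()}


state = {"g1.mrnas": 100, "g2.mrnas": 10, "g1.proteins": 500}
print(amplified_transcriptome(state, collect=0.5, cycles=5, efficiency=0.8,
                              dropout=0.05, target=200, rng=random.Random(3)))
```

Unlike the simple scheme, amplification compounds random fluctuations over the cycles. Low-count species can therefore end up over- or under-represented relative to their original proportions before the final downsampling.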

Specification

In JSON, amplified_transcriptome is specified as a JSON object

{"{extract-transcriptome-amplified}": {
    "collect": <collect>,
    "cycles": <cycles>,
    "efficiency": <efficiency>,
    "dropout": <dropout>,
    "target": <target>
}}

where <cycles> is a JSON (integer) number and <collect>, <efficiency>, <dropout> and <target> are JSON numbers specifying the extraction parameters as defined above.

The result will be equivalent to

{"step": [
    {"{filter}": "\\.(pre)?mrnas$"},
    {"{resample-each-binomial}": <collect>},
    {"each": {"length": <cycles>}, "step": {
        "{resample-each-accumulate}": [<dropout>, <...>, <efficiency>]
    }},
    {"{resample-target-mean-each-binomial}": <target>}
]}

with intermediate output suppressed, where <...> = 1.0 - <efficiency> - <dropout>.
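The middle entry <...> is fully determined by the other two. A short Python sketch (the helper names are hypothetical) builds the per-molecule copy-probability vector and computes the expected amplification before the final downsampling. Each cycle multiplies the expected count by 0·dropout + 1·<...> + 2·efficiency = 1 + efficiency − dropout. Because the cycles are independent, this factor compounds over cycles.

```python
def accumulate_ps(efficiency, dropout):
    """Build [<dropout>, <...>, <efficiency>] with <...> = 1 - efficiency - dropout."""
    stay = 1.0 - efficiency - dropout
    if stay < 0.0:
        raise ValueError("efficiency + dropout must not exceed 1")
    return [dropout, stay, efficiency]


def expected_growth(ps, cycles):
    """Expected copies per molecule after `cycles` independent rounds."""
    # 0-based index k = number of copies (ps[i] with 1-based i in the text).
    per_cycle = sum(k * p for k, p in enumerate(ps))  # = 1 + efficiency - dropout
    return per_cycle ** cycles


ps = accumulate_ps(efficiency=0.9, dropout=0.05)
print(ps, expected_growth(ps, cycles=10))
```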

GeneRegulatorySystems.Models.Extraction.simple_proteome (Function)
simple_proteome(specification::AbstractDict{Symbol})

Construct an Instant naive extraction model that independently samples proteins to achieve a specified target of the expected total count.

Specifically, this constructs a Schedule that just drops all non-protein molecular species and then applies ResampleTargetMeanEachBinomial. Intermediate output is suppressed.

Specification

In JSON, simple_proteome is specified as a JSON object

{"{extract-proteome-simple}": {"target": <target>}}

where <target> is a JSON number specifying the expected total count to aim for.

The result will be equivalent to

{"step": [
    {"{filter}": "\\.proteins$"},
    {"{resample-target-mean-each-binomial}": <target>}
]}

with intermediate output suppressed.


These models are implemented as Schedules, which normally call back trace to produce output for each simulation segment. Here, however, the extraction should be treated as a single unitary step, and the intermediate segments are of no interest. The specification is therefore wrapped to suppress that output.

Non-destructive extraction

Invoking an extraction scheme will directly modify the current state of the simulated system. This corresponds to the assumption that extraction from a real system (as in scRNA-seq) would physically destroy it. Extraction therefore typically ends a sampled trajectory.

To model non-destructive observation that does not affect the trajectory, a simulation schedule may be instructed to branch before the application of the observation. For example,

[
    <pre>,
    {"branch": true, "step": [
        <extraction>
    ]},
    <post>
]

will run the <pre> step(s), then branch off to apply an Instant <extraction> step, and then return to the stem and proceed with the <post> step(s). The branched model may itself be a Schedule; for example, it could continue regulation for a while before extracting. It is also possible to simulate multiple extractions from the same state, or to branch and extract at regular intervals until the simulation time budget is exhausted.

Combining multiple modalities

Similarly, it is possible to invoke multiple distinct extraction schemes on separate branches and to then merge the results back together. This is written as

[
    {"branch": true, "step": [
        <extraction1>,
        <extraction2>
    ]},
    {"{merge}": "+"}
]

and it can for example be used to simulate a naive multi-omics protocol; see also Models.Plumbing.Merge.

Counts resampling primitives

For reference, these are the resampling primitives that are used to construct extraction schemes:

GeneRegulatorySystems.Models.Resampling.ResampleEachAccumulate (Type)
ResampleEachAccumulate <: Model{FlatState}

Drop, retain or multiply each molecule independently with specified probabilities ps.

The species are treated as exchangeable. For each molecule, the number of copies it should be replaced by is sampled independently with probabilities given by the Vector ps, where each ps[i] defines the probability of resulting in i - 1 copies.

Specification

In JSON, ResampleEachAccumulate is specified as a JSON object

{"{resample-each-accumulate}": <ps>}

where <ps> is a JSON array of unit-range JSON numbers that sum to 1 and specify the per-molecule copy probabilities as defined above.
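The following is a minimal Python sketch of this per-molecule copy sampling, not the package's implementation. It assumes a dict state of species names to counts. Python indexing is 0-based, so ps[k] here is the probability of k copies, matching the text's 1-based ps[i] for i - 1 copies. It uses the standard library's random.choices for the categorical draw.

```python
import random


def resample_each_accumulate(state, ps, rng):
    """Replace each molecule by k copies with probability ps[k] (0-based k)."""
    if any(not 0.0 <= p <= 1.0 for p in ps) or abs(sum(ps) - 1.0) > 1e-9:
        raise ValueError("ps must be unit-range numbers summing to 1")
    copy_numbers = range(len(ps))
    # One categorical draw per molecule; all species use the same ps.
    return {
        s: sum(rng.choices(copy_numbers, weights=ps, k=n))
        for s, n in state.items()
    }


rng = random.Random(0)
print(resample_each_accumulate({"a.mrnas": 20, "b.mrnas": 5}, [0.1, 0.3, 0.6], rng))
```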

GeneRegulatorySystems.Models.Resampling.ResampleEachBinomial (Type)
ResampleEachBinomial <: Model{FlatState}

Retain each molecule independently with probability p, and drop the rest.

This replaces each species' count n with a value sampled from a binomial distribution with parameters n and p.

Specification

In JSON, ResampleEachBinomial is specified as a JSON object

{"{resample-each-binomial}": <p>}

where <p> is a unit-range JSON number specifying the per-molecule retain probability.
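As a minimal Python sketch of the same idea, assuming a dict state of species names to counts (this is not the package's code), each count is replaced by the number of its molecules that independently pass a retain test with probability p.

```python
import random


def resample_each_binomial(state, p, rng):
    """Replace each count n with a draw from B(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # B(n, p) as n independent retain/drop decisions, one per molecule.
    return {s: sum(1 for _ in range(n) if rng.random() < p) for s, n in state.items()}


print(resample_each_binomial({"a.mrnas": 30, "a.proteins": 200}, 0.1, random.Random(5)))
```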

GeneRegulatorySystems.Models.Resampling.ResampleHypergeometric (Type)
ResampleHypergeometric <: Model{FlatState}

Retain (sample without replacement) n molecules with equal probability, and drop the rest.

This replaces the per-species counts with values sampled from a multivariate hypergeometric distribution parametrized by n and the current counts. If n exceeds the total count, the per-species counts are left unchanged.

Specification

In JSON, ResampleHypergeometric is specified as a JSON object

{"{resample-hypergeometric}": <n>}

where <n> is a JSON number specifying the total count of molecules to retain.
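One way to see the multivariate hypergeometric draw is to expand the state into one labelled entry per molecule and sample n entries without replacement. The Python sketch below does exactly that, assuming a dict state and an integer n. Because it expands every molecule, it is only practical for small counts and is not the package's implementation. Unlike the binomial models, it hits the requested total exactly whenever n does not exceed the current total.

```python
import random
from collections import Counter


def resample_hypergeometric(state, n, rng):
    """Keep n molecules drawn without replacement; drop the rest."""
    total = sum(state.values())
    if n >= total:
        return dict(state)  # n exceeds (or equals) the total: counts unchanged
    # One entry per molecule, labelled by species, so every molecule is equally likely.
    population = [s for s, count in state.items() for _ in range(count)]
    drawn = Counter(rng.sample(population, n))
    return {s: drawn[s] for s in state}


print(resample_hypergeometric({"a.mrnas": 50, "b.mrnas": 10}, 12, random.Random(7)))
```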

GeneRegulatorySystems.Models.Resampling.ResampleMultinomial (Type)
ResampleMultinomial <: Model{FlatState}

Sample n molecules (with replacement) with equal probability.

This replaces the per-species counts with values sampled from a multinomial distribution parametrized by n and the current counts. If the per-species counts are all zero, they are left unchanged.

Specification

In JSON, ResampleMultinomial is specified as a JSON object

{"{resample-multinomial}": <n>}

where <n> is a JSON number specifying the total count of molecules to sample.
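Sampling with replacement means each of the n draws picks a species with probability proportional to its current count. A minimal Python sketch using the standard library's random.choices follows, assuming a dict state of species names to counts; it is not the package's code. In contrast to ResampleHypergeometric, n may exceed the current total, and a species can be drawn more often than it has molecules.

```python
import random
from collections import Counter


def resample_multinomial(state, n, rng):
    """Draw n molecules with replacement, proportionally to the current counts."""
    species = list(state)
    weights = [state[s] for s in species]
    if sum(weights) == 0:
        return dict(state)  # all counts zero: left unchanged
    # n independent draws, each choosing a species in proportion to its count.
    drawn = Counter(rng.choices(species, weights=weights, k=n))
    return {s: drawn[s] for s in species}


print(resample_multinomial({"a.mrnas": 1, "b.mrnas": 3}, 10, random.Random(8)))
```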

GeneRegulatorySystems.Models.Resampling.ResampleTargetMeanEachBinomial (Type)
ResampleTargetMeanEachBinomial <: Model{FlatState}

Retain each molecule independently with the same probability such that the resulting expected total count is target.

This replaces each species $i$'s count $n_i$ with a value sampled from a binomial distribution $B(n_i, p)$ with $p = \texttt{target} / \sum_j{n_j}$.

Specification

In JSON, ResampleTargetMeanEachBinomial is specified as a JSON object

{"{resample-target-mean-each-binomial}": <target>}

where <target> is a JSON number specifying the target expected total molecule count.
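Why this choice of $p$ achieves the target can be checked directly. Assume $\texttt{target}$ does not exceed the current total $N = \sum_j n_j$, so that $p$ is a valid probability; how larger targets are handled is not stated here. Linearity of expectation then gives the target exactly, and independence of the per-molecule draws gives the spread around it:

$$
X_i \sim B(n_i, p), \qquad p = \frac{\texttt{target}}{N}, \qquad N = \sum_j n_j
$$

$$
\mathbb{E}\Big[\sum_i X_i\Big] = \sum_i n_i p = \texttt{target},
\qquad
\operatorname{Var}\Big[\sum_i X_i\Big] = \sum_i n_i p (1 - p) = \texttt{target}\Big(1 - \frac{\texttt{target}}{N}\Big).
$$

The resulting total is therefore hit only on average. Its standard deviation is at most $\sqrt{\texttt{target}}$.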
