Scheduling

The scheduling system forms the core of this package: it ties the included regulation and instant adjustment models together into single, reproducible experiment definitions. Besides the gene regulatory systems to be simulated, such a definition may include initial setup and seed control, simulated -omics extraction protocols, interventions (optionally repeating at regular intervals), and simulation branching into independent samples that share parts of their history.

The primary entity in this module is the Schedule, which is a type of Model that advances the simulation by organizing simulation segments that proceed by invoking other Models. Since the orchestrated Models may themselves be Schedules, it is possible to assemble complex sequences of simulation segments and conceptually integrate the accumulated state changes into single unified trajectories.

Each Schedule has a Specification that represents the exact instructions for how to advance. The Specification also has an alternative representation as a JSON document from which it can be conveniently constructed.

Details are given below, but broadly, the scheduling system supports

  • construction of Schedules and other Models from their JSON representation,
  • a simple templating mechanism allowing the definition of named bindings and their insertion into specifications before model construction,
  • iteration both over explicit enumerations of specifications and over values to consecutively insert into a nested specification template,
  • repetition of (finite-time) nested Schedules,
  • automatic conversion of the simulation state between the representations required by the consecutive segments' Models,
  • random seed control,
  • hooks for output control and progress reporting, and
  • trajectory branching.

To support this multitude of requirements, Specifications are assembled from reusable building blocks (which are themselves Specifications), effectively defining a domain-specific programming language. Conceptually, the JSON representation is the source code of the scheduling language, its Dict/Array form after parsing with JSON.parse is an intermediate representation, its Specification form is an abstract syntax tree, and a Schedule containing it is an interpreter that is invoked whenever the simulation state should be advanced. The syntax of the scheduling language is therefore defined by the allowed combinations of Specifications, and the associated semantics are defined in terms of which simulation segments are produced and executed.

The experiment command line tool wraps all of this functionality by loading a collection of JSON specifications, interpreting them and collecting the results in a structured format, along with an index of simulation segments and the specifications used to produce the data. For reproducible simulation experiments, this is the recommended way to use the package.

Each simulation segment can be addressed by a path relative to the Schedule that produced it, and the index includes each segment's path. Since the Schedule can be recreated from its Specification (and the associated bindings) and the sequence of segments it produces is deterministic, the scheduling system's iteration state can be fully reified for any simulation segment, including simulation context such as the exact Model in effect. This can be used to obtain that model for export or analysis without re-running the full experiment, and also to better understand or debug the scheduling subsystem.

Schedule specification

A Schedule's simulation plan is represented by its specification, which is effectively a tree of syntactic elements (all <: Specification) of the scheduling mini-language, with Scope, List and Each being inner nodes and Template, Load and Slice terminal nodes.

Besides its specification, each Schedule keeps track of a map of named value bindings. Templates (and by extension, indirectly also other Specifications enclosing them) may include references to such named values that are to be inserted when stepping through the Schedule; these values may be defined by an enclosing Scope or Each, or they may be free and must then be injected into the Schedule's bindings on construction. Schedules therefore close over their specification, which in that process will ensure that no references are left dangling.

The following are the potential elements of a Specification:

GeneRegulatorySystems.Specifications.Template (Type)
Template <: Specification

Contains instructions for instantiating ("expanding") a value from a definition.

When stepping through a Schedule, expanding Templates produces all non-Specification values that influence the schedule's behavior, which includes all primitive Models.

The definition may contain references to named bindings. When expanded, these references are first replaced by their values, and then the function held in the Template's constructor field is called on the result. References come in two forms:

  • Any Dict x that contains a :$ key will be replaced by the object addressed by x[:$]. If that value is a String, it refers to the binding of that same name. If it is a Vector, it refers to a nested object addressed by its items interpreted as path components; each item will in turn descend by accessing the respective key, index or property. If the found value x′ is also a Dict and there are other mappings in x, they are merged into (a shallow copy of) x′, overriding previously existing mappings.
  • Any String s that contains substrings of the form "${binding}" will be replaced by a String with that reference substituted by the respective binding's repr for Strings and Numbers, or the literal "__omitted__" otherwise.

Substitution will in general not return independent objects but rather alias intermediate Dicts, Vectors and other objects into the substituted objects if they contain no substitutions of their own. In other words, the produced data structures are treated as persistent (and therefore immutable) during expansion.
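
The reference rules for Dicts can be illustrated with a minimal sketch in plain Julia. It is not the package's implementation: resolve, descend and REF are hypothetical names, vector indices are assumed to be 1-based, overriding mappings are merged in without being resolved themselves, and the sketch rebuilds containers instead of aliasing unchanged ones. Interpolation of "${binding}" substrings is left out.

```julia
# Hypothetical sketch, not the package's code: resolve `:$` references in nested data.
const REF = Symbol('$')

# One step of descent: key for dictionaries, index for vectors, property otherwise.
descend(obj::AbstractDict, key) = obj[Symbol(key)]
descend(obj::AbstractVector, index::Integer) = obj[index]  # assumes 1-based indices
descend(obj, name) = getproperty(obj, Symbol(name))

function resolve(x, bindings)
    if x isa AbstractDict && haskey(x, REF)
        target = x[REF]
        # A String names a binding; a Vector is a path of components.
        path = target isa AbstractString ? [target] : target
        found = foldl(descend, path; init = bindings)
        overrides = filter(kv -> first(kv) != REF, x)
        if found isa AbstractDict && !isempty(overrides)
            return merge(found, overrides)  # new Dict; `found` is not mutated
        end
        return found
    elseif x isa AbstractDict
        return Dict{Symbol, Any}(k => resolve(v, bindings) for (k, v) in x)
    elseif x isa AbstractVector
        return Any[resolve(v, bindings) for v in x]
    else
        return x
    end
end

bindings = Dict{Symbol, Any}(
    :rate => 0.5,
    :model => Dict{Symbol, Any}(:kind => "a", :genes => Any[10, 20]),
)
by_name = resolve(Dict{Symbol, Any}(REF => "rate"), bindings)
by_path = resolve(Dict{Symbol, Any}(REF => Any["model", "genes", 2]), bindings)
merged = resolve(Dict{Symbol, Any}(REF => "model", :kind => "b"), bindings)
```

Here by_name is looked up directly, by_path descends through a Dict and a Vector, and merged is a shallow copy of the bound Dict with :kind overridden while the original binding stays unchanged.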

GeneRegulatorySystems.Specifications.Slice (Type)
Slice <: Specification

Empty singleton that represents an infinitesimal-time step in the simulation.

It acts as a sentinel element in the scheduling language and roughly has the role of Nothing.

GeneRegulatorySystems.Specifications.Sequence (Type)
Sequence <: Specification

Abstract supertype of specifications that can be iterated.

The meaning of the specified Sequence (to be interpreted when executing a schedule) depends on whether a directly enclosing Scope has its branch flag set.

GeneRegulatorySystems.Specifications.Each (Type)
Each <: Sequence

Represents a sequence of specifications defined implicitly by setting a named binding, in turn, to each item from an ordered collection of items and evaluating a nested Specification step in the resulting context.

The iteration variable, defined by as, does not necessarily need to be named or used; if it is not, the Each effectively represents a repetition of the same (nested) specification.
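
As a rough illustration in plain Julia, an Each can be thought of as unfolding into one binding context per item. The helper each_contexts and the binding names :seed and :dose are hypothetical. The items are produced here by Julia's range with splatted keyword arguments, in the spirit of the range literal of the JSON representation; all-keyword range calls require a reasonably recent Julia version.

```julia
# Hypothetical helper: one binding context per item, with `name` bound to the item.
function each_contexts(bindings, name::Symbol, items)
    [merge(bindings, Dict{Symbol, Any}(name => item)) for item in items]
end

# Items from keyword arguments, as a range literal would splat them.
items = range(; Dict{Symbol, Any}(:start => 0.0, :stop => 1.0, :length => 3)...)

# Each context keeps the outer bindings and adds the iteration variable.
contexts = each_contexts(Dict{Symbol, Any}(:seed => 1), :dose, items)
```

If the nested step never references the iteration variable, all contexts behave identically, which is why such an Each amounts to a repetition.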

GeneRegulatorySystems.Specifications.Scope (Type)
Scope <: Specification

Contains named value definitions (mostly Templates) that apply to a nested context specified by step (which is a Specification).

In that sense it is equivalent to a lexical scope in any programming language, but it can additionally be thought of conceptually as applying to a range on the time axis during simulation, either filling its parent range or, if definitions[:to] is set, limited to that duration.

Template values in definitions may contain references to bindings from a surrounding scope (Scope or Each, or by inclusion on Schedule construction).

The definitions shadow bindings of the same name from a surrounding scope. If the barrier flag is set, any bindings from a surrounding scope will not be available in the nested step; only new bindings from definitions will be included (with some exceptions, see Schedule{Scope}).

If the branch flag is set, the step must be a Sequence specification and should then be interpreted as specifying independent simulation branches instead of the default behavior of acting on the same simulation state one after the other.


JSON representation

While Schedules and their specifications can be built by hand, users will typically construct them using the Specification function from their alternative representation as JSON documents:

GeneRegulatorySystems.Specifications.Specification (Type)

Abstract supertype of all syntactic elements of the scheduling language.

Construction

Specification(x; bound::Set{Symbol} = Set{Symbol}(), as::Symbol = :step)

Construct a Specification from nested Dict/Vector objects such as they are obtained by loading JSON via JSON.parse(..., dicttype = Dict{Symbol, Any}).

Specification recursively interprets a JSON document (or part thereof) and returns a Specification subtype depending on the type and shape of the JSON. It will be interpreted as either a :step, a :value or :items; the top-level document will be interpreted as a :step. Specifically:

  • When expecting a :step:
    • If x is a Vector (JSON Array), it will be parsed as a List of :step Specifications.
    • If x is a Dict (JSON Object), the earliest matching rule of the following applies:
      1. (reference literal) If x contains a :$ key (JSON name "$"), it is parsed as a Template expanding to the binding referenced by the corresponding value.
      2. (load literal) If x contains a :< key (JSON name "<"), it is parsed as a Load of the file referenced by the corresponding value. This Load will be wrapped in a Scope that has barrier set and collects all the other mappings in x as definitions, interpreting the mapped values as :value Specifications.
      3. (template literal) If x contains a single mapping, and that mapping's key is enclosed in braces (JSON names "{...}"), it is parsed as a Template that is expanded by transforming the substituted value using a function returned by passing the key (without the braces) to Specifications.constructor.
      4. (each) If x contains an :each key (JSON name "each"), it is parsed as an Each. The iterable items are defined by x[:each] interpreted as an :items Specification; the step is defined by x[:step] interpreted as a :step Specification, and the index variable name is optionally defined by x[:as]. If there are any other mappings in x, they will be collected and the Each wrapped in a Scope using these definitions. In other words, the corresponding definitions are available in the items and step definitions, but cannot refer to the index variable.
      5. (scope) If x is not empty, it is parsed as a Scope, with the step defined by x[:step] interpreted as a :step Specification (defaulting to Slice()) and branch optionally set by x[:branch]. All the other mappings in x are collected as definitions, interpreting the mapped values as :value Specifications.
      6. (slice) Otherwise x is empty and is parsed as Slice().
    • Otherwise, x is parsed as a Template expanding to x.
  • When expecting :items:
    • If x is a Dict (JSON Object), the earliest matching rule of the following applies:
      1. (reference literal) as above in the :step case
      2. (template literal) as above in the :step case
      3. (range literal) Otherwise x is parsed as a Template that is expanded by calling Julia's range function, splatting the substituted value as keyword arguments.
    • Otherwise, x is parsed as a Template expanding to x.
  • When expecting a :value:
    • If x is a Dict (JSON Object), the earliest matching rule of the following applies:
      1. (reference literal) as above in the :step case
      2. (load literal) as above in the :step case
      3. (template literal) as above in the :step case
      4. Otherwise, x is parsed as a Template expanding to x.
    • Otherwise, x is parsed as a Template expanding to x.
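
The rule order for the :step case can be summarized as a small classifier. This is only a sketch in plain Julia: it returns a label for the first matching rule rather than building Specifications, and step_rule, is_braced, REF and LOAD are hypothetical names that are not part of the package.

```julia
# Hypothetical sketch: which :step parsing rule would match a parsed JSON value?
const REF = Symbol('$')
const LOAD = Symbol('<')

# True for keys of the form "{...}".
function is_braced(key::Symbol)
    s = String(key)
    length(s) >= 2 && startswith(s, "{") && endswith(s, "}")
end

function step_rule(x)
    x isa AbstractVector && return :list
    x isa AbstractDict || return :template          # any other JSON value
    haskey(x, REF) && return :reference             # rule 1
    haskey(x, LOAD) && return :load                 # rule 2
    length(x) == 1 && is_braced(first(keys(x))) && return :template_literal  # rule 3
    haskey(x, :each) && return :each                # rule 4
    isempty(x) ? :slice : :scope                    # rules 6 and 5
end

D = Dict{Symbol, Any}
rules = [
    step_rule(Any[]),
    step_rule(D(REF => "x", :each => 1)),  # the earliest matching rule wins
    step_rule(D(:to => 10.0, :step => 1.0)),
    step_rule(D()),
    step_rule(2.5),
]
```

The second example contains both a "$" and an "each" key and is therefore classified as a reference, since that rule comes first.
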

When parsing the alternative representation as a Specification, whenever a Template is constructed, the corresponding constructor, to be called when it is eventually expanded, is looked up by calling Specifications.constructor. This allows the definition of (non-String and non-number) terminal values within the JSON specification language, including all actual Models.

GeneRegulatorySystems.Specifications.constructor (Method)
constructor(name::Symbol)

Select by name and return a function that accepts a substituted Template value and constructs an object of the selected kind from it.

This function's methods define which object literals can be used in the JSON specification language and how they should be interpreted after template substitution. Each method must accept a single argument of a type as it would be produced by calling JSON.parse(..., dicttype = Dict{Symbol, Any}).


To see which objects can be defined in the language using the {"{...}": ...} syntax, you can simply call methods(Specifications.constructor). To register new types of objects and in this way support them in the language, you may define new methods of the form constructor(::Val{:...}).
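
The registration pattern can be sketched without the package as follows. Everything here is a stand-in: the local constructor function only mimics Specifications.constructor, the forwarding from a Symbol to a Val is an assumption about how the lookup dispatches, and the Dilution type with its :dilution name is hypothetical. With the actual package, one would add a method to Specifications.constructor instead of defining a local function.

```julia
# Local stand-in for Specifications.constructor (assumed to dispatch via Val).
constructor(name::Symbol) = constructor(Val(name))

# Hypothetical object kind to be made available as {"{dilution}": ...}.
struct Dilution
    factor::Float64
end

# The method receives the substituted Template value and returns the object.
constructor(::Val{:dilution}) = x -> Dilution(x[:factor])

# Look up by name, then build from a value shaped like parsed JSON.
build = constructor(:dilution)
dilution = build(Dict{Symbol, Any}(:factor => 0.5))
```

Returning a function from the lookup, rather than the object itself, matches the description above: the constructor is selected at parse time but only called when the Template is expanded.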

Schedule semantics

When a Schedule is invoked to advance the simulation state (i.e. by calling it as a functor, like any Model), its exact behavior is determined by the type of its (top-level) specification. This may involve constructing nested Schedules and recursively advancing on them until the recursion terminates on the non-Schedule (primitive) Models that actually produce and execute simulation segments. These terminal models are wrapped in Primitives (which are also <: Model) that delegate simulation but add hooks for output handling and progress reporting, and that automatically convert the simulation state to the representation required by the wrapped model.

GeneRegulatorySystems.Models.Scheduling.Primitive (Type)

Wraps a non-Schedule Model to be invoked in the process of executing a Schedule, adding additional behavior around the forwarded invocation.

Invocation

(f!::Primitive)(x, Δt; path, trace = nothing, dryrun = nothing, context...)

Delegate to another Model f!.f!, adding pre- and post-processing.

This produces a single simulation segment; it

  • converts the simulation state to the representation required by the wrapped Model,
  • reports progress via @logmsg, and
  • if trace is given, calls it back with the new simulation state and appends various ancillary information in that call, including into, which signals if and where results should be saved.

The wrapped models are expected to retain intermediate results for their last invocation in the simulation state x if into is not nothing so they can be saved in the trace callback.

If dryrun is given, execution short-circuits by calling that back instead.


As the interpreter descends on the specification, the constructed nested Schedules or Primitives obtain new or replaced bindings either from direct definition in the specification or from implicit built-in behavior (mostly related to output control), and they further keep track of their path in the recursion (see Paths and reification).

As a reminder, since each Schedule f! is a Model, it may advance the simulation state x by Δt ≥ 0.0 units of simulation time when being called as a functor like f!(x, Δt; ...). This call dispatches on the specific type of the Schedule's (top-level) specification, and the following describes the resulting behavior.

GeneRegulatorySystems.Models.Scheduling.Schedule (Type)
Schedule{S <: Specification} <: Model{Any}

A Model that advances the simulation by delegating to other Models for a sequence of simulation segments that is organized according to a specification of type S.

This process may involve multiple levels of recursively constructing and executing Schedules, reflecting the potentially nested structure of specification. When descending on the specification, the corresponding Schedules may accumulate a collection of named values (bindings) that may be inserted in place of free references within the nested specifications before their interpretation; in this way, Schedules support a limited amount of templating.

Specification

See Schedule specification.

Invocation

(f!::Schedule{Template})(x, Δt::Float64; path, context...)

Expand the template, convert it to a Model and forward the call to it.

Effectively, this means either a recursion to sub-schedules (if the template evaluates to a Specification), or the execution of primitive simulation segments, either recording everything (if the template evaluates to a Model) or only at the last timepoint (if it evaluates to a number).

Specifically, depending on the expanded value,

  • if it is already a Model, it first gets wrapped in a Wrapped (to tag it with f!.path for later reference), and then further in a Primitive (which adds output and progress reporting when invoked). The latter's into field is determined from f!.bindings[:into]; if it is "{channels}", into is set to f!.bindings[:channel] instead. Otherwise,
  • if it is a Specification, it gets baked into a new Schedule (closing over f!.bindings), or
  • if it is a number, a model gets looked up at f!.bindings[:do] and placed into a Primitive with skip set to the expanded number (resulting in a segment without output advancing to that timepoint, and another instant segment with output). This is a shortcut provided to handle the common case of discretely sampling along the time axis by setting any step::Specification to the desired step size, but it requires defining the Model as :do in an enclosing Scope; if :do is not bound, the Schedule falls back to Wait.

(f!::Schedule{Slice})(x, Δt::Float64; context...)

Look up a Model in f!.bindings[:do] and forward the call to it.

Read this as "step in infinitesimal slices until the simulation budget Δt is exhausted". If :do is not bound, fall back to Wait.


(f!::Schedule{Load})(x, Δt::Float64; load, context...)

Load a Specification from a JSON file, turn it into a Schedule and forward the call to it.

f!.path is passed to load, which needs to be a function that returns data structures as they would be produced by calling JSON.parse with dicttype = Dict{Symbol, Any}.


(f!::Schedule{<:Sequence})(x, Δt::Float64; context...)

Iterate the Models specified by f!.specification (a List or Each) and invoke them in sequence.

The exact behavior depends on whether f!.branch was set (by the directly enclosing Scope):

  • If so, simulation will not advance the state x, but will advance copies instead, one for each item in the sequence. The copies will share the same randomness instance, so because they draw from that randomness in order, their trajectories will start to differ at the branch (copy) time point. All advanced copies will be returned together with the original x as a Branched state so that they can optionally be merged (see Merge), but typically the branched components will instead be dropped downstream. (Note that by this point, their trajectories likely already have been traced in the respective Primitive invocations.)
  • Otherwise, the items are invoked in turn on the same state x. After each step, the remaining simulation budget Δt will be decreased by the advanced time interval. This means that steps may be invoked with Δt == 0.0, which typically means that dynamic Models will have no effect but Instant models will always be applied.
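
Why branches diverge from the branch point can be illustrated with a toy example using Julia's Random standard library. It makes no claim about the package's state types or its randomness handling: the state is a one-element vector, advance! is a hypothetical stand-in for a stochastic Model, and MersenneTwister is merely a convenient explicit generator.

```julia
using Random

# Hypothetical stand-in for a stochastic Model: adds one random draw to the state.
advance!(x, rng) = (x[1] += rand(rng); x)

rng = MersenneTwister(1)   # a single randomness instance shared by all branches
x = [0.0]                  # the original state, left untouched by branching

# Each branch advances its own copy but draws from the same generator in order,
# so the second branch continues the random stream where the first one stopped.
branches = [advance!(copy(x), rng) for _ in 1:2]
```

Both copies start from the same state, but because they consume consecutive parts of one random stream, their values differ after the first draw, while x itself is unchanged.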

The nested Models' paths will be suffixed by their iteration index, separated either by "/" if f!.branch was set or by "-" otherwise. Additionally, f!.bindings[:channel] will be suffixed by "-" and the iteration index.


(f!::Schedule{Scope})(x, Δt::Float64; context...)

Advance by constructing a nested Model, optionally evaluating and adding new bindings to its context, and either invoking it exactly once or, if bindings[:to] is set, repeatedly until that simulation budget is exhausted.

The new bindings are determined by merging (the prior) f!.bindings and new entries obtained from f!.specification.definitions. If f!.barrier is set, this will only include :seed, :into, :channel and :defaults from f!.bindings. In either case, the definitions may contain references to f!.bindings, and new definitions will shadow prior bindings of the same name. (f!.barrier is currently only set when parsing a Load/:< literal from the JSON representation.)

The so extended bindings are then used to construct a new Model (a Schedule or Primitive) from the Specification in f!.step and invoke it. The new Model's path will be suffixed by "+" to signify descending on a Scope, unless f!.branch is set (because then the information is redundant since branching can only be specified in a Scope and the next path component is then guaranteed to start with "/").

If f!.specification.definitions[:to] is not set (i.e. directly in this Scope), the call is just forwarded to that new Model. Otherwise, the simulation time budget Δt is clipped to that value and the new Model is then invoked repeatedly, each time deducting the actually advanced simulation time from Δt, until it is exhausted. (The invocation will pass the full remaining Δt each time, but the nested Model is allowed to advance less than that, for example because it is a Schedule that has :to defined itself.)
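
The repetition under :to can be sketched as a simple budget loop. This is a simplified model in plain Julia, not the package's code: advance_until and fixed_step are hypothetical, a nested Model is represented by a function that receives the remaining budget and returns the time it actually advanced, and the sketch assumes that every call makes strictly positive progress.

```julia
# Hypothetical sketch of repeating a nested model until the clipped budget is used up.
function advance_until(model, Δt, to)
    budget = min(Δt, to)      # clip the time budget to the Scope's :to value
    elapsed = 0.0
    calls = 0
    while budget - elapsed > 0
        # Pass the full remaining budget; the model may advance less than that.
        elapsed += model(budget - elapsed)
        calls += 1
    end
    (elapsed, calls)
end

# Stand-in for a nested Schedule that has its own, smaller :to.
fixed_step(step) = remaining -> min(step, remaining)

result = advance_until(fixed_step(0.25), 10.0, 1.0)   # (1.0, 4)
```

With a budget of 10.0 clipped to 1.0 and a nested model that advances 0.25 per call, the loop runs four times and stops exactly at the clipped budget.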


Paths and reification

As a Schedule invocation descends on its Specification, it keeps track of its current path in that tree and includes it when constructing the terminal Primitive models that ultimately produce the simulation segments. In this way, all segments generated by a Schedule and its top-level Specification are uniquely identified by their path, and path prefixes likewise address contiguous ranges of simulation segments associated with inner nodes of the specification.

Further, the current path is recorded for all definitions evaluated during schedule execution:

GeneRegulatorySystems.Models.Scheduling.Locator (Type)
Locator

Contains a path to an object within a Schedule.

As a Schedule is executed, Locators will be bound (with names starting with "^") alongside the explicitly defined bindings to record the path within the Schedule where the definition was evaluated. These "source" bindings can therefore be referenced in Templates, and are further used to wrap evaluated Models in Models.Wrapped to remember where they were originally defined.


Every path is a String that consists of segments, each describing a single step of descent:

  • descending on a Scope with branch unset appends "+",
  • descending on a Sequence appends a "/" and the within-sequence index if a directly enclosing Scope has branch set, and a "-" and the within-sequence index otherwise, and
  • evaluating a binding definition in a Scope appends a "." and the corresponding key.
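
The three rules can be written down as string operations. The following is a sketch in plain Julia with hypothetical helper names; it assumes that the root path is the empty string, which the text above does not state.

```julia
# Hypothetical helpers mirroring the three kinds of path segments.
descend_scope(path; branch = false) = branch ? path : path * "+"
descend_item(path, index; branch = false) = path * (branch ? "/" : "-") * string(index)
descend_definition(path, key) = path * "." * String(key)

root = ""                                    # assumed root path
outer = descend_scope(root)                  # "+"
second = descend_item(outer, 2)              # "+-2"
branching = descend_scope(second; branch = true)      # "+-2" (no "+" appended)
sample = descend_item(branching, 1; branch = true)    # "+-2/1"
located = descend_definition(second, :do)    # "+-2.do"
```

The example shows why a branching Scope needs no "+" of its own: the "/" of the following sequence item already marks the descent.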

While the terminal Primitive models may advance the simulation state stochastically, their construction and organization as part of Schedule execution is fully deterministic, and further independent between recursion branches at each inner Specification node. This means that each object produced in the process of stepping through the Schedule can be reified exactly, given only the root Schedule (defined by specification and bindings) and the object's corresponding path. This functionality is exposed through the reify function:

GeneRegulatorySystems.Models.Scheduling.reify (Function)
reify(x, path; load = nothing)

Recreate an object by repeatedly descending on the definition object x, as selected by path, expanding the required definitions along the way.

When called directly, x will typically be a Schedule, but it doesn't have to be: As a convenience, reify can index into AbstractVectors, AbstractDicts and other objects (by accessing their indices, keys or properties).

Reification will follow the same rules of descent through the definition object as the corresponding direct invocation, but instead of walking the full tree it will only descend on one branch per inner node, as selected by path, implicitly reifying further definition objects along the way.

If any of the intermediate definition objects are of type Schedule{Load}, the load keyword must be given, analogously to invoking the Schedule, so that reify knows how to execute the Load.


The experiment tool traces all simulation segments' Primitive paths and includes them in its results index so that they unambiguously identify their definition location and can also be reified if needed.

Reification can be useful to obtain a specific object or model that is defined within a JSON specification document, such as for export or further analysis. It can also be used for better understanding or debugging the scheduling mechanism. To assist with this, the reify tool provides a CLI wrapper script to the reify function that supports pretty printing and can be pointed either at an experiment results location or directly at a JSON specification file.