Scheduling
The scheduling system forms the core of this package as it ties the various included regulation and instant adjustment models together into single reproducible experiment definitions that, besides the gene regulatory systems to be simulated, may also include initial setup and seed control, simulated -omics extraction protocols, (optionally regularly repeating) interventions as well as simulation branching into independent samples sharing parts of their history.
The primary entity in this module is the Schedule, which is a type of Model that advances the simulation by organizing simulation segments that proceed by invoking other Models. Since the orchestrated Models may themselves be Schedules, it is possible to assemble complex sequences of simulation segments and conceptually integrate the accumulated state changes into single unified trajectories.
Each Schedule has a Specification that represents the exact instructions for how to advance. The Specification also has an alternative representation as a JSON document from which it can be conveniently constructed.
Details are given below, but broadly, the scheduling system supports
- construction of Schedules and other Models from their JSON representation,
- a simple templating mechanism allowing the definition of named bindings and their insertion into specifications before model construction,
- iteration both over explicit enumerations of specifications and over values to consecutively insert into a nested specification template,
- repetition of (finite-time) nested Schedules,
- automatic conversion of the simulation state between the representations required by the consecutive segments' Models,
- random seed control,
- hooks for output control and progress reporting, and
- trajectory branching.
To support this multitude of requirements, Specifications are assembled from reusable building blocks (which are themselves Specifications), effectively defining a domain-specific programming language. Conceptually, the JSON representation is the source code of the scheduling language, its Dict/Array form after parsing with JSON.parse is an intermediate representation, its Specification form is an abstract syntax tree, and a Schedule containing it is an interpreter that is invoked whenever the simulation state should be advanced. The syntax of the scheduling language is therefore defined by the allowed combinations of Specifications, and the associated semantics are defined in terms of which simulation segments are produced and executed.
The experiment command line tool wraps all of this functionality by loading a collection of JSON specifications, interpreting them and collecting the results in a structured format, along with an index of simulation segments and the specifications used to produce the data. For reproducible simulation experiments, this is the recommended way to use the package.
Each simulation segment can be addressed by a path relative to the Schedule that produced it, and the index includes each segment's path. Since the Schedule can be recreated from its Specification (and the associated bindings) and the sequence of segments it produces is deterministic, the scheduling system's iteration state can be fully reified for any simulation segment, including simulation context such as the exact Model in effect. This can be used to obtain that model for export or analysis without re-running the full experiment, and also to better understand or debug the scheduling subsystem.
Schedule specification
A Schedule's simulation plan is represented by its specification, which is effectively a tree of syntactic elements (all <: Specification) of the scheduling mini-language, with Scope, List and Each being inner nodes and Template, Load and Slice terminal nodes.
Besides its specification, each Schedule keeps track of a map of named value bindings. Templates (and by extension, indirectly also other Specifications enclosing them) may include references to such named values that are to be inserted when stepping through the Schedule; these values may be defined by an enclosing Scope or Each, or they may be free and must then be injected into the Schedule's bindings on construction. Schedules therefore close over their specification, which in that process will ensure that no references are left dangling.
The following are the potential elements of a Specification:
GeneRegulatorySystems.Specifications.Template — Type
Template <: Specification
Contains instructions for instantiating ("expanding") a value from a definition.
When stepping through a Schedule, expanding Templates produces all non-Specification values that influence the schedule's behavior, which includes all primitive Models.
The definition may contain references to named bindings. When expanded, these references are first replaced by their values, and then the function held in the Template's constructor field is called on the result. References come in two forms:
- Any Dict x that contains a :$ key will be replaced by the object addressed by x[:$]. If that value is a String, it refers to the binding of that same name. If it is a Vector, it refers to a nested object addressed by its items interpreted as path components; each item will in turn descend by accessing the respective key, index or property. If the found value x′ is also a Dict and there are other mappings in x, they are merged into (a shallow copy of) x′, overriding previously existing mappings.
- Any String s that contains substrings of the form "${binding}" will be replaced by a String with that reference substituted by the respective binding's repr for Strings and Numbers, or the literal "__omitted__" otherwise.
Substitution will in general not return independent objects but rather alias intermediate Dicts, Vectors and other objects into the substituted objects if they contain no substitutions of their own. In other words, the produced data structures are treated as persistent (and therefore immutable) during expansion.
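The two reference forms can be illustrated with a small stand-alone sketch. This is not the package's implementation: the names substitute and descend are hypothetical, bindings is assumed to be a Dict{Symbol, Any} of already-expanded values, unbound references are assumed to raise an error, and for brevity the sketch rebuilds containers rather than aliasing unchanged ones.

```julia
# Hypothetical, simplified re-implementation of reference substitution.
# Not the package's code; `bindings` maps names to already-expanded values.
const REF = Symbol("\$")   # the `$` key that marks a reference Dict

# Descend one path component: key (Dict), index (Vector) or property (other objects).
descend(x::AbstractDict, key) = x[Symbol(key)]
descend(x::AbstractVector, index::Integer) = x[index]
descend(x, name) = getproperty(x, Symbol(name))

function substitute(x::AbstractDict, bindings)
    if haskey(x, REF)
        ref = x[REF]
        path = ref isa AbstractVector ? ref : [ref]
        found = foldl(descend, path; init = bindings)
        rest = Dict{Symbol, Any}(
            k => substitute(v, bindings) for (k, v) in x if k != REF
        )
        # Other mappings override entries in a shallow copy of a found Dict.
        return found isa AbstractDict && !isempty(rest) ? merge(found, rest) : found
    end
    return Dict{Symbol, Any}(k => substitute(v, bindings) for (k, v) in x)
end

function substitute(s::AbstractString, bindings)
    replace(s, r"\$\{(\w+)\}" => function (m)
        # Strip the leading "${" and the trailing "}" from the match.
        value = bindings[Symbol(chop(m; head = 2, tail = 1))]
        value isa Union{AbstractString, Number} ? repr(value) : "__omitted__"
    end)
end

substitute(x::AbstractVector, bindings) = map(v -> substitute(v, bindings), x)
substitute(x, bindings) = x   # numbers, booleans, nothing: returned unchanged

# Example bindings (hypothetical names and values).
bindings = Dict{Symbol, Any}(
    :rate => 0.5,
    :model => Dict{Symbol, Any}(:kind => "a", :n => 1),
)
by_name = substitute(Dict{Symbol, Any}(REF => "rate"), bindings)
by_path = substitute(Dict{Symbol, Any}(REF => ["model", "n"]), bindings)
merged = substitute(Dict{Symbol, Any}(REF => "model", :n => 2), bindings)
label = substitute("rate=\${rate}", bindings)
```

Here by_name and by_path resolve to the bound values, merged is a copy of the :model binding with :n overridden, and label has the reference replaced by the repr of the bound number.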
GeneRegulatorySystems.Specifications.Slice — Type
Slice <: Specification
Empty singleton that represents an infinitesimal-time step in the simulation.
It acts as a sentinel element in the scheduling language and roughly has the role of Nothing.
GeneRegulatorySystems.Specifications.Load — Type
Load <: Specification
Represents an instruction to load, parse and insert a Specification from a file.
Its path is relative and given context when invoking the containing Schedule{Load} via the load function argument.
GeneRegulatorySystems.Specifications.Sequence — Type
Sequence <: Specification
Abstract supertype of specifications that can be iterated.
The meaning of the specified Sequence (to be interpreted when executing a schedule) depends on whether a directly enclosing Scope has its branch flag set.
GeneRegulatorySystems.Specifications.List — Type
List <: Sequence
Represents a static list of specifications.
GeneRegulatorySystems.Specifications.Each — Type
Each <: Sequence
Represents a sequence of specifications defined implicitly by setting a named binding, in turn, to each item from an ordered collection of items and evaluating a nested Specification step in the resulting context.
The iteration variable, defined by as, does not necessarily need to be named or used; if it is not, the Each effectively represents a repetition of the same (nested) specification.
GeneRegulatorySystems.Specifications.Scope — Type
Scope <: Specification
Contains named value definitions (mostly Templates) that apply to a nested context specified by step (which is a Specification).
In that sense it is equivalent to a lexical scope in any programming language, but it can additionally be thought of conceptually as applying to a range on the time axis during simulation, either filling its parent range or, if definitions[:to] is set, limited to that duration.
Template values in definitions may contain references to bindings from a surrounding scope (Scope or Each, or by inclusion on Schedule construction).
The definitions shadow bindings of the same name from a surrounding scope. If the barrier flag is set, any bindings from a surrounding scope will not be available in the nested step; only new bindings from definitions will be included (with some exceptions, see Schedule{Scope}).
If the branch flag is set, the step must be a Sequence specification and should then be interpreted as specifying independent simulation branches instead of the default behavior of acting on the same simulation state one after the other.
JSON representation
While Schedules and their specifications can be built by hand, users will typically construct them using the Specification function from their alternative representation as JSON documents:
GeneRegulatorySystems.Specifications.Specification — Type
Abstract supertype of all syntactic elements of the scheduling language.
Construction
Specification(x; bound::Set{Symbol} = Set{Symbol}(), as::Symbol = :step)
Construct a Specification from nested Dict/Vector objects such as they are obtained by loading JSON via JSON.parse(..., dicttype = Dict{Symbol, Any}).
Specification recursively interprets a JSON document (or part thereof) and returns a Specification subtype depending on the type and shape of the JSON. It will be interpreted as either a :step, a :value or :items; the top-level document will be interpreted as a :step. Specifically:
- When expecting a :step:
  - If x is a Vector (JSON Array), it will be parsed as a List of :step Specifications.
  - If x is a Dict (JSON Object), the earliest matching rule of the following applies:
    - (reference literal) If x contains a :$ key (JSON name "$"), it is parsed as a Template expanding to the binding referenced by the corresponding value.
    - (load literal) If x contains a :< key (JSON name "<"), it is parsed as a Load of the file referenced by the corresponding value. This Load will be wrapped in a Scope that has barrier set and collects all the other mappings in x as definitions, interpreting the mapped values as :value Specifications.
    - (template literal) If x contains a single mapping, and that mapping's key is enclosed in braces (JSON names "{...}"), it is parsed as a Template that is expanded by transforming the substituted value using a function returned by passing the key (without the braces) to Specifications.constructor.
    - (each) If x contains an :each key (JSON name "each"), it is parsed as an Each. The iterable items are defined by x[:each] interpreted as an :items Specification; the step is defined by x[:step] interpreted as a :step Specification, and the index variable name is optionally defined by x[:as]. If there are any other mappings in x, they will be collected and the Each wrapped in a Scope using these definitions. In other words, the corresponding definitions are available in the items and step definitions, but cannot refer to the index variable.
    - (scope) If x is not empty, it is parsed as a Scope, with the step defined by x[:step] interpreted as a :step Specification (defaulting to Slice()) and branch optionally set by x[:branch]. All the other mappings in x are collected as definitions, interpreting the mapped values as :value Specifications.
    - (slice) Otherwise x is empty and is parsed as Slice().
  - Otherwise, x is parsed as a Template expanding to x.
- When expecting :items:
  - If x is a Dict (JSON Object), the earliest matching rule of the following applies:
    - (reference literal) as above in the :step case
    - (template literal) as above in the :step case
    - (range literal) Otherwise x is parsed as a Template that is expanded by calling Julia's range function, splatting the substituted value as keyword arguments.
  - Otherwise, x is parsed as a Template expanding to x.
- When expecting a :value:
  - If x is a Dict (JSON Object), the earliest matching rule of the following applies:
    - (reference literal) as above in the :step case
    - (load literal) as above in the :step case
    - (template literal) as above in the :step case
    - Otherwise, x is parsed as a Template expanding to x.
  - Otherwise, x is parsed as a Template expanding to x.
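To make the "earliest matching rule" ordering for the :step case concrete, the following stand-alone sketch only names the rule that would apply to a parsed JSON value; it does not construct any Specification. The function step_rule and its return symbols are hypothetical, as are the example keys and file name.

```julia
# Hypothetical helper: names the rule the :step interpretation would select
# for `x`, checking the rules in the documented order. Illustration only.
function step_rule(x)
    x isa AbstractVector && return :list
    x isa AbstractDict || return :template            # plain values
    haskey(x, Symbol("\$")) && return :reference      # JSON name "$"
    haskey(x, Symbol("<")) && return :load            # JSON name "<"
    if length(x) == 1
        key = String(only(keys(x)))
        startswith(key, "{") && endswith(key, "}") && return :template_literal
    end
    haskey(x, :each) && return :each
    return isempty(x) ? :slice : :scope
end

# Values as JSON.parse(..., dicttype = Dict{Symbol, Any}) would produce them.
examples = [
    Any[],                                                           # JSON []
    Dict{Symbol, Any}(),                                             # JSON {}
    Dict{Symbol, Any}(Symbol("{example}") => Dict{Symbol, Any}()),   # hypothetical kind
    Dict{Symbol, Any}(:each => [1, 2, 3], :as => "i", :step => 1.0),
    Dict{Symbol, Any}(:to => 10.0, :step => 1.0),
    Dict{Symbol, Any}(Symbol("<") => "other.json", Symbol("\$") => "name"),
    5.0,
]
rules = map(step_rule, examples)
```

The second-to-last example contains both a "$" and a "<" key; because the reference rule is listed first, it takes precedence.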
When parsing the alternative representation as a Specification, whenever a Template is constructed, the corresponding constructor, to be called when it is eventually expanded, is looked up by calling Specifications.constructor. This allows the definition of (non-String and non-number) terminal values within the JSON specification language, including all actual Models.
GeneRegulatorySystems.Specifications.constructor — Method
constructor(name::Symbol)
Select by name and return a function that accepts a substituted Template value and constructs an object of the selected kind from it.
This function's methods define which object literals can be used in the JSON specification language and how they should be interpreted after template substitution. Each method must accept a single argument of a type as it would be produced by calling JSON.parse(..., dicttype = Dict{Symbol, Any}).
To see which objects can be defined in the language using the {"{...}": ...} syntax, you can simply call methods(Specifications.constructor). To register new types of objects and in this way support them in the language, you may define new methods of the form constructor(::Val{:...}).
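As a rough sketch of the registration pattern, the following uses a stand-in module so that it runs without the package. It assumes that the Symbol method forwards to Val-dispatched methods; the module name SpecificationsSketch, the kind :interval and the Interval type are all hypothetical. With the real package, the new method would presumably be added to GeneRegulatorySystems.Specifications.constructor instead, and the object would then be written as {"{interval}": {"from": 0.0, "to": 2.5}} in JSON.

```julia
# Stand-in for the package module so this sketch is self-contained.
module SpecificationsSketch
# Assumed wiring: look up by name via dispatch on Val{name}.
constructor(name::Symbol) = constructor(Val(name))
end

# Hypothetical object kind with hypothetical fields.
struct Interval
    from::Float64
    to::Float64
end

# Register the new kind: return a function that takes the substituted
# Template value (a parsed JSON Object) and builds the object from it.
SpecificationsSketch.constructor(::Val{:interval}) =
    x -> Interval(x[:from], x[:to])

build = SpecificationsSketch.constructor(:interval)
interval = build(Dict{Symbol, Any}(:from => 0.0, :to => 2.5))
```

Dispatching on Val keeps registration open: other packages or user scripts can add kinds without modifying a central lookup table.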
Schedule semantics
When a Schedule is invoked to advance the simulation state (i.e. by calling it as a functor, like any Model), its exact behavior is determined by the type of its (top-level) specification, which may involve constructing and recursively advancing on nested Schedules until the recursion terminates on the non-Schedule (primitive) Models to actually produce and execute simulation segments. These terminal models are wrapped in Primitives (which are also <: Model) that delegate simulation but add hooks for output handling and progress reporting and further automatically convert the simulation state to the representation required by the wrapped model.
GeneRegulatorySystems.Models.Scheduling.Primitive — Type
Wraps a non-Schedule Model to be invoked in the process of executing a Schedule, adding additional behavior around the forwarded invocation.
Invocation
(f!::Primitive)(x, Δt; path, trace = nothing, dryrun = nothing, context...)
Delegate to another Model f!.f!, adding pre- and post-processing.
This produces a single simulation segment; it
- converts the simulation state to the representation required by the wrapped Model,
- reports progress via @logmsg, and
- if trace is given, calls it back with the new simulation state and appends various ancillary information in that call, including into to signal if and where results should be saved.
The wrapped models are expected to retain intermediate results for their last invocation in the simulation state x if into is not nothing so they can be saved in the trace callback.
If dryrun is given, execution short-circuits by calling that back instead.
As the interpreter descends on the specification, the constructed nested Schedules or Primitives obtain new or replaced bindings either from direct definition in the specification or from implicit built-in behavior (mostly related to output control), and they further keep track of their path in the recursion (see Paths and reification).
As a reminder, since each Schedule f! is a Model, it may advance the simulation state x by Δt ≥ 0.0 units of simulation time when being called as a functor like f!(x, Δt; ...). This call dispatches on the specific type of the Schedule's (top-level) specification, and the following describes the resulting behavior.
GeneRegulatorySystems.Models.Scheduling.Schedule — Type
Schedule{S <: Specification} <: Model{Any}
A Model that advances the simulation by delegating to other Models for a sequence of simulation segments that is organized according to a specification of type S.
This process may involve multiple levels of recursively constructing and executing Schedules, reflecting the potentially nested structure of specification. When descending on the specification, the corresponding Schedules may accumulate a collection of named values (bindings) that may be inserted in place of free references within the nested specifications before their interpretation; in this way, Schedules support a limited amount of templating.
Specification
Invocation
(f!::Schedule{Template})(x, Δt::Float64; path, context...)
Expand the template, convert it to a Model and forward the call to it.
Effectively, this means either a recursion to sub-schedules (if the template evaluates to a Specification), or the execution of primitive simulation segments, either recording everything (if the template evaluates to a Model) or only at the last timepoint (if it evaluates to a number).
Specifically, depending on the expanded value,
- if it is already a Model, it first gets wrapped in a Wrapped (to tag it with f!.path for later reference), and then further in a Primitive (which adds output and progress reporting when invoked). The latter's into field is determined from f!.bindings[:into]; if it is "{channels}", into is set to f!.bindings[:channel] instead. Otherwise,
- if it is a Specification, it gets baked into a new Schedule (closing over f!.bindings), or
- if it is a number, a model gets looked up at f!.bindings[:do] and placed into a Primitive with skip set to the expanded number (resulting in a segment without output advancing to that timepoint, and another instant segment with output). This is a shortcut provided to handle the common case of discretely sampling along the time axis by setting any step::Specification to the desired step size, but it requires defining the Model as :do in an enclosing Scope. If :do is not bound, the Schedule falls back to Wait.
(f!::Schedule{Slice})(x, Δt::Float64; context...)
Look up a Model in f!.bindings[:do] and forward the call to it.
Read this as "step in infinitesimal slices until the simulation budget Δt is exhausted". If :do is not bound, the Schedule falls back to Wait.
(f!::Schedule{Load})(x, Δt::Float64; load, context...)
Load a Specification from a JSON file, turn it into a Schedule and forward the call to it.
f!.path is passed to load, which needs to be a function that returns data structures as they would be produced by calling JSON.parse with dicttype = Dict{Symbol, Any}.
(f!::Schedule{<:Sequence})(x, Δt::Float64; context...)
Iterate the Models specified by f!.specification (a List or Each) and invoke them in sequence.
The exact behavior depends on whether f!.branch was set (by the directly enclosing Scope):
- If so, simulation will not advance the state x, but will advance copies instead, one for each item in the sequence. The copies will share the same randomness instance, so because they draw from that randomness in order, their trajectories will start to differ at the branch (copy) time point. All advanced copies will be returned together with the original x as a Branched state so that they can optionally be merged (see Merge), but typically the branched components will instead be dropped downstream. (Note that by this point, their trajectories likely already have been traced in the respective Primitive invocations.)
- Otherwise, the items are invoked in turn on the same state x. After each step, the remaining simulation budget Δt will be decreased by the advanced time interval. This means that steps may be invoked with Δt == 0.0, which typically means that dynamic Models will have no effect but Instant models will always be applied.
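The effect of branching with a shared randomness instance can be illustrated in isolation. In this minimal sketch, the state is a plain Vector, advance! is a hypothetical stand-in for a stochastic Model, and a MersenneTwister from Julia's Random standard library plays the role of the shared randomness; none of this reflects the package's actual state or Branched types.

```julia
using Random

# Hypothetical stand-in for a stochastic model: perturbs the state in place
# with noise drawn from the shared random number generator.
advance!(x, rng) = (x .+= randn(rng, length(x)); x)

rng = MersenneTwister(1)   # one shared randomness instance
x = [0.0, 0.0]             # state at the branch time point

# Each branch advances its own copy; the original state stays untouched.
# The branches draw from `rng` one after the other, so they diverge from here.
branches = [advance!(copy(x), rng) for _ in 1:3]
```

Because the copies consume the shared generator in order, re-running the whole script with the same seed reproduces every branch, while the branches themselves differ from one another.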
The nested Models' paths will be suffixed by their iteration index, separated either by "/" if f!.branch was set or by "-" otherwise. Additionally, f!.bindings[:channel] will be suffixed by "-" and the iteration index.
(f!::Schedule{Scope})(x, Δt::Float64; context...)
Advance by constructing a nested Model, optionally evaluating and adding new bindings to its context, and either invoking it exactly once or, if bindings[:to] is set, repeatedly until that simulation budget is exhausted.
The new bindings are determined by merging (the prior) f!.bindings and new entries obtained from f!.specification.definitions. If f!.barrier is set, this will only include :seed, :into, :channel and :defaults from f!.bindings. In either case, the definitions may contain references to f!.bindings, and new definitions will shadow prior bindings of the same name. (f!.barrier is currently only set when parsing a Load/:< literal from the JSON representation.)
The bindings extended in this way are then used to construct a new Model (a Schedule or Primitive) from the Specification in f!.step and invoke it. The new Model's path will be suffixed by "+" to signify descending on a Scope, unless f!.branch is set (because then the information is redundant since branching can only be specified in a Scope and the next path component is then guaranteed to start with "/").
If f!.specification.definitions[:to] is not set (i.e. directly in this Scope), the call is just forwarded to that new Model. Otherwise, the simulation time budget Δt is clipped to that value and the new Model is then invoked repeatedly, each time deducting the actually advanced simulation time from Δt, until it is exhausted. (The invocation will pass the full remaining Δt each time, but the nested Model is allowed to advance less than that, for example because it is a Schedule that has :to defined itself.)
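The repetition under :to can be sketched as a simple budget loop. This is an illustration under assumptions, not the package's code: repeat_until is a hypothetical name, the nested model is represented by a function step! that returns the simulation time it actually advanced, and the guard against a step that advances nothing is added here only to keep the sketch from looping forever.

```julia
# Hypothetical stand-in for the repetition in Schedule{Scope} when :to is set.
# `step!(x, Δt)` advances `x` by at most `Δt` and returns the time advanced.
function repeat_until(step!, x, Δt::Float64, to::Float64)
    remaining = min(Δt, to)        # clip the budget to :to
    advanced = 0.0
    invocations = 0
    while remaining > 0.0
        δ = step!(x, remaining)    # always offer the full remaining budget
        δ > 0.0 || break           # sketch-only guard against a no-op step
        remaining -= δ
        advanced += δ
        invocations += 1
    end
    return (; advanced, invocations)
end

# A nested step that never advances more than 1.0 time units per invocation,
# similar to a nested Schedule that has its own :to defined.
clock = Ref(0.0)
result = repeat_until((x, Δt) -> (δ = min(Δt, 1.0); x[] += δ; δ), clock, 10.0, 2.5)
```

With an outer budget of 10.0 clipped to 2.5, the nested step runs three times, advancing 1.0, 1.0 and 0.5 time units.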
Paths and reification
As a Schedule invocation descends on its Specification, it keeps track of its current path in that tree and includes it when constructing the terminal Primitive models that ultimately produce the simulation segments. In this way, all segments generated by a Schedule and its top-level Specification are uniquely identified by their path, and path prefixes likewise address contiguous ranges of simulation segments associated with inner nodes of the specification.
Further, the current path is recorded for all definitions evaluated during schedule execution:
GeneRegulatorySystems.Models.Scheduling.Locator — Type
Locator
Contains a path to an object within a Schedule.
As a Schedule is executed, Locators will be bound (with names starting with "^") alongside the explicitly defined bindings to record the path within the Schedule where the definition was evaluated. These "source" bindings can therefore be referenced in Templates, and are further used to wrap evaluated Models in Models.Wrapped to remember where they were originally defined.
Every path is a String that consists of segments, each describing a single step of descent:
- descending on a Scope with branch unset appends "+",
- descending on a Sequence appends a "/" and the within-sequence index if a directly enclosing Scope has branch set, and a "-" and the within-sequence index otherwise, and
- evaluating a binding definition in a Scope appends a "." and the corresponding key.
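The three rules above can be mirrored by small string helpers. These helpers are hypothetical and only illustrate how path segments compose; the empty root path and the particular indices used here are assumptions, and the package's own index formatting may differ.

```julia
# Hypothetical helpers, one per path rule; not part of the package.
scope(path; branch = false) = branch ? path : path * "+"
item(path, index; branch = false) = string(path, branch ? "/" : "-", index)
definition(path, key) = string(path, ".", key)

root = ""   # assumed root path

# Scope, then item 2 of a non-branching sequence, then a Scope with
# branch set (appends nothing), then branch 3 of its sequence.
segment = item(scope(item(scope(root), 2); branch = true), 3; branch = true)

# Location where the :do definition of the outermost Scope is evaluated.
source = definition(scope(root), :do)
```

Under these assumptions, segment evaluates to "+-2/3" and source to "+.do".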
While the terminal Primitive models may advance the simulation state stochastically, their construction and organization as part of Schedule execution is fully deterministic, and further independent between recursion branches at each inner Specification node. This means that each object produced in the process of stepping through the Schedule can be reified exactly, given only the root Schedule (defined by specification and bindings) and the object's corresponding path. This functionality is exposed through the reify function:
GeneRegulatorySystems.Models.Scheduling.reify — Function
reify(x, path; load = nothing)
Recreate an object by repeatedly descending on the definition object x, as selected by path, expanding the required definitions along the way.
When called directly, x will typically be a Schedule, but it doesn't have to be: As a convenience, reify can index into AbstractVectors, AbstractDicts and other objects (by accessing their indices, keys or properties).
Reification will follow the same rules of descent through the definition object as the corresponding direct invocation, but instead of walking the full tree it will only descend on one branch per inner node, as selected by path, implicitly reifying further definition objects along the way.
If any of the intermediate definition objects are of type Schedule{Load}, the load keyword must be given, analogously to invoking the Schedule, so that reify knows how to execute the Load.
The experiment tool traces all simulation segments' Primitive paths and includes them in its results index so that they unambiguously identify their definition location and can also be reified if needed.
Reification can be useful to obtain a specific object or model that is defined within a JSON specification document, such as for export or further analysis. It can also be used for better understanding or debugging the scheduling mechanism. To assist with this, the reify tool provides a CLI wrapper script to the reify function that supports pretty printing and can be pointed either at an experiment results location or directly at a JSON specification file.