
Some applications require the data to be kept separate from the model, for example when several competing models exist for the same data set. Our focus is on developing a single model for a given data set, so we store the two together. With several competing models, the same data set has to be copied into the directory of each model. On systems that support symbolic (or hard) links, these can be used to conserve disk space for large data sets.
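As a minimal sketch of the linking approach, assuming a file system that supports symbolic links, base R's `file.symlink` can make one data file visible in a second model directory. The directory names `modelA` and `modelB` and the file name are hypothetical and only serve as an illustration:

```r
# Hypothetical layout: two competing models that share one data table.
root   <- file.path(tempdir(), "sbtab-link-demo")
modelA <- file.path(root, "modelA")
modelB <- file.path(root, "modelB")
dir.create(modelA, recursive = TRUE, showWarnings = FALSE)
dir.create(modelB, recursive = TRUE, showWarnings = FALSE)

# The data set is stored once, in the directory of the first model.
original <- file.path(modelA, "AutorYEARfigureS1a.tsv")
writeLines("!TimePoint\t!Time\t>A_out\t~A_out", original)

# Link it into the second model directory instead of copying it.
# An absolute target path keeps the link valid from any working directory.
link <- file.path(modelB, basename(original))
unlink(link)  # remove a link left over from a previous run
linked <- file.symlink(from = normalizePath(original), to = link)
```

Reading `link` then yields the same content as `original`, while the data occupy disk space only once.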

The data sets themselves can be stored in normal SBtab tables, using any valid TableName. The number of data sets and their types are listed in the table of experiments, Experiments.tsv.

Each experiment is either a time series or a dose response curve. In a dose response curve, each row is a separate time series with a single measurement time point (only the final state counts).

In a time series table, the first column is called !TimePoint, the second is typically !Time, followed by the measured quantities labelled as >outputFunctionID (values) and ~outputFunctionID (error estimates).

Time Series

| label | example | meaning |
|---|---|---|
| !TimePoint | Experiment0Time0 | same as !ID, but for times (a unique string) |
| !Time | -0.5 | a floating point constant |
| >outputFunctionID | >Calcium_Out | values that are meant to be compared to the Output called Calcium_Out |
| ~outputFunctionID | ~Calcium_Out | error estimated for the values in >outputFunctionID |

Each time series typically requires one model simulation to reproduce (unless scheduled events are happening).

Example Time Series

!!SBtab Document='myModel' TableType='QuantityMatrix' TableName='AutorYEARfigureS1a' TableTitle='Data originally published in CITATION'
!TimePoint !Time >A_out ~A_out >totalB ~totalB
E0T0 -1.0 1.20 0.012 201.1 12
E0T1 +0.0 11.8 0.12 203.2 11
E0T2 +1.0 31.7 0.19 198.7 13
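To illustrate how the column prefixes separate values from error estimates, here is a minimal sketch that reads the example table with base R only. It is not the parser used by the package; the variable names `outputTimes`, `outputValues` and `errorValues` are chosen to mirror the experiment properties, and the table is assumed to be tab separated:

```r
# The example table as tab-separated text; the first line is the SBtab header.
tsv <- paste(
  "!!SBtab Document='myModel' TableType='QuantityMatrix' TableName='AutorYEARfigureS1a'",
  "!TimePoint\t!Time\t>A_out\t~A_out\t>totalB\t~totalB",
  "E0T0\t-1.0\t1.20\t0.012\t201.1\t12",
  "E0T1\t+0.0\t11.8\t0.12\t203.2\t11",
  "E0T2\t+1.0\t31.7\t0.19\t198.7\t13",
  sep = "\n"
)

# Skip the '!!SBtab' line and keep the column names exactly as written
# (check.names = FALSE preserves the '!', '>' and '~' prefixes).
tab <- read.delim(text = tsv, skip = 1, check.names = FALSE, quote = "")

# Split the columns by prefix: '>' marks values, '~' marks error estimates.
isValue <- startsWith(names(tab), ">")
isError <- startsWith(names(tab), "~")

outputTimes  <- tab[["!Time"]]
outputValues <- tab[, isValue, drop = FALSE]
errorValues  <- tab[, isError, drop = FALSE]

# Remove the prefixes so that both data frames share the output names.
names(outputValues) <- sub("^>", "", names(outputValues))
names(errorValues)  <- sub("^~", "", names(errorValues))
rownames(outputValues) <- rownames(errorValues) <- tab[["!TimePoint"]]
```

After this, `outputValues` and `errorValues` have the same shape and the same column names, one row per time point.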

Dose Response

A dose response experiment maps increasing input values (doses) to output values. In such cases the output has to be measured at one pre-defined time point for each dose. These dose response curves are transformed into n time series experiments during parsing, where n is the number of content rows (not counting headers).

| label | example | meaning |
|---|---|---|
| !ID | Experiment0Dose0 | a unique string, identifying this dose |
| >anInput | 300 | a valid value for one of the input parameters |
| >outputFunctionID | >Calcium_Out | values that are meant to be compared to the Output called Calcium_Out |
| ~outputFunctionID | ~Calcium_Out | error estimated for the values in >outputFunctionID |

A dose response curve requires n simulations of the model to reproduce.

Example Dose Response Curve

!!SBtab Document='myModel' TableType='QuantityMatrix' TableName='AutorYEARfigureS1b' TableTitle='Data originally published in CITATION'
!ID >treatment_dose >treatment_duration >A_final ~A_final
E0D0 200 50 50.1 1.2
E0D1 1000 50 83.2 0.9
E0D2 7000 25 74.7 1.8
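The transformation into n separate experiments can be sketched as follows. This is an illustration in base R, not the package's parser. It assumes that the names of the input parameters are known from elsewhere (here the hypothetical vector `inputNames`), because inputs and outputs both carry the `>` prefix:

```r
# The example dose response table as tab-separated text (header line omitted).
tsv <- paste(
  "!ID\t>treatment_dose\t>treatment_duration\t>A_final\t~A_final",
  "E0D0\t200\t50\t50.1\t1.2",
  "E0D1\t1000\t50\t83.2\t0.9",
  "E0D2\t7000\t25\t74.7\t1.8",
  sep = "\n"
)
tab <- read.delim(text = tsv, check.names = FALSE, quote = "")

# Assumption: these names come from the model definition, not from this table.
inputNames  <- c("treatment_dose", "treatment_duration")
outputNames <- "A_final"

# One experiment per content row: an input vector and a single measurement.
doseExperiments <- lapply(seq_len(nrow(tab)), function(i) {
  input <- as.numeric(unlist(tab[i, paste0(">", inputNames)]))
  names(input) <- inputNames
  values <- tab[i, paste0(">", outputNames), drop = FALSE]
  errors <- tab[i, paste0("~", outputNames), drop = FALSE]
  names(values) <- names(errors) <- outputNames
  list(input = input, outputValues = values, errorValues = errors)
})
names(doseExperiments) <- tab[["!ID"]]
```

Each element of `doseExperiments` then corresponds to one simulation with its own input vector, which is why a curve with n rows needs n simulations.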

Scheduled Events

An experiment can contain sudden events. In systems biology this is useful for describing experiments that include an intervention at a specified time (activation, silencing, stimulus, action potential, etc.). These events happen much faster than the system dynamics, and modelling them exactly would slow down the solver dramatically. Instead, when an event occurs at time \(t\), the solver is stopped and a linear (or affine) transformation is applied (in C) to the current state \(x(t)\) and the parameters \(p\):

\[ x(t) := A x(t) + b \]

This is an assignment, not a mathematical equality; the state is discontinuous at \(t\) if \(A\) and \(b\) have non-trivial values.

Each experiment[[i]]$event has this structure:

  • experiment[[1]]$event
    • time (a numeric vector, the event schedule)
    • tf (a named list with two items, both affine transformations)
      • state (the state transformation, a named list)
        • A (a three dimensional array, where the third dimension corresponds to the time)
        • b (a three dimensional array, with the second dimension always being 1, and third as long as the time vector)
      • param (the parameter transformation, a named list)
        • A (a three dimensional array, where the third dimension corresponds to the time)
        • b (a three dimensional array, with the second dimension always being 1, and third as long as the time vector)
  • experiment[[2]]$event
    • […]
  • experiment[[3]]$event
    • […]

Every experiment has the same type of content. In some cases it is permissible to omit parts that are trivial (\(b\) is trivial if it is 0, the neutral element of addition). Omitted items can then be NULL (this may not work in some combinations and is subject to improvement).
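The following sketch builds one such event structure by hand and applies the state transformation in plain R. It only illustrates the array shapes and the assignment \(x := A x + b\); in the package the transformation is applied by the C code. The sizes (two state variables, three parameters), the event times and the helper `applyTransformation` are made up for this example:

```r
n <- 2                    # number of state variables (hypothetical)
m <- 3                    # number of parameters (hypothetical)
eventTime <- c(10, 20)    # the event schedule
nt <- length(eventTime)

# State transformation: one n-by-n matrix and one n-by-1 column per event time.
A <- array(0, dim = c(n, n, nt))
b <- array(0, dim = c(n, 1, nt))
A[, , 1] <- diag(n)        # first event: keep the state ...
b[, 1, 1] <- c(5, 0)       # ... and add 5 to the first state variable
A[, , 2] <- diag(c(0, 1))  # second event: reset the first state variable to 0

# Parameter transformation: written out as the identity (no change).
Ap <- array(rep(diag(m), nt), dim = c(m, m, nt))
bp <- array(0, dim = c(m, 1, nt))

event <- list(
  time = eventTime,
  tf = list(
    state = list(A = A, b = b),
    param = list(A = Ap, b = bp)
  )
)

# Illustrative helper: apply the j-th affine transformation to a vector v.
applyTransformation <- function(v, tf, j) {
  as.numeric(tf$A[, , j] %*% v + tf$b[, , j])
}

x0 <- c(1, 2)                                       # state just before the first event
x1 <- applyTransformation(x0, event$tf$state, 1)    # state just after the first event
x2 <- applyTransformation(x1, event$tf$state, 2)    # state just after the second event
p1 <- applyTransformation(c(0.1, 0.2, 0.3), event$tf$param, 1)
```

In this sketch the state is advanced directly from one event to the next; in an actual simulation the solver integrates the model between the event times.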

R-functions

The icpm-kth/SBtabVFGEN package has functions to import a model, together with its data, into R data structures. If it is not installed yet:

remotes::install_github("icpm-kth/SBtabVFGEN")

and once it is installed:

model.tsv <- dir(pattern="[.]tsv$",full.names=TRUE) # a character vector of paths to TSV files
model.tab <- SBtabVFGEN::sbtab_from_tsv(model.tsv)  # a list of data.frames
experiments <- SBtabVFGEN::sbtab.data(model.tab)    # a list of simulation experiments with data

The experiments variable is a list; each member (itself a list) describes a simulation experiment. Each experiment may result in one or more calls to the simulator (in C this amounts to resetting the driver of the chosen solver). An experiment has the following properties:

| item | type | meaning |
|---|---|---|
| input | numeric vector | known input parameters |
| initialTime | numeric vector | a scalar time \(t_0\) |
| initialState | numeric vector | initial state \(x_0\) of the ODE \(\dot x = f(x(t),t,p=c(k,u))\), \(x(t_0)=x_0\) |
| outputTimes | numeric vector | a time vector that corresponds to when measurements were taken |
| outputValues | data.frame | the values of the data at the above outputTimes |
| errorValues | data.frame | an indication of the measurement error/noise |
| events | list | a sudden transformation event |
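To make the shapes concrete, here is a hand-written list with the properties from the table above, together with a small consistency check. All values, the state and input names, and the helper `checkExperiment` are invented for illustration; the list returned by the package may contain more than this:

```r
# A hand-built experiment with the documented items (values are made up).
experiment <- list(
  input        = c(treatment_dose = 200),
  initialTime  = -1.0,
  initialState = c(A = 0, B = 100),
  outputTimes  = c(-1, 0, 1),
  outputValues = data.frame(A_out = c(1.20, 11.8, 31.7), totalB = c(201.1, 203.2, 198.7)),
  errorValues  = data.frame(A_out = c(0.012, 0.12, 0.19), totalB = c(12, 11, 13)),
  events       = NULL   # no scheduled events in this experiment
)

# Illustrative check: one row of data per output time, and error estimates
# with the same shape and names as the values.
checkExperiment <- function(e) {
  length(e$initialTime) == 1 &&
    nrow(e$outputValues) == length(e$outputTimes) &&
    identical(dim(e$outputValues), dim(e$errorValues)) &&
    identical(names(e$outputValues), names(e$errorValues))
}

ok <- checkExperiment(experiment)
```

A check of this kind can be run over all experiments with `vapply(experiments, checkExperiment, logical(1))`, assuming each experiment carries the items listed in the table.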

Gaussian Measurement errors

For Gaussian noise, errorValues can be the standard deviation of the mean. The data frame has the same shape and names as the output values. This is usually written as

outputValues ± errorValues
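For the Gaussian case, a scoring function could look like the sketch below. It assumes that errorValues holds the standard deviation of each measurement and that the simulation result has been arranged in the same shape as outputValues. The function name `gaussianLogLikelihood` and the simulated numbers are hypothetical, not part of the package:

```r
# Gaussian log-likelihood of the data, given simulated outputs of the same shape.
gaussianLogLikelihood <- function(simulated, outputValues, errorValues) {
  y     <- as.matrix(outputValues)
  sigma <- as.matrix(errorValues)
  h     <- as.matrix(simulated)
  stopifnot(all(dim(y) == dim(sigma)), all(dim(y) == dim(h)))
  # Missing measurements (NA) are ignored in the sum.
  sum(dnorm(y, mean = h, sd = sigma, log = TRUE), na.rm = TRUE)
}

outputValues <- data.frame(A_out = c(1.20, 11.8, 31.7))
errorValues  <- data.frame(A_out = c(0.012, 0.12, 0.19))
simulated    <- data.frame(A_out = c(1.21, 11.7, 31.9))  # made-up simulation result

logLik <- gaussianLogLikelihood(simulated, outputValues, errorValues)
```

Larger values of `logLik` indicate a closer match between simulation and data, weighted by the stated measurement errors.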

For other error models, or noise distributions, the user can decide what kind of values are useful and use them in their custom scoring functions (untested by us).