Importing Data
Some applications require the data to be kept separate from the model, for example when several competing models exist for the same data-set. Our focus is on developing a single model for a given data-set, so we store model and data together. With this layout, competing models each need their own copy of the data-set in their directory. On systems that support symbolic (or hard) links, links can replace the copies to conserve disk space for large data-sets.
The data-sets themselves can be stored in a normal SBtab table, using any valid `TableName`. The number of data-sets and their types are listed in the table of experiments: `Experiments.tsv`.
Each experiment is either a time series or a dose response curve. In a dose response curve, each line is treated as a separate time series with a single measurement time-point: only the state at that final time counts.
In a time series table, the first column is called `!TimePoint`, the second is typically `!Time`, followed by the measured quantities labelled as `>outputFunctionID` (values) and `~outputFunctionID` (error estimates).
Time Series
label | example | meaning |
---|---|---|
`!TimePoint` | `Experiment0Time0` | same as `!ID`, but for times (a unique string) |
`!Time` | `-0.5` | a floating point constant |
`>outputFunctionID` | `>Calcium_Out` | values that are meant to be compared to the Output called `Calcium_Out` |
`~outputFunctionID` | `~Calcium_Out` | error estimates for the values in `>outputFunctionID` |
Each time series typically requires one model simulation to reproduce (unless scheduled events are happening).
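As an illustration of the column layout described above, the following sketch builds a small time series table in R and writes it as a tab-separated file. All numbers are hypothetical, the file name is made up, and the sketch omits the `!!SBtab` declaration line that SBtab files normally begin with.

```r
# Hypothetical measurements, for illustration only.
ts <- data.frame(
  TimePoint = c("Experiment0Time0", "Experiment0Time1", "Experiment0Time2"),
  Time      = c(-0.5, 0.0, 0.5),
  value     = c(0.10, 0.42, 0.37),
  error     = c(0.02, 0.05, 0.04),
  stringsAsFactors = FALSE
)
# SBtab column labels are not syntactic R names, so they are assigned afterwards.
names(ts) <- c("!TimePoint", "!Time", ">Calcium_Out", "~Calcium_Out")

# Write a tab-separated file into a temporary directory (assumed file name).
tsv.file <- file.path(tempdir(), "Experiment0.tsv")
write.table(ts, file = tsv.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Split the measured columns by prefix: ">" marks values, "~" marks errors.
values <- ts[, startsWith(names(ts), ">"), drop = FALSE]
errors <- ts[, startsWith(names(ts), "~"), drop = FALSE]
```

The prefix-based split only mirrors the naming convention of the table; it is not the parser used by the package.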
Dose Response
A dose response experiment maps increasing values of an input to output values. The output must be measured at one pre-defined time-point for each dose. During parsing, a dose-response curve is transformed into n time-series experiments, where n is the number of content rows (excluding headers).
label | example | meaning |
---|---|---|
`!ID` | `Experiment0Dose0` | a unique string, identifying this dose |
`>anInput` | `300` | a valid value for one of the input parameters |
`>outputFunctionID` | `>Calcium_Out` | values that are meant to be compared to the Output called `Calcium_Out` |
`~outputFunctionID` | `~Calcium_Out` | error estimates for the values in `>outputFunctionID` |
A dose response curve requires n simulations of the model to reproduce.
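The expansion of a dose response table into n single-time-point experiments can be sketched as follows. This is not the SBtabVFGEN implementation: the helper `dose_to_time_series`, the measurement time `t.measure`, and all numbers are assumptions made for illustration. Because inputs and outputs share the `>` prefix, the sketch takes their names as explicit arguments.

```r
# Hypothetical dose response table with n = 3 content rows.
dose <- data.frame(
  ID    = c("Experiment0Dose0", "Experiment0Dose1", "Experiment0Dose2"),
  input = c(100, 200, 300),
  value = c(0.11, 0.25, 0.48),
  error = c(0.02, 0.03, 0.05),
  stringsAsFactors = FALSE
)
names(dose) <- c("!ID", ">anInput", ">Calcium_Out", "~Calcium_Out")

t.measure <- 10  # assumed measurement time, shared by all doses

# Illustrative helper: one list entry (a one-point time series) per row.
dose_to_time_series <- function(dose, input.names, output.names, t.out) {
  lapply(seq_len(nrow(dose)), function(i) {
    list(
      input        = unlist(dose[i, paste0(">", input.names), drop = FALSE]),
      outputTimes  = t.out,
      outputValues = dose[i, paste0(">", output.names), drop = FALSE],
      errorValues  = dose[i, paste0("~", output.names), drop = FALSE]
    )
  })
}

series <- dose_to_time_series(dose, "anInput", "Calcium_Out", t.measure)
names(series) <- dose[["!ID"]]
```

Each element of `series` then needs its own simulation, which is why a curve with n rows costs n simulations.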
Scheduled Events
An experiment can contain sudden events. In systems biology, this is useful for describing experiments that include an intervention at a specified time (activation, silencing, stimulus, action potential, etc.). These events happen much faster than the system dynamics, and modelling them exactly would slow down the solver dramatically. Instead, when an event occurs at time \(t\), the solver is stopped and a linear (or affine) transformation is applied to the current state \(x(t)\) and to the parameters \(p\) (this happens in C):
\[ x(t) := A x(t) + b \]
(This is an assignment, not a mathematical equality: the state is discontinuous at \(t\) if \(A\) and \(b\) have non-trivial values. The parameters \(p\) are transformed in the same way, with their own \(A\) and \(b\).)
Each `experiment[[i]]$event` has this structure:

- `experiment[[1]]$event`
  - `time` (a numeric vector, the event schedule)
  - `tf` (a named list with two items, both affine transformations)
    - `state` (the state transformation, a named list)
      - `A` (a three dimensional array, where the third dimension corresponds to the time)
      - `b` (a three dimensional array, with the second dimension always being 1, and third as long as the time vector)
    - `param` (the parameter transformation, a named list)
      - `A` (a three dimensional array, where the third dimension corresponds to the time)
      - `b` (a three dimensional array, with the second dimension always being 1, and third as long as the time vector)
- `experiment[[2]]$event`
  - […]
- `experiment[[3]]$event`
  - […]

Every experiment has the same type of content. In some cases it is permissible to omit parts that are trivial (\(b\) is trivial if it is `0`, the neutral element of addition). Omitted items can then be `NULL` (this may not work in some combinations and is subject to improvement).
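To make the array shapes concrete, here is a minimal R sketch that builds one event list by hand and applies a transformation to a state vector. The sizes, event times, and the helpers `identity_stack` and `apply_tf` are hypothetical; the actual transformation is carried out by the solver code in C, not by R code like this.

```r
n.state <- 2                 # hypothetical number of state variables
n.param <- 3                 # hypothetical number of parameters
event.times <- c(1.0, 5.0)   # hypothetical event schedule
n.events <- length(event.times)

# One n-by-n identity matrix per event time, stacked along the third dimension.
identity_stack <- function(n, m) array(rep(diag(n), m), dim = c(n, n, m))

event <- list(
  time = event.times,
  tf = list(
    state = list(
      A = identity_stack(n.state, n.events),
      b = array(0, dim = c(n.state, 1, n.events))
    ),
    param = list(
      A = identity_stack(n.param, n.events),
      b = array(0, dim = c(n.param, 1, n.events))
    )
  )
)

# First event: set the second state variable to 2 (multiply by 0, then add 2).
event$tf$state$A[2, 2, 1] <- 0
event$tf$state$b[2, 1, 1] <- 2

# Apply the k-th affine transformation to a vector v: v := A v + b
apply_tf <- function(tf, v, k) as.numeric(tf$A[, , k] %*% v + tf$b[, , k])

x <- c(1.5, 0.3)
x.after <- apply_tf(event$tf$state, x, 1)
```

In this sketch the second event is left as the identity with a zero offset, which is the trivial case mentioned above.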
R-functions
The icpm-kth/SBtabVFGEN package has functions to import a model together with its data into R data structures. If you have not installed it yet:

```r
remotes::install_github("icpm-kth/SBtabVFGEN")
```

Once it is installed:

```r
model.tsv <- dir(pattern="[.]tsv$",full.names=TRUE) # a character vector of paths to TSV files
model.tab <- SBtabVFGEN::sbtab_from_tsv(model.tsv) # a list of data.frames
experiments <- SBtabVFGEN::sbtab.data(model.tab) # a list of simulation experiments with data
```
The `experiments` variable is a list; each member (itself a list) describes a simulation experiment. Each experiment may result in one or more calls to the simulator (in C this amounts to resetting the driver of the chosen solver). An experiment has the properties:
item | type | meaning |
---|---|---|
input | numeric vector | known input parameters |
initialTime | numeric vector | a scalar time \(t_0\) |
initialState | numeric vector | initial state \(x_0\) of the ode \(\dot x = f(x(t),t,p=c(k,u))\), \(x(t_0)=x_0\) |
outputTimes | numeric vector | a time vector that corresponds to when measurements were taken |
outputValues | data.frame | the values of the data at the above outputTimes |
errorValues | data.frame | an indication of the measurement error/noise |
events | list | a sudden transformation event |
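The properties in the table suggest a few consistency checks, sketched below on a hand-made mock experiment. The mock values and the helper `check_experiment` are hypothetical and not part of the package; real experiments come from `sbtab.data()`. The event entry is left out of the mock for brevity.

```r
# Mock experiment with hypothetical values.
experiment <- list(
  input        = c(anInput = 300),
  initialTime  = -1.0,
  initialState = c(Calcium = 0.0),
  outputTimes  = c(-0.5, 0.0, 0.5),
  outputValues = data.frame(Calcium_Out = c(0.10, 0.42, 0.37)),
  errorValues  = data.frame(Calcium_Out = c(0.02, 0.05, 0.04))
)

# Illustrative check: types and shapes agree with the table of properties.
check_experiment <- function(e) {
  stopifnot(
    is.numeric(e$input),
    is.numeric(e$initialTime), length(e$initialTime) == 1,
    is.numeric(e$initialState),
    nrow(e$outputValues) == length(e$outputTimes),
    identical(dim(e$outputValues), dim(e$errorValues)),
    identical(names(e$outputValues), names(e$errorValues))
  )
  invisible(TRUE)
}

ok <- check_experiment(experiment)
```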
Gaussian Measurement errors
For Gaussian noise, `errorValues` can be the standard deviation of the mean. The data frame has the same shape and names as the output values. Measurements are then conventionally reported as `outputValues ± errorValues`.
For other error models, or noise distributions, the user can decide what kind of values are useful and use them in their custom scoring functions (untested by us).
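For the Gaussian case, a custom scoring function could look like the sketch below, which sums the log-density of each measurement given a simulated output. The function name `gaussian_log_likelihood`, the `simulated` data frame, and all numbers are assumptions for illustration; the sketch only presumes that the simulation result has the same shape as `outputValues`.

```r
# Hypothetical data and simulation output for one experiment.
outputValues <- data.frame(Calcium_Out = c(0.10, 0.42, 0.37))
errorValues  <- data.frame(Calcium_Out = c(0.02, 0.05, 0.04))
simulated    <- data.frame(Calcium_Out = c(0.12, 0.40, 0.37))

# Sum of independent Gaussian log-densities, with errorValues as the
# standard deviations; missing measurements are skipped.
gaussian_log_likelihood <- function(simulated, outputValues, errorValues) {
  y  <- as.matrix(outputValues)
  s  <- as.matrix(errorValues)
  h  <- as.matrix(simulated)
  ok <- !is.na(y) & !is.na(s)
  sum(dnorm(y[ok], mean = h[ok], sd = s[ok], log = TRUE))
}

ll <- gaussian_log_likelihood(simulated, outputValues, errorValues)
```

Treating the measurements as independent is a simplification made for this sketch; a different noise model would replace `dnorm` with the matching density.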