Importing Data
The data is part of the SBtab document, but it can also be seen as separate from the model, so it's imported with a different function. This function also loads the inputs and initial values for the model. When determining the inputs, the function takes conservation laws into account and calculates input values for conserved quantities, based on the stated initial values. The imported object `ex` is a list of simulation instructions:
```r
library(uqsa)
library(SBtabVFGEN)

f <- uqsa_example('AKAR4cl')                   # file names
sb <- SBtabVFGEN::sbtab_from_tsv(f)            # a list of data.frames
#> [tsv] file[1] «100nM.tsv» belongs to Document «AKAR4cl»
#> I'll take this as the Model Name.
#> 100nM.tsv 25nM.tsv 400nM.tsv Compound.tsv Experiments.tsv Output.tsv Parameter.tsv Reaction.tsv
cl <- readRDS(uqsa_example('AKAR4cl',f="RDS")) # previously saved laws
ex <- SBtabVFGEN::sbtab.data(sb,cl)            # includes the data

ex[[1]]$input
#> AKAR4_C_ConservedConst   AKAR4_ConservedConst
#>                    0.4                   -0.2
ex[[1]]$initialState
#> AKAR4p      C
#>    0.0    0.4
```
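Since `ex` is a named list with one entry per experiment, its contents can be surveyed directly; a small sketch:

```r
# one entry per experiment, named after the rows of the Experiments table
names(ex)
# the measurement time range of each experiment
sapply(ex, function(e) range(e$outputTimes))
```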
The data-sets themselves can be stored in a normal SBtab table, using any valid `TableName`. The number of data sets and their types are listed in the table of experiments: `Experiments.tsv`. Data not listed in the table of experiments is not loaded by the `sbtab.data` function:
```r
sb$Experiments
#>                   !Name    >C
#> AKAR4_400nM AKAR4_400nM 0.400
#> AKAR4_100nM AKAR4_100nM 0.100
#> AKAR4_25nM   AKAR4_25nM 0.025
#>                                                                                    !Comment
#> AKAR4_400nM Figure 3—figure supplement 1 in https://doi.org/10.7554/eLife.68164 (dark blue)
#> AKAR4_100nM      Figure 3—figure supplement 1 in https://doi.org/10.7554/eLife.68164 (gray)
#> AKAR4_25nM  Figure 3—figure supplement 1 in https://doi.org/10.7554/eLife.68164 (light blue)
```
```r
# The first one has this data-set
head(ex$AKAR4_400nM$outputValues)
#>          AKAR4pOUT
#> E400T001    108.60
#> E400T002    111.35
#> E400T003    108.75
#> E400T004    111.40
#> E400T005    111.70
#> E400T006    113.25
```
The values in `ex` are simulation instructions for all experiments:
| item | type | meaning |
|---|---|---|
| `input` | numeric vector | known input parameters |
| `initialTime` | numeric vector | a scalar time \(t_0\) |
| `initialState` | numeric vector | initial state \(x_0\) of the ODE \(\dot x = f(x(t),t,p=c(k,u))\), \(x(t_0)=x_0\) |
| `outputTimes` | numeric vector | a time vector that corresponds to when measurements were taken |
| `outputValues` | data.frame | the values of the data at the above `outputTimes` |
| `errorValues` | data.frame | an indication of the measurement error/noise (standard error) |
| `events` | list | a sudden transformation event |
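As an illustration, these entries can be inspected for a single experiment; a minimal sketch, using the `AKAR4_400nM` experiment from above:

```r
e <- ex$AKAR4_400nM
e$input               # known input parameters (here: conserved quantities)
e$initialTime         # scalar t0 at which the simulation starts
e$initialState        # initial state x0 of the ODE
e$outputTimes         # measurement times, matching the rows of outputValues
head(e$outputValues)  # measured data
head(e$errorValues)   # standard errors for outputValues
```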
But any data sets not present in `Experiments` are still accessible:
```r
head(sb$AKAR4_400nM)
#>          !Time >AKAR4pOUT ~AKAR4pOUT
#> E400T001   -15     108.60    14.7950
#> E400T002   -10     111.35    14.9325
#> E400T003    -5     108.75    14.8025
#> E400T004     0     111.40    14.9350
#> E400T005     5     111.70    14.9500
#> E400T006    10     113.25    15.0275
```
This works for any data set that has a TSV file, or a sheet in any other valid format (ods, xlsx). Such a table is just not a simulation instruction:

- `sb` has the values of all tsv files/sheets
- `ex` has data and simulation instructions for the rgsl package
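To see the difference, one can compare the names of the two lists; a small sketch:

```r
# everything in the SBtab document that was not imported as an experiment
# (model tables and any data sets not listed in Experiments)
setdiff(names(sb), names(ex))
```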
After import, each experiment is a time series experiment (dose response sets are converted to one time series experiment per dose).
More about this topic can be found in the SBtab section.
The rest of this page explains SBtab features that are related to data-sets and experiment descriptions.
Time Series
In a time series table, the first column is called `!TimePoint`, the second is typically `!Time`, followed by the measured quantities labelled as `>outputFunctionID` (values) and `~outputFunctionID` (error estimates).
| Label | Example | Meaning |
|---|---|---|
| `!TimePoint` | `Experiment0Time0` | same as `!ID`, but for times (a unique string) |
| `!Time` | `-0.5` | a floating point constant |
| `>outputFunctionID` | `>Calcium_Out` | values that are meant to be compared to the Output called `Calcium_Out` |
| `~outputFunctionID` | `~Calcium_Out` | error estimates for the values in `>outputFunctionID` |
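Put together, a time series table could look like this (a hypothetical example following the column labels above; the table name `DataSetAlpha` and the values are illustrative):

```
!!SBtab Document='myModel' TableName='DataSetAlpha'
!TimePoint  !Time  >Calcium_Out  ~Calcium_Out
E0T0         -0.5          0.10          0.01
E0T1          0.0          0.12          0.01
E0T2          0.5          0.35          0.02
```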
Each time series typically requires one model simulation to reproduce (unless scheduled events are happening).
Dose Response
A dose response experiment maps an increasing input to output values. In such cases, the output has to happen at one pre-defined time-point for each dose. These dose-response curves are transformed into n time-series experiments during parsing, where n is the number of content rows (excluding headers).
| label | example | meaning |
|---|---|---|
| `!ID` | `Experiment0Dose0` | a unique string, identifying this dose |
| `>anInput` | `300` | a valid value for one of the input parameters |
| `>outputFunctionID` | `>Calcium_Out` | values that are meant to be compared to the Output called `Calcium_Out` |
| `~outputFunctionID` | `~Calcium_Out` | error estimates for the values in `>outputFunctionID` |
A dose response curve requires n simulations of the model to reproduce.
Example Dose Response Curve
This curve has two inputs and one output (`A_final`); each line is a separate time series with one (final) measurement time point.
```
!!SBtab Document='myModel' TableName='DataSetGamma'
!ID   >treatment_dose  >treatment_duration  >A_final  ~A_final
E0D0              200                   50      50.1       1.2
E0D1             1000                   50      83.2       0.9
E0D2             7000                   25      74.7       1.8
```
Anything more complex has to be expressed as a Time Series with an event schedule. The measurement time is taken from the `Experiments` table (`Experiments.tsv`).
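For intuition, here is a hedged sketch of the kind of per-dose instructions the importer produces from `DataSetGamma` (the names, list layout, and measurement time here are illustrative; `sbtab.data` performs this conversion automatically):

```r
measurementTime <- 50  # assumed value; in practice taken from the Experiments table
doses <- data.frame(
  treatment_dose     = c(200, 1000, 7000),
  treatment_duration = c(50, 50, 25),
  A_final            = c(50.1, 83.2, 74.7),
  A_final_error      = c(1.2, 0.9, 1.8)
)
# one time-series instruction per content row (per dose)
exGamma <- lapply(seq_len(nrow(doses)), function(i) {
  list(
    input        = c(treatment_dose     = doses$treatment_dose[i],
                     treatment_duration = doses$treatment_duration[i]),
    outputTimes  = measurementTime,
    outputValues = data.frame(A_final = doses$A_final[i]),
    errorValues  = data.frame(A_final = doses$A_final_error[i])
  )
})
length(exGamma)  # n = 3 simulations needed to reproduce this curve
```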
Scheduled Events
An experiment can contain sudden events; in systems biology this is useful to describe experiments that include an intervention at a specified time (activation, silencing, stimulus, action potential, etc.). These events happen much faster than the system dynamics, and modelling them exactly would slow down the solver dramatically. In the case of an event at time \(t\), the solver is stopped and a transformation is applied to the current state \(x(t)\) and parameters \(p\) (in our C code, in the `rgsl` package, not with `deSolve`):
- The SBtab table `Transformation` (`Transformation.tsv` as a file) defines which transformations the system is subject to.
- An event table defines, for each experiment, when the transformations apply.
- The transformation function (C code) is auto-generated by RPN-derivative/sh/ode.sh.
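Stated as a formula (our notation, not part of the SBtab specification): at a scheduled event time \(t_i\), the state is replaced according to \(x(t_i^+) = g(x(t_i^-), p, \mathrm{dose}_i)\), where \(g\) is the transformation named in the schedule and \(\mathrm{dose}_i\) comes from the `!Dose` column.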
An example of a `Transformation` table, from the Spike model:
```
!!SBtab Document='Spike' TableName='Transformation'
!ID    >Ca  >Buffer
APCa   Ca   A*(k2-k1)*dose
```
These transformations apply at times given in an event schedule (the `!Dose` values are all 1.0 here, so the dose is effectively not used):
```
!!SBtab Document='Spike' TableName='AP20Hz'
!ID          !Time  !Transformation  !Dose
Freq20HzAP0      0  APCa               1.0
Freq20HzAP1     50  APCa               1.0
Freq20HzAP2    100  APCa               1.0
Freq20HzAP3    150  APCa               1.0
Freq20HzAP4    200  APCa               1.0
Freq20HzAP5    250  APCa               1.0
Freq20HzAP6    300  APCa               1.0
Freq20HzAP7    350  APCa               1.0
Freq20HzAP8    400  APCa               1.0
```
And this schedule is applied to an experiment by listing it in the Experiment table:
```
!!SBtab Document='Spike' TableName='Experiment'
!ID    !Event  !T0
E20Hz  AP20Hz   -5
```
The `!Event` column specifies the event schedule for this experiment (each experiment is one row).
This setup will create this C function:
```c
int Spike_event(double t, double y_[], void *par, int EventLabel, double dose)
{
	double *p_=par;
	if (!y_ || !par || EventLabel<0) return 1;
	enum eventLabel { APCa, numEvents }; /* event name indexes */
	enum stateVariable { var_Ca,var_Buffer, numStateVar }; /* state variable indexes */
	enum param { par_k1,par_k2,par_A,par_CaBase, numParam }; /* parameter indexes */
	double k1=p_[0];
	double k2=p_[1];
	double A=p_[2];
	double CaBase=p_[3];
	double Ca=y_[0];
	double Buffer=y_[1];
	double rf1=Buffer;
	double rf2=-k1*k2*Ca-(k1+k2)*Buffer;
	switch(EventLabel){
	case APCa:
		y_[var_Buffer] = A*(k2-k1)*dose; /* state variable transformation */
		break;
	}
	return GSL_SUCCESS;
}
```
Any number of transformations can be added to the table. The `switch` statement will select the correct one to perform, based on the label (integer/enum). In this case, there is only one transformation. See Transformation Events for more information.
Gaussian Measurement Errors
For Gaussian noise, the `~xyz` values can be the standard deviation of the mean (the standard error). The `errorValues` data frame has the same shape and column names as `outputValues`. This is conventionally written as

outputValues ± errorValues
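Under this assumption, a scoring function could compare a simulated trajectory to the data. A minimal sketch, assuming `yout` is a simulated output with the same shape as `outputValues` (the name `yout` and this function are illustrative, not part of the package API):

```r
# Gaussian log-likelihood of simulated output `yout` for experiment `e`
gaussianLogLikelihood <- function(yout, e) {
  z <- (as.matrix(e$outputValues) - as.matrix(yout)) / as.matrix(e$errorValues)
  -0.5 * sum(z^2, na.rm = TRUE)
}
```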
For other error models or noise distributions, the user can decide what kind of values are useful and use them in their custom scoring functions (untested by us).