Importing Models into R
models.Rmd
If the model can be described in terms of reactions, inputs, and outputs, then the most straightforward way to define it is through SBtab tables.
For completely different models (or solvers), the user must either write their own likelihood function (cost-function/objective-function) or write a vf file manually, to generate C code from.
It is also possible to directly write the model functions in C (for
gsl_odeiv2
).
The SBtab path uses these functions:
f <- dir(pattern="[.]tsv$",full.names=TRUE) # file names
sb <- SBtabVFGEN::sbtab_from_tsv(f) # a list of data.frames
cl <- SBtabVFGEN::sbtab_to_vfgen(sb) # conservation laws, if any
ex <- SBtabVFGEN::sbtab.data(sb,cl) # includes the data
The SBtab model has now been loaded into R as
data.frames
, and also converted into an ODE. The data has
been loaded as well.
The rest of this page describes the variables we just created, as well as the written files.
Real Example
library(uqsa)
library(SBtabVFGEN)
f <- uqsa_example('AKAR4') # file names
print(f[1])
#> [1] "/tmp/RtmpMhLIv3/temp_libpath2362eb7e3ffb7c/uqsa/extdata/AKAR4/AKAR4_100nM.tsv"
sb <- SBtabVFGEN::sbtab_from_tsv(f) # a list of data.frames
#> [tsv] file[1] «AKAR4_100nM.tsv» belongs to Document «AKAR4»
#> I'll take this as the Model Name.
#> AKAR4_100nM.tsv AKAR4_25nM.tsv AKAR4_400nM.tsv AKAR4_Compound.tsv AKAR4_Experiments.tsv AKAR4_Output.tsv AKAR4_Parameter.tsv AKAR4_Reaction.tsv
cl <- SBtabVFGEN::sbtab_to_vfgen(sb) # conservation laws, if any
#> Document Name: AKAR4.
#> SBtab has 8 tables.
#> The names of SBtab[[1]]:
#> !Time, >AKAR4pOUT, ~AKAR4pOUT
#> !Name !KineticLaw
#> reaction_1 reaction_1 kf_C_AKAR4*C*AKAR4 - kb_C_AKAR4*AKAR4_C
#> reaction_2 reaction_2 kcat_AKARp*AKAR4_C
#> !ReactionFormula !IsReversible
#> reaction_1 C + AKAR4 <=> AKAR4_C 1
#> reaction_2 AKAR4_C <=> AKAR4p + C 0
#> There is no «Constant» Table in this model. This is OK.
#> There is no «Expression» Table in this model. This is OK.
#> compound names:
#> [1] "AKAR4" "AKAR4_C" "AKAR4p" "C"
#> Units:
#> [1] "micromole/liter" "micromole/liter" "micromole/liter" "micromole/liter"
#> ---
#> There is no «Input» Table in this model.
#> units in «!Unit» column:
#> [1] "micromole/liter" "1/(micromolarity*second)"
#> [3] "1/second" "liter"
#> automatically created sbml unit ids:
#> [1] "micromole_per_liter" "one_over_micromolarity_second"
#> [3] "one_over_second" "liter"
#> [1] "Compartment"
#> [1] "AKAR4"
#> unit of species 1 (AKAR4): «micromole_per_liter»
#> [1] "AKAR4_C"
#> unit of species 2 (AKAR4_C): «micromole_per_liter»
#> [1] "AKAR4p"
#> unit of species 3 (AKAR4p): «micromole_per_liter»
#> [1] "C"
#> unit of species 4 (C): «micromole_per_liter»
#> adding SBtab parameter 1: «kf_C_AKAR4»
#> unit of parameter 1 (kf_C_AKAR4): «one_over_micromolarity_second»
#> adding SBtab parameter 2: «kb_C_AKAR4»
#> unit of parameter 2 (kb_C_AKAR4): «one_over_second»
#> adding SBtab parameter 3: «kcat_AKARp»
#> unit of parameter 3 (kcat_AKARp): «one_over_second»
#> adding reaction 1: «reaction_1»
#> converting flux to mathml: «kf_C_AKAR4*C*AKAR4 - kb_C_AKAR4*AKAR4_C»
#> reaction rates in SBML need to have the unit «substance/time» rather than «concentration/time».
#> So conventional kinetics need to be multiplied by compartment volumes.
#> Since this model has only one Compartment, all kinetics will be multiplied by its volume for the SBML model.
#> [SBML] Reactants:
#> [1] "C" "AKAR4"
#> [SBML] Products:
#> [1] "AKAR4_C"
#> ---
#> [1] "C" "" "C"
#> [1] "AKAR4" "" "AKAR4"
#> [1] "AKAR4_C" "" "AKAR4_C"
#> adding reaction 2: «reaction_2»
#> converting flux to mathml: «kcat_AKARp*AKAR4_C»
#> reaction rates in SBML need to have the unit «substance/time» rather than «concentration/time».
#> So conventional kinetics need to be multiplied by compartment volumes.
#> Since this model has only one Compartment, all kinetics will be multiplied by its volume for the SBML model.
#> [SBML] Reactants:
#> [1] "AKAR4_C"
#> [SBML] Products:
#> [1] "AKAR4p" "C"
#> ---
#> [1] "AKAR4_C" "" "AKAR4_C"
#> [1] "AKAR4p" "" "AKAR4p"
#> [1] "C" "" "C"
#> output 1:
#> [1] "AKAR4pOUT"
#> unit of output 1 (AKAR4pOUT): «micromole_per_liter»
#> output 1 has formula:
#> [1] "108 + 380*AKAR4p"
#> 108 + 380 * AKAR4p
#> The sbml file has been named «AKAR4.xml».
#> class(IsConstant): logical.
#> [1] Formula Unit
#> <0 rows> (or 0-length row.names)
#> [1] Formula Unit
#> <0 rows> (or 0-length row.names)
#> character(0)
#> ---
#> character
#> Reaction 1:line (a->b): «C + AKAR4 » ←→ « AKAR4_C»
#> where a: [1] "C " " AKAR4 "
#> and b: [1] " AKAR4_C"
#> Products:
#> 1 × AKAR4_C (AKAR4_C is compound 2)
#> Reactants:
#> 1 × C (C is compound 4)
#> 1 × AKAR4 (AKAR4 is compound 1)
#> Reaction 2:line (a->b): «AKAR4_C » ←→ « AKAR4p + C»
#> where a: [1] "AKAR4_C "
#> and b: [1] " AKAR4p " " C"
#> Products:
#> 1 × AKAR4p (AKAR4p is compound 3)
#> 1 × C (C is compound 4)
#> Reactants:
#> 1 × AKAR4_C (AKAR4_C is compound 2)
#> Number of compounds: 4
#> Number of Reactions: 2
#> [,1] [,2]
#> [1,] 0 1
#> [2,] 1 0
#> [3,] 0 1
#> [4,] 1 -1
#> Stoichiometric Matrix:
#> [,1] [,2]
#> [1,] -1 0
#> [2,] 1 -1
#> [3,] 0 1
#> [4,] -1 1
#> ---
#> Conservation Law dimensions: 4 × 2
#> To check that the conservation laws apply: norm(t(StoichiometryMatrix) * ConservationLaw == 0.00000)
#> [,1] [,2]
#> [1,] 0 1
#> [2,] 1 0
#> [3,] 0 1
#> [4,] 1 -1
#> [1] 2
#> [1] 0 1 0 1
#> [1] 0.2 0.0 0.0 0.0
#> 1*AKAR4_C+1*C
#> This will comment out compound 2 («AKAR4_C», initial value: 0), Conserved Constant = 0.000000
#> [1] 1 0 1 -1
#> [1] 0.2 0.0 0.0 0.0
#> 1*AKAR4+1*AKAR4p-1*C
#> This will comment out compound 1 («AKAR4», initial value: 0.2), Conserved Constant = 0.200000
#> Loading required package: hdf5r
#> StateVariable AKAR4 will be commented out as it was already defined as a Mass Conservation Law Expression.StateVariable AKAR4_C will be commented out as it was already defined as a Mass Conservation Law Expression.
#> The vf content was written to: AKAR4.vf
#> MOD: StateVariable AKAR4 will be commented out as it was already defined as a Mass Conservation Law Expression.
#> MOD: StateVariable AKAR4_C will be commented out as it was already defined as a Mass Conservation Law Expression.
#> The mod content was written to: AKAR4.mod
ex <- SBtabVFGEN::sbtab.data(sb,cl) # includes the data
The ex
variable holds simulation instructions as a plain
R list: inputs, initial values, measurement times, etc. – but also the
actual data.
SBtab content:
## Reactions, as an example of SBtab in R:
print(sb$Reaction)
#> !Name !KineticLaw
#> reaction_1 reaction_1 kf_C_AKAR4*C*AKAR4 - kb_C_AKAR4*AKAR4_C
#> reaction_2 reaction_2 kcat_AKARp*AKAR4_C
#> !ReactionFormula !IsReversible
#> reaction_1 C + AKAR4 <=> AKAR4_C 1
#> reaction_2 AKAR4_C <=> AKAR4p + C 0
## Conservation Laws, as text, and its attributes:
cl$Text
#> [1] "AKAR4_C_ConservedConst = AKAR4_C+1*C"
#> [2] "AKAR4_ConservedConst = AKAR4+1*AKAR4p-1*C"
## data:
head(ex[[1]]$outputValues)
#> AKAR4pOUT
#> E400T001 108.60
#> E400T002 111.35
#> E400T003 108.75
#> E400T004 111.40
#> E400T005 111.70
#> E400T006 113.25
head(ex[[1]]$errorValues)
#> AKAR4pOUT
#> E400T001 14.7950
#> E400T002 14.9325
#> E400T003 14.8025
#> E400T004 14.9350
#> E400T005 14.9500
#> E400T006 15.0275
Data as a plot:
Files
The function sbtab_to_vfgen
will produce several files,
using the model’s name (Document='model_name'
attribute of
SBtab):
-
model_name.vf
, a vfgen compatible file -
model_name.mod
, a NEURON mod file - (optional)
model_name.xml
, an SBML file, if you have the R bindings to libsbml installed in R and libsbml in your system -
model_name.tar.gz
, compressed archive of a text representation of all ODE elements, no specific format -
model_name.zip
, the same astar.gz
, but as a zip archive
Let’s look at the state variables of the ode:
grep '<StateVariable' AKAR4.vf
#> <!-- <StateVariable Name="AKAR4" Description="removed compound" DefaultInitialCondition="0.2" Formula="-reaction_1"/> -->
#> <!-- <StateVariable Name="AKAR4_C" Description="removed compound" DefaultInitialCondition="0" Formula="+reaction_1-reaction_2"/> -->
#> <StateVariable Name="AKAR4p" Description="compound" DefaultInitialCondition="0" Formula="+reaction_2"/>
#> <StateVariable Name="C" Description="compound" DefaultInitialCondition="0" Formula="-reaction_1+reaction_2"/>
Two State variables have been commented out. These two have been replaced by vfgen expressions:
grep '<Expression' AKAR4.vf
#> <Expression Name="AKAR4_C" Description="derived from conservation law 1" Formula="AKAR4_C_ConservedConst - (C)"/>
#> <Expression Name="AKAR4" Description="derived from conservation law 2" Formula="AKAR4_ConservedConst - (AKAR4p-C)"/>
#> <Expression Name="reaction_1" Description="flux" Formula="kf_C_AKAR4*C*AKAR4 - kb_C_AKAR4*AKAR4_C"/>
#> <Expression Name="reaction_2" Description="flux" Formula="kcat_AKARp*AKAR4_C"/>
Finally, we clean up the generated files:
for t in vf tar.gz zip ; do rm ./AKAR4.$t ; done
[ -f AKAR4.xml ] && rm AKAR4.xml # since that one is optional
echo $?
#> 0
The Vector Field file
The vf file is compatible with the vfgen tool, which
creates code in many languages from this vf
file (matlab,
cvode, octave, python, xppaut). Use this tool if you want to use the
created source code for other programming languages. Some SBtab content
will be missing here (e.g. scheduled events). The vfgen tool uses the ginac library to calculate Jacobians.
Ginac cannot parse vf files with inequality expressions in them
(e.g. a < b
).
Converting the ODE to code
We use our tool, RPN-derivative, which
reads the same file (or one of the archive files:
zip
/tar.gz
) to create slightly different C
code. The main advantages are:
- a vector valued output function with a predictable name:
int MODEL_func()
- the error code of each function is the length of the output buffers that need to be allocated
- more functions are created:
- initial conditions:
int MODEL_init()
- default parameter values:
int MODEL_default()
- initial conditions:
The repository of this tool, icpm-kth/RPN-derivative, contains a
shell script ode.sh;
this script uses one of three methods to calculate analytical
derivatives for the Jacobians: 1. maxima, 2. yacas, 3. the
bin/derivative
binary that RPN-derivative contains (after
make && make install
)
Assuming that you followed all installation instructions:
Note: ode.sh
is compatible with the same vf files from
SBtabVFGEN, but doesn’t support all of the output formats that VFGEN can
do, and does nothing related to delays.
Background on Parameters vs Inputs
Our goal is to perform parameter estimation. A systems biology model typically has parameters that are either reaction rate coefficients \(k_f\), equilibrium constants \(K_d\), Hill coefficients, and many other quantities related to kinetc laws; they are often unknown or not uniquely determined yet. But, not all parameters of a model are unknown, and not all model parameters are intrinsic to the system we study. Some of them model the interventions we do to the model when performing an experiment. This could be the frequency of a driving force for a mechanical model, or the amount of a treatment dose (added to the system) that affects the time course of reactions:
- a substrate, or buffer
- an enzyme,
- a silencing agent,
- an inhibitor
These parameters are known to us, because they are written down in the protocol. If the input is time dependent, then its dynamics (or explicit algebraic functions) have to become part of the model. These input parameters can be different between experiments, but the intrinsic parameters are always the same.
The ordinary differential equation on the other hand does not need to know the distinction between known or unknown parameters. For these reasons:
parMCMC <- [...] # some sampling variable
k <- parMap(parMCMC) # a model compatible parameter vector
u <- experiments[[i]]$input # a valid model input vector
p <- c(k,u)
# solve ODE using p
where k
are intrinsic, unknown parameters (subject to
sampling), and u
are the known input parameters that are
encoded in the input
field of an experiment.
If input parameters u
exist, then they are always
concatenated (p <-c(k,u)
) in that order and passed to
the model, the model’s C code only sees p
. The wrapper
functions in the rgsl
package do this concatenation when
the solver is called. We sample in logarithmic space, but the solver
gets k
and appends the right u
for each
experiment, both in linear space.
So, if parMCMC
is in log10-space, then
parMap
is the transformation to linear-space for the solver
to work in:
parMap <- function(par) {
return(10^(par))
}