The Comets Python Toolbox
The Comets python module is intended to offer a programatic, easy and intuitive interface to COMETS
. While it internally uses the same COMETS
Java engine as always, it replaces the legacy parameter files, simulation layouts or models, bash scripts and output files, by python objects in a single environment, which users deal with to perform simulations and analyze the results.
Any comets simulation starts from a layout and a set of parameters. The layout specifies the environment (media metabolites, refresh values, periodic dilutions) and the species present in it, that is, the models. The parameters specify many simulation characteristics, such as number of iterations, timestep, type of metabolite exchange or whether to record different output logs. Therefore, the two main types of objects are layout
and params
. These two are passed to the comets
class, which perform simulations and contain their output.
In this section, we will walk through the basic functionalities of COMETS
using the Python Toolbox, and more specific examples of usage will be provided in the next sections.
Preparing a model for COMETS
The class model
is used to store the genome-scale metabolic models used in COMETS
simulations. Most frequently, we will first load a model using COBRAPy
. Then, we can pass it to the COMETS
model
class, which allows us to change COMETS
-specific model parameters, such as initial population sizes.
import cobra
import cobra.test
import cometspy as c
# Load a textbook example model using the COBRAPy toolbox
test_model = cobra.test.create_test_model('textbook')
# Use the above model to create a COMETS model
test_model = c.model(test_model)
# Change comets specific parameters, e.g. the initial biomass of the model
# Notre
test_model.initial_pop = [0, 0, 1e-7]
Using license file /home/djordje/gurobi.lic
Academic license - for non-commercial use only
Setting COMETS
simulation parameters
COMETS
simulation parameters are stored in the params
class, which contains just a dict
object with the parameter names and values. If we initialize the class without arguments, it will contain the default parameter values (see here). Once loaded, the parameter values can be visualized and modified as desired.
# Create a parameters object with default values
my_params = c.params()
# Change the value of a parameter, for example number of simulation cycles
my_params.set_param('maxCycles', 100)
# Set some writeTotalBiomassLog parameter to True, in order to save the output
my_params.set_param('writeTotalBiomassLog', True)
# See avaliable parameters and their values
my_params.show_params()
VALUE | UNITS | |
---|---|---|
BiomassLogName | biomass.txt | |
BiomassLogRate | 1 | cycles |
FluxLogName | flux_out | |
FluxLogRate | 5 | cycles |
MediaLogName | media_out | |
... | ... | ... |
writeBiomassLog | False | logical |
writeFluxLog | False | logical |
writeMediaLog | False | logical |
writeSpecificMediaLog | False | logical |
writeTotalBiomassLog | True | logical |
62 rows × 2 columns
Preparing a COMETS
simulation layout
The layout class describes the characteristics of the environment, i.e. the "world", including which species (models) are in it. It can be instantiated in empty or using COMETS
models:
- If instantiated without arguments (as
my_layout = c.layout()
), an empty layout is created with all necessary fields that have to be populated. - If a layout is instantiated passing a
model
(or several models), it will generate a layout with all metabolites those models can exchange with the environment at zero concentration, plus metals and ions at unlimited concentration (default -1000).
To examine the different parts of a Comets layout
, let\'s first create one from the above loaded textbook model:
my_layout = c.layout(test_model)
The layout stores information about the species (my_layout.models
) and spatial structure (my_layout.grid
) in the environment. In this case, the model is only the textbook one, and the grid is the default one, which is 1 \times 1 i.e. only one cell.
The layout stores also information about the media as a pandas dataframe. In this case, no amount of any media component is present.
my_layout.media
diff_c | g_refresh | g_static | g_static_val | init_amount | metabolite | |
---|---|---|---|---|---|---|
0 | 0.000005 | 0 | 0 | 0 | 0 | ac_e |
1 | 0.000005 | 0 | 0 | 0 | 0 | acald_e |
2 | 0.000005 | 0 | 0 | 0 | 0 | akg_e |
3 | 0.000005 | 0 | 0 | 0 | 0 | co2_e |
4 | 0.000005 | 0 | 0 | 0 | 0 | etoh_e |
5 | 0.000005 | 0 | 0 | 0 | 0 | for_e |
6 | 0.000005 | 0 | 0 | 0 | 0 | fru_e |
7 | 0.000005 | 0 | 0 | 0 | 0 | fum_e |
8 | 0.000005 | 0 | 0 | 0 | 0 | glc__D_e |
9 | 0.000005 | 0 | 0 | 0 | 0 | gln__L_e |
10 | 0.000005 | 0 | 0 | 0 | 0 | glu__L_e |
11 | 0.000005 | 0 | 0 | 0 | 0 | h2o_e |
12 | 0.000005 | 0 | 0 | 0 | 0 | h_e |
13 | 0.000005 | 0 | 0 | 0 | 0 | lac__D_e |
14 | 0.000005 | 0 | 0 | 0 | 0 | mal__L_e |
15 | 0.000005 | 0 | 0 | 0 | 0 | nh4_e |
16 | 0.000005 | 0 | 0 | 0 | 0 | o2_e |
17 | 0.000005 | 0 | 0 | 0 | 0 | pi_e |
18 | 0.000005 | 0 | 0 | 0 | 0 | pyr_e |
19 | 0.000005 | 0 | 0 | 0 | 0 | succ_e |
metabolite | init_amount | diff_c | g_static | g_static_val | g_refresh_val |
---|---|---|---|---|---|
ca2_e | 1000 | NaN | 1 | 1000 | 0 |
cbl_e | 1000 | NaN | 1 | 1000 | 0 |
cl_e | 1000 | NaN | 1 | 1000 | 0 |
. . . | . . . | . . . | . . . | . . . | . . . |
When initiated from models, the media compounds that can be in the environment are all those for which there is an exchange reaction in at least one of the models. The media, shown in the table above, is a pandas
dataframe where several pieces of information are stored:
init_amount
is the initial amount to be added to each cell of the simulation grid (in mmol).diff_c
indicates whether the molecule has a diffusion constant different than the default one (stored inec_layout.global_diff
)g_static
indicates whether the component should remain at a static value, i.e. without change due to consumption and other effects of the simulation. This is useful for example, for setting some nutrients as unlimited.g_static_val
indicates at which value shouold the nutrient remain static, if the previous coulmn value is 1.g_refresh_val
indicates the amount of the metabolite that should be added after each simulation cycle to each cell of the grid.
In addition, we can set local static
and refresh
values, specific to a cell of the simulation grid.
When a media component is static
, this means that its concentration is returned in each cycle to the set static value. This is used when we want a media component to remain virtually unlimited during a simulation.
When a media component has a refresh
value, this means it will be replenished by adding the set amount at every simulation cycle.
Local refresh values are stored in a list, my_layout.local_refresh
, where each element of the list is itself a list with the form [ x y m1_r m2_r m3_r ... ]
. The first two elements x
and y
represent the coordinates, and are followed by the refresh values for all metabolites, in the same order as in media
.
Local static values are stored in a similar way. Each element of the my_layout.local_static
list is itself a list with the form [ x y m1_s m1_s_v m2_s m2_s_v ... ]
. The difference here is that for each metabolite, there are two values, one defining whether the molecule is to be static at that coordinate (m1_s
, m2_s
, ... ) and another with the value at which it should be kept (m1_s_v
, m2_s_v
, ... ).
Note that both local_refresh
and local_static
can be empty (the default), or contain only entries for the coordinates where there is at least one nonzero refresh or static value, respectively.
Finally, the layout also contains information about the starting biomass of each model. This information is stored in the initial_pop
list. Each component of initial_pop
is itself a list with the format [x y biomass_1 biomass_2 ...]
specifying the amount of biomass of each model in each coordinate.
Running a COMETS
simulation
The comets
class uses a layout object and a parameters object to run simulations and store the output. Running a comets simulation is pretty straightforward. We firstly define the comets
object by passing it a layout
and a params
objects as arguments. Then, we run()
the simulation:
my_simulation = c.comets(my_layout, my_params)
my_simulation.run()
Running COMETS simulation ...
Done!
Checking simulation output and possible errors
In the background, this command invokes the COMETS
Java engine in a console, giving a standard output (stdout) and standard error (stderr) logs. Both can be acessed through the fields run_outputs
and run_errors
, respectively.
print(my_simulation.run_output)
-script
running script file: /home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_script
Current Java version: 11.0.7
Loading layout file '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_layout'...
Found 1 model files!
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Academic license - for non-commercial use only
Academic license - for non-commercial use only
Done!
Testing default parameters...
Done!
Optimizer status code = 5 (looks ok!)
objective solution = 0.8739215069684305
Constructing world...
Done!
medialist ac_e acald_e akg_e co2_e etoh_e for_e fru_e fum_e glc__D_e gln__L_e glu__L_e h2o_e h_e lac__D_e mal__L_e nh4_e o2_e pi_e pyr_e succ_e
Cycle 1
Total biomass:
Model e_coli_core.cmd: 1.0E-7
Cycle 2
Total biomass:
Model e_coli_core.cmd: 1.0E-7
...
Total time = 0.312s
print(my_simulation.run_errors)
STDERR empty.
Accessing the results of the simulation
The results of the successful simulation are stored in several fields in the comets
object, depending on whether the parameters writeTotalBiomasslog
, writeBiomassLog
, writeFluxLog
and writeMediaLog
were set to True
.
- The field
total_biomass
stores the total biomass (summed up over all coordinates) for each timepoint and species. - The field
biomass
stores detailed biomass values for each timepoint, coordinate and species. - The field
media
stores the composition of the media at each timepoint. - The field
fluxes
stores the metabolic fluxes for each species, coordinate and timepoint.
Additionally, specific comets modes will have additional output fields; for instance, if we run an evolution simulation, the field genotypes
will store information about each species such as its ancestor and which mutation it suffered.
All of the output files ae pandas
dataframes which can be further analyzed or plotted using standard Python tools.
my_simulation.total_biomass
cycle | e_coli_core | |
---|---|---|
0 | 0.0 | 1.000000e-07 |
1 | 1.0 | 1.000000e-07 |
2 | 2.0 | 1.000000e-07 |
3 | 3.0 | 1.000000e-07 |
4 | 4.0 | 1.000000e-07 |
... | ... | ... |
96 | 96.0 | 1.000000e-07 |
97 | 97.0 | 1.000000e-07 |
98 | 98.0 | 1.000000e-07 |
99 | 99.0 | 1.000000e-07 |
100 | 100.0 | 1.000000e-07 |
101 rows × 2 columns