The Comets Python Toolbox

The Comets python module is intended to offer a programatic, easy and intuitive interface to COMETS. While it internally uses the same COMETS Java engine as always, it replaces the legacy parameter files, simulation layouts or models, bash scripts and output files, by python objects in a single environment, which users deal with to perform simulations and analyze the results.

Any comets simulation starts from a layout and a set of parameters. The layout specifies the environment (media metabolites, refresh values, periodic dilutions) and the species present in it, that is, the models. The parameters specify many simulation characteristics, such as number of iterations, timestep, type of metabolite exchange or whether to record different output logs. Therefore, the two main types of objects are layout and params. These two are passed to the comets class, which perform simulations and contain their output.

In this section, we will walk through the basic functionalities of COMETS using the Python Toolbox, and more specific examples of usage will be provided in the next sections.

Preparing a model for COMETS

The class model is used to store the genome-scale metabolic models used in COMETS simulations. Most frequently, we will first load a model using COBRAPy. Then, we can pass it to the COMETS model class, which allows us to change COMETS-specific model parameters, such as initial population sizes.

import cobra
import cobra.test
import cometspy as c

# Load a textbook example model using the COBRAPy toolbox 
test_model = cobra.test.create_test_model('textbook')

# Use the above model to create a COMETS model
test_model = c.model(test_model)

# Change comets specific parameters, e.g. the initial biomass of the model
# Notre 
test_model.initial_pop = [0, 0, 1e-7] 
Using license file /home/djordje/gurobi.lic
Academic license - for non-commercial use only
Setting COMETS simulation parameters

COMETS simulation parameters are stored in the params class, which contains just a dict object with the parameter names and values. If we initialize the class without arguments, it will contain the default parameter values (see here). Once loaded, the parameter values can be visualized and modified as desired.

# Create a parameters object with default values 
my_params = c.params()

# Change the value of a parameter, for example number of simulation cycles
my_params.set_param('maxCycles', 100)

# Set some writeTotalBiomassLog parameter to True, in order to save the output
my_params.set_param('writeTotalBiomassLog', True)

# See avaliable parameters and their values
my_params.show_params()
VALUE UNITS
BiomassLogName biomass.txt
BiomassLogRate 1 cycles
FluxLogName flux_out
FluxLogRate 5 cycles
MediaLogName media_out
... ... ...
writeBiomassLog False logical
writeFluxLog False logical
writeMediaLog False logical
writeSpecificMediaLog False logical
writeTotalBiomassLog True logical

62 rows × 2 columns

Preparing a COMETS simulation layout

The layout class describes the characteristics of the environment, i.e. the "world", including which species (models) are in it. It can be instantiated in empty or using COMETS models:

  • If instantiated without arguments (as my_layout = c.layout()), an empty layout is created with all necessary fields that have to be populated.
  • If a layout is instantiated passing a model (or several models), it will generate a layout with all metabolites those models can exchange with the environment at zero concentration, plus metals and ions at unlimited concentration (default -1000).

To examine the different parts of a Comets layout, let\'s first create one from the above loaded textbook model:

my_layout = c.layout(test_model)

The layout stores information about the species (my_layout.models) and spatial structure (my_layout.grid) in the environment. In this case, the model is only the textbook one, and the grid is the default one, which is 1 \times 1 i.e. only one cell.

The layout stores also information about the media as a pandas dataframe. In this case, no amount of any media component is present.

my_layout.media
diff_c g_refresh g_static g_static_val init_amount metabolite
0 0.000005 0 0 0 0 ac_e
1 0.000005 0 0 0 0 acald_e
2 0.000005 0 0 0 0 akg_e
3 0.000005 0 0 0 0 co2_e
4 0.000005 0 0 0 0 etoh_e
5 0.000005 0 0 0 0 for_e
6 0.000005 0 0 0 0 fru_e
7 0.000005 0 0 0 0 fum_e
8 0.000005 0 0 0 0 glc__D_e
9 0.000005 0 0 0 0 gln__L_e
10 0.000005 0 0 0 0 glu__L_e
11 0.000005 0 0 0 0 h2o_e
12 0.000005 0 0 0 0 h_e
13 0.000005 0 0 0 0 lac__D_e
14 0.000005 0 0 0 0 mal__L_e
15 0.000005 0 0 0 0 nh4_e
16 0.000005 0 0 0 0 o2_e
17 0.000005 0 0 0 0 pi_e
18 0.000005 0 0 0 0 pyr_e
19 0.000005 0 0 0 0 succ_e
metabolite init_amount diff_c g_static g_static_val g_refresh_val
ca2_e 1000 NaN 1 1000 0
cbl_e 1000 NaN 1 1000 0
cl_e 1000 NaN 1 1000 0
. . . . . . . . . . . . . . . . . .

When initiated from models, the media compounds that can be in the environment are all those for which there is an exchange reaction in at least one of the models. The media, shown in the table above, is a pandas dataframe where several pieces of information are stored:

  • init_amount is the initial amount to be added to each cell of the simulation grid (in mmol).
  • diff_c indicates whether the molecule has a diffusion constant different than the default one (stored in ec_layout.global_diff)
  • g_static indicates whether the component should remain at a static value, i.e. without change due to consumption and other effects of the simulation. This is useful for example, for setting some nutrients as unlimited.
  • g_static_val indicates at which value shouold the nutrient remain static, if the previous coulmn value is 1.
  • g_refresh_val indicates the amount of the metabolite that should be added after each simulation cycle to each cell of the grid.

In addition, we can set local static and refresh values, specific to a cell of the simulation grid.

When a media component is static, this means that its concentration is returned in each cycle to the set static value. This is used when we want a media component to remain virtually unlimited during a simulation.

When a media component has a refresh value, this means it will be replenished by adding the set amount at every simulation cycle.

Local refresh values are stored in a list, my_layout.local_refresh, where each element of the list is itself a list with the form [ x y m1_r m2_r m3_r ... ]. The first two elements x and y represent the coordinates, and are followed by the refresh values for all metabolites, in the same order as in media.

Local static values are stored in a similar way. Each element of the my_layout.local_static list is itself a list with the form [ x y m1_s m1_s_v m2_s m2_s_v ... ]. The difference here is that for each metabolite, there are two values, one defining whether the molecule is to be static at that coordinate (m1_s, m2_s, ... ) and another with the value at which it should be kept (m1_s_v, m2_s_v, ... ).

Note that both local_refresh and local_static can be empty (the default), or contain only entries for the coordinates where there is at least one nonzero refresh or static value, respectively.

Finally, the layout also contains information about the starting biomass of each model. This information is stored in the initial_pop list. Each component of initial_pop is itself a list with the format [x y biomass_1 biomass_2 ...]specifying the amount of biomass of each model in each coordinate.

Running a COMETS simulation

The comets class uses a layout object and a parameters object to run simulations and store the output. Running a comets simulation is pretty straightforward. We firstly define the comets object by passing it a layout and a params objects as arguments. Then, we run() the simulation:

my_simulation = c.comets(my_layout, my_params)
my_simulation.run()
Running COMETS simulation ...
Done!
Checking simulation output and possible errors

In the background, this command invokes the COMETS Java engine in a console, giving a standard output (stdout) and standard error (stderr) logs. Both can be acessed through the fields run_outputs and run_errors, respectively.

print(my_simulation.run_output)
-script
running script file: /home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_script
Current Java version: 11.0.7
Loading layout file '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_layout'...
Found 1 model files!
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Academic license - for non-commercial use only
Academic license - for non-commercial use only
Done!
 Testing default parameters...
Done!
Optimizer status code = 5 (looks ok!)
objective solution = 0.8739215069684305
Constructing world...
Done!
medialist   ac_e    acald_e akg_e   co2_e   etoh_e  for_e   fru_e   fum_e   glc__D_e    gln__L_e    glu__L_e    h2o_e   h_e lac__D_e    mal__L_e    nh4_e   o2_e    pi_e    pyr_e   succ_e
Cycle 1
Total biomass:
Model e_coli_core.cmd: 1.0E-7
Cycle 2
Total biomass:
Model e_coli_core.cmd: 1.0E-7
...
Total time = 0.312s
print(my_simulation.run_errors)
STDERR empty.
Accessing the results of the simulation

The results of the successful simulation are stored in several fields in the comets object, depending on whether the parameters writeTotalBiomasslog, writeBiomassLog, writeFluxLog and writeMediaLog were set to True.

  • The field total_biomass stores the total biomass (summed up over all coordinates) for each timepoint and species.
  • The field biomass stores detailed biomass values for each timepoint, coordinate and species.
  • The field media stores the composition of the media at each timepoint.
  • The field fluxes stores the metabolic fluxes for each species, coordinate and timepoint.

Additionally, specific comets modes will have additional output fields; for instance, if we run an evolution simulation, the field genotypes will store information about each species such as its ancestor and which mutation it suffered.

All of the output files ae pandas dataframes which can be further analyzed or plotted using standard Python tools.

my_simulation.total_biomass
cycle e_coli_core
0 0.0 1.000000e-07
1 1.0 1.000000e-07
2 2.0 1.000000e-07
3 3.0 1.000000e-07
4 4.0 1.000000e-07
... ... ...
96 96.0 1.000000e-07
97 97.0 1.000000e-07
98 98.0 1.000000e-07
99 99.0 1.000000e-07
100 100.0 1.000000e-07

101 rows × 2 columns