The Comets Python Toolbox
The Comets python module is intended to offer a programatic, easy and intuitive interface to COMETS. While it internally uses the same COMETS Java engine as always, it replaces the legacy parameter files, simulation layouts or models, bash scripts and output files, by python objects in a single environment, which users deal with to perform simulations and analyze the results.
Any comets simulation starts from a layout and a set of parameters. The layout specifies the environment (media metabolites, refresh values, periodic dilutions) and the species present in it, that is, the models. The parameters specify many simulation characteristics, such as number of iterations, timestep, type of metabolite exchange or whether to record different output logs. Therefore, the two main types of objects are layout and params. These two are passed to the comets class, which perform simulations and contain their output.
In this section, we will walk through the basic functionalities of COMETS using the Python Toolbox, and more specific examples of usage will be provided in the next sections.
Preparing a model for COMETS
The class model is used to store the genome-scale metabolic models used in COMETS simulations. Most frequently, we will first load a model using COBRAPy. Then, we can pass it to the COMETS model class, which allows us to change COMETS-specific model parameters, such as initial population sizes.
import cobra
import cobra.test
import cometspy as c
# Load a textbook example model using the COBRAPy toolbox
test_model = cobra.test.create_test_model('textbook')
# Use the above model to create a COMETS model
test_model = c.model(test_model)
# Change comets specific parameters, e.g. the initial biomass of the model
# Notre
test_model.initial_pop = [0, 0, 1e-7]
Using license file /home/djordje/gurobi.lic
Academic license - for non-commercial use only
Setting COMETS simulation parameters
COMETS simulation parameters are stored in the params class, which contains just a dict object with the parameter names and values. If we initialize the class without arguments, it will contain the default parameter values (see here). Once loaded, the parameter values can be visualized and modified as desired.
# Create a parameters object with default values
my_params = c.params()
# Change the value of a parameter, for example number of simulation cycles
my_params.set_param('maxCycles', 100)
# Set some writeTotalBiomassLog parameter to True, in order to save the output
my_params.set_param('writeTotalBiomassLog', True)
# See avaliable parameters and their values
my_params.show_params()
| VALUE | UNITS | |
|---|---|---|
| BiomassLogName | biomass.txt | |
| BiomassLogRate | 1 | cycles |
| FluxLogName | flux_out | |
| FluxLogRate | 5 | cycles |
| MediaLogName | media_out | |
| ... | ... | ... |
| writeBiomassLog | False | logical |
| writeFluxLog | False | logical |
| writeMediaLog | False | logical |
| writeSpecificMediaLog | False | logical |
| writeTotalBiomassLog | True | logical |
62 rows × 2 columns
Preparing a COMETS simulation layout
The layout class describes the characteristics of the environment, i.e. the "world", including which species (models) are in it. It can be instantiated in empty or using COMETS models:
- If instantiated without arguments (as
my_layout = c.layout()), an empty layout is created with all necessary fields that have to be populated. - If a layout is instantiated passing a
model(or several models), it will generate a layout with all metabolites those models can exchange with the environment at zero concentration, plus metals and ions at unlimited concentration (default -1000).
To examine the different parts of a Comets layout, let\'s first create one from the above loaded textbook model:
my_layout = c.layout(test_model)
The layout stores information about the species (my_layout.models) and spatial structure (my_layout.grid) in the environment. In this case, the model is only the textbook one, and the grid is the default one, which is 1 \times 1 i.e. only one cell.
The layout stores also information about the media as a pandas dataframe. In this case, no amount of any media component is present.
my_layout.media
| diff_c | g_refresh | g_static | g_static_val | init_amount | metabolite | |
|---|---|---|---|---|---|---|
| 0 | 0.000005 | 0 | 0 | 0 | 0 | ac_e |
| 1 | 0.000005 | 0 | 0 | 0 | 0 | acald_e |
| 2 | 0.000005 | 0 | 0 | 0 | 0 | akg_e |
| 3 | 0.000005 | 0 | 0 | 0 | 0 | co2_e |
| 4 | 0.000005 | 0 | 0 | 0 | 0 | etoh_e |
| 5 | 0.000005 | 0 | 0 | 0 | 0 | for_e |
| 6 | 0.000005 | 0 | 0 | 0 | 0 | fru_e |
| 7 | 0.000005 | 0 | 0 | 0 | 0 | fum_e |
| 8 | 0.000005 | 0 | 0 | 0 | 0 | glc__D_e |
| 9 | 0.000005 | 0 | 0 | 0 | 0 | gln__L_e |
| 10 | 0.000005 | 0 | 0 | 0 | 0 | glu__L_e |
| 11 | 0.000005 | 0 | 0 | 0 | 0 | h2o_e |
| 12 | 0.000005 | 0 | 0 | 0 | 0 | h_e |
| 13 | 0.000005 | 0 | 0 | 0 | 0 | lac__D_e |
| 14 | 0.000005 | 0 | 0 | 0 | 0 | mal__L_e |
| 15 | 0.000005 | 0 | 0 | 0 | 0 | nh4_e |
| 16 | 0.000005 | 0 | 0 | 0 | 0 | o2_e |
| 17 | 0.000005 | 0 | 0 | 0 | 0 | pi_e |
| 18 | 0.000005 | 0 | 0 | 0 | 0 | pyr_e |
| 19 | 0.000005 | 0 | 0 | 0 | 0 | succ_e |
| metabolite | init_amount | diff_c | g_static | g_static_val | g_refresh_val |
|---|---|---|---|---|---|
| ca2_e | 1000 | NaN | 1 | 1000 | 0 |
| cbl_e | 1000 | NaN | 1 | 1000 | 0 |
| cl_e | 1000 | NaN | 1 | 1000 | 0 |
| . . . | . . . | . . . | . . . | . . . | . . . |
When initiated from models, the media compounds that can be in the environment are all those for which there is an exchange reaction in at least one of the models. The media, shown in the table above, is a pandas dataframe where several pieces of information are stored:
init_amountis the initial amount to be added to each cell of the simulation grid (in mmol).diff_cindicates whether the molecule has a diffusion constant different than the default one (stored inec_layout.global_diff)g_staticindicates whether the component should remain at a static value, i.e. without change due to consumption and other effects of the simulation. This is useful for example, for setting some nutrients as unlimited.g_static_valindicates at which value shouold the nutrient remain static, if the previous coulmn value is 1.g_refresh_valindicates the amount of the metabolite that should be added after each simulation cycle to each cell of the grid.
In addition, we can set local static and refresh values, specific to a cell of the simulation grid.
When a media component is static, this means that its concentration is returned in each cycle to the set static value. This is used when we want a media component to remain virtually unlimited during a simulation.
When a media component has a refresh value, this means it will be replenished by adding the set amount at every simulation cycle.
Local refresh values are stored in a list, my_layout.local_refresh, where each element of the list is itself a list with the form [ x y m1_r m2_r m3_r ... ]. The first two elements x and y represent the coordinates, and are followed by the refresh values for all metabolites, in the same order as in media.
Local static values are stored in a similar way. Each element of the my_layout.local_static list is itself a list with the form [ x y m1_s m1_s_v m2_s m2_s_v ... ]. The difference here is that for each metabolite, there are two values, one defining whether the molecule is to be static at that coordinate (m1_s, m2_s, ... ) and another with the value at which it should be kept (m1_s_v, m2_s_v, ... ).
Note that both local_refresh and local_static can be empty (the default), or contain only entries for the coordinates where there is at least one nonzero refresh or static value, respectively.
Finally, the layout also contains information about the starting biomass of each model. This information is stored in the initial_pop list. Each component of initial_pop is itself a list with the format [x y biomass_1 biomass_2 ...]specifying the amount of biomass of each model in each coordinate.
Running a COMETS simulation
The comets class uses a layout object and a parameters object to run simulations and store the output. Running a comets simulation is pretty straightforward. We firstly define the comets object by passing it a layout and a params objects as arguments. Then, we run() the simulation:
my_simulation = c.comets(my_layout, my_params)
my_simulation.run()
Running COMETS simulation ...
Done!
Checking simulation output and possible errors
In the background, this command invokes the COMETS Java engine in a console, giving a standard output (stdout) and standard error (stderr) logs. Both can be acessed through the fields run_outputs and run_errors, respectively.
print(my_simulation.run_output)
-script
running script file: /home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_script
Current Java version: 11.0.7
Loading layout file '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_layout'...
Found 1 model files!
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Academic license - for non-commercial use only
Academic license - for non-commercial use only
Done!
Testing default parameters...
Done!
Optimizer status code = 5 (looks ok!)
objective solution = 0.8739215069684305
Constructing world...
Done!
medialist ac_e acald_e akg_e co2_e etoh_e for_e fru_e fum_e glc__D_e gln__L_e glu__L_e h2o_e h_e lac__D_e mal__L_e nh4_e o2_e pi_e pyr_e succ_e
Cycle 1
Total biomass:
Model e_coli_core.cmd: 1.0E-7
Cycle 2
Total biomass:
Model e_coli_core.cmd: 1.0E-7
...
Total time = 0.312s
print(my_simulation.run_errors)
STDERR empty.
Accessing the results of the simulation
The results of the successful simulation are stored in several fields in the comets object, depending on whether the parameters writeTotalBiomasslog, writeBiomassLog, writeFluxLog and writeMediaLog were set to True.
- The field
total_biomassstores the total biomass (summed up over all coordinates) for each timepoint and species. - The field
biomassstores detailed biomass values for each timepoint, coordinate and species. - The field
mediastores the composition of the media at each timepoint. - The field
fluxesstores the metabolic fluxes for each species, coordinate and timepoint.
Additionally, specific comets modes will have additional output fields; for instance, if we run an evolution simulation, the field genotypes will store information about each species such as its ancestor and which mutation it suffered.
All of the output files ae pandas dataframes which can be further analyzed or plotted using standard Python tools.
my_simulation.total_biomass
| cycle | e_coli_core | |
|---|---|---|
| 0 | 0.0 | 1.000000e-07 |
| 1 | 1.0 | 1.000000e-07 |
| 2 | 2.0 | 1.000000e-07 |
| 3 | 3.0 | 1.000000e-07 |
| 4 | 4.0 | 1.000000e-07 |
| ... | ... | ... |
| 96 | 96.0 | 1.000000e-07 |
| 97 | 97.0 | 1.000000e-07 |
| 98 | 98.0 | 1.000000e-07 |
| 99 | 99.0 | 1.000000e-07 |
| 100 | 100.0 | 1.000000e-07 |
101 rows × 2 columns