Dynamic metagenome‐scale metabolic modeling of a yogurt bacterial community

Genome‐scale metabolic models and flux balance analysis (FBA) have been extensively used for modeling and designing bacterial fermentation. However, FBA‐based metabolic models that accurately simulate the dynamics of coculture are still rare, especially for lactic acid bacteria used in yogurt fermentation. To investigate metabolic interactions in yogurt starter culture of Streptococcus thermophilus and Lactobacillus delbrueckii subsp. bulgaricus, this study built a dynamic metagenome‐scale metabolic model which integrated constrained proteome allocation. The accuracy of the model was evaluated by comparing predicted bacterial growth, consumption of lactose and production of lactic acid with reference experimental data. The model was then used to predict the impact of different initial bacterial inoculation ratios on acidification. The dynamic simulation demonstrated the mutual dependence of S. thermophilus and L. d. bulgaricus during the yogurt fermentation process. As the first dynamic metabolic model of the yogurt bacterial community, it provided a foundation for the computer‐aided process design and control of the production of fermented dairy products.

Therefore, designing an optimal yogurt starter culture for desired yogurt properties is one of the primary engineering objectives for yogurt manufacturers.
To investigate and rationally engineer the yogurt starter culture in a more efficient and low-cost manner, a computational model is needed to simulate key variables, such as LAB biomass levels and concentrations of critical compounds, in the fermentation process.
There are mainly two types of models for simulating microbial growth and metabolism: differential-equation-based model and flux balance analysis (FBA)-based model. So far, there have been many attempts to use differential-equation-based models to simulate growth, substrate consumption, and lactic acid production of LABs (Bouguettoucha et al., 2011). However, some of these models are too simplistic that only consist of Monod or extended Monod equations that empirically link microbial growth and substrate utilization (Bâati et al., 2004;Vázquez & Murado, 2008;Youssef et al., 2005), leaving the whole metabolic network as a "black box." There also exist differential-equation-based "white box" models that capture the metabolic activity via a series of enzyme kinetic equations (Foster et al., 2021). These models are typically costly to construct due to various enzyme kinetic mechanisms (Ulusu, 2015) and would require a large number of enzyme kinetic parameters that are difficult to obtain (Bar-Even et al., 2011).
Alternatively, FBA-based metabolic models can avoid major shortcomings of differential-equation-based models. First, genome-scale metabolic models (GSMMs) can be easily reconstructed when annotated genomes are available (Mendoza et al., 2019). Second, FBA does not require information on enzyme kinetic mechanisms and kinetic parameters (e.g., k cat ) (Orth et al., 2010). Finally, the gene-protein-reaction relations in GSMMs allow the integration of multiomics data, such as quantified proteomics (Bakker et al., 2010). Currently, several GSMMs for dairy-origin LABs have already been reconstructed (Flahaut et al., 2013;Oliveira et al., 2005;Özcan et al., 2019;Pastink et al., 2009), and a dynamic coculture metabolic model for cheese starter culture involving those GSMMs has been built (Özcan et al., 2021). However, there is still a lack of metagenomescale metabolic models (Branco dos Santos et al., 2013) that can simulate the growth and metabolism of LAB cocultures used in real industrial scenarios (as opposed to assumed ones). Furthermore, existing FBA models of LAB cultures cannot simulate unique interspecies interactions in yogurt fermentation.
With the aim to quantitatively model the fermentation kinetics and metabolic interactions of LABs in yogurt fermentation, this study built the first dynamic metagenome-scale metabolic model of major species identified in the yogurt starter culture, that is, S. thermophilus (ST) and L. delbrueckii subsp. bulgaricus (LB). In addition, constrained proteome allocation, for the first time, was integrated into dynamic community-level FBA. Subsequently, we showed how the model can simulate the growth and metabolism of the ST/LB coculture during yogurt fermentation with good accuracy. Finally, we explored the potential of the developed model in supporting the design and optimization of the yogurt fermentation process via simulating the impact of differential ST/LB inoculation ratio on the overall fermentation behavior.

| Sample preparation and fermentation conditions
To conduct the experiment, a commercial yogurt fermentation starter (YoFlex Premium 1.0, CHR Hansen) of pack size 250 U was used. A growth medium was prepared by adding Fonterra whole milk powder to 1 L of water and stirring for 1 h to achieve a 12% (w/w) concentration. The growth medium was then heated to 95°C for 5 min and cooled to 42°C for the inoculation of the fermentation starter. The fermentation process was carried out at a temperature of 43°C for 5 h until the system reached the endpoint at pH =~4.5.

| Cell counting and pH measurement
To quantify the growth kinetics of bacteria during fermentation, viable cell counting was used. LABs were isolated from samples taken at different time points (0,30,60,90,105,120,150,180,240, and 300 min) using the dilution plate method (1:10). The isolated bacteria were then cultured on deMan, Rogosa, and Sharpe agar (AOBOX) and The pH change during fermentation was measured using a fermentation monitor (iCinac, AMS) with an Inlab Smart pro-ISM detection electrode (Mettler Toledo). Datapoints were collected at a frequency of three times per minute. Before measurement, the electrode was calibrated with a standard pH calibration solution.

| Metabolomics quantification
The concentrations of lactic acid and lactose were measured at different time points during fermentation using two different highperformance liquid chromatography (HPLC) systems: HPLC-DAD (Agilent 1260) with a C18 column (250 mm × 4.6 mm, Agilent) and HPLC-RI (Waters 2695-2414) with a Zorbax NH2 column (250 mm × 4.6 mm, Agilent). For lactic acid, samples were centrifuged at 8000g for 15 min at 4°C, and the supernatant, after being filtered with 0.22 μm pore size, was measured by HPLC. The injection volume was 10 μL, and the flow rate of the mobile phase (A = 0.1% phosphoric QIU ET AL. | 2187 acid, B = acetonitrile) was 0.5 mL/min. The column temperature was set at 30°C and the detection wavelength was 220 nm. The gradient elution procedure was as follows: 0-4 min, 90%-60% elution A; 4-6 min, 60%-50% elution A; 6-9 min, 50%-80% elution A; 9-15 min, 80%-90% elution A. For lactose, samples were first mixed with 12% TCA and centrifuged at 8000g for 15 min in 4°C. The supernatant, after being filtered with 0.22 μm pore size, was measured by HPLC. The injection volume was 10 μL, and the flow rate of the mobile phase was 1 mL/min, with the ratio of acetonitrile to water fixed at 70:30 (v/v). The column and detector temperature were set at 40°C. Lactic acid and lactose concentrations measured in this study were not used to validate the dynamic simulation, as explained in Section 2.2. They were only used to validate predicted lactic acid yield ratios from lactose consumed (Section 3.2), and measured lactic acid concentrations were used to fit a linear function for milk pH (Section 2.4.3).
The concentrations of free amino acids in 12% (w/w) reconstituted milk were measured using ultraperformance liquid chromatography (UPLC) (I-Class, Waters) equipped with a triple quadrupole mass spectrometer (Xevo TQ-S micro, Waters). To prepare the samples, 50% ethanol was added to the milk and then shaken. Next, 10 μL of the shaken samples was mixed with 10 μL of deionized water, 5 μL of D-norleucine (internal standard), and 40 μL of 0.1% formic acid in isopropanol. The samples were centrifuged for 10 min at 10,000g in 4°C. The resulting supernatant was then added to boric acid buffer and AccQ Tag solution (Kairos) for derivatization, and filtered using a 0.22-μm pore size filter. The column was UPLC HSS T3 (1.7 μm, 2.1 mm × 150 mm, Waters). The injection volume was 5 μL, and the flow rate of the mobile phase (A = 0.1% formic acid, B = acetonitrile) was 0.5 mL/ min. The column temperature was set at 50°C. The mass spectrometer was operated in electrospray ionization mode with an ionization energy of 1.5 kV. The cone voltage was set to 20 V, the desolvation temperature was maintained at 600°C, the desolvation gas flow rate was set to 1000 L/h, and the cone gas flow rate was set to 10 L/h. The gradient elution procedure was as follows: 0-2.5 min, 96%-90% elution A; 2.5-5 min, 90%-72% elution A; 5-6 min, 72%-5% elution A; 6-7 min, 5%-5% elution A; 7-9 min, 5%-96% elution A.
2.4 | Dynamic metabolic model from the metagenome of the yogurt starter culture Building the dynamic metagenome-scale metabolic model of the yogurt starter culture comprised two steps: (1) from metagenome to annotated protein-coding genes of major species and (2) from coding genes to GSMMs (Figure 1). With the resulting GSMMs, dynamic flux balance analysis (dFBA) was implemented to simulate bacterial growth and metabolism, and predict the change in fermentation behavior by perturbation to initial coculture composition.
2.4.1 | Metagenome assembly, binning, and annotation DNA extraction of the yogurt starter culture followed the procedure in the metagenomic study of cereal vinegar microbiota by Wu et al. (2017). Three parallel DNA samples were sequenced by Illumina PE150 platform, and the raw data were filtered for high-quality reads by removing adapter overlaps, reads with a quality value lower than 38 and length lower than 350 bp. After the quality control step, the cleaned data were assembled by MEGAHIT (Li et al., 2016) with default parameters.
F I G U R E 1 Workflow diagram of constructing a dynamic metabolic model for the yogurt starter culture based on metagenomic analysis. CONCOCT, Clustering cONtigs on COverage and ComposiTion; dFBA, dynamic flux balance analysis; GSMM, genome-scale metabolic model; GTDBtk, genome taxonomy database toolkit; HPLC, high-performance liquid chromatography; mOTUs, operational taxonomic units; UPLC, ultraperformance liquid chromatography.
Those bins' genomic abundances in each sample were then computed by the Quant_bin module in metaWRAP.
The taxonomy profile was computed with three different tools.
Using reference genomes, operational taxonomic units (mOTUs) identified species and computed relative abundances with unassembled DNA reads (Ruscheweyh et al., 2021). When the metagen- Protein-coding genes were predicted for each MAG by Prodigal (Hyatt et al., 2010). The functional annotation of protein-coding genes using eggNOG-mapper (Cantalapiedra et al., 2021) was carried out with Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al., 2016) and Carbohydrate-Active enZymes (CAZy) databases (Cantarel et al., 2009), providing information on cellular pathways and CAZy.

| Reconstruction of GSMMs
Nonredundant protein sequences were filtered from concatenated protein sequences of three parallel samples, and used as the input for automatic GSMM reconstruction by CarveMe (Machado et al., 2018).
In addition to protein sequences, other inputs were chemically defined media (CDM) for gap filling and the universal bacterial template. The CDM was adopted from the coculture metabolic model of cheese starter in Özcan et al. (2021). The universal bacterial template used for reconstruction was specialized for Gram-positive bacteria.
For refinement of GSMMs, the growth function (biomass synthesis, v Growth ) should be set as species-specific (see Supporting Information, Section 1). The stoichiometric coefficients in ST's growth function were adjusted based on measured biomass composition (Pastink et al., 2009), and the growth-associated adenosine triphosphate (ATP) requirement was adopted from the GSMM of S. thermophilus CH8 (Rau et al., 2022). Due to data limitation, the biomass composition in LB's growth function was set according to the default of Gram-positive bacteria (Magnúsdóttir et al., 2017), and the growth-associated ATP requirement was adopted from the GSMM of Lactobacillus plantarum (Teusink et al., 2006). On the basis of functional annotation of protein-coding genes (Section 2.4.1), reactions included erroneously were removed and missing reactions were added (see Supporting Information, Section 1). In addition, to characterize the proteolysis activity in the coculture system, a selfdefined reaction named "PROLYSIS" that utilizes the casein peptide was added to the GSMM: ( 141 270 35 73 1) → + + + Stoichiometric coefficients (α β , , …) in the reaction were approximated from fractions of amino acid in the casein protein of cow milk (Landi et al., 2021) (see Supporting Information, Section 1). The boundary of the flux through casein peptide utilization in each GSMM was set based on the proteolytic activity of each species (see Supporting Information, Section 3 and Figure S3B).

| dFBA and proteome allocation constraint
To simulate the growth of bacteria and the production of target metabolites in time, dFBA was adopted, as a combination of FBA and differential-equation-based dynamic system modeling (Henson & Hanly, 2014). The code of model implementation can be found at Furthermore, proteome allocation was implemented to constrain reaction fluxes of the central carbon metabolism (Regueira et al., 2021;Zeng & Yang, 2020). Proteome was divided into sectors of inflexible housekeeping (Q), anabolism (A), transportation (T), catabolism (C), and the free sector. The upper bound of all flexible sectors combined was assumed to be 50% of the total proteome (Equations 6 and 7).
The proteome cost on each reaction was computed as the ratio of the flux to the multiplicative product of enzyme activity, a i , and saturation degree, σ i (Equation 8). The enzyme saturation degrees (σ i ), except for that of the lactose transporter, are unknown and were set to 0.5, assuming they are similar to glycolytic enzymes in Escherichia coli which were reported to be mostly half-saturated (Bennett et al., 2009). For the milk environment, the saturation degree of the lactose transporter (σ LT ) was assumed to be 1 as it was considered to be fully saturated by abundant lactose; when the concentration of lactose was low, σ LT was assumed to follow Michaelis-Menten kinetics, σ =  (Table S5).
The activity of lactose uptake incorporated inhibition by undissociated lactate (LacH), the product of glycolysis under anaerobic conditions. The exponential decay equation (9) to model the inhibition of lactose transporter activity was adopted from Vereecken and Van Impe (2002) and Aghababaie et al. (2015). The minimal activity of lactose transport a LT min was set to maintain the growth rate at the stationary phase when pH is around 4.5 (see Supporting Information, Section 3 and Figure S3C). The concentration of undissociated lactate was computed using Henderson-Hasselbalch equation (10)  in this study, proteome allocation constraints were integrated into classical FBA to explain the preference for lactic acid production by LABs that is energetically less favorable than acetic acid production.
To implement proteome allocation constraints, enzyme activities of reactions in central carbon metabolism were mapped to the reconstructed metabolic network (full names of reactions can be found in Table S4), and proteome costs for producing 1 unit of the flux of lactic acid and acetic acid from pyruvate could be computed (Table S6). The proteome cost of lactic acid production per unit flux is 0.0071mg E/g DW, the same for ST and LB. For acetic acid, the proteome cost is 0.1218mg E/g DW in ST, but gets much larger in LB due to the lack of pyruvate formate lyase (PFL), 0.6843mg E/g DW. In short, the proteome cost of lactic acid production in LABs is much smaller than that of acetic acid production, though the latter pathway has a higher energy (ATP) yield.
To and Figure S3A). The predicted lactic acid yield ratio from lactose consumed (mol lactate/mol lactose) was 1.6919 for ST and 1.9305 for LB, which was consistent with the experimental measurement (Table S1) (Machado, 2023). In the in silico milk environment that has no purine (adenine, guanine, and xanthine) to mimic the nutrient composition of actual reconstituted milk used for yogurt fermentation, LB's growth is promoted by the increase of the concentration of formic acid (Figure 4d). In contrast, when purines are supplied, formic acid has little influence on LB's growth rate, which validates the previous finding that LB requires formic acid to synthesize DNA/RNA materials in environments with low levels of purines (Suzuki et al., 1986). Succinctly, the potential metabolic interactions of ST/LB community in the yogurt starter culture inferred by the GSMMs reconstructed in this study can be summarized as the following: ST provides formic acid for LB to synthesize DNA/RNA materials, and LB utilizes casein proteins to supply methionine to ST; they both consume lactose and produce lactic acid, which in turn inhibits their growth (Vereecken & Van Impe, 2002) (Figure 4e).
The addition of proteome constraints allows the model to account for the metabolic switch from acetic acid production to lactic acid production in ST when its growth rate increases with the increase of carbon source concentration, which has been demonstrated in Regueira et al. (2021) with a generic LAB-GSMM that can respond to the change of glucose concentration (Regueira et al., 2021) ( Figure 4f,g). The proteome costs allocated to acetate and lactate productions are computed as the proteome cost of PFL, pyruvate dehydrogenase (PDH), phosphotransacetylase, and acetate kinase, and that of lactate dehydrogenase (Figure 4h). In general, only two moles of lactic acid can be produced from the fermentation of one mole of lactose. The predicted ratio of acetic acid produced to lactose consumed sometimes exceeds 2, which results from the metabolism of other nutrients in the in silico culture medium ( Figure S6). With LB's GSMM, proteome-constrained FBA could not demonstrate a growth rate triggered a metabolic switch from acetic acid production to lactic acid production due to the much larger protein requirement of converting pyruvate to acetyl-CoA by solely PDH in LB (Figure 3b) and the biosynthetic requirement of fatty acids ( Figure S7). When the uptake of lactose provides enough carbon flux that exceeds the biosynthetic requirement of fatty acid, a surplus flux will be predicted to go through acetic acid production ( Figure S7).

| Dynamic simulation of growth kinetics and metabolism of ST/LB coculture
The dynamic simulation of yogurt fermentation was performed using the initial conditions of the coculture experiment in Oliveira et al.   Overall, the model, without parameter recalibration using the reference experimental data, adequately captured the initiation of the exponential growth phase and the transition to the stationary phase for both ST and LB, as well as the protocooperation between the two species (Figure 5a,b). The accumulation of formic acid (produced by ST) initiates the exponential growth of LB by activating its synthesis of purines, whose natural concentration is too low to support the growth of LB in the milk environment. In return, the activation of growth and metabolism of LB with strong proteolytic activity enhances the growth rate of ST by supplementing methionine, which is also limited in the milk environment (Figure 5e,f). With the accumulation of lactic acid, the growth rates of ST and LB are both progressively inhibited.
Finally, their growths are halted, as shown by the stationary phase (Figure 5a-c).

| Prediction of the impact of different initial ST/LB inoculation ratios on the fermentation behavior
Perturbations on the initial inoculation ratio of ST and LB of the yogurt starter culture were conducted to investigate its impact on the fermentation behavior. The total initial biomass concentration of the coculture was fixed at 0.18 g dry weight (DW)/L, the same as the initial condition setting in Section 3.3. The simulated growth curves of ST and LB from different initial ST/LB inoculation ratios demonstrate the mutual dependence of ST and LB in the coculture ( Figure 6a). When the initial ST:LB inoculation ratio is modulated to increase from 1 to 10, the proteolytic activity of the microbial community becomes weaker due to the decrease of the initial biomass concentration of LB, and correspondingly, the growth of ST is also reduced due to lowered supply of methionine; When the initial  Poolman et al., 1995): (e) predicted growth rates, (g) fluxes through acetic/lactic acid productions, and (h) proteome costs allocated to biosynthetic pathways of acetic and lactic acids. 2mba, 2-methylbutanoic acid; 2mpa, 2-methylpropanoic acid; 2obut, 2-oxobutanoate; 3mba, 3-methylbutanoic acid; 4hba, 4-hydroxy-benzyl alcohol; CDM, chemically defined medium; LB, Lactobacillus delbrueckii subsp. bulgaricus; MPL, milieu proche du lait; MPL, milieu proche du lait; ST, Streptococcus thermophilus; succ, succinic acid. ST:LB inoculation ratio is set to decrease from 1 to 0.1, the lowered productivity of formic acid from ST makes LB enter the exponential phase slower than that of the case with initial ST:LB = 1 (Figure 6a). In addition, the simulation shows that, given a certain range of initial ST:LB ratios (in this case, 0.1-10), ST will eventually become the dominant species (Figure 6b), which agrees with the previous study on the rods (LB) to cocci (ST) ratio in cheese fermentation (Yun et al., 1995).
The predicted average acidification rates (lactic acid concentra- At the early stage of yogurt fermentation (0-3 h), starter cultures with the initial ST:LB ratio larger than 1 have more than twice the lactic acid production rates, in contrast to starter cultures with the initial ST:LB ratio lower than 1; later (3-5 h), the lactic acid production rate of the starter culture with initial ST:LB = 0.1 catches up ( Figure 6d). To sum up, initial ST:LB = 1 is predicted to be optimal for lactic acid production in 5 h if the initial total biomass is fixed at 0.18 g DW/L, and the acidification will generally be faster initially when ST is the dominant species in the coculture. On the basis of predicted acidification kinetics, the starter culture composition can be designed for targeted acidification patterns, for example, ST:LB ratio = 0.1 is suitable for slow acidification first but fast acidification later.

| DISCUSSION
Overall, this work presented the reconstruction of GSMMs for the yogurt starter culture, that is, the coculture of ST and LB, based on metagenomic analysis and provided a dynamic metagenome-scale metabolic modeling approach for simulating the microbial growth and metabolism during the yogurt fermentation process. Although community-level FBA has already been used in a few scenarios to simulate growth and metabolism (Branco dos Santos et al., 2013;Khandelwal et al., 2013;Sieuwerts, 2009), the model in this study, for the first time,  (Oliveira et al., 2012;Sieuwerts, 2009), and provided confirmation of such ecological interaction with data-driven modeling.
Although the model showed good predictive accuracy in bacterial growth and metabolism (Sections 3.2 and 3.3), it had limitations in mainly three aspects: (1) MAG-derived GSMMs of ST and LB lacked accurate strain-specific biomass compositions and growth-associated ATP requirements; (2) the model could not yet predict fluxes through the biosynthesis of important flavor compounds, such as methyl ketones, and the probiotic exopolysaccharides that are often produced in yogurt fermentation; (3) no mechanistic representation of regulatory activities was included in the current model. For instance, acidification by ST was previously found to be stimulated by formic acid, casitone, pyruvic acid, folic acid, and polysorbate 20 (Sieuwerts et al., 2010).
Monoculture of dominant ST and LB strains separated from the starter culture will be needed to approximate the growthassociated ATP requirement from the carbon source utilized (Teusink et al., 2006), and gas chromatography/mass spectrometry or HPLC can be used to quantify major components in cellular biomass, that is, protein, DNA, RNA, lipids, and glycogen (Long & Antoniewicz, 2014;Simensen et al., 2022). To resolve the other two limitations, the proposed next step is to further refine the established GSMMs by manually adding the biosynthetic pathways of flavor and probiotic compounds of interests and implement dynamic regulatory FBA (Liu & Bockmayr, 2020) with meta-transcriptome profiling. For the current model, the failure to predict active fluxes towards the formation of diacetyl and acetoin was not caused by the lack of pathway reconstruction, but the lack of a constraint to divert the downstream metabolic flux to those biomass-independent products.
Despite those several limitations remain to be overcome, the genome-scale metabolic reconstructions and the dynamic community-level FBA model for the classical yogurt starter culture of ST and LB presented in this work have been shown to have the potential to offer an efficient tool to guide engineering decisions in fermentation processes, which could be used to address issues such as the optimal initial biomass ratio of ST and LB to maximize the rate of acidification or possibly other process targets (e.g., flavor formation). Furthermore, the two-species model can be expanded by including GSMMs of other probiotic bacteria, for example, bifidobacterium (Lamoureux et al., 2002), to simulate the fermentation dynamics of more complex starter cultures.