Simultaneous design of fermentation and microbe

Constraint-based optimization of microbial strains and model-based bioprocess design have been used extensively to enhance yields in biotechnological processes. However, strain and process optimization are usually carried out in sequential steps, causing underperformance of the biotechnological process when scaling up to industrial fermentation conditions. Herein, we propose the optimization formulation SimulKnock that combines the optimization of a fermentation process with metabolic network design in a bilevel optimization program. The upper level maximizes space-time yield and includes mass balances of a continuous fermentation, while the lower level is based on flux balance analysis. SimulKnock predicts optimal gene deletions and finds the optimal trade-off between growth rate and product yield. Results of a case study with a genome-scale metabolic model of E. coli indicate higher space-time yields than a sequential approach using OptKnock for almost all target products considered. By leveraging SimulKnock, we reduce the gap between strain and process optimization.


Introduction
Industrial microbiology promises product synthesis from renewable feedstocks representing a sustainable alternative to petro-chemical synthesis [1].Often, these biotechnological products are produced by high-performing microbial strains that have been designed using metabolic engineering.Strain design is supported by computational methods to reduce the experimental effort [2].In the first place, constraint-based metabolic modeling formulations predict the influence of a given genetic modification on the microorganism [3].The bestknown formulation is flux balance analysis (FBA) [4,5].Using linear programming, FBA predicts the internal fluxes of a microorganism based on its genome-scale metabolic model (GEM).Variations of FBA refine the cellular objective of the organism [6] or better suit a genetically modified microorganism [7,8,9].Constraint-based strain optimization formulations go one step further and predict targets for genetic modification [10,3].The first formulation, OptKnock, was presented by Burgard et al. [11].OptKnock proposes optimal gene knockouts by solving a bilevel optimization problem.The upper level represents the bioengineering perspective to maximize the product yield on the substrate.The lower level, based on FBA, represents the microorganism with the cellular objective of maximizing biomass production.OptKnock has been extended, e.g., to account for worst-case predictions [12], insertion of genes [13], and up and down regulation of genes [14].It was also modified to suit a genetically modified organism better [15].For strain design, the predicted modifications are experimentally tested and evaluated at a laboratory scale, and the best-performing strain is chosen.
For industrial production, industrial-scale process conditions come into play.Process design involves the analysis of mass and energy balances an analyzing production costs, including up-and downstream processing [16].In bioprocess design, a suitable kinetic represents the microorganism, e.g., its growth [17,18].Computational process optimization supports process design by adjusting the process parameters to achieve maximal space-time yield or minimal production cost [18,19,20].
Typically, strain and process design are separate steps, performed in sequence.Process conditions, however, influence the behavior and performance of the microorganism.Hence, a high-performing strain in the laboratory may underperform in an industrial process.For example, a strain is first designed under small-scale batch fermentation conditions in the laboratory.Next, the industrial process is designed for the designated strain, whereby industrial processes are envisaged to run on a large scale in continuous mode, and example processes already exist in the pharmaceutical and food sectors.This switch in conditions entails a significant adjustment for the organism.Reasons for the under-performance can be steep gradients in the spatial distribution of substrates (e.g., oxygen) in the fermenter, high downstream processing costs, or lacking genetic and phenotypic stability by the microbe [21,22].Strain and process design coupling is needed to overcome these scale-up difficulties [22,23].There exist different approaches to do so.In their attempt to capture spatial changes in a stirred bioreactor, Lapin et al. [24] coupled computational fluid dynam-ics and metabolic modeling.To model diauxic growth, Mahadevan et al. [25] connected the mass balance equations of batch fermentation with metabolic modeling in their dynamic flux balance analysis (dFBA).Structurally, dFBA are differential-algebraic equations with embedded optimization criteria.For fed-batch fermentation, de Oliveira et al. [26] replaced the metabolic network in dFBA with a surrogate model.In an iterative process, Zhuang et al. [27] connected dFBA with computational strain optimization techniques to suit the organism to the future process conditions of batch fermentation.Ploch et al. [28] developed a differential-algebraic equation system with embedded dFBA optimization to model a biorefinery process under changing conditions.Jabarivelisdeh and Waldherr [29] introduced a bilevel dynamic optimization framework, where a previously found genetic modification can be switched on during batch fermentation.In a step-wise approach, Tafur Rangel et al. [30] selected a microbial strain regarding the required downstream process units.Konstantinos and Antonis [31] considered downstream process optimization simultaneous to strain optimization, namely in a superstructure optimization program.However, the implications of the switch from laboratory (batch or fed-batch) fermentation conditions to industrial (continuous) fermentation conditions have yet to be considered.
We propose a new formulation combining strain optimization with process optimization.
Our simultaneous approach SimulKnock, suggests optimal gene knockouts that suit a strain to its future industrial-scale conditions, and at the same time, the process conditions are customized to the strain.SimulKnock is a bilevel optimization program where the upper level maximizes the space-time yield subject to the mass balances of fermentation, and the lower level maximizes biomass production based on the FBA formulation.In this paper, we consider continuous fermentation, an envisaged fermentation mode in industry.SimulKnock, however, can readily be extended to account for other fermentation modes, too.Certainly, extensions may result in different optimization classes, e.g., simultaneous optimization of batch operation and strain design would be a dynamic bilevel optimization problem, which is challenging [28,32].Michaelis-Menten or Monod kinetics are employed to connect the two levels.Compared to existing strain optimization formulations, SimulKnock can be interpreted as an extension of OptKnock by process equations or as a variation of dFBA with a switch to continuous fermentation and implementation of a process objective and knockout predictions.Similarly, SimulKnock can be seen as an extension of process optimization by the strain optimization level (lower level).For our case studies, we use strong dualization to reformulate the bilevel optimization program into a single-level program and solve it globally.We apply our formulation to a GEM of E. coli for the production of ethanol, succinate, acetate, formate, fumarate, and lactate and compare the results with sequential optimization results and experimental results for succinate production from the literature.The proposed formulation SimulKnock will help close the gap in profitability between petrochemical and biochemical processes.

Mass balances for a continuous process optimization
For optimal process design, the fermentation process was modeled with mathematical equations.These equations may include mass and energy balances of different reactor types and process units.In SimulKnock, we chose the mass balance equations of a continuous stirred-tank reactor.In principle, however, SimulKnock can easily be extended to account for other reactor types or even full processes.We assumed that the reactor is ideally mixed and contains only one homogeneous phase, that only one substrate limits growth and production, and that only one population of organisms is in the tank.Furthermore, we assumed that the volumetric inflow equals the volumetric outflow.The biomass, the substrate, and the product were chosen as the representative compounds of the fermentation.Similarly to standard literature, mass conservation leads to where D is the dilution rate in h −1 ; c bio is the biomass concentration in gram Cell Dry Weight (CDW) per liter (g CDW L −1 ); c S,F eed , c S and c P are the substrate feed concentration, substrate concentration, and product concentration, respectively (all in g L −1 ).Y bio/S and Y P/S , both with the unit g g −1 , are the yields of biomass and product per substrate, respectively.The maintenance factor is denoted as m S in g g −1 h −1 and describes the required substrate uptake rate for cellular maintenance.Two kinetics were used in the equations: the product kinetics q P in g g −1 h −1 , describing the rate of formation of the product, and the growth rate kinetics µ in h −1 , describing the rate of formation of the biomass.The dilution rate D is the quotient of the volumetric inflow/outflow and the culture volume within the bioreactor.The dilution rate is also the inverse of the residence time.As is typical, we assumed that a flow equilibrium is achieved after a sufficiently long period of operation at a constant dilution rate.It follows that the continuous fermentation is at a steady state, such that dc bio dt = 0, dc S dt = 0, and dc P dt = 0 apply.Thereby, it directly follows that D = µ, which means that biomass is neither washed out nor accumulated in the process.

Flux balance analysis
FBA analyzes the internal fluxes within a cell using linear programming [5,4].The analysis is based on the metabolic network of an organism that includes the metabolites and the stoichiometry of the internal reactions.In FBA, based on the steady state assumption, the metabolites' mass balances are constrained to zero, and the reaction fluxes are constrained to upper and lower bounds.One cellular objective is defined, e.g., the maximization of the biomass flux.The mathematical formulation of FBA reads max where v denotes the vector of reaction fluxes in mmol g −1 CDW h −1 and where n is the number of reactions in a metabolic network, where all reversible reactions were split into one forward and one backward reaction, further called irreversible network.The biomass flux is denoted by v bio in h −1 and is an element of v; S is the stoichiometric matrix of the irreversible network, and lower and upper are lower and upper bounds of the flux values, respectively.
These bounds may include thresholds on the biomass flux or the ATP maintenance reaction.

Proposed combined optimization: SimulKnock
We designed SimulKnock by embedding the FBA into the optimization of a continuous fermentation accounting for cellular degrees of freedom, resulting in a bilevel optimization formulation (Figure 1), following the paradigm of OptKnock [11].Using the steady-state mass balances of metabolites stemming from FBA, we assume that the cellular metabolism adapts infinitesimally quickly to shifting environmental conditions [33].
SimulKnock maximizes the space-time yield in the upper-level program like other bioprocess optimization examples [34,35].The connection between upper-and lower-level program is achieved via the expression of process parameters with metabolic variables, based on Ploch et al. [28] and Mahadevan et al. [25].The growth rate directly transforms to µ = v bio , and due to steady-state continuous fermentation conditions, it holds D = v bio .Moreover, instead of considering that everything taken up goes into product formation, biomass formation, or maintenance, the substrate uptake is now directly expressed  by , where v S is the substrate uptake flux in mmol g −1 CDW L −1 and M S denotes the molar mass of the substrate in g mmol −1 .Thus, the maintenance is only directly considered at the lower level in SimulKnock by setting a threshold on the ATP maintenance reaction; there is no doubling at the upper level.The rate of product formation can also be directly considered now by setting q P = v P • M P , where v P is the product flux in mmol g −1 CDW L −1 and is an element of the flux vector v and M P is the molar mass of the product in g mmol −1 .Hence, (1) transforms to Note that instead of fixing the dilution rate in the upper level and, thereby, fixing the growth rate, the dilution rate D will be set according to the optimal value of v bio after the lower-level optimization.Hence, in the following, D will be replaced with v bio .Furthermore, we considered two alternative kinetics to connect the upper and lower levels: Monod and Michaelis-Menten.In Section A.1 in the Supporting Information, we discuss why including both kinetics simultaneously is not advisable.
Monod is an empirical, widely known, and easy-to-measure model of microbial growth.
It links the substrate concentration with the growth rate and reads where v max bio denotes the maximum growth rate in h −1 , and K S denotes the Monod affinity constant in g L −1 for the specified substrate.We assumed that gene deletions would not affect the kinetic parameters, i.e., we assumed that the kinetic parameters were constants for a given organism and substrate.Thus, both v max bio and K S are parameters taken from literature or, in the case of v max bio , can be approximated with FBA.The growth rate results from the lower-level program, and thus, the Monod kinetics need to be imposed on the upper-level program (reactor level).As depicted on the right-hand side in (2), we, therefore, reformulated the kinetics such that the substrate concentration is a function of the growth rate: c S = f (v bio ).Then, we implemented the kinetics in the upper level by replacing c S with the kinetic term.If we would keep c S , the growth rate would be fixed by the kinetic term (equality constraint) instead of being optimized at the lower level.SimulKnock with included Monod kinetics reads max where • denotes the element-wise product, y ∈ {0, 1} r denotes the binary knockout vector, and r denotes the number of reactions in a metabolic network, which contains irreversible and reversible reactions, further referred to as a reversible network.The parameter K is the number of maximal allowed knockouts, and the parameter f denotes a value between zero and one.The matrix B maps the reactions in the irreversible metabolic network with those in the reversible network.The apostrophe ′ denotes the cellular variables in the upper level.
Note that, in the algorithm, they are fixed by the lower-level program.
As described above, we chose to eliminate D and c S from the formulation.In theory, the process variables c bio and c P could also be eliminated, as the equality constraints at the upper level are fixing them.Nevertheless, we decided on this formulation because preliminary tests and our intuition suggest that it results in simpler convergence.Note that our recent research indicates that an intermediate between full-space (keeping all variables) and reduced-space formulation (eliminating all variables except the degrees of freedom) is most beneficial in some cases [36,37].With c bio and c P being fixed by equality constraints, the substrate concentration in the feed c S,F eed is the only degree of freedom in the process conditions.
At the cellular level, v S will reach its upper bound based on an experimental value, as it is standard in other strain optimization examples (e.g., [11]).Even if the Monod kinetics constrain v bio and c S,F eed is fixed during optimization; spare carbon atoms will go into product formation.
In our study, the Monod constant was set to 0.044 g L −1 , and v max bio was set to 0.73 h −1 , according to Wick et al. [38].The wild-type biomass flux v bio,W T was determined by a previously performed FBA, and the parameter f was set to 0.1.Furthermore, the ATP maintenance reaction flux threshold was set to 6.86 mmol g −1 CDW h −1 , according to the default value in i ML1515 [39].The upper bound of the glucose exchange reaction was set to 10 mmol g −1 CDW h −1 .Alternatively, we used Michaelis-Menten, the simplest form of enzyme kinetics.Enzymes catalyze reactions within a cell, with metabolites as reactants.An enzyme-catalyzed reaction can be described using Michaelis-Menten kinetics.In alignment with Mahadevan et al. [25] and Ploch et al. [28], we chose to apply Michaelis-Menten kinetics to the substrate uptake reaction.Thus, the kinetics link the substrate concentration in the bioreactor with the substrate uptake rate.More precisely, the substrate uptake rate is a function of the substrate concentration: v S = g(c S ).The kinetics read The maximal substrate uptake is denoted with v max S in mmol g −1 CDW h −1 , and the Michaelis constant is K S,M M in g L −1 .Again, we assumed that the kinetic parameters were constants and, thus, can be retrieved from the literature.As v S is a cellular variable, the kinetics were implemented in the lower level.SimulKnock with embedded Michaelis-Menten kinetics reads max Again, c bio and c P are being fixed by the equality constraints in the upper level.However, unlike the formulation including Monod, c S is a degree of freedom at the process level now.
It is included in the upper-level equality constraint in the summand v bio • (c S,F eed − c S ).
With v bio being part of the upper-level objective, this term also has an optimum, and c S is determined during the optimization.Another difference is that v S is now constrained by the kinetics instead of reaching the upper bound set by the user.
In our study, the Michaelis constant was calculated with 0.53 mmol L −1 multiplied by the molar weight of glucose, and v max S was set to 10 mmol g −1 CDW h −1 , both values according to Meadows et al. [40].
To simplify the comparison of Monod and Michaelis-Menten, Table 1 summarizes the different aspects of both kinetics applied in SimulKnock.
Apart from the aspects already discussed above, we found that, in general, the parameter availability for Monod is higher than for Michaelis-Menten.Furthermore, the inclusion of inhibitions is supposedly a straightforward extension with Monod, whereas including multiple substrates would potentially be easier with Michaelis-Menten (cf.[28]).

Reformulation of the bilevel program to a single-level quadraticallyconstrained quadratic program
We used the software packages libALE [41] and libDIPS [42] to implement SimulKnock in both versions-Monod (3) and Michaelis-Menten (4)-and successfully solved the bilevel programs for the E. coli core network [43] using the algorithm [44] without KKT-based tightening and with Gurobi v10.0 [45] as the subsolver.However, this algorithm was too expensive for the genome-scale metabolic model i ML1515.[39].Consequently, we reformulated the bilevel programs to single-level mixed-integer quadratically-constrained quadratic programs (MIQCQP).We conducted two reformulation steps for numerical reasons: first, we reformulated SimulKnock as a single-level program.Second, we eradicated nonlinear terms to achieve an MIQCQP formulation of SimulKnock.
To the first reformulation: In both versions of SimulKnock (Monod or Michaelis-Menten embedded), the cellular level is linear (in the lower-level variables, as the process level fixes the process variables).Thus, we can apply strong duality and reformulate to a single-level program.The reformulations for SimulKnock with Monod kinetics embedded are identical to those presented in OptKnock [11], as the Monod kinetics are in the upper-level program.In the following, we will show the reformulation of SimulKnock with Michaelis-Menten kinetics embedded.For ease of notation, we introduced c S +K S,M M .By applying the definitions above, the lower-level program of (4) can be rewritten as Converting program 5 to its dual and applying the strong duality theorem yields the system of equations In the second reformulation step, we reformulated the nonlinear kinetic term.We introduced an additional variable for the fraction on the right-hand side of the kinetics, thereby introducing an additional equality constraint in the problem.For example, in the Michaelis-Menten kinetics, the fraction is replaced with the optimization variable σ M M , and the con- are introduced, which results in the single-level optimization problem becoming an MIQCQP.
This reformulation is beneficial because MIQCQPs are solved directly by commercial solvers, such as Gurobi [45].
Replacing the lower level in (4) with the set of constraints of ( 6) and applying the reformulation of the kinetic term suggested in (7) yields the single-level mixed-integer quadratic form of SimulKnock with Michaelis-Menten embedded: max y,c S ,c S,F eed ,σ Similar reformulations also reduce SimulKnock with Monod kinetics to an MIQCQP.The reformulated programs were implemented using Pyomo [46,47] and solved with Gurobi v10.0 [45].

Case studies
We performed three case studies with SimulKnock.The first case study illustrates Simul-Knock's mode of action compared to OptKnock and a sequential optimization approach.
The second case study employs the genome-scale metabolic model i ML1515 [39] of E. coli for one to three knockout predictions and elaborates on the differences between Monod and Michaelis-Menten kinetics.The third case study compares the results of SimulKnock with the published results of an experimental study on E. coli continuous fermentation [48].The first case study was performed on up to 8 Intel Xeon E5-2640 CPU threads and was solved within less than a second.Case studies two and three were solved within five hours on the RWTH high-performance computing cluster using 48 threads and up to 4 GB of memory per thread.

SimulKnock predicts different knockouts than OptKnock with embedded illustrative network
To show SimulKnock's mode of action compared to sequential optimization using OptKnock, we constructed an illustrative network (Figure 2).We performed four optimizations with the illustrative network embedded: (i) an FBA with maximization of biomass as the objective (referred to as wild-type), (ii) an OptKnock prediction, (iii) a sequential approach using OptKnock (see below), and (iv) SimulKnock.
Table 2 shows the results of the case study.
Table 2: Results of the case study with the illustrative network (Figure 2) embedded.The substrate is glucose, and the maximum substrate flux allowed is set to v upper S = 10 mmol g −1 CDW h −1 ; the upper bounds on A-B and A-F are set to 7 mmol g −1 CDW h −1 and 3 mmol g −1 CDW h −1 , respectively; the maximum oxygen flux allowed is Michaelis-Menten kinetics were applied in SimulKnock and the sequential approach, with the Michaelis constant K S,M M = 0.53 mmol L −1 [40].In the sequential approach, 1. OptKnock, and 2. A process optimization is performed with embedded knockouts from OptKnock.Abbreviations: v S : substrate uptake flux, v O : oxygen flux, v bio : growth rate, c bio : biomass concentration, c P : product concentration, STY: space-time yield Wild-type -10 The sequential approach denotes a two-step procedure.First, OptKnock is performed.
Second, the predicted knockouts are applied to the network, and a continuous process optimization is performed.The continuous process optimization looks like the formulation of SimulKnock but with fixed knockouts, which were determined from OptKnock.Thus, Op-tKnock is included in the sequential approach.Still, we added the results of OptKnock for completeness to allow for a direct comparison of functionalities with SimulKnock.Michaelis-Menten kinetics were applied in the sequential approach and SimulKnock.The FBA and OptKnock predictions do not imply process variables, e.g., the substrate or product concentration, which is also why the kinetics were not applied in OptKnock or FBA.
The results indicate that SimulKnock and OptKnock predict different optimal knockouts for one maximum allowable knockout.The reason for this difference is that OptKnock optimizes for the target chemical production, whereas SimulKnock optimizes for space-time yield.Indeed, the product flux v P is higher for OptKnock than for SimulKnock.The applied kinetics in SimulKnock and the sequential approach force the substrate uptake lower than the maximum allowed value.Namely, in the sequential approach, the substrate uptake is set to 8 mmol g −1 CDW h −1 by the process optimization, whereas OptKnock predicted it to be 10 mmol g −1 CDW h −1 .In SimulKnock, only the reactions in the upper part of the network, i.e., the reactions going via the metabolites A, F, G, E, and C, are active and used for biomass and target chemical production.In contrast, in OptKnock and the sequential approach, the reactions going via B, D, and E are used for target chemical production, while the route via F and C produces additional biomass.The space-time yield, which is the product of the growth rate v bio and the product concentration c P , is higher for SimulKnock than for the sequential approach.Note that both formulations predict the same two knockouts, B-C and F-C, and the same space-time yield for two maximum allowable knockouts.However, this does not generally indicate that with more allowed knockouts, the results of OptKnock and SimulKnock would be similar.Instead, the illustrative network is constructed so small that no other options exist.

SimulKnock can achieve higher space-time yields than sequential optimization with embedded i ML1515
To further investigate the computational tractability of the SimulKnock approach, we chose a genome-scale metabolic model, i ML1515 [39], which describes the metabolism of E. coli.
The metabolic network includes 1516 genes, resulting in 1877 metabolites and 2712 metabolic reactions.The objective was set to maximize the space-time yield of six target chemicals for a glucose-limited medium.Furthermore, to highlight the superiority of the simultaneous strain and process optimization, SimulKnock was compared against the sequential approach on the space-time yield of the target chemical.

One knockout
For one knockout prediction, SimulKnock and the sequential approach furnished identical results.The knockout predictions and space-time yields are in the Supporting Information

Two knockouts: Monod vs. Michaelis Menten
In this case study, we allowed for two gene knockouts.The comparison results of Michaelis-Menten and Monod kinetics are plotted in Figure 3.Note that for E. coli, the growth rate is higher with oxygen than without.The lowerlevel optimization problem always activates the oxygen uptake to maximize growth.Hence, in SimulKnock, the oxygen supply must be by the modeler.We performed runs with and without an oxygen supply.Figure 3 shows the results where SimulKnock achieved the higher absolute value.
SimulKnock predicts significantly higher space-time yields for four of the six considered target chemicals with embedded Michaelis-Menten kinetics.The difference in the spacetime yield from the two approaches is due to different reaction knockouts predicted (see the knockouts table A.2 in the Supporting Information for reference).For example, SimulKnock targets the 6-phosphogluconolactonase and the acetate reversible transport for higher ethanol space-time yield instead of the ATP synthase and the triose-phosphate isomerase.The most significant difference becomes visible for formate.Instead of the phosphoglycerate mutase, SimulKnock predicts the phosphofructokinase will be knocked out.Even if SimulKnock and the sequential approach do not predict the same knockouts for fumarate, they reach the same result with Michaelis-Menten.
The kinetics also influence the knockout predictions.SimulKnock with embedded Michaelis-Menten kinetics suggests blocking Acetaldehyde dehydrogenase and D-alanine-D-alanine dipeptidase for succinate production.With Monod embedded dytosine deaminase is knocked out instead of the dipeptidase,.Interestingly, SimulKnock achieves a lower space-time yield for fumarate than the sequential approach with embedded Monod kinetics.
The sequential formulation maximizes the target chemical concentration at the process level, with the biomass growth rate being used only in the metabolism level problem.Therefore, the targeted reaction knockouts decrease the growth rate to achieve the maximum target chemical concentration.In contrast, the SimulKnock formulation recognizes the trade-off between a higher growth rate and a higher target chemical flux.It is, therefore, not surprising that the sequential approach predicts a higher target chemical concentration and lower biomass concentrations than the SimulKnock approach.The different modes of action become even more visible when comparing the molar product yield on the substrate.We computed the yield by dividing the target chemical flux v P by the substrate flux v S (cf. Table A.2 in the Supporting Information for the data).In all cases, the molar product yield was lower with SimulKnock than with the sequential approach.At maximum, in the case of acetate production, the yield was 45 % lower, whereas the space-time yield was 2.6 times  [39].All six target chemicals were produced via aerobic pathways.higher.

Three knockouts
Upon increasing the possible reaction eliminations to three, the SimulKnock formulation with Michaelis-Menten kinetics could not be solved within a feasible time limit.Thus, only the Monod kinetics results are presented in Figure 4.Note that, again, we performed runs with and without an oxygen supply.Figure 4 shows the results where SimulKnock achieved the higher absolute value, all achieved using aerobic pathways.
Acetate and formate space-time yield is increased by about 100% when using SimulKnock, whereas ethanol, succinate, and fumarate increase marginally, and lactate remains the same.
Once again, the reason for this difference is the different reaction knockouts predicted by the two approaches (cf.Table A.3 in the Supporting Information).The knockouts for lactate and fumarate are similar, often targeting neighboring reactions, thereby predicting similar yields.When ethanol is considered, two of the three reaction eliminations target the same reactions with SimulKnock when compared to the sequential approach.When the target chemical is acetate, all three knockouts are different; with formate, two of the three knockouts are different.This observation hints towards the possibility of missing possible knockout strategies with the sequential approach.
It is worth noting that the assumption made in this study about the biomass reaction being the rate-limiting step has yet to be experimentally validated.Therefore, reaction eliminations could, theoretically, target the rate-limiting step on which the Monod kinetic parameters were fitted.In that case, a new parameter fitting using the modified organism would be required.In contrast, the Michaelis-Menten kinetics do not require parameter refitting due to reaction eliminations, as the substrate uptake reaction is always active.
When we line up space-time yield and growth rate with an increasing number of knockouts, we observe different behaviors: For ethanol, succinate, lactate, and formate, space-time yield and growth rate with SimulKnock stay more or less constant over one to three knockouts.Exemplary, Figure 5a shows the course for succinate.Change of space-time yield and growth rate over the number of knockouts for i ML1515 [39] with embedded Monod kinetics.Succinate was produced anaerobically for 1 and 2 knockouts and aerobically for 3 knockouts.Acetate was produced via an aerobic pathway.STY: space-time yield, Seq.Appr.: Sequential approach.
In the sequential approach, space-time yield decreases with an increasing number of knockouts, as depicted at two gene knockouts in Figure 5a.This decrease in space-time yield comes due to the mathematical structure of OptKnock.In OptKnock, high product yields are achieved at the expense of reduced growth (cf.Table A.3 in the Supporting Information for data).In turn, space-time yield (equals product concentration times growth rate) is linearly dependent on the growth rate.The figure depicts the aerobic pathway for three gene knockouts, following our decision to show the results for the highest SimulKnock space-time yield.Hence, the fermentation conditions change compared to the anaerobic conditions that apply for one and two gene knockouts.While the space-time yield of the sequential approach would decline for anaerobic conditions (not shown in the figure), it reaches similar results as SimulKnock for aerobic conditions.
Acetate and fumarate exhibit a different behavior, as depicted in an example in Figure 5b.With SimulKnock, the space-time yield rises significantly at three knockouts, in alignment with the increase in growth.With the sequential approach, the space-time yield is lowest for two knockouts and then increases again, decreasing the growth rate.This interesting behavior stems from the two-step procedure.Only in the second step are the process conditions optimized, but they can only influence one of the two factors, namely the product concentration c P .Thus, there is no monotonous behavior with the sequential approach.This finding displays the problems often occurring during scale-up when sequentially optimizing the microbial strain and the process conditions.On the other hand, acetate case underpins the potential of SimulKnock.

SimulKnock can furnish meaningful knockout predictions but also exhibits model-experiment mismatch
In the last case study, we took the experimental study of van Heerden and Nicol [48] as a test case to elaborate on how far SimulKnock reproduces their results regarding spacetime yield, dilution rate, and knockouts.We aimed to see whether SimulKnock furnishes experimentally meaningful results.In their experimental study, van Heerden and Nicol [48] produced succinic acid from E. coli KJ134 in a continuous fermentation with two different glucose feed concentrations.We applied SimulKnock with embedded Monod kinetics to the E. coli GEMs i ML1515 [39] as well as i EC1349 Crooks [49] with three maximum allowable knockouts.While van Heerden and Nicol [48] tested different dilution rates and measured the space-time yield for each, SimulKnock predicted the optimal space-time yield and the corresponding dilution rate.Figure 6 shows the case study results, depicted as the space-time yield over the dilution rate.Note that, in continuous fermentation, the dilution rate equals the growth rate µ and the biomass flux v bio .In the experimental study, 13  coli KJ134 [48] with results from SimulKnock, based on the metabolic networks i ML1515 [39] and i EC1349 Crooks [49].Two glucose feed concentrations were studied: 20 g L −1 and 50 g L −1 .genetic modifications were applied to the organism to reach a maximum space-time yield of around 1 g L −1 h −1 on 20 g L −1 glucose feed.SimulKnock predicted a maximum spacetime yield of about 4 and 6 g L −1 h −1 with i ML1515 and i EC1349 Crooks, respectively, on 50 g L −1 .The space-time yields were reached with three knockouts.The reactions to be eliminated, predicted with i ML1515 embedded, were acetate kinase, ATP synthase, and fumarase.With i EC1349 Crooks embedded, the predicted reaction eliminations were phosphotransacetylase, ATP synthase, and succinyl-CoA synthetase.Amongst them, acetate kinase and phosphotransacetylase were also eliminated in the experimental strain.Hence, SimulKnock proves to predict experimentally meaningful results.However, the figure reveals a substantial model-experiment mismatch.Especially the results of i EC1349 Crooks are higher than any experimental value.In i EC1349 Crooks, the high growth rate might result from the biomass function difference.For illustration, for an FBA with a glucose limitation of 10 mmol g −1 CDW , i ML1515 has a growth rate of 0.87 h −1 , whereas i EC1349 Crooks has a growth rate of 1.02 h −1 .The mismatch in space-time yield that comes visible with both metabolic networks might result from SimulKnock not considering inhibition and repression effects and being overly optimistic by not considering byproduct formation unless the byproduct formation hinders biomass formation.The mismatch could be reduced by including experimental flux data of the organism in question, for example, from Fischer et al. [50] for E. coli, or by using additional information from a protein allocation model [51].In both cases, the bounds of specific fluxes would be fixed in a preprocessing step, which can readily be done in the SimulKnock code.These refinements of the metabolic network and adaptations at the cellular level are interesting in case of application but were not in the scope of this work.

Conclusion
We presented SimulKnock, an optimization formulation combining continuous fermentation optimization with strain optimization via gene knockouts.Such an optimization formulation allows us to consider industrial fermentation conditions already in the strain design, thereby overcoming current scale-up difficulties.SimulKnock is a bilevel optimization formulation, which we transformed into a single-level mixed-integer nonlinear optimization program and solved globally for an illustrative example network as well as for the E. coli GEM i ML1515 [39].We applied Monod or Michaelis-Menten kinetics to connect the process level to the cellular level in SimulKnock.SimulKnock with applied Monod kinetics showed similar results as Michaelis-Menten kinetics, with Monod being less computationally intensive.SimulKnock predicted different knockouts than OptKnock [11].Also, the space-time yield was significantly higher with SimulKnock compared to a sequential approach of OptKnock plus process optimization for aerobic and anaerobic conditions.Compared to experimental data [48], SimulKnock indicated that higher space-time yields could be achieved with fewer knockouts.
The computations for SimulKnock with embedded GEM were performed on a highperformance computing cluster.A decrease in the computation demand and reduction of computation time could be achieved by applying further network reduction methods [52] or by considering essential genes [53].Future work should consider that kinetics would have to be adjusted when knockouts are applied to the organism because Michaelis-Menten, describing the substrate uptake, and Monod, describing the growth behavior, are affected by genetic modifications.Especially for altered growth behavior, the update of the kinetics should be considered.Our case study with experimental data indicated that SimulKnock furnishes overly optimistic results.The conversion to a robust optimization formulation, as suggested in RobustKnock [12], could tackle this observation.Moreover, experimental flux data (for E. coli, see Pharkya and Maranas [14] and Fischer et al. [50]) could reduce the model-experiment mismatch by fixing the bounds of specific fluxes in a preprocessing step.
To make a more accurate prediction, SimulKnock could be extended both at the process level and at the cellular level, e.g., by more elaborate microbial formulations [8,7,2], as depicted by OptKnock [11] and its extensions [54,55].These formulations include a reference flux distribution, which could be furnished using a protein allocation model [51] or experimental in vivo flux data [56].The extension could also comprise other genetic modifications, i.e., gene insertion [13] and regulation [14], as well as other process operation modes, e.g., continuous mode with cell retention, batch, and fed-batch.0.013 g L −1 h −1 , respectively) than the results with only one kinetics embedded.At the same time, we observed elevated growth rates.For succinate production (2 knockouts), for example, the growth rate was 0.68 h −1 , which, set in relation to the experimental results of van Heerden and Nicol [48], is not realistic.
From our mathematical analysis and the exemplary study, which aligned with the mathematical analysis, we conclude that it is not beneficial to include both kinetics.

A.2 Data for one reaction elimination
maximize through gene knockouts, feed substrate concentration space¡ timeyield subject to • number of knockouts ¢ limit • mass balanceconstraints of continuous fermentation includingreformulated Monod kinetics for substrate concentration maximize through fluxes cellular objective subject to • mass balanceconstraints of metabolites • low threshold on biomass • Michaelis-Menten kinetics for substrate uptake • knockout constraints

Figure 1 :
Figure 1: The bilevel optimization formulation of SimulKnock.Note that either Monod or Michaelis-Menten kinetics are applied at the upper level or the lower level.

1 ,
..., n, where c describes the position of the biomass reaction and I is the unity matrix.The vector c kin ∈ {0, 1} |n| has exactly one nonzero entry at the index of the glucose uptake reaction.The scalar d describes the right-hand side of the kinetics, i.e., d = v max S c S with λ ∈ R |m| being the dual variables corresponding to the mass balances for the metabolites and μ ∈ R |2n+1| + being the dual variables corresponding to the bounds set on the reaction fluxes through the inequality constraints and the kinetics.The set of equations 6 can readily replace the cellular level in (4).The resulting program is a single-level mixed-integer nonlinear program.The nonlinearity is introduced by the fact that process and cellular variables are at the same level now.The nonlinear term is the kinetic term in d.These considerations about nonlinearity are also applicable to the reformulated SimulKnock with Monod embedded.

Figure 2 :
Figure 2: Illustrative network.The grey circles A to G denote metabolites, v O and v S denote the flux of oxygen and the substrate uptake reaction, respectively; v P and v bio are the target chemical and biomass flux, respectively.All stoichiometric coefficients are one, except where indicated with numbers.For one maximum allowable knockout, the wrenches indicate the knockout prediction of SimulKnock and OptKnock.

Figure 3 :
Figure3: Comparison of space-time yield using Michaelis-Menten and Monod kinetics for two reaction eliminations on the genome-scale metabolic model i ML1515[39].Ethanol, succinate, and lactate were produced via anaerobic pathways; acetate, formate, and fumarate were produced via aerobic pathways.

Figure 4 :
Figure 4: Space-time yield using Monod kinetics for three reaction eliminations on the genome-scale metabolic model i ML1515 Monk et al.[39].All six target chemicals were produced via aerobic pathways.

Figure 5 :
Figure5: Change of space-time yield and growth rate over the number of knockouts for i ML1515[39] with embedded Monod kinetics.Succinate was produced anaerobically for 1 and 2 knockouts and aerobically for 3 knockouts.Acetate was produced via an aerobic pathway.STY: space-time yield, Seq.Appr.: Sequential approach.

Figure 6 :
Figure6: Comparison of experimental laboratory data for succinic acid production using E. coli KJ134[48] with results from SimulKnock, based on the metabolic networks i ML1515[39] and i EC1349 Crooks[49].Two glucose feed concentrations were studied: 20 g L −1 and 50 g L −1 .

Figure A. 1 :
Figure A.1: Comparison of space-time yield using Michaelis-Menten and Monod kinetics for one reaction elimination on the genome-scale metabolic model i ML1515[39].Ethanol, succinate, and lactate were produced via anaerobic pathways; acetate, formate, and fumarate were produced via aerobic pathways.

Table 1 :
Comparison of Monod and Michaelis-Menten in SimulKnock.Abbreviations: c S : substrate concentration; v bio : growth rate; v S : substrate uptake rate; f , g: notation for functions.

Table A .
[48]ne reaction elimination: Comparison of space-time yield, product concentration, substrate concentration, biomass concentration, growth rate, substrate uptake flux v S , product flux v P , and molar yield using Michaelis-Menten and Monod kinetics for the genome-scale metabolic model i ML1515[39].Note that, for all results, the substrate feed concentration reached its upper bound of 10 g L −1 .STY: Space-time yield, MM: Michaelis-Menten Table A.4: Comparison of SimulKnock results with lab data from van Heerden and Nicol[48].