Data‐driven in silico prediction of regulation heterogeneity and ATP demands of Escherichia coli in large‐scale bioreactors

Escherichia coli exposed to industrial‐scale heterogeneous mixing conditions respond to external stress by initiating short‐term metabolic and long‐term strategic transcriptional programs. In native habitats, long‐term strategies allow survival in severe stress but are of limited use in large bioreactors, where microenvironmental conditions may change right after said programs are started. Related on/off switching of genes causes additional ATP burden that may reduce the cellular capacity for producing the desired product. Here, we present an agent‐based data‐driven model linked to computational fluid dynamics, finally allowing to predict additional ATP needs of Escherichia coli K12 W3110 exposed to realistic large‐scale bioreactor conditions. The complex model describes transcriptional up‐ and downregulation dynamics of about 600 genes starting from subminute range covering 28 h. The data‐based approach was extracted from comprehensive scale‐down experiments. Simulating mixing and mass transfer conditions in a 54 m3 stirred bioreactor, 120,000 E. coli cells were tracked while fluctuating between different zones of glucose availability. It was found that cellular ATP demands rise between 30% and 45% of growth decoupled maintenance needs, which may limit the production of ATP‐intensive product formation accordingly. Furthermore, spatial analysis of individual cell transcriptional patterns reveal very heterogeneous gene amplifications with hot spots of 50%–80% messenger RNA upregulation in the upper region of the bioreactor. The phenomenon reflects the time‐delayed regulatory response of the cells that propagate through the stirred tank. After 4.2 h, cells adapt to environmental changes but still have to bear an additional 6% ATP demand.


| INTRODUCTION
To reduce the human CO 2 footprint in the atmosphere, sustainable bioprocesses replacing fossil resources by sugar may play a crucial role. Microbial production offers the potential to provide products for agricultural, biopharmaceutical, and chemical markets (Delvigne et al., 2017). As a prerequisite, such approaches need to be transferred successfully from laboratory to large-scale without loss of economic attraction, that is, without reduction of the sensitive TRY values (titer, rates, and yields) that served as constraints for economic evaluation. However, performance losses may occur, comprising increased by-product formation, reduced substrate-toproduct conversion, reduced productivities, and so forth (Lara et al., 2006). They mirror cellular responses to large-scale heterogeneities that are induced by limited mass transfer and by insufficient mixing capacities (Noorman & Heijnen, 2017). Accordingly, research activities aimed to mimic large-scale conditions already in early-stage lab tests. One of the first examples is given by Oosterhuis and Kossen (1983), who repeatedly exposed cells to oxygen saturated and limiting conditions in a setting of two linked, stirred bioreactors. Multiple tests with alternate experimental scale-up simulators followed (overviews provided in Garcia-Ochoa & Gomez, 2009;Neubauer & Junne, 2010Noorman, 2011;Takors, 2012;Zieringer & Takors, 2018) mimicking not only fluctuations of dissolved oxygen (DO) levels, but also nutrient availability and pH variations. Today, such approaches received key consideration to design robust microbial processes (Noorman & Heijnen, 2017). Still, the valid a priori prediction of large-scale heterogeneities' impact on cellular performance is of crucial importance for developing novel bioprocesses.
Even further, findings of large-scale stress exposure may guide strain engineering to create particularly robust hosts. To reach this goal, Löffler et al. (2016) applied the so-called STR/PFR setup comprising a stirred tank reactor (STR) linked with a plug-flow reactor (PFR).
Steady-state nutrient-limited, continuous cultivations were performed in STR before PFR was connected, frequently exposing cells to glucose-limiting conditions. Accordingly, cells repeatedly experienced temporal feast/famine conditions that were characterized by the residence time in the PFR. Comprehensive sampling in STR and PFR created a highly valuable data set of short-and long-term metabolic and transcriptional responses on repeated starvation stimuli (Löffler et al., 2016). The data set revealed that Escherichia coli not only react on extracellular stress by instantaneous metabolic shifts.
Observations also revealed massive transcription of genes organized in operons (Nieß et al., 2017) and in fundamental regulons of strategic importance. For instance, the stringent response was repeatedly initiated by fast-rising intracellular (p)ppGpp levels in PFR, which were downregulated in STR. Löffler et al. (2016) reported an additional rise of growth-decoupled maintenance of up to 50%. So far, these findings were not yet used to predict the response of E. coli exposed to large-scale heterogeneities. First, a data-driven model is needed that describes the complex transcriptional response of E. coli to said stress conditions. Next, such a transcriptional model should be coupled with computational fluid dynamics (CFD) of a large-scale bioreactor to identify zones of different nutrient availability and to predict the cellular response of cells passing through those zones (Zieringer & Takors, 2018). Our study exactly tackles this two-step problem: Mixing heterogeneities and zones of different substrate availability of a 54 m 3 stirred bioreactor are predicted using CFD and assuming common operating conditions. The tracking of 120,000 E.
coli cells finally yielded the prediction of additional ATP demands.
Furthermore, spatially resolved transcriptional patterns of individual E. coli cells were predicted, unraveling the population heterogeneity in the industrial-scale bioreactor.

| Experimental setup
A glucose gradient was simulated in a stirred tank reactor (STR) coupled to a plug flow reactor (PFR), as depicted in Figure 1.
The experimental setup consists of an STR operated in continuous mode (dilution rate, D = 0.2 h −1 ) and connected with a plug flow reactor. Cells were grown under glucose-limited conditions in the STR (mean residence time of the cells in STR: 6.2 min) and experience starvation in the PFR (mean residence time of the cells in PFR: 125 s). The cells are circulating through the reactor system for 28 h process time, which equals, on average, around 200 passages of the starvation zone for each cell. In this way, the setup permits the analysis of transcriptional response for ongoing starvation passages through the PFR. Thereby, the tactical response is monitored via the PFR sample ports (P1-P5), while the strategic changes were tracked via the STR sample port (S). The cultivation was performed as biological triplicates under identical experimental conditions. For transcriptomic analysis, the samples were grouped by replicates, and sample time and location (STR or PFR) was chosen as a combined experimental design. Significantly expressed genes were determined using the described design and a generalized linear model within the 266 | egdeR R-package (v.3.8.6) (Robinson et al., 2010). The detailed experimental implementation and RNA sequencing results used in this publication were published in the paper of Löffler et al. (2016). 2.2 | RNA-sequencing data cluster analysis RNA-sequencing data contain time courses of messenger RNA (mRNA) abundance of 3908 genes. Thereof, three measurement sets sampled after 25 min, 2 h, and 28 h were chosen for further investigations. Significantly up-und downregulated genes of samples P1-P5 in PFR passing the threshold of log2 fold change (log2FC) > | 0.58| and false discovery rate (FDR)-corrected p value < .05 were identified. Cluster analysis was performed using the R-package flexclust v. 1.4-0 (Leisch, 2006) applying RStudio v. 1.2.1335 (RStudio, Inc.) RNA-sequencing data contain time courses of mRNA abundance of 3908 genes. Thereof, three measurement sets sampled after 25 min, 2 h, and 28 h were chosen for further investigations. Significantly up-und downregulated genes of samples P1-P5 in PFR passing the threshold of log2 fold change (log2FC) > |0.58| and FDRcorrected p value < .05 were identified. Cluster analysis was performed using the R-package flexclust v. 1.4-0 (Leisch, 2006), applying RStudio v. 1.2.1335 to significantly reduce simulation efforts while including basic features of gene dynamics. The function qtclust (included in flexclust package) was used to perform stochastic qualitybased clustering (SQBC) and k-means-clustering. Parameters of SQBC were set as follows (Table 1).
Ntry indicates the number of trials per iteration, while rmax is the maximum radius as a proxy for correlation. Min size defines the minimum number of observations per cluster. Data points not clustered by the algorithm are omitted. The setting of parameters ensured maximum comparability and five as the maximum number of clusters. Cluster properties are listed in the Supporting Information Appendix Tables D5-D8, and resulting clusters are displayed in Supporting Information C in the Appendix. The k-means algorithm was initialized with the centroids of the SQCB method.

| ATP calculation for single molecules
ATP requirements for the formation of amino acids and nucleotides were calculated using the results of Kaleta et al. (2013). The translational costs for protein formation and polymerization add up to 4 ATP per amino acid, including activation of the amino acid (1 ATP to 1 AMP) and peptide bond formation at the ribosome (2 GTP) (Stouthamer, 1973). Since there is a net production of 0.1 ATP per amino acid (for detailed calculation, see Löffler et al., 2016), the overall cost of amino acid synthesis and polymerization was estimated as four ATPs consumed per residue. The absolute numbers of synthesized and degraded nucleotides (nts) were estimated from experimental data. To recycle mono phosphorylated nucleotides (NMPs) to triphosphorylated nucleotides (NTPs), costs of 2 ATP were assumed. Assuming a P/O-ratio of 1.49 (ATP formation via NADH oxidation in respiration), the following ATP requirements were assumed for the bases ( Table 2).

| Calculation of mRNA abundance
Only additional ATP needs for transcription and translation were estimated considering the basic demands under nonperturbed conditions. Accordingly, total mRNA content was estimated following studies of Bremer and Dennis (2008) as 61.7 µg per 10 9 cells for a growth rate of 0.2 h −1 . 20% of the total dry weight was assumed to be RNA (Neidhardt et al., 1990), including 5% mRNA, a value which is F I G U R E 1 Scheme of a two-compartment system as used in the experimental setup (Löffler et al., 2016). The two-compartment device consists of a stirred tank reactor (STR) connected to a plugflow reactor (PFR). Derived from non-ideally mixed large-scale industrial fermenters (Lapin et al., 2006), the setup mimics periodic substrate availability experienced by cells in large-scale bioreactors. The well-mixed STR is operated in glucose-limited continuous mode (dilution rate, D = 0.2 h −1 ). As soon as cells enter the PFR compartment, the residual substrate is consumed within seconds leading to starvation. The steady state before PFR onset at time zero was used as the reference state (S 0 ). Samples were taken at eleven distinct time points over 28 h. The system is equipped with five PFR sample ports (P1-P5) at defined residence times τ(s), as well as an STR sample port S. The total mean PFR residence time is τ PFR = 125 s in line with conclusions from Stouthamer (1973). This results in total mRNA content of 6.17 g per 10 16 mRNA per cell. The relative distribution of specific mRNAs is taken from the measured normalized counts as transcripts per million. The relative fraction multiplied with the total mRNA content gives the total mass of all mRNA encoded by a single gene. Dividing this number with the corresponding molecular weight yields the absolute number of molecules. Molecular weights of mRNAs were calculated with Equation 1 (Kibbe, 2007). Results are listed in Supporting Information Table A3. As the phosphate groups of two nucleotides are bound together, an OH-group is cleaved. This leads to reduced molecular weights of nucleotides in the polymer chain. Accordingly, an additional term was added to account for the 5'-triphosphate cleavage (159 g mol −1 ). (1) n nucleotide codes for the number of nucleotide monophosphates which is multiplied by their corresponding molecular weight.

| The biological model
The model was implemented using MATLAB v. 2019b, considering the four processes: transcription, translation, mRNA degradation, and protein degradation. The first three are implemented as agent-based approaches, while the last considers protein degradation as a decomposition in the continuum. The governing variable that controls the expression is the number of active RNA polymerases (RNAPs) per each cluster.

| Estimating the number of active RNAP
We assumed that gene expression levels follow sigmoidal courses.
Hence, an equilibrium between synthesis and degradation may be achieved. Produced nts are given by with σ RNAP (t) coding for the number of active RNAPs at time t.
The shape of the sigmoidal function is defined as The parameters a, b, c, and d were fitted to the nucleotide synthesis, which was derived from experimental data by calculating

| Transcription
After initiation, the continuous one-stranded movement of RNAP creates the mRNA transcripts measured. However, individual gene expression profiles were observed that could be grouped in clusters of similar transcription dynamics. Accordingly, only expression dynamics of representative, average genes per cluster are described in the model (Supporting Information Appendix Tables D5-D7). The minimum distance of 100 nts (Δx RNAP ) was considered between two subsequent RNAPs. Furthermore, all genes of one cluster were supposed to be randomly initiated with the functional Gene transcription is modeled as one-dimensional nucleotide extension with the relative movement of RNAP as The constant transcriptional elongation rate, v RNAP , is set to 21 nts −1 , which equals the average value found in E. coli during starvation . The variable x RNAP indicates the relative position of RNAP on the DNA grid. The length of the resulting mRNA strand is equivalent as When the last nucleotide is reached, the mRNA is released. All fragments of mRNA are summed to get total mRNA amounts. While fractions of operons were found to be fully transcribed after initiation (Nieß et al., 2017), other scenarios coincided, too. For instance, only subsets of operons may be transcribed, or even opposing transcription reads in a single operon occurred (Mao et al., 2015). Accordingly, we assumed that only 10% of the initiated operons are transcribed completely. In other words, 10% of experimentally ob-

| Translation
The translational process is modeled by describing the movement of ribosomes (RIB) on the mRNA strand. The process is assumed to take place in cotranscriptional manner (Proshkin et al., 2010): After 268 | synthesized mRNA reached a minimum length of 80 nts (Δx RIB ) (Bremer & Dennis, 2008), the first ribosome attaches to the free 5'-cap end. Further ribosomes may bind too, provided that minimum distance between two subsequent ribosomes and the maximum number of translations per mRNA (max TL ) are fulfilled. Released ribosomes may be reused following the same scenario. The initiation trigger TL i,j for a translation j on mRNA i can be described as The movement of ribosomes is analogous to the movement of RNAP as With the translational elongation rate v RIB = v RNAP (Nieß et al., 2017;Proshkin et al., 2010) and the number of maximum translations max TL = 11 (Bremer & Dennis, 2008). Because one cluster of genes revealed rapid degradation after 50 s in PFR at 28 h, only one translation per transcript was assumed for this group of genes.
As soon as the ribosomes pass the last nucleotide, the protein is released and is assigned to the group of bulk proteins.

| Degradation
Degradation of transcripts is initiated as soon as ribosomal protection of the 5'-cap vanishes, which is a co-translational process. The velocity of the RNASE was adapted to the gene length L mRNA,i , that is, individual v RNASE,i was estimated considering the experimentally observed mRNA median lifetime t deg;med of 2.8 min in nutrient-rich and of 4.6 min in starvation zones . Degradation was initiated for mRNA i as with the movement of RNASE.
For transcripts longer than ≈ 1000 nts, degradation is initiated already when transcription is not finished yet. Chen et al. (2015) found that 88 of 263 mRNAs showed lifetimes of the 5'-cap shorter than the synthesis time of the transcript. Accordingly, cotranscriptional degradation was considered for long transcripts.
Protein degradation is described using a constant rate degradation rate k deg for the bulk proteins (Maurizi, 1992). First-order degradation kinetics were assumed depending on the nutrient condition, as Consequently, bulk protein of a subsequent time step t + 1 equals where q s,max is the maximum uptake rate, c s is the glucose concentration, and the approximated substrate-specific uptake constant K s with 4 mg L −1 . The maximum uptake rate was calculated with the biomass substrate yield Y XS = 0.25 g s ·g −1 CDW and the maximum growth rate μ = 0.2 h −1 (Villadsen et al., 2011). Based on the experimental observations in Löffler et al. (2016) we concluded that stringent response is the predominant regulatory scheme initiated by repeated starvation. As a key characteristic stringent response reduces ATP consuming procedures trying to keep carbon supply on the maximum level achievable under stress conditions. Accordingly, we consider glucose uptake as a Monod-type function not being affected by the stringent response observed.  Figure   B9. For transcription, costs are derived from nucleotide balancing, including the release of nucleotides by mRNA degradation and the need for mRNA synthesis. Cost for translation mirrors the amino acid needs and integration according to ribosomal activity. As indicated, translation costs are more than 2.5-fold higher than those of transcription. At maximum, cells have to bear 36.8% additional NAM, 10.4% coding for transcription, 26.4% for translation. This happens during the early phase of frequent starvation exposure, that is, after three starvation passages (25 min process time). After 2 h process time, the ATP demand still increases. More than 45% NAM increase is observed, illustrating the remaining high number of active RNAP. Later, after 28 h, NAM add-ons reduce more than fivefold compared with maximum needs. Then, transcription accounts for about 1% NAM rise only. The total NAM increase only mirrors 9.5%.  For simplicity, Eulerian simulations only considered the liquid phase, thereby assuming sufficient oxygen supply in the bioreactor without calculating DO concentrations explicitly. Furthermore, additional turbulence due to bubble interaction was neglected, too.

| Linking cluster kinetics
Substrate consumption followed Monod-type kinetics taking place in each numerical cell. This implied that bacterial cells were homogeneously distributed at each time step.

| Statistical evaluation
Lifeline statistics reflect the imprint of changing micro-environmental conditions on cells fluctuating through the bioreactor. To be precise, cellular residence times in different concentration zones and shifts between proximal regimes were studied. At the start, cells were "inserted" into the bioreactor along a straight line reaching from top to bottom. After a few simulation steps, cells were distributed homogeneously before individual cell tracking started for 180 s.
Lifeline records were cured by percolating only those with residence times longer than 0.13 s. The latter represent unrealistically turbulent fluctuations. The following threshold was defined for regime analysis: If cells experience lower or higher glucose levels than K S for at least one second, the period is labeled as starvation or saturation time, respectively. Noteworthy, the minimum residence time of 1 s equals the average metabolite turnover time in E. coli (Shamir et al., 2016;Taymaz-Nikerel et al.,2011). The implementation of the harsh regime boundary K S finally leads to rapid and somewhat artificial regime shifts. They were excluded from analysis by ignoring the upper and lower 1% of regime changes. Alternately, the consideration of alarmones such as (p)ppGpp serving as intracellular triggers to initiate transcriptional regulation may yield continuous models. Unfortunately, understanding of alarmone formulation, degradation, and alarmone induced regulation is still too fragmented to build dynamic transcriptional models. In total, measures for residence time percolation and shift filtering only reduced the data set by 3% (Supporting Information Appendix Figure E14).
At maximum, 41 regime shifts were observed during the 180 seconds observation period. Most frequently, 20 regime changes occurred and cells rested in a single zone no longer than 30 seconds.
As a key characteristic, cells once exposed to glucose starvation reset their regulation signal. But RNAP and ribosomes remain active, propagating the starvation response into the glucose excess regime (Figure 2c,d). According to their starvation pattern the cell lifelines were assigned to 70 different clusters. Thereby each cluster represents a specific fluctuation pattern, reflecting the amount and duration of changes between starvation and excess zone (Figure 2b).

| Coupling the biological model with lifelines
To minimize the computational efforts, particle lifelines were exported from ANSYS Fluent. Starvation patterns of lifelines ( Figure 2b) served as input for the biological model. A workflow scheme is provided in the Supporting Information Appendix Figure   H16. It was assumed that each entry of the starvation zone activated the expression of distinct gene clusters as experimentally observed.
Accordingly, individual expression patterns were estimated for each cell, mirroring their particular tracking history.
As indicated in Figure 7c, the basic expression level of the cellular population is increased by 37.7% compared with the reference.
This reflects the additional cellular needs to adapt to the hetero-

| DISCUSSION
To disclose spatial heterogeneities of regulation patterns and additional ATP demands of E. coli K12 W3110 exposed to a 54 m 3 STR CFD based lifeline analysis was coupled with agent-based modeling for transcription and translation.
As a prerequisite, a proper model describing stress-induced dynamics of transcription and translation is needed. Applying a clustering approach, it was possible to properly describe the experimentally observed transcription dynamics (Löffler et al., 2016)  | 275 efforts shifting control from σ 70 to σ 38 , more and more (Löffler et al., 2016). Given that E. coli cells with doubling times of 3.3 h contain about 120 active RNAPs (Bremer & Dennis, 2008), approximately one-fifth of the available RNAPs at 25 min is used in the transcriptional response (Figure 3).
The number of involved RNAPs slowed down after 28 h for two reasons. First, the absolute number of initiated transcripts is reduced, which mirrors cellular adaptation. Second, transcription even stopped after 50 s for a large group of genes ( Figure 5). Accordingly, reduced synthesis costs occur. This is amplified by the prolonged lifetime of transcripts during starvation, which further reduces the amount of synthesis to obtain a certain level of mRNA abundance as described in Section 2.4.4 mRNA lifetime is proportional to their distance from the 5′ end of the transcript according to Chen et al. (2015), which is in line with the observation that the 5′ end contains important determinants that regulate RNA lifetime (Arnold et al., 1998).
Supporting Information Appendix Figure B9 shows The approach was exploited further by estimating the entire addon ATP demand for 120,000 newborn cells monitoring 4.5 h process time ( Figure 8). As shown, the adaptation of the population is finished after 4.2 h disclosing a remaining ATP add-on of about 6% NAM compared to the 45% max NAM at the beginning. In terms of microbial productivity, these ATP needs simply reduce the amount of available ATP for product formation; that is, they limit biomassspecific productivities. The phenomenon has often be described in large-scale fermentation (Lara et al., 2006). Noteworthy, it is likely to be pronounced in hyper-producing cells with ATP intensive product formation. Often enough, such production processes run in fed-batch mode, installing reduced, limiting metabolic activity to stay within the technical limits of aeration and cooling. Consequently, those additional ATP needs hit cells with limited ATP forming capacities.
To evaluate the impact of particle simulation time, an additional simulation was conducted using a high-performance computation cluster studying 60,000 particles for about 460 s. Similar results were obtained for the key readouts, that is, time courses of ATP maintenance demands and residence time distributions remained. The simulated adaptation time reduced from 4.2 to 3.7 h, mirroring the lowered amount of particles staying in the starvation zone for the entire process time (around 5%) (see Supporting Information J).
However, modelers need to consider that long simulation times are likely to violate the intrinsic constraint of one-way coupling, neglecting particle-environment interactions for the sake of simplicity.
In this dilemma, we decided for the analysis of 180 s to capture key dynamics while still fulfilling the one-way coupling constraint.
As pointed out by Löffler et al. (2016), the majority of transcription dynamics are caused by the frequent on/off switching of stringent response, mediated by rising intracellular (p)ppGpp levels.
Hence, creating stringent response deficient strains (Michalowski et al., 2017) opens the door to prevent non-wanted NAM increase.
Besides, other cellular stress programs may be targeted as well (Supporting Information Appendix Table D4).

| CONCLUSION
The current modeling approach marries computational lifeline analysis with cellular regulation models, thereby introducing the noninstantaneous cellular response to changing extracellular conditions.
Consequently, the spot of stress induction and the location of cellular phenotype do not need to be the same. Accordingly, heterogeneities in large-scale bioreactors comprise the physical levels linking local conditions tightly with metabolic responses and the cellular regulation level encompassing delayed responses such as transcriptional or translational effects.
To detect the latter and to describe them properly in data-driven models, experimental scale-up simulators are necessary that mirror transcriptional and translational cellular replies as performed by Kuschel and Takors (2020). The setting of such devices may differ from "conventional" scale-up simulators that typically mimic circulation times. Because the entire transcriptional responses should be clearly detectable, rather long stress exposure periods should be installed and read.