Nonlinear state estimation as tool for online monitoring and adaptive feed in high throughput cultivations

Robotic facilities that can perform advanced cultivations (e.g., fed‐batch or continuous) in high throughput have drastically increased the speed and reliability of the bioprocess development pipeline. Still, developing reliable analytical technologies, that can cope with the throughput of the cultivation system, has proven to be very challenging. On the one hand, the analytical accuracy suffers from the low sampling volumes, and on the other hand, the number of samples that must be treated rapidly is very large. These issues have been a major limitation for the implementation of feedback control methods in miniaturized bioreactor systems, where observations of the process states are typically obtained after the experiment has finished. In this work, we implement a Sigma‐Point Kalman Filter in a high throughput platform with 24 parallel experiments at the mL‐scale to demonstrate its viability and added value in high throughput experiments. The filter exploits the information generated by the ammonia‐based pH control to enable the continuous estimation of the biomass concentration, a critical state to monitor the specific rates of production and consumption in the process. The objective in the selected case study is to ensure that the selected specific substrate consumption rate is tightly controlled throughout the complete Escherichia coli cultivations for recombinant production of an antibody fragment.

High-throughput cultivation systems are an essential step in modern bioprocess development (Neubauer et al., 2013). These systems have evolved from simple parallel microtiter plates to highly automated systems that can run cultivations mimicking industrial conditions and are equipped with noninvasive sensors and liquid handlers capable of performing complex process control actions (Hemmerich et al., 2018; Teworte et al., 2022). This makes it possible, for example, to run several fed-batch cultivations in parallel. Fed-batch is the preferred cultivation mode in industrial biomanufacturing because of its several advantages, such as controlled growth conditions, reduced byproduct formation, and the avoidance of engineering limitations, for example, insufficient oxygen transfer into the medium (Hewitt & Nienow, 2007). Typically, a constant specific growth rate between 0.1 and 0.3 h⁻¹ (Lee, 1996) is pursued, whereas the optimal value depends on the strain's maximal growth capacity, the product to be expressed, and the cultivation conditions. After induction of recombinant protein production, the feed rate is often further reduced, resulting in a lower or even decreasing growth rate. Nevertheless, the maximum substrate uptake capacity is known to vary within the cultivation run, since stress factors such as recombinant product formation or fluctuations in the substrate availability over time evoke adaptations in the cells' metabolism (Hoffmann & Rinas, 2004; Lara et al., 2009; Neubauer et al., 1995; San et al., 1994). For example, Sinner et al. (2022) reported an increase in the maximum growth rates for some substrates of a continuous process of Corynebacterium glutamicum growing on spent sulfite liquor, whereas Neubauer et al. (2003) reported a decrease in glucose uptake after induction of recombinant protein production in an Escherichia coli fed-batch process. Hence, adaptive approaches that consider changes in the model parameters during the bioprocess are needed to cope with the limited descriptive capabilities of tractable process models (Cruz Bournazou et al., 2016; Krausch et al., 2022; Sinner et al., 2022). For this reason, online monitoring of the biomass concentration is essential to obtain an accurate estimate of both the current biomass concentration and the growth rate. Unfortunately, despite several efforts and existing tools, online monitoring of the biomass concentration in parallel systems is still a challenging task, especially in small vessels and at large throughput (Ebert et al., 2018).
While miniaturized cultivation systems help to reduce the resources needed for process development, it is challenging to obtain information of similar quality as in bench-top bioreactors (Tajsoleiman et al., 2019). Most applications are limited to online monitoring of pH, dissolved oxygen, and temperature (Haby et al., 2019; Janzen et al., 2019; Velez-Suberbie et al., 2018). Some systems, such as the BioLector (Funke et al., 2010; Huber et al., 2009), allow for biomass measurements via online turbidity measurement. However, their microtiter plate format entails other limitations, such as the small reaction volume and limited scalability (Hemmerich et al., 2018). To our knowledge, off-gas analysis has only been implemented in parallel systems at larger scales, for example, in the Ambr® 250 Modular (Sartorius Stedim Biotech), which reduces the number of possible experiments. Furthermore, Rowland-Jones et al. (2021) showed the integration of Raman spectroscopy into an Ambr® 15 mini-bioreactor (MBR) system. However, the user must cope with a challenging implementation as well as the time delay and volume loss due to the necessity of sampling. In general, all aforementioned solutions require additional analytical equipment, which translates to higher costs, reduced throughput, and a higher risk of failure.
State observers (soft sensors) are widely used to estimate nonobserved process state variables from available measurements (Soroush, 1997). The first implementation of the Kalman Filter (KF) (Chui & Chen, 2017; Kalman, 1960) was later extended to Particle Filters (Chen et al., 2004; del Moral, 1996), which are arguably the most commonly applied observers in bioprocess engineering (Krämer et al., 2020; Müller et al., 2023; Neddermeyer et al., 2016; Sinner et al., 2022). While the KF was initially designed for linear systems, numerous alternatives to the general KF approach (Kalman, 1960) have been proposed to better capture nonlinearities in the system, for example, the derivative-free unscented KF (Julier et al., 1995) and the more generalized central differences KF (Ito & Xiong, 2000; Nørgaard et al., 2000). Both filters make use of decoupled sigma points to effectively represent the mean and covariance (György et al., 2014). KF algorithms have been implemented for monitoring various microbial and mammalian processes using online process information, for example, derived from off-gas (Neddermeyer et al., 2016; Wang et al., 2010) or spectroscopic methods such as near-infrared (Krämer et al., 2020; Narayanan et al., 2020). More recently, several data-driven and hybrid approaches have been developed (Narayanan et al., 2020), although these have not yet been able to outperform observers based on mechanistic models (Ortega et al., 2021).
Still, to the knowledge of the authors, no observer variant has yet been implemented in parallel MBR systems. This is especially unfortunate because, as the sophistication of such systems increases, an accurate estimation of the biomass concentration becomes critical to enable the use of advanced operation and control tools. In this study, we implement a Sigma-Point Kalman Filter (SPKF) to estimate biomass, exploiting the information of ammonia consumption and dissolved oxygen tension in an automated parallel cultivation system at mL-scale. As proof of concept, we consider a process for production of antibody fragments in E. coli modeled using a macrokinetic growth model (Anane et al., 2017). The biomass concentration is estimated based on the ammonia added for pH control and online oxygen measurements. In microbial processes, growth leads to nitrogen consumption, which in turn leads to a drop in the pH (Christensen & Eriksen, 2002; Kawohl et al., 2007; Sundström & Enfors, 2008). At controlled neutral pH, the increase in biomass can therefore be predicted from the base additions. In this study, additional effects such as acid formation, nitrogen incorporation in a product, or consumption from an additional substrate proved to be negligible.
This study therefore addresses the need for methods for real-time monitoring of critical concentrations in MBRs, where the measurement density is usually low and standard sensors from lab or production scale are difficult to implement. The added value of the tool is demonstrated in the redesign of the feed rate in the exponential feeding phase and after induction of recombinant protein production. Mears et al. (2017) presented an overview of feed control strategies, including the popular generic model-based control by feedback linearization (Abadli et al., 2021; Kager, Bartlechner, et al., 2022; Kager, Horst, et al., 2022) and model-predictive control (Kawohl et al., 2007; Krausch et al., 2022; Waldraff et al., 1997). These allow the direct control of specific rates or the further definition of constraints. However, due to the discontinuities of the present system, many control strategies are challenging to implement. Thus, a simple iterative recalculation of exponential feed rates (Hans et al., 2020) based on the SPKF biomass estimation was performed and found to be sufficient.

| Experimental setup and cultivation conditions
The cultivations were performed using an E. coli B strain with genome-integrated DNA for murine Fab expression in a high-throughput cultivation platform (Kemmer et al., 2023). The cultivation platform consists of a Tecan Freedom Evo 200 (Tecan Group Ltd) hosting a bioREACTOR® 48 system (2mag AG). While the system allows for up to 48 parallel experiments, here 24 MBRs were used in each run to avoid scheduling issues during sample handling. The batch phase was initiated by automatic addition of cryostock to reach a starting biomass concentration of ~0.006 g L⁻¹ in each MBR, which was filled with 10 mL of chemically defined cultivation medium with 10 g L⁻¹ initial glucose. After the batch phase of about 10 h, the cultivations were supplied with a 200 g L⁻¹ glucose feed in a pulse-based manner by the robotic pipetting tips. The exact glucose concentration in the batch and feed medium was determined before the cultivation using the Cedex Bio HT Analyzer (Roche Diagnostics International Ltd). The cultivations were controlled at 37°C and pH 6.8 ± 0.2 by addition of 7 M ammonia and 3 M phosphate solution. The cultivations were aerated first with 5 Lₙ min⁻¹ pressurized air, and after feed start with 0.6 Lₙ min⁻¹ pressurized air and 0.4 Lₙ min⁻¹ oxygen.
$\mathrm{DOT}_m$ and pH were measured online every 30-60 s by fluorescence sensors (PreSens Precision Sensing GmbH). At-line samples were pipetted into a chilled 96-well microwell plate containing 15 μL dried 2 M anhydrous NaOH per well. The NaOH dissolves upon addition of the sample and raises the pH, which inhibits cell activity while still avoiding cell lysis in E. coli (Kemmer et al., 2023). The optical density at 600 nm (OD₆₀₀) was measured on a Microlab® STAR liquid handling station (LHS) (Hamilton Company) with an integrated Synergy MX microwell plate reader (BioTek Instruments GmbH). Cells were separated from the supernatant by centrifugation at 15,000g for 10 min at 4°C, and the glucose concentration was determined in the supernatant using the Cedex Bio HT Analyzer. Acetate could not be measured. Two experimental runs were conducted. The first run (data set 1) was used for parameter identification, tuning of the SPKF algorithm, determination of the measurement noise, and cross-validation of the model. Here, all available online and at-line measurements, namely the OD₆₀₀, the glucose concentration, $\mathrm{DOT}_m$, and the relation between biomass growth and nitrogen consumption, $X_{NH_3}$, were used for the evaluation. The tuned state estimator was applied to monitor and control the second run (data set 2) based on the online measurements of $\mathrm{DOT}_m$ and $X_{NH_3}$ only. Table 1 shows the conditions applied during the runs. Conditions were selected to test different growth rates, low and high in relation to the expected $q_{S,\max}$ (more details in Section 2.9). For data set 2, one of two different model parameters was predicted (more details in Section 2.6).

| Biomass estimation
It has long been known that growth, that is, the production of biomass ($\mathrm{C}_\alpha\mathrm{H}_\beta\mathrm{O}_\gamma\mathrm{N}_\delta$), is linearly associated with the consumption of a C-source such as glucose (C₆H₁₂O₆), oxygen (O₂), and nitrogen (in the form of ammonia, NH₄⁺), and with the production of protons (H⁺), carbon dioxide (CO₂), and water (H₂O) (Christensen & Eriksen, 2002; Kawohl et al., 2007; Roels, 1983; Sundström & Enfors, 2008) (Equation 1). Considering that the pH is controlled in the neutral range and that acids are not produced in significant amounts, relation (1) can be used to determine the biomass growth from the ammonia additions. Ammonia release due to oxidative amino acid degradation (Siano, 1995) and recombinant product formation may influence the pH. However, in this study, these effects were considered negligible as no complex protein sources were provided and the concentration of recombinant product was low compared to the produced biomass concentration. The influence of low concentrations of acetate on the pH in this experimental setting has been found to be negligible in prior experiments (data not shown), which supports previous results (Christensen & Eriksen, 2002; Siano, 1995).
The biomass concentration approximated from the ammonia additions, $X_{NH_3}$, is given by Equation (2). The first term of this equation refers to the dilution, which causes the concentration to change over time as the volume is not constant.
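The update of the ammonia-based biomass estimate after a single base addition can be illustrated with a minimal sketch. It assumes a simple mass balance with a dilution term and a growth term proportional to the added base. The function name, the argument names, and the numerical yield value are hypothetical and are not taken from the study.

```python
def biomass_from_base(x_prev, v_prev, base_volume, base_conc, y_x_h):
    """Update a base-derived biomass estimate after one base addition.

    x_prev      : previous biomass estimate [g/L]
    v_prev      : liquid volume before the addition [L]
    base_volume : volume of base added by the pH controller [L]
    base_conc   : concentration of the base solution [mol/L]
    y_x_h       : assumed yield of biomass per mole of added base [g/mol]
    """
    v_new = v_prev + base_volume
    # Dilution term: the existing biomass is spread over the larger volume.
    diluted = x_prev * v_prev / v_new
    # Growth term: biomass attributed to the base that had to be added.
    grown = y_x_h * base_conc * base_volume / v_new
    return diluted + grown, v_new


# Hypothetical numbers: 10 mL culture, 10 uL of 7 M ammonia, yield of 10 g/mol.
x_new, v_new = biomass_from_base(1.0, 0.010, 1e-5, 7.0, 10.0)
print(f"X_NH3 = {x_new:.4f} g/L in {v_new * 1000:.2f} mL")
```

Calling the function once per base addition gives a stepwise biomass trajectory that can serve as a pseudo-measurement for the state estimator.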

| E. coli growth model
The model-based approaches described in the following are based on a macro-kinetic growth model of E. coli (Anane et al., 2017). The generalized nonlinear differential equation model is given by:
$$\dot{x}(t) = f(x(t), u(t), \theta, t), \qquad x(t_0) = x_0. \qquad (3)$$
It is considered that the model equation $f$ is present in a time-continuous form and that the output equation $h$ is only needed at discrete times $t_k$, since measurement samplings occur at distinct predefined time points. Here, $x = [X, S, A, \mathrm{DOT}_m]^T \in \mathbb{R}^{n_x}$ represents the system states, namely the concentrations of biomass $X$, substrate (glucose) $S$, and acetate $A$, and the measured dissolved oxygen tension $\mathrm{DOT}_m$. The model output is denoted by $y$. Since, in our case, we assume that $\mathrm{DOT}_m$ can be measured directly and $X_{NH_3}$ is obtained from the ammonia additions before the state estimation, the output function $h$ takes a very simple linear form. The mechanistic model is described in greater detail in the Supporting Information. The model is integrated into our in-house developed framework for simulation, which computes all gradients with symbolic differentiation (Andersson et al., 2012), performs parameter estimation using the packages lmfit (Newville et al., 2014) and pygmo (Biscani & Izzo, 2020), and performs state estimation as described below.
Samplings and inputs (e.g., feed, base, or acid additions) are given as boluses.For more details, please refer to section "Calculation of dilution by bolus additions" in the Supporting Information.
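To illustrate how a time-continuous model can be combined with bolus inputs, the following sketch integrates a deliberately simplified Monod model between additions and applies each bolus as an instantaneous mass balance. The Monod kinetics, the explicit Euler scheme, and all parameter values are assumptions made for illustration only. They are a stand-in for, and not a reproduction of, the macro-kinetic model of Anane et al. (2017).

```python
def simulate_fed_batch(x, s, v, t_end, boluses,
                       mu_max=0.6, k_s=0.05, y_xs=0.5, dt=1e-3):
    """Simulate biomass x [g/L], substrate s [g/L], and volume v [L].

    boluses: list of (time [h], volume [L], substrate concentration [g/L]).
    """
    pending = sorted(boluses)
    n_steps = round(t_end / dt)
    for i in range(n_steps):
        t = i * dt
        # Apply every bolus that is due: instantaneous dilution and addition.
        while pending and pending[0][0] <= t + 1e-9:
            _, dv, s_in = pending.pop(0)
            x = x * v / (v + dv)
            s = (s * v + s_in * dv) / (v + dv)
            v += dv
        # Explicit Euler step of the Monod growth kinetics.
        mu = mu_max * s / (k_s + s)
        growth = mu * x * dt
        x += growth
        s -= growth / y_xs
    return x, s, v


# Hypothetical case: starved 10 mL culture receiving one 100 uL glucose pulse.
x_end, s_end, v_end = simulate_fed_batch(
    1.0, 0.0, 0.010, t_end=1.0, boluses=[(0.1, 1e-4, 200.0)])
print(f"X = {x_end:.3f} g/L, S = {s_end:.4f} g/L, V = {v_end * 1000:.2f} mL")
```

Each pulse produces a short batch phase with substrate excess followed by limitation, which is the pattern that a bolus-fed mini-bioreactor model has to reproduce.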

| Parameter identification
The model parameters are determined by a weighted least squares regression over all measurement variables $l \in [1, \ldots, m]$ and all sampling time instances $k \in [1, \ldots, N_l]$, minimizing the cost function (Cohen & Migliorati, 2017), where $w_l$ is a user-defined weighting factor to individually scale the importance of the corresponding measurement (usually 1), $N_l$ is the total number of sampling time instances, $y_{l,k}^M$ is the measurement, and $h_l$ is the simulated output given the parameters $\theta$. Outliers were removed: measurements were retained only if they deviated by no more than 1.96 standard deviations from the mean ($z_{0.975}$) and had a coefficient of variation of at most 0.15. Measurements and simulated values were scaled using the sklearn RobustScaler (Pedregosa et al., 2011). As the parameters are strongly correlated, a subset selection is performed based on dynamical parameter sensitivities (López et al., 2013). Subsequently, a parameter estimation is performed with the parameter subset.
Note to Table 1: A total of 24 MBR cultivations were performed per run in three columns and using all eight rows in a randomized distribution. Each set of experimental conditions was performed in at least three replicates. For data set 2, the model parameters $q_{S,\max}$ or $Y_{XS,em}$ were estimated by the state estimator, and feed adaptations were performed based on these estimations in six sets of three MBRs each. A "constant" feed refers to no exponential increase of the feed rate during the induction phase. Here, the feed rate is the last value of the exponential phase.
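The weighted least-squares criterion used for parameter identification can be sketched as follows. The normalization of each variable's residual sum by its number of samples is an assumption, and the dictionary-based data layout and all names are hypothetical. Scaling and outlier removal are omitted for brevity.

```python
def weighted_least_squares_cost(measured, simulated, weights=None):
    """Sum of weighted, sample-averaged squared residuals over all variables.

    measured, simulated: dicts mapping a variable name to a list of values
                         taken at the same sampling time instances.
    weights            : optional dict of user-defined weights (default 1).
    """
    weights = weights or {}
    cost = 0.0
    for name, y_meas in measured.items():
        y_sim = simulated[name]
        w = weights.get(name, 1.0)
        residual_sum = sum((ym - ys) ** 2 for ym, ys in zip(y_meas, y_sim))
        cost += w * residual_sum / len(y_meas)
    return cost


# Hypothetical data: two biomass samples and one glucose sample.
measured = {"X": [1.0, 2.0], "S": [3.0]}
simulated = {"X": [1.0, 3.0], "S": [1.0]}
print(weighted_least_squares_cost(measured, simulated))
```

In practice, a function of this kind would be passed to an optimizer that varies only the selected parameter subset.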
Local parameter sensitivities were calculated via the integration of the state sensitivity matrix. In this way, the parameter candidates most suitable for a parameter estimation can be determined. Here, the sensitivities of the parameters $q_{S,\max}$, $q_m$, and $Y_{XS,em}$ were analyzed with regard to the states $X$, $S$, and $\mathrm{DOT}_m$. For this, the change of the output $h$ is calculated for small changes in the parameters $\theta$. The state sensitivity can be obtained from a linear, time-invariant matrix differential equation, whereby the state sensitivities are zero for $t = t_0$, since the initial state is known and independent of the parameter values. For a better comparison, each of the unique elements of the sensitivity matrix, $s_{P,m,n}$, is normalized by the parameter values $\theta$ and the mean value of the states $\bar{\hat{x}}$.
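The normalization of local sensitivities can be illustrated with a small sketch. Note that it approximates the sensitivities by central finite differences rather than by integrating the sensitivity differential equation used in the study, and that the exponential growth model and all names are hypothetical placeholders.

```python
import math


def simulate(theta, times, x0=0.1):
    """Hypothetical stand-in model: exponential growth with rate theta['mu']."""
    return [x0 * math.exp(theta["mu"] * t) for t in times]


def normalized_sensitivity(model, theta, name, times, rel_step=1e-6):
    """Central-difference dx/dtheta, scaled by theta / mean(x)."""
    base = model(theta, times)
    step = rel_step * abs(theta[name])
    up = dict(theta)
    up[name] = theta[name] + step
    down = dict(theta)
    down[name] = theta[name] - step
    x_up, x_down = model(up, times), model(down, times)
    x_mean = sum(base) / len(base)
    return [(a - b) / (2.0 * step) * theta[name] / x_mean
            for a, b in zip(x_up, x_down)]


times = [0.0, 1.0, 2.0]
print(normalized_sensitivity(simulate, {"mu": 0.3}, "mu", times))
```

Because the values are dimensionless, sensitivities of different states with respect to different parameters can be compared on one scale.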

| SPKF
To incorporate all available information of the system, namely the model prediction and measurements with their respective uncertainties, an SPKF has been implemented in the form of a central differences KF (Nørgaard et al., 2000).There are two steps in the estimation procedure: the time update and the measurement update.
During the time update, no measurement information about the current system state is available and the estimator relies on the system model only.Here, the model equation f (Equation 3) is used to predict the evolution of the system state.
During the measurement update, measurements from one or more sensors arrive and are incorporated to update the state estimates. The current state and covariance estimates, $\hat{x}_k^-$ and $P_k^-$, for a given time point $k$ are corrected using a weighted difference between the measured values $y$ (or, in the at-line case, the delayed measurements) and the predicted outputs, with the Kalman gain $K_k$ calculated from the covariances $P_{yy}^-$ and $P_{xy}^-$.
The reader is referred to the Supporting Information for a detailed description of the complete SPKF algorithm.The filter is designed to take state bounds into account through the projection method (Simon & Tien Li Chia, 2002).Furthermore, delayed at-line measurements are directly updated through a state augmentation method (van der Merwe, 2004) allowing for an optimal fusion of the delayed measurements at arrival without the need for recursive filter recalculations.This significantly lowers simulation time for the at-line scenario, especially, considering large delays of up to 40 min and MBRs running in parallel.
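As a rough illustration of the measurement update, the sketch below implements a central-difference sigma-point update for a single state and a single measurement. It is a scalar simplification under stated assumptions: the step size h = sqrt(3) is the value commonly suggested for Gaussian priors, and state bounds, delayed measurements, and the time update are left out. The function and argument names are hypothetical.

```python
import math


def cdkf_measurement_update(x_prior, p_prior, y_meas, h_fun, r_var,
                            h=math.sqrt(3.0)):
    """Scalar central-difference sigma-point measurement update."""
    s = math.sqrt(p_prior)                      # square root of the covariance
    y0 = h_fun(x_prior)                         # central sigma point
    yp = h_fun(x_prior + h * s)                 # positive sigma point
    ym = h_fun(x_prior - h * s)                 # negative sigma point
    w0 = (h * h - 1.0) / (h * h)                # mean weights for n = 1
    w1 = 1.0 / (2.0 * h * h)
    y_pred = w0 * y0 + w1 * (yp + ym)
    wc1 = 1.0 / (4.0 * h * h)                   # covariance weights
    wc2 = (h * h - 1.0) / (4.0 * h ** 4)
    p_yy = wc1 * (yp - ym) ** 2 + wc2 * (yp + ym - 2.0 * y0) ** 2 + r_var
    p_xy = s * (yp - ym) / (2.0 * h)
    gain = p_xy / p_yy                          # Kalman gain
    x_post = x_prior + gain * (y_meas - y_pred)
    p_post = p_prior - gain * p_yy * gain
    return x_post, p_post, gain


# With a linear output y = 2 x, the result matches the linear Kalman filter.
print(cdkf_measurement_update(1.0, 4.0, 3.0, lambda x: 2.0 * x, 1.0))
```

For a linear output function, the second-order term vanishes and the update reduces to the classical KF, which is a convenient sanity check for any sigma-point implementation.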
Additionally, adaptive parameters are also efficiently estimated by the SPKF. Here, $\delta\theta_{SE}$ stands for a perturbation of the parameter subset $\theta_{SE}$. Since parameters are by definition constant over time, a zero vector is added to the model function $f$. For symmetric parameter bounds, van der Merwe (2004) derived a very effective description of the dual-estimation problem (Equation 13), where $\theta_{i,0}$ is the $i$-th nominal, or initial, parameter value. In the SPKF, $\delta\theta_{SE,i}$ is estimated and afterwards converted to the actual parameter value via Equation (13). This description has shown good behavior in terms of convergence speed and stability since $\delta\theta_{SE,i}$ is unbounded. Furthermore, to be able to use nonsymmetrical parameter bounds, Equation (13) can be extended to Equation (14), where $lb$ and $ub$ represent the physical or predefined lower and upper bound of the specific parameter, respectively. This formulation allows for greater flexibility in the parameter estimation, especially when parameters are initialized near parameter bounds or to promote a certain direction for the estimation.
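The idea of estimating an unbounded perturbation and mapping it back to a bounded parameter can be sketched as follows. The logistic mapping used here is one common choice and is an assumption. It is not claimed to be the expression in Equations (13) and (14), and the function name and example values are hypothetical.

```python
import math


def to_parameter(delta, theta0, lb, ub):
    """Map an unbounded perturbation delta to a parameter within (lb, ub).

    delta = 0 returns the nominal value theta0, which may lie anywhere
    strictly between the (possibly nonsymmetrical) bounds lb and ub.
    """
    offset = math.log((theta0 - lb) / (ub - theta0))   # logit of theta0
    return lb + (ub - lb) / (1.0 + math.exp(-(delta + offset)))


# Hypothetical parameter with nominal value 1.2 and bounds [0.5, 3.0].
for delta in (-5.0, 0.0, 5.0):
    print(delta, round(to_parameter(delta, 1.2, 0.5, 3.0), 4))
```

Because the filter only sees the unbounded perturbation, no clipping of the parameter estimate is required, and a nominal value close to one bound is handled without special treatment.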

| Measurement noise
To account for varying measurement errors depending on the measured DOT value, the standard deviation $\sigma_{\mathrm{DOT}_m}^M$ was experimentally determined and then approximated by a first-order polynomial (Equation 15). For the biomass concentration calculated from the ammonia additions, $X_{NH_3}$, the standard deviation $\sigma_{X_{NH_3}}$ is approximated by Equation (16). The confidence that can be given to $X_{NH_3}$ thus drops slightly with every further base addition.

| KF tuning
To tune the filter, the initial covariance $P_0$, the system spectral density matrix $Q(t)$, and the covariance matrix of the measurement noise $R_k$ need to be designed (Equation 17). Since in the present case all states can be measured directly, $P_0$ can be built from the corresponding variances of the individual measurements at the initial time $t_0$ for all estimated states ($\sigma_{i,0}$). The system noise $Q$ contains all information about uncertainties in the model formulation. Since model uncertainties and disturbances are unknown, it has proven practical to set all $\sigma_i$ to 1 and to adjust only the $k_{Q,i}$ in the modeling process. Finally, $R_k$ collects all information about the measurement noise. The term $T_{k,i}$ stands for the sampling time interval for measurement $i$ at time instance $k$ and is used to weight high-frequency samples. The standard deviation $\sigma_i^M$ is computed using Equations (15) and (16).
The hyperparameters $k_{P,i}$, $k_{Q,i}$, and $k_{R,i}$ in Equation (17) are additional scaling factors for further refinement of the SPKF design. They were determined in a comprehensive design process based on four historical experiments (of data set 1) with differing growth conditions to achieve good estimations for a wide range of conditions. To verify the generality of the design, cross-validations were performed on the remaining 20 experiments.

| Model-based adaptive feeding
In fed-batch cultivations, an exponential feeding strategy $F_{exp}$ [L h⁻¹] is applied to keep the growth rate constant during exponential growth. Since growth might deviate from its predicted exponential trajectory, we performed a repeated adaptation of $F_{exp}$. As the maximum growth rate $\mu_{\max}$ is not directly included in the model, the related setpoint for the substrate uptake rate $q_{S,set}$ [h⁻¹], the specific maintenance coefficient $q_m$ [g g⁻¹ h⁻¹], and the yield of biomass on substrate excluding maintenance $Y_{XS,em}$ [g g⁻¹] are used instead (Equation 18). Here, $F_{0,i}$ [L h⁻¹] stands for the initial feed rate, which is recalculated at the beginning of each interval $i$ (every 15 min) from $q_{S,set}$, the time from the beginning of the interval $t$ [h], the glucose concentration in the feed $S_{feed}$ [g L⁻¹], the current volume $V_{0,i}$ [L], and the current biomass concentration $X_{0,i}$ [g L⁻¹] present at the $i$-th interval (Equation 19). To achieve substrate uptake at a defined fraction of the maximum specific substrate uptake rate $q_{S,\max}$ [g g⁻¹ h⁻¹], $q_{S,set}$ was calculated by multiplying $q_{S,\max}$ by a factor $0 \le \varphi \le 1$ (Equation 20). Different cases were investigated in which, besides the states, either one of the parameters $q_{S,\max}$ or $Y_{XS,em}$ is also estimated by the SPKF. The updated parameter value is then used in Equations (18) and (20). As the feeding was applied discontinuously using bolus additions $\Delta V_j$, the feed rates were discretized into pulses at 5-min intervals, adapted from Anane et al. (2019) (Equation 21). This method therefore attempts to predict the optimal bolus feed additions for a specific time into the future based on the assumption of exponential growth under possibly changing metabolic parameters, while optimal feed conditions are recalculated every 15 min. This results in cycles of consecutive short batch phases with initial substrate excess, followed by starvation, with a substrate-limited mean supply.
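The recalculation of the feed for one 15-min interval and its discretization into 5-min boluses can be sketched as follows. A common textbook form of the exponential feed is assumed here, with $\mu_{set} = (q_{S,set} - q_m)\,Y_{XS,em}$ and $F_{0,i} = q_{S,set} X_{0,i} V_{0,i} / S_{feed}$, which may differ in detail from Equations (18)-(21). All numerical values and names are hypothetical.

```python
import math


def feed_boluses(q_s_max, phi, q_m, y_xs_em, x0, v0, s_feed,
                 interval_h=0.25, pulse_h=5.0 / 60.0):
    """Return initial feed rate [L/h], growth setpoint [1/h], bolus volumes [L]."""
    q_s_set = phi * q_s_max                       # setpoint uptake rate
    mu_set = (q_s_set - q_m) * y_xs_em            # assumed growth-rate relation
    f0 = q_s_set * x0 * v0 / s_feed               # initial feed rate of interval
    n_pulses = round(interval_h / pulse_h)
    boluses = []
    for j in range(n_pulses):
        t_a, t_b = j * pulse_h, (j + 1) * pulse_h
        if abs(mu_set) < 1e-12:
            volume = f0 * (t_b - t_a)             # constant feed limit
        else:
            # Integral of f0 * exp(mu_set * t) over the pulse window.
            volume = f0 / mu_set * (math.exp(mu_set * t_b)
                                    - math.exp(mu_set * t_a))
        boluses.append(volume)
    return f0, mu_set, boluses


# Hypothetical state estimate: 5 g/L biomass in 10 mL, 200 g/L glucose feed.
f0, mu_set, boluses = feed_boluses(1.0, 0.5, 0.04, 0.5, 5.0, 0.010, 200.0)
print([round(b * 1e6, 2) for b in boluses], "uL per pulse")
```

Calling the function again at the start of the next interval with the latest SPKF estimates of the biomass, the volume, and the adapted parameter closes the loop described in this section.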
Figure 1 shows the technical realization of the model-based adaptive feeding. The SPKF continuously provides an estimate of the current system state $\hat{x}$ and of selected parameters $\hat{\theta}$ based on all available measurement information $y^M$ and with respect to process noise $w$ and measurement noise $v$ (see Sections 2.6-2.8). The updated state and parameter values are passed to the controller, where the subsequent bolus feed additions are recalculated (Equations 18-21) and carried out by the automated needles.

| Determination of the goodness of fit
The coefficient of determination $R^2$ was used to quantify the goodness of fit of the model and of the state estimation for each state $i$ separately. The percentage increase (PI) derived from these values quantifies the change in the goodness of fit: positive values of PI indicate an increase in the goodness of fit, whereas negative values imply a decrease. The PI includes the biomass and substrate concentrations, as well as $\mathrm{DOT}_m$.
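The goodness-of-fit measures can be sketched in a few lines. The $R^2$ definition is the standard one. The PI is assumed here to be the relative change of $R^2$ between model prediction and state estimation, and the NRMSE is assumed to be normalized by the range of the measurements. Both conventions are assumptions and may differ from those used in the study.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def nrmse(measured, predicted):
    """Root mean square error normalized by the measurement range (assumed)."""
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    return math.sqrt(mse) / (max(measured) - min(measured))


def percentage_increase(r2_model, r2_estimator):
    """Relative change in R^2 from model to state estimator, in percent (assumed)."""
    return (r2_estimator - r2_model) / abs(r2_model) * 100.0


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(percentage_increase(0.5, 0.75))
```

Evaluating these measures per state keeps a poor fit of one variable, for example $\mathrm{DOT}_m$, from being masked by good fits of the others.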

| RESULTS AND DISCUSSION
In this study, we developed an approach for adaptive feeding centered around the biomass estimation using the base added by the pH controller (Figure 1). The plant-model mismatch was tackled by adapting one of two model parameters, namely (1) $q_{S,\max}$, the maximum glucose uptake rate, or (2) $Y_{XS,em}$, the yield of biomass on substrate excluding maintenance. The SPKF was first tuned using experimental data from a campaign with 24 MBR cultivations. Subsequently, the SPKF was validated in a second experimental campaign, again with 24 parallel MBR cultivations.

| Biomass estimation based on ammonia addition
The yield of biomass per added base $Y_{X/H^+}$ (see Equation 2) was determined from the relation between the added ammonia and the biomass concentration. The biomass concentration $X_{OD}$ was obtained from measurements of the OD at 600 nm (OD₆₀₀), after prior calibration of the OD₆₀₀ against cell dry weight determination. Due to the low initial biomass concentration (~0.006 g L⁻¹), growth during the first 8 h of the batch phase has an insignificant influence on the pH. Thus, very few base additions occur. Once the feed starts, the approximation of the biomass concentration based on added ammonia, $X_{NH_3}$ (Figure 2a,b), is in good agreement with the $X_{OD}$ values.
Table 2 shows the NRMSE for data set 2 of $X_{NH_3}$ relative to the at-line OD₆₀₀-based biomass concentration measurements $X_{OD}$. The NRMSE for $X_{NH_3}$ differs depending on the feed rate and the experimental phase. In the batch phase, the NRMSE is below 3% for most experiments, and in the fed-batch phase it is below 10%. In the induction phase, especially at high feed rates, the $X_{NH_3}$ values increasingly deviate from the biomass based on OD₆₀₀. Incorporation of nitrogen in the product is expected to be negligible due to the low amount of product formed (<0.3 g L⁻¹). However, an increased addition of ammonia might be caused by the production of acids. While acetate was not observed at elevated levels, other by-products might be produced, which could be attributed to changes in the metabolism after induction (Hoffmann & Rinas, 2004). An increasing uncertainty of $X_{NH_3}$ is included in the estimation algorithm. However, after induction, the user should be aware of a possible mismatch. An alternative is a correction with at-line OD₆₀₀ measurements after induction to adapt $Y_{X/H^+}$.

| Sensitivities
The sensitivities of the most relevant states, namely the biomass concentration $X$, the substrate concentration $S$, and the measured dissolved oxygen tension $\mathrm{DOT}_m$, with respect to three parameters ($q_{S,\max}$, $Y_{XS,em}$, and $q_m$) are shown in Figure 3. As expected, all parameter sensitivities follow a similar trend, increasing during the batch phase, but with different magnitudes. On average, the sensitivities of $q_{S,\max}$ and $Y_{XS,em}$ reach similarly high values over the course of an experiment, and these parameters were hence selected as candidates for the adaptive framework. In contrast, the values for $q_m$ are more than one order of magnitude lower, which makes $q_m$ a less suitable candidate for the parameter estimation. After the depletion of substrate, sudden changes in the sensitivities can be observed. The sensitivities of the substrate with respect to the parameters drop rapidly close to zero and, after feed start, begin to fluctuate with the changing substrate concentrations caused by the bolus feeding profile. Here, the values stay rather low under substrate-limited conditions. The sensitivities of the biomass with regard to the parameters also drop after substrate depletion but recover over time. Furthermore, the sensitivity of $\mathrm{DOT}_m$ with regard to all parameters increases in the substrate-limited region. Still, since experimental results showed rather high deviations of the predicted values of $\mathrm{DOT}_m$ (probably due to sensor delays), this has been buffered in the estimator by increasing the uncertainty and measurement noise for $\mathrm{DOT}_m$.

| Tuning of the state estimator
Due to the very low inoculation concentration, the SPKF was initiated after 8 h of cultivation, once sufficient biomass was present in the MBR. The selected values of the hyperparameters for the system noise $k_{Q,i}$, the initial covariance $k_{P,i}$, and the measurement noise $k_{R,i}$ are shown in Table 3. For $\mathrm{DOT}_m$, high values for $k_{Q,i}$ were chosen to account for system uncertainty, mostly regarding the delay of the sensor, which is not yet sufficiently described by the model.
The performance of the SPKF using at-line biomass and substrate measurements was compared against an SPKF using the estimation of biomass based on ammonia additions, $X_{NH_3}$, in addition to measurements of $\mathrm{DOT}_m$ (Figure 4). In both cases, the filter shows good performance for the estimation of $X$, $S$, and $\mathrm{DOT}_m$, and can predict the end of the batch phase (total depletion of the substrate glucose) with high accuracy. In the at-line scenario (Figure 4b), delayed at-line measurements of $X$ and $S$ cause sudden changes in the state estimates. The filter optimally fuses the state estimates with these measurements directly at measurement arrival without the need for a filter recalculation. Therefore, measurement updates do not align with measurement samplings. As the time delays differ for $X$ and $S$, a first adaptation is performed toward $X^M$ after ~15 min and a subsequent adaptation toward $S^M$ after ~40 min, upon arrival of the respective measurement. The SPKF using the base addition (Figure 4a) gives much smoother estimates and detects deviations from the true system state significantly earlier. This clearly suggests that the SPKF can be used to replace biomass sampling and determination via the OD.
It is worth noting that at ~12 h cultivation time, after depletion of the glucose, a decrease in the oxygen concentration occurs, which can be attributed to the consumption of residual glycerol originating from the cryostock. Although not included in the model, this supply of carbon source is detected by the state estimator and leads to a small positive correction of the biomass estimate. The rise in $\mathrm{DOT}_m$ after ~15 h cultivation time is due to the switch to a higher oxygen content in the inlet gas.
Table 2. NRMSE for $X_{NH_3}$, given as mean ± standard deviation depending on the feeding regime (low/high feed rate) and the experimental phase (batch, fed-batch, induction) (data set 2).

| Online monitoring via nonlinear state estimation in MBRs
Figure 5 shows the initial predictions of the macro-kinetic growth model without corrections compared to the state estimation in (a) for a lower feeding rate and estimation of q S,max , and (b) for a higher feeding rate and estimation Y XS em , .It can be clearly observed that state estimation and parameter adaptations are required to overcome the model-system mismatch and compensate for process disturbances and uncertainties in the initial conditions.Using the a priori parameter estimates, an underestimation of X by the model prediction can be seen throughout all experiments during the batch phase.Reasons for this might be a lower-than-expected inoculum volume or cell count.Additionally, the model overestimates the E. coli growth during the exponential feeding phase, and especially after induction.With the SPKF, a much better fit to all at-line and online measured data is achieved.Valuable information given by the DOT measurements and the biomass relation X NH 3 are successfully exploited to give accurate estimations of S and to reliably predict the batch end.
For a lower feeding rate (Figure 5a), the state estimates show a very good fit with low NRMSE values for X, S, and DOT m (see Supporting Information: Table A3). For a higher feeding rate (Figure 5b), the overall performance is still very good. However, during the production phase, an increasing mismatch between estimates and measurements of the biomass concentration can be found: the model prediction overestimates the biomass concentration. A cause could be stress leading to decreased growth. The pulse-based feeding in the MBRs introduces oscillations in substrate availability, which present a high-frequency stress (Neubauer et al., 1995). Further, induction of recombinant protein synthesis subjects the cells to metabolic burdens, which can inhibit growth and alter the catabolism and anabolism (Hoffmann & Rinas, 2004). Parallel synthesis and degradation of the product can result in a decreased yield (Hoffmann & Rinas, 2004). Additionally, when production of recombinant proteins is induced at too high growth rates, alternative pathways for energy generation are activated (Hoffmann & Rinas, 2004), which could explain why the effect is more pronounced at higher feed rates.
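The joint estimation of states and one model parameter discussed in this section can be illustrated with a minimal sketch. It is not the filter used in this work: it assumes a toy Monod batch model without the DOT state, a plain unscented transform as one member of the sigma-point family, a single biomass measurement standing in for X NH3, and invented values for the yield, the Monod constant, the noise covariances, and the initial conditions. The names `process`, `sigma_points`, `spkf_step`, and `run_demo` are hypothetical.

```python
import numpy as np

Y_XS = 0.5   # assumed biomass yield on substrate, g/g (illustrative value)
K_S = 0.05   # assumed Monod constant, g/L (illustrative value)

def process(z, dt):
    """One Euler step of a toy Monod batch model; z = [X, S, q_S,max]."""
    x, s, qs_max = z
    s = max(s, 0.0)
    qs = qs_max * s / (s + K_S)               # specific substrate uptake rate
    x_next = x + dt * Y_XS * qs * x
    s_next = max(s - dt * qs * x, 0.0)
    return np.array([x_next, s_next, qs_max])  # parameter kept constant (random walk)

def sigma_points(mean, cov, kappa=1.0):
    """Symmetric sigma points and weights of the basic unscented transform."""
    n = mean.size
    root = np.linalg.cholesky((n + kappa) * cov)  # lower-triangular square root
    points = np.vstack([mean, mean + root.T, mean - root.T])
    weights = np.full(2 * n + 1, 0.5 / (n + kappa))
    weights[0] = kappa / (n + kappa)
    return points, weights

def spkf_step(mean, cov, x_meas, dt, q_cov, r_var):
    """Predict with the model, then correct with one biomass measurement."""
    # Prediction: propagate sigma points through the nonlinear model.
    pts, w = sigma_points(mean, cov)
    prop = np.array([process(p, dt) for p in pts])
    prior_mean = w @ prop
    dev = prop - prior_mean
    prior_cov = dev.T @ (w[:, None] * dev) + q_cov
    # Update: the measurement function simply picks the biomass state X.
    pts, w = sigma_points(prior_mean, prior_cov)
    y_pts = pts[:, 0]
    y_hat = w @ y_pts
    dy = y_pts - y_hat
    s_yy = w @ dy**2 + r_var                       # innovation variance
    p_zy = (pts - prior_mean).T @ (w * dy)         # state-measurement covariance
    gain = p_zy / s_yy
    post_mean = prior_mean + gain * (x_meas - y_hat)
    post_cov = prior_cov - np.outer(gain, gain) * s_yy
    return post_mean, 0.5 * (post_cov + post_cov.T)  # keep covariance symmetric

def run_demo(n_steps=60, dt=0.05, seed=0):
    """Simulate a 'true' process and let the filter recover q_S,max from X only."""
    rng = np.random.default_rng(seed)
    truth = np.array([0.2, 5.0, 1.2])              # true q_S,max = 1.2 (invented)
    mean = np.array([0.2, 5.0, 0.8])               # filter starts with a wrong q_S,max
    cov = np.diag([0.05**2, 0.5**2, 0.2**2])
    q_cov = np.diag([1e-6, 1e-6, 1e-4])            # assumed process noise
    r_var = 0.01**2                                # assumed measurement noise variance
    for _ in range(n_steps):
        truth = process(truth, dt)
        x_meas = truth[0] + rng.normal(0.0, np.sqrt(r_var))
        mean, cov = spkf_step(mean, cov, x_meas, dt, q_cov, r_var)
    return truth, mean

if __name__ == "__main__":
    true_state, estimate = run_demo()
    print("true:", true_state, "estimate:", estimate)
```

Appending the parameter to the state vector lets the cross-covariance between X and q S,max carry information from the biomass signal into the parameter estimate, which is the mechanism behind the parameter adaptations shown in Figure 5.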

| Parameter estimation
The SPKF was used to estimate a selection of parameters in addition to the states. The adaptation of q S,max (Figure 5a) suggests an increased substrate uptake rate after the start of the feeding. After a temporary drop, Y XS,em (Figure 5b) is estimated to increase during the fed-batch phase and then to stay nearly constant during production, indicating an overall rise of Y XS,em. During the substrate-limited fed-batch, a reduction of q S,max due to a degradation of excess transporters has been observed (Lin et al., 2001). However, in contrast to the continuous supply of glucose via pumps in large-scale bioreactors, in the MBR system the feed is added by the needles of an LHS in small boluses. This discontinuous supply subjects the cells to oscillating conditions regarding the carbon source. In scale-down experiments, an increased specific glucose uptake rate has been observed during oscillations in the glucose availability (Neubauer et al., 1995). Still, the absolute biomass increase is mostly independent of q S,max, and changes in this parameter rather influence the oxygen consumption.
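To make the bolus-wise feeding concrete, the following sketch splits an exponential feed profile into discrete pulses by integrating the feed rate over each pulse interval. The feed law F(t) = F0·exp(mu_set·t), the pulse interval, and all names and numbers are illustrative assumptions, not the feed schedule of the platform described here.

```python
import math

def bolus_volumes(f0_ul_per_h, mu_set_per_h, pulse_interval_h, n_pulses):
    """Bolus volumes (µL) whose cumulated sum follows F(t) = F0 * exp(mu_set * t).

    Each bolus equals the integral of the continuous feed rate over one
    pulse interval: V_k = F0 / mu_set * (exp(mu_set * t_end) - exp(mu_set * t_start)).
    """
    volumes = []
    for k in range(n_pulses):
        t_start = k * pulse_interval_h
        t_end = t_start + pulse_interval_h
        volume = f0_ul_per_h / mu_set_per_h * (
            math.exp(mu_set_per_h * t_end) - math.exp(mu_set_per_h * t_start)
        )
        volumes.append(volume)
    return volumes

if __name__ == "__main__":
    # Hypothetical numbers: 10 µL/h initial feed rate, exponent 0.25 1/h, one pulse every 6 min.
    for k, v in enumerate(bolus_volumes(10.0, 0.25, 0.1, 5)):
        print(f"pulse {k}: {v:.3f} µL")
```

Between two pulses the cells receive no substrate, so the substrate concentration rises at each addition and falls until the next one, which is the oscillation referred to above.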
Regarding Y XS,em, the lower growth during the feeding phase and after induction would be explained by a decreasing rather than an increasing value.
The reason for this rather unexpected behavior probably lies in the oxygen demand, which is higher than would be expected given the low biomass concentration. As mentioned above, this can be compensated for by parameter changes, such as a higher q S,max value, resulting in a faster consumption of substrate and thus oxygen.
Additionally, the SPKF by design adapts the parameters to achieve a better fit of all states to the available data. Thus, parameter changes are a reaction to metabolic adaptations of the organism, but may also compensate for other effects, such as inaccuracies in the model. While high values have been chosen during the tuning of the SPKF for the hyperparameters k Q,DOT m and k R,DOT m, this still indicates a dependency on the noise in this measurement during the feeding phase.

TABLE 1 Overview of cultivation conditions.

the ammonia solution. Δσ V NH4OH was experimentally determined to be 1.25 μL and has been found to be independent of the added volume. Equation (16) shows that the uncertainty rises with n NH4OH

FIGURE 2 Biomass concentration calculated from base addition. The biomass concentration calculated from the cumulated ammonia additions (dark blue line), the measurements of the biomass concentration based on OD 600 (red circles), and the added ammonia volumes (light blue bars) are shown for two different cultivations from preliminary experiments with (a) a lower and (b) a higher exponential feeding rate after batch end.

FIGURE 3 Local parameter sensitivities (normalized) of q S,max (a), q m (b), and Y XS,em (c) regarding the states X (blue dashed line), S (black solid line), and DOT m (black dotted line) for the conditions of the calibration data set 1. Time points of feed start and inoculation are marked in the plots. The values of the sensitivities are displayed in logarithmic scale, and only absolute values above 10^−4 are shown.

TABLE 3 Design matrices of the SPKF. Design parameters: X [g L−1], X NH3 [g L−1], S [g L−1], A [g L−1], DOT m [%], q S,max [g g−1 h−1], Y XS,em [g g−1].

FIGURE 4 Results for the state estimation. Data from one of 24 mini-bioreactor experiments (data set 1) are shown using the Sigma-Point Kalman Filter (SPKF) with online measurements of DOT m and (a) online measurements of X NH3, and (b) at-line measurements of X and S. Exponential feed was applied with φ = 0.25 (before induction)/φ = 0.1 (after induction). State estimates with area of uncertainty are shown for X NH3 (dark blue line), as well as X, S, and DOT m (dark gray line). Measurements are displayed as red dots, and the model prediction without state estimation as a dashed gray line.

FIGURE 5 Estimation of state variables and model parameters. Data from two of the 24 mini-bioreactor experiments are shown, in (a) with estimation of q S,max (reactor 4, condition 4), and in (b) with estimation of Y XS,em (reactor 8, condition 8). State estimates with area of uncertainty are shown for X NH3 (dark blue line), as well as X, S, DOT m, and the estimated parameter q S,max/Y XS,em (dark gray line). Measurements are displayed as red dots, and the model prediction without state estimation as a dashed gray line. Below, the added feed rate (dark blue line), the originally planned feed rate (gray line), and the added feed volumes (light blue bars) are shown.
FIGURE 1 Kalman filter and control concept. The cultivation process can be approximated by a process model. Via the known input u, external interventions can be applied to the process. All unknown disturbances and model inaccuracies are described by the process noise w; this distinguishes the real system from the simulation. The prior state x̂− is predicted by the model equations only. The measurements y M and the measurement noise v are incorporated to obtain a more accurate posterior state and parameter estimate. A controller computes new feed setpoints based on the current state of the process, which are realized by the needles of a liquid handling station.

Note: The NRMSE is normalized to the max-min range of the measurements for the whole experiment.

To give an indication of the performance improvement (PI) obtained by the state estimator, a quantitative measure is introduced, which relates the distance between the fit of the state estimation, R² SE, and the evaluation of the model function without state estimation (further referred to as model prediction), R² Mod, to the distance between the model and 100% fit:
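Read literally, the performance improvement described above corresponds to the ratio PI = (R²_SE − R²_Mod) / (1 − R²_Mod). This is a reconstruction from the wording, not the typeset equation of the original. The sketch below evaluates it together with an NRMSE normalized to the max-min range of the measurements, as stated in the note. The function names are hypothetical, and taking R² as the ordinary coefficient of determination is an assumption.

```python
import numpy as np

def nrmse(measured, estimated):
    """RMSE normalized to the max-min range of the measurements."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    rmse = np.sqrt(np.mean((estimated - measured) ** 2))
    return rmse / (measured.max() - measured.min())

def r_squared(measured, estimated):
    """Coefficient of determination (assumed definition of R^2)."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    ss_res = np.sum((measured - estimated) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def performance_improvement(r2_se, r2_mod):
    """Share of the gap between model fit and perfect fit closed by the estimator."""
    return (r2_se - r2_mod) / (1.0 - r2_mod)

if __name__ == "__main__":
    # Invented numbers for illustration only.
    y_meas = [0.5, 1.0, 2.1, 4.0]
    y_model = [0.4, 0.8, 1.5, 3.0]
    y_filter = [0.5, 1.0, 2.0, 3.9]
    pi = performance_improvement(r_squared(y_meas, y_filter), r_squared(y_meas, y_model))
    print(f"NRMSE (filter): {nrmse(y_meas, y_filter):.3f}, PI: {pi:.3f}")
```

With this definition, PI is 0 when the state estimation fits no better than the model prediction and approaches 1 as the state estimation approaches a perfect fit.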