Parameter and state estimation of backers yeast cultivation with a gas sensor array and unscented Kalman filter

Abstract Real‐time information about the concentrations of substrates and biomass is the key to accurate monitoring and control of bioprocess. However, on‐line measurement of these variables is a challenging task and new measurement systems are still required. An alternative are software sensors, which can be used for state and parameter estimation in bioprocesses. The software sensors predict the state of the process by using mathematical models as well as data from measured variables. The Kalman filter is a type of such sensors. In this paper, we have used the Unscented Kalman Filter (UKF) which is a nonlinear extension of the Kalman filter for on‐line estimation of biomass, glucose and ethanol concentration as well as for estimating the growth rate parameters in S. cerevisiae batch cultivation, based on infrequent ethanol measurements. The UKF algorithm was validated on three different cultivations with variability of the substrate concentrations and the estimated values were compared to the off‐line values. The results obtained showed that the UKF algorithm provides satisfactory results with respect to estimation of concentrations of substrates and biomass as well as the growth rate parameters during the batch cultivation.


INTRODUCTION
The ability to measure primary process variables, such as biomass, substrate and product concentrations is essential in order to guarantee the successful operation and automatic control of bioprocesses at their optimal state. But The classical Kalman filter and its nonlinear extensions are a type of such software sensors, which have received a huge interest for state estimation of bioprocesses. In general, the Kalman filters combine the information of a process model and the available measurements for state and parameter estimation. As Harvey [5] pointed out, in the classical Kalman filter, both the process model and the measurement equation of the state space model are linear. However, most bioprocesses are highly nonlinear, therefore the classical Kalman filter cannot be used for state estimation in such processes. Different nonlinear extensions of the Kalman filter are available which mostly differ in the approximation of the prediction uncertainty. The extended Kalman filter (EKF) is a standard nonlinear estimation technique which can handle nonlinearity's by local first order Taylor approximations of the non-linear functions in the model. Implementation of EKF algorithms for state and parameter estimation in bioprocess have been reported in numerous studies. Lisci and Tronci et al. [6] have implemented an extended Kalman filter for state estimation in fed-batch cultivation of S. cerevisiae based on temperature, dissolved oxygen and substrate concentration measurements. Krishna et al. [7] have implimented an EKF algorithm for estimation of lactose concentration in fed-batch cultivation of Kluyveromyces marxianus based on dissolved oxygen measurements; Krämer and King [8] used an EKF for estimation of substrate and biomass concentration in fed-batch cultivation of S. cerevisiae; Lee et al. [9] applied an EKF algorithm for noise filtering from the dissolved oxygen measurements during batch cultivation of E. coli and Hitzmann et al. [10] implemented an EKF complemented by a special flow-injection analysis system for glucose measurements during fed-batch cultivation of S. cerevisiae. Based on the estimation, a feed forward PIcontrol with a set point of 0.5 g/L was carried out. The mean deviation of the set point and the estimated value as well as the set point and the measured value were 0.05 and 0.11 g/L respectively. A similar approach is discussed by Klockow et al. [11] where the time delay of the measurements was compensated by a ring-buffer. They showed that a set point of 0.007 g/L can be realized reliably.
Usually the well-known EKF shows good prediction results. Nevertheless, in spite of the reported satisfactory results, it has some disadvantages. It is reliable for systems which are almost linear on the time scale of the update intervals; it requires the calculation of Jacobians at each time step, which may be difficult to obtain for higher order systems; it does linear approximations of the system at a given time instant, which may introduce errors in the estimation, leading to a state divergence over time [12,13].
The unscented Kalman filter (UKF) is another nonlinear extension of the classical Kalman filter which is very similar to the EKF, but instead of approximating the non-linear

PRACTICAL APPLICATION
In a previous study we have designed and implemented a model-based calibrated gas sensor array for on-line measurement of ethanol concentration in batch cultivation with the yeast S. cerevisiae [1]. The obtained results indicate that the gas sensor array was able to predict ethanol concentration with high accuracy. However the predicted values are only available every five minutes. Therefore in this work, in order to have continues values of ethanol concentration as well as the values of biomass, glucose and the growth rates, we have implemented an unscented Kalman filter (UKF) algorithm. The obtained results indicate, that accurate continues concentrations of the state variables as well as the growth rates can be obtained by the UKF algorithm. No off-line measurements for calibration are required in the proposed algorithm. The proposed method is a cheap alternative to other tools that are used for monitoring yeast cultivations such as spectroscopy based methods. process model by calculating the Jacobian of the dynamics for the determination of the estimation error variance, the transformed probability distributions are approximated directly. This is done by representing the distribution by a set of chosen sample points, transforming these points by the non-linear model function, and then approximating the mean and variance of the transformed distribution by the mean and variance of the transformed points [14]. In recent years, a number of authors have demonstrate the successful application of UKF algorithms for state and parameter estimation in bioprocesses. For instance, Jianlin et al. [15] have implimented an UKF algorithm for biomass and substrate prediction based on dissolved oxygen and carbon dioxide measurements in a fed-batch cultivation of S. cerevisiae. Using the same microorganism, Simutis and Lübbert [16] have applied an UKF algorithm for estimation of biomass and its specific growth rate based on oxygen uptake and CO 2 formation rate measurements in a fed-batch cultivation. Furthermore, Krämer and King [17] have used an UKF algorithm for filtering out noise from measured state variables which were predicted with a near infra-red spectrometer in a fed-batch cultivation of S. cerevisiae.
As it can be seen, previous studies have exclusively focused on implementing UKF algorithms in fed-batch cultivations, however on-line monitoring and estimation of state variables in batch cultivations is also crucial in order to achieve high productivity over the process. Therefore, in this contribution an UKF algorithm is designed for on-line estimation of biomass, glucose and ethanol concentration as well as for estimating the growth rate parameters in S. cerevisiae batch cultivation, based on the infrequently available ethanol measurements. In order to evaluate the reliability of the UKF, the proposed algorithm was validated on three different cultivations with variability of the substrate concentrations.
This paper is organized as follows: In the coming section of this work, the experimental setup and the cultivation conditions, the dynamic model of S. cerevisiae batch cultivation and a brief description of the on-line ethanol measurement method as well as the unscented Kalman filter are described. In section 3 results and discussion is presented, and section 4 concludes this paper.

Batch cultivation process
Three batch cultivations of S. cerevisiae, named BC1, BC2, and BC3, were performed. S. cerevisiae (fresh baker's yeast, Oma's Ur-Hefe) was pre-cultivated before fermentation. 5 g of baker's yeast was used in all cultivations. The baker's yeast was inoculated into 100 mL Schatzmann medium [18] and after shaking for 10 min, they were added into the stainless steel tank bioreactor (Minifors, Inifors HT, Bottmingen, Switzerland). The medium used for batch cultivations was the same as for the pre-culture, but with 9 g/L, 5 g/L and 2.85 g/L glucose for BC1, BC2, and BC3 respectively and 1 mL/L trace elements solution. All three batch runs were operated at the same conditions, that is, a constant temperature at 30 • C and a maintained pH at 5. The aeration and agitation rates were kept constant at 3.5 L/min and 450 rpm, respectively. Detailed experimental conditions of the cultivations are described by Yousefi-Darani et al. [1].

Nonlinear process model
For modelling the process, an ideal stirred tank reactor in batch mode has been assumed with a cell growth kinetic approximated by the Monod model, where the substrate glucose as well as ethanol (when glucose is depleted) are the single growth-limiting factor. According to the mass balance, the dynamic process model consists of the following equations [19]: were G, E and X are the glucose, ethanol and the biomass concentrations, respectively. Y X∕G , Y E∕G and Y X∕E are the yield coefficients with respect to the conversion from glucose to biomass, glucose to ethanol and ethanol to biomass, respectively. μ G and μ E are the specific growth rates on glucose and ethanol, respectively and are calculated as , and , are the maximum specific growth rates on glucose and on ethanol, respectively. The values for , and , are estimated with the UKF filtering.
Therefore they are described with two additional differential terms, which do not alter the values: In this contribution, the yield coefficients as well as the Monod equation's constants (K E and K G ) have been fixed to values which were chosen from literature [1,20].

On-line ethanol measurements
The on-line ethanol measurements were performed in a self-developed measurement system which contains two main parts, namely the headspace sampling system and the measurement chamber. Headspace sampling procedure consisted of an automated sequence of internal operations. First the head space samples of the bioreactor are pumped past the measurement chamber for 10 s at a flow rate of 400 mL min −1 with a diaphragm pump (Schwarzer Precision, Essen, Germany) every five minutes. The measurement chamber has a volume of 250 mL and contains a gas sensor array which is equipped with commercially available metal oxide semiconductor (MOS) gas sensors (TGS 822, TGS 813 and MQ3). In the next step the chamber is flushed by pure oxygen for regeneration. By preforming these steps, a peak shaped measurment signal is obtained.

F I G U R E 1
The operation scheme of the on-line ethanol measurement system and the UKF algorithm for continuous state variables and parameter estimation Ethanol concnetration is obtained from the raw signals by implementing signal processing methods and a chemometric model, which is described in detail in the literature [1]. Using the ethanol measurement in the gas phase, the ethanol concentration in the liquid phase of the cultivation broth is determined. Every 5 min a new ethanol measurement value is sent to the UKF algorithm. As measurement model the identity is used, i. e. the ethanol value itself. The operation scheme of the on-line ethanol measurement system and the UKF algorithm is presented in Figure 1.

Unscented Kalman filter
In this work, an UKF is implemented to estimate continuous ethanol, glucose and biomass concentrations as well as the maximal growth rates during S. cerevisiae batch cultivation. As all Kalman filter approaches, the UKF calculates the most probable system state by appropriately weighing model predictions and actual measurements according to model uncertainty and measurement error, respectively. The UKF was chosen here over other extensions of the Kalman filter since it accurately calculates the statistical distributions of even nonlinear systems. The UKF algorithms consist of two steps namely the prediction step (time update) and the filtering step (measurement update) which are summarized as follows: • Prediction step (time update): Using the last known statêthe process model is used to predict the state variables until the next measurement is available. For the Kalman filtering to work, the state covariance must also be estimated somehow from the last known state covariancê. In the UKF the unscented transformation is used to estimate this state covariance.
The idea is to use a collection (2n+1, where n is the number of state variable) of well-chosen system states [22], based on the last known state and state covariance and then propagate/predict the system state from each of those so called sigma points. The predicted system state k is then the weighted average of these 2n+1 predictions. The estimated covariance is essentially the weighted covariance of the same predictions.
• Correction step (measurement update): Whenever a measurement is available, the predicted state values k are combined with the measured values to provide corrected or filtered state estimateŝ. For this reason the Kalman gain matrix K k must be calculated: Here R is the measurement error, H is the Jacobian matrix of the measurement model h ( ) and P k is the already mentioned estimated state covariance matrix of the prediction. The corrected or filtered system statêis essentially the weighted average of the predicted system state and the measurement with the Kalman gain as weight: The filtered state covariancêis updated based on the estimated state covariance , the Kalman gain and the process noise covariance matrix : Δ is the time difference to the last known state/ measurement.
The filtered valueŝand̂can then be used as initial conditions for the next prediction/ simulation of the process as well as for the estimation of the covariance until the next measurement is obtained and everything repeats again.
The reliability and quality of a Kalman filter estimator can be evaluated by observabilty analyses, the theory of which has been established previously [21,22]. Observabilty analysis provides an assessment of the theoretical possibility of estimating the state variables or parameters of the system from the available measurements (sensors) [23]. Due to Salau et al. [24], if the idea is to use the smallest number of sensors in order to simultaneously estimate the state and parameters of a system, using the Kalman filter makes it impossible to find complete system observabilty. For instance, in the process considered here, the two additional differential equations for parameter estimation of μ max,G and μ max,E produces a Jacobian matrix with corresponding row elements equal to null and therefore the observability is not given. However, due to the chosen process model with low correlation of the state variables, the observabilty of the process is guaranteed, since in this case, diagonal time-invariant matrices Q and R can be successfully applied, which makes the UKF tuning considerably simple. A more comprehensive description about implementation of the UKF algorithm as well as the observabilty analysis can be found in [25][26][27][28].
In this study, a continuous-discrete UKF is used, i.e., a continuous time update and a discrete-time measurement update. Table 1 presents the initial values for the UKF filter as well as the parameters of the model.
The UKF was implemented using the software Matlab R2019a (version 9.6.0); for all calculations a normal office PC (Intel CoreR i5 8500 with 8 GiB of RAM) with Window 10 was used. For the simulation, the system of in total 5 differential equations was solved numerically using the explicit, Runge-Kutta based ode45 method from Matlab.

Off-line measurements
Samples for analysing the concentrations of biomass, glucose, and ethanol were regularly taken from the bioreactor and put into pre-weighed and pre-dried micro-centrifuge tubes.
i represents the predicted ethanol concentration from the UKF algorithm or the measured on-line ethanol concentration, Y i is the concentration determined by the offline values, N stands for the measurement count and is the maximum ethanol concentration in the corresponding off-line data.
To evaluate the accuracy of the estimated biomass and glucose concentrations from the UKF algorithm, the RMSE between the predicted values and the off-line values as well as the SE with respect to the maximum offline value were calculated.

RESULTS AND DISCUSSION
The UKF presented in section 2 is used for continuous estimation of ethanol concentration on the basis of infrequent on-line ethanol measurements from the gas sensor array. The UKF algorithm was also used for estimation of biomass and glucose as well as estimating the maximum growth rate on glucose and ethanol. The algorithm was validated on three cultivations with different initial conditions. Figure 2 presents the performance of the UKF for the estimation of ethanol concentration during three cultivation runs.  Table 2. Table 1 shows the performance of the UKF compared with the case where the ethanol concentrations were measured off-line. In BC2 the RMSEP and SEP of the offline measured ethanol concentration is considerably larger compared to the filtered ones. In BC1 and BC3 the UKF slightly increased the accuracy of the ethanol concentrations. However, the standard error of estimated ethanol concentration with the UKF during all cultivations is below 5% which is a decent value. The UKF is also able to accurately estimate the concentrations of biomass and glucose during the cultivations. Figure 3 presents the estimated values as well as the offline values of biomass and glucose for BC1 -BC3. In BC1 the estimates of the biomass concentration between 6 and 8 h cultivation time, deviate from the off-line values, however their evaluation is almost the same. This might be due to faulty sample handling or measurement errors. The data in Figure 3 indicates that the typical diauxic growth pattern of baker's yeast on glucose is observed. First the glucose is consumed and biomass as well as ethanol are produced, then ethanol is converted to biomass. The offline measurements and its corresponding estimated values fit quite well together as can be seen in Table 3.

TA B L E 2 Comparison of off-line measured values and UKF estimated ethanol concentration
As illustrated above, the UKF is able to accurately predict the state variables in all three cultivations, based on the available infrequent on-line ethanol measurements. However, as indicated in Figure 2 and Table 2, the measured ethanol concentration is not noisy, therefore an accurate estimation of the state variables by the UKF is not far from expectation. Therefore, in order to check the prediction ability of the UKF algorithm when the measurement data are noisy, the measured ethanol values were artificially distorted by random noise and the UKF algorithm was performed. Figure 4 shows the estimated as well as the off-line values of state variables of BC2, by the UKF with considering noisy ethanol measurements. As it can be seen, even if the measurement values are artificially distorted by random noise, the UKF does not show much different results in predicting the state variables (SEP for all state variables is below 6%). The results for the other two cultiva-tion data records are qualitatively the same, and are thus not repeated here.
As already stated previously, the UKF algorithm was also used for estimating the maximal specific growth rates. To prove the capability of the UKF for estimating the maximal specific growth rates, different starting values were chosen for these parameters in each cultivation. These values are chosen according to a rough estimate of these parameters. Figure 5 presents the estimated maximum specific growth rates with respect to glucose μ max,G and ethanol μ max,E as well as specific growth rates itself for glucose μ G and ethanol μ E across all three cultivations.
In BC1 the μ max,G and μ G are increasing sharply short after the inoculation starts, this indicates that the chosen starting values are lower than the actual values, therefore the UKF algorithm converges to the true values. When the glucose is almost depleted, the transition from glucose to ethanol as substrate takes place, therefore μ max,E and μ E would start to increase. However, shortly before glucose is completely depleted, μ max,G increases which results in the decrease of μ E , therefore the UKF increases the μ max,E and μ E to compensate the under estimation of μ E . According to the typical Monod behaviour, before ethanol is depleted, due to the low substrate concentration, μ max,E should be almost constant while μ E should be increasing. However this is not observed in BC1 which is due to the fluctuation of the measured and estimated ethanol concentration which can be seen between 4 and 7 h of cultivation time.
In BC2, after inoculation, the specific growth rates and its maximum values with respect to glucose are increasing slightly. This indicates that the chosen starting values are not far from the actual values. Accordingly, in BC3, the specific growth rates and its maximum values with respect to glucose are decreasing after the inoculation and shortly thereafter they increase again. This indicates that the chosen starting values are lower than the actual values, nevertheless the UKF algorithm converges them to reasonable values.
The high sensitivity of the estimated values due to the measurement noise variance and the process noise variance can be observed by comparing the estimated growth rates from BC1 to BC3. In BC2 and BC3, the measured and estimated ethanol concentrations shows less fluctuation compared to the ones from BC2, therefore the estimated values for the specific growth rates show less fluctuating compared to the ones from BC1.
In Figure 6 the estimation error variances of the process variables and the maximal specific growth rate with respect to glucose and ethanol are presented. The initial values of the variances are all set to zero. All values seem to be very much reproducible with respect to all three cultivation runs. During the glucose phase the F I G U R E 5 Estimated maximum specific growth rates with respect to glucose μ max,G ( ) and ethanol μ max,E ( ) as well as the specific growth rates μ G ( ) and μ E ( ) for glucose and ethanol respectively F I G U R E 6 Estimated error variance of the maximal specific growth rates (μ max,G ( ) and μ max,E ( )) and estimated error variance of the process variables (biomass ( ), glucose ( ) and ethanol ( )) estimation error variance values of glucose are much higher compared to the ones of biomass and ethanol. The values are roughly one order of magnitude higher. The reason is, that the ration of the change of ethanol and glucose with time during the glucose phase is the ration of the yield coefficients Y GX /Y GE whose value is 0.37 g g −1 . Therefore, the change in glucose is much higher (2.7 times) which causes a larger error and a larger variance. If one considers the variance of glucose as 0.1 g 2 L −2 and ethanol as 0.01 g 2 L −2 then the square root is 0.316 g/L and 0.1 g/L respectively. If one calculate the ration of error of ethanol by the error of glucose 0.1/0.316 then almost the same values is obtained as the ration of the yield coefficients. During the ethanol phase the estimation error variance of glucose become small, because no increase in ethanol can be detected and therefore no glucose is present. The variance of biomass and ethanol are stable throughout the cultivation. The variances of the of maximal growth rate on glucose are increasing fast to a constant value during the glucose phase and are increasing constantly during ethanol phase. The corresponding values with respect to ethanol behave in the same manner but inverted. If no measurement information of growth on the substrate is present, the estimation error variance is just increasing constantly, however the variance decreased clearly when growth occurred on that substrate. The constant values as well as the slope during increasing are the same.

CONCLUDING REMARKS
The design of monitoring and control algorithms to improve the performance of bioprocesses is of major importance. However, it is often difficult to find inexpensive and robust commercially available sensors that allow real-time monitoring of important process variables, such as the biomass and substrate concentrations as well as the growth rates. Therefore, the development of software sensors is of paramount importance. In this work, a dynamic nonlinear model was used and an unscented Kalman filter algorithm was implemented for parameter and state estimation during S. cerevisiae batch cultivaton. The proposed UKF algorithm only requires on-line data from infrequent ethanol measurements together with the yield coefficients of the process model.
Three batch cultivations with different initial conditions were conducted in order to analyse the behaviour and performance of the UKF. The result obtained showed that with the proposed UKF algorithms, it was possible to estimate the specific growth rates as well as continuous ethanol, glucose and biomass concentrations with great accuracy, during the cultivation process. In order to check the quality margin for estimation with respect to presented noise in the measured on-line ethanol values, a simulation was preformed; as a conclusion, the UKF algorithm is still able to predict the parameters and state variables, if the noise is about less than 10%.
The proposed UKF algorithm can be used for comprehensive monitoring of the baker's yeast batch fermentation process as well as for design and implementation of control strategies for the fed-batch fermentation process of the baker's yeast.

A C K N O W L E D G M E N T S
Open access funding enabled and organized by Projekt DEAL.

C O N F L I C T O F I N T E R E S T
The authors have declared no conflict of interest.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data used to support the findings of this study are available from the corresponding author upon reasonable request.