Error propagation in fed-batch
One goal of this contribution is to evaluate how much information can be extracted by quantitative analysis of typical bioprocess data; hence, error propagation from the raw data has to be analyzed. Table 1 shows typical measurement errors (according to suppliers' specifications) for on-line devices and for biomass quantification. The latter is typically much higher than all the others. For off-line biomass quantification, this error can be reduced by replicates according to the equation for the standard error of the arithmetic mean (Eq. 20). For example, using four replicates reduces the expected relative error from 4% to 2%. Additional replicates give diminishing returns and require time-consuming extra work. Probes for in-line quantification of biomass typically come with similar or even higher relative errors.
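The effect of replicates can be sketched in a few lines, assuming Eq. 20 is the usual standard error of the mean, s/√n. The function name `standard_error` is hypothetical and only serves this illustration.

```python
import math


def standard_error(s: float, n: int) -> float:
    """Standard error of the arithmetic mean of n replicates.

    s is the standard deviation of a single measurement (here in % relative
    error); the formula s / sqrt(n) is the assumed form of Eq. 20.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return s / math.sqrt(n)


# Diminishing returns: each halving of the error needs four times the replicates.
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} replicates -> {standard_error(4.0, n):.2f} % relative error")
```

With s = 4%, four replicates give the 2% quoted above, while sixteen replicates are needed to reach 1%.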
Table 1. Methods, Standard Deviations, Relative Error (Biomass), and Error Types for Methods/Devices Typically Used for Quantitative Evaluation of Fed-Batches

| Device/Method | Error | Type of Error | Range | Unit |
|---|---|---|---|---|
| Feed balance | 1 | Absolute error | 0–35.000 | (g) |
| Base balance | 1 | Absolute error | 0–35.000 | (g) |
| Reactor balance | 1 | Absolute error | 0–35.000 | (g) |
| O2 off-gas analysis paramagnetic | 0.02 | Relative error | 0–26 | (%) |
| CO2 off-gas analysis infrared | 0.01 | 0.1% absolute error on full scale 0–10% | 0–10 | (%) |
| MFC_Air thermal mass flow meter | 0.035 | Relative error | 0–40 | (l/min) |
| Biomass quantification, e.g., dry weight, capacitance | 2% (dry weight, for s = 4% and 4 replicates according to Eq. 20), or 8% (capacitance) | Relative error | >0.1 | (g/l) |
When using finite difference approximation according to section “Calculation of rates by finite difference approximation,” it is typically recommended to choose Δt as small as possible. However, error propagation, e.g., from biomass measurements, is highly unfavorable: a smaller Δt (further on also called the averaging window) leads to more noise on the calculated rate (see Figure 1). Furthermore, the specific growth rate directly increases the signal to be evaluated (Δi), since most of the other rates are directly proportional to it. A previous contribution29 showed that the SNR depends on the following factors: the biological activity, the averaging window (or temporal resolution), and the measurement error. With a greater signal and a lower measurement error, a higher time resolution can be achieved with sufficient signal quality.29 Connecting two biomass samples in Figure 1 by a line is the graphical representation of calculating the biomass conversion rate by finite difference approximation according to section “Calculation of rates by finite difference approximation.” Random error is represented by the error bars. Figure 1 shows that the resulting rate is governed much more strongly by random error (here 2% relative error on each sample) if Δt is small (solid line, 0.1 h on the x-axis) than if Δt is larger (dotted line, 2 h on the x-axis), since the slopes of the connecting lines differ much more in the first case. This is the case even though the actual rate is constant over the whole range, since a linear growth function was used to generate the data points. Other growth functions, such as exponential growth, lead to similar results (not shown).
Filtering techniques, which can be used to smooth rates, typically also come at the cost of temporal resolution (e.g., a moving average filter) or require prior knowledge (e.g., a process model).
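The dependence of rate noise on Δt can be reproduced with a small Monte Carlo sketch. It assumes a linear growth curve and independent, normally distributed 2% relative error on each sample, as in the example above; the start value, slope, and function name are hypothetical and chosen only for illustration.

```python
import random
import statistics


def rate_scatter(dt, slope=1.0, x0=10.0, rel_err=0.02, n_trials=5000, seed=1):
    """Relative standard deviation of a two-point finite-difference rate.

    Two noisy samples are drawn dt hours apart from the linear growth curve
    x(t) = x0 + slope * t (hypothetical values, g/l and g/l/h), and the rate
    is estimated as (x2 - x1) / dt. The result is the scatter of that
    estimate divided by the true rate.
    """
    rng = random.Random(seed)
    x1_true = x0
    x2_true = x0 + slope * dt
    rates = []
    for _ in range(n_trials):
        x1 = x1_true * (1.0 + rng.gauss(0.0, rel_err))
        x2 = x2_true * (1.0 + rng.gauss(0.0, rel_err))
        rates.append((x2 - x1) / dt)
    return statistics.stdev(rates) / slope


for dt in (0.1, 2.0):
    scatter = 100.0 * rate_scatter(dt)
    print(f"dt = {dt} h: scatter of the rate = {scatter:.0f} % of the true rate")
```

The true rate is identical in both cases; only the averaging window changes, which is why the short window gives a far noisier estimate.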
While this error propagation is easily understood for the example discussed above (see also a previous publication29), things get more complex if dynamic variations due to cell metabolism are added, e.g., due to the feed profile. In fact, we want to differentiate those variations from random noise. Figure 2A shows in silico generated data from a typical microbial fed-batch that are required to calculate specific growth rates: the biomass concentration, the reactor broth weight, and the weight of feed over time. Noise according to Table 1 was artificially added. A variation in the specific growth rate from μ = 0.05 h−1 to μ = 0.1 h−1 at process time = 8 h and back to μ = 0.05 h−1 at process time = 16 h was simulated, which is barely noticeable in the raw data (Figure 2A). Figure 2B shows specific rates calculated from the raw data in Figure 2A with a Δt of 3 h according to Eq. 19. Figure 2C shows specific rates calculated from the same raw data, but with a Δt of 1 h. A relative error for biomass quantification of 1.5% with a Δt of 1 h leads to variations about as large as the signal (the specific growth rate) itself, as seen in Figure 2C, which makes visual interpretation of this plot very difficult. In Figure 2B, visual interpretation is much easier because of the Δt of 3 h. The SNR can be used to evaluate the quality of the calculated specific growth rate on a quantitative basis. Since the noise on the signal is known and constant for a defined time window in this artificially generated example, calculating the standard deviation and arithmetic mean to obtain the SNR according to Eq. 21 is straightforward.
A signal-to-noise ratio of 3 (i.e., the signal is 3 times the residual standard deviation) is defined as the limit of detection, while a ratio of 12 is the limit of quantification.36 With an SNR of 12, a 100% variation of the signal can be reliably detected; to quantify a smaller variation, the SNR should be even higher (e.g., an SNR of 120 for 10%). Accordingly, it is hardly possible to extract useful information from Figure 2C, since the signal-to-noise ratio is barely good enough to detect a change (SNR = 2.5 or 5). The window should be increased to 3 h or more to obtain a specific growth rate with an SNR higher than 3, or preferably >12 (Figure 2B), so that random noise can be distinguished from real physiological variability based on the previously established definitions of the limits of detection and quantification. The SNR increases linearly with the specific growth rate (μ), since μ is in the numerator of Eq. 21. Figure 2D shows the specific uptake rate (qs) for this data set. Since qs was calculated using data from the feed balance, which has a much lower measurement error than the biomass measurement, the resulting rate is less noisy. If the sampling strategy and the process setup are optimized to meet the signal quality requirements for the noisiest process variable (here the biomass concentration), all other process variables will meet the requirements as well.
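A minimal sketch of this evaluation follows, assuming Eq. 21 is the arithmetic mean of the signal divided by its standard deviation within a window of constant true rate. The sample values, function names, and the choice to treat the limits as inclusive are assumptions made for illustration.

```python
import statistics

LIMIT_OF_DETECTION = 3.0        # SNR threshold quoted in the text
LIMIT_OF_QUANTIFICATION = 12.0  # SNR threshold quoted in the text


def signal_to_noise(values):
    """SNR as arithmetic mean divided by sample standard deviation."""
    return statistics.mean(values) / statistics.stdev(values)


def classify(snr):
    """Map an SNR onto the detection/quantification limits."""
    if snr >= LIMIT_OF_QUANTIFICATION:
        return "quantifiable"
    if snr >= LIMIT_OF_DETECTION:
        return "detectable only"
    return "below limit of detection"


# Hypothetical specific growth rates (1/h) from a window with constant true mu.
mu_window = [0.058, 0.041, 0.066, 0.037, 0.052, 0.046]
snr = signal_to_noise(mu_window)
print(f"SNR = {snr:.1f} -> {classify(snr)}")
```

Applied to the two SNR values reported for Figure 2C, `classify(2.5)` falls below the limit of detection and `classify(5)` is detectable but not quantifiable.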
The dependency of the SNR on the specific growth rate (h−1), the averaging window (h; Δt as used in Eq. 19), and the biomass error (%) over a broader range is shown as a contour plot in Figure 3. The plot was generated by setting up a multi-linear regression model (Software: Modde, Umetrics, Sweden). Noisy (biomass) data result in a low SNR, especially at low growth rates (e.g., μ = 0.03 h−1). This can be alleviated either by using a larger averaging window (Δt as used in Eq. 19) at the cost of time resolution or by using more replicates for the biomass quantification. However, the latter is not always applicable, e.g., with real-time measurement by a capacitance probe, since additional measurements obtained by increasing the sampling frequency are not real replicates.
The model can be condensed into one coefficient by putting the positive effects (specific growth rate and averaging window Δt as used in Eq. 19) in the numerator and the negative effect (biomass error) in the denominator of the fraction (Eq. 24). This also represents the signal in relation to the error, in analogy to the general equation for the SNR (Eq. 21). As shown in Figure 4, the model has a quadratic effect for higher SNRs but can be approximated linearly at lower SNRs (Eq. 24). The quadratic effect is due to the finite difference approximation according to Eq. 19; too high values for the averaging window (h) are counterproductive.
Rule of thumb equation for SNR
(24)
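As a plausibility check on these trends, first-order error propagation for a two-point difference gives SNR ≈ μ·Δt / (√2·σ_rel), where σ_rel is the relative biomass error as a fraction. This is a textbook approximation assumed here for illustration; it is neither the fitted regression model nor necessarily identical to Eq. 24, and it ignores the quadratic effect at large Δt. The function names are hypothetical.

```python
import math


def snr_first_order(mu, dt, rel_err):
    """Approximate SNR of a specific growth rate from two biomass samples.

    Assumption: independent relative errors rel_err (as a fraction) on both
    samples and mu * dt small enough that the biomass change is ~ mu * dt * X.
    The noise on the difference is then ~ sqrt(2) * rel_err * X.
    """
    return mu * dt / (math.sqrt(2.0) * rel_err)


def window_for_snr(target_snr, mu, rel_err):
    """Averaging window dt (h) needed to reach target_snr under the same assumption."""
    return target_snr * math.sqrt(2.0) * rel_err / mu


# Relative errors taken from Table 1: 2% (dry weight) and 8% (capacitance).
for rel_err in (0.02, 0.08):
    dt_loq = window_for_snr(12.0, mu=0.1, rel_err=rel_err)
    print(f"error {100 * rel_err:.0f} %: dt for SNR 12 at mu = 0.1 1/h is {dt_loq:.1f} h")
```

The sketch reproduces the qualitative behavior described above: the SNR scales with μ and Δt and inversely with the biomass error.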
Noise reduction using little prior knowledge: reconciliation
Larger averaging windows (Δt as used in Eq. 19) can only deal with random noise; systematic errors cannot be reduced this way. A procedure according to a previous publication23 can be used to reconcile rates in order to remove random error and, even more importantly, small systematic errors such as slight miscalibration of equipment, instrument drift, and slightly incorrect constants (e.g., feed concentration). The basic idea is to adjust the rates to fit constraints (elemental balances) according to the expected error on each rate (e.g., from manufacturer specifications or method replicate error). This error has to be specified in the variance-covariance matrix ψ. As long as the constraints are based on correct assumptions (e.g., the stoichiometric equation) and the experimental errors do not exceed the errors specified in ψ, random and systematic error can be effectively removed by reconciliation.23 However, the specified errors have to be reasonably substantiated; otherwise, the reconciliation procedure may result in artifacts. Furthermore, the χ² distribution (used to define the threshold value for the h-value, see section “Consistency check”) applies to normally distributed values (about 99.7% of the observed values lie within three standard deviations). Systematic error does not necessarily follow a normal distribution and may be constant. Hence, the threshold for the h-value according to the χ² distribution might be too forgiving if a major fraction of the residuals is due to systematic error. This should be considered if the error structure of the measurement is known.
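The idea can be sketched with the standard weighted least-squares form of reconciliation for linear constraints. This is a minimal illustration and not necessarily the exact procedure of the cited publication. It assumes a carbon balance and a degree-of-reduction (DoR) balance, rates on a C-mol or mol basis with uptake negative, a diagonal ψ built from relative errors, and hypothetical measured values and DoR constants.

```python
def reconcile(r_m, rel_err, E):
    """Adjust measured rates r_m so that both linear constraints E @ r = 0 hold.

    rel_err holds the assumed relative standard deviation of each rate; the
    diagonal of psi is (rel_err * rate)**2. Returns the reconciled rates and
    the h-value (residuals weighted by their expected covariance).
    Written for exactly two constraints so the 2x2 inverse can be explicit.
    """
    n = len(r_m)
    var = [(e * r) ** 2 for e, r in zip(rel_err, r_m)]
    # Residuals of the balances for the raw measurements.
    eps = [sum(E[k][j] * r_m[j] for j in range(n)) for k in range(2)]
    # P = E psi E^T, the expected covariance of the residuals.
    P = [[sum(E[a][j] * var[j] * E[b][j] for j in range(n)) for b in range(2)]
         for a in range(2)]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    P_inv = [[P[1][1] / det, -P[0][1] / det],
             [-P[1][0] / det, P[0][0] / det]]
    w = [P_inv[k][0] * eps[0] + P_inv[k][1] * eps[1] for k in range(2)]
    # Correction delta = psi E^T P^-1 eps; rates with larger variance move more.
    r_hat = [r_m[j] - var[j] * (E[0][j] * w[0] + E[1][j] * w[1]) for j in range(n)]
    h = eps[0] * w[0] + eps[1] * w[1]
    return r_hat, h


# Rate order: substrate, biomass, CO2, O2.
GAMMA_S, GAMMA_X = 4.0, 4.2          # assumed degrees of reduction
E = [[1.0, 1.0, 1.0, 0.0],           # carbon balance
     [GAMMA_S, GAMMA_X, 0.0, -4.0]]  # DoR balance (CO2 = 0, O2 = -4)
r_measured = [-1.0, 0.56, 0.502, -0.47]  # hypothetical; biomass rate is noisy
rel_errors = [0.03, 0.177, 0.01, 0.06]   # illustrative relative errors

r_hat, h = reconcile(r_measured, rel_errors, E)
print("reconciled rates:", [round(r, 4) for r in r_hat])
print(f"h-value: {h:.2f}")
```

Because the biomass rate carries the largest assumed error, it absorbs most of the correction, while the well-determined CO2 rate barely moves.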
Since the biomass measurement is typically more prone to error than the other data, it can be expected that most of the noise is on this rate. A good estimate for the expected error is the reciprocal of the SNR, which can be calculated using Eq. 24 (which was inferred from Eq. 21). The second highest noise is on the oxygen uptake rate, which is prone to systematic error, e.g., dilution by water in the off-gas, which can also vary during the process. The error on the other rates is mainly systematic as well (miscalibration, sensor drift, measurement error on constants such as feed concentration, etc.), since the random measurement error propagated by on-line devices (see Table 1) is typically negligibly small (<10⁻⁴ %). Assumptions for errors on the items necessary for the calculation of rates are shown in Table 2, and recommendations for ψ are given on that basis. Here, most of the systematic error is due to constants acquired by measurement (e.g., feed concentration, density, water dilution); hence, it is safe to assume that most of the systematic error is normally distributed. Figure 5A shows the biomass production rate (rX) for a typical microbial fed-batch, which was reconciled according to section “Data reconciliation” using the errors specified in the variance-covariance matrix ψ from Table 2. The other rates (rS, rCO2, and rO2) were reconciled as well, but, as explained above, most of the error is in the biomass rate. The h-value is a test statistic (threshold = 4.61, read from the χ² distribution with dF = 2 at a confidence level of 0.9) that indicates whether the residuals on the balances are within the expected range according to ψ. If the threshold is exceeded, the error is higher than previously specified. The reconciliation result might still be useful; however, the procedure fitted a higher error to the elemental balances than previously expected. This can also be due to a wrong assumption about the growth stoichiometry, e.g., unaccounted formation of metabolites.
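The threshold of 4.61 can be checked directly: for two degrees of freedom the χ² cumulative distribution is 1 − exp(−x/2), so the quantile has a closed form. The function names below are hypothetical, and the shortcut is valid for two degrees of freedom only.

```python
import math


def chi2_quantile_2dof(confidence):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    Uses the closed form x = -2 * ln(1 - confidence), which holds only for
    two degrees of freedom.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return -2.0 * math.log(1.0 - confidence)


def is_consistent(h_value, confidence=0.90):
    """True if the h-value stays below the chi-square threshold."""
    return h_value < chi2_quantile_2dof(confidence)


print(f"h threshold = {chi2_quantile_2dof(0.90):.2f}")  # prints "h threshold = 4.61"
```

A stricter or more lenient test follows by changing the confidence level, e.g., 0.95 instead of 0.90.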
As can be seen in Figure 5B, the SNR (according to Eq. 21) is increased from 6 to 100 by reconciliation alone. This means that, due to the removal of measurement noise, a transient change in rates and yields almost 15 times smaller can be reliably detected. Alternatively, if the physiological variation is expected to be very dynamic, the temporal resolution could be increased by a factor of 15 to detect short-term variations, enabling process control, for example.
Table 2. Assumptions for Errors on Necessary Items for the Calculation of Rates and Recommendation for the Variance-Covariance Matrix ψ

| Rate | Influencing Factors | Relative Error on Factor (%) | Effect on the Rate (%) | ψ |
|---|---|---|---|---|
| rx | Biomass quantification error | e.g.: 2 | 1/SNR × 100 | 1/SNR + 0.01 |
| rx | DoR Biomass | e.g.: 1 | | |
| rs | Feed concentration | e.g.: 1 | | 0.03 |
| rs | Feed density | e.g.: 1 | | |
| rCO2 | Miscalibration/sensor drift plus random error | | | 0.01 |
| rO2 | Miscalibration/sensor drift plus random error | | | 0.06 |
| rO2 | yo2_wet | 0.2 | up to 6 | |
Verification with data from a real fed-batch
The approach was verified using real data from an E. coli fed-batch. Following the batch phase (data not shown), an exponential fed-batch with a μ_set of 0.15 h−1 was initiated, as shown in Figure 6A (process time 13 h). This was followed by a linear feeding phase with a μ_initial of 0.1 h−1 at process time 22 h. Because of the linear feed rate and the further increasing biomass, the specific growth rate decreased over time. The sampling interval was chosen according to Eq. 24. With a measurement error for biomass of 2% (Table 1) and an initial growth rate of 0.1 h−1, a Δt (as used in Eq. 19) of 4 h is required to obtain a signal-to-noise ratio >12 (limit of quantification). In this way, a reasonable maximum sampling frequency was determined, since additional data points do not contribute as replicates and hence cannot reduce random noise.29 Furthermore, the presented approach was also applicable to signals from a biomass probe in the same experiment, a capacitance sensor with a very high sampling frequency compared to off-line biomass quantification (section “In-line capacitance analysis”). There was clearly a lot of random noise on the probe signal, as can be seen in Figure 6B, in addition to potential systematic error due to the measurement principle. The capacitance signal depends on the electrical properties of the cells and can be related to intact biovolume or to biomass dry weight. Linear regression analysis gave a relative standard error of 8%, which results in an SNR of 3 with a Δt of 4 h (Figure 6C) or an SNR of 12 with a Δt of 15 h as used in Eq. 19 (Figure 6D). While it is hardly possible to distinguish between the exponential phase and the linear phase in Figure 6C, this is impossible in Figure 6D. This clearly shows the limits of noise reduction by using a larger Δt. If the temporal resolution (15 h, which is in fact half of the fed-batch) is too poor, one might miss important process events.
Furthermore, with a large Δt such as 15 h, the approximation error from the finite difference approximation can have a significant impact on the calculated specific growth rate. To evaluate the impact of this approximation error, prior knowledge in the form of the exponential growth function (Eq. 25) was used instead of Eq. 11 together with Eq. 19, which is possible since it is safe to assume that growth is exponential in the exponential phase.
Calculation of μ from the capacitance signal (i at time points t1 and t2) by exponential growth function
(25)
With a Δt of 4 h (Figure 6C, μ cap exponential and finite difference approx.), there is hardly any difference between the specific growth rate μ calculated from the capacitance signal by finite difference approximation and by the exponential growth function, but with a Δt of 15 h there is a major deviation in the growth rates for the exponential phase, as shown in Figure 6D (μ cap exponential and finite difference approx.). The specific growth rate is artificially lowered by the finite difference approximation. In summary, a large Δt of 15 h is probably not useful.
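The size of this approximation error can be illustrated on noise-free exponential data. The sketch assumes that Eq. 25 has the usual form μ = ln(i2/i1)/(t2 − t1) and that the finite difference rate is normalized by the mean of the two signal values. Both are assumptions about the form of the equations, and the function names are hypothetical.

```python
import math


def mu_exponential(i1, i2, t1, t2):
    """Specific growth rate from two signal values, assuming exponential growth."""
    return math.log(i2 / i1) / (t2 - t1)


def mu_finite_difference(i1, i2, t1, t2):
    """Specific growth rate by finite differences.

    Assumption: the signal change is divided by dt and by the mean of the
    two signal values.
    """
    return (i2 - i1) / ((t2 - t1) * 0.5 * (i1 + i2))


MU_TRUE = 0.15  # 1/h, the set point of the exponential phase
for dt in (4.0, 15.0):
    i1 = 1.0
    i2 = i1 * math.exp(MU_TRUE * dt)  # noise-free exponential growth
    mu_exp = mu_exponential(i1, i2, 0.0, dt)
    mu_fd = mu_finite_difference(i1, i2, 0.0, dt)
    print(f"dt = {dt:4.1f} h: exponential {mu_exp:.3f} 1/h, finite difference {mu_fd:.3f} 1/h")
```

Under these assumptions the finite difference estimate stays close to 0.15 h−1 for Δt = 4 h but falls well below it for Δt = 15 h, in line with the deviation described for Figure 6D.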
Reconciliation of fed-batch data
As discussed above, a Δt of 15 h is probably not useful, while a Δt of 4 h results in a specific growth rate with an SNR of only 3, which is not satisfactory because variations in the growth rate can then only be detected but not quantified (following the definitions of the limits of detection and quantification). Hence, we introduce prior knowledge in the form of elemental balances and reconcile the data (section “Data reconciliation”). Using this approach, the temporal resolution can be increased due to the effective removal of measurement error. Using the capacitance data from section “Verification with data from a real fed-batch”, a Δt of 1 h (as used in Eq. 19) results in an SNR of 0.84 according to Eq. 21, which means that the random noise on the signal is greater than the signal itself. Hence, the specific growth rate in Figure 7B (Δt = 1 h) is more scattered than in Figure 6C (Δt = 4 h). This clearly shows the limited use of noisy signals such as the capacitance signal for calculating rates with a high temporal resolution. Nevertheless, this high level of noise can be effectively removed by reconciliation (Figure 7A) as long as the h-value is below the threshold value (4.61), which is true for most of the process. At process time 17 h there was a small problem with the off-gas analyzer (data not shown), while at process time 20 h the manipulation of the feed-rate controller disturbed the input rates for the reconciliation procedure and hence increased the residuals on the elemental balances, which resulted in h-values above the threshold value (4.61). Figure 7B shows a comparison of the specific growth rate calculated from the raw capacitance signal using a Δt of 1 h and the specific growth rate after the reconciliation procedure. The reconciliation procedure was able to retrieve the μ profile from the rate calculated from the capacitance signal (which was very scattered due to the low Δt); however, the capacitance signal contributed very little to the result.
Nevertheless, the reconciliation procedure allows making use of higher measurement frequencies, since less averaging time (Δt as used in Eq. 19) is required to deal with noise.