Macroscopic modeling of bioreactors for recombinant protein producing Pichia pastoris in defined medium

The methylotrophic yeast Pichia pastoris is widely used as a microbial host for recombinant protein production. Bioreactor models for P. pastoris can inform understanding of cellular metabolism and can be used to optimize bioreactor operation. This article constructs an extensive macroscopic bioreactor model for P. pastoris which describes substrates, biomass, total protein, other medium components, and off‐gas components. Species and elemental balances are introduced to describe uptake and evolution rates for medium components and off‐gas components. Additionally, a pH model is constructed using an overall charge balance, acid/base equilibria, and activity coefficients to describe production of recombinant protein and precipitation of medium components. The extent of run‐to‐run variability is modeled by distributions of a subset of the model parameters, which are estimated using the maximum likelihood method. Model prediction from the extensive macroscopic bioreactor model well describes experimental data with different operating conditions. The probability distributions of the model predictions quantified from the parameter distribution are quantifiably consistent with the run‐to‐run variability observed in the experimental data. The uncertainty description in this macroscopic bioreactor model identifies the model parameters that have large variability and provides guidance as to which aspects of cellular metabolism should be the focus of additional experimental studies. The model for medium components with pH and precipitation can be used for improving chemically defined medium by minimizing the amount of components needed while meeting cellular requirements.


| INTRODUCTION
The methylotrophic yeast Pichia pastoris is widely used as a microbial host for recombinant protein production. This system has the advantages of (i) tight gene regulation and high product titers with methanol-induced alcohol oxidase (AOX1) promoter, (ii) growth on simple media at high cell density, (iii) eukaryotic posttranslational modification such as glycosylation and disulfide bond formation, and (iv) simplified downstream purification with foreign protein secretion and low amount of naturally secreted proteins (Cereghino & Cregg, 2000).
Bioreactor modeling can inform understanding of cellular metabolism-including aspects of growth kinetics, productivity, media consumption, and metabolite secretion-and can be used to optimize bioreactor operations. Models for bioreactors are developed with different levels of detail and complexity depending on their purpose. With simple mathematical formulations and low computational costs, macroscopic models can produce useful predictions of bioreactor operations while being suitable for parameter estimation and real-time model predictive control (Craven et al., 2014;Ramaswamy et al., 2005).
Although P. pastoris has been used to produce numerous recombinant proteins, limited macroscopic models have been published, with their main focus being on the substrate consumption, growth, and production (Barrigon et al., 2015;Çelik et al., 2009;Jahic et al., 2002;Ren et al., 2003). More comprehensive macroscopic models are necessary for P. pastoris to develop further understanding of cellular metabolism and optimization of bioreactor operations. In this study, an extensive macroscopic model for a bioreactor was constructed for substrates (glycerol and methanol), biomass, total protein, important medium components, and off-gas components (oxygen and carbon dioxide). The modeling of the medium components is facilitated by using a chemically defined medium (Matthews et al., 2018), which also reduces run-to-run variability and simplifies purification in the downstream processes. Species and elemental balances are introduced to describe uptake and evolution rates for medium components and off-gas components. Additionally, a pH model was constructed using an overall charge balance, acid/base equilibria, and activity coefficients to describe the dependence of recombinant protein production on pH, precipitation of medium components, and carbon dioxide to carbonate species reactions.
For a set of bioreactors operated in fed-batch with P. pastoris, variations among individual runs are observed, especially with respect to volumetric productivity for an exemplar recombinant subunit vaccine component for rotavirus. These variations are observed even with highly similar target operating conditions with controlled temperature, pH, and dissolved oxygen and reproducible substrate consumption and growth rates (Plantz et al., 2006;Surribas et al., 2006;Velez-Suberbie et al., 2020). These observations suggest that the experimental data for the bioreactors are not describable by a mathematical model with a single set of parameters. An additional consideration is that any macroscopic model does not describe all of the complex biology that occurs in the bioreactor; such models effectively lump multiple biological pathways into some of the model parameters. In this study, the extent of run-to-run variability is modeled by using distributions for a subset of the model parameters, in which each run is associated with a single set of model parameters.  (Velez-Suberbie et al., 2020). Fermenter operating conditions were 25°C, 25% dissolved oxygen tension (DOT), and pH 4.00 or 6.50 ± 0.15 (Table 1) controlled with 10% (v/v) ammonium hydroxide. In some fermentation, a pH pulse was implemented at the end of methanol adaptation (~30 h), which reduced pH to 3.00 ± 0.15, held low for~4 h, and then ramped pH back up to original set point.
Harvest was performed 3 h after the end of the pH pulse.

| Analytical methods
Biomass and protein concentration were measured as previously reported (Velez-Suberbie et al., 2020). Metabolite concentration in fermentation supernatant were determined off-line using (1) where y is the concentration in the reactor, V L is the liquid medium volume, F is the media flow rate, the subscripts in and out referring to inlet and outlet streams, δ is the separation factor at the outlet, and r y is the volumetric reaction rate (Enfors, 2011).
The developed macroscopic bioreactor model is unstructured and nonsegregated, that is, the cell population is treated as single compartment and all cells are considered to be identical. For fed-batch operation, differential balances for the concentrations of biomass X , substrate S (subscript g refers to glycerol and m refers to methanol), total protein P (including recombinant protein and host cell proteins), intracellular carbon C X , and intracellular nitrogen N X are where μ is the specific growth rate, q S is the specific substrate consumption rate, q P is the specific total protein production rate, q C is the specific carbon dioxide evolution rate, q N X is the specific nitrogen uptake rate, C S is the concentration of carbon in the substrate, C P is the concentration of carbon in the protein, C C is the concentration of carbon in the carbon dioxide, and N P is the concentration of nitrogen in the protein.

| Cellular metabolism
The specific growth rate is modeled to be proportional to the specific substrate consumption rate, which is most commonly described with the Monod model (Monod, 1949), where q m is the maintenance coefficient, Y em is the biomass yield coefficient for exclusive maintenance, q S, max is the maximum specific consumption rate, and K S is the saturation constant.
The specific total protein production rate is the sum of the specific recombinant protein production rate and the specific host cell protein production rate, which are both related to the specific growth rate. The pH of the medium also affects the total protein concentration. Low pH conditions are often applied to reduce degradation of protein by inactivating proteases. However, low extracellular pH also affects the organization of the cell wall (Kapteyn et al., 2001), which may have impact on the protein secretion. For modeling this effect of the pH, it is assumed that there is an enzyme responsible for the secretion of the proteins and it is only active in its where K a E , is the acid dissociation constant for the enzyme. Then the active portion of total enzymes is included in the specific total production rate, where α P is the growth-associated proportional coefficient.
Elemental composition of P. pastoris has been reported from several past studies. Carnicer et al. (2009)  Therefore, concentrations of intracellular carbon and nitrogen were included as the states in the model and modeled based on the mass balance of biomass. The specific carbon uptake rate for anabolism and specific nitrogen uptake rate are modeled to be proportional to the specific growth rate, where α C and α N are the growth-associated proportional coefficients.
The concentration of carbon and nitrogen in the protein were calculated from the amino acid sequences of the host cell proteins of P. pastoris and the recombinant proteins.

| Medium components
Molar concentration of RDM components M include total phosphate species P T , total sulfate species S T , total ammonium species N T , total glutamine species Q T , total arginine species R T , po- where q M is the specific uptake rate and r P M , is the precipitation (and dissolution) rate for the related species.
Ammonium, glutamine, and arginine species are nitrogen sources that can be utilized simultaneously by P. pastoris (Matthews et al., 2018). Nitrogen regulation in P. pastoris is simplified as where n M is the number of nitrogen atoms. For the components without nitrogen, the specific uptake rate is modeled assuming constant elemental composition, where M X is the concentration of intracellular element (phosphorous for phosphate and sulfur for sulfate). This assumption is consistent with constant sulfur and ash composition under different conditions (Carnicer et al., 2009).

| pH and precipitation
Precipitation caused by medium components is one of several common problems in cell culture media. Here a pH model with precipitation is developed with the medium components. Acid and base species in the medium are described as the family of sets of the acid and base species A, the subscripts i refer to each acid and its conjugate base(s), n a is the number of potential protons to donate, and z a is the charge when all potential protons are donated. Remaining species in the medium are described as where S is the set of remaining species. The pH model is based on the total charge balance and acid/base equilibria: where K a is the acid dissociation constant and γ z is the activity coefficient for z-charged ions. Activity is applied instead of concentration in the acid/ base equilibria due to the high ionic strength I in the fermentation conditions, and is modeled by the Davis equation (Davies, 1962): where the parameter A depends on temperature T and dielectric constant ϵ as = × ϵ − ∕ A T 1.82 10 ( ) 6 3 2 and ≈ A 0.51 for water at 25°C.
The precipitate species are described as where P is the set of the precipitate species, K sp is the solubility product constant, n p is the number of different ion species forming the precipitate, and s is the stoichiometric coefficient. Then the concentration of precipitates and the precipitation (and dissolution) rate can be described as where Q sp is the reaction quotient and k is the reaction rate constant.

| Gas phase
The remaining reactor volume comprises a volume of the homogeneously dispersed bubbles and volume of the headspace gas above the liquid medium. These two gas phases can be modeled as stirred tanks connected in series, but combining the gas together as a sole gas phase is a reasonable approximation. Gas transfer between headspace gas and medium is negligible compared to gas transfer between bubbles and medium because of order-of-magnitude smaller surface area, resulting in negligible difference between headspace gas and bubbles (de Jonge et al., 2014). Then for the gas phase, differential balances for mole fraction of oxygen O and carbon where x is the mole fraction, the subscripts i refer to the components, Q is the gas flow rate, V G is the total gas volume, p is the pressure, T is the temperature, and i is the gas transfer rate.
Oxygen transfer is especially important for system with P.
pastoris because of the high oxygen uptake rate at high densities of cells. The bioreactor is designed to diffuse oxygen from sparged bubbles into the bulk liquid. The oxygen transfer rate (OTR) can be calculated from the differential balance for the concentration of dissolved oxygen (DO) and a scaling analysis that shows that terms with DO can be neglected due to the low solubility of oxygen in water: ≈ q X OTR O , where q O is the specific oxygen uptake rate. Carbon dioxide produced from the cell metabolism has a net flux in the opposite direction of oxygen, from the bulk liquid to sparged bubbles.
The carbon dioxide transfer rate (CTR) can be calculated from the differential balance for the concentration of dissolved carbon dioxide (DC): where r C is the reaction rate from dissolved carbon dioxide to carbonate species described by the chemical reactions: The approximation that was used for oxygen is not applicable for carbon dioxide because of its high solubility in water. DC will be generally higher than the saturated concentration DC * since the concentration gradient is needed for carbon dioxide transfer. Considering that CTR and OTR are usually in the same order of magnitude (Royce & Thornhill, 1991), where k a L is the volumetric mass transfer coefficient. Because the values of k a L O and k a L C are of the same order of magnitude (Royce & Thornhill, 1991), scaling analysis similar to what was applied to the oxygen shows that DC is close to its saturated value: where H is the Henry's law constant. Then For oxygen uptake rate, Jahic et al. (2002) where ∕ Y O S, an is the oxygen yield coefficient for anabolism, and ∕ Y O S, en is the oxygen yield coefficient for energy metabolism. The metabolic fluxes of glycerol and methanol are simplified in the reaction networks (see Figure 1), which also specify the oxygen yield coefficients.

| Maximum likelihood estimation and sensitivity analysis
Most of the parameters in the bioreactor model are taken from the literature; this section describes the procedure for the estimation of the model parameters that were fit to experimental data. Maximum likelihood (ML) estimation is commonly applied in literature assuming that there exists constant parameter set θ for all experiments (Ashyraliyev et al., 2009). Then the model equation for the system can be determined by ,~(0, ), 1, , , where y is the vector of measured states, x is the vector of states, u is the vector of inputs, ϵ m is the vector of measurement error which is a white noise vector with covariance matrix V m , the subscript i refers to each experimental data set, and d is number of experimental data sets. Then the maximum likelihood estimate is given as In this study, however, the parameters were estimated assuming that there exists a distribution of parameter sets between different experiments: where θ i is constant parameter set for each of the experiment. By linearization of the model, the covariance matrix of the measured states can be expressed as and then the maximum likelihood estimation can be formulated as Sensitivities of the states with respect to the parameters are computed via sensitivity equations: 4 | RESULTS AND DISCUSSION

| pH and precipitation
Precipitation caused by medium components is a common problem in cell culture media and was observed in bioreactor runs when base ammonium hydroxide was added for pH adjustment before the inoculation. Concentration measurements of media components before and after pH adjustment indicated that the ammonium and phosphate ions are included in the precipitates (data not shown). This result suggests magnesium ammonium phosphate (MAP,MgNH PO 4 4 ) as the candidate precipitates. For the pH model with precipitation, pH of the rich defined medium was measured as base 10% (v/v) ammonium hydroxide was added ( Figure 2). Model prediction of pH was shifted to match initial pH data to deal with run-to-run variability of measured initial pH. The acid dissociation constants were taken from the literature (Kern, 1960;Lide, 2004 (Aage et al., 1997;Ohlinger et al., 1998;Snoeyink & Jenkins, 1980). The differences may be due to the different activity models and considered chemical species. The pH measurement data of rich defined medium at different base concentrations were well described by the pH model with precipitation, with a root-mean-squared error (RMSE) of 0.042 which is within the accuracy of the pH measurement (Rosemount Analytical, 2010). It was also observed in the experiment that the precipitation occurred when the volume of base added was 0.3% to 0.4% of volume of the rich defined medium, which also agreed with the model prediction. Base ammonium hydroxide used for pH control in the experiments can have a different concentration than was intended. Lab solutions that are purchased in a concentrated form usually report lower bounds of concentration in the label (e.g., ≥25%). Furthermore, ammonia gas liberates over time, which decreases the concentration. To take this phenomenon into account, concentration of base ammonium hydroxide was corrected based on the pH measurements before inoculation (Figure 2). The concentration of the base used for pH control was initially higher than the intended concentration but the concentration decreased over time as ammonia gas evaporated. The estimated reaction rate constant for precipitation was k = 1.3 × 10 11 M −2 s −1 .

| Cellular metabolism
Measurements of biomass and total protein concentration were used to validate substrate consumption, growth, and production in the macroscopic bioreactor model. Bioreactor model predictions are compared with experimental data for the biomass and total protein concentration in Figure 3. Parameters required for the growth and substrate consumption were taken from the literature (Jahic et al., 2002). The biomass concentration data with different operating conditions (pH) and different strains (products) were well described by the model with the reported parameters, with an RMSE of 3.1 g/L, which is within 5% to 10% of the measured biomass.
The parameters for the total protein production estimated with maximum likelihood using pH values from the measurements were . The total protein production during the glycerol period was consistent with most of the proteins produced being host cell proteins, with a RMSE of 49 mg/L, which is within 5% to 8% of the measured total protein production. Run-to-run variability of total protein production occurred mainly during the methanol period, which is associated with recombinant protein production (RMSE of 250 mg/L) and may be due to unmeasured cell properties such as viability and copy number variation.
The pH dependence of the total protein production is evident from total protein concentration for different pH conditions during the glycerol period and the trend of total protein concentration after

| Medium components
The metabolite concentrations in fermentation supernatant were measured in bioreactor runs to validate the modeling of media consumption. The bioreactor model predictions are compared with experimental measurements of the concentrations of ammonium, phosphate, and potassium ions in Figure 4. For ammonium and phosphate, data with low pH conditions were used because precipitation occurs at higher pH conditions. Elemental compositions of biomass were estimated using the maximum likelihood method, and estimated values reasonably agreed with values from the literature (Carnicer et al., 2009;Tavasoli et al., 2017;van Eunen et al., 2010) ( Table 2). The parameter for specific nitrogen uptake rate was also .
The potassium and phosphate data were well described by the model (RMSE of 6.5 and 4.0 mM, respectively), which demonstrates that constant elemental composition was a reasonable assumption.
Ammonium data have more run-to-run variability compared to other medium components (RMSE of 13 mM). Run-to-run variability of ammonium consumption occurred mainly during the methanol period while the concentration of intracellular nitrogen is increasing due to recombinant protein production.

| Gas phase
Lastly, off-gas data were used to validate the modeling of oxygen uptake and carbon dioxide evolution. Figure 5 compares bioreactor model predictions with experimental measurements of off-gas compositions of oxygen and carbon dioxide. The Henry's law constant and carbonate species reaction constants were taken from the literature (Kern, 1960;Sander, 2015). Inlet air and oxygen were dry ) and the mole fraction of water vapor in the reactor was assumed to be = x 0.01 W based on the steady-state off-gas oxygen concentration before inoculation. Carbon composition of biomass during glycerol period was estimated using maximum likelihood and the estimated values reasonably agreed with values from the literature (Carnicer et al., 2009;Tavasoli et al., 2017) (

| Parameter distributions and sensitivity analysis
The developed macroscopic bioreactor model with pH and precipitation was implemented and compared with the entire experimental data set of bioreactor runs ( Figure 6). ) in the medium was increased from 12 to 15 g/L, the PO 4 concentration would shift up to the dashed line in Figure 6, and the model predicts that phosphate depletion would be delayed to ± 71.2 3.2 hr.
Furthermore, the effects of the parameter distributions estimated by the maximum likelihood method on the predicted variables were quantified. Their probability distribution functions were constructed by Monte Carlo simulation with 10,000 random parameter sets sampled from the parameter distributions (Figure 7). The probability distribution functions capture the run-to-run variability in experimental data, especially for total protein and ammonium concentration.
Sensitivity analysis identifies key parameters for the states of interest in the macroscopic bioreactor model (Figure 8). The total protein concentration was sensitive to α P of each substrate during the period of its consumption, while being sensitive to K p a E , only when operating in low pH conditions. For medium components and off-gas components, key parameters are those related to elemental composition of biomass. Medium components are also sensitive to inlet acid and base concentrations because of their high concentration. This result indicates that the acid and base concentrations should be accurately identified for the model application.

| CONCLUSION
This article constructs an extensive macroscopic bioreactor model for P. pastoris producing recombinant proteins in defined medium.
The model describes substrates, biomass, total protein, other medium components, and off-gas components based on species and elemental balances. The model also describes pH based on overall charge balance, acid/base equilibria, and activity coefficients to describe total protein production and precipitation of medium components. The distributions of uncertain model parameters were estimated using the maximum likelihood method to model the extent of run-to-run variability.
Model predictions from the extensive macroscopic bioreactor model well describe experimental data with different operating conditions. The pH dependence of the total protein production suggests that the low pH condition for inactivating proteases may decrease the quantitative productivity of recombinant proteins. The model predicts that the specific total production rate drops from pH 6.5 by 21% to pH 4, 45% to pH 3.5, and 72% to pH 3. Elemental compositions of biomass are identified as the most sensitive parameters of the medium and off-gas components. Precipitation of medium components is also critical for accurate prediction of concentration of medium components and pH. The model of medium components with pH and precipitation can be used for optimizing a chemically defined medium, especially for extended operations in perfusion mode. The metabolic flux model well describes oxygen uptake rate and carbon dioxide evolution rate and provides an understanding of the state of cell metabolism from the off-gas measurement data.
The probability distribution of model predictions from the parameter distribution quantifies the run-to-run variability observed in the experimental data. The uncertainty description in extensive macroscopic bioreactor model identifies large variability of the model parameters related to the total protein production and nitrogen consumption during the Hong ET AL.