Bioprocess optimization using design-of-experiments methodology



This review surveys recent applications of design-of-experiments (DoE) methodology in the development of biotechnological processes. Methods such as factorial design and response surface methodology provide powerful and efficient ways to optimize cultivations and other unit operations and procedures using a reduced number of experiments. The multitude of interdependent parameters involved within a unit operation or between units in a bioprocess sequence may be substantially refined and improved by the use of such methods. Other bioprocess-related applications include strain screening evaluation and cultivation media balancing. In view of the emerging regulatory demands on pharmaceutical manufacturing processes, exemplified by the process analytical technology (PAT) initiative of the United States Food and Drug Administration, the use of experimental design approaches to improve process development for safer and more reproducible production is becoming increasingly important. Here, these options are highlighted and discussed with a few selected examples from antibiotic fermentation, expanded bed optimization, virus vector transfection of insect cell cultivation, feed profile adaptation, embryonic stem cell expansion protocols, and mammalian cell harvesting.


Statistical experimental planning, factorial design, and design-of-experiments (DoE) are more or less synonymous concepts for investigating the mathematical relationships between input and output variables of a system. Although the fundamentals of the methodology have been known since the early 1900s,1–3 it is only in recent years that it has been widely applied in biotechnology.

In its basic outline, the DoE methodology when applied to bioprocesses is uncomplicated (Figure 1). It investigates defined input factors to a converting biosystem from which mostly common and well-defined output factors or responses are generated, such as product yield and productivity. Applied in this way, DoE is similar to mass balancing. However, the strength of DoE is that it also reveals how interactions between the input factors influence the output responses. These interactions are often difficult to discover and interpret with other methods.

Figure 1.

(a) Principle of input factors and output responses of a bioprocess as a basis for DoE. (b) Stages in a typical bioprocess: bioreactor and recovery (downstream processing). (c) Considering the input factors of the upstream stage as well adds further complexity to the experimental design models.

Biotechnology processes can be considered as a transformation of nutrients and other medium components to biomass and/or molecular products. The utilization of the carbon and nitrogen sources is relatively easy to survey via total stoichiometric balances. To this, a degree of complexity is added by shifts in metabolism caused by activating or deactivating medium components such as growth factors, vitamins, and microbial stressors. Figure 1a illustrates the situation where all inputs exert their effects as an initial load of multiple effects on the transforming bioprocess in an apparently steady-state condition. However, in reality the steady state is unstable, since the nutrient factors are degraded and depleted.

Everyone with experience in bioprocessing is well aware of actions and effects that are displaced over time in cultivation. Some factors are settled from the beginning of the process, while others are provided later, e.g. by feeding, for the purpose of regulation or setting of profiles (Figure 1b).

In most bioprocesses, products are recovered downstream. Additional factors, e.g. eluents, are sequentially supplied with the purpose of purifying the product. Here, too, input factors are distributed over time. The influence of upstream procedures follows the same pattern (Figure 1c). Thus, any attempt to encompass all factors in a bioprocess ends up in an incomprehensible number of interacting and noninteracting factors which may, or may not, affect the specific outputs.

Process analytical technology (PAT), as defined by the Food and Drug Administration,4 aims in particular at mobilizing mathematical modeling methods for enhancing the necessary deeper understanding of the manufacturing (bio)process in the pharmaceutical field, with the long-term purpose of adjusting manufacture online and eliminating delays in product release. Although not covered by the FDA's definition, several other bioprocess applications in pharmaceuticals may benefit from advancing PAT. In fact, all biotechnology manufacture may benefit from DoE, whether the products are diagnostics, foodstuffs, commodities, or fine chemicals.

Previous reviews on statistical methods have covered the period leading up to the 1990s (see e.g. Refs. 5,6); this review focuses mainly on the period thereafter. As previously, most attention in published literature is on the development and optimization of cultivation and production media where the most effective medium components are screened for and attempts are made to optimize their levels. Sometimes, the media factors are combined with operational process parameters such as temperature, agitation, and aeration. New bioprocess-related technologies, for example differentiation of stem cells and expanded bed technique, are occasionally touched upon. However, few efforts have been made to combine factors displaced in time at different stages in the bioprocess sequence as illustrated in Figure 1. Here, the mainstream of bioprocess applications is covered; a few pertinent cases are discussed in more detail in order to illustrate the possibilities for continuing use and further refinements of the methodology.

Factorial Design and Response Surface Methodology

When planning a set of factorial experiments to evaluate important experimental variables in a bioprocess, or to optimize them in a biotechnical system, DoE is a powerful methodology which avoids experimental biases and reduces the required number of experiments.7–10 In optimization, the DoE methodology is clearly preferable to methods which vary one variable at a time, since with a single-variable approach the experimenter is likely to end up at a quasi-optimum (Figure 2a).

Figure 2.

(a) The figure shows how a quasi-optimum is achieved by varying one variable at a time. When keeping variable X1 constant, five experiments varying variable X2 are performed. Then, starting from the optimum (the center point above), variable X1 is varied in another five experiments. A correct optimum is never reached as there is a dependency between variable X1 and X2. (b) By simultaneous variations of variable X1 and X2, and analyzing the result in an experimental design software, the direction of the true optimum can be found.

By performing factorial design, a reliable result can be achieved with only a few experiments, after which the most favorable direction to move in order to find a true optimum can be evaluated (Figure 2b). The graphs in Figure 2 illustrate how easy it is to be trapped in a quasi-optimum by varying one variable at a time, even in the two-variable case. While variable X1 is kept constant, five experiments are performed in which variable X2 is varied. From the obtained optimum, variable X1 is then varied in another four experiments. A correct optimum is never reached since there is a dependency between the two variables. By simultaneously varying both X1 and X2, and analyzing the result with experimental design software, the direction of the true optimum can be found.
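The quasi-optimum problem can be reproduced numerically. The following is a minimal sketch, assuming a hypothetical two-variable response surface with an X1·X2 interaction (the function `response`, its coefficients, and the chosen levels are illustrative assumptions, not data from Figure 2). It compares one cycle of one-variable-at-a-time searching with a four-run two-level factorial evaluated by a linear model with interaction.

```python
import numpy as np

def response(x1, x2):
    """Hypothetical yield surface; the 1.6*u*v term makes X1 and X2 interact."""
    u, v = x1 - 1.0, x2 - 1.0
    return 10.0 - u**2 - v**2 + 1.6 * u * v  # true optimum: 10.0 at (1, 1)

levels = np.linspace(-1.0, 1.0, 5)  # five settings per variable

# One variable at a time: hold X1 at its start value and vary X2, then vary X1.
x1_start = -1.0
best_x2 = levels[np.argmax([response(x1_start, x2) for x2 in levels])]
best_x1 = levels[np.argmax([response(x1, best_x2) for x1 in levels])]
ofat_value = response(best_x1, best_x2)

# Two-level factorial around (-0.5, -0.5): four runs, both variables varied together.
coded = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
real = np.array([-0.5, -0.5]) + 0.5 * coded
y = np.array([response(a, b) for a, b in real])
X = np.column_stack([np.ones(4), coded[:, 0], coded[:, 1], coded[:, 0] * coded[:, 1]])
b0, b1, b2, b12 = np.linalg.solve(X, y)

print(f"OFAT stops at ({best_x1}, {best_x2}) with response {ofat_value:.2f}")
print(f"Factorial effects: b1={b1:.2f}, b2={b2:.2f}, b12={b12:.2f}")
```

In this sketch the single-variable search stops short of the true optimum, whereas the positive main effects b1 and b2 from the factorial runs indicate that both variables should be increased together, and the nonzero b12 exposes the dependency between them.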

Screening of factors

As the initial step in a DoE plan, the most efficient approach is to screen the important variables of the studied bioprocess through a set of experiments located at selected corners of the experimental space. This procedure is called fractional factorial design, since only a fraction of the possible corners is investigated. An example of a screening with three variables is shown in Figure 3a. The three selected factor variables are screened at two levels, and each variable is tested twice at its low and high level, as the figure depicts.

Figure 3.

(a) Experimental design performed as screening of important variables applied in a three-variable case. In addition, replicate experiments at the center point are recommended (not shown in the figure). (b) Two types of central composite designs: the central composite face centered design (CCF, left) and the central composite circumscribed design (CCC, right) in a three-variable case including triplicate experiments at the center point.

This step of the DoE plan is often easy to set up in a bioprocess system, since established procedures for cultivation or separation already have well-defined factor variables, for example, the concentrations of components in the media and the settings of state variables in the fermenter. However, the selection of variables and their levels is entirely up to the experimenter's judgment and prior knowledge of the studied process. Inappropriate choices will limit the usefulness of the results and make it necessary to carry out new experiments with other variables and levels.

The reduced set of experiments can be described mathematically as $2^{n-k}$, where n is the number of factors to be investigated at the low and high levels and k is the number of steps by which the experimental design is reduced. If, for example, five variables are involved in the experiment, we will end up with 16 ($2^{5-1}$) or 8 ($2^{5-2}$) experiments, respectively, depending on the number of steps by which the design is reduced. The first of these designs is preferred, since linear terms are not confounded with two-factor interactions (see Evaluation of the experimental design, below). However, in current practice such a procedure is seldom carried out systematically; instead, an appropriate level of reduction is chosen. The screening will be further improved by replicates in the center point of the experimental domain, with the dual purpose of collecting additional response values while also determining the experimental error for the same response. The reduction of factor experiments decreases the statistical quality and should consequently be applied with caution. The effects become apparent in the goodness of prediction and goodness of fit performance parameters (see below).
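The difference between the two reduced designs can be checked directly on the design matrices. The sketch below builds a 16-run and an 8-run design for five factors in coded units (−1/+1) and counts how many main-effect columns coincide with a two-factor interaction column. The design generators (E = ABCD for the half fraction; D = AB and E = AC for the quarter fraction) are common textbook choices assumed here for illustration; the helper `confounded_pairs` is hypothetical.

```python
from itertools import combinations, product
import numpy as np

def confounded_pairs(design):
    """Count (main effect, two-factor interaction) pairs whose columns are not orthogonal."""
    n_factors = design.shape[1]
    count = 0
    for i in range(n_factors):
        for j, k in combinations(range(n_factors), 2):
            if i in (j, k):
                continue
            if np.dot(design[:, i], design[:, j] * design[:, k]) != 0:
                count += 1
    return count

# 2^(5-1): full factorial in A, B, C, D; assumed generator E = ABCD.
base4 = np.array(list(product([-1, 1], repeat=4)))
half_fraction = np.column_stack([base4, base4.prod(axis=1)])

# 2^(5-2): full factorial in A, B, C; assumed generators D = AB and E = AC.
base3 = np.array(list(product([-1, 1], repeat=3)))
quarter_fraction = np.column_stack(
    [base3, base3[:, 0] * base3[:, 1], base3[:, 0] * base3[:, 2]]
)

print(half_fraction.shape, confounded_pairs(half_fraction))
print(quarter_fraction.shape, confounded_pairs(quarter_fraction))
```

With these generators, no main effect in the 16-run design is aliased with a two-factor interaction, while the 8-run design pays for its economy with several such aliases.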

The complete set of experiments should be performed in a random order, to avoid systematic error. Systematic error can occur when, for example, the three experiments at the center point of a design are performed one after another using the same experimental procedure. Randomization also blinds the experimenter to any expected good or bad results.
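Randomizing the run order takes only a few lines. In this minimal sketch the run labels and the seed are hypothetical; the seed is fixed only so that the plan can be regenerated.

```python
import random

# Eight corner runs plus three center-point replicates (hypothetical labels).
runs = [f"corner-{i + 1}" for i in range(8)] + [f"center-{i + 1}" for i in range(3)]

rng = random.Random(2024)  # fixed seed only so that the plan can be reproduced
run_order = runs[:]        # copy, so the systematic list stays intact
rng.shuffle(run_order)

for position, run in enumerate(run_order, start=1):
    print(position, run)
```

Shuffling the complete list, center points included, spreads the replicates over the whole experimental series instead of running them back to back.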

The typical result of a bioprocess screening experiment is that a subset of a few important factors is identified. These factors can be used in a new experimental design with the purpose of determining optimal factor values. This requires a central composite face centered design (CCF) or central composite circumscribed design (CCC). Figure 3b illustrates a three-variable case of the CCF and CCC designs. In the CCF case, additional values of the variables are included in the surface center points between the corners of the experimental space. In the CCC, these are displaced outside the space at the same distance from the center point as the distance from the center point to the corners. Theoretically, the CCC design is somewhat better than the CCF design since the CCC design covers a larger volume. Moreover, the CCC design has five levels of each factor and can therefore better capture strong curvature and even cubic response behavior.

When starting with three factor variables, a set of 14 new experiments is generated; adding to this the three experiments at the center point, 17 experiments are required. The reason for the selection of 14 experiments is the ambition to demarcate a space containing the optimum for the three variables; with the selected geometry of variable levels, there is enough information to identify optimal conditions within the space. The result is depicted in a contour plot or a response surface where the optimum is clearly visualized. This final step has given the whole procedure its name, response surface methodology.

The response surface does not reveal whether the optimum is local or global. However, by carrying out the evaluation procedure with the support of multiple simplex optimizers, the risk of ending up in a local optimum is significantly limited.
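The run count of 14 + 3 = 17 follows from the geometry of the design: 8 corners, 6 axial points, and 3 center replicates. The sketch below generates both designs in coded units. The function `central_composite` is a hypothetical helper; for the CCC design the axial distance is set to √3, which places the axial points at the same distance from the center as the corners, as described above.

```python
from itertools import product
import numpy as np

def central_composite(n_factors, alpha, n_center):
    """Corner, axial, and center points of a central composite design (coded units)."""
    corners = np.array(list(product([-1.0, 1.0], repeat=n_factors)))
    axial = np.zeros((2 * n_factors, n_factors))
    for i in range(n_factors):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    center = np.zeros((n_center, n_factors))
    return np.vstack([corners, axial, center])

ccf = central_composite(3, alpha=1.0, n_center=3)          # axial points on the faces
ccc = central_composite(3, alpha=np.sqrt(3.0), n_center=3)  # axial points at corner distance

print(ccf.shape, np.unique(ccf[:, 0]))
print(ccc.shape, np.unique(ccc[:, 0]))
```

Listing the distinct values in one column shows three levels per factor for the CCF design and five for the CCC design.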

Usually, commercially available software is utilized when planning the experimental designs. Frequently used software packages for experimental design are Modde™ (Umetrics AB, Umeå, Sweden), MiniTab™ (Minitab Inc., State College, PA), and Design-Expert™, all of which are convenient for applying DoE procedures. Most of the examples quoted in this review use one of these packages, although other software programs can also be used, or special algorithms can be built in, for example, Matlab or Excel.

Qualitative factor variables

It is important to remember that both quantitative and qualitative variables can be included in the factor set, as can quantitative and qualitative multilevel factors. For example, in a case with two qualitative factors, X1 (with four discrete levels) and X3 (with three discrete levels), and one quantitative factor, X2, a design can be generated as shown in Figure 4. Here, it is not possible to create any center points between the discrete levels of factor X1 and X3. However, a center point can be positioned at one of the discrete levels of the qualitative variables and on an intermediate level of the quantitative variable (when appearing together in a design as in Figure 4). A typical example from bioprocessing is when comparing two different impeller turbines at varying temperatures and oxygen transfer rates.

Figure 4.

A three-variable design including two qualitative variables at three and four discrete levels, respectively, as well as a third quantitative variable.
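A design of this kind can be enumerated as follows. This is a minimal sketch under stated assumptions: the level labels, the range of the quantitative factor, and the choice of a full enumeration of all level combinations are hypothetical and are not taken from Figure 4.

```python
from itertools import product

x1_levels = ["A", "B", "C", "D"]  # qualitative factor X1, four discrete levels (hypothetical)
x3_levels = ["P", "Q", "R"]       # qualitative factor X3, three discrete levels (hypothetical)
x2_low, x2_high = 30.0, 40.0      # quantitative factor X2, low and high level (hypothetical)

# Every combination of the discrete levels at the low and high level of X2.
runs = [
    {"X1": a, "X2": b, "X3": c}
    for a, b, c in product(x1_levels, (x2_low, x2_high), x3_levels)
]

# A center point can only sit on one discrete level of X1 and X3,
# combined with the intermediate level of the quantitative factor X2.
center = {"X1": x1_levels[0], "X2": (x2_low + x2_high) / 2, "X3": x3_levels[0]}
design = runs + [dict(center) for _ in range(3)]

print(len(runs), len(design))
```

In practice, design software would usually select a subset of these combinations rather than run all of them.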

Uncontrolled variables, that is, factors that are recorded but not controlled during the experiment (the process temperature, for example), can also be used in the design in addition to the controlled factor variables, since these data can be included when evaluating the influence of the uncontrolled variable on the experimental results. This is typical for bioprocess experimental conditions, involving, for example, dissolved oxygen tension and complex hydrolysate media, which may require special attention.

Design of media and mixtures

Of particular interest for bioprocesses is the optimization of media and mixture composition. A mixture/media design can easily be performed in DoE for culture media and galenic formulations, for example. It is convenient here to assign suitable intervals for the percentage variation of each variable, where 0–100% is then written as an interval between 0 and 1. In a three-variable case, the design will be a symmetric triangle, and in a four-variable case it will appear as a symmetric tetrahedron.
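The triangular geometry of a three-component mixture design can be generated with a simplex lattice, in which every component takes equally spaced fractions and the fractions of each blend sum to 1. The sketch below assumes this lattice construction as one common way to fill the triangle; the function `simplex_lattice` is a hypothetical helper.

```python
from itertools import product

def simplex_lattice(n_components, n_steps):
    """All blends whose fractions are multiples of 1/n_steps and sum to 1."""
    return [
        tuple(part / n_steps for part in combo)
        for combo in product(range(n_steps + 1), repeat=n_components)
        if sum(combo) == n_steps
    ]

# Three components, fractions 0, 0.5, and 1: three vertices and three edge midpoints.
blends = simplex_lattice(3, 2)
for blend in blends:
    print(blend)
```

The same function called with four components yields points on the tetrahedron mentioned above.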

If one component in a medium dominates the total desired volume or amount (e.g. a solvent such as a buffer solution or the albumin component of a culture medium), then the effect of this component can also be evaluated. This component is often referred to as the filler substance, of which there is only one in the medium.

In a media/mixture design using this method, it is recommended, for the reasons presented above, to start by screening the experimental domain if it is unknown, before applying more advanced designs for optimization.

Evaluation of the experimental design

When the experiments have been carried out according to the experimental design, and the results of response variables have been compiled, multiple linear regression (MLR) is used. The purpose is to evaluate whether the experimental space has been orthogonally designed and whether it remains so after the experiments have been performed.

In MLR, the mathematical model attempts to describe a relationship between one or more independent variables and a response variable, described as $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$ for $i = 1, 2, \dots, n$. In addition, more complex interaction terms describing co-variation of linear or quadratic terms can be included. However, the interaction terms or other complex terms must be assessed to see if they give acceptable contributions to the model in comparison to the uncertainty of the contribution (signal-to-noise ratio).

In the evaluation of the contribution of each coefficient and the subsequent optimization of the model, the aim is to reduce the model deviation factor εi as much as possible. When characterizing a response surface, the co-variation terms describe its skewness, while quadratic terms describe its curvature.
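A minimal sketch of such an MLR evaluation is given below, using ordinary least squares from NumPy on a coded two-factor design with three center points. The responses are synthetic and noise-free, generated from assumed coefficients (`true_beta`), so that the fit can be verified; none of the numbers come from the cited studies.

```python
import numpy as np

# Coded 2^2 design plus three center points (seven runs).
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)

# Model matrix: intercept, two linear terms, and one interaction term.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Orthogonality check: off-diagonal elements of X'X vanish for an orthogonal design.
gram = X.T @ X
is_orthogonal = np.allclose(gram, np.diag(np.diag(gram)))

# Synthetic, noise-free responses from assumed coefficients.
true_beta = np.array([5.0, 2.0, -1.0, 0.5])
y = X @ true_beta

# Least-squares estimate of the coefficients and the model deviations.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print("orthogonal design:", is_orthogonal)
print("estimated coefficients:", np.round(beta_hat, 3))
```

With real, noisy data the residuals would no longer vanish, and each coefficient would be judged against its confidence interval before being kept in the model.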

In a complete design, owing to the number of degrees of freedom, it is likely that terms describing interaction and curvature can be included in the MLR model. When evaluating fractional factorial designs by MLR, it is a common practice to support the linear terms with a limited number of complex terms. In a fractional factorial design, such terms are often confounded with each other.

When, for example, the design is not orthogonal, partial least squares regression (PLS) is a better alternative. This evaluation method can also be used when there are several correlated responses in the data set. PLS provides a more robust evaluation method which can be used even if there are a limited number of missing data in the response matrix.

The model evaluations in the bioprocess applications cited in the following are all examples of these analytical procedures.

Normally, the model is validated by two diagnostic statistics. The first of these, the $R^2$ value, is the fraction of the variation of the responses that can be explained by the model. This describes the goodness of fit, or how well current runs can be reproduced in a mathematical model.

The other statistic, the $Q^2$ value, is the fraction of the variation of the response predicted by the model according to cross validation. This describes the goodness of prediction, or how well new experiments can be predicted using the mathematical model.

In literature reports, including those cited below, $R^2$ and $Q^2$ testing is sometimes neglected, which unfortunately reduces the possibility of fully appreciating the results. Typical values indicating good models are $R^2 > 0.75$ and $Q^2 > 0.60$. Normally, values below 0.25 would be considered useless.
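Both statistics can be computed from the model matrix and the responses. The sketch below assumes leave-one-out cross validation for $Q^2$ (commercial packages may use other cross-validation schemes) and uses synthetic responses with added random noise; the function `r2_q2` and all numbers are illustrative assumptions.

```python
import numpy as np

def r2_q2(X, y):
    """Goodness of fit (R2) and leave-one-out goodness of prediction (Q2)."""
    n = len(y)
    ss_tot = np.sum((y - y.mean()) ** 2)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    press = 0.0  # prediction error sum of squares
    for i in range(n):
        keep = np.arange(n) != i  # refit the model without run i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ beta_i) ** 2
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot

# Duplicated 2^2 design plus three center points (eleven runs, coded units).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1, 0, 0, 0], dtype=float)
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Synthetic responses: assumed coefficients plus random noise.
rng = np.random.default_rng(0)
y = 5.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + rng.normal(0.0, 0.3, size=x1.size)

r2, q2 = r2_q2(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Because each left-out run is predicted by a model that has not seen it, $Q^2$ can never exceed $R^2$ for a least-squares fit, and a large gap between the two is a warning sign of an overfitted model.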

Robustness testing

As described above, for a case with five factors, we need either 16 ($2^{5-1}$) or 8 ($2^{5-2}$) experiments. The latter design is mainly used in robustness testing, where the sensitivity of the response variable is tested in a reduced set of experiments around a response variable optimum. The variables are usually varied in very small steps around a specific field of interest, such as a preferred setting for an industrial process, in order to evaluate the process stability.

The aim of robustness testing is to identify responses which are robust, or alternatively, sensitive to small factor changes, and to find factors which need to be controlled to achieve robustness.

Usually, a linear model is used in robustness testing. However, it is important to notice that a response exhibiting a low $Q^2$ is a good indicator of robustness.

Bioprocess Cultivation Applications

In the following, a number of applications of DoE in relation to bioprocesses are reviewed. These examples illustrate the potential of the DoE methodology in bioprocessing, in particular for new bioprocesses but also for retrofitting existing ones. Some of the examples are described in additional detail in order to demonstrate the practice, while others are mentioned only briefly in order to exemplify the versatility of DoE for bioprocess applications.

Optimization of culture media

So far, most applications of DoE have concerned optimization of the composition of growth and production culture media. Many examples can be found for production of antibiotics (see Table 1). The typical objective is to identify a better selection and quantitative composition of medium factors.

Table 1. Design-of-Experiments Applications for Production of Secondary Metabolites

| Product | Design purpose | Methodology | Ref. |
| --- | --- | --- | --- |
| Bacteriocin (B. licheniformis) | Maximizing bacteriocin yield by optimizing medium and process parameters | Media ranking and RSM | 32 |
| Clavulanic acid (Streptomyces clavuligerus) | Maximizing clavulanic acid yield by optimizing medium composition | Screening by fractional FD and optimizing by RSM | 11 |
| Meilingmycin (S. nanchangensis) | Maximizing meilingmycin yield by optimizing medium composition | Screening by fractional FD and optimization by steepest ascent and RSM | 14 |
| 6-Aminopenicillanic acid (penicillin acylase)* | Maximizing 6-APA yield by optimizing process parameters | Full FD and modelling | 32 |
| Antifungal antibiotic (S. chattanoogensis) | Maximizing antibiotic yield by optimizing process parameters | Full FD and statistical analysis | 40 |
| Neomycin (S. marinensis) | Maximizing neomycin yield by optimizing medium composition | Full FD and RSM | 12 |
| Nisin (Lactococcus lactis) | Maximizing nisin specific productivity by optimizing medium composition | Fractional factorial design and statistical analysis | 13 |
| Other secondary metabolites | | | |
| Carotenoid (Rhodotorula glutinis) | Maximizing carotenoid yield for media and process parameters | Screening by fractional FD and optimization by RSM | 35 |
| Butter flavor compounds (Pediococcus pentosaceus and L. acidophilus) | Maximizing flavor yield by optimizing medium composition | Screening by fractional FD followed by statistical analysis | 16 |
| Astaxanthin (Phaffia rhodozyma) | Maximizing astaxanthin yield for media and process parameters | Screening by full FD and optimization by RSM | 41 |

RSM, response surface methodology; FD, factorial design.
* 6-APA converted by penicillin acylase.

An illustrative example comes from the production of the antibiotic clavulanic acid using the bacterium Streptomyces clavuligerus. Wang and coworkers (2004) optimized the medium composition by first screening a variety of media ingredients with a two-level fractional factorial design and then optimizing their levels by response surface methodology.11 A set of fractional factorial design experiments identified soy meal powder, FeSO4·7H2O, and ornithine as the most influential medium factors. In the subsequent analysis by a response surface model using a CCC design, these three factors exhibited significant effects on the clavulanic acid yield, and the optimal concentration of soy meal powder was determined as 38.10 g/L, that of FeSO4·7H2O as 0.395 g/L, and that of ornithine as 1.177 g/L. The correlation factor was 0.98 and the coefficient of variation was 6.6%. Running the cultivation with these settings in a 72 h experiment increased the product yield by 50%.

Other successful examples of applying DoE to optimization of media composition for antibiotics production come from neomycin production by S. marinensis with solid-state fermentation,12 nisin by Lactococcus lactis,13 and meilingmycin by S. nanchangensis,14 where fractional design methodology in combination with response surface was applied in a manner similar to the clavulanic acid example above (for more details of the design conditions used, see Table 2).

Table 2. Conditions and Outcome of Optimization of Culture Media

| Product | Production organism | Design and media factors | Responses | Improvement | Ref. |
| --- | --- | --- | --- | --- | --- |
| Polyglutamic acid | Bacillus sp. | $2^{15-k}$ (13 medium factors; pH; impeller speed) | PGA yield | 3-fold increase | 18 |
| Peroxidase | Coprinus sp. | $2^{7}$ (6 medium factors; pH) | Enzyme activity yield | 2-fold increase | 22 |
| Cystatin C | P. pastoris (recomb) | $2^{3}$ (3 medium factors) | Glycosylation degree | 14% increase in glycosylation | 26 |
| Yoghurt culture | Bifidobacterium longum | $2^{8-4}$ (7 media factors; pH) + CCF designs | Growth rate; glucose rate | 50% reduction, 160% increase | 25 |
| Mannitol | L. intermedius | $2^{4}$ (4 media factors) | Metabolite yields | Increases | 15 |
| Degradation enzymes | A. oryzae | $2^{4-1}$ (3 medium factors; pH) | Enzyme activity | 20–36% increase | 24 |
| Chitinolytic enzymes | Serratia marcescens | $2^{3}$ (chitin; temp; pH) | Enzyme activity | Ranking of chitins | 23 |
| Penicillin acylase | S. lavendulae | Medium comp. (YE, olive oil, etc.) | Enzyme yield | Optimization | 21 |
| Lysozyme | A. niger (recomb) | $2^{5-1}$ (starch; peptone; medium comp.) | Lysozyme yield | Optima determined | 20 |
| Butter flavor compounds (diacetyl) | Pediococcus, Lactobacillus | $2^{4}$ (4 pretreatment factors) | Diacetyl yield | Optima determined | 16 |
| Clavulanic acid | S. clavuligerus | $2^{6-2}$ (6 factors) and $3^{3}$ (3 factors) | Clavulanate yield | 50% increase | 11 |
| Nisin | Lactococcus lactis | $2^{4-1}$ (4 media comp.) | Nisin yield; specific productivity | Increased activity yield | 13 |
| Neomycin | S. marinensis | $2^{3}$ (3 media comp.) | Neomycin yield | Optimum determined | 12 |
| Meilingmycin | S. nanchangensis | $2^{9-5}$ (9 factors) + $2^{2}$ (2 factors) | Meilingmycin yield | 4.5-fold increase | 14 |

Experiences of media optimization reported for other primary and secondary metabolite production processes include mannitol fermentation by Lactobacillus intermedius,15 butter flavor compound production by Pediococcus pentosaceus and L. acidophilus in semisolid maize-based cultures,16 mycelial biomass and exo-polymer production by Grifola frondosa,17 and γ-polyglutamic acid fermentation by Bacillus sp.18, 19 (Tables 1 and 2).

Another prominent area of application is in production processes for enzymes and other proteins (Table 3); one example is the effect of the medium composition on hen's egg white lysozyme production by Aspergillus niger.20 The influence of the ingredient factors starch, peptone, ammonium sulfate, yeast extract, and CaCl2·2H2O was screened in a $2^{5-1}$ fractional factorial design with center points. The lysozyme response revealed that peptone, starch, and ammonium sulfate were the most influential factors, while the other factors had little effect at the levels tested. In this case, optimization was accomplished by a steepest ascent procedure followed by response surface modeling using a central composite design (CCD). The medium composition determined as optimal for lysozyme production was 34 g/L each for starch and peptone, 11.9 g/L for ammonium sulfate, 0.5 g/L for yeast extract, and 0.5 g/L for CaCl2·2H2O. Using this medium in a 7-day fermentation resulted in a lysozyme yield of 209 ± 18 mg/L, which is close to the theoretically estimated yield (212 mg/L).

Table 3. Design-of-Experiments Applications for Production of Enzymes and Other Proteins

| Product | Optimization purpose | Methodology | Ref. |
| --- | --- | --- | --- |
| Chitinolytic enzyme production (Serratia sp.) | Maximizing enzyme yield for chitinous substrates | Full FD and statistical analysis | 23 |
| Hydrolase (Tetrahymena thermophila) | Optimizing enzyme yield and growth for culture factors | Fractional FD and RSM | 39 |
| Aminolevulinate synthase (recomb E. coli) | Maximizing synthase yield for medium composition and inducer concentration | Full FD | 36 |
| Fungal peroxidase (2 Coprinus sp.) | Maximizing peroxidase yield for medium composition | Fractional FD and RSM | 22 |
| Hen lysozyme (recomb A. niger) | Maximizing lysozyme yield for medium composition | FD and RSM | 20 |
| Penicillin acylase (S. lavendulae) | Maximizing acylase yield for medium composition | FD and RSM | 21 |
| Inulinase (solid-state fermentation) | Maximizing inulinase yield for sugarcane substrate and culture conditions | FD and RSM | 43 |
| Glycosidic enzymes (A. oryzae) | Maximizing enzyme yields for solid-state fermentation conditions | Fractional FD and RSM | 24 |
| Lipase (Burkholderia sp.) | Maximizing lipase yield for medium composition | Fractional FD and RSM | 42 |
| Amylase/protease (A. awamori) | Optimization of amylase and protease yield by solid-state fermentation parameters | Factorial FD* and statistical analysis | 52 |
| Other proteins | | | |
| Recombinant protein (A. niger) | Optimizing protein yield for process parameters | Fractional FD | 37 |
| Cystatin C mutant (in P. pastoris) | Maximizing yield for nitrogen source and glycosylation degree | Full FD | 26 |
| Fab' fragment (E. coli) | Optimizing Fab yield for mixing and oxygen transfer of the culture | Full FD | 38 |
| Polyglutamic acid (B. subtilis) | Maximizing PGA yield for media parameters | Fractional FD and RSM | 19 |
| Polyglutamic acid (Bacillus sp.) | Maximizing PGA yield for media and process parameters | Fractional FD and statistics | 18 |
| Glucocorticoid receptor (in insect cell culture) | Maximizing the receptor yield from transfection and process conditions | Fractional FD and RSM | 45 |

*, † Used an EVOP (*) or a Plackett-Burman design (†), which are common alternative experimental design methods.

Several other examples of optimization of production media for proteins along the lines outlined above can be mentioned (Table 3), such as penicillin acylase production by S. lavendulae,21 fungal peroxidase production by Coprinus species,22 production of chitinolytic enzymes by Serratia marcescens QMB1466,23 and production of cellulose degrading enzymes from A. oryzae.24

For most of the media optimization applications, it is the final product yield or final concentration that is the aim of the design experiments; that is, the intention is to find the most favorable mix of nutrient factors to maximize the cellular productivity by supplying a well balanced composition of nutrients that enhances the maximum yield of the product molecule. Thus, the principle of Figure 1 above is the basis for the DoE.

So far, few attempts have been made to reach a specific response related to the efficiency of the cellular machinery. One such example has been reported in which the resulting glycosylation pattern of the protein was the studied response.26 The degree of glycosylation for three glycoforms of the cystatin C glycoprotein was studied in a full factorial design experiment with three different nitrogen media in recombinant Pichia pastoris cultures. Maximal glycosylation was reached at the expense of productivity.

The overview in Table 2 compares the methodology applied in these recent applications of media optimization. As the table illustrates, most researchers have applied quite similar DoE approaches.

An inventive step towards accelerating media optimization by DoE was taken by Deshpande et al. (2004).27 They used a micro-titer plate with on-line measurement of dissolved oxygen for optimization of a cultivation of Chinese hamster ovary cells in a culture medium with selected factors. By a dynamic liquid phase mass balance, the oxygen uptake rates were calculated from the dissolved oxygen level and used to indicate cell viability. Using a full factorial design with CCF, the optimum medium composition could be identified and determined for glucose, glutamine, and inorganic salts in one single micro-titer plate experiment. The concentration of inorganic salts was found to have the most significant influence on the cultivation. The method seems to have good potential for medium optimization of cell culture media.
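The calculation of an oxygen uptake rate (OUR) from a dissolved oxygen trace can be sketched with the standard liquid-phase oxygen balance, dC/dt = kLa·(C* − C) − OUR, which is assumed here because the balance actually used in the cited study is not reproduced in this review. The volumetric transfer coefficient `kla`, the saturation concentration `c_sat`, and the dissolved oxygen readings are hypothetical values chosen for illustration only.

```python
import numpy as np

def oxygen_uptake_rate(time_h, do_conc, kla, c_sat):
    """OUR from the assumed balance dC/dt = kla * (c_sat - C) - OUR."""
    dc_dt = np.gradient(do_conc, time_h)  # numerical time derivative of the DO signal
    return kla * (c_sat - do_conc) - dc_dt

# Hypothetical readings from one well: time in h, dissolved oxygen in mmol/L.
time_h = np.array([0.0, 0.5, 1.0, 1.5])
do_conc = np.array([0.20, 0.18, 0.16, 0.14])

our = oxygen_uptake_rate(time_h, do_conc, kla=10.0, c_sat=0.21)  # mmol/(L h)
print(np.round(our, 3))
```

In this sketch a falling dissolved oxygen level at constant kLa translates into a rising uptake rate, which is the kind of signal that can serve as a response in the factorial design.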

Recently, elaborate DoE protocols for 96-multi-well plates have been developed.28 By applying these rational protocols for experimental design, the optimization can be made more efficient and less time-consuming. The technique can also be applied to larger formats such as 384-well and 1536-well plates.28

Optimization of process operation

Optimization of the operational procedures in bioprocessing can either concern state variables that should be kept under control at a predetermined set-point or that should follow the course of a preset trajectory, or discrete sequential actions. Typical examples are control variables such as temperature, pH, and dissolved oxygen or feed rate profiles in fed-batch reactors, or gradient and washing procedures in chromatography columns.

Normally, these variables are feedback or feedforward controlled by adjusting or adding a specific factor to keep the state at the preset optimal level, contrary to most of the concentrations of medium components which undergo transient decreases until depletion leads to cessation of growth and production. Moreover, the objective of the DoE is often the optimal combinations of several such state variables at favorable levels of nutrients.

Optimization of basic process state variables

Roebuck et al. (1995) applied fractional factorial design and response surface modeling to determine the optimal temperature and pH for the growth of the yeast Pachysolen tannophilus.29 This yeast strain has evoked considerable interest as a large-scale ethanol and xylitol producer due to its ability to ferment a wide range of carbohydrates, which makes it suitable for the conversion of lignocellulosic wastes.

The authors undertook a small-scale batch study in aerated shake-flasks using a standard cultivation medium. On the basis of literature data on variability of the yeast, they chose to screen a wide range of temperature (30–40°C) and pH (pH 2–6) values for optimization. Using the Modde™ software package, a two-factor experimental design was set up with three replicates at the center point (35°C, pH 4.0) and all other points in duplicate, giving 19 experiments in total (Figure 5a). Batch flask experiments were run for 13 h under controlled conditions and the attained optical density of the culture was measured at 600 nm. Typically, a growth rate of 0.25 h⁻¹ was reached in the exponential phase in the flasks. The data from the 19 experiments were treated by regression analysis in order to fit the response function. Coefficients were calculated with 95% confidence intervals on the levels of the experiments. A response surface was generated (Figure 5b). Maximum optical density was obtained at pH 3.7 and a temperature of 31.5°C with an initial carbon source (D-xylose) concentration of ∼50 g/L. The dry weight corresponding to the maximum optical density was determined as 3.4 g/L. Both values coincided with previously reported data.
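The run count and the regression step can be illustrated with a short Python sketch. It assumes a face-centered 3×3 layout of the two factors, which is consistent with the stated total (8 non-center points in duplicate plus 3 center replicates gives 19 runs) but is not spelled out in the text. NumPy stands in for the software package used by the authors, and the response values are synthetic.

```python
import itertools
import numpy as np

# Coded levels -1, 0, +1 mapped onto the screened ranges given in the text.
temp_levels = {-1: 30.0, 0: 35.0, 1: 40.0}   # deg C
ph_levels = {-1: 2.0, 0: 4.0, 1: 6.0}

runs = []
for x1, x2 in itertools.product((-1, 0, 1), repeat=2):
    n_rep = 3 if (x1, x2) == (0, 0) else 2   # center in triplicate, others in duplicate
    runs.extend([(x1, x2)] * n_rep)

design = np.array(runs, dtype=float)                             # coded units
natural = [(temp_levels[a], ph_levels[b]) for a, b in runs]      # deg C, pH

def quadratic_model_matrix(x):
    """Columns: 1, x1, x2, x1*x2, x1^2, x2^2 (full quadratic model)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1 * x2, x1**2, x2**2])

def fit_response_surface(x, y):
    """Least-squares estimate of the six quadratic-model coefficients."""
    coef, *_ = np.linalg.lstsq(quadratic_model_matrix(x), y, rcond=None)
    return coef

# Synthetic, noise-free response used only to show that the fit recovers the model.
true_coef = np.array([3.0, -0.5, -0.2, 0.1, -0.8, -0.6])
y = quadratic_model_matrix(design) @ true_coef
coef = fit_response_surface(design, y)
```

Working in coded units keeps the coefficients comparable between factors with different physical scales; the `natural` list shows the corresponding temperature and pH settings for each run.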

Figure 5.

(a) Experimental design composite for screening of state variables in a P. tannophilus fermentation. (b) A response surface model of experiments of the fermentation for the selected variables temperature and pH.

Another example describes a similar fractional factorial design approach to determine the cultivation parameters in a fed-batch E. coli cultivation for production of a recombinant Fab-antibody.25 The experiments were carried out in a multi-fermenter system (Figure 6a) consisting of 12 one-liter vessels equipped with pH, dissolved oxygen, and temperature control (Greta system, Belach AB, Solna, Sweden). This type of parallel equipment is especially well suited to chemometrics experiments.

Figure 6.

(a) Panel with twelve bioreactors for fed-batch experimental design for recombinant Fab fragment production in E. coli. The unit to the left is the steam generator for in situ sterilization of the reactor panel. (b) The response surface for the system based on a model with cultivation time and specific growth rate as input factors and recombinant protein yield as response.

The fermenters were operated in fed-batch mode. Models were developed including cultivation time, temperature, and feed rate. A CCF design based on 32 cultivations was used. The factors were each set at three levels: pH (6.5, 7.0, 7.5), temperature (35.5, 37.0, 38.5°C), cultivation time (16, 18, 20 h), and specific growth rate (0.15, 0.25, 0.35 h⁻¹, as controlled by feeding). The response parameter was the expression level of recombinant antibody, which was analyzed with a Bioanalyzer instrument (2100, Agilent Technologies Inc.). Its value varied in the range 25–175 μg/mL. Response surface modeling (Figure 6b) showed maximum protein expression at a specific growth rate of 0.23 h⁻¹, a culture time of 17.8 h, a temperature of 36.5–37.0°C, and a pH of 6.8. However, goodness of fit (R²) and goodness of prediction (Q²) were modest (0.77 and 0.62, respectively).
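The two statistics quoted for this model can be computed as in the following Python sketch. The cross-validation scheme behind the reported goodness of prediction is not stated in the text; leave-one-out is assumed here, and the function name and the data are invented for illustration.

```python
import numpy as np

def r2_and_q2(X, y):
    """Return (R2, Q2) for an ordinary least-squares model y ~ X.

    R2 = 1 - SS_res / SS_tot   (fit to all runs)
    Q2 = 1 - PRESS  / SS_tot   (leave-one-out prediction, an assumed scheme)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)

    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # drop run i, refit, predict it
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ b_i) ** 2

    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot

# Synthetic one-factor quadratic response with noise (values are hypothetical).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
X = np.column_stack([np.ones_like(x), x, x**2])
y = 100.0 + 40.0 * x - 30.0 * x**2 + rng.normal(0.0, 5.0, size=x.size)
r2, q2 = r2_and_q2(X, y)
```

Because each left-out run is predicted by a model that never saw it, the prediction statistic cannot exceed the fit statistic for a least-squares model; a large gap between the two suggests that the model describes the runs better than it predicts new ones.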

Recent research further exemplifies the versatility of DoE for optimizing key process response variables (Table 4). Examples are the optimal rate of CO2-fixation by a chemoautotrophic microorganism,31 the effects of inhibition treatment, type of inocula, and incubation temperature on batch hydrogen production from organic solid wastes,32 bacteriocin production by Bacillus licheniformis in cheese whey,33 6-aminopenicillanic acid (6-APA) production using immobilized penicillin acylase,34 and carotenoid production by Rhodotorula glutinis.35 Examples from food and fuel15, 25, 44, 51 and gas phase production33, 50 bioprocessing can also be mentioned.

Table 4. Miscellaneous Design-of-Experiments Applications

Product | Optimization Purpose | Methodology | Ref.
Food and Fuels
Mannitol (by Lactobacillus) | Maximizing mannitol yield by optimizing the medium composition | Fractional FD and RSM | 15
Wine yeast S. bayanus var. uvarum | Improving the growth of wine yeast hybrid by optimizing the temperature and pH | Fractional FD | 44
Yeast growth (genetically engineered S. cerevisiae) | Interaction effects between the inhibitors in spent sulfite liquor on growth of a recombinant xylose-fermenting Saccharomyces cerevisiae | Fractional FD | 51
Probiotics of Bifidobacterium longum | Maximizing the growth of the bacteria by optimizing the cultivation parameters | Fractional FD and model | 25
Gas phase production
Hydrogen production (by meso- and thermophilic bacteria) | Maximizing the H2 productivity from waste by optimizing process parameters | Full FD and statistical analysis | 33
Hydrogen production (by Enterobacter cloacae) | Maximization of hydrogen yield by optimization of key process parameters in a hydrolyzate medium | Box-Behnken FD* and RSM | 50
Downstream Processing
Tobacco protein separation by aqueous two-phase extraction | Maximizing the yield of protein in the separation by optimizing separation conditions in the phases | Fractional FD and RSM | 49
Protein separation by chromatography | Optimizing separation conditions in the chromatographic column | Fractional FD and RSM | 46
CHO cell separation by acoustic device | Maximizing the performance of the separation unit | Fractional FD and RSM | 47
Chemoautotrophic microorganism | Optimizing culture conditions for CO2 fixation | Fractional FD and RSM | 31
CHO cell culture | Optimizing the medium composition from oxygen uptake | Fractional FD and RSM | 27
Biomass/exo-polymer (by G. frondosa) | Maximizing yield of biomass/product by optimizing the medium composition | Box-Behnken FD* and RSM | 17
CHO cell culture | Optimizing the medium composition of growth factors | Full FD and RSM | 52
Embryonic stem cells | Optimizing the differentiation method to enhance stem cell transformation | Fractional FD and statistical analysis | 48

* Box-Behnken design is a common variant of FD.

Many of the examples here illustrate protein production (see also Table 3), for example recombinant aminolevulinate synthase production in Escherichia coli,36 recombinant protein production in A. niger fermentation,37 effects of mixing and oxygen transfer on the production of Fab-antibody fragments in E. coli fermentation,38 and growth and hydrolase production by Tetrahymena thermophila.39

Often, optimizations of culture media and state variables are combined, for example in antifungal antibiotic production by Streptomyces sp.,40 astaxanthin production by Phaffia rhodozyma,41 lipase production by Burkholderia,42 and inulinase production by solid-state fermentation on bagasse.43 Additional conditions for the experimental design are shown in Table 5.

Table 5. Design Methodology for Optimization of Bioprocess State Variables

Product | Production organism | Design and Factors | Responses | Improvement | Ref.
State variable optimization
Aminolevulinate acid synthase | E. coli (recomb) | 2⁴ (4 factors; IPTG) | Synthase activity yield | 18-fold yield increase | 36
Inulinase | Kluyveromyces | 2² (medium; temp.) | Inulinase activity yield | Optima determined | 43
Lipase | Burkholderia | 2⁵⁻¹ (temp.; pH; 3 media comp.) | Lipase yield | 5-fold yield increase | 42
Astaxanthin | Phaffia rhodozyma | 2⁵ (pH; temp.; carbon; nitrogen; inoculum) | Astaxanthin yield | 92% yield increase | 41
Antifungal antibiotic | Streptomyces sp. | 3³ (glucose; soyabean; temp.) | Antifungal activity | Optimization of factors | 23
Hydrolase | Tetrahymena thermophila | 2⁷⁻³ (pH; light; DO; 4 medium comp.) | Hydrolase yield; growth | Identification of most important factors | 39
Carotenoid | Rhodotorula glutinis | 2⁵ (pH; 4 medium comp.) | Carotenoid yield, biomass | 2.5-fold increase | 35
Ethanol | P. tannophilus | 2² (pH; temp.) | Optical density | Optima determined | 29
Fab fragment | E. coli (recomb) | 3⁴ (pH; temp.; feed rate; time) | Fab fragment yield | Optima determined | 30
Yeast | S. bayanus | 3² (pH; temp.) | Growth rate, spec. rate | Optima determined | 44
Combined medium and state variable optimization
CO2 fixation | Chemoautotroph | 3⁴⁻¹ (H2; O2; CO2; pH) | Dry weight, CO2 fixation | Optima determined | 31
Hydrogen production | Anaerobic consortia | 2³ (inoculum; inducer; temp.) | H2 accumulation; lag time | 4-fold increase | 33
Fusion protein | A. niger (recomb) | 2⁴⁻¹ (agitation; CGlu; extract; DO) | Protein yield, protease inhibition | 5.5-fold increase | 37
Fab' fragment | E. coli | 2² (agitation; DO) | Fab' yield | 77% yield increase | 38
Bacteriocin | B. licheniformis | 2³ (pH; temp.; cheese whey) | Bacteriocin yield, biomass yield | Optima identified | 32
6-Aminopenicillanic acid | Penicillin acylase | 2³ (pH; temp.; substrate) | Enzyme hydrolysis | Optima determined | 32
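The design codes in Table 5 can be read as the number of levels raised to the number of factors, reduced by the degree of fractionation; a 2^(5-1) design, for example, runs five two-level factors as a half fraction. The Python sketch below counts the distinct runs implied by such codes and constructs one half fraction. The function names are hypothetical, and the generator (last factor set to the product of the others) is a common textbook choice, not necessarily the one used in the cited studies. Replicates and center points are not included.

```python
import itertools
import numpy as np

def n_runs(levels, factors, fraction=0):
    """Distinct runs in a levels^(factors - fraction) factorial design."""
    return levels ** (factors - fraction)

def half_fraction(k):
    """Two-level 2^(k-1) design in coded units (-1, +1).

    A full factorial is built in the first k-1 factors; the last factor is
    set to the product of the others (assumed generator).
    """
    base = np.array(list(itertools.product((-1, 1), repeat=k - 1)))
    last = base.prod(axis=1, keepdims=True)
    return np.hstack([base, last])

d = half_fraction(5)          # 16 runs for five factors instead of 32

counts = {
    "2^(5-1)": n_runs(2, 5, 1),
    "2^(7-3)": n_runs(2, 7, 3),
    "3^4": n_runs(3, 4),
    "3^(4-1)": n_runs(3, 4, 1),
}
```

The saving grows quickly with the number of factors: seven two-level factors would need 128 runs as a full factorial, but only 16 as a 2^(7-3) fraction, at the cost of confounding some interactions with each other.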

Optimizing process operation

The DoE applications described above are predominantly related to the identification of a particular operational state that should be maintained during processing. DoE also allows the identification of an optimal sequence of operational actions, such as when to inoculate, harvest, or induce/transfect a culture. The following example, taken from an insect cell culture-virus vector system, illustrates how a decisive time-related parameter can be determined.45

The Spodoptera frugiperda insect cell line Sf-9 was transfected by a baculovirus vector carrying the gene for the human glucocorticoid receptor protein. A fed-batch cultivation protocol for a lab-scale reactor was applied when culturing the insect cells; the objectives of the optimization were the time of transfection by the viral strain and the expression of the recombinant protein from the successfully transfected vector. The yield of the expressed receptor protein was optimized on the basis of four factors: the optical density value at infection, the multiplicity of infection, the temperature after transfection, and the time interval between transfection and harvest. The screening included a fractional factorial design with a total of 38 experiments at two and three levels (1, 2.0, and 2.8 cells/mL; 1, 5.5, and 10 pfu/cell; 36, 47, and 60 h intervals; and 25 and 27°C). Analysis by response surface modeling resulted in a twofold improvement of the yield (11.25 mg/L) of the glucocorticoid receptor. The temperature after transfection and the time interval between transfection and harvest could both be optimized, while the multiplicity of infection was shown to have only a minor influence.

Other Bioprocess-Related Applications

DoE methods can also be applied to bioprocessing operations other than cultivation. The operation of recycling devices, such as centrifugation units or other cell separation units where process parameters can easily be identified, is one area that is suitable for screening and optimization. The operation of chromatographic column separation units and stem cell differentiation are other up- and downstream procedures that can be improved.

Optimization of mammalian cell culture separation

An acoustic cell retention device (BioSep ADI 1015 AppliSens, Applikon BV, Schiedam, NL) was used to harvest a Chinese hamster ovary (CHO) cell culture producing a recombinant protein (Figure 7) as an alternative to a continuous centrifuge operation. The acoustic device separates the cell suspension by a combined action of gravimetric and oscillating forces. The use of this device allows continuous harvesting from a culture, as well as recycling or harvest of the retentate flow, whereas the filtered supernatant can be advanced to a subsequent processing step.

Figure 7.

Mammalian cell culture process with an acoustic device for separation of medium with protein product and cells recycled to the perfusion reactor.

Factorial design parameters are indicated (factors within dashed lines, responses within double lines).

The operational parameters affecting the separation yield were the flow rate in the recycling stream, the oscillation power exerted on the acoustic flow cell, and the density of the purged CHO cell suspension.46 The investigated factors included power consumption, harvest rate, and on- and off-time of the acoustic oscillation.

The response surface regression model was based on a full factorial design with CCC. Experiments were carried out in the growth and production phases of the CHO cell culture, and the separation efficiency in the harvest stream was measured at two or three factor levels. The factor analysis showed that all factors had a significant influence and that some also interacted. As could be expected, high power and low harvest rate resulted in better separation; the analysis also revealed an interaction between on- and off-time. The regression analysis placed the optima at a power of 6.6 W, an on-time of 6000 s, and an off-time of 18 s at a constant low harvest rate. The model showed good robustness; R² was 0.95 and Q² 0.72.

Downstream processing of expanded-bed separation

Expanded-bed adsorption (EBA) chromatography was applied to purify a crude homogenate from an E. coli culture producing the recombinant human enzyme porphobilinogen deaminase (rhPBGD). The unclarified homogenate was diluted and applied on an EBA column (Streamline 25, GE Healthcare AB, Uppsala, Sweden), and a phenyl hydrophobic interaction chromatography gel with a salt gradient was used to purify the target protein (Figure 8).

Figure 8.

Bioprocess using expanded-bed adsorption (EBA) chromatography for purification.

Factorial design parameters are indicated (factors within dashed lines, responses within double lines).

Björnsson (2001) used fractional factorial design to set up screening and optimization experiments to investigate this system.47 The factor variables were the pH of the elution buffer, the salt concentration, and the volumetric flow rate through the expanded column (Figure 8). The response variables, the recovery, purity, and optical density of the effluent stream of the product protein rhPBGD, were analyzed with conventional methods. The factorial design model was based on CCC, with 17 experimental runs of the EBA column. The screening revealed that pH and salt were important factors, and these were analyzed further by response surface plots. The maximum product recovery was found at pH 7.6 and a salt concentration of 0.9 mol/L; the R² was 0.87 and the Q² was 0.71. The optimized EBA column resulted in a 25–30% increase in recovery of rhPBGD, in comparison with the alternative processing method of cross-flow filtration followed by packed-bed hydrophobic interaction chromatography.
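Locating an optimum on a fitted two-factor response surface, as done here for pH and salt, amounts to finding the stationary point of the quadratic model. The following Python sketch shows the calculation in coded units; the coefficients are hypothetical and are not taken from the cited study, and the function name is introduced only for this example.

```python
import numpy as np

def stationary_point(b, B):
    """Stationary point of y = b0 + b.x + x.B.x with B symmetric.

    Setting the gradient b + 2 B x to zero gives x_s = solve(-2 B, b).
    The point is a maximum when all eigenvalues of B are negative.
    """
    b = np.asarray(b, dtype=float)
    B = np.asarray(B, dtype=float)
    x_s = np.linalg.solve(-2.0 * B, b)
    is_max = bool(np.all(np.linalg.eigvalsh(B) < 0))
    return x_s, is_max

# Hypothetical coefficients in coded units (factor 1, factor 2).
b = np.array([0.4, -0.2])                  # linear terms
B = np.array([[-1.0, 0.1],                 # diagonal: pure quadratic terms
              [0.1, -0.5]])                # off-diagonal: half the interaction term
x_s, is_max = stationary_point(b, B)
```

If the eigenvalues of B have mixed signs the stationary point is a saddle, and the practical optimum then lies on the boundary of the investigated region rather than at the computed point.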

Culture conditions for differentiation of embryonic stem cells

The development of new culture production systems for embryonic stem (ES) cells requires substantial work to find suitable production conditions. This is a particular concern in the pre-processing of time-consuming and labor-demanding in vitro differentiation procedures, where the complexities of factors that impact upon ES cell differentiation are profound. Thus, advanced factorial design can be a useful methodology for analyzing time-dependent factors during the differentiation process, as it has a good chance of being able to reveal unexpected interactions which would be missed by a conventional one-variable-at-a-time analysis.
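How a factorial design exposes an interaction that a one-variable-at-a-time analysis misses can be shown with a small numerical example. The Python sketch below uses a two-factor, two-level design with invented response values; the factor names A and B and the helper function are hypothetical.

```python
import numpy as np

# Hypothetical 2x2 factorial in coded units; responses are invented.
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
y = np.array([10.0, 12.0, 11.0, 21.0])

def effect(contrast, y):
    """Mean response at the +1 level minus mean response at the -1 level."""
    return y[contrast == 1].mean() - y[contrast == -1].mean()

main_A = effect(A, y)          # average effect of A over both levels of B
main_B = effect(B, y)          # average effect of B over both levels of A
inter_AB = effect(A * B, y)    # interaction: does the effect of A depend on B?

# One-variable-at-a-time: change A, then B, each from the (-1, -1) baseline.
ovat_A = y[1] - y[0]
ovat_B = y[2] - y[0]
ovat_prediction = y[0] + ovat_A + ovat_B   # additive guess for the (+1, +1) run
observed_high_high = y[3]
```

In this example the additive one-variable-at-a-time guess for the run with both factors high is 13, whereas the observed value is 21; the difference is carried entirely by the interaction term, which only the factorial layout can estimate.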

Chang and Zandstra (2004) have developed and validated such a technology for ES cell differentiation analysis.48 They used a quantitative screening platform based on automated fluorescence microscopy, which enumerated the ES cells that had entered endodermal differentiation through expression of two biomarkers (cytokeratin-8 and hepatocyte nuclear factor 3β). Using a two-level fractional factorial design model based on 32 triplicate experiments, they screened important medium components for the differentiation process to endodermal cells (glucose, insulin, basic fibroblast growth factor, retinoic acid, and epidermal growth factor) with the biomarkers and cell numbers as responses. The model was further refined using a subsequent three-level factorial experiment for two of the factors. A statistical regression model was used to identify major and interactive effects on the endoderm formation. Retinoic acid was found to have an inhibitory effect on endoderm formation, while low glucose levels were beneficial. DoE proved to be a powerful tool for studying the factors impacting endoderm-specific ES cell differentiation; but it does require a relevant and sufficiently sensitive technique for the analysis of responses.

This need may be met to a substantial degree by new metabolomics and other post-genomics tools such as gene expression microarrays53, 54 and MALDI-TOF mass spectrometry.55 With these tools key response variables can, if required, be provided using high-throughput protocols.56 Combination with powerful data mining methods would further enhance these possibilities.


This review has presented a collection of recent applications of DoE methods in bioprocess-related biotechnology (Tables 1–5). From these, it is evident that the focus, so far, has predominantly been on media optimization for antibiotic, enzyme, and recombinant protein production processes, although products such as ethanol, hydrogen, and primary metabolites are also amply dealt with.

In some cases, this has been combined with other process operation parameters such as temperature, pH, and time for induction of protein expression. The DoE methodology applied follows more or less the same basic approach, with fractional or full factorial design followed by response surface modeling to identify optima of product yield and productivity.

The predominance of cases involving medium and state variable optimization is probably due to the fact that the theory and methodology of factorial design are relatively easy to grasp when factors and responses are well-structured and restricted to two well-defined states, input and output.

The applications surveyed here illustrate that the existing DoE methodology is not fully exploited. A more advanced use of factorial design and statistical analysis would be possible but requires a deeper insight into the flexibility of DoE, in combination with additional consideration and analysis of data. This would open up many more applications and possibilities for fulfilling more intricate optimization needs. A few of the examples quoted in the present review point to this possibility, for example recombinant protein induction, vector infection, harvest time, and embryonic stem cell differentiation, where, as shown in Figure 1, the factors are distributed at different places and times over the bioprocess analyzed. It would also be of value to extend the models to include more of the intracellular processes of the biological systems of the bioprocess.

In particular, in the light of the incentives expressed in the PAT initiative,4 the following would be of value:

  • to investigate the responses of diverse factors in a manufacturing process that are distributed over the process sequence;

  • to investigate how process control factors are interdependent with product quality attributes; and

  • to investigate in a more systematic manner the factors that affect intrinsic responses in the cells/product protein of the bioprocess, such as glycosylation pattern and other modifications. Here, data mining techniques may provide a useful resource of analytical evaluation methods.

As the PAT initiative emphasizes, the use of DoE to exploit new optimization possibilities in biotechnology can be a very useful resource for bioprocess development.


The authors wish to thank Peter Björnsson, Johan Larsson, Jenny Sävenhed and Ola Tuvesson for valuable contributions and Umetrics AB and Belach AB for providing access to materials used in the figures.