This review surveys recent applications of design-of-experiments (DoE) methodology in the development of biotechnological processes. DoE methods such as factorial design and response surface methodology provide powerful and efficient ways to optimize cultivations and other unit operations and procedures with a reduced number of experiments. The many interdependent parameters within a unit operation, or between units in a bioprocess sequence, can be substantially refined and improved with such methods. Other bioprocess-related applications include strain screening and the balancing of cultivation media. In view of the emerging regulatory demands on pharmaceutical manufacturing processes, exemplified by the process analytical technology (PAT) initiative of the United States Food and Drug Administration, experimental design approaches that improve process development for safer and more reproducible production are becoming increasingly important. Here, these options are highlighted and discussed with a few selected examples from antibiotic fermentation, expanded bed optimization, virus vector transfection of insect cell cultures, feed profile adaptation, embryonic stem cell expansion protocols, and mammalian cell harvesting.
Statistical experimental planning, factorial design, and design-of-experiments (DoE) are more or less synonymous concepts for investigating the mathematical relationships between the input and output variables of a system. Although the fundamentals of the methodology have been known since the early 1900s,1–3 only in recent years has it been widely applied in biotechnology.
In its basic outline, the DoE methodology applied to bioprocesses is uncomplicated (Figure 1). Defined input factors are supplied to a converting biosystem, which generates mostly common and well-defined outputs, or responses, such as product yield and productivity. Applied in this way, DoE resembles mass balancing. The strength of DoE, however, is that it also reveals how interactions between the input factors influence the responses. Such interactions are often difficult to discover and interpret with other methods.
Biotechnology processes can be considered as a transformation of nutrients and other medium components into biomass and/or molecular products. The utilization of the carbon and nitrogen sources is relatively easy to account for through overall stoichiometric balances. A further degree of complexity is added by metabolic shifts caused by activating or deactivating medium components such as growth factors, vitamins, and microbial stressors. Figure 1a illustrates the situation where all inputs exert their combined effects as an initial load on the transforming bioprocess in an apparently steady-state condition. In reality, however, this steady state is unstable, since the nutrient factors are degraded and depleted.
Anyone with experience in bioprocessing is well aware that actions and their effects are displaced over time during a cultivation. Some factors are set at the start of the process, while others are supplied later, e.g. by feeding, to regulate the process or to establish set profiles (Figure 1b).
In most bioprocesses, products are recovered downstream, where additional factors, e.g. eluents, are supplied sequentially to purify the product. Here, too, input factors are distributed over time, and the influence of upstream procedures follows the same pattern (Figure 1c). Thus, any attempt to encompass all factors in a bioprocess leads to an unmanageable number of interacting and noninteracting factors, which may or may not affect the specific outputs.
Process analytical technology (PAT), as defined by the Food and Drug Administration,4 aims in particular to mobilize mathematical modeling methods to deepen the understanding of the manufacturing (bio)process in the pharmaceutical field, with the long-term goal of adjusting manufacture online and eliminating delays in product release. Although not covered by the FDA's definition, several other bioprocess applications in pharmaceuticals may also benefit from advances in PAT. In fact, all biotechnology manufacture may benefit from DoE, whether the products are diagnostics, foodstuffs, commodities, or fine chemicals.
Previous reviews of statistical methods have covered the period up to the 1990s (see e.g. Refs. 5,6); this review focuses mainly on the period thereafter. As before, most of the published literature concerns the development and optimization of cultivation and production media, where the most effective medium components are screened and attempts are made to optimize their levels. Sometimes, media factors are combined with operational process parameters such as temperature, agitation, and aeration. New bioprocess-related technologies, for example stem cell differentiation and the expanded bed technique, are occasionally addressed. However, few efforts have been made to combine factors displaced in time at different stages of the bioprocess sequence, as illustrated in Figure 1. Here, the mainstream of bioprocess applications is covered, and a few pertinent cases are discussed in more detail to illustrate the possibilities for continued use and further refinement of the methodology.
Factorial Design and Response Surface Methodology
When planning a set of factorial experiments to evaluate important experimental variables in a bioprocess, or to optimize them in a biotechnical system, DoE is a powerful methodology that avoids experimental bias and reduces the number of experiments required.7–10 For optimization, DoE is clearly preferable to methods that vary one variable at a time, since a single-variable approach ignores interactions between the variables and is therefore likely to leave the experimenter at a quasi-optimum (Figure 2a).
With a factorial design, a reliable result can be achieved with only a few experiments, after which the most favorable direction toward the true optimum can be evaluated (Figure 2b). The graphs in Figure 2 illustrate how easily one can be trapped in a quasi-optimum by varying one variable at a time, even in the two-variable case. With variable X1 held constant, five experiments are performed while varying X2. From the optimum obtained, X1 is then varied in another four experiments. The true optimum is never reached, because the two variables depend on each other. By varying X1 and X2 simultaneously and analyzing the result with experimental design software, the direction of the true optimum can be found.
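The one-variable-at-a-time trap can be reproduced numerically. The sketch below uses a hypothetical response with a diagonal ridge (an X1·X2 interaction), chosen purely for illustration rather than taken from Figure 2, and mimics the procedure described above: five runs varying X2 at fixed X1, then four more runs varying X1 at the best X2.

```python
# Hypothetical response surface with a diagonal ridge: because of the x1*x2
# interaction, the best x2 depends on x1. The true maximum is 0 at (2, 2).
def response(x1, x2):
    return -((x1 - x2) ** 2) - 0.1 * (x1 + x2 - 4) ** 2


def one_factor_at_a_time(x1_start=0):
    # Step 1: hold X1 constant and vary X2 in five runs.
    x2_best = max([-2, -1, 0, 1, 2], key=lambda x2: response(x1_start, x2))
    # Step 2: hold X2 at that value and vary X1 in four further runs
    # (the starting X1 is kept as a candidate because it was already run).
    x1_candidates = [x1_start, -1, 1, 2, 3]
    x1_best = max(x1_candidates, key=lambda x1: response(x1, x2_best))
    return x1_best, x2_best


def grid_optimum(levels=range(-2, 4)):
    # Search both variables simultaneously, for comparison.
    return max(((a, b) for a in levels for b in levels), key=lambda p: response(*p))


ofat = one_factor_at_a_time()
best = grid_optimum()
print("One factor at a time ends at", ofat, "with response", response(*ofat))
print("Simultaneous search finds", best, "with response", response(*best))
```

The single-variable search stops at (0, 0), well below the ridge top, because each step optimizes one variable under the constraint of the other's arbitrary starting value.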
Screening of factors
As an initial step in a DoE plan, the most efficient approach is to screen for the important variables of the studied bioprocess with a set of experiments located at selected corners of the experimental space. This procedure is called fractional factorial design, since only a fraction of the corners of the full factorial design is investigated. An example of a screening with three variables is shown in Figure 3a. The three selected factors are screened at two levels, and each factor is tested twice at its low level and twice at its high level, as the figure depicts.
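A minimal sketch of a three-factor, two-level screening design of this kind, in coded units (−1 = low, +1 = high). It assumes the common half-fraction generator C = AB; Figure 3a may show the other half fraction (C = −AB), which is equally valid. In this four-run design every main effect is aliased with a two-factor interaction (A with BC, and so on), which is acceptable for screening but not for estimating interactions.

```python
from itertools import product


def half_fraction_3_factors():
    """2^(3-1) design: full 2^2 in A and B, with C generated as C = A*B."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


design = half_fraction_3_factors()
for run in design:
    print(run)

# Each factor appears twice at its low and twice at its high level,
# and the three columns are mutually orthogonal.
columns = list(zip(*design))
```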
This step of the DoE plan is often easy to set up in a bioprocess system, since established cultivation or separation procedures already have well-defined factors, for example, the concentrations of medium components and the set points of state variables in the fermenter. However, the selection of variables and their levels relies entirely on the experimenter's judgment and prior knowledge of the studied process. Inappropriate choices will limit the usefulness of the results and make it necessary to carry out new experiments with other variables and levels.
The reduced set of experiments can be described mathematically as $2^{n-k}$ runs, where $n$ is the number of factors investigated at the low and high levels and $k$ is the number of steps by which the design is reduced, each step halving the number of runs. If, for example, five variables are involved, the design will comprise either 16 ($2^{5-1}$) or 8 ($2^{5-2}$) experiments, depending on the number of reduction steps. The first of these designs is preferred, since its linear terms are not confounded with two-factor interactions (see Evaluation of the experimental design, below). In current practice, however, the degree of reduction is seldom chosen systematically in this way; instead, a level of reduction that seems appropriate is simply selected. The screening is further improved by replicates at the center point of the experimental domain, which serve the dual purpose of collecting additional response values and determining the experimental error for the same response. Reducing the number of runs decreases the statistical quality of the design and should therefore be applied with caution. The effects become apparent in the goodness-of-prediction and goodness-of-fit parameters (see below).
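The difference between the two five-factor designs can be checked directly from their columns. The sketch below builds a $2^{5-1}$ design with the generator E = ABCD and a $2^{5-2}$ design with D = AB and E = AC (standard textbook generators, assumed here; a software package may choose others) and lists every main effect whose column coincides, up to sign, with a two-factor interaction column.

```python
from itertools import combinations, product


def half_fraction_5():
    """2^(5-1): full 2^4 in A-D, fifth factor from the generator E = ABCD."""
    return [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]


def quarter_fraction_5():
    """2^(5-2): full 2^3 in A-C, generators D = AB and E = AC."""
    return [(a, b, c, a * b, a * c) for a, b, c in product((-1, 1), repeat=3)]


def main_effects_aliased_with_2fi(runs, names="ABCDE"):
    """List (main effect, interaction) pairs whose columns match up to sign."""
    cols = list(zip(*runs))
    aliased = []
    for i, name in enumerate(names):
        for j, k in combinations(range(len(names)), 2):
            if i in (j, k):
                continue
            interaction = [x * y for x, y in zip(cols[j], cols[k])]
            dot = sum(x * y for x, y in zip(cols[i], interaction))
            if abs(dot) == len(runs):  # identical or sign-reversed column
                aliased.append((name, names[j] + names[k]))
    return aliased


print(len(half_fraction_5()), main_effects_aliased_with_2fi(half_fraction_5()))
print(len(quarter_fraction_5()), main_effects_aliased_with_2fi(quarter_fraction_5()))
```

The 16-run design returns an empty list, while the 8-run design aliases every main effect with at least one two-factor interaction (for example, D with AB), which is why the larger design is preferred when interactions may matter.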
The complete set of experiments should be performed in random order to avoid systematic error. Systematic error can arise when, for example, the three center-point experiments of a design are performed one after another using the same experimental procedure. Randomization also blinds the experimenter to any expected good or bad results.
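A minimal sketch of run-order randomization, assuming the design is held as a list of factor settings in standard order. Using a dedicated, seeded generator makes the randomized schedule reproducible for documentation, and the center-point runs are in general scattered through the schedule instead of being run back to back.

```python
import random
from itertools import product


def randomized_run_order(runs, seed):
    """Return (standard-order number, settings) pairs in a random execution order."""
    order = list(enumerate(runs, start=1))
    random.Random(seed).shuffle(order)  # dedicated generator; global state untouched
    return order


# Hypothetical 2^2 factorial in coded units plus three center points.
design = [list(p) for p in product((-1, 1), repeat=2)] + [[0, 0] for _ in range(3)]
for std_no, settings in randomized_run_order(design, seed=2024):
    print(std_no, settings)
```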
The typical result of a bioprocess screening experiment is that a subset of a few important factors is identified. These factors can then be used in a new experimental design aimed at determining their optimal values. This typically calls for a central composite face-centered (CCF) or a central composite circumscribed (CCC) design. Figure 3b illustrates the CCF and CCC designs for three variables. In the CCF design, additional (axial) points are placed at the centers of the faces of the experimental space, between its corners. In the CCC design, these points are displaced outside the space, at the same distance from the center point as the corners. Theoretically, the CCC design is somewhat better than the CCF design, since it covers a larger volume. Moreover, the CCC design has five levels of each factor and can therefore better capture strong curvature and even cubic response behavior.
Starting from three factors, a set of 14 new experiments (8 corner points and 6 axial points) is generated; adding three experiments at the center point gives 17 experiments in total. The purpose of these 14 experiments is to demarcate a space containing the optimum for the three variables; with the selected geometry of factor levels, there is enough information to identify optimal conditions within that space. The result is depicted as a contour plot or response surface on which the optimum is clearly visualized. This final step gives the whole procedure its name: response surface methodology.
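The run counts above can be reproduced with a short generator of central composite designs in coded units. It is a sketch: the axial distance alpha = 1 gives the CCF design, and alpha = √k places the axial points at the same distance from the center as the corners, as the CCC is described here (rotatable designs often use alpha = (2^k)^(1/4) instead).

```python
from itertools import product
from math import sqrt


def central_composite(k, alpha=1.0, n_center=3):
    """Corner, axial (star) and center points of a central composite design.

    alpha = 1 gives the face-centered (CCF) design; alpha > 1 places the axial
    points outside the cube, as in the circumscribed (CCC) design.
    """
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    centers = [[0.0] * k for _ in range(n_center)]
    return corners + axial + centers


ccf = central_composite(3)                  # 8 corners + 6 axial + 3 center = 17 runs
ccc = central_composite(3, alpha=sqrt(3))   # same run count, axial points outside the cube
print(len(ccf), "runs; CCF levels:", sorted({p[0] for p in ccf}))
print("CCC levels:", sorted({round(p[0], 3) for p in ccc}))
```

The CCF uses three levels per factor and the CCC five, which is the property that lets the CCC capture stronger curvature.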
The response surface does not reveal whether an optimum is local or global. However, by carrying out the evaluation with multiple simplex optimizers started from different points, the risk of ending up in a local optimum is significantly reduced.
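To illustrate why several simplex searches help, the sketch below implements a basic Nelder-Mead simplex minimizer (an assumption: the optimizers built into commercial DoE packages are not documented here and may differ) and starts it from several points on a hypothetical response with a local optimum near (1, 1) and the global one near (−1, −1). A single start at (1.5, 1.5) stops at the local optimum; taking the best of several starts recovers the global one.

```python
import math


def nelder_mead(f, x0, step=0.5, ftol=1e-10, xtol=1e-6, max_iter=1000):
    """Minimize f from x0 with a basic Nelder-Mead simplex search."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        size = max(abs(a - b) for v in simplex[1:] for a, b in zip(v, best))
        if f(worst) - f(best) < ftol and size < xtol:
            break
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex toward the best one
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)


def response(x):
    """Hypothetical surface: local maximum near (1, 1), global near (-1, -1)."""
    x1, x2 = x
    return (math.exp(-((x1 - 1) ** 2 + (x2 - 1) ** 2))
            + 2 * math.exp(-((x1 + 1) ** 2 + (x2 + 1) ** 2)))


def multistart_maximum(func, starts):
    """Run one simplex search (on -func) per start and keep the best result."""
    results = [nelder_mead(lambda x: -func(x), s) for s in starts]
    return max(results, key=func)


single = nelder_mead(lambda x: -response(x), (1.5, 1.5))
best = multistart_maximum(response, [(1.5, 1.5), (-1.5, -1.5), (1.5, -1.5), (-1.5, 1.5)])
print("single start:", [round(v, 3) for v in single])
print("multistart:  ", [round(v, 3) for v in best])
```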
Commercially available software is usually used to plan the experimental designs. Frequently used packages are Modde™ (Umetrics AB, Umeå, Sweden; www.umetrics.com), Minitab™ (Minitab Inc., State College, PA), and Design-Expert™ (www.statease.com), all of which are convenient for applying DoE procedures. Most of the examples cited in this review use one of these packages, although other software can also be used, or custom routines can be written in, for example, MATLAB or Excel.
Qualitative factor variables
It is important to remember that both quantitative and qualitative variables can be included in the factor set, as can multilevel quantitative and qualitative factors. For example, for two qualitative factors, X1 (with four discrete levels) and X3 (with three discrete levels), and one quantitative factor, X2, a design can be generated as shown in Figure 4. Here, no center points can be created between the discrete levels of factors X1 and X3. However, when qualitative and quantitative factors appear together in a design, as in Figure 4, a center point can be placed at one of the discrete levels of the qualitative factors and at an intermediate level of the quantitative factor. A typical bioprocessing example is the comparison of two different impeller turbines at varying temperatures and oxygen transfer rates.
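The sketch below enumerates a full candidate set for the mixed case above and adds center points in the way just described. The level labels and the chosen center levels are hypothetical; only the numbers of levels follow the text. DoE software often selects a subset of such a candidate set (for example, by a D-optimal criterion) rather than running all 24 combinations.

```python
from itertools import product

# Hypothetical level labels; only the numbers of levels follow the text.
X1_LEVELS = ["A", "B", "C", "D"]   # qualitative, four discrete levels
X3_LEVELS = ["P", "Q", "R"]        # qualitative, three discrete levels
X2_LEVELS = (-1.0, 1.0)            # quantitative, coded low/high


def mixed_design(center_x1="A", center_x3="P", n_center=3):
    """Full candidate set of (X1, X2, X3) runs plus center points.

    No midpoint exists between discrete levels, so the center points fix X1
    and X3 at one chosen level and put X2 at its intermediate level (0).
    """
    runs = [(x1, x2, x3) for x1, x3, x2 in product(X1_LEVELS, X3_LEVELS, X2_LEVELS)]
    runs += [(center_x1, 0.0, center_x3)] * n_center
    return runs


design = mixed_design()
print(len(design), "runs")  # 4 * 3 * 2 = 24 candidates + 3 center points
```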
Uncontrolled variables, i.e. factors that are recorded but not controlled during the experiment (for example, the process temperature), can also be included in the design alongside the controlled factors, since the recorded data can be used to evaluate their influence on the experimental results. Such variables are typical of bioprocess experimental conditions, for example dissolved oxygen tension and complex hydrolysate media, and may require special attention.
Design of media and mixtures
Of particular interest for bioprocesses is the optimization of media and mixture composition. A mixture (media) design can easily be set up in DoE for, for example, culture media and galenic formulations. Here it is convenient to assign a suitable interval to the percentage of each component, with 0–100% expressed as an interval from 0 to 1. With three components, the design is a symmetric triangle; with four components, it appears as a symmetric tetrahedron.
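Mixture points can be enumerated as a simplex-lattice design, a standard mixture design used here as one possible choice (a DoE package may generate a different set). Every blend has proportions that are multiples of 1/m and sum to 1; optional lower and upper bounds per component (hypothetical values below) restrict the blends to the intervals chosen by the experimenter. For three components the points lie on the triangle described above; for four, on the tetrahedron.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(q, m, bounds=None):
    """{q, m} simplex-lattice: blends of q components in steps of 1/m summing to 1.

    bounds: optional list of (low, high) fractions, one pair per component.
    """
    points = []
    for combo in product(range(m + 1), repeat=q):
        if sum(combo) != m:
            continue
        blend = tuple(Fraction(k, m) for k in combo)
        if bounds and any(not lo <= x <= hi for x, (lo, hi) in zip(blend, bounds)):
            continue
        points.append(blend)
    return points


triangle = simplex_lattice(3, 2)      # 3 pure components + 3 binary 50:50 blends
tetrahedron = simplex_lattice(4, 1)   # the 4 vertices (pure components)
constrained = simplex_lattice(3, 4, bounds=[(0, 1), (Fraction(1, 4), 1), (0, Fraction(1, 2))])
for blend in triangle:
    print([str(x) for x in blend])
```

Exact fractions avoid rounding errors in the sum-to-one constraint.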
If one component dominates the total desired volume or amount of a medium (e.g. a solvent such as a buffer solution, or the albumin component of a culture medium), the effect of this component can also be evaluated. Such a component is often referred to as the filler, and a medium contains only one filler.
For the reasons presented above, a media/mixture design of this kind should preferably start with a screening of the experimental domain, if the domain is unknown, before more advanced designs are applied for optimization.
Evaluation of the experimental design
When the experiments have been carried out according to the experimental design and the response data have been compiled, the results are evaluated by multiple linear regression (MLR). Part of the purpose is to check whether the experimental space was orthogonally designed and whether it remains orthogonal after the experiments have been performed.
In MLR, the mathematical model describes the relationship between one or more independent variables and a response variable as $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$, for $i = 1, 2, \dots, n$. In addition, more complex terms describing interaction (co-variation) between linear or quadratic terms can be included. However, each interaction term or other complex term must be assessed to see whether its contribution to the model is acceptable compared with the uncertainty of that contribution (signal-to-noise ratio).
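A minimal MLR sketch with NumPy, assuming a coded two-factor factorial with three center points and a synthetic, noise-free response with hypothetical coefficients. The model matrix holds an intercept, the two linear terms, and their interaction; because the design is orthogonal, $X^T X$ is diagonal and each coefficient is estimated independently of the others.

```python
import numpy as np

# Hypothetical coded design: 2^2 factorial plus three center points.
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)


def model_matrix(x1, x2):
    """Columns: intercept, linear terms x1 and x2, interaction x1*x2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


def fit_mlr(X, y):
    """Least-squares estimates of the regression coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


X = model_matrix(x1, x2)
y = 10 + 2 * x1 - 1 * x2 + 1.5 * x1 * x2   # synthetic, noise-free response
coef = fit_mlr(X, y)
print("coefficients:", np.round(coef, 3))  # recovers 10, 2, -1, 1.5

# Orthogonality check: the off-diagonal elements of X^T X are zero.
xtx = X.T @ X
print("orthogonal design:", np.allclose(xtx, np.diag(np.diag(xtx))))
```

Quadratic terms cannot be added for this particular design, because the x1² and x2² columns would be identical; estimating curvature separately for each factor requires three or more levels per factor, as in the central composite designs.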
When evaluating the contribution of each coefficient and subsequently refining the model, the aim is to make the residual term $\varepsilon_i$ as small as possible. When characterizing a response surface, the interaction (co-variation) terms describe its skewness, while the quadratic terms describe its curvature.
In a complete (full factorial) design, the number of degrees of freedom usually allows terms describing interaction and curvature to be included in the MLR model. When fractional factorial designs are evaluated by MLR, it is common practice to supplement the linear terms with a limited number of complex terms. In a fractional factorial design, however, such terms are often confounded with each other.
When, for example, the design is not orthogonal, partial least squares regression (PLS) is a better alternative. PLS can also be used when the data set contains several correlated responses, and it provides a more robust evaluation that can be applied even when a limited amount of data is missing from the response matrix.
The model evaluations in the bioprocess applications cited in the following are all examples of these analytical procedures.
Normally, the model is validated with two diagnostic statistics. The first, the R2 value, is the fraction of the variation in the responses that is explained by the model. It describes the goodness of fit, that is, how well the current runs can be reproduced by the mathematical model.
The second, the Q2 value, is the fraction of the variation in the responses that the model predicts according to cross-validation. It describes the goodness of prediction, that is, how well new experiments can be predicted by the mathematical model.
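The two statistics can be computed as sketched below with NumPy: R2 = 1 − SS_res/SS_tot from the fitted model, and Q2 = 1 − PRESS/SS_tot, where PRESS sums the squared errors made when each run is predicted by a model fitted without it (leave-one-out cross-validation). Leave-one-out is an assumption here; DoE software may use other cross-validation groupings. The response values are invented for illustration.

```python
import numpy as np


def r2_q2(X, y):
    """R2 from the full fit; Q2 from leave-one-out cross-validation (PRESS)."""
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ coef) ** 2)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # leave run i out
        coef_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ coef_i) ** 2   # error predicting the left-out run
    return 1 - ss_res / ss_tot, 1 - press / ss_tot


# Hypothetical data: coded 2^2 factorial + three center points, linear model.
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)
X = np.column_stack([np.ones_like(x1), x1, x2])
y = [8.0, 12.1, 6.9, 11.2, 9.8, 10.1, 9.6]

r2, q2 = r2_q2(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Because each left-out run is predicted by a model that never saw it, PRESS is at least as large as the residual sum of squares, so Q2 never exceeds R2 in this scheme.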
In literature reports, including those cited below, R2 and Q2 are sometimes not reported, which unfortunately makes it harder to fully appreciate the results. Typical values indicating good models are R2 > 0.75 and Q2 > 0.60; values below 0.25 would normally be considered useless.
As described above, a case with five factors requires either 16 ($2^{5-1}$) or 8 ($2^{5-2}$) experiments. The latter design is mainly used in robustness testing, where the sensitivity of a response to factor variation is tested with a reduced set of experiments around an optimum. The factors are usually varied in very small steps around a specific region of interest, such as the preferred setting of an industrial process, in order to evaluate process stability.
The aim of robustness testing is to identify responses which are robust, or alternatively, sensitive to small factor changes, and to find factors which need to be controlled to achieve robustness.
A linear model is usually used in robustness testing. Note that a low Q2 for a response is a good indicator of robustness: if small factor changes hardly affect the response, there is little systematic variation for the model to predict.
Bioprocess Cultivation Applications
In the following, a number of applications of DoE to bioprocesses are reviewed. These examples illustrate the potential of the DoE methodology in bioprocessing, in particular for new bioprocesses but also for retrofitting existing ones. Some examples are described in more detail to demonstrate the practice, while others are mentioned only briefly to show the versatility of DoE in bioprocess applications.
Optimization of culture media
So far, most applications of DoE have concerned optimization of the composition of growth and production culture media. Many examples can be found for production of antibiotics (see Table 1). The typical objective is to identify a better selection and quantitative composition of medium factors.
Table 1. Design-of-Experiments Applications for Production of Secondary Metabolites
An illustrative example is the production of the antibiotic clavulanic acid by the actinomycete Streptomyces clavuligerus. Wang and coworkers (2004) optimized the medium composition by first screening a variety of medium ingredients with a two-level fractional factorial design and then optimizing their levels by response surface methodology.11 The fractional factorial experiments identified soy meal powder, FeSO4·7H2O, and ornithine as the most influential medium factors. In the subsequent response surface analysis using a CCC design, these three factors had significant effects on the clavulanic acid yield, and the optimal concentrations were determined as 38.10 g/L for soy meal powder, 0.395 g/L for FeSO4·7H2O, and 1.177 g/L for ornithine. The correlation coefficient was 0.98 and the coefficient of variation 6.6%. Running a 72 h cultivation with these settings increased the product yield by 50%.
Other successful examples of DoE-based optimization of media composition for antibiotic production include neomycin production by S. marinensis in solid-state fermentation,12 nisin production by Lactococcus lactis,13 and meilingmycin production by S. nanchangensis,14 where fractional factorial design combined with response surface methodology was applied much as in the clavulanic acid example above (for details of the design conditions used, see Table 2).
Table 2. Conditions and Outcome of Optimization of Culture Media
Media optimization has also been reported for other primary and secondary metabolite production processes, including mannitol fermentation by Lactobacillus intermedius,15 carotenoid production by Pediococcus pentosaceus and L. acidophilus in semisolid maize-based cultures,16 mycelial biomass and exo-polymer production by Grifola frondosa,17 and γ-polyglutamic acid fermentation by Bacillus sp.18, 19 (Tables 1 and 2).
Another prominent area of application is production processes for enzymes and other proteins (Table 3), for example, the effect of medium composition on hen's egg white lysozyme production by Aspergillus niger.20 The influence of the ingredient factors starch, peptone, ammonium sulfate, yeast extract, and CaCl2·2H2O was screened in a $2^{5-1}$ fractional factorial design with center points. The lysozyme response showed that peptone, starch, and ammonium sulfate were the most influential factors, while the other factors had little effect at the levels tested. In this case, optimization was accomplished by a steepest ascent procedure followed by response surface modeling with a central composite design (CCD). The medium composition determined as optimal for lysozyme production was 34 g/L each of starch and peptone, 11.9 g/L ammonium sulfate, 0.5 g/L yeast extract, and 0.5 g/L CaCl2·2H2O. Using this medium in a 7-day fermentation gave a lysozyme yield of 209 ± 18 mg/L, close to the theoretically estimated yield of 212 mg/L.
Table 3. Design-of-Experiments Applications for Production of Enzymes and Other Proteins
used an EVOP (evolutionary operation) (*) and a Plackett-Burman design (†), which are common alternative experimental design methods.
Several other examples of optimizing production media for proteins along the lines outlined above can be mentioned (Table 3), such as penicillin acylase production by S. lavendulae,21 fungal peroxidase production by Coprinus species,22 production of chitinolytic enzymes by Serratia marcescens QMB1466,23 and production of cellulose-degrading enzymes by A. oryzae.24
In most of the media optimization applications, the aim of the designed experiments is the final product yield or concentration; that is, the intention is to find the mix of nutrient factors that maximizes cellular productivity by supplying a well-balanced nutrient composition that promotes a high yield of the product molecule. Thus, the principle of Figure 1 above is the basis for the DoE.
So far, few attempts have been made to target a specific response related to the efficiency of the cellular machinery. One such example is a study in which the glycosylation pattern of the protein was the response.26 The degree of glycosylation of three glycoforms of the cystatin C glycoprotein was studied in a full factorial design experiment with three different nitrogen media in recombinant Pichia pastoris cultures. Maximal glycosylation was reached at the expense of productivity.
The overview in Table 2 compares the methodology applied in these recent applications of media optimization. As the table illustrates, most researchers have applied quite similar DoE approaches.
An inventive step toward accelerating media optimization by DoE was taken by Deshpande et al. (2004).27 They used a microtiter plate with online measurement of dissolved oxygen to optimize the cultivation of Chinese hamster ovary cells in a culture medium with selected factors. Using a dynamic liquid-phase mass balance, oxygen uptake rates were calculated from the dissolved oxygen level and used as an indicator of cell viability. With a full factorial design with CCF, the optimal medium concentrations of glucose, glutamine, and inorganic salts could be identified in a single microtiter plate experiment. The concentration of inorganic salts was found to have the most significant influence on the cultivation. The method appears to have good potential for optimizing cell culture media.
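The dynamic liquid-phase oxygen balance can be written as $dC/dt = k_La(C^* - C) - \mathrm{OUR}$, so that $\mathrm{OUR} = k_La(C^* - C) - dC/dt$. The sketch below applies this to a sampled dissolved oxygen trace using central differences. It assumes that $k_La$ and the saturation concentration $C^*$ are known and constant; the exact procedure of Deshpande et al. may differ, and all numbers are synthetic.

```python
import math


def oxygen_uptake_rate(t, c, kla, c_sat):
    """OUR from dC/dt = kLa*(C* - C) - OUR, using central differences.

    Returns values for the interior sampling points only.
    """
    our = []
    for i in range(1, len(t) - 1):
        dcdt = (c[i + 1] - c[i - 1]) / (t[i + 1] - t[i - 1])
        our.append(kla * (c_sat - c[i]) - dcdt)
    return our


# Synthetic DO trace for a well with constant OUR = 1.0 mmol/(L h)
# (hypothetical values: kLa = 10 1/h, C* = 0.2 mmol/L). DO decays from C*
# toward the steady state C* - OUR/kLa = 0.1 mmol/L.
kla, c_sat, true_our = 10.0, 0.2, 1.0
c_ss = c_sat - true_our / kla
t = [0.01 * i for i in range(51)]  # 0 to 0.5 h
c = [c_ss + (c_sat - c_ss) * math.exp(-kla * ti) for ti in t]

estimates = oxygen_uptake_rate(t, c, kla, c_sat)
print(f"OUR estimates range {min(estimates):.4f} to {max(estimates):.4f}")
```

The transient term dC/dt matters early in the trace; dropping it and using only kLa(C* − C) would underestimate OUR until the DO level settles.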
Recently, elaborate DoE protocols for 96-well plates have been developed.28 Applying these rational protocols makes optimization more efficient and less time-consuming. The technique can also be applied to larger formats such as 384-well and 1536-well plates.28
Optimization of process operation
Optimization of operational procedures in bioprocessing can concern either state variables, which are held at a predetermined set point or made to follow a preset trajectory, or discrete sequential actions. Typical examples are control variables such as temperature, pH, and dissolved oxygen, feed rate profiles in fed-batch reactors, and gradient and washing procedures in chromatography columns.
Normally, these variables are controlled by feedback or feedforward, adjusting or adding a specific factor to keep the state at the preset optimal level. This contrasts with most medium component concentrations, which decrease over time until depletion halts growth and production. Moreover, the objective of the DoE is often the optimal combination of several such state variables at favorable nutrient levels.
Optimization of basic process state variables
Roebuck et al. (1995) applied fractional factorial design and response surface modeling to determine the optimal temperature and pH for the growth of the yeast Pachysolen tannophilus.29 This yeast strain has evoked considerable interest as a large-scale ethanol and xylitol producer due to its ability to ferment a wide range of carbohydrates, which makes it suitable for the conversion of lignocellulosic wastes.
The authors carried out a small-scale batch study in aerated shake flasks using a standard cultivation medium. On the basis of literature data on the variability of the yeast, they chose to screen wide ranges of temperature (30–40°C) and pH (2–6). Using the Modde™ software package, a two-factor design was set up with three replicates at the center point (35°C, pH 4.0) and the other points in duplicate, 19 experiments in total (Figure 5a). Batch flask experiments were run for 13 h under controlled conditions, and the optical density of the culture was measured at 600 nm. A specific growth rate of 0.25 h−1 was typically reached in the exponential phase. The data from the 19 experiments were treated by regression analysis to fit the response function, and the coefficients were calculated with 95% confidence intervals at the levels of the experiments. The resulting response surface is shown in Figure 5b. Maximum optical density was obtained at pH 3.7 and 31.5°C, with an initial carbon source (D-xylose) concentration of ∼50 g/L; the corresponding dry weight at this point was 3.4 g/L. Both values agreed with previously reported data.
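The run count of 19 is consistent with a two-factor design in which the eight non-center points of a 3 × 3 grid (corners plus face centers, i.e. a two-factor face-centered layout) are run in duplicate and the center point in triplicate: 8 × 2 + 3 = 19. The sketch below assumes that layout (Figure 5a may differ) and converts coded levels to natural units with the ranges given above.

```python
from itertools import product


def to_natural(coded, center, half_range):
    """Convert a coded level (-1, 0, +1) to natural units."""
    return center + coded * half_range


def two_factor_grid_design():
    """Assumed layout: 3 x 3 grid, center in triplicate, other 8 points in duplicate."""
    runs = []
    for a, b in product((-1, 0, 1), repeat=2):
        replicates = 3 if (a, b) == (0, 0) else 2
        temperature = to_natural(a, 35.0, 5.0)  # 30-40 deg C
        ph = to_natural(b, 4.0, 2.0)            # pH 2-6
        runs += [(temperature, ph)] * replicates
    return runs


design = two_factor_grid_design()
print(len(design), "runs")  # 8 * 2 + 3 = 19
```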
Another example describes a similar fractional factorial design approach for determining the cultivation parameters of a fed-batch E. coli cultivation producing a recombinant Fab antibody.25 The experiments were carried out in a multi-fermenter system (Figure 6a) consisting of 12 one-liter vessels equipped with pH, dissolved oxygen, and temperature control (Greta system, Belach AB, Solna, Sweden). Parallel equipment of this type is particularly well suited to chemometric experiments.
The fermenters were operated in fed-batch mode, and models were developed that included cultivation time, temperature, and feed rate. A CCF design based on 32 cultivations was used. The factors were pH at three levels (6.5, 7.0, 7.5), temperature at three levels (35.5, 37.0, 38.5°C), cultivation time at three lengths (16, 18, 20 h), and specific growth rate at three levels (0.15, 0.25, 0.35 h−1, controlled by feeding). The response was the expression level of recombinant antibody, analyzed with a Bioanalyzer instrument (2100, Agilent Technologies Inc.); it varied in the range 25–175 μg/mL. Response surface modeling (Figure 6b) showed maximum protein expression at a specific growth rate of 0.23 h−1, a culture time of 17.8 h, a temperature of 36.5–37.0°C, and a pH of 6.8. However, the goodness of fit (R2) and goodness of prediction (Q2) were modest (0.77 and 0.62, respectively).
Recent research further exemplifies the versatility of DoE for optimizing key process responses (Table 4). Examples are the optimal rate of CO2 fixation by a chemoautotrophic microorganism,31 the effects of inhibition treatment, inoculum type, and incubation temperature on batch hydrogen production from organic solid wastes,32 bacteriocin production by Bacillus licheniformis in cheese whey,33 6-aminopenicillanic acid (6-APA) production using immobilized penicillin acylase,34 and carotenoid production by Rhodotorula glutinis.35 Examples from food and fuel bioprocessing15, 25, 44, 51 and from gas-phase production33, 50 can also be mentioned.
Many of the examples here illustrate protein production (see also Table 3), for example recombinant aminolevulinate synthase production in Escherichia coli,36 recombinant protein production in A. niger fermentation,37 effects of mixing and oxygen transfer on the production of Fab-antibody fragments in E. coli fermentation,38 and growth and hydrolase production by Tetrahymena thermophila.39
Often, optimizations of culture media and state variables are combined, for example in antifungal antibiotic production by Streptomyces sp.,40 astaxanthin production by Phaffia rhodozyma,41 lipase production by Burkholderia,42 and inulinase production by solid-state fermentation on bagasse.43 Additional conditions for the experimental designs are shown in Table 5.
Table 5. Design Methodology for Optimization of Bioprocess State Variables
The DoE applications described above are predominantly concerned with identifying a particular operational state to be maintained during processing. DoE also allows identification of an optimal sequence of operational actions, such as when to inoculate, harvest, or induce/transfect a culture. The following example, taken from an insect cell culture-virus vector system, illustrates how a decisive time-related parameter can be determined.45
The Spodoptera frugiperda insect cell line Sf-9 was transfected with a baculovirus vector carrying the gene for the human glucocorticoid receptor protein. A fed-batch protocol in a lab-scale reactor was used to culture the insect cells; the objects of optimization were the time of transfection with the viral strain and the expression of the recombinant protein from the successfully transfected vector. The yield of the expressed receptor protein was optimized with respect to four factors: the optical density at infection, the multiplicity of infection, the temperature after transfection, and the time interval between transfection and harvest. The screening included a fractional factorial design with a total of 38 experiments at two and three levels (1, 2.0, and 2.8 cells/mL; 1, 5.5, and 10 pfu/cell; 36, 47, and 60 h intervals; and 25 and 27°C). Response surface modeling resulted in a twofold increase in the yield of the glucocorticoid receptor (to 11.25 mg/L). The temperature after transfection and the time interval between transfection and harvest could both be optimized, while the multiplicity of infection was shown to have minor influence.
Other Bioprocess-Related Applications
DoE methods can also be applied to bioprocessing operations other than cultivation. The operation of recycling devices, such as centrifuges or other cell separation units whose process parameters are easily identified, is one area suitable for screening and optimization. The operation of chromatographic separation columns and stem cell differentiation are other upstream and downstream procedures that can be improved.
Optimization of mammalian cell culture separation
An acoustic cell retention device (BioSep ADI 1015 AppliSens, Applikon BV, Schiedam, NL) was used as an alternative to a continuous centrifuge for harvesting a Chinese hamster ovary (CHO) cell culture producing a recombinant protein (Figure 7). The acoustic device separates the cell suspension by the combined action of gravitational and oscillating acoustic forces. It allows continuous harvesting from a culture, with the retentate flow either recycled or harvested, while the clarified supernatant is passed on to the next processing step.
The operational parameters affecting the separation yield were the flow rate in the recycling stream, the oscillation power exerted on the acoustic flow cell, and the cell density of the purged CHO cell suspension.46 The factors investigated were power consumption, harvest rate, and the on- and off-times of the acoustic oscillation.
The response surface regression model was based on a full factorial design augmented to a central composite circumscribed (CCC) design. Experiments were carried out in the growth and production phases of the CHO cell culture, and the separation efficiency in the harvest stream was measured at two or three factor levels. The factor analysis showed that all factors had a significant influence and that some of them interacted. As expected, higher power and a lower harvest rate gave better separation, and the analysis also revealed an interaction between on- and off-time. The regression analysis placed the optima at a power of 6.6 W, an on-time of 6000 s, and an off-time of 18 s at a constant low harvest rate. The model fitted the data well (R² = 0.95) and had good predictive ability (Q² = 0.72).
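R² and Q² answer different questions. R² is the fraction of response variation explained by the fitted model; Q² is the same quantity computed from leave-one-out prediction errors (PRESS), so it estimates how well the model predicts runs it was not fitted to, and for a least-squares fit it can never exceed R². The Python/NumPy sketch below computes both for a second-order model on a synthetic two-factor central composite design. The design, coefficients, and noise are invented for illustration, and commercial DoE packages may compute Q² with a different cross-validation scheme.

```python
import numpy as np


def quadratic_design_matrix(X: np.ndarray) -> np.ndarray:
    """Columns 1, x1, x2, x1*x2, x1^2, x2^2 of a two-factor second-order model."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])


def r2_q2(A: np.ndarray, y: np.ndarray) -> tuple:
    """Return (R2, Q2) for an ordinary least-squares fit of y on the columns of A.

    R2 = 1 - RSS / SS_tot uses the ordinary residuals.
    Q2 = 1 - PRESS / SS_tot uses leave-one-out residuals, obtained without
    refitting via the hat-matrix identity e_i / (1 - h_ii).
    """
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    leverage = np.diag(A @ np.linalg.pinv(A.T @ A) @ A.T)
    press_resid = resid / (1.0 - leverage)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid**2) / ss_tot
    q2 = 1.0 - np.sum(press_resid**2) / ss_tot
    return float(r2), float(q2)


# Synthetic two-factor central composite design in coded units (illustrative only).
ALPHA = np.sqrt(2.0)
X = np.array(
    [[-1, -1], [1, -1], [-1, 1], [1, 1],                 # factorial points
     [-ALPHA, 0], [ALPHA, 0], [0, -ALPHA], [0, ALPHA],   # axial (star) points
     [0, 0], [0, 0], [0, 0]],                            # replicated center points
    dtype=float,
)
rng = np.random.default_rng(0)
x1, x2 = X[:, 0], X[:, 1]
# Invented "true" surface plus unit-variance noise; not data from the study.
y = 50 + 4 * x1 - 3 * x2 + 2 * x1 * x2 - 5 * x1**2 - 2 * x2**2 + rng.normal(0.0, 1.0, len(X))
A = quadratic_design_matrix(X)

if __name__ == "__main__":
    r2, q2 = r2_q2(A, y)
    print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

The hat-matrix shortcut gives exactly the same PRESS as refitting the model once per left-out run, but with a single fit. A Q² well below R² indicates that the model predicts new runs less well than it reproduces the runs it was built from.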
Downstream processing by expanded-bed adsorption
Expanded-bed adsorption (EBA) chromatography was applied to purify a crude homogenate from an E. coli culture producing the recombinant human enzyme porphobilinogen deaminase (rhPBGD). The unclarified homogenate was diluted and applied to an EBA column (Streamline 25, GE Healthcare AB, Uppsala, Sweden), and a phenyl hydrophobic interaction chromatography gel with a salt gradient was used to purify the target protein (Figure 8).
Björnsson (2001) used fractional factorial design to set up screening and optimization experiments for this system.47 The factors were the pH of the elution buffer, the salt concentration, and the volumetric flow rate through the expanded column (Figure 8). The responses, namely the recovery and purity of rhPBGD and the optical density of the effluent stream, were measured with conventional methods. The design was based on CCC, with 17 experimental runs of the EBA column. The screening showed that pH and salt concentration were the important factors, and these were analyzed further using response surface plots. Maximum product recovery was found at pH 7.6 and a salt concentration of 0.9 mol/L (R² = 0.87, Q² = 0.71). The optimized EBA step gave a 25–30% higher recovery of rhPBGD than the alternative processing route of cross-flow filtration followed by packed-bed hydrophobic interaction chromatography.
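With three factors (elution pH, salt concentration, flow rate), a CCC design built from a full 2³ factorial (8 runs), six axial points, and three center points comes to exactly 17 runs, which would be consistent with the run count above. The number of center points is an assumption here; the text does not state how the 17 runs were composed. In a CCC design the axial points lie outside the factorial cube (alpha > 1), which gives each factor five levels and lets the quadratic model estimate curvature. A minimal Python sketch that generates such a design in coded units:

```python
from itertools import product
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def ccc_design(k: int, n_center: int, alpha: Optional[float] = None) -> List[Point]:
    """Central composite circumscribed (CCC) design in coded units.

    Runs: the full 2^k factorial corners at +/-1, then 2k axial (star) points
    at +/-alpha on each axis, then n_center replicated center points.
    If alpha is omitted, the rotatable choice alpha = (2^k) ** 0.25 is used.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    corners = [tuple(float(v) for v in c) for c in product((-1, 1), repeat=k)]
    axial = []
    for axis in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[axis] = sign * alpha
            axial.append(tuple(point))
    centers = [tuple([0.0] * k) for _ in range(n_center)]
    return corners + axial + centers


if __name__ == "__main__":
    # Hypothetical column labels for the three EBA factors; coded units only.
    design = ccc_design(k=3, n_center=3)
    print("pH      salt    flow")
    for run in design:
        print("  ".join(f"{v:+.3f}" for v in run))
    print("runs:", len(design))  # 8 corners + 6 axial + 3 center
```

Passing `alpha=1.0` to the same function yields a face-centred variant, whose axial points stay within the factorial bounds; that is a common alternative when levels outside the screened range are impractical to run.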
Culture conditions for differentiation of embryonic stem cells
The development of new culture production systems for embryonic stem (ES) cells requires substantial work to find suitable production conditions. This is a particular concern when setting up time-consuming and labor-intensive in vitro differentiation procedures, where the factors that influence ES cell differentiation are numerous and interact in complex ways. Factorial design can therefore be a useful methodology for analyzing time-dependent factors during the differentiation process, because it can reveal unexpected interactions that a conventional one-variable-at-a-time analysis would miss.
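The point about interactions can be made concrete with a minimal Python sketch. It uses an invented two-factor response in coded levels -1 and +1 with a strong interaction; the function is hypothetical and not taken from the ES cell study. Changing one factor at a time from the low/low starting point never reaches the high/high optimum, whereas the four runs of a 2² factorial locate it and estimate the interaction effect directly.

```python
from itertools import product


def response(a: int, b: int) -> float:
    """Hypothetical response with a strong A x B interaction; a, b in {-1, +1}."""
    return 10 + 1 * a + 1 * b + 3 * a * b


def one_factor_at_a_time(b_start: int = -1) -> tuple:
    """Vary A with B held at b_start, keep the better A, then vary B."""
    best_a = max((-1, 1), key=lambda a: response(a, b_start))
    best_b = max((-1, 1), key=lambda b: response(best_a, b))
    return best_a, best_b


def factorial_effects() -> dict:
    """Main and interaction effects estimated from the four runs of a 2^2 factorial."""
    runs = [(a, b, response(a, b)) for a, b in product((-1, 1), repeat=2)]

    def effect(sign_of) -> float:
        # Mean response where the contrast column is +1 minus mean where it is -1.
        hi = [y for a, b, y in runs if sign_of(a, b) > 0]
        lo = [y for a, b, y in runs if sign_of(a, b) < 0]
        return sum(hi) / len(hi) - sum(lo) / len(lo)

    return {
        "A": effect(lambda a, b: a),
        "B": effect(lambda a, b: b),
        "AB": effect(lambda a, b: a * b),
    }


if __name__ == "__main__":
    ovat = one_factor_at_a_time()
    best = max(product((-1, 1), repeat=2), key=lambda p: response(*p))
    print("OVAT ends at", ovat, "with response", response(*ovat))
    print("factorial best", best, "with response", response(*best))
    print("effects", factorial_effects())
```

Here the one-factor-at-a-time search stops at (-1, -1) with a response of 11, while the factorial finds (1, 1) with 15 and shows that the AB interaction effect (6) is three times larger than either main effect (2).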
Chang and Zandstra (2004) developed and validated such a technology for ES cell differentiation analysis.48 They used a quantitative screening platform based on automated fluorescence microscopy, which counted the ES cells that had entered endodermal differentiation, as indicated by expression of two biomarkers (cytokeratin-8 and hepatocyte nuclear factor 3β). Using a two-level fractional factorial design of 32 experiments performed in triplicate, they screened medium components relevant to differentiation into endodermal cells (glucose, insulin, basic fibroblast growth factor, retinoic acid, and epidermal growth factor), with the biomarkers and cell numbers as responses. The model was then refined with a subsequent three-level factorial experiment for two of the factors. A statistical regression model was used to identify main and interaction effects on endoderm formation. Retinoic acid was found to inhibit endoderm formation, while low glucose levels were beneficial. DoE proved to be a powerful tool for studying the factors affecting endoderm-specific ES cell differentiation, but it requires a relevant and sufficiently sensitive technique for measuring the responses.
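The text does not describe how the 32-run design was constructed, so the following is only a generic illustration of how a two-level fractional factorial screens five factors with fewer runs than the full 2⁵. A half-fraction 2^(5-1) design uses 16 runs and sets the fifth factor from the product of the other four (generator E = ABCD, defining relation I = ABCDE). This is a resolution V design: main effects and two-factor interactions are aliased only with three- and higher-order interactions. The Python sketch builds the design and checks that all 15 main-effect and two-factor-interaction columns are mutually orthogonal; the mapping of the five medium components to design columns is arbitrary and hypothetical.

```python
from itertools import combinations, product
from math import prod

# Factor names from the text; their assignment to design columns is arbitrary here.
FACTORS = ["glucose", "insulin", "bFGF", "retinoic_acid", "EGF"]


def half_fraction_2_5() -> list:
    """2^(5-1) design in coded units: a full 2^4 in the first four factors,
    with the fifth set by the generator E = A*B*C*D (defining relation I = ABCDE)."""
    runs = []
    for levels in product((-1, 1), repeat=4):
        row = dict(zip(FACTORS[:4], levels))
        row[FACTORS[4]] = prod(levels)
        runs.append(row)
    return runs


def effect_columns(runs: list) -> dict:
    """Contrast columns for all 5 main effects and all 10 two-factor interactions."""
    cols = {f: [r[f] for r in runs] for f in FACTORS}
    for f, g in combinations(FACTORS, 2):
        cols[f"{f}*{g}"] = [r[f] * r[g] for r in runs]
    return cols


def mutually_orthogonal(cols: dict) -> bool:
    """True if every pair of contrast columns has a zero dot product (no aliasing)."""
    return all(
        sum(x * y for x, y in zip(cols[p], cols[q])) == 0
        for p, q in combinations(cols, 2)
    )


if __name__ == "__main__":
    runs = half_fraction_2_5()
    cols = effect_columns(runs)
    print("runs:", len(runs), "| effect columns:", len(cols))
    print("main effects and 2-factor interactions unaliased:", mutually_orthogonal(cols))
```

Aliased columns would be identical or sign-reversed, giving a dot product of ±16; zero dot products confirm that each of the 15 effects can be estimated independently from the 16 runs.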
This need may be met to a substantial degree by new metabolomics and other post-genomics tools such as gene expression microarrays53, 54 and MALDI-TOF mass spectrometry.55 With these tools, key response variables can be measured using high-throughput protocols where required.56 Combination with powerful data mining methods would further enhance these possibilities.
This review has presented a collection of recent applications of DoE methods in bioprocess-related biotechnology (Tables 1–5). From these, it is evident that the focus so far has predominantly been on medium optimization for antibiotic, enzyme, and recombinant protein production processes, although processes for products such as ethanol, hydrogen, and primary metabolites are also well represented.
In some cases, medium optimization has been combined with other operating parameters such as temperature, pH, and time of induction of protein expression. The DoE methodology applied generally follows the same basic approach: a fractional or full factorial design followed by response surface modeling to identify optima of product yield and productivity.
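The step of identifying optima from a second-order response surface has a closed form. As a general textbook result (not a calculation reported in the surveyed studies), write the fitted model in coded factors x = (x1, ..., xk) as below, where b collects the linear coefficients and the symmetric matrix B holds the quadratic coefficients on its diagonal and half of each interaction coefficient off the diagonal. Assuming B is invertible, setting the gradient to zero gives the stationary point and the predicted response there:

```latex
\hat{y} = b_0 + \mathbf{x}^{\mathsf{T}}\mathbf{b} + \mathbf{x}^{\mathsf{T}}\mathbf{B}\,\mathbf{x},
\qquad
\nabla_{\mathbf{x}}\hat{y} = \mathbf{b} + 2\mathbf{B}\mathbf{x} = \mathbf{0}
\;\Longrightarrow\;
\mathbf{x}_s = -\tfrac{1}{2}\mathbf{B}^{-1}\mathbf{b},
\qquad
\hat{y}_s = b_0 + \tfrac{1}{2}\mathbf{x}_s^{\mathsf{T}}\mathbf{b}
```

The stationary point is a maximum when all eigenvalues of B are negative and a minimum when all are positive; mixed signs indicate a saddle point. If it lies outside the investigated region, the model should not be extrapolated there, and the best settings within the region lie on its boundary.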
The predominance of medium and state-variable optimization is probably because the theory and methodology of factorial design are relatively easy to grasp when factors and responses are well structured and restricted to two well-defined states, input and output.
The applications surveyed here illustrate that existing DoE methodology is not yet fully exploited. A more advanced use of factorial design and statistical analysis would be possible, but it requires deeper insight into the flexibility of DoE, together with more careful consideration and analysis of data. This would open up many more applications and make it possible to meet more intricate optimization needs. A few of the examples quoted in the present review point to this possibility, for example recombinant protein induction, vector infection, harvest time, and embryonic stem cell differentiation, where, as shown in Figure 1, the factors are distributed at different places and times over the bioprocess analyzed. It would also be of value to extend the models to include more of the intracellular processes of the biological systems in the bioprocess.
In particular, in the light of the incentives expressed in the PAT initiative,4 the following would be of value:
to investigate the responses to diverse factors that are distributed over the sequence of a manufacturing process;
to investigate how process control factors are interdependent with product quality attributes; and
to investigate more systematically the factors that affect intrinsic responses in the cells or product protein of the bioprocess, such as the glycosylation pattern and other protein modifications. Data mining techniques may provide useful analytical evaluation methods here.
As the PAT initiative emphasizes, DoE can be a very useful resource for exploiting new optimization possibilities in bioprocess development.
The authors wish to thank Peter Björnsson, Johan Larsson, Jenny Sävenhed and Ola Tuvesson for valuable contributions and Umetrics AB and Belach AB for providing access to materials used in the figures.