• Open Access

Seed-based expression systems for plant molecular farming

Authors


(fax 44 (0) 1582 461 366; e-mail: maurice.moloney@bbsrc.ac.uk)

Summary

The evolution of the seed system provides enormous adaptability to the gymnosperms and angiosperms, because of the properties of dormancy, nutrient storage and seedling vigour. Many of the unique properties of seeds can be exploited in molecular farming applications, particularly where it is desirable to produce large quantities of a recombinant protein. Seeds of transgenic plants have been widely used to generate a raw material for the extraction and isolation of proteins and polypeptides, which can be processed into valuable biopharmaceuticals. The factors that control high-level accumulation of recombinant proteins in seed are reviewed in the following paragraphs. These include promoters and enhancers, which regulate transcript abundance. However, it is shown that subcellular trafficking and targeting of the desired polypeptides or proteins also play a crucial role in their accumulation at economically useful levels. Seeds have proven to be versatile hosts for recombinant proteins of all types, including peptides or short and long polypeptides as well as complex, noncontiguous proteins like antibodies and other immunoglobulins. The extraction and recovery of recombinant proteins from seeds are greatly assisted by their dormancy properties, because these allow for long-term stability of stored products including recombinant proteins and a decoupling of processing from the growth and harvest cycles. Furthermore, the low water content and relatively low bioload of seeds help greatly in designing cost-effective manufacturing processes for the desired active pharmaceutical ingredient. The development of cGMP processes based on seed-derived materials has only been attempted by a few groups to date, but we provide a review of the key issues and criteria based on interactions with the Food and Drug Administration and the European Medicines Agency.
This article uses ‘case studies’ to highlight the utility of seeds as vehicles for pharmaceutical production including: insulin, human growth hormone, lysozyme and lactoferrin. These examples serve to illustrate the preclinical and, in one case, clinical information required to move these plant-derived molecules through the research phase and into the regulatory pathway en route to eventual approval.

Introduction

The appearance of the seed system of reproduction is one of the most remarkable steps in plant evolution. The seed provides its genotype with powerful selective advantages because of its flexibility in the face of a changing environment. Seeds provide a plant with the ability to germinate and grow under nutrient-limited conditions because of the availability of storage products. Seeds allow germination to take place under the environmental conditions that most favour the seedling’s survival. Furthermore, seeds contribute to numerous mechanisms that aid dispersal of the species and the colonization of new ecological niches. Perhaps one of the most startling examples of these properties is the case of the date palm (Phoenix dactylifera L.) seed that was excavated from Masada, close to the Dead Sea in Israel, and germinated in 2005. This seed was radiocarbon dated to the first century Common Era, and yet it germinated to produce a viable sapling sexually compatible with date palms produced from modern cultivars (Sallon et al., 2008). This discovery, along with others (Shen-Miller et al., 2002), illustrates the remarkable properties of seeds in protecting the next sporophyte generation and remaining nutritionally and functionally intact for long periods to permit germination under favourable circumstances.

Among the many reasons why seeds provide an evolutionary advantage to their genotype is the nature of their storage products. From their earliest origin, in which megagametophyte tissues were adapted as protection and a nutritional supply for the sporophyte, seeds have diverged widely in their ability to store different nutrients. This diversity is reflected in the development of storage tissues and organs capable of long-term storage and maintenance of nutritional components adapted to the germination needs of the new sporophytic generation. Major classes of storage products include proteins, lipids including oils and waxes, carbohydrates, phytate for phosphorus, antioxidants such as tocopherols and a vast array of secondary products such as anti-fungal terpenes (Jasicka-Misiaka et al., 2004), alkaloids (Xiao et al., 1999) and potent toxic antinutritionals such as the phorbol esters of Jatropha spp. and other members of the Euphorbiaceae (Goel et al., 2007).

This wide variety of storage products acts as a prolific source of nutrients for the seedling, as protection for the long-term stability of these nutrients and as a co-evolutionary modulator, sometimes warding off predation or other times attracting an animal that might assist in dispersal of the seed.

Seed dormancy is a very important property of many seeds. The ability of a seed to go into a quiescent state for long periods or until an appropriate environmental signal is detected provides a substantial survival advantage for many wild species. Plant breeders have tended to select for lower dormancy in cultivated species as a means of improving uniformity of germination under field conditions. However, many of our most important grains, legumes and oilseeds derive from species that display strong dormancy characteristics. Mechanisms of seed dormancy vary widely and include physical factors such as exclusion of water from the seed, hormonal inhibitors of germination or a requirement for exposure to a physicochemical stimulus such as heat or acid before germination is activated. It is clear that throughout long periods of dormancy, major storage products remain substantially intact (Golovina et al., 1997).

In seeds, storage products may be deposited in a variety of tissues or organs such as the embryo proper, cotyledons or scutellum, endosperm and aleurone. From an evolutionary viewpoint, these different storage tissues are adapted to the survival and competitiveness of the species, by enhancing rate of germination, seedling vigour or provision of a sustained source of nutrients until the new sporophyte is fully established. However, these same storage ‘strategies’ may be exploited in molecular farming applications, where these different locations in the seed facilitate accumulation of the desired product or increase the efficiency of recovery or downstream purification.

Localization of storage products at the cellular level is also a major determinant of accumulation levels of the product as well as the potential stability of the product during prolonged periods of dormancy. So, for example, a recombinant protein targeted to the cytoplasm may accumulate at moderate but never satisfactory levels, whereas the same protein in a configuration that targets the secretory pathway can accumulate at high levels suitable for economic production. Within the secretory pathway, it may sometimes be necessary to retain the recombinant protein in the endoplasmic reticulum (ER), because without the retention signal, the protein follows the ‘default’ pathway and may sometimes undergo substantial proteolysis during seed development. Although this phenomenon is encountered frequently and is well known to researchers in this area, it is not easily predicted for any given protein. For example, growth hormone, which is a secreted hormone in vertebrates, accumulates much better in plants when not subjected to the secretory pathway (Bosch et al., 1994). This was also true for the blood anticoagulant hirudin, which is also secreted in its natural state, but which accumulates in seeds much more reliably in the cytoplasm (Parmenter et al., 1995). Thus, as will be seen in the following paragraphs, significant attention must be paid to the site of accumulation of a desired protein at both the histological and cytological levels.

Seeds lend themselves readily to many applications in the area of molecular farming. A number of species have been investigated including Arabidopsis (Fedosov et al., 2003; Downing et al., 2006), barley (Schünmann et al., 2002), Brassica spp. (Parmenter et al., 1995), corn (Streatfield et al., 2003; Zhong et al., 2006), pea (Perrin et al., 2000), rice (Stöger et al., 2000; Huang, 2004), safflower (Szarka et al., 2006) and soya bean (Philip et al., 2001). Of these various species, only a subset has actually gone forward to produce a product that has entered clinical trials, but each has its own singular advantage. Cereals such as rice and barley are highly productive and are self-pollinating, which reduces the risk of illegitimate gene flow. Corn, despite the potential issue of pollen movement, has very high yields of seed per hectare and requires lower acreage than many alternatives. The legumes have naturally high protein content in their seeds. The oilseeds lend themselves readily to specialized recovery technology in which the desired protein is attached to the oilbody, covalently or noncovalently (van Rooijen and Moloney, 1995; Seon et al., 2002). Seeds have been shown to be capable of accumulating a wide range of proteins including the following: viral and bacterial antigens, antibodies, proteases and protease inhibitors, hormones and growth factors and a wide range of enzymes for industrial or medical applications.

Although seeds show a high degree of versatility for product type and raw material management, they are most competitive in applications that require large volumes of recombinant protein product per annum. They are less suited to certain applications such as the production of influenza vaccines, which change each year, because of the amount of time needed to produce sufficient quantities of seed for processing.

Use of seeds for recombinant protein production

Advantages and disadvantages of seeds

As described earlier, many of the natural properties of seeds lend themselves readily to recombinant protein production. It is a tacit assumption that if seeds can accumulate individual storage proteins and oleosins at levels of up to 10% of the total seed protein content, then it ought to be possible to exploit this property to obtain high levels of recombinant protein accumulation in seeds. Claims vary widely on the ability of seeds to perform in this way, but there are many cases where recombinant protein accumulation in seed has been shown to mimic the abilities of naturally occurring proteins. So, for example, in rice, the use of storage protein promoters to drive expression of genes for mammalian proteins such as human lysozyme has resulted in an average expression level of 13%–14% of total soluble protein (Huang, 2004), which is well within an economic threshold for the target protein. Antibodies and scFv proteins accumulate in a variety of seed production hosts at levels of 1%–5% of total seed protein (Fiedler et al., 1997; Van Droogenbroeck et al., 2007). It is noteworthy that expression in Arabidopsis, which is often used as a model system for recombinant protein expression, may result in very high levels of accumulation (30%–40% of the total cellular protein) (De Jaeger et al., 2002). In our experience, these unusually high levels are typically found in Arabidopsis with high copy-number insertions. Multiple inserts in the Arabidopsis genome have a high probability of residing in transcribed regions, whereas in many crop plants with larger genomes, there are large regions of untranscribed DNA, which may lessen the effect of the gene dosage. Even if this hypothesis is correct, these results still suggest that seeds in general can tolerate large perturbations in their seed protein without any major change in germination rates or seed viability. Indeed, Scheller et al. (2006) achieved levels of 25% of total seed protein in tobacco seed, by the fusion of an scFv with an elastin-like peptide repeat.

It has been widely demonstrated that seeds expressing recombinant proteins are capable of storing them stably just as they maintain storage proteins. This does not appear to require specialized subcellular targeting. Thus, a cytoplasmically accumulated product, hirudin, was stable in dry canola seed for over 3 years (Boothe et al., 1997), and apoplastic recombinant proteins such as phytase and scFvs were also stable for more than a year at room temperature (Pen et al., 1993; Ramírez et al., 2007). This degree of stability provides seed systems with several major advantages for industrial-scale production of a recombinant protein. For some proteins such as vaccines, it could circumvent some difficult cold-chain requirements (Nochi et al., 2007). The stability of the recombinant protein and the dormancy of the seed also allow for a complete decoupling of the cycle of cultivation from the processing and purification of the protein. In fact, the seed becomes a raw material as part of the process. For cGMP manufacturing, this is extremely helpful as it permits the establishment of such concepts as a ‘master seed bank’ and also allows for the establishment of quality-based release criteria for the seeds as a precursor. This property makes the whole question of inventory management, batch processing and logistics of manufacturing much simpler than with any living cell system, which normally requires processing immediately on harvesting. This is further discussed in the following paragraphs.

The use of seeds for the recovery of a recombinant protein is advantageous in a number of other ways as the scale of production increases. First, the intrinsic bioburden on seeds is much lower than in vegetative structures such as leaves. Although neither seed nor leaf extraction can be performed at a large scale under aseptic conditions, most seeds can be subjected to a surface ‘sterilization’ technique, which reduces bioload to industrially accepted standards for a raw material. This is difficult if not impossible to do with a large mass of leaves without damaging the mesophyll cells, which carry the bulk of the recombinant product. Second, most seeds contain less than 10% water, whereas leaves contain >90% water in most cases. This means that a much larger mass of biomass must be handled in leaf-based expression to obtain the same amount of product. Finally, the content of protein in most leaves is <5% of wet weight, whereas the percentage protein in seeds is from 10% to 40%. Clearly, from a large-scale processing viewpoint, seeds are a more concentrated starting material, with a much lower titre of proteases, which can also affect recovery if processing and purification do not take place rapidly.
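The biomass argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a process model: the function names are hypothetical, the protein and water fractions are taken from within the ranges quoted in the text, and the assumption that the recombinant product makes up 1% of total protein in both tissues is purely illustrative.

```python
def fresh_biomass_per_kg_product(protein_fraction, product_fraction, recovery=1.0):
    """Return kg of fresh biomass needed to recover 1 kg of recombinant protein.

    protein_fraction: total protein as a fraction of fresh weight (0-1).
    product_fraction: recombinant product as a fraction of total protein (0-1).
    recovery: fraction of product surviving extraction (assumed 1.0 here).
    """
    for name, value in (("protein_fraction", protein_fraction),
                        ("product_fraction", product_fraction),
                        ("recovery", recovery)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    product_per_kg_biomass = protein_fraction * product_fraction * recovery
    return 1.0 / product_per_kg_biomass


def water_in_biomass(biomass_kg, water_fraction):
    """Return kg of water carried along with a given fresh biomass."""
    if not 0 <= water_fraction < 1:
        raise ValueError("water_fraction must be in [0, 1)")
    return biomass_kg * water_fraction


# Illustrative inputs only: 20% protein and 10% water for seed,
# 5% protein and 90% water for leaf, product at 1% of total protein (assumed).
seed_kg = fresh_biomass_per_kg_product(0.20, 0.01)
leaf_kg = fresh_biomass_per_kg_product(0.05, 0.01)
print(f"seed: {seed_kg:.0f} kg biomass, {water_in_biomass(seed_kg, 0.10):.0f} kg water")
print(f"leaf: {leaf_kg:.0f} kg biomass, {water_in_biomass(leaf_kg, 0.90):.0f} kg water")
```

Under these assumed inputs, the leaf route requires several times more fresh biomass per kilogram of product, most of it water, which is the point the paragraph makes qualitatively.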

Seed-based expression systems do have a number of disadvantages, the most important of which is the time needed to reach proof-of-concept; for recombinant proteins required in relatively small volumes, the time-to-product may be prohibitive. So, for example, it would be difficult to adapt a seed system to annual production of a new vaccine for a constantly changing influenza virus. Similarly, rapid leaf-expression systems are being used to produce an scFv derived from the idiotype Ig from a B-cell lymphoma (non-Hodgkin’s lymphoma). The scFv is then injected into the patient to mount a specific immune response against the tumour (McCormick et al., 2008). This process could not be cycled with the necessary speed using a seed-based system and, in fact, requires viral-based, transient leaf expression to produce the idiotypic scFv in a timely manner for a successful therapy.

Other nonseed systems have also been employed recently with considerable success for production of glucocerebrosidase for Gaucher’s disease (Shaaltiel et al., 2007; Aviezer et al., 2009). This enzyme was produced in transgenic carrot cells in culture. In this case, both the scale required (<10 kg/annum) and the current cost of production (>$100 000/gram) suggest that a competing plant cell culture method will be an economical alternative to the current product from CHO cells. These economics would not, however, apply to a product normally required in thousands of kilograms per annum, such as insulin.

Gene expression in seeds

To optimize the economics of recombinant protein production in plants, it is essential to maximize gene expression. In the narrowest sense, this implies high levels of transcription at the appropriate time and a high level of steady-state mRNA for the desired gene product. In the case of seeds, there are a number of promoters with high specificity and strong transcriptional activity, because seeds generally express a subset of genes specifying storage products at a high level. It is therefore advisable to use a seed-specific promoter for maximizing expression. Constitutive promoters such as the tandem 35S promoter from cauliflower mosaic virus tend to show only moderate expression in dicot seeds (Perrin et al., 2000) and low expression in monocot seeds (Stoger et al., 2002). They also suffer from the disadvantage that the product might accumulate in many other tissues and organs of the plant. This is inadvisable for extraction and purification purposes, as seed processing is very different from the processing of vegetative tissue. More importantly, expression of the bioactive product in vegetative tissues and organs at large scale could be problematic from a regulatory point of view because most field-browsing by domestic or wild animals on cultivated plants takes place on vegetative structures such as leaves.

Seed-specific and seed-restrictive promoters have been variously exploited for their abilities to drive gene expression for recombinant proteins in seeds. Unfortunately, most comparisons are made between different promoters driving different transgenes in different hosts (Stoger et al., 2002, 2005), and therefore they are difficult to interpret quantitatively. Furthermore, many studies are based on relatively few independent transgenic events, and this also confounds much of the work, because with large genomes and single-copy inserts, many insertions show marginal expression even with promoters purported to be highly transcriptionally active.

Among the dicotyledonous promoters, those derived from legumes appear to offer the greatest potential so far. Interestingly, the first-reported seed-specific promoter from the phaseolin gene (Sengupta-Gopalan et al., 1985) is still one of the strongest in transgenic plants. In a side-by-side comparison of seed-specific gene expression in Arabidopsis, we have shown that the phaseolin promoter performs extremely well by comparison with the arcelin and unknown seed protein (USP) promoters when driving the same transgene (maize oleosin). In this case, however, the linin promoter from flax and the Arabidopsis oleosin promoter both supported high levels of expression (Figure 1). Comparison of the phaseolin promoter with the arcelin, USP and cruciferin promoters in Arabidopsis driving expression of a different protein (chymosin) showed a similar trend with phaseolin and cruciferin performing better than arcelin and USP promoters (Figure 2). These comparisons all involved average expression levels using typically 12 independent transgenic events per promoter.
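Because each promoter is judged on the average of several independent transgenic events, the comparison reduces to a per-promoter mean and standard deviation. The sketch below shows one way to tabulate this; the function names are hypothetical, the use of the sample standard deviation is an assumption (the figure captions do not say which form was used), and the numbers in the usage lines are placeholders, not values from Figures 1 or 2.

```python
from statistics import mean, stdev


def summarize_promoters(events_by_promoter):
    """Map each promoter to (mean, sample standard deviation, number of events).

    events_by_promoter: dict of promoter name -> list of expression levels,
    one value per independent transgenic event.
    """
    summary = {}
    for promoter, levels in events_by_promoter.items():
        if len(levels) < 2:
            raise ValueError(f"{promoter}: need at least two events for a standard deviation")
        summary[promoter] = (mean(levels), stdev(levels), len(levels))
    return summary


def rank_promoters(summary):
    """Return promoter names ordered from highest to lowest mean expression."""
    return sorted(summary, key=lambda name: summary[name][0], reverse=True)


# Placeholder values for two unnamed promoters; NOT data from this article.
placeholder_events = {
    "promoter_A": [1.0, 2.0, 3.0],
    "promoter_B": [4.0, 6.0],
}
summary = summarize_promoters(placeholder_events)
for name in rank_promoters(summary):
    avg, sd, n = summary[name]
    print(f"{name}: mean={avg:.2f} sd={sd:.2f} n={n}")
```

Reporting the number of events alongside the mean matters here, since comparisons built on few events are the ones the text warns are hard to interpret.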

Figure 1.

 Comparative expression of a recombinant maize oleosin gene sequence under the transcriptional control of six different seed-specific promoters in Arabidopsis plants. Expression levels were measured by scanning densitometry of Coomassie blue-stained gels with a minimum of 10 transgenic events per promoter. Error bars are shown as standard deviations.

Figure 2.

 Comparative expression of a recombinant bovine chymosin gene sequence under the transcriptional control of four different seed-specific promoters in Arabidopsis plants. Expression levels were measured by the enzymatic activity of seed-produced chymosin (SPC) in a milk-clotting assay. Error bars are shown as standard deviations.

Among monocotyledonous plants, rice and corn have received the most attention as crops suitable for molecular farming. A series of promoters has been tried with varying success for monocot seed-based expression. The constitutive ubiquitin promoter has been used because it drives expression in both endosperm and the scutellum (Stoger et al., 2002). Unfortunately, despite expression throughout the seed tissues, the level of expression is low compared to that obtained with strong, seed-specific promoters (Torres et al., 1999; Stoger et al., 2002). In contrast, a rice globulin-1 promoter was used successfully for the expression of human lysozyme in rice endosperm (Yang et al., 2003). A globulin-1 promoter from corn with embryo expression preference was also used to express substantial levels of a fungal laccase in corn (Hood et al., 2003).

For albuminous seeds such as corn and rice, it might be preferable to use a construct that contains two separate genes within the expression cassette, one using an endospermic promoter to drive expression of the chosen transgene and a second using an embryo-preferred promoter driving the same coding sequence. Such a double cassette might resolve expression limitations in certain endospermic seeds, where the potential host (for reasons of ease of transformation) is limited to a small number of genotypes and inbreds. Most importantly, Hood et al. (2003) also demonstrated that expression from an optimized construct could still be further enhanced (as much as 20-fold) by a combination of selection and conventional breeding. We have also found that individual transformation events can be elevated to a commercially useful threshold by more conventional selection techniques, especially where multicopy inserts are involved (Zaplachinski S., personal communication).

Protein accumulation and subcellular targeting

Recombinant protein accumulation in seeds is very sensitive to subcellular compartmentation. This is a subject that has not been studied in a systematic way to date; but for certain proteins, it is very clear that subcellular targeting can have a major effect on accumulation levels. In the most general sense, the first question that is normally asked is whether cytoplasmic accumulation or secretory pathway accumulation is better. This varies, of course, by protein, but the rules are not intuitively predictable. For example, Parmenter et al. (1995) expressed a hirudin protein from medicinal leeches in seeds. This protein accumulated well in the cytoplasm as an oleosin fusion, but did not accumulate well in the secretory pathway, even using the same promoter. Similarly, growth hormones accumulate well on oilbodies, but accumulate at much lower levels in the secretory pathway (Bosch et al., 1994). This is intriguing, because both of these proteins are normally secreted proteins with obligatory disulphide bridges, and yet they perform much better (including the formation of S–S bonds) in the cytoplasm. Conversely, another secreted mammalian protein, chymosin, accumulates at much higher levels in seeds as a secreted protein than as an oilbody-associated protein. Thus, it is hard to predict for many proteins which subcellular compartment will favour accumulation. Antibodies are obvious exceptions in that they require the accumulation of heavy and light chains in the secretory pathway in order for assembly to occur in plant seeds. It also appears that scFvs, which also require disulphide bridging, require expression in the secretory pathway (Conrad and Fiedler, 1998). The question of plant cell glycosylation in the plant secretory pathway will be addressed in another article in this issue, and so it will not be discussed here.

Accumulation of both mAbs and scFv’s was greatly enhanced in tobacco seeds using ER retention signals (KDEL). A critical factor that is not fully understood yet is the actual subcellular localization of some proteins in seeds tagged with a KDEL ER-retention signal. Petrucelli et al. (2006) have shown that a KDEL-tagged antibody is retained in the ER in leaves, but not in seeds where it is partially sorted into protein storage vacuoles. This is not an isolated phenomenon in seeds and has been reported in monocots and dicots (Nicholson et al., 2005; Vitale and Pedrazzini, 2005). This unexpected targeting in seeds is not fully understood, but it is clearly related to the existence of the competing pathway to protein storage vacuoles. A more detailed discussion of the potential mechanisms for this in seed is provided by Robinson et al. (2005).

Unusual forms of subcellular targeting in seeds have been helpful for the production of some recombinant proteins. Thus, Torrent et al. (2009a,b) recently showed several examples of the production of ER-derived protein bodies with recombinant proteins using the proline-rich N-terminal domain derived from the maize storage protein γ zein. This peptide extension causes the deposition of the desired protein into ER-derived protein bodies that can be recovered by centrifugation. Although this approach was reported in this article for leaf cells, the same principle works in seeds where the γ zein normally originates (Torrent et al., 1997). The accumulation of epidermal growth factor and of growth hormone was enhanced by one or two orders of magnitude depending on the protein, using this approach.

The method that has been pioneered by the authors has been to target proteins to the oilbodies of seeds and then recover the proteins of interest using flotation centrifugation. This approach can be used for cytoplasmic proteins through covalent binding to an oleosin protein (Parmenter et al., 1995; Nykiforuk et al., 2006) or noncovalently by attachment of the desired protein to the oilbody via an affinity ligand (Seon et al., 2002). The latter is particularly valuable for the recovery of antibodies or other proteins accumulating in the secretory pathway. Covalent binding to oilbodies through oleosins frequently increases the recombinant protein titre in the seed. More importantly, it converts a simple seed-extraction process into a powerful purification step, because the centrifugation that would be used in any seed-based process enables the capture and enrichment of the desired protein in liquid phase. The separation early in the process of the desired protein from host-related contaminants significantly reduces downstream processing costs, where much of the cost-of-goods is normally incurred. This process is being used to purify insulin, apolipoprotein AI Milano, somatotropins and antibodies. By fusing a desired protein to an scFv against oleosin, it is possible to capture many diverse proteins noncovalently onto the oilbody (Seon et al., 2002).

Case studies in seed-based production of recombinant proteins

A wide array of different protein types has been reported to express and accumulate in seeds of different host plants. The class of protein most investigated is that of monoclonal antibodies, particularly IgGs. These have been reviewed in some depth recently, and this work will not be recapitulated here (Stoger et al., 2005). Instead, we shall focus on some hitherto unpublished or unreviewed examples, which are relatively close to being suitable for clinical trials or even commercialization. The examples we shall discuss here are human growth hormone, human lysozyme and lactoferrin, and human insulin. These cases are chosen because each of them exemplifies some of the challenges that must be overcome in order for the target protein to qualify as a true product candidate rather than a research model. To advance a program to commercial potential, most of the issues raised earlier in this article must be addressed. Expression level is clearly one of the most significant technical and economic parameters. Accumulation level may or may not reflect ‘expression level’, depending on the lability and turnover of the target protein in different cellular compartments. The next critical variable that must be evaluated is the correct folding of the desired protein. Strictly, it is often possible to refold a protein in vitro, but this generally adds cost and complexity to downstream processing. In consequence, a construct design that incorporates all of these parameters is a prerequisite to the preparation of a protein with economic potential from plant seeds. The examples in the following paragraphs report work, in some cases involving iterative construct improvements, aimed at meeting the economic and quality criteria required to proceed towards a commercial product.

Expression and characterization of seed-derived growth hormone

Human growth hormone is a $2.3 billion drug and currently represents the sixth largest market for a protein pharmaceutical. Its major uses are to treat genetic dwarfism in children and cachexia in patients with cancer and patients with AIDS (Hintz, 1996). Human growth hormone has been the subject of significant interest as a follow-on biologic by a number of companies and has been approved as such in both Europe and North America. There is a strong interest in broadening the indications of use and also in applying alternative delivery technology to its administration. Consequently, there is substantial interest in alternative manufacturing systems, which might offer economies as the scale of production is increased.

We investigated the expression of human growth hormone (hGH) in oilseeds using the oilbody targeting approach. Recovery of the fusion protein on the oilbody resulted in a high degree of enrichment for the hGH. We estimate that this liquid–liquid separation removes more than 90% of the host cell proteins, although at this point in the process, all of the oleosins are still part of the oilbody particle that is recovered (Figure 3). Estimates of expression levels for oleosin fusions, using averages of multiple events and construct configurations and counting only the mole fraction of hGH within the fusion protein, were in the range of 0.44%–1.58% of total seed protein, with a composite average of >1% of total seed protein. Interestingly, hGH was a protein that showed significant increases in accumulation when associated with oilbodies rather than exposed to the secretory pathway. When measured by quantitative immunoblotting using an hGH standard curve, apoplastic accumulation was found to be about 0.28% of total seed protein.
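Counting only the hGH portion of an oleosin fusion amounts to scaling the fusion level by the share of the fusion contributed by hGH. The sketch below illustrates that conversion under stated assumptions: the function names are hypothetical, the hGH share is interpreted as a mass ratio, and the masses used (22 125 Da for hGH and roughly 40 kDa for the fusion) are the expected values quoted elsewhere in this article rather than measured inputs to the authors' calculation.

```python
HGH_MASS_DA = 22_125       # expected mass of hGH, as listed in Table 1
FUSION_MASS_DA = 40_000    # approximate predicted Mr of the oleosin-hGH fusion


def payload_mass_fraction(payload_mass_da, fusion_mass_da):
    """Return the fraction of the fusion protein's mass contributed by the payload."""
    if payload_mass_da <= 0 or fusion_mass_da <= 0:
        raise ValueError("masses must be positive")
    if payload_mass_da > fusion_mass_da:
        raise ValueError("payload cannot be heavier than the fusion")
    return payload_mass_da / fusion_mass_da


def payload_percent(fusion_percent, payload_mass_da=HGH_MASS_DA,
                    fusion_mass_da=FUSION_MASS_DA):
    """Convert fusion protein level (% of total seed protein) to payload level."""
    return fusion_percent * payload_mass_fraction(payload_mass_da, fusion_mass_da)


def fusion_percent_from_payload(payload_pct, payload_mass_da=HGH_MASS_DA,
                                fusion_mass_da=FUSION_MASS_DA):
    """Inverse conversion: payload level back to the implied fusion protein level."""
    return payload_pct / payload_mass_fraction(payload_mass_da, fusion_mass_da)


print(f"hGH share of fusion mass: {payload_mass_fraction(HGH_MASS_DA, FUSION_MASS_DA):.3f}")
```

Because the fusion mass is only approximate, any figure derived this way should be read as an estimate; the point is simply that the hGH-equivalent level is a little over half of the fusion protein level.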

Figure 3.

 Expression in transgenic seed and oilbody partitioning of oleosin-hGH fusion protein from construct 4253. (a) Configuration of fusion protein construct 4253 containing the Arabidopsis 18 kDa oleosin fused to the N-terminus of human growth hormone (hGH). The fusion has a predicted Mr of approximately 40 kDa but runs slightly faster on SDS–PAGE. (b) Coomassie-stained SDS gels for total seed protein (left) and oilbody-associated protein (right). NT: nontransformed seed extracts. Arrow indicates the position of the fusion protein. A bracket indicates the position of the native oleosin proteins.

The fusion protein construct was designed with a trypsin-sensitive cleavage site between the oleosin and hGH proteins to enable recovery of the hGH after oilbody separation. Treatment of the purified oilbodies with trypsin followed by oilbody separation resulted in recovery of three products, a primary band with a mobility similar to that of native hGH and two lower molecular weight peptides (Figure 4). Analysis of the primary band by mass spectrometry and N-terminal sequencing confirmed recovery of the expected product, corresponding to a full length hGH with an additional two amino acids on the N-terminus remaining from the cleavage site (Table 1). Analysis of the lower molecular weight products revealed that the intermediate band had the same N-terminal sequence. Data for the lower molecular weight product were inconclusive.

Figure 4.

 Cleavage of oleosin-hGH fusion protein and recovery of hGH from transgenic seed. Coomassie-stained SDS gels showing proteins from purified oilbodies (OB) prior to cleavage, reaction products obtained after digestion with trypsin (CR) and the ‘undernatant’ fraction after separation of oilbodies (UF), compared with an hGH standard (hGH).

Table 1.   Mass spectrometry and N-terminal sequence analysis of cleavage products from an oleosin-hGH fusion protein expressed in seeds of Arabidopsis

Analysis: Mass spectrometry (Da)
Sample                       Expected    Observed
hGH                          22 125      n.d.
AthGH (primary band)         22 273      22 275

Analysis: N-terminal sequencing
Sample                       Sequence
hGH                          FPTIP
AthGH (primary band)         GSFPT
AthGH (intermediate band)    GSFPT
AthGH (lower band)           (inconclusive)

n.d., not determined.

Following partial purification by hydrophobic interaction chromatography, the products were further examined by reducing and nonreducing SDS–PAGE. As shown in Figure 5, the primary band retained a similar mobility to native hGH, while the lower molecular weight bands converted to slower-migrating species, suggesting that they may be internally cleaved, less compact forms of the protein that remain associated through disulphide bonds.

Figure 5.

 Characterization of Arabidopsis-derived hGH cleavage products under reducing and nonreducing conditions. Coomassie stained SDS gels of purified hGH from transgenic Arabidopsis seed (AthGH) separated with (100 mM DTT) or without reduction of disulphide bonds.

Finally, the functionality of the plant-derived, purified hGH was tested in a standard mouse model. The mice used were Ghrhrlit/Ghrhrlit mice (trivial name = ‘little mice’), which are homozygous for a recessive mutation in the growth hormone releasing hormone receptor gene. The mice exhibit a classical dwarf phenotype and do not grow much beyond puberty. However, supplementation with a mammalian growth hormone can restore growth to wild-type rates (Bellini and Bartolini, 1993; Bellini et al., 1998). It is thus a very good quantitative, preclinical model for dwarfism. In this study, it was shown that intraperitoneal injection of hGH from plants provided at least as strong a response as Eli Lilly’s Humatrope®, which is the ‘standard of care’ for dwarfism caused by growth hormone deficiency in children (Figure 6). Overall, these results demonstrate the utility of the oilbody purification process in enabling recovery of the hGH in highly enriched form and the efficacy of the product in vivo. With additional development of the cleavage strategy to obtain a fully authentic protein, seed-based systems could produce a bioequivalent form of hGH that is competitive with existing commercial products.

Figure 6.

 Bioassay of Arabidopsis-derived hGH. Dwarf mice (Ghrhrlit/Ghrhrlit) (10 animals/group) were injected daily for 12 days with purified hGH (25 μg/animal) from either transgenic Arabidopsis seed (AthGH) or a commercial pharmaceutical-grade product, Humatrope® (PhGH). hGH concentration for both forms was determined using a commercial immunofunctional assay. Cumulative weight gain in hGH-treated groups was compared with that of control groups either without injection (unmanipulated) or injected with sterile water. Error bars show the standard error of the mean.

Production of human lysozyme and lactoferrin in rice seeds

Lactoferrin is an iron-binding glycoprotein that belongs to the transferrin family. It is a globular multifunctional protein with a number of possible physiological roles. It is considered an innate defence protein and frequently serves as the first line of defence in protection against pathogens, particularly due to its antimicrobial (bactericidal and fungicidal) and immunomodulatory activities. Lactoferrin is found at high levels in the milk of humans and other mammals, in many mucosal secretions such as tears, saliva, bile, pancreatic juice, genital and nasal secretions, and in circulating neutrophils. Lysozyme is also a natural secreted protein found in most mucosal secretions. It is a powerful hydrolytic enzyme capable of catalyzing hydrolysis of 1,4-beta-linkages between N-acetylmuramic acid and N-acetyl-d-glucosamine residues in a peptidoglycan and between N-acetyl-d-glucosamine residues in chitodextrins. It is active against many Gram-negative and Gram-positive bacteria. Children fed with infant formula, without supplementation, lack lysozyme in their diet and have three times the rate of diarrhoeal disease. For both of these proteins, an inexpensive, nonanimal source would enable their exploitation in a number of oral therapies, including some of major importance in developing countries where infant mortality caused by enteric bacterial diseases (with subsequent dehydration) is endemic.

Rice offers some interesting advantages for recombinant lysozyme and lactoferrin production. It can yield as much as 8 tonnes of seed per hectare in optimized growing conditions. Rice is essentially self-pollinating, and so its mandatory isolation distances are quite small. As with other seeds discussed earlier, it is possible to store recombinant proteins in rice seeds over years, provided the water content is reduced to below 14%. However, rice is not a protein-rich seed. Typically, a rice seed has an average of 8% protein or even less and so, if a new protein is to accumulate at high levels, it will require strong promoters and stabilization of the protein once it is translated and processed in the cell.

Researchers at Ventria Biosciences have investigated several different promoters to express and accumulate lysozyme and lactoferrin in rice seeds (Huang, 2004). Their studies have shown that rice can be a very appropriate host for the expression of these proteins. As briefly mentioned earlier, the rice glutelin 1 promoter proved to be a very good transcriptional driver of high expression, allowing certain lines to accumulate the recombinant protein at up to 40% of total seed protein. Similarly, expression of human lactoferrin, an essential iron-binding protein, was as high as 25% of total seed protein. This would mean that, even at an average yield, about 1% of the total seed weight might be expected to be the desired protein. Before purification, this translates into a very positive yield of protein per hectare. Cost models built to determine overall cost of goods suggest that lactoferrin could be produced in purified form from seed for as little as $6/gram (Nandi et al., 2005).
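The per-hectare arithmetic behind these figures can be sketched as follows. This is a rough, pre-purification illustration that uses only numbers quoted in this section (up to 8 tonnes of seed per hectare, about 8% seed protein, and 1% of seed weight as recombinant protein); the function names are hypothetical, and the calculation ignores extraction and purification losses.

```python
def fraction_of_seed_weight(seed_protein_fraction, target_fraction_of_tsp):
    """Recombinant protein as a fraction of seed weight.

    seed_protein_fraction: protein content of the seed (e.g. 0.08 for rice).
    target_fraction_of_tsp: recombinant protein as a fraction of total
        seed protein (e.g. 0.25 for the best lactoferrin lines).
    """
    return seed_protein_fraction * target_fraction_of_tsp


def recombinant_protein_kg_per_ha(seed_yield_t_per_ha, target_fraction_of_seed):
    """Recombinant protein (kg) contained in the seed harvested from 1 ha."""
    return seed_yield_t_per_ha * 1000.0 * target_fraction_of_seed


# Best reported lines: 25% and 40% of total seed protein in an 8% protein seed
best_lactoferrin = fraction_of_seed_weight(0.08, 0.25)   # about 2% of seed weight
best_line = fraction_of_seed_weight(0.08, 0.40)          # about 3.2% of seed weight

# Conservative 1% of seed weight at the 8 t/ha upper-end yield
conservative_kg = recombinant_protein_kg_per_ha(8.0, 0.01)
print(conservative_kg)
```

Under these assumptions, the 1% figure is conservative relative to the best reported lines and still corresponds to about 80 kg of unpurified recombinant protein per hectare at the upper-end seed yield.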

It is clear that further work on transcriptional control is also likely to increase expression titres. For example, Hennegan et al. (2005) found that hybrid promoters combining elements from wheat puroindoline b gene promoters and the archetypal glutelin-1 promoter from rice gave almost a doubling of accumulation of the target protein, lysozyme, from 5.24 to 9.24 g/kg of rice flour extracted. It is noteworthy that both these products target oral delivery markets, and therefore it is unlikely that plant glycosylation patterns will impede progress of the products to full commercialization.

Expression, purification and characterization of seed-derived insulin

Human insulin is currently the largest-volume recombinant biopharmaceutical in use, with annual production in the range of thirteen tons of protein. Because of the exploding incidence of diabetes in both the developed and developing worlds, the demand for insulin is projected to increase sharply, potentially doubling over the next 10 years. This is particularly troubling in the developing world, where the cost of supplying insulin is so prohibitive that many diabetics do not have access to this life-saving drug. The advent of new delivery technologies such as inhalable, buccal and oral forms, which relieve the need for painful injections and may in some cases offer improved pharmacokinetics, however, also drives up the costs of treatment. This is because these methods generally have lower bioavailability than subcutaneous injection and therefore require higher doses of insulin to achieve the same effect. As a result of these challenges in supply and cost, insulin is a natural target for the advantages of plant-based production.
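The two quantitative pressures described here, a doubling of demand over 10 years and the higher doses needed by delivery routes with lower bioavailability, can be made concrete with a short calculation. The sketch below is illustrative: the function names are hypothetical, the 10% relative bioavailability is an assumed example value rather than a figure from the literature cited here, and the dose relationship assumes a simple inverse proportionality.

```python
def implied_annual_growth(fold_change, years):
    """Constant annual growth rate that gives `fold_change` after `years`."""
    return fold_change ** (1.0 / years) - 1.0


def projected_demand(current_tons, annual_growth, years):
    """Demand after compounding `annual_growth` for `years`."""
    return current_tons * (1.0 + annual_growth) ** years


def dose_multiplier(relative_bioavailability):
    """Dose needed relative to subcutaneous injection, assuming the required
    dose scales inversely with relative bioavailability (a simplification)."""
    if not 0.0 < relative_bioavailability <= 1.0:
        raise ValueError("relative bioavailability must be in (0, 1]")
    return 1.0 / relative_bioavailability


growth = implied_annual_growth(2.0, 10)          # doubling over 10 years
print(f"{growth:.1%}")
print(round(projected_demand(13.0, growth, 10), 1))
print(dose_multiplier(0.10))                     # hypothetical 10% bioavailability
```

A doubling over a decade corresponds to growth of roughly 7% per year, and any route with a fraction of the bioavailability of injection multiplies the amount of protein required per patient accordingly.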

We have demonstrated the feasibility of insulin production in seeds using our model species Arabidopsis thaliana (Markley et al., 2006; Nykiforuk et al., 2006) and have tested a variety of different construct configurations (Figure 7) for their effects on expression. Our initial expression constructs were designed to produce a ‘mini-insulin’ fusion protein similar to that used in yeast systems (Kjeldsen et al., 2001) in which amino acids 1–29 of the insulin B chain are connected to the A chain via a short three-amino acid peptide. The construct also included an N-terminal fusion partner to provide for the in vivo targeting or postextraction capture of the fusion protein on seed oilbodies thereby simplifying and reducing the cost of downstream purification (Van Rooijen and Moloney, 1995). We have previously found that the location of intracellular targeting for recombinant proteins can significantly influence their levels of accumulation. The preferred location can sometimes be predicted from targeting of the native protein. For example, mammalian proteins targeted through the endomembrane system in vivo often accumulate best when similarly targeted in plant cells. However, this relationship is not strictly observed (as in the case for hGH described earlier), and we have found it useful when developing a new candidate to test multiple targeting options in our model system. For this reason, we examined insulin proteins targeted through N-terminal fusions to either the oilbody or the secretory pathway. The oilbody targeted construct (4405) was fused with an Arabidopsis oleosin such that the insulin moiety accumulated on the cytoplasmic face of the organelle. The construct directed to the secretory pathway (4404) was fused with an anti-oleosin single chain antibody (scFv) containing a plant signal peptide. This fusion also contained a C-terminal KDEL sequence for ER retention to enhance accumulation level (Fiedler et al., 1997; Twyman et al., 2005). 
As shown in Figure 8, the insulin accumulated to significant levels in both of these configurations with averages of approximately 0.13% and 0.24% of total seed protein (calculated from the mole fraction of insulin in the fusion protein) for oilbody and ER versions, respectively. As expected, both configurations also showed partitioning with the oilbody fraction during centrifugal separation. In addition to providing for oilbody capture, we also found that the scFv fusion partner of the ER-retained version greatly enhanced the level of accumulation over versions that did not contain a fusion partner (not shown) probably through stabilizing the protein against degradation.

Figure 7.

 Configuration of insulin expression constructs tested in Arabidopsis. Construct 4405 was targeted to oilbodies through an N-terminal protein fusion (FP) with the Arabidopsis 18 kD oleosin and contained a truncated insulin B chain lacking the B30T attached to the insulin A chain via a mini-connecting peptide (mC, 3–8 amino acids depending on the construct). Constructs 4404, 4445 and 4501 were targeted for retention in the ER through an N-terminal signal peptide (not shown) fused to an anti-oleosin single chain antibody (scFvD9 or scFv2F5) and a C-terminal ER retention peptide (ERP) on the insulin A chain. These constructs contained either a truncated (4404) or full-length (4445, 4501) insulin B chain attached to the A chain via a mini-connecting peptide or full-length human C-peptide (C) flanked by monobasic amino acid cleavage sites.

Figure 8.

 Expression in transgenic seed and oilbody partitioning of mini-insulin fusion protein. Coomassie-stained SDS gels for total seed protein (left) and oilbody-associated protein (right) from three transgenic lines of (a) oilbody-targeted oleosin fusion construct 4405 (4, 13, 19) and (b) ER-retained scFv fusion construct 4404 (2, 17, 20). See text for construct descriptions. NT: nontransformed seed extracts. An arrow indicates the position of the fusion protein and a bracket indicates the positions of the native oleosin proteins.

In addition to examining the effects of fusion partner and intracellular targeting on expression, variations on connecting peptide sequences and the mini-insulin itself were also tested. These comprised modifications to the cleavable sequences connecting the fusion partner to mini-insulin and the C-terminal peptide containing the ER retention sequence, changing the length and composition of the peptide connecting the B and A chains (including use of the native human C-peptide) and addition of the B30 threonine (B30T) residue found on the B chain of the native protein. Of these changes, the most dramatic effects were obtained for constructs incorporating the B30T (Figure 9). It was observed that plants expressing ER-retained versions of these constructs (e.g. 4445) exhibited substantial increases in insulin accumulation over versions without this modification (e.g. 4404) with levels in many cases exceeding 1% of total seed protein. However, surprisingly it was found that the protein was deposited in the form of an aggregate within the cell. Unlike configurations without the B30T, the fusion protein from these plants did not partition with oilbodies but instead formed a pellet upon centrifugation somewhat similar to the behaviour of inclusion bodies often seen with bacterial systems. When comparing reducing and nonreducing SDS–PAGE, the fusion protein was determined to be highly cross-linked via intermolecular disulphide bonds as seen by the disappearance of the monomeric band on stained gels and appearance of higher molecular weight bands on immunoblots under nonreducing conditions (Figure 9). The effect was even more pronounced in safflower which with the previous configurations had shown much lower relative levels of expression than Arabidopsis (not shown). With introduction of the B30T constructs, the levels of accumulation and partitioning behaviour in both species were quite similar. 
While these results clearly indicate a change in the folding properties of the protein, the reason that addition of the B30T (present in the native sequence) should have this effect is not obvious and is currently under investigation.

Figure 9.

 Comparison of expression and aggregation of ER-retained scFv-mini-insulin fusion proteins with and without the C-terminal threonine residue on the insulin B chain (B30T). Coomassie-stained SDS gels (left) and anti-insulin immunoblots (right) of total seed protein from several transgenic lines of construct 4445 (with B30T) compared to that of construct 4404 (without B30T) with protein separated under (a) reducing conditions (100 mM DTT in sample) or (b) nonreducing conditions (without DTT). See text for construct descriptions. NT: nontransformed seed extracts. Arrow(s) indicates the position of the fusion protein and apparent aggregates.

In addition to purification from host cell proteins and metabolites, the manufacture of recombinant human insulin requires postextraction enzymatic processing to convert the single chain insulin fusion protein into the mature two-chain product (Figure 10a). Commercial manufacture of recombinant insulin is currently performed in one of two microbial systems, bacterial (Escherichia coli) or yeast (S. cerevisiae). In the bacterial system, the native human sequence, including the C-peptide linking B and A chains, is fused at the N-terminus to a tryptophan synthetase protein (Chance and Frank, 1993). The protein accumulates as an inclusion body that is recovered, folded in vitro and processed to remove the C-peptide. The yeast system follows a different processing scheme, wherein the insulin is produced as a mini-insulin as described earlier, fused at the N-terminus to a secretory leader peptide (Kjeldsen et al., 2001). The insulin fusion folds in vivo and is secreted from the cell. It is then processed in vitro passing through a DesB30 insulin intermediate that is coupled with threonine in a reverse hydrolysis reaction to yield an authentic product. Using similar types of processes, we have demonstrated that either configuration of plant-produced insulin (with and without B30T) can be processed to yield a fully functional product. The processing schemes for the two configurations (Figure 10b) differ in that the B30T version does not initially partition with seed oilbodies and requires in vitro folding. This however must be balanced against the higher expression levels achieved with this configuration. Both versions currently pass through the DesB30 insulin intermediate stage but preliminary experiments have indicated that with some refinement to the cleavage sites around the C-peptide junctions, this is not necessary for the B30T version. This is supported by data from the E. coli process showing that generation of DesB30 insulin can be largely prevented (Frank et al., 1995). Further process development is required to determine which of the two processing options is most economical for plant-produced insulin.

Figure 10.

 Processing and purification of human insulin. (a) Comparison of the processing steps for the maturation of insulin that occur in human beta cells and different recombinant production systems, E. coli, yeast and seeds (OB: oilbody-targeted, ER: ER-retained). The diagram shows the various elements associated with each system: signal peptide (SP), N-terminal fusion partner (FP), insulin B chain (B), insulin C or connecting peptide (C), insulin A chain (A) and ER retention peptide (ERP). Solid lines above the diagram show insulin intra- and interchain disulphide bridges. Arrows indicate enzymatic cleavage sites between the elements and a dotted line indicates the B29 position where cleavage occurs in insulin precursors from yeast and seeds. Below the diagram, the enzymes (or chemical) involved in processing for each system are indicated. In all systems where it is present (beta cells, yeast and seed-ER), the signal peptide is removed intracellularly by signal peptidase (S). For recombinant systems, all subsequent processing is performed postextraction. In beta cells, the proinsulin is processed via the action of prohormone convertase 1 (also known as prohormone convertase 3, PC1/3) and prohormone convertase 2 (PC2), together with carboxypeptidase H (CPH). In E. coli, the fusion partner is removed by chemical cleavage with cyanogen bromide (CNBr) with subsequent processing by trypsin (T) and carboxypeptidase B (CPB). Conversion in yeast can be performed with either trypsin or endoproteinase Lys-C (L) to remove the fusion partner and connecting peptide, with the same enzyme(s) also used to add on the B30T residue. The process in seed is similar to that in yeast with the additional use of endoproteinase Lys-C and carboxypeptidase B to complete removal of the ER-retention peptide. (b) Schematic flow chart depicting the primary steps in the production of human insulin from Arabidopsis for fusion proteins with (left) and without (right) B30T.

Using standard physical-chemical methods, we have characterized the insulin produced from both of the processes outlined in Figure 10b and demonstrated their functionality in biological assays. Mini-insulin versions without B30T fused to either oleosin (4405) or an anti-oleosin scFv (4404) were separated with the oilbody fraction by flotation centrifugation, cleaved in vitro with trypsin and partially purified by high performance liquid chromatography (HPLC). The resulting products were then analysed by mass spectrometry (Table 2). The mass of the product obtained from the oleosin fusion protein corresponded with the expected mass for DesB30 human insulin (later confirmed with an actual DesB30 insulin standard). The product obtained from the scFv fusion corresponded to the predicted mass of a DesB30 insulin with an additional KDEL indicating that processing was not able to remove the ER retention peptide. Through modifying the sequence around the cleavage site, we were subsequently able to achieve removal of this peptide in later construct variants (e.g. 4501, Table 2). Because DesB30 insulin is functional (Moody et al., 1974), it was also possible to test the plant-derived insulin for biological activity. The product from the oleosin fusion construct was tested for activity both in vitro and in vivo. For the in vitro assay, HepG2 (liver) cells were incubated with insulin and activation of the insulin receptor determined through probing immunoblots with an antiphosphotyrosine antibody. Comparable activation was obtained with both plant-derived and standard human insulins (Figure 11a). Pharmacodynamic response was measured using an insulin tolerance test in mice in which the effect on blood glucose level is measured over time. As shown in Figure 11b, similar profiles were obtained when animals were injected with either plant-derived (AthIns), reagent-grade (RhIns) or pharmaceutical-grade (PhIns) insulins.

Table 2.   Mass spectrometry analysis of cleavage products from insulin fusion proteins expressed in seeds of Arabidopsis

Construct*    Product                      Expected mass    Observed mass
-             Human insulin                5807.6           5807.8
-             Human insulin DesB30         5706.5           5705.6
4405          Human insulin DesB30         5706.5           5706.3
4404          Human insulin DesB30-KDEL    6192.0           6191.5
4501          Human insulin DesB30         5706.5           5706.3

*See text for details.

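The expected masses in Table 2 can be cross-checked from the expected mass of human insulin and standard average residue masses. The sketch below is an illustration, not the authors' calculation: it assumes that removing the C-terminal B30 threonine lowers the mass by one Thr residue and that a retained KDEL tetrapeptide adds the masses of its four residues. The helper names are hypothetical, and the residue masses are standard average (not monoisotopic) values.

```python
# Standard average residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "T": 101.1051,  # threonine
    "K": 128.1741,  # lysine
    "D": 115.0886,  # aspartate
    "E": 129.1155,  # glutamate
    "L": 113.1594,  # leucine
}

HUMAN_INSULIN_EXPECTED = 5807.6  # expected mass from Table 2


def residues_mass(sequence):
    """Summed average residue mass of a one-letter peptide sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def deviation_ppm(expected, observed):
    """Signed deviation of observed from expected mass, in parts per million."""
    return (observed - expected) / expected * 1e6


# DesB30 insulin: human insulin lacking the B30 threonine
desb30 = HUMAN_INSULIN_EXPECTED - residues_mass("T")
# DesB30 insulin still carrying the KDEL ER retention peptide (construct 4404)
desb30_kdel = desb30 + residues_mass("KDEL")

print(round(desb30, 1), round(desb30_kdel, 1))
print(round(deviation_ppm(6192.0, 6191.5), 1))
```

Under these assumptions, the computed values agree with the expected masses listed for DesB30 insulin and for the KDEL-containing product from construct 4404, which is consistent with the interpretation that the ER retention peptide was not removed in that construct.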
Figure 11.

 Biological characterization of insulin produced from oleosin-mini-insulin fusion protein construct 4405. (a) Insulin receptor activation comparing responses obtained with Arabidopsis-derived DesB30 insulin (4405) and a reagent-grade insulin standard (RhIns). Receptor activation was determined using antiphosphotyrosine antibodies to probe a Western blot of protein extracted from HepG2 cells stimulated with insulin at different concentrations (nM). NT: extract from nontransgenic Arabidopsis seeds used at equivalent volume to the 10 nM extract from transgenic seed. (b) Pharmacodynamic response in mice. Animals (n = 15) were injected intraperitoneally (1 U/kg body weight) in successive tests with each of several different insulins: pharmaceutical-grade insulin, Humulin®R (PhIns), reagent-grade insulin (RhIns) and Arabidopsis-derived DesB30 insulin from 4405 (AthIns) or control samples (NT: nontransgenic seed extract; saline) and change in blood glucose measured over time. Error bars show the standard error of the mean.

Insulin expressed with the B30T and native C-peptide (4501), containing an N-terminal scFv fusion and ER retention peptide, was recovered from the pellet fraction following centrifugation of seed extracts, folded in vitro and processed to yield a DesB30 insulin product. As the sequence for this construct included the modifications to the ER retention peptide described earlier, we were able to remove the peptide as part of the processing, resulting in a product with a mass identical to that of DesB30 insulin (Table 2). This product was also analyzed using a diagnostic peptide digest assay (Figure 12a). Results from this test confirmed the correct formation of all three disulphide bonds. Biological activity of the product in vitro was verified using the HepG2 receptor activation assay as described earlier (Figure 12b).

Figure 12.

 Chemical and biological characterization of insulin produced from scFv-proinsulin fusion protein construct 4501. (a) V8 peptide digest of folded insulin. Digests of a reagent grade DesB30 human insulin standard (RhIns) and Arabidopsis-derived DesB30 insulin from fusion protein construct 4501 (AthIns) were separated by RP-HPLC. Peak retention times (RT) for fragments I, II and III (FI, FII, FIII) are diagnostic for correct disulphide bond formation. (b) Insulin receptor activation comparing responses obtained with Arabidopsis-derived DesB30 insulin (4501) and a pharmaceutical-grade insulin standard, Humulin®R (PhIns). Receptor activation was determined using antiphosphotyrosine antibodies as described earlier.

This work demonstrates that plant-based systems are capable of producing fully functional human insulin. Furthermore, plants appear to provide flexibility in the way in which insulin accumulates and, through this, flexibility in the manufacturing process. We have since produced a fully authentic human insulin in our manufacturing host, safflower, and have gone on to demonstrate the bioequivalence of this product to commercial insulin in a human clinical trial (Boothe et al., 2009). We are continuing to develop the manufacturing process for plant-made insulin to determine the most cost-effective strategy. If successful, plants may offer a high capacity, economical system to address the anticipated issues of supply and cost for this product.

Scale-up of seed-based production systems

Overview and regulatory framework

Along with reduced costs, scalability is among the advantages most often cited for plant-based production systems (Flinn and Zavon, 2004; EMBO, 2005; Boehm, 2007; Liénard et al., 2007). In examining the issues around the scale-up of plant-made pharmaceuticals (PMPs), it is useful to subdivide the process into its primary components, namely field production, upstream extraction and recovery (defined here as steps prior to any chromatography), and downstream purification. In addition to the functional aspects of scaling up individual unit operations, all three of these components also need to be considered in the context of the regulatory requirements for the manufacture of medicinal products. Both US (FDA) and European (EMEA) regulatory authorities have recently published guidance documents on the production of therapeutic proteins from genetically engineered plants, providing a framework for manufacturing in these systems (FDA, 2002; EMEA, 2008). The documents cover all aspects of the process from generation of the bioengineered plant through propagation and final purification. As the part of the process that differs most from that of conventional cell-based systems, production of the plant material used for manufacturing is given considerable attention. Importantly, specifications for final product quality are the same as those for conventional systems, and sponsors are directed to a common set of guidelines for further detail (e.g. ICH guidelines) on these requirements. A good review of the US guidance has recently been published in a series of two articles by Berberich and Devine (2005a,b) that include a discussion of the issues together with the roles and responsibilities of the different agencies involved in regulating these activities.

Field production

For large scale production of transgenic seed crops, outdoor growth is realistically the only option. While glasshouse production may be feasible for certain leaf-based systems that employ transient expression and have a rapid cycling time, the generation time and space requirements for these operations would severely diminish any cost advantages enjoyed by seed crops. However, in terms of the types of equipment used and general agronomic practices, there are few if any differences in the requirements for the growth and harvest of transgenic and commodity crops. In this sense, the technology fulfils expectations for virtually unlimited capacity and economical production and clearly provides the lowest cost for a starting material of any recombinant production system. In comparison with more conventional cell-based systems in which economies of scale are limited by the maximum size of a reactor, crop production can be scaled more or less continuously to levels that far exceed the requirements for even the largest volume biopharmaceuticals. The storage stability of recombinant proteins in seeds additionally enables crop production to be decoupled from purification allowing for greater flexibility in manufacturing and inventory management. As discussed earlier, data from transgenic corn have shown that antibodies remain stable for periods of at least 2 years when the seed is stored under cool, dry conditions (Baez et al., 2000). This is consistent with data we have obtained for recombinant hirudin (Boothe et al., 1997) and chymosin (unpublished) expressed in transgenic safflower.

Where PMP production does differ significantly from that of nontransgenic crops and even those carrying transgenes for agronomic or ‘input’ traits is in the level of control that is required over field production and handling of the transgenic seed. The issues here are of both quality and containment. As the raw material for the manufacture of a medicinal product, it is essential that PMP crops deliver seed of consistently high quality in terms of transgene expression and impurity profile. Expression levels and product heterogeneity (e.g. post-translational modification, degradation, etc.) must be maintained within a relatively narrow range across generations, growing locations and environments. In practical terms, the acceptable level of variation will be determined by the robustness of the manufacturing process, but plants must be able to meet standards for product quality comparable to those of established systems if they are to be a viable alternative. This is similarly true for host and process impurities and contaminants. Achieving this requires that the progenitor seed be relatively free of disease and seeds of other species prior to planting and that a rigorous program of crop management is followed thereafter, including control of critical growth parameters, diseases and pests. Master and working seed banks (MSB and WSB, respectively) are developed to ensure consistency of the progenitor seed used for each production lot growout. As with Master Cell Banks in conventional fermentation systems, the MSB is characterized to verify that it meets requirements for genotypic and phenotypic stability over the maximum number of growth cycles (generations) between the MSB and production lot. The presence of foreign seed species is of particular concern in banked material because of its ability to propagate and form a major contaminant in the manufacturing process. Therefore, all seed banks are also required to pass Quality Control standards for purity attributes.

Issues of containment revolve mainly around environmental concerns over spread of the transgene and potential public safety hazards arising from adulteration of the food supply (FDA, 2002; Flinn and Zavon, 2004; Berberich and Devine, 2005b; EMEA, 2008). A variety of mechanisms are available to mitigate the risks of transgene escape, including physical containment in glasshouses or caves, segregation of the transgenic crop spatially or temporally and the use of various gene flow restriction technologies (EMBO Reports, 2005; Basaran and Rodríguez-Cerezo, 2008). As discussed earlier, physical containment strategies may be impractical or at least uneconomical for most seed-based systems. The use of genetic/biological restriction strategies such as male sterility or organellar expression could potentially be very effective in species amenable to this approach. For many species, however, segregation may be the only practical method available. In these cases, it is important that growouts are performed at distances sufficient to isolate the plants from any food production areas for that crop and from any close relatives that could participate in interspecific crosses and thereby spread the gene. This can be accomplished through selection of a non-food or low-acreage crop that is not native to the area. The use of a species that is largely or completely self-pollinating can provide an additional safeguard against transgene escape.

The most likely cause of food supply adulteration is through the inadvertent mixing of seed from a PMP crop and food crop of the same species. To avoid this, it is imperative that companies engaged in PMP production maintain a tight chain-of-custody over their transgenic seed. For this reason, it is highly unlikely that PMP crops will ever be approved for unconfined release (Berberich and Devine, 2005a; EMBO, 2005). Control over all aspects of seed handling from planting, through harvest, shipping and storage will be necessary. This, along with other measures to prevent loss or cross contamination of seed such as the use of dedicated equipment, development of validated cleaning procedures and postharvest monitoring of production sites will be required to gain public acceptance of this technology.

Despite the high level of control required for PMP production, the economics appear to remain favourable. Estimates from one company, Meristem Therapeutics, are that PMP production in corn would increase costs by about threefold over equivalent production for food use and contribute <$1/g to the total cost of manufacturing (Mison and Curling, 2000). This threefold increase over commodity production costs is also in line with an independent estimate based on a detailed breakdown of individual costs for implementation of ‘GMP’ field production of transgenic corn (Crosby, 2003). In contrast, replacing traditional fermentation with field production is expected to result in dramatic savings in the costs of building and operating manufacturing facilities. Estimates have placed potential reduction in capital and manufacturing costs in the range of 75%–80% and 50%–60%, respectively (DePalma, 2003).
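As a rough illustration of the arithmetic behind these estimates, the following Python sketch applies the quoted reduction ranges (75%–80% for capital costs, 50%–60% for manufacturing costs) to baseline figures. The baseline values and the function name `reduced_cost_range` are hypothetical, chosen only for illustration, and are not taken from DePalma (2003).

```python
def reduced_cost_range(baseline, low_pct, high_pct):
    """Return (lowest, highest) remaining cost after a percentage reduction range."""
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= low_pct <= high_pct <= 100")
    # The largest reduction gives the lowest remaining cost, and vice versa.
    lowest = baseline * (100 - high_pct) / 100
    highest = baseline * (100 - low_pct) / 100
    return lowest, highest


# Hypothetical baselines for a conventional fermentation facility
# (arbitrary currency units; assumptions, not published figures).
capital_baseline = 100.0
manufacturing_baseline = 40.0

capital_range = reduced_cost_range(capital_baseline, 75, 80)
manufacturing_range = reduced_cost_range(manufacturing_baseline, 50, 60)

print("capital cost after reduction:", capital_range)
print("manufacturing cost after reduction:", manufacturing_range)
```

Working in whole percentages rather than fractions keeps the example free of floating-point rounding noise for these inputs.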

Upstream extraction and recovery

There are few if any reports describing detailed large-scale processes for the manufacture of recombinant proteins from transgenic plants. This is perhaps not surprising given the relatively small number of companies involved, the early stage of technological development and the proprietary nature of most manufacturing processes. There are, however, a number of publications where the types of operations are discussed and, in some cases, where models of these processes are presented (Evangelista et al., 1998; Baez et al., 2000; Menkhaus et al., 2004). From these, it is apparent that much of the technology for the extraction and recovery of recombinant proteins from seeds either already exists or, with slight modification, can be adapted from methods and equipment currently used in the agricultural food-processing industry. The initial step in the process usually involves milling of the tissue in either a dry state or together with an aqueous extraction buffer. In some cases, this may be preceded by a dehulling step or separation of endosperm and germ (corn) to remove a portion of the biomass that is largely devoid of the recombinant protein. If a dry milling process is employed, it is usually followed by a separate aqueous extraction step. The method of choice may be dependent on seed type, target protein and nature of the process. For an undisclosed protein expressed in corn, it was found that wet milling resulted in product losses and so the two-step dry milling and extraction process was preferable (Menkhaus et al., 2004). In contrast, our safflower oilbody process (Deckers et al., 2000) performs best using a wet milling step, which preserves the oilbody structure. Extraction of the tissue is generally followed by some form of filtration or centrifugation to remove solids, which can represent 5%–30% of the total mass of the slurry.
At this stage in most seed-based processes, it is desirable to perform a concentration step to reduce volumes before advancing into chromatography. The safflower oilbody process differs here in that the extract consists of an emulsion in which the recombinant protein is bound to seed oilbodies. The oilbodies and bound protein are separated from the majority of endogenous seed protein through a series of flotation–centrifugation steps. The washed oilbody fraction is then treated to either elute or cleave off the recombinant protein, and the oilbodies are removed. The recombinant protein is recovered in a highly enriched form, reducing the number of downstream chromatography steps required. We have successfully scaled up this process to handle multiple tons of seeds.

With respect to regulatory considerations for this part of the process, the guidelines (FDA, 2002; EMEA, 2008) recognize that seeds from plants grown in open fields will necessarily carry a certain level of bioburden and other contaminants. They further recognize that, because of the nature of the upstream extraction and recovery steps, it will not generally be possible to fully enclose this part of the process. They require, however, that all processing occur under controlled conditions designed to minimize the introduction of contaminants and reduce the level of bioburden as the material moves through the process.

The primary factor affecting costs through these initial steps is process volume. This in turn is a function of recombinant protein expression level and overall process yield. As expression levels and yields decrease, process volumes go up resulting in increases in the size of equipment, manufacturing space requirements and raw material usage. The impact that this has in raising manufacturing costs can easily offset savings associated with plant-based production. Achieving sufficiently high expression levels and yields therefore remains a significant challenge.
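The dependence of process volume on expression level and yield can be sketched with a simple mass balance. The Python example below is a minimal illustration, not a model of any actual process. The function names, the buffer-to-seed ratio of 5 L/kg, the 50% overall yield and the expression levels are all hypothetical assumptions.

```python
def seed_mass_required(target_protein_g, expression_g_per_kg_seed, overall_yield):
    """Return the seed mass (kg) needed to recover target_protein_g of purified protein."""
    if expression_g_per_kg_seed <= 0:
        raise ValueError("expression level must be positive")
    if not 0 < overall_yield <= 1:
        raise ValueError("overall_yield must be in (0, 1]")
    # Recoverable protein per kg of seed = expression level x fraction surviving the process.
    return target_protein_g / (expression_g_per_kg_seed * overall_yield)


def extract_volume_l(seed_kg, buffer_l_per_kg=5.0):
    """Return extraction volume (L), assuming a fixed, hypothetical buffer-to-seed ratio."""
    return seed_kg * buffer_l_per_kg


# Hypothetical scenario: 1 kg (1000 g) of purified protein at 50% overall yield.
for expression in (2.0, 1.0, 0.5):  # g protein per kg seed (assumed values)
    seed_kg = seed_mass_required(1000.0, expression, 0.5)
    print(expression, seed_kg, extract_volume_l(seed_kg))
```

Under these assumptions, each halving of the expression level or of the overall yield doubles both the seed mass and the extraction volume, which is the scaling the paragraph above describes.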

Downstream purification

As seed-based processes progress downstream beyond extraction and into purification, they begin to resemble more closely those of conventional systems. The major steps employed in this phase are various forms of chromatography and filtration. Each species and process will have its own unique composition with respect to host proteins, other cell metabolites and process-derived impurities that need to be removed. However, this is also true of microbial and mammalian cells, and there is no inherent reason to believe that purification from plant sources should be more difficult than for these systems. The reports available to date would seem to support this supposition (Evangelista et al., 1998; Kusnadi et al., 1998; Baez et al., 2000; Mison and Curling, 2000).

From the standpoint of final product quality, the regulatory guidance is clear in stating that products derived from plant sources will be held to the same standards as those from other systems. Specific mention is made of the need to address impurities and contaminants arising from field production along with those that may occur naturally in the host tissue. These include, for example, any pesticides used in production, heavy metals that may accumulate in the tissue and naturally produced or fungal-derived toxicants such as aflatoxin. To establish product purity and quality, it will be necessary to develop the requisite suite of analytical assays to demonstrate removal of these compounds in addition to those for host cell proteins and DNA. It is generally recognized that plant viruses are not pathogenic in mammals, and so these are not a concern, but it may also be necessary to demonstrate the capacity of the process to remove any adventitiously introduced viruses (e.g. from exposure to animal faeces, etc.) as part of final validation. A few PMP products from seed-based sources have now been successfully tested in clinical trials (Baez et al., 2000; Anonymous, 2003; Boothe et al., 2009) establishing the fidelity of biosynthesis in these systems and demonstrating that standards of purification and product quality equivalent to those of conventional systems can be met.

Costs in this stage of the process are largely driven by the ratio of recombinant protein to native protein and the complexity of the mixture. High proportions of nontarget protein compete for binding capacity during chromatography, requiring increased quantities of media and buffers and larger column sizes. Typically, this has a greater impact in the earlier stages of purification because the proportion of the target protein increases with each step. More complex mixtures of proteins may necessitate additional steps to achieve the required level of purity. As no plant-derived products have yet reached the market, the economics of commercial manufacturing are still unproven. However, cost models appear to support the value proposition of the technology, offering encouragement for continued development (Mison and Curling, 2000).
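The effect of nontarget protein on chromatography scale can be illustrated with a simplified calculation. The Python sketch below assumes, as a worst case, that all protein in the feed competes equally for binding capacity and that no target protein is lost between steps. The function name, the binding capacity of 20 g/L and the purity values at each step are hypothetical.

```python
def resin_volume_l(target_g, target_fraction, capacity_g_per_l):
    """Return resin volume (L) needed to bind a feed containing target_g of target protein.

    target_fraction is the target's share of total protein in the feed.
    Assumes every protein in the feed occupies binding capacity (worst case).
    """
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    if capacity_g_per_l <= 0:
        raise ValueError("capacity must be positive")
    total_protein_g = target_g / target_fraction
    return total_protein_g / capacity_g_per_l


# Hypothetical purity of the feed entering three successive chromatography steps.
feed_purities = (0.05, 0.5, 0.9)
volumes = [resin_volume_l(100.0, purity, 20.0) for purity in feed_purities]
for purity, volume in zip(feed_purities, volumes):
    print(f"feed purity {purity:.0%}: about {volume:.1f} L resin")
```

In this sketch the first step needs by far the largest column, consistent with the observation that the impact is greatest early in purification, when the target makes up the smallest share of total protein.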

Concluding remarks

The proportion of biologics among new drugs in development is steadily increasing. At the same time, the costs of new drug therapies are placing an increasing burden on the health care systems of developed nations. In the developing world, costs and capacity constraints of conventional production systems put these treatments out of reach for much of the population. Plant-based production systems offer an alternative to microbial and mammalian cell hosts to help address these challenges. Because of the scale and economics of agricultural production, even very high-volume proteins can be produced at reasonable costs. Although seed-based systems are somewhat slower than transient expression in leaves in providing initial quantities of material for clinical development, it is clear that they are well adapted to large-volume applications such as insulin, many therapeutic antibodies and other proteins required in high doses or for chronic use. The case studies discussed in this review demonstrate that seeds are able to produce a variety of therapeutic and related proteins at high levels. Moreover, they show that these proteins are functionally equivalent to their native forms or recombinant versions produced in other systems. With establishment of a regulatory framework, the path for bringing these products forward is becoming clearer. Issues of control and containment of field-grown transgenic crops can be addressed with proper management, and current assessments indicate that this can be accomplished without sacrificing the economic advantages of conventional cultivation. While the total number is still relatively small, more products from seed-based systems have advanced into clinical trials than from other plant-based platforms. The available results demonstrate that these products can be made to meet existing standards of quality for protein pharmaceuticals and perform equivalently in clinical trials.
Completion of clinical development to verify safety and efficacy, and demonstration of the putative cost advantages are among the final steps remaining towards realizing the enormous potential of seed-based production systems. It is likely that these steps will be completed for one or more products within the next few years.

Acknowledgements

The authors acknowledge the expert technical assistance of Jessica Montague and Jennifer Barrow for Arabidopsis transformation experiments.
