The PCR scheme to assemble full-length synthetic ORFs
All synthetic gene construction technologies produce final, full-length products by hybridization-mediated assembly of a collection of individual oligonucleotides with overlapping ends (Heyneker et al. 1976; Goeddel et al. 1979; Sekiya et al. 1979). More recently developed techniques couple the assembly step with gene amplification (Dillon and Rosen 1990; Stemmer et al. 1995). The critical factor in selecting an assembly scheme suitable for automation is neither the maximum length that can be achieved in a one-step reaction nor the minimization of iterative rounds of assembly (although the latter is desirable to avoid introducing polymerase-mediated replication errors), but robustness: the reactions must always work under one set of defined conditions for all assemblies.
We found coupled PCR/ligation-based gene assembly (Strizhov et al. 1996; Seyfang and Jin 2004) to be more costly, less efficient, and much less robust than typical PCR-only methods. Classical PCR-based assembly schemes (Dillon and Rosen 1990; Stemmer et al. 1995) are often sensitive to small changes in reaction conditions and can show an unpredictable dependence on sequence (Lin et al. 2002; Gao et al. 2003; Young and Dong 2004). We found that the inside-out nucleation (ION) scheme (Gao et al. 2003) is highly robust. In this scheme DNA fragments are assembled from nested pairs of oligonucleotides, the innermost of which nucleates the reaction (Fig. 2A). In our implementation of the ION-PCR scheme, we use individual oligonucleotides 50–60 bases in length with 25-bp overlaps. The 60-base limit is imposed by current constraints of commercial manufacturing (cost per base at that length; the avoidance of oligonucleotide purification). We anticipate that these constraints will change over time. In general, it is advantageous to use the longest oligonucleotides that remain cost effective, because increasing oligonucleotide length reduces the complexity of the PCR assembly scheme.
Figure 2. Synthetic ORF assembly scheme. Lengths, locations, and orientations of oligonucleotides (half arrows) and fragments (double arrows) are shown to scale for (A) the ION-PCR assembly and (B) the SOE-PCR to assemble a full-length gene in this three-fragment scheme. Letters “s” and “a” indicate sense or antisense oligonucleotides, respectively; “A,” “B,” and “C” indicate subfragment identity.
Under these conditions ION-PCR robustly assembles fragments up to 375 or 445 bp in length, using five or six nested pairs, respectively. Although longer fragments can be assembled by ION-PCR using more nested pairs, we find that the schemes become sensitive to DNA sequence at ≥6 oligonucleotide pairs (see Electronic Supplemental Material), and require “thermal balancing” (approximate matching of the stabilities of the oligonucleotide overlap regions; Hoover and Lubkowski 2002). Assembly of long fragments that exceed six nested pairs (450-bp limit, using our currently selected oligonucleotide and overlap lengths) decreases process robustness and is therefore not suitable for automation. Based on these observations, we opted to develop a two-step assembly scheme, in which subfragments are generated by ION-PCR in a primary round of reactions, which are then combined in a second round of PCR reactions using splice overlap extension (SOE, Fig. 2B) to assemble full-length synthetic ORFs. Although this approach increases the number of steps, which is not desirable for handcrafted gene construction, it fulfills the robustness criterion essential for automation. Furthermore, we found that it is not necessary to purify the primary reaction products of the ION-PCR: mixing 100-fold dilutions of unpurified primary reaction fragments with flanking primers permits robust assembly of full-length fragments in a secondary SOE-PCR assembly. We have been able to routinely assemble full-length ORFs up to 1.3 kb from four fragments (Fig. 3).
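The fragment lengths quoted above follow directly from the oligonucleotide geometry: the first oligonucleotide contributes its full length, and each additional one extends the fragment by its length minus the overlap. A minimal Python sketch, assuming every oligonucleotide has the full 60-base length and every overlap is 25 bp (the function name is illustrative, not part of GeneFab):

```python
def ion_fragment_length(n_pairs, oligo_len=60, overlap=25):
    """Length (bp) of a fragment tiled by 2 * n_pairs overlapping oligonucleotides."""
    n_oligos = 2 * n_pairs
    # The first oligonucleotide contributes its full length; each further
    # one adds only the bases that extend beyond the shared overlap.
    return oligo_len + (n_oligos - 1) * (oligo_len - overlap)


for pairs in (5, 6):
    print(pairs, ion_fragment_length(pairs))  # 5 -> 375, 6 -> 445
```

With these parameters the function reproduces the 375-bp and 445-bp limits stated for five and six nested pairs.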
Figure 3. Full-length assembly for genes of varying lengths. A, ankyrin binding protein (Binz et al. 2004), 519-bp ORF (two primary ION-PCR fragments are combined into the full-length gene by SOE-PCR). G, glucose binding protein from T. maritima, 927-bp ORF (three primary fragments). M, maltose binding protein from E. coli, 1125-bp ORF (four primary fragments).
The ION-PCR and SOE-PCR reaction conditions were optimized to further increase the robustness of the assembly method. Factors that were tested include choice of DNA polymerase and buffer, total and individual oligonucleotide concentrations, oligonucleotide purity, annealing temperature, and number of thermocycling steps (Electronic Supplemental Material). We, and others, have found that the DNA polymerase choice is a critical determinant for robust assembly (Wu et al. 2006), with KOD polymerase derived from Thermococcus kodakaraensis being the most successful (Takagi et al. 1997). Use of a concentration gradient for the ION pair oligonucleotides is also critical. Best results are obtained with a low inner pair concentration, and the others increasing in a geometric progression:
where [Pi] is the oligonucleotide concentration of the combined ith pair of primers (sense and anti-sense), [P]T is the chosen total oligonucleotide concentration, n is the total number of primer pairs, and c is a constant. Robust assemblies are observed for 0.65 ≤ c ≤ 0.75 and [P]T = 600 nM. Under these conditions, the reactions are not very sensitive to the annealing temperature (56°C < Tm < 64°C) or number of thermal cycling steps (N > 12).
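The expression for the gradient is not reproduced in this text, so the following Python sketch shows only one normalization that is consistent with the definitions given: successive pairs differ by the constant factor c, the innermost pair has the lowest concentration, and the pair concentrations sum to [P]T. The indexing (outermost pair first) and the function name are assumptions made for illustration.

```python
def ion_pair_concentrations(n_pairs, total_nM=600.0, c=0.7):
    """Pair concentrations (nM), ordered outermost (index 0) to innermost.

    Assumption: each pair is c times the concentration of the pair just
    outside it, and all pairs together sum to total_nM.
    """
    if n_pairs < 1:
        raise ValueError("need at least one primer pair")
    if not 0 < c < 1:
        raise ValueError("c must lie between 0 and 1 for a decreasing gradient")
    weights = [c ** i for i in range(n_pairs)]  # 1, c, c^2, ...
    scale = total_nM / sum(weights)             # normalize to the chosen total
    return [scale * w for w in weights]


for conc in ion_pair_concentrations(5):
    print(f"{conc:.1f} nM")
```

Under this assumed form, values of c in the reported robust range (0.65 to 0.75) give an innermost pair at roughly a quarter of the outermost concentration for a five-pair fragment.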
Automated setup of the assembly scheme and selection of oligonucleotides
Two programs have been developed for PFA (Fig. 4): GeneFab, to set up ORF topology and convert mutation lists into oligonucleotide sequences, and FabMgr, to maintain the flow of information, physical materials, and operations. GeneFab first generates an oligonucleotide scaffold from user-specified constraints. Given the length of a synthetic ORF (lorf), the maximum length of each oligonucleotide (lo), the maximum number of sense–anti-sense pairs in an ION primary fragment (np), splice-overlap length (of), and ION pair overlap length (oo), a sequence-independent primer topology is designed, specifying length, position, and orientation of all oligonucleotides. If these constraints can be satisfied, the algorithm determines SOE fragment end points and numbers of ION pairs per fragment by simple arithmetic; thus the number of fragments (nf) and the ION primary fragment size (lfrag) are determined by
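The expressions themselves are not reproduced in this text. As a hedged illustration of the kind of arithmetic involved, the Python sketch below assumes that adjacent primary fragments share exactly `o_f` bases, that the ORF is split into the fewest fragments permitted by the constraints, and that all fragments are given the same length. The function name and the default splice overlap of 25 bp are assumptions; they are not taken from GeneFab.

```python
import math


def scaffold_layout(l_orf, l_o=60, n_p=5, o_f=25, o_o=25):
    """Return (n_f, l_frag) for an ORF of l_orf bases.

    Assumed model: n_f equal fragments, neighbors overlapping by o_f bases,
    each fragment no longer than the maximum ION fragment length.
    """
    # Longest fragment that 2 * n_p oligonucleotides of length l_o can tile.
    l_max = l_o + (2 * n_p - 1) * (l_o - o_o)
    if l_max <= o_f:
        raise ValueError("fragments must be longer than the splice overlap")
    # Every fragment after the first adds at most (l_max - o_f) new bases.
    n_f = max(1, math.ceil((l_orf - o_f) / (l_max - o_f)))
    # Equal-length fragments that together span the ORF with n_f - 1 overlaps.
    l_frag = math.ceil((l_orf + (n_f - 1) * o_f) / n_f)
    return n_f, l_frag


for length in (519, 927, 1125):
    print(length, scaffold_layout(length))
```

With five pairs per fragment and the assumed 25-bp splice overlap, this arithmetic gives two, three, and four fragments for ORFs of 519, 927, and 1125 bp, matching the fragment counts listed in Figure 3. Flanking sequences are ignored here.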
Figure 4. Software control for the flow of information, material, and operations. (1) User-specified amino acid sequence, sequence restrictions (e.g., forbidden restriction endonuclease sites, G:C content) and oligonucleotide and PCR fragment length limits are used by GeneFab to generate a gene scaffold assembly topology (positions and orientations of oligonucleotides). (2) User-specified reaction conditions for the ION- and SOE-PCR assemblies. (3) Allele formation is driven by mutation lists provided either by a user or by computational protein design algorithms. These are converted by GeneFab into specific oligonucleotide sequences using the scaffold topology parameters. (4) FabMgr queries the oligonucleotide database to identify duplicate oligonucleotides and generate synthesis orders for new oligonucleotide sequences. (5) New oligonucleotide microplates are added to physical and virtual inventory by FabMgr. (6) The Materials list contains all the information (reagents, oligonucleotides, source and destination locators, plasticware) required to assemble the alleles. (7) Liquid-handling robot requires specification of the robotic deck geometry (position of plates, source of water, enzyme, tips). (8) Human action is required to load the robotic deck with the correct oligonucleotides plates from the collection, plasticware (reaction plates, tips), and reagents (water, enzyme, and oligonucleotides as specified in the Materials list).
Oligonucleotide ION start and end points within fragments are decided in a two-pass scan. In the first pass, a maximum fragment length is calculated from the user-specified limits for oligonucleotide length and primer pair constraints. The difference between this maximum length and the actual required fragment length (lfrag) is the excess length (Δl); if Δl < 0 in the first pass, the user-specified constraints cannot be satisfied, the process fails, and GeneFab prompts the user for a new set of constraints. In the second pass, Δl is used to calculate the final oligonucleotide lengths (lfinal), which are adjusted to give Δl = 0. Fragments are often not perfectly divisible by the total number of primers, so the 3′-most oligonucleotides are assigned one base fewer than the 5′-most oligonucleotides. This maximum value and final length calculation is repeated for each primary fragment in the ORF:
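The expression is not reproduced in this text. The two-pass logic described above can nevertheless be sketched in Python under simple assumptions: every overlap is exactly `o_o` bases, oligonucleotides are listed in 5′ to 3′ order along the sense strand, and the excess length is removed as evenly as possible. The function name is illustrative, and too-short fragments (which would need fewer pairs) are simply rejected.

```python
def ion_oligo_lengths(l_frag, l_o=60, n_p=5, o_o=25):
    """Lengths of the 2 * n_p oligonucleotides tiling a fragment of l_frag bases."""
    n_oligos = 2 * n_p

    # Pass 1: maximum fragment length and excess length (delta_l).
    l_max = l_o + (n_oligos - 1) * (l_o - o_o)
    delta_l = l_max - l_frag
    if delta_l < 0:
        raise ValueError("constraints cannot be satisfied; choose new limits")

    # Pass 2: shrink the oligonucleotides so that the excess becomes zero.
    # Summed oligonucleotide length = fragment length + all shared overlaps.
    total_bases = l_frag + (n_oligos - 1) * o_o
    base, extra = divmod(total_bases, n_oligos)
    if base <= o_o:
        raise ValueError("fragment too short for this many pairs (not handled)")
    # The 5'-most oligonucleotides take the remainder, one extra base each.
    return [base + 1] * extra + [base] * (n_oligos - extra)


lengths = ion_oligo_lengths(326)
print(lengths, sum(lengths) - 9 * 25)
```

For a 326-bp fragment and five pairs, this yields one 56-base oligonucleotide followed by nine 55-base oligonucleotides, which tile exactly 326 bp with 25-bp overlaps.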
The resulting assembly schemes are subsequently annotated with the oligonucleotide concentrations to be used in ION-PCR gene amplification (see above). A starting, wild-type DNA sequence (“gene scaffold”) is either imported or generated by reverse translation of an amino acid sequence, using a codon table that is annotated to preferentially use codons associated with high levels of protein expression (Ikemura 1985; Kane 1995; Kurland and Gallant 1996; Baca and Hol 2000). Reverse translation is further constrained by a set of forbidden DNA sequences, which prevents the introduction of out-of-place sites within the ORF, including restriction sites used for cloning (see below) or regulatory elements such as ribosome-binding sites. GeneFab also adds appropriate flanking sequences for cloning and gene expression (Supplemental material). Once a scaffold topology has been designed, it remains constant for the design of alleles with point mutations.
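Constrained reverse translation can be illustrated with a short Python sketch. The codon table below is a hypothetical, partial preference list (most preferred codon first) chosen only for the example, and the greedy strategy without backtracking is an assumption, not a description of GeneFab. Only the top strand is scanned, which is sufficient for palindromic sites such as EcoRI (GAATTC); non-palindromic sites would also need their reverse complements listed.

```python
# Hypothetical, partial codon-preference table (most preferred first).
PREFERRED_CODONS = {
    "M": ["ATG"],
    "G": ["GGT", "GGC"],
    "S": ["AGC", "TCT"],
    "K": ["AAA", "AAG"],
    "L": ["CTG", "TTA"],
    "E": ["GAA", "GAG"],
    "F": ["TTC", "TTT"],
}


def reverse_translate(protein, forbidden=("GAATTC",)):
    """Greedy reverse translation that avoids forbidden DNA sequences."""
    dna = ""
    for aa in protein:
        for codon in PREFERRED_CODONS[aa]:
            candidate = dna + codon
            # Accept the first codon that creates no forbidden site.
            if not any(site in candidate for site in forbidden):
                dna = candidate
                break
        else:
            # No codon for this residue avoids every forbidden site.
            raise ValueError(f"cannot place residue {aa!r} without a forbidden site")
    return dna


print(reverse_translate("MEF"))  # ATGGAATTT: TTC is skipped to avoid GAATTC
```

A production implementation would need the full genetic code and backtracking, because a greedy choice can make a later residue impossible to place.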
Alleles are generated either by hand, using a graphical user interface to enter the desired mutations, or in batch mode from mutation lists. The latter mode in particular exploits the power of PFA by enabling facile generation of many variants that may have been specified by computational design (Hellinga and Richards 1991; Pinto et al. 1997; Benson et al. 1998; Looger et al. 2003) or fragment identification algorithms (Brezellec et al. 2006; Lubovac et al. 2006). The resulting list of oligonucleotides and their assembly relationships is passed to the FabMgr program, which manages the flow of material and operations in the PFA pipeline.
For each gene scaffold, FabMgr (Fig. 4) maintains a relational database (the oligonucleotide database) recording the DNA sequences and well locators of all oligonucleotides that have been used to engineer the gene, assembly combinations, and reaction conditions for generating each full-length construct. This database is used in three ways: first, it specifies the oligonucleotides needed for setting up both ION- and SOE-PCR gene assembly reactions (see below); second, it is used to generate orders for the synthesis of new oligonucleotides that are not present in the current collection for the gene scaffold; third, it manages the link between the virtual and physical oligonucleotide collection by maintaining microplate and well-locator information. The elimination of duplicates is essential when designing variants of single sequences, as many alleles will have regions of identical sequence, sharing oligonucleotides (we observe that the need for new syntheses diminishes during the course of a project with the reuse of oligonucleotides).
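The duplicate-elimination step can be sketched as a lookup against the existing collection. In this minimal Python illustration the inventory is a plain dictionary rather than a relational database, and the function name, plate identifiers, and data layout are all hypothetical.

```python
def plan_synthesis_order(required, inventory):
    """Split required oligonucleotides into reusable stock and new syntheses.

    required:  iterable of oligonucleotide sequences needed for the alleles
    inventory: dict mapping sequence -> (plate_id, well) for existing stock
    Returns (order, locations): sequences to synthesize, and locators for
    sequences already in the collection.
    """
    order, locations, seen = [], {}, set()
    for seq in required:
        key = seq.upper()
        if key in seen:          # already handled for another allele
            continue
        seen.add(key)
        if key in inventory:     # reuse the physical oligonucleotide
            locations[key] = inventory[key]
        else:                    # not in the collection: order it once
            order.append(key)
    return order, locations


stock = {"ATGAAAGGT": ("plate-001", "A1")}
needed = ["ATGAAAGGT", "GGTCTGAGC", "ggtctgagc", "AAAGAATTT"]
print(plan_synthesis_order(needed, stock))
```

In the usage example, four requested sequences reduce to two new syntheses and one reused stock oligonucleotide.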
It is critical to maintain synchrony between the virtual (in the oligonucleotide database) and physical (in freezers) collection of oligonucleotides. Oligonucleotide orders are therefore generated automatically by FabMgr and synthesized in bar-coded microplates. Additionally, contamination of the collection is prevented by replicating stock plates into working plates that are used by the liquid-handling robot (see below).
Automation of the gene assembly reactions
Mixing of the oligonucleotides from the collection, buffers, and enzymes for the PCR-mediated assembly reactions, temperature cycling, dilutions, and materials handling are all carried out by conventional liquid-handling robots using 96-well microplates. To support PFA, liquid-handling robots need to have a multipipetting arm suitable for 96-well microplates, pipetting range of 1 μL to 180 μL per reaction well, aspirate and dispense functions that are individually addressable for each pipette, disposable pipette tips, and script-driven programmability.
Genes are assembled in two steps: generation of the primary fragments by ION-PCR (Fig. 2A), which are then combined appropriately and diluted 100-fold into a second reaction that assembles the full-length product by SOE-PCR (Fig. 2B). For a typical full-length 1-kb assembly, using three fragments of five ION pairs each, about 3300 pipetting steps (oligonucleotide cherry picking, reagent dispensing) are required to set up a 96-well primary reaction plate (32 alleles), followed by ∼230 steps (dilutions and reagent dispensing) to set up the secondary reaction: The need for automation is manifest.
Pipetting algorithms need to satisfy three criteria: avoidance of cross-contamination between reaction wells, minimization of disposable tip use to contain plasticware cost, and optimization of mechanical arm travel time (reagent master mix or oligonucleotide stock to reaction vessel; tip pickup and discard). Two dispense methods are used (Fig. 5): submersion of the pipette tip into the reaction well followed by content expulsion and rinsing with the reactants (submerged mixing) or expulsion of a drop above the surface of the reaction well (aerial expulsion). The former is limited to a single aspiration/dispense cycle bracketed by tip pickup and discard steps; the latter allows a pipette tip to be used for multiple successive dispenses from a single aspiration step before the discard, minimizing tip usage and travel time. Figure 5 illustrates how the tip pickup, aspiration, dispense, and discard operations are combined for the dispensing of water (Fig. 5A) or enzyme (Fig. 5B) and cherry picking of oligonucleotides (Fig. 5C).
Figure 5. Robotic pipetting operations. (A) Dispensing of water does not require prevention of cross-contamination or mixing. A simple multitip, multidispense fills all wells. (B) Submerged mixing for the addition of reagents to wells containing oligonucleotides requires tip discards between mixing tips. (C) Cherry picks of individual oligonucleotides require aerial expulsion, single-tip addressability, and arbitrarily complex travel paths of the dispensing arm. Some destinations are shown for different sources (oligonucleotide libraries) and destinations (ION-PCR reaction plates). Note the use of single tips for the same oligonucleotide.
Water is dispensed first; the use of a single row of tips (Fig. 5A) minimizes both tip use and travel time. The master mix is dispensed last and requires a mixing step; therefore, only travel time can be minimized, as each well requires a separate tip (Fig. 5B). Above a critical volume (typically 3 μL), oligonucleotides can be dispensed with aerial expulsion. A single tip can therefore serve multiple wells in a multidispense operation, and several tips can be filled simultaneously to minimize travel time (Fig. 5C). To further limit travel time, plates are processed in order, one at a time; occasionally, therefore, not all eight tips are used (as shown in Fig. 5C). For dispensal of very small volumes (<3 μL), submerged mixing has to be used, and neither tip usage nor travel time can be optimized.
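The choice between the two dispense methods reduces to a simple rule, sketched here in Python. The 3 μL threshold is the typical value quoted above; the function names are illustrative, and the tip count ignores tip capacity and multi-tip parallelism, so it is only a rough bound for a single source reagent.

```python
CRITICAL_VOLUME_UL = 3.0  # typical threshold quoted in the text


def dispense_mode(volume_ul, needs_mixing=False):
    """Pick the dispense method for one transfer."""
    # Mixing steps and very small volumes both force submerged mixing.
    if needs_mixing or volume_ul < CRITICAL_VOLUME_UL:
        return "submerged"
    return "aerial"


def tips_needed(n_wells, mode):
    """Tips used to deliver one source reagent to n_wells destination wells."""
    # Submerged tips touch the well contents, so each well needs a fresh tip;
    # an aerial tip never touches the wells and can serve all of them.
    return n_wells if mode == "submerged" else 1


for volume, mixing in ((5.0, False), (1.0, False), (20.0, True)):
    mode = dispense_mode(volume, mixing)
    print(volume, mode, tips_needed(8, mode))
```

This makes explicit why master-mix addition and sub-threshold volumes dominate tip consumption, whereas water and larger oligonucleotide volumes do not.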
Table 1 summarizes the operations for setting up a plate of primary and secondary reactions. The FabMgr program produces machine-independent (object) code that describes the assembly of the full-length gene products in generic operations (tip pickup, discard, individual tip aspirations, individual tip dispenses, arm moves from reagent or oligonucleotide stock plate positions to reaction wells). The object code is then converted into instrument-specific scripts. In the limit, about 1200 pipette tips would be needed to set up the example given above without optimization. We find that setting up a typical set of assemblies with 10–14 mutations per allele requires about 900 tips and takes 5 h, including two thermal cycling steps of 45 min each. Figure 6 shows a set of 32 independent, full-length products generated fully automatically from 95 oligonucleotides.
Table 1. Operations, machine use, and time for a typical PFA run
Figure 6. Automated full-length gene assembly. Thirty-two alleles of Thermotoga maritima glucose binding protein (three primary ION-PCR fragments per allele) were assembled fully automatically from 96 primary ION-PCR fragments (top two panels) to yield full-length 927-bp ORFs by SOE-PCR (bottom panel).
Selection of full-length, in-frame synthetic ORFs
Oligonucleotides synthesized using phosphoramidite chemistry (Caruthers et al. 1983) are contaminated with single-base deletions; the resulting frameshift mutations are the primary source of error in gene assembly reactions (Temsamani et al. 1995; Tian et al. 2004). To suppress such frameshifts, we clone the full-length synthetic ORFs in-frame with a C-terminal selectable marker, chloramphenicol acetyl transferase (Maxwell et al. 1999), in a specially constructed synthetic ORF selection (SOS) vector (Fig. 7A). Chloramphenicol-resistant colonies are then picked and the DNA sequence of the synthetic ORF determined. Others have used similar strategies to select for full-length ORFs by antibiotic resistance (Seehaus et al. 1992; Daugelat and Jacobs 1999; Lutz et al. 2002) or to screen for or improve protein expression levels (Nixon and Benkovic 2000; Mossner et al. 2001; Cabantous et al. 2004). To test the efficiency of the selection step, we constructed a green fluorescent protein (GFPuv) gene (Crameri et al. 1996) by PFA and assessed the fraction of successful assemblies by counting green colonies. Active and inactive colonies were sequenced. Without selection 28% of the clones are functional, using unpurified oligonucleotides; with selection, 82% of the clones are functional. With this success rate, an average of only 1.2 clones needs to be sequenced to identify a correctly assembled synthetic ORF. Sequence analysis demonstrates that the genetic selection system removes all frameshifts present in an assembly reaction when assaying only functional clones (or a 50-fold reduction in frameshifts overall; Table 2) while greatly enriching mutation-free clones (Fig. 8). Furthermore, the use of the SOS vectors obviates the need for PAGE-purified oligonucleotides, providing significant savings in both cost and effort. Other gene assembly methods require PAGE-purified oligonucleotides to facilitate gene synthesis (Stemmer et al. 1995; Strizhov et al. 1996; Hoover and Lubkowski 2002; Gao et al. 2003; Carr et al. 2004; Seyfang and Jin 2004; Tian et al. 2004; Xiong et al. 2004; Young and Dong 2004).
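The figure of 1.2 clones follows from treating each sequenced clone as an independent trial that succeeds with the observed fraction of functional clones; the expected number of trials up to the first success is then the reciprocal of that fraction. A minimal Python sketch of this arithmetic, in which independence between clones is an assumption and the function name is illustrative:

```python
def expected_clones_to_sequence(p_correct):
    """Mean number of clones sequenced until the first correct one.

    Assumes independent clones, each correct with probability p_correct
    (the mean of a geometric distribution).
    """
    if not 0 < p_correct <= 1:
        raise ValueError("p_correct must be in (0, 1]")
    return 1.0 / p_correct


print(round(expected_clones_to_sequence(0.82), 1))  # with selection: 1.2
print(round(expected_clones_to_sequence(0.28), 1))  # without selection: 3.6
```

By the same estimate, about 3.6 clones would be needed on average without selection, roughly three times the sequencing effort.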
Table 2. Efficacy of full-length ORF selection
Figure 7. PFA vectors. (A) Synthetic ORF selection plasmid (pSOS), used to select correctly assembled full-length ORFs from the SOE-PCR products. (B) Synthetic ORF expression plasmid (pSOX), into which ORFs from pSOS are recloned for protein expression. (C) The synthetic ORF selection and expression plasmid (pSOS-X) combines the functionalities of pSOS and pSOX, obviating a recloning step. plac, lactose promoter; ptac, lactose/tryptophan fusion promoter; SD, Shine–Dalgarno sequence; His6, polyhistidine purification tag; UGA, amber stop codon; CS, cloning site; ori, origin of replication.
Figure 8. Total mutation frequency per clone. Insertions, deletions, point mutations (nonsense, missense, silent) are counted as mutational events. (White bars) no genetic selection; (black bars) SOS genetic selection.