Cloning large natural product gene clusters from the environment: Piecing environmental DNA gene clusters back together with TAR

A single gram of soil can contain thousands of unique bacterial species, of which only a small fraction is regularly cultured in the laboratory. Although the fermentation of cultured microorganisms has provided access to numerous bioactive secondary metabolites, with these same methods it is not possible to characterize the natural products encoded by the uncultured majority. The heterologous expression of biosynthetic gene clusters cloned from DNA extracted directly from environmental samples (eDNA) has the potential to provide access to the chemical diversity encoded in the genomes of uncultured bacteria. One of the challenges facing this approach has been that many natural product biosynthetic gene clusters are too large to be readily captured on a single fragment of cloned eDNA. The reassembly of large eDNA-derived natural product gene clusters from collections of smaller overlapping clones represents one potential solution to this problem. Unfortunately, traditional methods for the assembly of large DNA sequences from multiple overlapping clones can be technically challenging. Here we present a general experimental framework that permits the recovery of large natural product biosynthetic gene clusters on overlapping soil-derived eDNA cosmid clones and the reassembly of these large gene clusters using transformation-associated recombination (TAR) in Saccharomyces cerevisiae. The development of practical methods for the rapid assembly of biosynthetic gene clusters from collections of overlapping eDNA clones is an important step toward being able to functionally study larger natural product gene clusters from uncultured bacteria. © 2010 Wiley Periodicals, Inc. Biopolymers 93: 833–844, 2010.


INTRODUCTION
C ultured soil bacteria have been a productive source of both biologically active and structurally diverse natural products. 1,2 Molecular phylogenetic analysis of soil microbiomes now indicate that a single gram of soil can contain thousands of unique bacterial species, only a small fraction of which is regularly cultured in the laboratory. [3][4][5][6] Uncultured bacteria represent one of the largest pools of genetic diversity that has not been examined for the production of natural products. Culture-independent analysis of microbial communities using DNA extracted directly from environmental samples, which is commonly defined as metagenomics, has the potential to provide access to the biosynthetic capacity of uncultured bacteria. 7 All of the genes required for the biosynthesis of a natural product, including genes that encode biosynthetic, regulatory, and self-immunity enzymes, are typically clustered on bacterial Cloning Large Natural Product Gene Clusters from the Environment: Piecing Environmental DNA Gene Clusters Back Together with TAR Correspondence to: Sean F. Brady; e-mail: sbrady@rockefeller.edu

ABSTRACT:
A single gram of soil can contain thousands of unique bacterial species, of which only a small fraction is regularly cultured in the laboratory. Although the fermentation of cultured microorganisms has provided access to numerous bioactive secondary metabolites, with these same methods it is not possible to characterize the natural products encoded by the uncultured majority.
The heterologous expression of biosynthetic gene clusters cloned from DNA extracted directly from environmental samples (eDNA) has the potential to provide access to the chemical diversity encoded in the genomes of uncultured bacteria. One of the challenges facing this approach has been that many natural product biosynthetic gene clusters are too large to be readily captured on a single fragment of cloned eDNA. The reassembly of large eDNA-derived natural product gene clusters from collections of smaller overlapping clones represents one potential solution to this problem. Unfortunately, traditional methods for the assembly of large DNA sequences from multiple overlapping clones can be technically challenging. Here we present a general experimental framework that chromosomes. Natural product gene clusters can range in size from a few kilobases to over 100 kilobases. The heterologous expression of natural product biosynthetic gene clusters captured on individual eDNA clones has begun to provide access to some of the natural products encoded in the genomes of uncultured bacteria (see Figure 1). However, a major limitation of this strategy has been the inability to routinely construct very large eDNA libraries with inserts big enough to capture large biosynthetic pathways on individual clones. Figure 1 shows a collection of metabolites that have been isolated from the culture broths of soil-derived eDNA clones. In each case, a single cosmid/fosmid eDNA clone confers the production of the metabolites to a heterologous host. Successful functional metagenomic natural product discovery studies carried out on marine samples and other microbiomes have also largely been restricted to single clones. 13,21 While the construction of 30-40 kb insert cosmid libraries from environmental samples is now routine, the construction of larger insert libraries that can be used to capture large natural product gene clusters has been challenging. Bacterial artificial chromosome (BAC)-derived libraries are capable of capturing larger inserts but generally yield metagenomic libraries that are two to three orders of magnitude smaller than those constructed using cosmid-based cloning strategies. 22 Theoretically, all gene clusters that are too large to be captured on a single cosmid-sized clone can be reassembled from collections of overlapping eDNA cosmid clones (Figure 1b). Existing gene cluster assembly strategies depend on either unique restriction sites or k-mediated recombination to reassemble large DNA fragments. Both of these strategies are technically challenging when working with very large DNA fragments or with sequences that span more than two overlapping clones. [11][12][13][14][15][16][17] Transformation-associated recombination (TAR) in Saccharomyces cerevisiae relies on homologous recombination to selectively capture a known sequence from a mixture of genomic DNA. 18,19 In TAR cloning protocols, genomic DNA and a ''capture'' vector with short homology arms corresponding to sequences flanking the region of interest are cotransformed into S. cerevisiae. The capture vector arms and homologous target DNA undergo recombination to yield a stable plasmid containing the targeted genomic region. TAR was originally developed to facilitate cloning large genomic fragments without having to construct and screen genomic DNA libraries. Recent studies extended the scope of this methodology by showing that it could be used to assemble 25 cotransformed overlapping DNA fragments into a complete 592-kb synthetic genome and that multiple PCR products could be assembled into small biochemical pathways. [20][21][22] These studies led us to believe that TAR could also be used to assemble large natural product gene clusters from multiple overlapping eDNA clones.
In this report, we show that TAR in S. cerevisiae can be used to rapidly reassemble large natural product biosynthetic gene clusters from overlapping eDNA cosmid clones. The rich microbial diversity present in soils makes them attractive, but challenging, starting points for the culture-independent discovery of new natural product biosynthetic gene clusters. Much of the difficulty in working with soil-derived eDNA libraries stems from their inherent complexity, which necessitates the construction of very large clone libraries to ensure that large biosynthetic pathways can be recovered in their entirety. Using two of the largest soil eDNA cosmid libraries reported to date as examples, we have also empirically investigated the minimum size eDNA libraries will likely need to be to recover complete large natural product gene clusters on overlapping cosmid clones. Taken together, these studies provide an experimental framework for gaining access to large, intact natural product biosynthetic gene clusters from soil microbiomes.
Cloning Large Natural Product Gene Clusters From the Environment 835 Library Size Analysis DNA from each unique E. coli transduction reaction was used as a template in PCR reactions with degenerate primers designed to amplify b-Ketoacyl synthase gene sequences (dp:KS b , 5 0 -TTC GGSGGNTTCCAGWSNGCSATG-3 0 and dp:ACP, 5 0 -TCSAKSAG SGCSANSGASTCGTANCC-3 0 ). 11

Identification of Gene Clusters of Interest
PCR reactions with degenerate primers designed to amplify b-ketoacyl synthase gene sequences were used to detect Type II polyketide synthase (PKS) sequences. 11,38 Degenerate primers designed to detect flavin-dependent halogenases (TyrohalF3: 5 0 -CGGCTGG TTCTGGTACATCCC-3 0 , TyrohalR2: 5 0 -GAACTCGTAGAASACSCC GTACTC-3 0 ) were used to identify the nonribosomal peptide synthetase (NRPS) gene cluster. The FRI gene cluster was identified using primers that recognize conserved sequences in acyl-CoA ligases found in lipopeptide antibiotic gene clusters (DpFrEFWD1: 5 0 -TSMTSCAGTACACSTCSGG-3 0 and DpFrEREV1: 5 0 -WDGTCGT ASGCGAAGTCSG-3 0 ). Type II PKS sequences were amplified using the same PCR conditions outlined for the library size analysis. Flavin-dependent halogenases were amplified using the following PCR conditions: Each 20-ll reaction contained primer added to a final concentration of 2.5 lM, 0.

General Procedure for Clone Recovery
Individual clones were recovered from a 4000-5000-membered sublibrary by plating a 10 25 or 10 26 dilution of the corresponding glycerol stock into 96-well microtiter plates and screening the diluted cultures by whole-cell PCR with primers designed to recognize amplicons detected in the initial screen. PCR positive wells were then either subjected to a second round of dilution plating or plated directly on LB agar with ampicillin (50 lg ml 21 ) to yield distinct colonies that were screened by whole-cell PCR to identify individual clones of interest. Each recovered cosmid was end-sequenced using vector-specific (pWEB, pWEB-TNC) universal primers [M13(240) and the T7 promoter]. All clones were fully sequenced using 454 GLX FLX pyrosequencing, assembled using Newbler (Roche), and annotated using Genemark and BLASTX. [39][40][41] Gene cluster images were generated using MacVector. The amino acid substrate specificity for each adenylation domain found in the cryptic NRPS gene cluster was predicted using NRPSpredictor. 42 pTARa Vector Construction The yeast ARSH4 (autonomous replicating sequence), CEN6 (plasmid maintenance element), and URA3 markers were obtained from pLLX13 by digestion with EcoRI and HindIII. 23 After gel purification, the fragment was ligated into similarly digested pCC1-BAC (Epicentre Biotechnologies). The resulting vector was digested with HpaI and ligated to a DraI fragment from pOJ436 containing an origin of transfer (OriT), integrase and apramycin resistance gene. 43 Transformation into EPI300 E. coli (Epicentre Biotechnologies) and selection on chloramphenicol (12.5 lg ml 21 ) and apramycin (50 lg ml 21 ) yielded the capture vector pTARa (TAR-ready BAC with the Streptomyces attP integration system, GenBank accession number: GQ452294).

TAR Cloning
TAR cloning was initially developed to selectively isolate regions of genomes without the need to construct and screen a genomic library. 23,24,32,33,44 The procedures outlined below describe our adaptation of these methods for the isolation of sequenced natural product gene clusters and the assembly of large natural product biosynthetic gene clusters captured on multiple overlapping eDNA clones.

Pathway-Specific Capture Vector Construction
The cycloheximide counter selection cassette (CYH2/bla) was PCR amplified using pLLX8 as a template following reported protocols. 23 The cassette was amplified using primers pLLX8/fw/: 5 0 -TTTTCT AGAACGCGTTTAATTAAAATCTAAAGTATATATGAGTAAAC-3 0 and pLLX8/rv/:  23 These homology regions were incorporated to allow pathway-specific capture vector construction using recombination in S. cerevisiae. 23 Upstream homology arm amplification primers contained a sense primer extension: 5 0 -ATATTAC CCTGTTATCCCTAGCGTAACTATCGATCTCGAG-3 0 , and an antisense primer extension: 5 0 -CATATATACTTTAGATTTTAATTAAAC GCGTTCTAGAAAA-3 0 , which add 40 bp of homology to pTARa and the counter selection cassette, respectively. The downstream targeting sequence sense primer extension is: 5 0 -CATTTTCACCGTTTTTTGT TTAAACGTTAACTCTAGAGGG-3 0 , which provides homology to the counter selection cassette and the antisense primer extension is: 5 0 -TA ACAGGGTAATATAGAGATCTGGTACCCTGCAGGAGCTC-3 0 , which provides homology to pTARa. Each primer pair was designed to yield a 600-to 900-bp amplicon that acts as a homology arm in a pathwayspecific capture vector used for a TAR reassembly reaction. 23  For the assembly of a pathway-specific capture vector, 200 ng of pTARa was linearized with NheI and added to 200 lg of heat denatured single stranded carrier DNA (heated to 958C for 10 min then kept on ice), 600 ng of CYH2/bla counter selection cassette amplicon 23 and 200 ng of an upstream and downstream homology arm amplicon pair prepared as described above. All components were added to lithium acetate prepared chemically competent CRY1-2 (uracil deficient, ura 2 ) yeast, plated on synthetic complete (SC) uracil dropout agar (Invitrogen) and incubated at 308C. 45 Colonies typically began to appear within 24-48 h. Assembled capture vectors were isolated in bulk by resuspending yeast colonies from a 100 mm SC dropout agar plate in 5 ml of 1 3 phosphate buffered saline. Plasmid DNA was isolated from 1 ml of resuspended cells (ChargeSwitch TM Yeast Plasmid Isolation Kit, Invitrogen). About 100 ng of the purified DNA was transformed into electrocompetent EPI300 E. coli and plated on LB agar containing ampicillin (100 lg ml 21 ), chloramphenicol (12.5 lg ml 21 ), and apramycin (50 lg ml 21 ) to yield a pathway-specific capture vector containing a counter-selection cassette.

TAR Cloning and Pathway Assembly
Direct TAR cloning of the colibactin gene cluster from genomic DNA was carried out using reported protocols. 24,44 For eDNA pathway assembly, each cosmid to be used in an assembly reaction was initially linearized by digestion with DraI and the capture vector was linearized by digestion with PmeI. About 200 ng of each linearized cosmid and an equimolar amount (100 ng) of a linearized pathway-specific capture vector were added to 200 ll of S. cerevisiae spheroplasts prepared as previously reported. 44 The transformed spheroplasts were added to 7 ml of top agar equilibrated to 508C [1M sorbitol, 1.92 g l 21 SC uracil dropout supplement (Invitrogen), 6.7 g l 21 yeast nitrogen base (Invitrogen), 2% glucose, 2.5% agar]. The top agar containing transformed spheroplasts was overlaid onto SC dropout agar containing 2.5 lg ml 21 cycloheximide. The plates were incubated at 308C and spheroplast growth was typically seen within 72 hours. The resulting recombinants were patched onto SC uracil dropout agar with cycloheximide (2.5 lg ml 21 ) for overnight growth at 308C.
For initial PCR detection of reassembled pathways, a small portion of each yeast patch was resuspended in 10 ll of 20 mM NaOH and heated at 958C for 10 min. 1.5 ll of the cell lysate was then used as a template in a 50-ll multiplex PCR reaction following the manufacturer's directions (Multiplex PCR Kit, Q solution TM , Qiagen). The primer sets used in this analysis were designed to recognize unique regions from each overlapping cosmid clone that was used in an assembly reaction. In the colibactin TAR experiment, PCR primer pairs were designed to detect the previously reported boundaries of the biosynthetic gene cluster. 46

Analysis of TAR Recombined Clones
Yeast recombinants that produced PCR amplicons of correct size for all portions of a pathway were grown overnight (308C, 225 rpm) in 2 ml of SC uracil dropout media (or on SC uracil dropout agar) with 2.5 lg ml 21 cycloheximide and TAR assembled pathways were isolated from these cultures (ChargeSwitch TM , Invitrogen). Five microliters of ChargeSwitch TM prepared DNA (1/10 elution volume) was transformed into electrocompetent EPI300 E. coli which were outgrown at 308C for 2 h (225 rpm) and then plated on LB agar with 12.5 lg ml 21 chloramphenicol. Whole-cell PCR was used to identify E. coli colonies containing correctly reassembled gene clusters. DNA was then isolated from 5 ml cultures of PCR positive E. coli transformants using alkaline lysis and isopropanol precipitation (CopyControl TM pCC1-BAC Induction Protocol, Epicentre Biotechnologies). E. coli transformants containing the colibactin gene cluster were identified using eight sets of previously reported PCR primers designed to detect different ORF's in the pathway (data not shown). 46 Detailed restriction mapping was carried out on each reassembled pathway using an enzyme (PKS, EcoRI; NRPS, EcoRI; FRI, BglII; Colibactin, HindIII) that was predicted to yield restriction fragments that could be easily resolved using agarose gel electrophoresis (1% agarose, 0.53 Tris/Borate/EDTA, 30 V, overnight). The lambda HindIII and 50-bp molecular weight makers were obtained from New England Biolabs. Full pathway sequencing for each gene cluster was deposited with GenBank under the following accession numbers: NRPS: GQ475282, FRI: GQ475284, and PKS: GQ475283.

Conjugation and Preparation for Heterologous Expression
Assembled pathways were transformed into S17-1 E. coli for conjugation into Streptomyces using published protocols. 43 All three reassembled eDNA gene clusters were successfully conjugated and chromosomally integrated into a number of Streptomyces including Streptomyces lividans, Streptomyces albus, and Streptomyces toyocaensis. 47

Library Size Analysis
The genes responsible for the biosynthesis of a natural product are typically clustered on a bacterial chromosome, and therefore theoretically can be cloned on a single continuous fragment of eDNA. While the heterologous expression of biosyn-Cloning Large Natural Product Gene Clusters From the Environment thetic gene clusters captured on eDNA-derived clones has begun to yield novel natural products, many natural product gene clusters are too large to be routinely captured on individual eDNA cosmid clones (see Figure 1). With metagenomic libraries of sufficient size and sequence coverage, large gene clusters that cannot be captured on a single cosmid clone could be accessed by recovering collections of overlapping eDNA clones. Soil microbiomes are among the most genetically diverse environments characterized to date and are therefore attractive starting points for the discovery of natural products using a metagenomic approach. 3 However, because of this complexity, it is difficult to predict the size a soil-based eDNA library must be to permit the recovery of overlapping clones from a diverse collection of large natural product gene clusters. We set out to empirically investigate this problem using eDNA libraries constructed from two different soil samples. For this study, DNA isolated from soil collected in Utah was used to construct a series of independent 750,000-membered eDNA cosmid libraries (10,000,000 clones in total) and DNA isolated from a soil sample collected in California was used to construct a series of independent 320,000-membered eDNA cosmid libraries (15,000,000 clones in total).
The reassembly of large natural product gene clusters from multiple overlapping eDNA fragments begins with the detection of specific sequence(s) of interest located on two or more unique library clones. We therefore wanted to determine the point at which redundant sequences of interest began to regularly appear in unique eDNA library aliquots constructed from the same soil sample. Culture-based studies suggest that Type II (aromatic, iterative) PKS biosynthetic systems are common in bacteria and the PKS genes found in these systems are highly conserved. We therefore chose Type II PKS pathways as a model system for studying large ([30 kb) gene clusters present in soil-derived eDNA libraries. Both the California and Utah libraries were screened for the presence of b-ketoacyl synthase (KS b ) gene sequences using degenerate PCR primers designed to recognize Type II PKS systems. 11,38 In total, 19 distinct KS b gene sequences were amplified from the Utah library and 73 distinct KS b gene sequences were amplified from the California library (see Figure 2). In the Utah library, redundant KS b sequences began to regularly appear once 3 3 10 6 clones had been examined, while in the California library redundant KS b sequences began to regularly appear once 2.25 3 10 6 clones had been examined. Additional screens using primers designed to recognize other conserved natural product biosynthetic gene sequences have shown similar results. In these studies, redundant sequences begin to regularly appear once libraries exceed 1-3 3 10 6 clones in size (data not shown). 49 The libraries used in our efforts to recover natural product gene clusters to be used in TAR assembly experiments were therefore expanded until they contained at least 1-1.5 3 10 7 unique clones, which corresponds to 5-10 times the number of clones needed to identify the first redundant Type II PKS sequences. While even an eDNA library of 1-1.5 3 10 7 clones is unlikely to permit the recovery of rare gene clusters, our analysis suggests that it will likely contain collections of clones encompassing complete PKS gene clusters and, by extension, overlapping clones from many other types of biosynthetic gene clusters found in the genomes of uncultured bacteria.

Natural Product Gene Cluster Identification and Recovery
In excess of 35,000 unique microbial natural products have been characterized using culture-based methods. 50,51 This amazing assortment of natural products is biosynthesized using a much smaller number of conserved enzyme families. The structural diversity seen in natural products appears to arise in large part from the natural combinatorial shuffling of these conserved biosynthetic enzyme families. 52 Degenerate primers designed to recognize conserved natural product biosynthetic gene sequences should therefore be useful for identifying eDNA derived gene clusters that encode the biosynthesis of a diverse collection of small molecules. In this study, three different sets of degenerate primers were used to recover three large natural product biosynthetic gene clusters from the Utah and California soil eDNA libraries. A cryptic Type II PKS gene cluster was identified using the Type II PKS-specific degenerate primers we used in our initial library size analysis. 11,38 A cryptic NRPS gene cluster was identified using degenerate primers designed to amplify flavin-dependent halogenases known to tailor aromatic amino acids found in halogenated nonribosomal peptides. Degenerate primers designed to recognize acyl-CoA ligases found in lipopeptide antibiotic gene clusters were used to identify a gene cluster that is predicted to encode the known metabolite friulimicin. These three eDNA-derived gene clusters are referred to as the PKS, NRPS and FRI gene clusters, respectively. The PKS and NRPS gene clusters were found in an eDNA library derived from topsoil collected in Utah while the FRI gene cluster was found in an eDNA library derived from desert soil collected in California.
Individual cosmid clones containing genes recognized by the degenerate primers used in initial library screens were recovered from the appropriate library and then end sequenced (see Figure 3). PCR primers designed against the end sequences were subsequently used to identify and recover overlapping clones from the same library. The process of clone recovery and end sequencing was iteratively repeated until genes predicted to be involved in primary metabolism were found on the distal ends of a recovered cosmid (see Figure 3). This initial end-sequencing analysis suggested that 838 Kim et al.
the NRPS and FRI gene clusters were recovered on three cosmids each (NRPS: clones ZA41, Q87, J2; FRI: clones 1697, 1451, 201). The PKS gene cluster appeared to be present on two overlapping cosmids (PKS: clones X16, V48). Each clone that was predicted to be part of a gene cluster was fully sequenced and annotated. The eDNA-derived FRI gene cluster and the friulimicin gene cluster from A. friuliensis have the same gene organization and are 89% identical over the 68-kb region that is predicted to comprise the functional biosynthetic pathway. 53 A comparison of these two gene clusters suggests that the entire FRI gene cluster was likely captured on the three overlapping eDNA cosmids that were recovered. While the eDNA-derived PKS and NRPS gene clusters do not closely resemble any known gene clusters, the appearance of primary metabolic enzymes in the sequence surrounding the conserved natural product biosyn-thetic genes found on these clones suggests they were also likely recovered in their entirety. Sequencing of a fourth overlapping clone that extends 20 kb beyond the NRPS gene cluster found no enzymes associated with secondary metabolism. As suggested by our initial eDNA library size analysis, cosmid libraries containing in excess of 10 million clones appear to provide sufficient coverage of soil metagenomes to allow access to a diverse range of complete natural product biosynthetic gene clusters.

TAR Vector Design and Construction
To facilitate TAR reassembly of large natural product gene clusters as well as subsequent heterologous expression studies with reassembled pathways, we created pTARa, a BAC-based S. cerevisiae/E. coli/Streptomyces shuttle capture vector ( Figure   FIGURE 2 Degenerate primers targeting minimal Type II PKS genes were used to identify KS b sequences present in unique eDNA library aliquots constructed from soil samples collected in Utah (A) and California (B). ClustalW 48 derived phylogenetic trees of the KS b sequences identified in these screens are shown. The aliquots from which sequences were amplified and the point at which they began to reappear in the library (red) are shown as a heatmap.
Cloning Large Natural Product Gene Clusters From the Environment 839 4a). This vector contains elements that allow pathways to be assembled in S. cerevisiae, characterized and maintained in E. coli, and conjugatively transferred into a wide range of Streptomycetes for heterologous expression studies. 43 We included these elements to facilitate Streptomyces-based heterologous expression studies, but any number of species-specific genetic elements can be incorporated into pTARa to allow the transfer of pathways into a wide variety of bacterial hosts. 23 As a demonstration of the utility of pTARa as a shuttle vector, we propagated the vector in S. cerevisiae (CRY1-2), transformed and isolated the vector from E. coli and successfully conjugated into a number of different Streptomycetes including S. toyocaensis, S. lividans, and S. albus.  For capture vector construction, pathway-specific upstream (US-blue), and downstream (DS-red) homology arms, as well as a counter selection cassette (cyh/bla) are incorporated into the capture vector. 23,24 During recombination, the counter selection cassette is exchanged for a TAR cloned gene cluster (B) or TAR reassembled eDNA pathway (C).

Capturing Natural Product Gene Clusters from Sequenced Genomes Using pTARa
The cloning of natural product gene clusters from cultured organisms traditionally requires the construction and screening of a genomic DNA library. 54,55 Using TAR cloning, a sequenced biosynthetic gene cluster of any size can be directly cloned without the need to construct or screen a genomic library (Figure 5c). 23 To demonstrate the utility of pTARa for culture-based natural products research, we directly cloned the 56-kb colibactin gene cluster directly from genomic DNA isolated from the cultured bacterium, C. koseri. 24,44 Previous studies determined the functional boundaries of the colibactin gene cluster via transposon mutagenesis. 46 In order to TAR clone this gene cluster, we simply designed a pathway-specific capture vector using this information (Figures 4b and 5), and co-transformed the capture vector and C. koseri genomic DNA into S. cerevisiae spheroplasts. 24,44 We screened yeast spheroplasts using colibactin gene cluster specific PCR primers and were able to quickly identify clones containing intact colibactin gene clusters. Detailed restriction mapping of the TAR cloned pathway confirmed that we had specifically cloned the colibacin gene cluster (pTARa-Colibactin) directly from C. koseri genomic DNA (Figure 5b). 46 As demonstrated by this experiment, TAR cloning should provide a general and rapid means to access intact natural product biosynthetic gene clusters from sequenced microorganisms without the need to construct or screen a genomic library (Figure 5c).

TAR Assembly of Multiclone Gene Clusters
For each reassembly experiment, we constructed a unique pathway-specific capture vector with homology arms corresponding to sequences at the proximal and distal ends of the gene cluster to be reassembled (Figures 4c and 6). Homologous recombination in S. cerevisiae is stimulated by the presence of double stranded breaks adjacent to recombination sites. 32 The individual cosmids to be used in the reassembly of a gene cluster were therefore linearized by restriction digestion with DraI and then cotransformed with a linearized pathway-specific capture vector into competent CRY1-2 S. cerevisiae. DraI, which recognizes the AT rich hexamer, TTTAAA, digests the cosmid backbone, yet rarely cuts in GC rich sequences found in biosynthetic gene clusters thus providing a means to generate linear DNA fragments for TAR reassembly reactions. The concentration of the components used in the cotransformation step was empirically determined and selected to yield, on average, one assembled construct per spheroplast. After 3-5 days of recovery on SC uracil dropout agar, recovered spheroplasts were restruck on new SC uracil dropout agar plates. This step is necessary to reduce the chance of cross contamination caused by DNA from the TAR reaction during the PCR analysis that is used to identify yeast colonies with assembled gene clusters. Yeast colonies were then screened using multiplex PCR with primers specific to each unique cosmid fragment predicted to be present in a reassembled gene cluster construct. Between 30 and 70% of the yeast colonies were found to be PCR positive for all fragments predicted to be present in a pathway. Using this approach we were able to rapidly identify yeast colonies that contained intact biosynthetic gene clusters. Large constructs isolated from PCR positive yeast clones were electroporated into E. coli and analyzed by detailed restriction analysis (see Figure 6). In each case, the large con- struct obtained from a TAR reassembly reaction produced a restriction map that was identical to the map predicted to arise from assembling the individual overlapping clones used in the reaction (see Figure 6). The 39 kb PKS gene cluster was successfully subcloned from the central region of cosmids X16 and V48, two cosmids that contain 2.1 kb of overlap. The entire 89-kb cryptic NRPS gene cluster was successfully reconstructed in a single S. cerevisiae spheroplast transformation reaction from three overlapping eDNA cosmid clones. In a similar fashion, we reassembled the 90-kb eDNA-derived FRI gene cluster using a single S. cerevisiae spheroplast transformation reaction and three overlapping eDNA-derived cosmid clones. The overlapping cosmids (black) comprising a complete biosynthetic pathway is shown above the region targeted for TAR assembly (green line). The individual building blocks that are predicted to be used by the conserved modules (PKS and NRPS) found in these biosynthetic pathways appear below each gene cluster (DHPG 5 dihydroxyphenylglycine). 42 While the PKS and NRPS gene clusters were initially assembled from fully sequenced sets of cosmids, reassembly experiments can also be performed in the absence of comprehensive sequencing. The FRI gene cluster was originally reassembled with only end-sequencing data for each cosmid clone predicted to comprise the complete gene cluster (Figures 4c  and 6). A capture vector based on the end-sequencing data from the distal ends of the two outermost clones, cosmids 1679 and 201, was used to reassemble the gene cluster (see Figure 3). We confirmed the successful reassembly of the fragments using PCR and by comparing restriction maps of the reassembled construct with those produced by the cosmids used in the reassembly experiment (data not shown). Subsequent full sequencing of the clones comprising the FRI gene cluster confirmed the restriction mapping and successful sequencing-independent TAR assembly experiment (see Figure 6).
Traditional gene cluster assembly strategies can become technically impractical when working with large naturally derived DNA sequences. Unique and conveniently located restriction sites needed for traditional ''cut and paste'' strategies are often not available when working with long natural DNA sequences. Recently, k-based recombination has been used to reconstruct functional gene clusters, circumventing many of the problems associated with traditional strategies. 26 Lambda-based recombination becomes difficult, however, for large gene clusters captured on multiple overlapping clones because it requires the step-wise recombination of two clones at a time. This step-wise recombination process requires the introduction of a unique selectable marker into each fragment used in an assembly experiment. As demonstrated here, TARdependent assembly of multiclone natural product gene clusters can be performed in a single reaction without any of these barriers. The maximum number of DNA fragments that can be simultaneously assembled in TAR experiments has yet to be determined, but even the largest gene clusters are unlikely to require more than three or four overlapping cosmids which is well within the established limits of TAR. 34,35,56 CONCLUSIONS Previous studies have demonstrated that metagenomic strategies can be used to uncover metabolites encoded by gene clusters captured on individual soil-derived eDNA clones (see Figure 1). Cloning large natural product gene clusters presents a challenge for both culture dependent and culture independent studies. We have shown that TAR can be used to rapidly reassemble overlapping eDNA-derived clones into a single construct containing large eDNA derived natural product gene clusters. We have also shown that TAR can be used to directly and specifically clone natural product gene clusters from sequenced organisms without constructing and screening a genomic library. TAR-dependent assembly of natural product gene clusters from overlapping clones found in eDNA soillibraries provides an experimental framework for rapidly accessing intact natural product gene clusters that exceed conventional eDNA cloning limits (Figure 1b). In doing so, it eliminates one of the major roadblocks associated with current metagenomic natural product discovery efforts. In this study, this experimental approach provided access to both a new example of what was thought to be a rare gene cluster (FRI) as well as what appear to be new gene clusters (PKS, NPRS). The heterologous expression of large TAR-assembled gene clusters should form a basis for the identification of new natural products from eDNA. The major remaining challenge to the discovery of new natural products from uncultured bacteria, that of heterologous expression, is not unique to culture-independent studies and will likely need to be addressed using many different gene cluster specific strategies.