Development of bacterial artificial chromosome library resources for parasitoid Hymenoptera (Nasonia vitripennis and Nasonia giraulti: Pteromalidae)

Authors


Mónica C. Muñoz-Torres, Department of Biology, Georgetown University, Washington, DC 20057, USA. Tel.: +1 202 687 5855; fax: +1 202 687 5662; e-mail: mcm262@georgetown.edu

Abstract

The species of the genus Nasonia possess qualities that make them excellent candidates for genetic and genomic studies. To increase the wealth of genomic resources for the genus we constructed publicly available bacterial artificial chromosome (BAC) libraries for Nasonia vitripennis and Nasonia giraulti. The libraries have 36 864 clones each, empty-vector contents of ∼2% and average insert sizes of 113.1 and 97.7 Kb, respectively, representing 12 and 11 genome equivalents. The N. vitripennis library was used for genome sequence assembly and in efforts at positional cloning of a developmental gene. The genome assembly of N. vitripennis is currently composed of 6181 unjoined scaffolds. These BAC libraries can be used to identify and close regions between scaffolds of the genome assemblies of both species.

Introduction

The large diversity of extant insect forms and life styles has made them subjects for phylogenetic, evolutionary and ecological studies. Increasing our understanding of the structure and organization of insect genomes, and expanding our knowledge of the location and function of genes of interest, requires the development of tools that allow us to survey these genomes. Until recently, insect genomics has been dominated by studies performed in dipterans such as drosophilids and mosquitoes.

Genomic resources have been utilized in a wide array of genetic studies such as: (1) positional cloning of Quantitative Trait Loci (QTL) for Plasmodium gallinaceum susceptibility in Aedes aegypti (Jiménez et al., 2004); (2) isolation of the gene underlying the sex determination locus in honey bee (Beye et al., 2003); (3) identification of candidate genes for drought-stress tolerance (Wang et al., 2006) and (4) disease resistance (Ronald, 1997) in rice (Oryza sativa L); as well as (5) genome mapping in cassava (Manihot esculenta Crantz) to identify candidate genes to overcome micronutrient malnutrition challenges worldwide (Fregene et al., 2001). Such diverse research projects have in common the use of a bacterial artificial chromosome (BAC) library to interrogate the genome in search of practical answers to each problem. Since their inception, BAC libraries have been regarded as a means to facilitate the construction of DNA libraries of complex genomes with fuller representation and subsequent rapid analysis of complex genomic structure (Shizuya et al., 1992). These libraries are a common entry point into projects involving functional and structural genomics, gene identification and construction of physical maps, to name a few.

There are over 100 peer-reviewed publications spanning 22 years of research on generation and analysis of insect EST libraries (e.g. Bombyx mori, summarized in Futahashi et al., 2008; libraries for seven beetle species (Coleoptera), reviewed in Theodorides et al., 2002; Ceratitis capitata, Gomulski et al., 2008; Apis mellifera, Whitfield et al., 2002). Genetic linkage maps have been constructed for several insect species such as the bumblebee Bombus terrestris (Wilfert et al., 2006), the warningly coloured butterfly Heliconius erato (Tobler et al., 2005) and the red flour beetle Tribolium castaneum (Lorenzen et al., 2005), to name just a few.

BAC libraries have propelled advances in gene discovery and genome-wide analysis for only a small number of insect species; other than the well-developed genetic model Drosophila melanogaster, genomic resources for the remainder of the insects have long been surprisingly inadequate. BAC libraries are currently available for the social hymenopterans honey bee (Apis mellifera L.) (Tomkins et al., 2002) and bumblebee (Bombus terrestris L.) (Wilfert et al., 2008), for the mosquitoes Aedes aegypti (Jiménez et al., 2004) and Anopheles gambiae (Hong et al., 2003), for the red flour beetle Tribolium castaneum (Brown et al., 2002) and for the silk moth Bombyx mori (Mita et al., 2002). Except for bumblebee, genomes of all the aforementioned species have now been sequenced, and BAC libraries were used to build preliminary physical maps and/or to provide sequence scaffolds to facilitate the sequence assembly process.

Here we describe the construction and characterization of two deep-coverage BAC libraries for species of parasitoid wasps of the genus Nasonia, namely N. vitripennis and N. giraulti (hereafter NV and NG, respectively). Both libraries reported here offer a starting point for the design of genomic projects for the genus Nasonia, and constitute the first of their kind for parasitoid wasps. Parasitoid wasps have been subjects of ecological, evolutionary, developmental and genetic research for almost 50 years; they belong to a large and extremely important group of Hymenoptera that contains more insects beneficial to humans than any other group. Nasonia is an appropriate candidate for studies in genetics and genomics due to ease of rearing, small genome size, haplodiploidy, a wealth of visible and molecular markers available for mapping, and healthy inbred isogenic lines. In 2004 the National Institutes of Health selected NV for six-fold coverage genome sequencing (Werren et al., 2004) and at the time of this publication the Nasonia Genome Sequencing Consortium (NGSC) has published their findings on the first draft version of the sequence assembly, Nvit_1.0 (Werren et al., 2010). A rough draft of the NG genome has also been produced. The resources described here provide a useful complement to the draft sequence for a variety of genomic and genetic applications. The development of genomic resources for a parasitoid is also likely to yield many benefits for human health and our understanding of important biological processes.

Results

Bacterial artificial chromosome library construction and characterization

Nasonia vitripennis. The NV BAC library contains 36 864 clones. Insert size estimates by pulsed-field gel electrophoresis (PFGE) of 384 randomly sampled clones yielded an average of 113.1 ± 39 Kb (Fig. 1a) with a range of 8–300 Kb. Less than 2% of the clones contained no inserts. Based on an estimated haploid genome size of 335 Mb (Rasch et al., 1975; Beukeboom and Desplan, 2003), this library represents 12.4 genome equivalents and allows any one particular NV sequence to be recovered with a probability greater than 99% from at least one clone, as supported by the library screening with known genes, described below.

Figure 1.

Histogram of insert size distribution of bacterial artificial chromosome (BAC) clones. (A) Nasonia vitripennis (n= 384) and (B) Nasonia giraulti (n= 192) BAC libraries.

Nasonia giraulti. The NG BAC library has 36 864 clones. One hundred and ninety-two clones were randomly selected and analysed using PFGE to estimate library insert size and other features. The estimated average insert size is 97.7 ± 25.9 Kb (Fig. 1b), ranging from 9 to 135 Kb. Based on an estimated genome size of 330 Mb (J. H. Werren, unpublished data), this library covers the NG genome 10.9 times. This allows any particular species-specific sequence to be recovered with a probability greater than 99% from at least one clone. Less than 3% of the clones in the library contain empty vectors.
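The coverage figures reported for both libraries follow from the clone count, the average insert size and the genome size. The sketch below reproduces that arithmetic and estimates the recovery probability with the standard Clarke-Carbon formula, P = 1 − (1 − I/G)^N. The text does not state which formula was used for the ">99%" figure, so the choice of Clarke-Carbon is an assumption, as are the function names; empty-vector clones are ignored for simplicity.

```python
def genome_equivalents(n_clones, mean_insert_kb, genome_mb):
    """Total cloned DNA divided by the haploid genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


def recovery_probability(n_clones, mean_insert_kb, genome_mb):
    """Clarke-Carbon estimate (assumed here): P = 1 - (1 - I/G)^N."""
    fraction = mean_insert_kb / (genome_mb * 1000.0)
    return 1.0 - (1.0 - fraction) ** n_clones


# Values taken from the text: (clones, mean insert in Kb, genome size in Mb)
libraries = {
    "NV": (36864, 113.1, 335),
    "NG": (36864, 97.7, 330),
}

for name, params in libraries.items():
    coverage = genome_equivalents(*params)
    probability = recovery_probability(*params)
    print(f"{name}: {coverage:.1f} genome equivalents, P(recovery) = {probability:.6f}")
```

Because the formula treats every clone as an independent random draw from the genome, it gives an optimistic estimate for regions that are hard to clone with HindIII partial digests.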

Library screening and fingerprinting analysis

To test library coverage and isolate genomic regions putatively associated with genes of interest, we screened the NV BAC library with eight probes derived from candidate genes involved in evolution of wing cell size (ws1), the insulin regulation pathway (target of rapamycin (tor) and S6 kinase (S6k)), tumour suppression (phosphatase and tensin homologue (Pten)) and embryonic development (zerknüllt (zen), hairy (hry), caudal (cad) and Antennapedia (Antp)). Library coverage and insert quality experiments for the NG BAC library were performed using ws1. Table 1 lists primers used to generate these probes.

Table 1. Probes designed for hybridization of the Nasonia vitripennis and Nasonia giraulti bacterial artificial chromosome libraries

| Gene* | Primers (5′-3′) | Source organism | BAC address† | BAC GenBank ID |
|---|---|---|---|---|
| Antp (M. Muñoz-Torres§) | Degenerate primers | Degenerate primers | NV_Bb_49K07 | GQ358099§ |
| cad | cad-F CATGAATTCAARACKCGNACKAARGAYAARTA | Drosophila melanogaster | NV_Bb_66L02 | AC185337 |
| | cad-R TGAGTCGACRTTYTGRAACCADATYTTNAC | | | |
| hry (NM_001014577) | C. Desplan§ | D. melanogaster | NV_Bb_67B15 | AC185339 |
| Pten (J. Romero-Severson§) | Pten-F GATGACCACAATCCACCCC | Degenerate primers | NV_Bb_85B05 | AC185134 |
| | Pten-R AANGTRTTYAACCARAARTG | | | |
| S6K (J. Romero-Severson§) | s6k-F GCATTTCAGACGGAG | D. melanogaster | NV_Bb_29I19 | AC185290 |
| | s6k-R GAATGTCCTTTCCGG | | | |
| tor (J. Romero-Severson§) | tor-F GGAGCTCGACCAGGT | D. melanogaster | NV_Bb_44F11 | AC185143 |
| | tor-R TTGACCATAAGCTGC | | | |
| WS1 (J. Werren§) | ws1-F GTCGCACCACTTGCTACAGA | N. vitripennis | NV_Bb_01N08 | AC185338 |
| | ws1-R GAATTGGCCAACCTTCATTG | | NV_Bb_02K01 | AC185141 |
| | 01N08.bb-F GGCGAACTGCATACATGCGC | | NV_Bb_52F17 | AC185288 |
| | 01N08.bb-R TCTGAGCAGCTAATAAAGGCAGAGC | | NV_Bb_47D01 | AC185330 |
| | 02K01.sp-F CTACAGTGGTGAGCAGAGGTC | | NV_Bb_61B24 | AC185331 |
| | 02K01.sp-R TCCTTGCGGTACTCTCTTG | | NV_Bb_89M15 | AC185140 |
| | 19G07.sp-F CAGTGATTTTGCACTCGTTC | | | |
| | 19G07.sp-R TTACCTTTACGGCTTGTTTG | | | |
| | 67D12.gg-F AGCTTGATGCAGCGTGACTTC | | | |
| | 67D12.gg-R AAATGTGCGACGAGCTGAGC | | | |
| zen (NM_057445) | C. Desplan§ | D. melanogaster | NV_Bb_77L07 | AC185133 |

Positive clones were fingerprinted using HindIII and assembled into contigs. Clones with largest inserts were selected for complete sequencing.
*Includes abbreviated name, GenBank ID and/or reference when available. See text for complete gene names.
†NV_Bb for N. vitripennis bacterial artificial chromosome (BAC) library, NG_Bb for N. giraulti.
§Unpublished data.
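Several primers in Table 1 use IUPAC ambiguity codes (R, Y, K, D, N), so each one stands for a pool of distinct oligonucleotides. As a minimal sketch, the degeneracy of a primer can be computed by multiplying the number of bases allowed at each position. The function name `degeneracy` is hypothetical, and the primer sequences are copied from the table as printed.

```python
# IUPAC nucleotide codes mapped to the bases they represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


primers = {
    "cad-F": "CATGAATTCAARACKCGNACKAARGAYAARTA",
    "cad-R": "TGAGTCGACRTTYTGRAACCADATYTTNAC",
    "Pten-F": "GATGACCACAATCCACCCC",
    "Pten-R": "AANGTRTTYAACCARAARTG",
}

for name, sequence in primers.items():
    print(name, degeneracy(sequence))
```

A non-degenerate primer such as Pten-F has a degeneracy of 1, while higher values indicate a more diverse primer pool and therefore a lower effective concentration of any single sequence.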

Excluding Pten and Antp, probes consisted of PCR products obtained with primers designed directly from D. melanogaster sequences or, for ws1, from N. vitripennis sequence (see Table 1). The Pten and Antp probes were designed with degenerate primer approaches as described in Baudry et al. (2006). The ws1 probe targets a conspicuous difference between the two species: NG males have large wings and fly, whereas NV males have small vestigial wings and are capable of only limited flight. A QTL analysis revealed four to five genetic regions responsible for most of these differences, and the locus ws1 is believed to be responsible for approximately 50% of the variation (Weston et al., 1999). We obtained a total of 19 positive hits from the NV library with the ws1 probe, and 10 positive hits from NG. Additionally, using NV DNA template we obtained 12 positive hits with the tor probe, 5 with S6K, 27 with Pten, 8 with zen, 4 with hry, 12 with cad and 11 with probe DNA from the Antp gene. Except for the one underlying ws1, all genes are known to be present in single copies in the D. melanogaster genome according to FlyBase (Wilson et al., 2008). These results demonstrate the redundancy of the libraries.
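The hit counts reported for the NV library can be summarized and set against the estimated 12.4 genome equivalents. The sketch below uses only the counts given in the text; comparing the mean directly with the expected coverage is our own framing, and it assumes each probe detects a single-copy locus.

```python
# Positive hits per probe in the NV library, as reported in the text
nv_hits = {
    "ws1": 19, "tor": 12, "S6K": 5, "Pten": 27,
    "zen": 8, "hry": 4, "cad": 12, "Antp": 11,
}

EXPECTED_COVERAGE = 12.4  # genome equivalents estimated for the NV library

total_hits = sum(nv_hits.values())
mean_hits = total_hits / len(nv_hits)

print(f"total hits: {total_hits}, mean per probe: {mean_hits:.2f}")
for gene, hits in sorted(nv_hits.items(), key=lambda item: item[1]):
    # Ratio above 1 means more clones than expected for a single-copy locus
    print(f"{gene}: {hits} hits, {hits / EXPECTED_COVERAGE:.2f}x expected")
```

Probes that return far more hits than the expected coverage, such as Pten, may reflect cross-hybridization or additional copies, while low counts may indicate regions under-represented in a HindIII partial-digest library.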

All positive hits obtained with the single-copy probe linked to ws1 were fingerprinted using HindIII; after manual analysis to filter out poorly resolved fingerprints, 17 clones were assembled into one contig using FingerPrinted Contigs (FPC), as described in the Experimental procedures section. BAC-end sequences of the outermost clones in the contig were used to obtain PCR products, which were in turn labelled and used to screen the BAC filters again. The probes used were named 67D12.gg, 01N08.bb, 02K01.sp and 19G07.sp (Table 1, Fig. 2). Thirty clones were brought together into a contig (Contig Nv1, Fig. 2), which was used to design primers to screen recombinants for finer scale mapping of ws1. Results of this cloning effort are reported elsewhere (Werren et al., 2010), but demonstrate the utility of the BAC library for contig construction and chromosomal walking.

Figure 2.

Contig Nv1. Not-to-scale representation of contigged bacterial artificial chromosome (BAC) clones after fingerprinting analyses. BACs in this figure were retrieved by successive hybridizations, starting with the AFLP-derived marker ‘Af1’, located near ws1, and continuing with end-probes 67D12.gg, 01N08.bb, 02K01.sp and 19G07.sp. ‘G’ marks the T7 end of the BAC. ‘B’ denotes the SP6r end of the BAC. See text for detailed information.

After corroborating clone identity and insert size, positive hits from the NV library with the largest inserts were selected for full-length sequencing to make these resources available for further gene characterization and to highlight the relevance of the availability of new genomic resources for the Hymenoptera research community. Clone sequences were deposited in GenBank and identification tags are shown in Table 1.

Discussion

We here describe the construction and characterization of BAC libraries for two species of parasitoid wasps, Nasonia vitripennis and Nasonia giraulti. We screened the libraries with genes involved in a diverse range of biological processes and retrieved an average of 12 candidate BACs per screened gene – evidence of the redundancy in coverage of our libraries. A number of candidate BACs were chosen from each species for full-length sequencing to provide resources for further characterization of each gene of interest. Choosing candidate BACs for sequencing provided data for assembly verification performed by the Nasonia Genome Sequencing Consortium (Werren et al., 2010) as mentioned below.

A fingerprinting approach was effectively used to assemble a set of overlapping BAC clones in a region containing the ws1 locus (Weston et al., 1999); this led to the identification of a candidate gene responsible for the large wing size difference between males of the two closely related species of parasitoid wasps (Werren et al., 2010). These results present Nasonia as a model system for genetic analyses of morphological differences between natural populations or closely related species. Analyses like these may be challenging either because the differences are small or subtle, or because crosses cannot be performed, even between closely related species. However, in the case of Nasonia, the four species described in the genus are interfertile when cured of their cytoplasmic bacterial infections (Wolbachia), and they are able to form viable and fertile hybrids (Breeuwer & Werren, 1990, 1995).

The natural history and genetic features described above give Nasonia the potential to become a model organism for a number of studies, ranging from fundamental evolutionary biology to exploring pharmaceutical uses. Tractability and all other features described for the species of the genus have made them a primary model for parasitoid genetics (Whiting, 1967). More is known about the biology of Nasonia than any other parasitoid hymenopteran, and an incredibly active research community will ensure that all resources generated for this group of species are exhaustively utilized for advanced research. For instance, identification of candidate genes responsible for variations in a certain trait of interest requires access to the physical genome and not only to recombination rates. New deep-coverage BAC libraries are now available for Nasonia researchers to combine both strategies, allowing the rapid identification of candidate genes as well as conducting functional genome-wide scale experiments in both species.

Application to Nasonia genome sequence assembly

The Nasonia species complex was sequenced using both Sanger sequencing technology on the Applied Biosystems 3730 platform, and 45 bp fragment reads on the Illumina GA2 platform. Over three million successful sequence reads from plasmid, fosmid and BAC subclones were assembled into 6181 scaffolds for a total of 295 Mb for NV (Werren et al., 2010). Sequences were grouped into ordered, oriented contigs with the help of fully sequenced BACs using the Atlas Genome Assembly system (Havlak et al., 2004) at the Human Genome Sequencing Center at Baylor College of Medicine (HGSC, BCM, http://www.hgsc.bcm.tmc.edu). Pairing whole-genome shotgun reads with the BAC clones allows for a localized assembly with reduced computational load and facilitates sorting out highly repetitive regions (Havlak et al., 2004). The current genome assembly for NV is still composed of a large number of unjoined segments. The current BAC library for NV will be useful for closing gaps within the genome and joining scaffolds by using BAC-end sequencing, full BAC sequencing and fingerprinting to identify scaffolds within contiguous sets of BAC clones. Similarly, the low 1X Sanger coverage of the NG genome did not yield a well-assembled genome (J. Werren, pers. comm.), and currently the Sanger reads are aligned to the NV assembly, but not assembled. BAC-end sequencing of the NG library can be used to join these scaffolds and examine the respective genomes for sequence rearrangements.

Clones and filters from the NV and the NG BAC libraries are publicly available and may be ordered from the Clemson University Genomics Institute (http://www.genome.clemson.edu/). The use of these BAC libraries should make reference to this resource.

Experimental procedures

Bacterial artificial chromosome library construction

DNA for library construction was extracted from highly inbred lines of NV and NG. For both species, high-molecular-weight DNA from yellow pupae was prepared following a procedure adapted for honey bees (Tomkins et al., 2002). DNA was partially digested with HindIII for 20 min at 37 °C. Size selection was performed on the fragments via two consecutive rounds of PFGE. Fragments were then ligated into the vector pIndigoBAC 536 (Peterson et al., 2000; Luo & Wing, 2003) for 16 h at 16 °C. Vectors were transformed into Escherichia coli ElectroMAX DH10B cells (Invitrogen, Carlsbad, CA, USA) using electroporation, and allowed to replicate at 325 rpm for 1 h at 37 °C. Recombinant colonies were picked using a Genetix Q-bot (Genetix, Boston, MA, USA) and stored individually in 384-well plates at –80 °C.

Bacterial artificial chromosome library characterization

DNA from BAC clones was prepared from 900 µl cultures of Terrific Broth (Gibco)-chloramphenicol (12.5 µg/µl) in 96-well format, inoculated with 1.5 µl of BAC culture. After 18 h, cultures were treated with a modified alkaline lysis method. To determine insert sizes, their distribution and the percentage of clones without an insert, approximately 200 ng of BAC DNA from randomly selected clones were digested using 7 units of NotI for 16 h at 37 °C. DNA digestions were analysed by PFGE in 1% agarose gels for 15 h at 14 °C (6 V/cm, switch time of 5–15 s) and stained with ethidium bromide for 20 min.

Bacterial artificial chromosome library screening

High-density colony filters for hybridization-based screening of the NV and NG libraries were prepared using a Q-bot (Genetix). Clones were gridded in double spots using a 4 × 4 array with 6 fields on 11.5 × 22.5 cm Hybond N+ filters (GE Healthcare, Piscataway, NJ, USA). This gridding pattern allows 18 432 clones to be represented per filter. Colony filters were grown and processed using standard techniques (Sambrook et al., 1989). Hybridization probes were prepared by excising PCR fragments from ethidium bromide-stained 1% agarose gels and extracting the DNA with a QIAEX II gel extraction kit (QIAGEN, Valencia, CA, USA). The NV library was screened with eight PCR products obtained from genes involved in evolution of wing cell size, insulin regulation, tumour suppression or embryonic development. The NG library was screened with the probe developed for the wing cell size gene and subsequent BAC-end probes designed for this experiment, as described in the Results section. Radiolabelling of probe DNA and hybridization of colony filters were performed using standard techniques (Sambrook et al., 1989) with the following modifications: single-stranded radiolabelled DNA probes were prepared from PCR products according to the manufacturer's instructions using the Random Primed DNA Labeling Kit DECAprime II (Applied Biosystems, Foster City, CA, USA). In each case 30 µCi of radioactive phosphorus were used to label approximately 60 ng of probe DNA. Hybridizations were performed for at least 16 h at 65 °C and filters were washed twice at 65 °C (30 min per wash), first in a 1X SSC/0.1% SDS solution and then in a 0.5X SSC/0.1% SDS solution. Hybridized BAC filters were imaged in a Storm Scanner (GE Healthcare) and positive hits were scored with HybSweeper (Lazo et al., 2005).
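The stated filter capacity can be checked with simple arithmetic. The sketch below assumes that each of the 6 fields holds 384 grid squares (one per well position of a 384-well plate) and that each 4 × 4 square carries 16 spots shared by double-spotted clones; this interpretation is not spelled out in the text, but it reproduces the 18 432 clones per filter reported above. All names are hypothetical.

```python
WELLS_PER_PLATE = 384      # clones are stored in 384-well plates
FIELDS_PER_FILTER = 6      # fields on each 11.5 x 22.5 cm filter
SPOTS_PER_SQUARE = 4 * 4   # 4 x 4 array within each grid square
REPLICATES = 2             # every clone is spotted twice


def clones_per_filter():
    """Clones on one filter, assuming 384 grid squares per field."""
    clones_per_square = SPOTS_PER_SQUARE // REPLICATES
    return FIELDS_PER_FILTER * WELLS_PER_PLATE * clones_per_square


def filters_needed(library_size):
    """Filters required to represent a whole library (ceiling division)."""
    capacity = clones_per_filter()
    return -(-library_size // capacity)


def plates_needed(library_size):
    """384-well plates required to store a whole library."""
    return -(-library_size // WELLS_PER_PLATE)


print(clones_per_filter(), filters_needed(36864), plates_needed(36864))
```

Under these assumptions each library of 36 864 clones fits on two filters, and the duplicate spots let a true positive be distinguished from background by its paired signal.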

Fingerprinting analysis

BAC-DNA fingerprinting was performed using techniques established by Chen et al. (2002) and Marra et al. (1997). Briefly, BAC DNA was prepared as described above and samples for fingerprinting were digested with the restriction endonuclease HindIII, electrophoresed on 1% agarose gels for 15 h at 60 V, and stained with SybrGold (Invitrogen) for 1 h. Gels were imaged in a Storm Scanner (GE Healthcare) and fingerprinting data were scored using Image3 (v.3.10, http://www.sanger.ac.uk/software/Image/im309). All bands were manually checked; bands below 1 Kb were ignored because of poor image resolution; these accounted for <1% of the total length of BAC inserts. Contig building was done using FingerPrinted Contigs (FPC v.8; Soderlund et al., 2000) at a high stringency, with a Tolerance value of 7 and a Minimum Cutoff value of 1e-9. Poorly resolved fingerprints, defined as clones with band patterns of four or fewer bands, were automatically excluded by Image3. FPC automatically excluded from contig analysis those clones whose band patterns were not significantly similar to those of any other clone in the data set.
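To illustrate how a Tolerance and a Cutoff interact, the sketch below computes a Sulston-style coincidence score: the probability that two clones share at least a given number of bands purely by chance. This is our understanding of the kind of score FPC compares against the cutoff, not a reproduction of its implementation. The gel length of 3300 units and the example band counts are assumptions that do not come from the text.

```python
from math import comb


def coincidence_score(n_low, n_high, shared, tolerance, gel_length):
    """Probability of at least `shared` chance band matches (Sulston-style).

    n_low, n_high: band counts of the two clones (n_low <= n_high).
    tolerance, gel_length: in the same units as the band positions.
    """
    # Chance that one band falls within +/- tolerance of another band
    per_band = 2.0 * tolerance / gel_length
    # Chance that a band matches none of the n_high bands of the other clone
    p_no_match = (1.0 - per_band) ** n_high
    p_match = 1.0 - p_no_match
    # Binomial tail: at least `shared` of the n_low bands match by chance
    return sum(
        comb(n_low, m) * p_match ** m * p_no_match ** (n_low - m)
        for m in range(shared, n_low + 1)
    )


CUTOFF = 1e-9        # Minimum Cutoff reported in the text
TOLERANCE = 7        # Tolerance reported in the text
GEL_LENGTH = 3300    # assumed value, not given in the text

# Hypothetical pair of clones with 25 and 30 bands
for shared_bands in (5, 20):
    score = coincidence_score(25, 30, shared_bands, TOLERANCE, GEL_LENGTH)
    print(shared_bands, score, "overlap" if score < CUTOFF else "no call")
```

A lower cutoff demands more shared bands before two clones are joined, which reduces false joins at the cost of leaving weakly overlapping clones as singletons.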

Acknowledgements

The construction and characterization of resources presented in this report were partially funded by a grant from the U.S. National Science Foundation (IOS.0208278) to J. Romero-Severson and a National Institutes of Health grant (5R01 GM070026) to J.H. Werren. The authors would like to thank two anonymous reviewers for their detailed suggestions. MM-T, CS, and BB wish to thank Dr Amy Lawton-Rauh for helpful reviews on this manuscript, and Mrs Jeanice Troutman and Mr Michael Atkins for their technical assistance.
