Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2·5, which does not permit commercial exploitation.
Sweetness and light: illuminating the honey bee genome
Article first published online: 27 OCT 2006
Insect Molecular Biology
Volume 15, Issue 5, pages 535–539, October 2006
How to Cite
Robinson, G. E., Evans, J. D., Maleszka, R., Robertson, H. M., Weaver, D. B., Worley, K., Gibbs, R. A. and Weinstock, G. M. (2006), Sweetness and light: illuminating the honey bee genome. Insect Molecular Biology, 15: 535–539. doi: 10.1111/j.1365-2583.2006.00698.x
- Issue published online: 27 OCT 2006
- Received 25 June 2006; accepted after revision 25 July 2006.
Instead of dirt and poison we have rather chosen to fill our hives with honey and wax; thus furnishing mankind with two of the noblest things, which are sweetness and light. Jonathan Swift
The honey bee Apis mellifera is the first hymenopteran and the fifth insect genome to be sequenced (Honey Bee Genome Sequencing Consortium, 2006) in what promises to be a swarm of insect genome sequences expected to appear over the next few years (Table 1). The Honey Bee Genome Sequencing Project (HBGSP) was conceptualized over a period from 1998 to 2001 by the community at courses, conferences and workshops (Robinson, 1999; Maleszka, 2000; Pennisi, 2001). In addition, initial efforts were directed at physical and genetic maps of the genome (Estoup et al., 1995; Hunt & Page, 1995), collections of expressed sequence tags (Evans & Wheeler, 2000; Whitfield et al., 2002), and studies using microarrays (Kucharski & Maleszka, 2002; Takeuchi et al., 2002; Whitfield et al., 2003).
|Organism||Common name||Order||Size (Mb)||Status||Sequencing centres*||Reference/source|
|Acyrthosiphon pisum||Pea aphid||Hemiptera||525||Ongoing||BCM-HGSC||http://www.hgsc.bcm.tmc.edu|
|Aedes aegypti||Mosquito||Diptera||1310||Complete||BI, TIGR||http://msc.tigr.org/aedes/aedes.shtml http://www.broad.mit.edu/annotation/disease_vector/aedes_aegypti/|
|Anopheles gambiae||Mosquito||Diptera||264||Complete||Celera Genomics, Genoscope, the University of Notre Dame, EBI/Sanger Institute, EMBL, Institut Pasteur, IMBB and TIGR||Holt et al. (2002); Mongin et al. (2004)|
|Apis mellifera||Honey bee||Hymenoptera||262||Complete||BCM-HGSC||Honey Bee Genome Sequencing Consortium (2006)|
|Bombyx mori||Silkworm||Lepidoptera||530||Complete||International Lepidopteran Genome Project||Mita et al. (2004); Xia et al. (2004)|
|Culex pipiens||Mosquito||Diptera||540||Ongoing||BI, TIGR||http://msc.tigr.org/c_pipiens/index.shtml http://www.broad.mit.edu/seq/msc/|
|Daphnia pulex||Water flea||Cladocera||200||Ongoing||JGI||wfleabase.org|
|Drosophila melanogaster||Fruit fly||Diptera||132||Complete||CGI; BDGP; BCM-HGSC||Adams et al. (2000); Celniker et al. (2002)|
|Drosophila pseudoobscura||Fruit fly||Diptera||139||Complete||BCM-HGSC||Richards et al. (2005)|
|Drosophila species†||Fruit fly||Diptera||∼135||Ongoing||Multicentre2||flybase.bio.indiana.edu|
|Glossina morsitans||Tsetse fly||Diptera||590||Ongoing||WTSI||http://www.sanger.ac.uk/Projects/G_morsitans/|
|Ixodes scapularis||Tick||Acarina||2100||Ongoing||BI, TIGR||http://www.entm.purdue.edu/igp/default.html|
|Pediculus humanus||Body louse||Phthiraptera||107||Ongoing||JCVI||http://www.entm.purdue.edu/pittendrigh_lab/default.html|
|Rhodnius prolixus||Chagas’ disease vector||Hemiptera||670||Ongoing||WUGSC||http://www.genome.wustl.edu/genome.cgi?GENOME=Rhodnius%20prolixus|
|Sand flies‡||Sand fly||Diptera||170–300||Ongoing||BCM-HGSC, WUGSC||http://www.genome.wustl.edu http://www.hgsc.bcm.tmc.edu|
|Tribolium castaneum||Red flour beetle||Coleoptera||158||Ongoing||BCM-HGSC||http://www.hgsc.bcm.tmc.edu/projects/tribolium/|
At the end of 2001 members of the honey bee community, led by Gene Robinson and Daniel Weaver, and the United States Department of Agriculture, represented by Kevin Hackett, met at the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) to discuss a full genome sequencing project. (Representatives of the bovine community were also at this meeting to discuss their genome project, a gathering warmly remembered as the milk and honey workshop.) A White Paper to the National Human Genome Research Institute of the NIH ensued (Honey Bee Genome Sequencing Consortium, 2002), which led to the HBGSP receiving a high priority ranking in the comparative genomics program at the NHGRI. With this support from NHGRI, and additional contributions from the USDA resulting from the efforts of Under Secretary Joseph Jen, the project began in December 2002 at BCM-HGSC.
All genome projects have their challenges, as each genome and organism has its own idiosyncrasies. The honey bee was no different. A principal complication was under-representation of AT-rich regions of the genome among the small insert shotgun libraries constructed in Escherichia coli for the bulk of the sequencing. Possibly AT-rich DNA was degraded during the preparation of libraries or the clone inserts were not maintained in E. coli. To overcome this, Martin Beye supplied AT-rich DNA isolated from dye-CsCl gradients, and this was used to make more shotgun libraries to build up coverage of the AT-rich regions. It was also found that the genome was not fully represented in the large insert BAC clone library, which again could reflect loss of some regions either during clone preparation or in E. coli. The BAC problem was never solved and so these clones were used sparingly in the project. A potential problem, polymorphism making it difficult to assemble shotgun sequences, was managed using a partially inbred queen from Daniel Weaver. The DNA for sequencing came from a large number of drones. Although polymorphism was not insignificant (several polymorphic alleles per kilobase), this was a boon for identifying SNPs and quite manageable in genome assembly.
The lack of BAC clones meant that the HBGSP became a pure Whole Genome Shotgun project. In all, the project produced over three million DNA sequences for assembly, mainly from small insert clones, but including a few fosmid and BAC clones. The genome assembly used over 80% of these data. The reads were assembled into the genome with the Atlas assembly software, developed at the BCM-HGSC (Havlak et al., 2004). All overlaps between reads were first found by an alignment process, and highly repeated sequences were identified by their large number of overlapping reads. These were set aside, and then a series of steps was performed to create a layout of the reads based on their overlapping sequences. This resulted in clusters of overlapping reads (bins of reads), which end in gaps where the repeated sequences have been removed. Each bin of reads was then assembled into a consensus sequence using Phrap (Ewing & Green, 1998; Ewing et al., 1998), generally producing a single contig (a continuous stretch of sequence). Contigs were linked together into scaffolds using the read pairing information (each clone is sequenced from both ends, producing a pair of reads). The highly repeated sequences were then added back to the assembly, using the read pair information for their placement. The scaffolds were used to build chromosomes by aligning them to the markers of the linkage map (Solignac et al., 2003, 2004, 2006), a step called superscaffolding. Manual superscaffolding was also performed by placing reads that were not used by these automated procedures.
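The layout logic described above can be illustrated with a toy sketch. This is not the Atlas implementation: the function names (`bin_reads`, `link_bins`), the `max_overlaps` threshold and the example read identifiers are all hypothetical, and the overlaps are assumed to have been computed already by an alignment step. The sketch flags reads with unusually many overlaps as repeats, groups the remaining reads into bins of connected overlaps, and then counts read pairs that span two bins, which is the kind of evidence used to order contigs into scaffolds.

```python
from collections import Counter, defaultdict


def bin_reads(read_ids, overlaps, max_overlaps):
    """Set aside repeat-like reads, then cluster the rest into bins.

    overlaps is a list of (read_a, read_b) pairs, assumed to come from a
    prior alignment step. A read with more than max_overlaps overlaps is
    treated as highly repeated (a hypothetical cut-off for illustration).
    """
    degree = Counter()
    for a, b in overlaps:
        degree[a] += 1
        degree[b] += 1
    repeats = {r for r in read_ids if degree[r] > max_overlaps}

    # Union-find over the non-repeat reads: overlapping reads share a bin.
    parent = {r: r for r in read_ids if r not in repeats}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in overlaps:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    bins = defaultdict(set)
    for r in parent:
        bins[find(r)].add(r)
    return repeats, sorted(sorted(members) for members in bins.values())


def link_bins(bins, read_pairs):
    """Count read pairs whose two ends fall in different bins."""
    bin_of = {r: i for i, members in enumerate(bins) for r in members}
    links = Counter()
    for left, right in read_pairs:
        if left in bin_of and right in bin_of and bin_of[left] != bin_of[right]:
            links[tuple(sorted((bin_of[left], bin_of[right])))] += 1
    return dict(links)


# Toy data: "rep" overlaps reads from two otherwise unrelated regions.
read_ids = ["r1", "r2", "r3", "r4", "r5", "rep"]
overlaps = [("r1", "r2"), ("r2", "r3"), ("r3", "rep"), ("rep", "r4"),
            ("r4", "r5"), ("rep", "r1"), ("rep", "r5")]
read_pairs = [("r1", "r2"), ("r3", "r4"), ("rep", "r5")]

repeats, bins = bin_reads(read_ids, overlaps, max_overlaps=3)
links = link_bins(bins, read_pairs)
```

In this toy case the repeat read would otherwise join the two regions into one bin; setting it aside leaves two bins, and the single read pair spanning them is what a scaffolding step would use to place them next to each other.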
The product of these activities was a draft assembly, a consensus sequence good enough to represent nearly all genes at a quality sufficient for use in searches (e.g. with Blast). There are gaps, mainly due to repeats that could not be unambiguously placed, which are of lesser interest than transcribed regions. There are low coverage regions, mainly due to AT cloning bias, but there is enough coverage to find genes in these regions. Some of the assembly was not placed on chromosomes: these tend to be short contigs that fall between markers, especially where markers are far apart. Efforts were made during the project to systematically find markers to fill in these holes so this problem was minimized.
In addition to the draft assembly, a collection of single nucleotide polymorphisms was produced as part of the project. Although the queen used was partially inbred, considerable polymorphism was present among the scores of pooled drones used as DNA sources. Analysis of these sequences at the BCM-HGSC resulted in identification of about 1 million candidate SNPs. Likewise, DNA was prepared and sequenced from Africanized honey bees and these individual sequences were compared with the assembled honey bee sequence to identify more SNPs. Both of these data sets have been submitted to dbSNP. Whitfield et al. (Honey Bee Genome Sequencing Consortium, 2006; Whitfield et al., 2006) performed similar SNP discovery efforts with these Africanized sequences as well as ESTs.
The gene list produced from the honey bee genome sequence was generated via a novel method. Five different gene lists were merged using the GLEAN program (Liu et al., 2006) to produce a consensus set that was superior to any of the individual lists (Elsik et al., 2006). In addition, an ab initio list from Fgenesh (Salamov & Solovyev, 2000), a gene prediction program that overcalls possible genes, was used. The GLEAN and ab initio gene lists were tested against a genome-wide oligonucleotide array (Honey Bee Genome Sequencing Consortium, 2006), another first for insect projects. These efforts produced a list of about 10 000 genes, fewer than predicted in other insect projects. The high quality Drosophila melanogaster genome has about 13 000 predicted genes, while higher numbers are predicted for Anopheles and Bombyx. These latter numbers may be overestimates due to redundancy and polymorphism in the assemblies, while the Drosophila number is likely very accurate. Why is the Apis number so low? We believe this is mainly due to lack of EST and cDNA evidence and a conservative gene calling approach. We expect this number to increase in the future.
What are the limitations of this current low number for the honey bee gene list? We expect the deficit to be mainly in unique genes or rapidly evolving genes that are hard to identify by comparison with other genomes. In contrast, we expect gene families, which are primarily the subject of the analyses presented in the papers in this special issue, to be more completely represented. However, this is the nature of a draft genome and it provides defined measures for future upgrading.
Genome analysis was performed with maximum community engagement. The HBGSP united a broad range of scientists, from leaders in human genomics and bioinformatics at BCM-HGSC and elsewhere to members of diverse disciplinary and organism-based communities, including those studying mammals and humans. A total of 112 individuals in 63 institutions around the world signed on to analyse the newly available honey bee genome sequence, generating exciting results in many areas of biology. Themes for analysis were identified by the HBGSP, and an analysis team was formed for each. The analysis themes included Anti-xenobiotic Defence Mechanisms; Bee Disease and Immunity; Brain and Behaviour; Caste Development and Reproduction; Comparative and Evolutionary Analysis; Development and Metabolism; Gene Regulation; Genome Analysis; Physical and Genetic Mapping and Chromosome Structure; Population Genetics; and Repeated Sequences and Transposable Elements.
These groups manually analysed over 3000 gene models and identified changes in gene family numbers or in the genetic composition of pathways, by comparison with other insect genomes as well as other genomes, particularly the human genome. In addition there was considerable effort to confirm missing genes: these may be truly absent or they may be present but not recognized if they have a rapidly evolving sequence.
A principal focus was on the honey bee's complex social life-style and how it differs from that of solitary insects. This large community effort is presented in a special issue of Nature (Honey Bee Genome Sequencing Consortium, 2006) and in more detail in a large number of companion papers forming this issue as well as in other journals. Papers appearing in this volume of Insect Molecular Biology provide new insights into diverse topics in honey bee biology, including neurobiology (Eisenhardt & Leboulle, 2006) and the process of caste determination that results in reproductive queens and largely sterile workers (Cristino et al., 2006; Wheeler et al., 2006). They also address some of the challenges faced by honey bees, including analyses of disease-resistance pathways (Evans et al., 2006; Zou et al., 2006; Claudianos et al., 2006) and metabolic adaptations to an all floral (pollen and nectar) diet (Kunieda et al., 2006). Several papers address ways that honey bee studies can provide insights into human health. These papers cover the genetic bases of honey bee venom allergens (Peiren, 2006), along with mechanistic insights into the remarkable longevity of queen honey bees (Corona & Robinson, 2006) and of sperm stored in the spermatheca (Collins et al., 2006). All told, over 50 papers will be appearing from this work.
The HBGSP has so far produced a prodigious amount of information, and development of online resources and databases is proceeding aggressively to manage this (Table 2). BeeBase is a dedicated analysis and display environment for the honey bee genome, headed by Christine Elsik, Texas A&M University, which will be closely tied to the famous FlyBase in collaboration with William Gelbart (Harvard University). Other databases include: NCBI Honey Bee Genomic Resource, EBI-Heidelberg, UC Santa Cruz, US-DOE, and the central site at BCM-HGSC. The BCM-HGSC site also offers the genome sequences for two key honey bee pathogens, Paenibacillus larvae and Ascosphaera apis, projects funded by USDA-ARS (Kate Aronstein and Jay Evans, Principal Investigators) and described in this special issue (Qin et al., 2006). BeeSpace is a project funded by NSF's Frontiers in Biological Research Program, headed by Bruce Schatz (University of Illinois at Urbana-Champaign), for information scientists and biologists to leverage the bee genome to create a new information environment for the study of social behaviour (http://www.beespace.uiuc.edu). New genomic resources are being created in collaboration with industry leaders, government labs, and academia, including whole genome microarrays (Viktor Stolc, NASA-Ames; and Gene Robinson, Jay Evans and Kevin White) and large-scale collections of SNPs for European and Africanized honey bees (see above).
|NCBI version 4 assembly||Accession nos CM000054–CM000069, CH876891–CH878241|
|Manual Superscaffolds for chromosomes 13, 14, 15, 16||racerx00.tamu.edu/bee_resources.html|
|SNPs from BCM-HGSC||dbSNP at NCBI (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Snp&cmd=limits); BCM-HGSC ftp site (ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Amellifera/snp)|
|SNPs from ESTs||from UIUC (titan.biotec.uiuc.edu/bee/downloads/bee_downloads.html).|
|Tiling Array Data||http://www.systemix.org|
|Gene Predictions||BeeBase racerx00.tamu.edu/downloadFASTA.html|
The HBGSP has produced an excellent draft honey bee genome sequence, enhanced by coordinating the assembly of the genome at BCM-HGSC and the mapping of the genome by Michel Solignac and colleagues at INRA, France (Solignac et al., 2003, 2004, 2006). To further increase the value of the honey bee genome sequence to researchers, a White Paper to obtain additional sequence information was submitted to NHGRI in July 2005 (Honey Bee Genome Sequencing Consortium, 2005). The project was accorded ‘High Priority’ in August 2005, and this work will begin late in 2006. The HBGSP is expected to usher in a bright era of bee research, for the benefit of agriculture, biological research and human health.
We gratefully recognize the financial and administrative support from the NHGRI-NIH throughout this project, as well as additional support from the USDA. Other support for the project has come from the Texas Agricultural Experiment Station, the University of Illinois Sociogenomics Initiative, the Texas Beekeepers Association, and various private donors from the bee industry, including Dutch Gold Honey, Golden Heritage Honey, Burleson's Honey, and Bee Weaver Apiaries, Inc. This project has been a highly socially coordinated effort by the Honey Bee Genome Sequencing Consortium.
- Adams et al. (2000) The genome sequence of Drosophila melanogaster. Science 287: 2185–2195.
- Celniker et al. (2002) Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol 3, RESEARCH0079.
- Claudianos et al. (2006) A deficit of detoxification enzymes: Pesticide sensitivity and environmental response in the honeybee. Insect Mol Biol 15: 615–636.
- Collins et al. (2006) Proteomic analyses of male contributions to honey bee sperm storage and mating. Insect Mol Biol 15: 541–549.
- Corona & Robinson (2006) Genes of the antioxidant system of the honey bee. Insect Mol Biol 15: 687–701.
- Cristino et al. (2006) Caste development and reproduction – a genome-wide analysis of hallmarks of insect eusociality. Insect Mol Biol 15: 703–714.
- Peiren (2006) Genomic and transcriptional analysis of protein heterogeneity of the honeybee venom allergen Api m 6. Insect Mol Biol 15: 577–581.
- Eisenhardt & Leboulle (2006) The PKA-CREB system in the honeybee is encoded by several genes. Insect Mol Biol 15: 551–561.
- Elsik et al. (2006) Creating a honey bee consensus gene set. Genome Biol (in press).
- Estoup et al. (1995) Microsatellite variation in honey bee (Apis mellifera L.) populations: hierarchical genetic structure and test of the infinite allele and stepwise mutation models. Genetics 140: 679–695.
- Evans & Wheeler (2000) Expression profiles during honeybee caste determination. Genome Biol 2, RESEARCH0001.
- Evans et al. (2006) Immune-related genes and honey bee disease responses. Insect Mol Biol 15: 645–656.
- Ewing & Green (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186–194.
- Ewing et al. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175–185.
- Havlak et al. (2004) The Atlas genome assembly system. Genome Res 14: 721–732.
- Holt et al. (2002) The genome sequence of the malaria mosquito Anopheles gambiae. Science 298: 129–149.
- Honey Bee Genome Sequencing Consortium (2002) Proposal for the Sequencing of a New Target Genome: White Paper for a Honey Bee Genome Project http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/HoneyBee_Genome.pdf.
- Honey Bee Genome Sequencing Consortium (2005) Upgrading the honey bee genome sequence white paper. http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/HoneyBeeSeq-AddCoverage.pdf.
- Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).
- Hunt & Page (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.
- Kucharski & Maleszka (2002) Evaluation of differential gene expression during behavioral development in the honeybee using microarrays and northern blots. Genome Biol 3, RESEARCH0007.
- Kunieda et al. (2006) Unique characteristics of the honeybee genes for carbohydrate-metabolizing enzymes as revealed by the genome annotation. Insect Mol Biol 15: 563–576.
- Liu et al. (2006) Consensus eukaryotic gene prediction by latent variable modeling. Genome Res (in press).
- Maleszka (2000) Molecules to behaviour in the honeybee – the emergence of comparative neurogenomics. Trends Neurosci 23: 513–514.
- Mita et al. (2004) The genome sequence of silkworm, Bombyx mori. DNA Res 11: 27–35.
- Mongin et al. (2004) The Anopheles gambiae genome: an update. Trends Parasitol 20: 49–52.
- Pennisi (2001) Insects rank low among genome priorities. Science 294: 1261–1262.
- Qin et al. (2006) Genome sequences of the honey bee pathogens Paenibacillus larvae and Ascosphaera apis. Insect Mol Biol 15: 715–718.
- Richards et al. (2005) Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res 15: 1–18.
- Robinson (1999) Integrative animal behaviour and sociogenomics. Trends Ecol Evol 14: 203–205.
- Salamov & Solovyev (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10: 516–522.
- Solignac et al. (2003) Five hundred and fifty microsatellite markers for the study of the honeybee (Apis mellifera) genome. Mol Ecol Notes 3: 307–311.
- Solignac et al. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.
- Solignac et al. (2006) The genome of Apis mellifera: dialog between mapping and sequencing. Genome Biol (in press).
- Takeuchi et al. (2002) Identification of genes expressed preferentially in the honeybee mushroom bodies by combination of differential display and cDNA microarray. FEBS Lett 513: 230–234.
- Wheeler et al. (2006) Expression of insulin pathway genes during the period of caste determination in Apis mellifera. Insect Mol Biol 15: 597–602.
- Whitfield et al. (2002) Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee. Genome Res 12: 555–566.
- Whitfield et al. (2003) Gene expression profiles in the brain predict behavior in individual honey bees. Science 302: 296–299.
- Whitfield et al. (2006) Thrice out of Africa: nature and human-facilitated expansions of Apis mellifera. Science (in press).
- Xia et al. (2004) A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science 306: 1937–1940.
- Zou et al. (2006) Comparative analysis of serine proteases-related genes in the honey bee genome: possible involvement in embryonic development and innate immunity. Insect Mol Biol 15: 603–614.