Keywords:

  • insect;
  • microarray;
  • molecular marker;
  • SNP;
  • SFP;
  • transposon display

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Conventional marker systems
  5. Major applications of molecular markers in studying insect ecology
  6. Novel marker systems
  7. Some concerns on marker applications in insect research
  8. Large-scale genotyping
  9. Concluding remarks
  10. Acknowledgements
  11. References

Insects comprise the largest number of species in the entire animal kingdom and possess a vast, largely undiscovered genetic diversity and gene pool that can be better explored using molecular marker techniques. Current trends in the application of DNA marker techniques across diverse domains of insect ecological studies show that mitochondrial DNA (mtDNA), microsatellites, random amplified polymorphic DNA (RAPD), expressed sequence tags (EST) and amplified fragment length polymorphism (AFLP) markers have contributed significantly to progress in understanding the genetic basis of insect diversity and in mapping medically and agriculturally important genes and quantitative trait loci in insect pests. Apart from these popular marker systems, other novel approaches, including transposon display, sequence-specific amplification polymorphism (S-SAP) and repeat-associated polymerase chain reaction (PCR) markers, have been identified as alternative marker systems in insect studies. In addition, whole-genome microarray and single nucleotide polymorphism (SNP) assays are becoming more popular for screening genome-wide polymorphisms in a fast and cost-effective manner. However, such methodologies have not yet gained widespread popularity in entomological studies. The current study highlights recent trends in the application of molecular markers in insect studies and explores technological advances in molecular marker tools and modern high-throughput genotyping methodologies that may be applied in entomological research for a better understanding of insect ecology at the molecular level.


Introduction

Insects represent a major life form on earth. Nearly 900 000 insect species have been described so far, comprising 75% of all recorded animal species. Insects can be found in almost all ecosystems, from deserts to the Antarctic. Even the oceans have not been spared: some insect species (Halobates) are known to live on the ocean surface (Ikawa et al. 2002). This demonstrates the immense ecological adaptability of insects in diverse environments where other animals cannot survive. This biological success is attributed to the enormous diversity of their size and body structure, their mating strategies, and their remarkable adaptive abilities in feeding and behaviour. Such diversity gives insects a delicate relationship with human life. Insects are beneficial as they pollinate crops, act as natural enemies of damaging pests, and produce useful products such as honey, silk and wax. At the same time, insects are major pests of our food crops, act as vectors of deadly diseases, cause damage to our urban infrastructure, environment, forests and natural resources, and sometimes interfere with international trade, commerce and economic affairs. Thus, one often wonders what makes insects so diverse, how their genes and genetic make-up contribute to their adaptable life forms and, most importantly, how they may affect human life either directly or indirectly. Insect populations, even within a species, vary in behaviour and morphology, which is attributed to their complex interaction with the environment (Dempster & McLean 1999). Thus, the study of insect ecology is important for understanding their evolution and diversification, and their influence on the functional and trophic links between different components of associated habitats (Speight et al. 2005).

Significant progress has been made in understanding insect diversity and ecology by using classical genetic principles. Common visible markers, including eye colour, body spots or bands, and hairs or spines, were used as phenotypic markers in studying patterns of dispersal, mating behaviour and inheritance of genetic traits in insects (Bartlett et al. 1968; Fay & Craig 1969; Bond et al. 1970; Bartlett & Butler 1975). Although phenotypic markers are present throughout the life span of the organism and can be readily used for studies in field conditions, they suffer from many practical limitations. The major drawback is that these visible phenotypes are relatively infrequent and often hard to score. Also, it is difficult and time-consuming to induce genetic mutations in laboratory populations to develop new phenotypic markers, and such mutations sometimes interfere with the overall fitness of the organism. Furthermore, a phenotypic marker can be used reliably only when it is known how the trait is inherited by the offspring. Because phenotypic markers are rare, using them to map a trait is difficult. Because of these difficulties, and with the concurrent advancement in biochemical methodologies, protein markers then became more popular. Protein markers made a significant contribution in the early period when DNA technologies were not as advanced as they are now (Loxdale & Lushai 1998). The electrophoretic patterns of allozymes, as detected in polyacrylamide or starch gels, were used to identify different alleles of a given gene. From these banding patterns, it was possible to determine the extent of heterogeneity of a locus in a population based on the variant allozymes produced by the individuals.
Banding patterns of isozymes (products of different loci, but with similar function) could also be used for studying genetic variation within and between populations (Steiner & Joslyn 1979; Bartlett 1981; Loxdale et al. 1985). Protein markers have also been successfully exploited in studying insecticide resistance (Maa & Terriere 1983), pathogen identification (Wilding et al. 1993), chromosome mapping (Loukas et al. 1979) and detection of prey in insect predators (Solomon et al. 1996), to name a few. However, with the development of DNA-based marker systems, it was found that in many situations DNA markers reveal a greater level of polymorphism than protein markers (Richardson et al. 1986). This is because mutations in introns, or even in the codons of a gene, can provide more variation at the DNA level than is visible at the protein level. Moreover, DNA samples are more stable than proteins and, unlike proteins, can be detected unchanged at all times and in all tissues of the organism. Thus, DNA markers became the most common yardsticks for measuring genetic differences between individuals or within and between related species or populations. The unprecedented advances in modern molecular biology, particularly in DNA marker technology, have already created a wealth of technical know-how that finds useful application especially in molecular ecology research in insects (Hoy 2003).

Over the past 15 years or so, DNA markers have made a significant contribution to the rapid rise of molecular studies of genetic relatedness, phylogeny, population dynamics, and gene and genome mapping in insects (Loxdale & Lushai 1998; Avise 2000, 2004; Severson et al. 2001; Heckel 2003). Many improvements have been made to enhance reproducibility and power of resolution (the ability to reveal more informative polymorphisms from fewer loci) and, more importantly, to reduce the cost and time involved in developing and scoring the marker loci. The application of DNA markers in entomology has undergone, and is still undergoing, a noticeable change as it continuously accommodates new technologies for robust and less expensive genotyping. Entomologists are becoming more accustomed to the refinement of marker systems and are applying the new techniques to study insect genomes more efficiently. Traditionally, mitochondrial DNA (mtDNA) has been a marker of choice for studying genetic variation in insect species. Mitochondrial gene sequences have been used in phylogenetic and population-genetic studies to reconstruct the evolutionary history of related insect species. More precisely, these markers have provided invaluable insights into the history and genetic basis of speciation and phenotypic evolution of recently diverged species. Moreover, mtDNA sequences are often transferred to the nucleus, giving rise to the so-called nuclear mtDNA (Numt). Variations in copy number and size of Numts are also used to assess the interspecific diversity of these loci in insects (Richly & Leister 2004). Microsatellites are also popular markers in insect studies because their loci are highly abundant and highly variable in the genome. Despite their rising popularity in insect research, microsatellites have not yet become a major marker system in entomology.
Recent studies have shown that microsatellite markers can be reliably extended beyond population genetics and used for studying phylogenetic relationships of closely related species with fewer loci than previously assumed (Schlotterer 2001). However, with the introduction of the random amplified polymorphic DNA (RAPD) technique (Williams et al. 1990), polymerase chain reaction (PCR)-based fingerprinting assays gained popularity, mainly because these marker loci are easy to generate and easy to score. But because RAPD markers suffer from poor reproducibility, their use in insect ecological studies was limited (Black 1993). The amplified fragment length polymorphism (AFLP) technology (Vos et al. 1995) was then adopted as a better alternative that generates larger numbers of reproducible multilocus markers, more reliable than RAPD markers.

Today, molecular marker technology has reached a new height with the power and precision of modern genomic tools. High-throughput genotyping methods are now available that can be used for genome-wide mutation screening in hundreds or even thousands of individuals (as many as 300 000 genotypes) in as little as a day. In principle, if the sequence of the loci is known in one individual, genotyping them in thousands of individuals of that population is no longer a technical bottleneck. Innovative assay methods and advanced instrumentation have been devised to accurately genotype large numbers of markers in parallel (Lyamichev et al. 1999; Kwok 2001; Tsuchihashi & Dracopoli 2002). These large-scale genotyping techniques are reliable enough to discriminate allelic variation even at the single-nucleotide level. These methods are also less time-consuming and less prone to the sample-to-sample variation associated with single-tube genotyping methods. Moreover, assay costs are gradually decreasing through continuous instrumental innovation and, often, commercialization. This review examines the current status of our knowledge on the application of molecular markers, and the science and techniques behind these systems, for studying the genetic diversity in insects that shapes their evolution and ecology. One of its major objectives is to explore the potential utility of alternative novel marker systems and modern high-throughput genotyping technologies applicable in these studies for a better understanding of insect ecology.

Conventional marker systems

DNA markers such as mtDNA, RAPD, AFLP, microsatellites and ESTs have been used as popular marker systems in insect genetics research. Although each marker system has inherent advantages and disadvantages, the choice among them depends upon the objectives of a study. Mitochondrial DNA is used for marker analyses largely because of its maternal inheritance, haploid status and high rate of evolution. In insects, a further advantage of mitochondrial markers is that many of these loci can be readily amplified using universal primers designed from highly conserved mitochondrial genes (Roehrdanz 1993; Kambhampati & Smith 1995; Lunt et al. 1996; Zhang & Hewitt 1997; Lanave et al. 2002). Moreover, upon amplification, these loci can be genotyped by restriction fragment length polymorphism (RFLP) analysis using simple restriction digestion and gel electrophoresis (Behura et al. 2001b). Because of their higher evolutionary rate, these marker loci are preferred for constructing phylogenies and inferring evolutionary history in ecological studies. However, in many cases, particularly in insects, these markers are difficult to use in evolutionary and phylogenetic studies because mtDNA can undergo selective sweeps, paternal leakage or even nuclear integration (Zhang & Hewitt 1996; Behura et al. 2001b; Hurst & Jiggins 2005). Another class of markers, generated by arbitrarily primed PCR-based DNA fingerprinting methods such as RAPD, DNA amplification fingerprinting (DAF) and arbitrarily primed PCR (AP-PCR), is easy to generate and relatively easy to score (Black 1993). However, because of their poor reliability and reproducibility, these markers are not suitable for population studies (Black 1993). For example, in gypsy moth (Lymantria dispar Linnaeus), a fragment was observed in the RAPD amplification products of F1 progeny although it was not amplifiable from either parent (Reineke et al. 1999).
This is a particular problem when pooled DNA is used as template in RAPD-PCR, as in bulk segregant analysis (BSA), or when comparing individual fingerprints from a population where genetic admixture is suspected. However, RAPD loci can be used for diagnostic purposes after they are converted into reliable STS (sequence tagged site; Hunt & Page 1994; Laayouni et al. 2005), SCAR (sequence characterized amplified region; Behura et al. 1999, 2000) or STAR (sequence-tagged RAPD; Fulton et al. 2001) markers by sequencing the RAPD fragments and designing sequence-specific PCR primers. The AFLP technique (Vos et al. 1995), on the other hand, is a better marker system than the RAPD and RFLP methods. It combines the ease of RAPD with the reliability of RFLP. The genomic DNA is first digested with two restriction enzymes, one 4-cutter and one 6-cutter. The digested DNA is then ligated to adapters compatible with the two restriction sites. The ligated material is then used as template for selective amplification with primers extended at the 3′ end with selective nucleotides. The technique is relatively time-consuming and often relies on radiolabelled primers. Also, scoring of the loci requires electrophoresis of the amplified products in sequencing gels or on an automated sequencing machine. Sometimes, AFLP loci contain repeats such as microsatellites in their sequences, which can make the alleles difficult to score (Wong et al. 2000). However, the major advantage of the AFLP system is that the selective PCR generates a larger number of marker loci, on average 50–100 bands per primer pair per sample. AFLP loci are also highly reproducible and are generally dominant in nature, although codominant AFLP loci are sometimes amplified as well (Wong et al. 2000). AFLP markers are suitable for mapping genes and quantitative trait loci and for generating linkage maps of genomes (Heckel et al. 1999; Behura et al. 2004; Ruppell et al. 2004).
Another feature of the AFLP technique is that it can be applied to cDNA and the results can be used to detect differentially expressed genes in insects (Reineke et al. 2003).
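The AFLP steps described above (double digestion, adapter ligation, selective amplification) can be sketched in silico. The following is a minimal illustration only, not a protocol from the cited work. It assumes EcoRI (GAATTC) as the 6-cutter and MseI (TTAA) as the 4-cutter, a common pairing that the text does not specify. It also ignores adapter and site-remnant lengths, and the function names are hypothetical.

```python
import re

ECORI = "GAATTC"  # assumed 6-cutter recognition site
MSEI = "TTAA"     # assumed 4-cutter recognition site
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, site, label):
    """Return (position, label, site length) for every occurrence of a site."""
    return [(m.start(), label, len(site))
            for m in re.finditer(f"(?={site})", seq)]


def aflp_fragments(seq, eco_sel="", mse_sel=""):
    """Sizes of fragments expected to amplify in a simplified AFLP.

    Only fragments flanked by one EcoRI and one MseI site are kept.
    A fragment amplifies when the bases just inside each site match the
    selective nucleotides of the corresponding primer (each primer reads
    inward on its own strand).
    """
    sites = sorted(find_sites(seq, ECORI, "E") + find_sites(seq, MSEI, "M"))
    bands = []
    for (p1, lab1, n1), (p2, lab2, _) in zip(sites, sites[1:]):
        if lab1 == lab2:
            continue  # same enzyme at both ends: not amplified here
        inner = seq[p1 + n1:p2]  # sequence between the two sites
        if not inner:
            continue
        left_sel = eco_sel if lab1 == "E" else mse_sel
        right_sel = mse_sel if lab2 == "M" else eco_sel
        if inner.startswith(left_sel) and revcomp(inner).startswith(right_sel):
            bands.append(len(inner))
    return bands


# Toy template: one EcoRI site and one MseI site flanking 8 bases.
template = "CC" + ECORI + "ACGTACGG" + MSEI + "GG"
print(aflp_fragments(template, eco_sel="A", mse_sel="C"))
print(aflp_fragments(template, eco_sel="A", mse_sel="G"))
```

Adding selective nucleotides reduces the number of amplified fragments, which is how a real AFLP reaction keeps the banding pattern to a scorable number of loci.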

Microsatellite markers are preferred in ecological studies because they are highly polymorphic and abundant enough to generate large numbers of markers compared to the AFLP system. Large numbers of microsatellites can be isolated by generating a genomic library of small fragments, either amplified by PCR or generated by partial digestion of genomic DNA (Lunt et al. 1999), and enriched for simple sequence repeats (Ostrander et al. 1992; Kandpal et al. 1994). Microsatellite markers are also amenable to high-throughput genotyping by nonradioactive labelling and scoring on automated sequencing machines (Nagaraju et al. 2002). Although the initial set-up cost is higher than that of AFLP, this is often offset in population and ecological studies, where larger numbers of genotypes are required for meaningful statistical analyses. However, recent findings have shown that in some cases microsatellites are not neutral markers, as they are involved in gene regulation, genetic hitchhiking and sex-specific differential selection (Schlotterer 2000; Li et al. 2002), and hence those loci may not be useful for evolutionary and phylogenetic studies. Another class of popular markers, expressed sequence tags (ESTs), has recently been used as a major resource for insect genetic studies. Commercial kits for the preparation of cDNA libraries and automated sequencing methods have made it possible to generate large sets of ESTs relatively quickly and efficiently. Although the major application of ESTs in insects has been transcriptome analysis (Whitfield et al. 2002; Pedra et al. 2003; Nakabachi et al. 2005), they have also been used for integrated linkage mapping of insect genomes (Fulton et al. 2001; Severson et al. 2002; Graham et al. 2004). The development of these molecular markers in insects represents a valuable resource for further molecular ecological studies in those species.
A list of different molecular markers generated from selected insect species is compiled in Table S1, Supplementary material.

Major applications of molecular markers in studying insect ecology

Ecological research on insect species provides invaluable information on population structure, speciation, gene flow and genetic diversity, and explains insect diversity in terms of interactions with environmental factors, either biotic (including other biological species) or abiotic. In many cases, molecular marker data help to distinguish between species when no other comprehensive way is available to do so. The traditional approach to discriminating two species is to test whether members of the two populations fail to produce fertile offspring. However, some ecologists reject this notion. There are cases where individuals of one population can produce fertile offspring with a second population, and individuals of the second population can produce fertile offspring with members of a third population, but members of the first and third populations cannot produce fertile offspring. Although the hierarchical process from individuals to species evolution has given rise to no fewer than 20 species concepts (Mayden 1997), only half of those recognize biological processes such as reproduction and competition as contributing to the process of evolution within species (Hey 2001). These concepts are primarily based upon variations in biological, ecological, evolutionary, genetic and phylogenetic observations, as opposed to other indirect parameters such as cohesion, recognition, phenetism, polythetism and succession. However, the ‘species problem’, the question of what should and should not be called a species, remains open. In ‘The Blind Watchmaker’, Dawkins (1986) argues that two organisms are conspecific if and only if they have the same number of chromosomes, and for each chromosome, both organisms have the same number of nucleotides. At this juncture, DNA sequence is considered a basic biomarker for drawing ecological lines between different hierarchies of speciation.

In insects, DNA markers provide the raw information from which an ecologist estimates genetic diversity and gene flow between species, identifies haplotypes and lineages, or predicts migration and colonization history (Hale & Singh 1991; Yoon & Aquadro 1994; Behura et al. 2001b; Salvato et al. 2002; Llewellyn et al. 2003; Bosio et al. 2005). Molecular data also provide the means to differentiate sympatric species from allopatric and parapatric species (Ballinger-Crabtree et al. 1992; Wilkerson et al. 1993; Favia et al. 1994; Apostol et al. 1996; Banuls et al. 1999; Ayres et al. 2003; Margonari et al. 2004). Gene flow and genetic variation within and between insect species, measured from marker data, are critical for establishing meaningful explanations of population structure and dynamics (Cervera et al. 2000; Wong et al. 2000; Takami et al. 2004; Mendelson & Shaw 2005). Similarly, natural selection is regarded as a major driving force of population diversity. In this regard, molecular markers are used to infer the phylogeny and biogeography of insect populations and to understand modes of evolution and evolutionary trajectories (Luque et al. 2002; Chatterjee & Mohandas 2003; Chatterjee & Tanushree 2004; Prasad et al. 2005). Diagnostic molecular markers, based on linkage to certain traits or genes, are also used to diagnose individual insects (Hunt & Page 1994; Behura et al. 1999, 2000; Kengne et al. 2001; Manguin et al. 2002; Kethidi et al. 2003; Ullmann et al. 2003). Where two parapatric populations coexist but do not cross-breed, it is often assumed that a behavioural mechanism prevents gene flow between them. Without reproductive isolation, population differences cannot develop, and given reproductive isolation, gene flow between the populations cannot merge the differences.
Thus, the mating behaviour of insects, their behavioural plasticity and their interactions with hosts and parasites, among other unknown factors, play important roles in the ecological diversity of insect species. Applications of molecular markers in the major domains of insect ecological research are discussed below, with representative examples, to show how effectively DNA marker systems can unravel the genetic factors that shape ecological diversity in insect species.

Mating, parentage and kinship

DNA markers can provide the information needed to determine parentage and kinship relations in insects. One of the innovative studies in this regard used RAPD markers to determine paternity in two odonate species of anisopteran dragonflies, Anax parthenope (Julius Brauer) and Orthetrum coerulescens (Keeled Skimmer) (Hadrys et al. 1993). RAPD banding patterns were used to assess paternity with the help of ‘synthetic offspring’ generated by quantitative mixing of genomic DNA from putative parents. This approach has been helpful in establishing the paternity of guarding males in species where the mating histories of males and females are unknown. Using molecular markers, it has been shown that females of some nonparthenogenetic insect species, such as the white pine weevil, an important forest pest, carry sperm of more than one male from one season to the next (Lewis et al. 2002). Such results explain how offspring are produced in new habitats where no males are available for mating, and how these species colonize new geographical regions. DNA markers are also helpful in studying sperm competition in doubly mated females. Simmons & Achmann (2000) used microsatellite markers in the bush cricket Requena verticalis Walker (Tettigoniidae: Listroscelidinae) and, by manipulating the numbers of sperm transferred by the first and the second male, found that the ejaculate quantity of the first male was the determining factor in the paternity of the offspring. These studies provide information on the role of male competition in the selective fitness of offspring. Identification of egg paternity by molecular markers is also used to validate theoretical models that predict the influence of paternal care on reproductive success in some insect species, such as the golden egg bug, Phyllomorpha laciniata Villers (Heteroptera, Coreidae) (Garcia-Gonzalez et al. 2003).
In these insects, males somehow recognize the eggs that will produce their true genetic offspring and then take care of those eggs. But because more than one male mates with a female under field conditions, an individual male can neither discriminate his own eggs nor predict whether his own eggs will be fertilized. Marker-assisted determination of paternity has, in these situations, been extremely helpful in establishing the mating history of individual females (Garcia-Gonzalez et al. 2003). The information gained from these studies is crucial for understanding the remating propensity of females and for estimating the success of the sterile insect technique (SIT) often used to control pest populations (Bonizzoni et al. 2002).
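The logic of paternity testing with dominant band profiles, such as the ‘synthetic offspring’ comparison described above, can be sketched as a simple exclusion test. This is a minimal illustration under stated assumptions, not the published procedure. It assumes band profiles are scored as sets of fragment sizes, and that a male is excluded when an offspring carries a band found in neither the mother nor that male. The function names and the toy band sizes are hypothetical.

```python
def synthetic_offspring(mother, male):
    """Bands expected when DNA from two putative parents is mixed."""
    return set(mother) | set(male)


def is_excluded(offspring, mother, male):
    """True if the offspring shows a band that neither parent can explain."""
    return bool(set(offspring) - synthetic_offspring(mother, male))


def compatible_males(offspring, mother, candidates):
    """Names of candidate males that cannot be excluded as the father."""
    return [name for name, bands in candidates.items()
            if not is_excluded(offspring, mother, bands)]


# Toy band sizes in base pairs (illustrative only).
mother = {300, 450, 800}
candidates = {"male_a": {450, 620}, "male_b": {300, 900}}
offspring = {300, 620, 800}
print(compatible_males(offspring, mother, candidates))
```

Because RAPD reactions can occasionally produce non-parental bands (as noted for gypsy moth in Reineke et al. 1999), an exclusion based on a single band would normally be treated with caution.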

Some insects, such as the cricket Gryllus bimaculatus de Geer (Ensifera, Gryllidae), use polyandry as a mating strategy to minimize inbreeding of their brood. In these species, polyandrous females avoid fertilizing eggs with sperm from genetically incompatible males. This theory was tested with microsatellite markers by examining paternity where a female was mated with related and unrelated males (Bretman et al. 2004). These molecular data showed that unrelated males were more successful in gaining paternity than related males. Similarly, molecular markers are used to understand parthenogenetic reproduction in aphid species, where cyclical parthenogenesis is a strategy by which the insects can multiply quickly in a new environment without sexual mating (Delmotte et al. 2002). Microsatellite markers were used to measure the genetic diversity of sexual and asexual aphid populations. The data indicated that sexual populations have high allelic polymorphism but little geographical differentiation, possibly because of population subdivision, inbreeding, selection and large dispersal ability (Delmotte et al. 2002). In contrast, asexual populations showed less allelic polymorphism but high heterozygosity at most loci. These results imply that there are major genetic differences between sexual and asexual aphids. To understand this further, sexual and asexual aphid populations coexisting in the same habitat (such as Rhopalosiphum padi L.) were analysed. RAPD markers associated with life cycle variation and breeding traits in these insects have been converted into codominant sequence-characterized amplified region (SCAR) markers (Simon et al. 1999). When these SCAR markers were applied to segregating and natural populations of known breeding systems, complete linkage was found in segregating populations. In field populations, the association averaged 94%.
Such information has potential use in studying how the genetic mechanisms of sexual and asexual reproduction contribute to the dispersal and colonization of aphid populations across geographical regions.
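The contrast drawn above between allelic polymorphism and heterozygosity can be made concrete with a short calculation. The sketch below assumes diploid microsatellite genotypes scored as pairs of allele sizes and uses the standard definitions: observed heterozygosity is the fraction of heterozygous individuals, and expected heterozygosity is one minus the sum of squared allele frequencies. The genotypes are invented toy data, not values from Delmotte et al. (2002).

```python
from collections import Counter


def allele_richness(genotypes):
    """Number of distinct alleles at one locus."""
    return len({allele for pair in genotypes for allele in pair})


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2), with p_i the frequency of allele i."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Toy locus: a clonal (asexual) sample fixed for one heterozygous genotype,
# and a sexual sample with more alleles but fewer heterozygotes.
asexual = [(100, 102)] * 4
sexual = [(100, 100), (102, 104), (106, 106), (100, 104)]

for name, sample in (("asexual", asexual), ("sexual", sexual)):
    print(name,
          allele_richness(sample),
          round(observed_heterozygosity(sample), 3),
          round(expected_heterozygosity(sample), 3))
```

In this toy case the clonal sample has only two alleles yet every individual is heterozygous, which mirrors the pattern of low allelic polymorphism with high heterozygosity reported for asexual populations.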

Insect plant interaction

One of the most appealing applications of molecular markers in insect studies is probably in insect–plant interaction. DNA markers are useful for tagging and mapping genes in important crop plants that provide resistance to damaging insect pests, and for characterizing avirulence genes in insects interacting with the host plants (Harris et al. 2003). The molecular genetic information generated from marker data is used to characterize the phenotypic ability of an insect to attack specific plant varieties. Specific examples of marker usage in these areas come from molecular studies on gall midges (family Cecidomyiidae, order Diptera), major insect pests of rice and wheat. Rice gall midge (Orseolia oryzae Wood-Mason) is a major problem in rice cropping in many South and Southeast Asian countries because different biotypes emerge in its populations and break down the resistance of host plants. This causes unexpected losses in rice production worth nearly half a billion dollars every year. Similarly, a closely related species of gall midge, known as Hessian fly (Mayetiola destructor Say), causes heavy economic losses in wheat crops. Although more than 32 resistance genes have been deployed in wheat fields to protect against Hessian fly damage in North America, different virulent strains have evolved in response to this deployment. This has led to occasional outbreaks and irregular damage to wheat crops amounting to an average of $100 million each year. In such situations, where no effective chemical or cultural control methods are available, one strategy has been to exploit molecular and genetic tools to understand the genetic interaction between insect and host plant. This will help identify specific genes involved in the interaction that can be further exploited to develop transgenic-based management strategies.
One important aspect of this research has been to characterize the emerging virulent strains of gall midge populations and to understand how they interact with host plants. Using RAPD-PCR with pooled DNA from different strains (or biotypes) of Asian rice gall midge, distinct loci specific to individual strains were identified (Behura et al. 1999). After confirmation by Southern blotting and sequencing, codominant SCAR markers were developed, and allele-specific amplification using these diagnostic markers established a good correlation between the genotypes and the observed phenotypes of these biotypes (ability or inability to attack host plants). An AFLP marker was further discovered using bulk segregant methods, and the locus showed linkage with the vGm2 avirulence gene that interacts with the corresponding resistance gene (Gm2) in rice (Behura et al. 2000). Similarly, in Hessian fly, RAPD and AFLP markers have been employed in combination with bulk segregant analysis to identify major avirulence genes that condition the resistance mechanism in wheat varieties cultivated in the United States (Stuart et al. 1998; Schulte et al. 1999; Rider et al. 2002; Behura et al. 2004). Although avirulence genes of some plant pathogens have been cloned, no insect avirulence gene has been cloned yet. Thus, these markers could be exploited in a map-based cloning strategy to isolate avirulence genes from these economically important gall midges.
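The band-screening step of bulk segregant analysis mentioned above can be sketched with set operations. This is a simplified illustration, not the procedure of the cited studies. It assumes dominant band profiles scored as sets of fragment sizes, treats a pooled bulk as showing any band carried by at least one of its members, and then checks how well a candidate band agrees with the phenotype of individually scored insects. All names and band sizes are hypothetical.

```python
def bulk_profile(individuals):
    """Bands expected from pooled DNA: any band carried by any member."""
    return set().union(*individuals)


def candidate_bands(bulk_a, bulk_b):
    """Bands present in bulk A but absent from bulk B."""
    return sorted(bulk_profile(bulk_a) - bulk_profile(bulk_b))


def concordance(band, scored):
    """Fraction of individuals whose band presence matches the trait.

    `scored` is a list of (band_set, has_trait) pairs.
    """
    hits = sum((band in bands) == has_trait for bands, has_trait in scored)
    return hits / len(scored)


# Toy profiles: virulent and avirulent individuals (sizes in base pairs).
virulent = [{400, 700, 950}, {400, 950}, {400, 700, 950}]
avirulent = [{400, 700}, {400}, {400, 700}]

candidates = candidate_bands(virulent, avirulent)
scored = [(b, True) for b in virulent] + [(b, False) for b in avirulent]
for band in candidates:
    print(band, concordance(band, scored))
```

A candidate identified this way would still need to be confirmed in a segregating population, since agreement in a handful of individuals does not by itself demonstrate linkage.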

Molecular research on aphids represents another clear example of marker-based study of insect–plant interaction. More than 250 aphid species are pests of horticultural crops throughout the world. Most aphid species are relatively host-plant specific, although they may occasionally feed on alternate hosts. If no resistance is triggered in the host, aphids can stunt plant growth and cause galls to form on leaves, stems and roots. Some aphid species undergo cyclical parthenogenesis (the alternation of sexual reproduction with a phase of parthenogenetic reproduction). They use one host plant as the ‘primary host’ for sexual reproduction and another as a ‘secondary host’ for parthenogenetic reproduction. In Aphidinae, migrants returning to the primary host are winged males and winged females (parthenogenic). Later on, winged parthenogenic females return to the secondary host (herbaceous plant). In these situations, DNA markers provide useful information for understanding the genetic basis of adult polyphenism in such aphid clones. Using RAPD–PCR, it was found that major genetic differences existed between the winged and the wingless phenotypes of asexual adult aphids (Lushai et al. 1997). Similarly, in natural populations of the grain aphid, Sitobion avenae (Fabricius), feeding on different grass and cereal hosts, it was clearly demonstrated that the RAPD banding pattern could be correlated with host adaptation (Lushai et al. 2002). These profiles could distinguish ‘specialist’ genotypes found on specific grasses from ‘generalist’ genotypes colonizing multiple host types, including cultivated cereals and native grasses. The genetic basis of host plant association has also been studied in lettuce root aphid (Pemphigus bursarius Linnaeus) using microsatellite markers (Miller et al. 2005).
The degree of virulence (aggressiveness) of individual clones in pea aphids, in response to natural resistance in alfalfa, has been evaluated using RAPD markers (Bournoville et al. 2000). These works clearly demonstrate that application of molecular markers are helpful in better understanding the mechanistic and evolutionary basis for the genetic interaction between insect pests and their host plants.

Insect pathogen interaction

Molecular markers are useful in understanding the genetic interaction between disease-causing pathogens and the insect vectors that spread them (Crampton et al. 1997). Triatoma infestans (Klug) (Reduviidae) is a major insect vector of Chagas’ disease in many South American countries. It transmits Trypanosoma cruzi, the agent of Chagas’ disease. Mixed and pure clones of Trypanosoma cruzi in the gut of Triatoma infestans have been studied using RAPD profiles to provide information on the vectorial ability of the insects (Pinto et al. 1998). Similarly, molecular markers were applied to determine the vectorial ability of mosquitoes (Severson et al. 2001) by mapping quantitative trait loci (QTL) that determined whether a species could transmit the malaria parasite (Severson et al. 1995; Bosio et al. 2000). Expressed sequence tag (EST) markers have also been used to identify potential target genes that are involved in immunity to specific pathogens causing malaria (Bartholomay et al. 2004). Metarhizium anisopliae is a causative agent of green muscardine disease and infects a broad range of insects. To explore the molecular basis of this process, EST markers were used to study the pattern of gene expression in response to diverse insect cuticles, and it was found that an array of secreted proteins had potential activity in the hosts during these responses (Freimoser et al. 2003).

Apart from medically important insects, DNA markers have also been used to study insect-borne viruses that are important for agriculture. These viruses sometimes act as damaging pathogens of agricultural crops and forest resources. The brown citrus aphid, Toxoptera citricida (Kirkaldy), is considered the primary vector of citrus tristeza virus (CTV), a damaging pathogen that causes heavy losses to citrus industries worldwide. The alate form of these aphids is an important stage of their life cycle, during which they disperse and spread the virus in citrus-growing areas. ESTs have been developed as molecular markers to study the gene expression pattern at this stage of the aphid life cycle and to better understand the genetics of its vectorial ability to transmit CTV (Hunter et al. 2003). Similarly, entomopathogenic fungi play a crucial role in host–pathogen interaction in many agriculturally important insects, which has been analysed using various molecular markers (Fegan et al. 1993; Tigano-Milani et al. 1995). Some entomopathogens are beneficial for humans as they act as natural biocontrol agents of damaging insect pests. Understanding genetic interactions between insects and these entomopathogens is a major research focus in developing natural biocontrols for pests. Identification of these pathogens in diverse insect species is an important screening step that often requires the application of molecular markers (Hodge et al. 1995; Castrillo et al. 2004). DNA markers are also used to characterize new and better cell lines for production of viable pathogens as biopesticides (Sudeep et al. 2005).

Insecticide resistance

Insecticide resistance is another important focus in entomological research and bears medical and agricultural importance. Molecular markers are used for identification and mapping of genes that confer resistance to insecticides in insects. In malaria control programs, difficulties arise because of emerging resistance to DDT in the mosquito vectors. DDT resistance in the major malaria vector in Africa, Anopheles gambiae, is associated with an increased metabolism of the insecticide. Use of microsatellite markers in mapping experiments has identified QTL in Anopheles gambiae that determine the DDT resistance phenotypes (Ranson et al. 2000). Using RFLP markers, it was discovered that DDT resistance in the housefly, Musca domestica, was associated with the ‘knockdown’ (kdr) trait (Knipple et al. 1994). The kdr trait is associated with reduced neuronal sensitivity to pyrethroid insecticides, which is in turn related to a mutation affecting the sodium channel gating system. Gene-specific PCR markers have also been used to examine the expression pattern of cytochrome P450 genes near a DDT resistance gene in Drosophila (Brandt et al. 2002). Similarly, using random amplified DNA markers, genetic loci that determine high-level resistance to phosphine have been mapped in the lesser grain borer, Rhyzopertha dominica (Fabricius) (Schlipalius et al. 2002). Use of AFLP markers has facilitated the identification of loci for resistance to pyrethroid in Colorado potato beetle (Hawthorne 2001) and to Bacillus thuringiensis toxins in diamondback moth (Heckel et al. 1999). Also, biochemical markers have been successfully used for molecular diagnostic purposes in screening for resistance to methyl-parathion in western corn rootworms (Zhou et al. 2002) and for neonicotinoid cross-resistance in Aleyrodoidea whiteflies (Bemisia tabaci) (Rauch & Nauen 2003).

Prey, predator and parasites

Molecular markers provide information for understanding prey–predator–parasite trophic interactions in insects. In particular, they are helpful for identification and quantification of prey in the diets of insect predators. Identification of more than one type of prey in the gut of a single insect is performed by genetic fingerprinting of its gut contents and then by developing diagnostic marker systems. Diagnostic PCR-based markers such as SCARs derived from specific RAPD loci were used to identify Trialeurodes vaporariorum and Helicoverpa armigera prey in the gut of Dicyphus tamaninii (Wagner) (Agusti et al. 1999, 2000). Mitochondrial DNA markers have also been used to identify Collembolan species that comprise a major source of alternative prey for linyphiid spiders (Agusti et al. 2003). Apart from identification, marker data are also used to quantify the parasitism rate in insects. Identification of immature parasitoids of Lysiphlebus testaceipes and green bugs in different species of small grain cereal aphids has been performed using DNA markers, and the information has been used to estimate parasitism rates in aphids (Walton et al. 1990; Jones et al. 2005). Correlation of parasitoid frequencies with molecular data suggested that molecular markers could provide accurate estimates of parasitism rates. A comparison of parasite load based on immunological techniques and that based on PCR showed that the PCR technique was more efficient in such studies (Chen et al. 2000). Multiplex PCRs using more than one diagnostic marker have been used to develop fast and reliable methods to analyse gut contents in predators. Use of fluorescent-labelled multiplex PCR is a novel approach in this regard (Harper et al. 2005). Using this method, more than 10 species have been identified simultaneously. Similarly, microsatellite markers have been used to study the association between host races of Acyrthosiphon pisum and their symbionts (Simon et al. 2003). Similar work has been performed to identify the bacterial endosymbiont Wolbachia in Asian rice gall midge populations (Behura et al. 2001b). It was found that adult midges are infected by Wolbachia at different frequencies in different biotypes. The results obtained from PCR-RFLP data using mtDNA showed that Wolbachia had a profound role in genetic hitchhiking in female midges. These results have important implications for the role of bacterial parasites in causing sex-ratio distortion and population hitchhiking in rice gall midge biotypes.

Behavioural studies

Behavioural plasticity in social insects represents a complex biological phenomenon that is attracting the attention of molecular biologists. The honeybee (Apis mellifera) is an emerging model organism that is being studied for social behaviour at the molecular level (Robinson et al. 2005). ESTs have been used as expression markers in microarray formats to predict the nursing and foraging behaviour of individual bees (Whitfield et al. 2003). In honeybee, these social behaviours are polygenic traits, influenced by more than one gene, and the underlying genomic regions are referred to as QTL. The two major QTL that determine foraging behaviour in honeybee have been identified by employing RAPD markers in backcross populations between bees collecting nectar and those collecting pollen (Hunt & Page 1995). Exploiting similar procedures with molecular markers in honeybee, colony-level traits such as stinging behaviour, body size, alarm pheromone level, reversal learning and hygienic behaviour have also been dissected at the level of specific genomic regions (Breed et al. 2004). AFLP markers and microsatellites have been used in dissecting the guarding and stinging behaviours in honeybee (Arechavaleta-Velasco et al. 2003). In bumble bees, AFLP markers have been employed to study the genetic basis and ecological implications of foraging range and nest density (Knight et al. 2005). In Australian meat ants, Iridomyrmex purpureus (Smith), the movement of females between colonies has been successfully traced using mtDNA markers. The data help to evaluate kin selection as the explanation for multiqueen colonies in this species (Crozier et al. 1997). Also, in sugar ants (Camponotus consobrinus Erichson), microsatellites and mtDNA have been used as markers to study colony behaviour, and unexpected genetic diversity and complexity were found in the colony structure and behaviour of these ants (Crozier et al. 1997).

Gene, genome and QTL mapping

Genetic linkage mapping, based on recombination frequency, uses molecular markers for tagging and mapping of specific genes and QTL in insects. Construction of a genetic map involves genotyping segregating populations (generally F2 backcross progenies) at hundreds of marker loci. Based on the inheritance pattern of these loci, multipoint linkage analysis is performed to generate different linkage groups. In insects, RAPD and AFLP markers have been extensively used to generate genetic maps. RAPD-based linkage maps have been constructed for genomes of honeybee (Hunt & Page 1995), silkworm (Yasukochi 1998), beetle (Beeman & Brown 1999; Yezerski et al. 2003) and sawfly (Nishimori et al. 2000), whereas AFLP-based linkage maps are available for genomes of silkworm (Tan et al. 2001), Colorado potato beetle (Hawthorne 2001), red flour beetle (Zhong et al. 2004), Hawaiian cricket (Parsons & Shaw 2002), European corn borer (Dopman et al. 2004), Hessian fly (Behura et al. 2004) and butterfly (Wang & Porter 2004; Tobler et al. 2005). Microsatellite-based maps have also been constructed for the genomes of Drosophila mojavensis (Staten et al. 2004) and honeybee (Solignac et al. 2004).
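
As an illustration of how recombination frequency is turned into a map distance, the following minimal Python sketch scores recombinants between two markers in a backcross and converts the recombination fraction into centimorgans with the standard Haldane and Kosambi mapping functions. The genotype codes (‘H’ for heterozygous, ‘A’ for homozygous), the assumption that the two markers are in coupling phase, and the example scores are hypothetical and are not taken from any of the studies cited above.

```python
import math


def recombination_fraction(marker1, marker2):
    """Fraction of progeny whose scores differ between two marker loci.

    Assumes backcross progeny scored 'H' or 'A' at each marker, with the
    markers in coupling phase, so that a mismatch marks a recombinant.
    """
    if len(marker1) != len(marker2) or not marker1:
        raise ValueError("score lists must be non-empty and equal in length")
    recombinants = sum(1 for a, b in zip(marker1, marker2) if a != b)
    return recombinants / len(marker1)


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical scores for 20 backcross progeny at two marker loci.
marker1 = list("HHHHHHHHHHAAAAAAAAAA")
marker2 = list("HHHHHHHHHAAAAAAAAAAH")

r = recombination_fraction(marker1, marker2)
print(f"r = {r:.3f}")
print(f"Haldane: {haldane_cm(r):.2f} cM, Kosambi: {kosambi_cm(r):.2f} cM")
```

Multipoint analysis in dedicated mapping software orders many markers at once; the two-point calculation here only shows the quantity on which such analyses are built.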

Apart from genome mapping, DNA markers are also used to tag and map specific gene(s) responsible for a phenotypic trait in insects. For example, to tag an avirulence gene in an insect pest that is responsible for triggering the resistance mechanism in the host plant (a monogenic trait), backcross F2 progenies are generated from a cross between a virulent male and an avirulent female. The mating strategy is designed according to the objective of the experiment and the sex determination mechanism of the organism. The segregating progenies are then scored for the parental phenotypes, virulent or avirulent in this case, and are subjected to multilocus DNA fingerprinting to score individual genotypes. These marker data are then used in two-point (or three-point) linkage analysis to determine the genetic distance of the marker(s) segregating in association with the phenotype. Using such approaches, AFLP markers have been used in gall midges to identify and map major avirulence genes in different biotypes of Asian rice gall midge (Orseolia oryzae) (Behura et al. 2000) and of Hessian fly (Mayetiola destructor) (Rider et al. 2002; Behura et al. 2004). However, when more than one gene governs the trait (a polygenic trait), markers are identified that are linked to the chromosomal regions containing the genes, often known as QTL. QTL mapping is often more difficult than mapping monogenic traits. Unfortunately, many economically important and behavioural traits in insects are polygenic in nature. QTL mapping is primarily based on following the segregation of a single-gene marker and estimating the effect of its linkage to the polygenic trait. Statistical techniques, for example analysis of variance (ANOVA), are used to check whether the trait mean for one marker genotype at a given locus differs significantly from the trait means for alternative marker genotypes. A significant difference indicates an association between the segregating marker and the character of interest. Generally, three major methods are employed for QTL mapping: single marker analysis, multiple regression (interval mapping) and the marker regression approach. The choice of a marker system for genotyping in such mapping is a crucial factor, as dense markers are needed near the QTL because quantitative traits typically have low heritability. AFLP markers have been successfully used to map QTL governing ageing in Drosophila melanogaster (Luckinbill & Golenberg 2002), foraging behaviour in honeybee (Ruppell et al. 2004), susceptibility of beetles to tapeworm parasites (Zhong et al. 2005), variation in pheromone composition in Heliothis species (Groot et al. 2004) and Bt (Bacillus thuringiensis) resistance in diamondback moths (Heckel et al. 1999). AFLP markers have also been used to trace the phenotypic trait of wing size in pea aphids (Acyrthosiphon pisum) (Braendle et al. 2005), to study the genetics of male dispersal in moth species (Thaumetopoea pityocampa and Th. wilkinsoni) (Salvato et al. 2002), and to investigate the genetic basis of haplodiploidy in Venturia canescens wasps (Reineke & Lobmann 2005). Apart from AFLP markers, microsatellite markers have been very successful, especially in honeybees, bumblebees and ants, in identifying genes responsible for diversity in foraging range and mating behaviour (Pearcy et al. 2004; Knight et al. 2005), host parasitization (Solignac et al. 2005), colonization (Jensen et al. 2005) and kinship relation (Trontti et al. 2005).
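
The single marker analysis mentioned above can be sketched as a one-way ANOVA in which trait values are grouped by marker genotype. The following Python example is a minimal illustration using only the standard library; the function name, the genotype codes and the trait values are hypothetical. It returns the F statistic only; judging significance requires comparing F with the F distribution on (k − 1, n − k) degrees of freedom, where k is the number of genotype classes and n the number of individuals.

```python
from collections import defaultdict


def single_marker_f(genotypes, phenotypes):
    """One-way ANOVA F statistic for trait values grouped by marker genotype."""
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have equal length")

    # Collect the trait values observed for each marker genotype class.
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(float(value))

    k = len(groups)       # number of genotype classes
    n = len(phenotypes)   # number of individuals
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and replication within them")

    grand_mean = sum(float(v) for v in phenotypes) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)

    if ss_within == 0.0:
        raise ValueError("no within-genotype variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical backcross data: marker genotype and a quantitative trait value.
genotypes = ["A", "A", "A", "H", "H", "H"]
trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(single_marker_f(genotypes, trait))
```

A large F at a marker suggests linkage between that marker and a QTL, but single marker analysis cannot separate the size of the QTL effect from its distance to the marker, which is one reason interval-based methods are also used.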

Comparative genomics and cytogenetics

Molecular markers have important applications in modern cytogenetics and comparative genomic studies of insects. Satellite DNA as a marker can provide useful information on gene structure, order and polymorphisms of chromosomal DNA. Fluorescence in situ hybridization (FISH) on chromosome preparations with (TTAGG)n satellite markers as probes showed that these repeats were extremely conserved in telomeres of chromosomes in many insect species and also in other hexapods, crustaceans, myriapods and pycnogonids (Vitkova et al. 2005). Ribosomal DNA probes are also useful in FISH for comparative cytogenetic studies (Cabrero et al. 2003). Transposable elements, because of their repetitiveness in the chromosome, are suitable markers for physical localization onto chromosomal DNA and for comparing heterochromatin and euchromatin regions of closely related species (Berloco et al. 2005; Torti et al. 2005). FISH localization of bacterial artificial chromosome (BAC) clones genetically mapped onto the genome of Hessian fly (Mayetiola destructor), an insect pest of wheat, demonstrated severe crossover suppression in an autosomal segment containing two major avirulence genes required for host resistance (Behura et al. 2004). Also, a ∼1 megabase region of the sex chromosome X2 of Hessian fly has been assembled into a single contig using BACs with the help of genetically mapped molecular markers. The BAC clones have been used in fibre-FISH experiments to physically map them onto Hessian fly DNA and confirm their order on the chromosomes. These BAC ends have then been used for comparative sequence analyses with the genome sequences of Anopheles gambiae and D. melanogaster to understand the order and synteny of these loci in the three dipteran insects (Lobo et al. 2006). Similar studies have been performed in Anopheles species, using extensive in situ hybridization experiments, to understand inversion and gene order shuffling in Anopheles funestus as compared to A. gambiae (Sharakhov et al. 2002). Apart from comparative studies, molecular markers are also used to study the cytogenetics of unusual sex determination mechanisms in insects, such as unisexual progeny production in gall midges, or production of only daughters by virgin females (thelytoky) in the wasp Trichogramma cacoeciae (Harris et al. 2003; Vavre et al. 2004).

Study of fossils and museum samples

DNA analyses of fossil samples are important tools in studies of archaeology, conservation biology, forensic science and entomology. Information obtained from ancient DNA can contribute to the interpretation of the history of extinct populations or species (Brown & Brown 1994). Recovery of good quality DNA from museum samples is the first step towards successful analysis of their sequences. Because isolation of quality DNA from amber- and copal-preserved insects is problematic in some cases (Austin et al. 1997; Walden & Robertson 1997), proper optimization of the DNA extraction procedure is required for individual cases (Junqueira et al. 2002). Successful recovery of ancient DNA from specimens of Dominican amber fossil termites (DeSalle et al. 1992) and wood gnats (DeSalle 1994) has been reported. DeSalle et al. (1992) were able to amplify fragments of the mitochondrial 16S ribosomal gene from the ancient DNA samples by PCR, and phylogenetic analyses showed that the fossil termites shared several sequence attributes with Mastotermes darwiniensis (Froggatt). Based on sequence polymorphisms of DNA isolated from museum specimens from the late 19th and early 20th centuries, Goldstein & DeSalle (2003) were able to identify a relatively recent fixation event of anthropogenic character associated with the extinction and fragmentation of populations in the tiger beetle Cicindela d. dorsalis (Coleoptera: Carabidae). Santolamazza et al. (2004) used PCR-RFLP of DNA from museum samples to identify Anopheles arabiensis, A. gambiae and their two molecular forms. Similarly, Cano & Borucki (1995) used 16S ribosomal DNA markers to identify an ancient bacterium closely related to extant Bacillus sphaericus from 25- to 40-million-year-old fossil samples of extinct bees. DNA bar-coding of museum specimens has been successfully used in conjunction with morphological study to show that Astraptes fulgerator, a Neotropical skipper butterfly (Lepidoptera: Hesperiidae), is a complex of at least 10 species in northwestern Costa Rica (Hebert et al. 2004). Widmer et al. (2000) collected insects from nectar-rich plants flowering near natural orchid populations and from museum samples to study orchid–pollinator relationships. Sequence analysis of the nuclear ribosomal ITS region in this study allowed the identification of the orchid species or species-group from which the pollinaria originated. Such DNA-based methods are extremely helpful in understanding orchid–pollinator relationships, especially the degree of pollinator specificity, as direct observations of pollinator visits to orchids are often difficult and time-consuming.

Novel marker systems

Apart from the popular markers described above, novel marker systems are recognized to have potential applications in insect studies. Most of these markers are developed by PCR-based techniques such as ‘transposon display’ (Waugh et al. 1997) or ‘anchored polymerase chain reaction’ (Ayyadevara et al. 2000). Numerous kinds of repeat elements, such as DNA transposable elements, retro-elements and satellite repeats, are present in insect genomes (Box S1, Supplementary material). Some of them are present in diverse insect species and also show some degree of sequence conservation within or between species. Using PCR, it is possible to anchor to these conserved regions and amplify the flanking polymorphisms to generate hundreds of repeat-associated genetic markers inexpensively. Transposon display (TD) and sequence-specific amplification polymorphism (S-SAP) are two such techniques that have been widely applied in marker development projects in plants (Waugh et al. 1997; Casa et al. 2000; Kentner et al. 2003; Lee et al. 2005) and have recently been applied in entomology (Behura et al. 2001a; Guimond et al. 2003; Zampicinini et al. 2004).

Transposon display

The transposon display technique uses PCR to identify polymorphisms associated with transposons, generally at their insertion sites (Waugh et al. 1997). The technique is very similar to the AFLP method (Vos et al. 1995) except that it employs one primer anchored to a conserved region of the transposon (commonly, the terminal repeats or internal transposase coding sequences) and another primer that anneals to the adapter attached to flanking sites generated by restriction digestion (Fig. 1). Nested PCRs are then performed to selectively amplify the insertion site polymorphisms.

Figure 1. (A) Principle of the ‘transposon display’ technique. Genomic DNA is digested with a restriction enzyme that has recognition sites in conserved sites such as the inverted terminal region of the transposon. Another enzyme is employed in a second digestion that cuts the DNA in the flanking sites of the transposons. Sequences of multiple copies of the element should be checked before selecting the second enzyme to make sure that its recognition site is not present in the internal sequences of the transposon. After digestion, specific adapters compatible with the restriction sites are ligated to the digested molecules. The ligated molecules are used as templates for two rounds of PCR. In the first PCR, adapter-specific primers are used. The product of this PCR is diluted and used as template for the second PCR, in which a nested (radiolabelled) primer specific to the transposon is used. The 3′ end of this nested primer is extended arbitrarily by up to three nucleotides (if inverted terminal repeat sequences are used to make primers in the first PCR). When internal regions of the element (such as conserved motifs of transposase) are used as the anchor primer, the 3′ end extension can be made according to the flanking sequences beyond that motif in the known sequence of the transposable element. The amplified products are then resolved on a sequencing gel, and autoradiography is performed to score the banding pattern. (B) An example of a banding pattern generated by the above method using genomic DNA of two laboratory strains of Hessian fly, Mayetiola destructor (this was done in the laboratory of Dr Jeff Stuart at Purdue University, USA, in the course of standardization of this technique in Hessian fly). The primers employed in this display used the sequences of inverted terminal repeats of mariner-like elements. Arrows show the differences in the banding patterns.
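
The logic of transposon display can be illustrated in silico. The minimal Python sketch below scans a sequence for a transposon anchor primer (optionally extended at its 3′ end by selective nucleotides), finds the nearest downstream restriction site in the flanking DNA, and reports the expected band sizes; bands present in one strain but not the other are insertion site polymorphisms. The anchor sequence, the toy strain sequences and the function names are hypothetical. The sketch searches one strand only and ignores adapter length and the exact cut position within the recognition site, so the sizes are approximate.

```python
def predicted_bands(genome, anchor, site, selective=""):
    """Approximate band sizes from each anchor match to the next restriction site.

    The size is counted from the first base of the anchor primer to the last
    base of the downstream recognition site (forward strand only).
    """
    primer = anchor + selective
    bands = []
    start = genome.find(primer)
    while start != -1:
        # Look for the flanking restriction site beyond the primer.
        cut = genome.find(site, start + len(primer))
        if cut != -1:
            bands.append(cut + len(site) - start)
        start = genome.find(primer, start + 1)
    return sorted(bands)


def polymorphic_bands(bands_a, bands_b):
    """Band sizes present in only one of the two strains."""
    return sorted(set(bands_a) ^ set(bands_b))


ITR = "ACGTAC"   # hypothetical terminal-repeat anchor, not a real mariner ITR
SITE = "GAATTC"  # EcoRI recognition sequence

# Toy sequences: strain A carries two insertions, strain B lacks the second.
strain_a = "TTTT" + ITR + "GGG" + SITE + "CCCC" + ITR + "TT" + SITE
strain_b = "TTTT" + ITR + "GGG" + SITE + "CCCC" + "TT" + SITE

bands_a = predicted_bands(strain_a, ITR, SITE)
bands_b = predicted_bands(strain_b, ITR, SITE)
print(bands_a, bands_b, polymorphic_bands(bands_a, bands_b))

# One selective nucleotide at the 3' end reduces the number of amplified bands.
print(predicted_bands(strain_a, ITR, SITE, selective="G"))
```

The `selective` argument mimics the arbitrary 3′ extension of the nested primer described in the figure legend: each added nucleotide retains only those insertions whose flanking sequence matches it.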

The DNA-mediated transposon ‘mariner’ is widespread in insects (Robertson 1993). It is part of the IS630-Tc1-mariner superfamily, in which the DD34D domains of the elements are highly conserved. The mariner elements are 1.2–1.4 kb in length and have two ∼12-bp perfect inverted terminal repeats (ITRs) required for transposition. They code for a protein (transposase) that catalyses the excision and insertion events during transposition. Robertson (1993) designed degenerate primer pairs based on the highly conserved amino acid motifs in the mariner transposase coding regions. These primers have been extensively used to identify mariner-like elements in different insect species (Box S1, Supplementary material). Because of their widespread presence in insects and their conserved sequence motifs, mariner elements are used as anchors in the genome to develop transposon display methods. Mariner-like elements present in the genome of Asian rice gall midge (Orseolia oryzae), an economically important pest of rice, have been exploited using this approach to characterize the insertion site polymorphisms of these elements in different biotype populations in India (Behura et al. 2001a). Mariner-specific primers, along with different combinations of adapter-specific primers, were employed to identify polymorphisms within and between five biotypes of this pest. Southern hybridization of genomic DNA with a mariner probe suggested that these elements were highly repetitive in the rice gall midge genome. Moreover, the transposon display results showed that specific insertion site polymorphisms were present in the Orseolia oryzae genome in a biotype-specific manner. Southern hybridization of the gel blots (of transposon display products) with a mariner probe confirmed these results. These observations implied that insertion of mariner elements in rice gall midge was probably associated with some ‘fixed’ mutations in the flanking regions. The transposon display technique has also been used (Behura et al. unpublished data) to identify three mariner-flanking polymorphisms and map them as additional markers on the linkage map of Hessian fly (Mayetiola destructor), a major insect pest of wheat (Behura et al. 2004).

Hermes is another DNA-mediated transposon that has potential utility in transposon display in insects. It is a member of the hAT (hobo, Ac, Tam3) family and is present in at least 15 insect species including fruit fly, mosquitoes, beetles and moths. The element is ∼3 kb in length with 17-bp ITRs. Guimond et al. (2003) injected a plasmid-borne hermes element into D. melanogaster embryos and used transposon display to investigate the insertion patterns of the element in the resulting transgenic flies, showing that hermes can be used as a potential vector for insect genetic transformation experiments. Similarly, Wukong is another highly repetitive transposable element (with inverted terminal repeats), discovered in the genome of the yellow fever mosquito Aedes aegypti (Tu 1997). Structurally, these elements are very similar to the miniature inverted-repeat transposable elements (MITEs) found in plants. The copies of these MITE-like elements show high sequence similarity to one another and are highly iterated (2100–3000 per haploid genome) (Tu 1997). This suggests that Wukong could be useful in transposon display of yellow fever mosquitoes to generate large numbers of linked markers. RNA-mediated mobile elements can also be used for transposon display methods. Such an approach has recently been incorporated in population studies of the midge Chironomus riparius Meigen (Zampicinini et al. 2004). The retrotransposon NLRCth1 was used to generate fingerprints of the samples and to assess genetic distances between the midge populations.

Apart from transposable elements, many repetitive satellite elements present in insect genomes can also be used in anchored PCR-based assays (Ayyadevara et al. 2000) to develop genetic markers. For example, the Bkm element (Banded krait minor) discovered in D. melanogaster (Singh et al. 1984; Accession no. K01664) is a moderately repetitive satellite sequence in the fly genome and has been successfully used in fingerprinting of ecotypes of silkworm (Nagaraju et al. 1995). Similarly, an AluI repeat element has been identified in the genome of honeybee (Apis mellifera), where it is estimated to be present in 23 000 copies (Tares et al. 1993). The element is 176 bp long (Accession no. X57427) and is extremely conserved among the copies. It has been used as a probe in RFLP fingerprinting of different geographical populations of honeybee (Tares et al. 1993). There are many other such minisatellite elements (see Box S1, Supplementary material) that may also be used to develop anchored PCR markers, such as the cla element (X78085) in Chironomus thummi (Chironomidae); the A4 element (AF465798) in the phytophagous grasshopper Dichroplus elongatus (Acrididae); pRa-43 (AY141125) in Rhynchosciara americana (Diptera: Sciaridae); the bm1 repeat locus (AB063493) in Bombyx mori; and Dromini (M62837) in Drosophila mauritiana. Using a similar PCR-based approach, sequence-tagged site (STS) markers can be developed from the ends of large insert clones such as BACs (Lobo et al. 2006), YACs (yeast artificial chromosomes) and even cosmids, either by sequencing the ends or by amplifying them directly by PCR. A method of direct PCR amplification of BAC ends is provided in Box S2, Supplementary material.

The major advantage of these novel marker systems is that they do not require the sequence of each locus to be established in order to generate markers. Because the technique uses just one degenerate primer along with adapter-specific primers, the cost of generating and screening novel markers is less than that of conventional markers. The methods are relatively simple and can be performed either by radiolabelling the anchor primer or by silver staining the PCR products. In some species, DNA transposons are present in high abundance, which makes these methods more suitable for construction of linkage maps of their genomes. Moreover, many different restriction enzymes and/or linker-specific primers can potentially be used to detect flanking site polymorphisms on both sides of the elements. In situ hybridization with the transposon as the probe can be used to detect the physical locations of these loci on the chromosomes. These can be compared with the genetic map positions of the loci obtained by transposon display to identify any discrepancy between genetic and physical positions of the markers. These approaches may help identify marker recombination or crossover suppression in certain chromosomal regions (Behura et al. 2004; Yoshido et al. 2005). In spite of these advantages, the novel markers have not gained widespread popularity. The major drawback of these techniques is that nontarget amplifications are often generated if nested PCR conditions are not properly optimized. Southern hybridization may, however, be employed on the gel blots of the PCR products to exclude the nontarget amplifications (Behura et al. 2001a).

Some concerns on marker applications in insect research

There is increasing evidence of indirect selection on mtDNA and microsatellite loci, which goes against the so-called ‘neutral’ nature of these popular markers. Their application as molecular markers, particularly in studies of species evolution and population dynamics, can therefore lead to misleading interpretation of the data generated from such marker assays. These issues should be understood for meaningful interpretation of data derived from mtDNA and microsatellite markers in some situations.

Integration of mitochondrial sequences into nuclear DNA (referred to as Numts) has been discovered in a variety of organisms including insects (Zhang & Hewitt 1996). Thus, PCR amplification of mtDNA marker loci from total genomic DNA can potentially amplify these nuclear copies. These sequences complicate the use of mtDNA as a molecular marker in ecological studies. In insects, because of the relatively small genome size, a high copy number of Numt sequences may interfere with effective separation of mtDNA from its nuclear paralogs. Numt sequences do not retain their function after their departure from the mitochondrial genome; hence, they diverge from each other without any selective constraints (Bensasson et al. 2000). This has been evident in the brown mountain grasshopper, Podisma pedestris Linnaeus (Orthoptera: Acrididae), where the Numt sequences were found to have different mitochondrial origins (Bensasson et al. 2000). Thus, population studies using mitochondrial markers derived from these loci can potentially yield misleading results.

Another problem with mtDNA markers arises when the host insect harbours maternally inherited microorganisms such as Wolbachia. Wolbachia is a gram-negative endosymbiotic bacterium that causes many developmental defects such as cytoplasmic incompatibility, feminization and sex ratio distortion (Werren 1997). It is widespread in insects, and molecular methods are detecting these bacteria in a growing number of insect species (Jeyaprakash & Hoy 2000). As a Wolbachia infection sweeps through an insect species, the frequency of mitochondria from infected individuals also increases in the population, because Wolbachia and the mitochondria share the same mode of transmission. As a result, the mtDNA from infected individuals reaches high prevalence in these populations (Turelli et al. 1992; Rigaud et al. 1999). This phenomenon is commonly referred to as ‘genetic hitchhiking’. Indirect selection on host mtDNA by such microorganisms has therefore been a major concern (Hurst & Jiggins 2005). Apart from selective sweeps, the presence of more than one strain of Wolbachia and genetic recombination between the strains have also been recognized as alternative routes for reduction or elevation of mtDNA polymorphism in insect populations (Rousset & Solignac 1995; Hurst et al. 1999; James & Ballard 2000; Jiggins 2003). Coupled blockage of maternal inheritance of host mtDNA and of Wolbachia is another unusual confounding effect of Wolbachia on host mtDNA, observed in Orseolia oryzae (Behura et al. 2001b). A detailed list of similar, more commonly observed cases of indirect selection on host mtDNA by endosymbiotic microorganisms has been documented in a recent review (Hurst & Jiggins 2005). Thus, inferring the evolutionary history of populations solely from mtDNA markers may be misleading in insect species that harbour such maternally inherited microorganisms.

Similarly, natural selection on microsatellite loci has been observed in many cases, which raises doubts about their use as faithful genetic markers (Schlotterer 2000). Differences in microsatellite variation between sex chromosomes and autosomes represent an example of such differential selection (Bachtrog et al. 1999). The rate of mutation at microsatellite loci also depends on the repeat units, repeat length, flanking sites and recombination rates of the loci (Schlotterer 2000). In addition, genetic hitchhiking (by means of selective sweeps) and background selection at microsatellite loci have been demonstrated in Drosophila (Kauer et al. 2003). Analysis of microsatellite loci in ancestral (African) and colonizing (non-African) populations of D. melanogaster showed that, while background selection was the contributing factor for neutral variability in the ancestral populations, variation in colonizing populations was predominantly determined by genetic hitchhiking. Thus, positive selection could cause local reduction in genetic variability at such microsatellite loci (Harrift et al. 2003). Significant loss of mutations in X-linked microsatellite loci in non-African populations of Drosophila simulans compared to their African counterparts caused a ‘fixation’ tendency of these loci in the colonizing populations (Schofl & Schlotterer 2004). A similar bias in the rate of genetic variation at microsatellite loci associated with transposable elements has also been documented in D. melanogaster (Souames et al. 2003). Apart from that, microsatellites have also been suspected to influence gene spacing and secondary folding of DNA in the chromosomes. They are also known to affect chromatin structures in insects (Li et al. 2004). In D. melanogaster, the association of AAGAC satellite repeats with the rolled (ri) gene implicates these repeats in the extensive polytenization of the ri gene in the salivary gland chromosomes (Berghella & Dimitri 1996).
Regulatory roles of simple sequence repeats in gene expression also support the non-neutral nature of those loci (Li et al. 2002). Comparison of transcription factors between Tribolium castaneum and D. melanogaster revealed that many trinucleotide repeats found in Drosophila were almost absent from the corresponding homologous genes in Tribolium (Schmid & Tautz 1999). These lines of evidence suggest that microsatellites, once considered junk DNA, may have regulatory roles and are potentially non-neutral (see Li et al. 2002 for review).

Null alleles of DNA marker loci also pose difficulties in many genetic analyses. These are alleles of a locus for which the microsatellite flanking primers fail to produce any PCR product. The major problems have been identified in microsatellite applications in studies of parentage, forensic testing and estimation of allele frequencies (Dakin & Avise 2004). Such loci should be scored with care, because an individual carrying a null allele can be falsely scored as a homozygote, which can lead to a false interpretation in the genetic analyses. The conventional way of avoiding this difficulty is to treat all apparent homozygotes as heterozygotes that may carry an undetected null allele, so that a true heterozygote is not excluded on the basis of homozygosity. Several computer programs, such as probmax, papa, cervus, parente, famoz and newpat, are also available to tackle the difficulties of null alleles (Jones & Ardren 2003).
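The conventional safeguard described above can be illustrated with a minimal sketch of a single-locus parentage exclusion test. The function names, the representation of genotypes as pairs of allele sizes, and the example allele sizes are assumptions made for illustration; this is not the algorithm of any of the programs cited above.

```python
# Minimal sketch: single-locus parentage exclusion with and without
# allowance for null alleles. Genotypes are pairs of allele sizes (bp).
# All names and values here are hypothetical illustrations.

NULL = None  # placeholder for a non-amplifying (null) allele


def possible_alleles(genotype, allow_null):
    """Alleles an individual may carry, given its scored genotype."""
    a, b = genotype
    if allow_null and a == b:
        # An apparent homozygote may really be allele/null.
        return {a, NULL}
    return {a, b}


def parent_excluded(parent, offspring, allow_null=True):
    """True if parent and offspring cannot share any allele at this locus."""
    shared = possible_alleles(parent, allow_null) & possible_alleles(offspring, allow_null)
    return not shared


# Two apparent homozygotes for different alleles: a strict Mendelian
# check excludes the parent, the null-aware check does not.
print(parent_excluded((120, 120), (124, 124), allow_null=False))  # True
print(parent_excluded((120, 120), (124, 124), allow_null=True))   # False
```

The null-aware rule is deliberately conservative: it reduces false exclusions at the cost of lower exclusion power, which is why dedicated programs estimate null allele frequencies rather than applying a blanket rule.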

Large-scale genotyping

Single nucleotide polymorphisms (SNPs) are highly abundant markers found in the genomes of all organisms. These markers have been extensively used in human genome projects (Kwok 2001), and researchers are showing much interest in exploring and exploiting them in other animals and plants as well. Some progress has been made in insects too, though mostly limited to Drosophila and mosquitoes. Teeter et al. (2000) used a resequencing approach (covering a number of gene fragments and sequence tagged sites, STSs) to develop the first low-resolution SNP map of D. melanogaster. Subsequently, larger sets of STSs were sequenced to develop a large SNP data set for D. melanogaster (Hoskins et al. 2001). Based on this study, at least 52 SNPs on the X chromosome, 138 on chromosome 2 and 89 on chromosome 3 have been reported. An SNP map of chromosome 3 of D. melanogaster has also been developed (Martin et al. 2001). A major contribution to Drosophila SNPs was made by Berger et al. (2001), who identified 7223 SNP markers. SNP mapping of Drosophila nicastrin has been established by use of denaturing high performance liquid chromatography (DHPLC) (Nairz et al. 2002). In Anopheles gambiae, single nucleotide polymorphisms in coding regions of three strains have been identified (Morlais et al. 2004). In Aedes aegypti, single strand conformation polymorphism (SSCP) analyses using cDNA revealed single nucleotide polymorphisms that were used for genome mapping (Fulton et al. 2001; Severson et al. 2002). In Bombyx mori, EST sequences and genomic contig sequences were aligned to identify 101 candidate SNPs in its genome (Cheng et al. 2004). Because SNPs are abundant, they are suitable for large-scale genotyping projects. Some recent advances in this marker system are discussed below to show how SNPs are identified and how they can be used for genotyping.

In principle, SNPs can be identified by generating nucleotide sequences of a particular DNA region or a particular gene isolated from several individuals in a population. Because the nucleotide sequences from different individuals may vary at many positions in that DNA region, these allelic variations appear as potential SNPs in the multiple alignment of the individual sequences. The approach is straightforward when dealing with a small DNA fragment or a single gene. But when it comes to SNP-hunting in a chromosome or in the whole genome, the difficulties are manifold. It becomes increasingly difficult to place multiple alignments at the correct coordinates in the reference sequence because of false alignments to paralogs, repeats, gene families or gene duplications. In such cases, finding the true alignment depends upon correct anchoring to the actual reference sequence. Three main aspects need to be taken care of: (i) correct anchoring of the multiple alignments, (ii) filtering out the paralogs, and (iii) discriminating true SNPs from sequencing errors. Rigorous statistical methods are available that use the alignment depth, the base call rate and the base quality values to estimate the probability that a mutation is a potential SNP and not a sequencing error. Several computer algorithms that can be used to identify potential SNPs in DNA trace sequences are listed in Box 1.
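The third aspect, discriminating candidate SNPs from sequencing errors, can be sketched with a simple threshold filter over an already anchored multiple alignment. This is only an illustrative sketch and not the Bayesian procedure of polybayes or any other program in Box 1; the function name, the thresholds (`min_q`, `min_depth`, `min_minor`) and the example reads are assumptions. Quality values are taken to be Phred-like scores, where a score of 20 corresponds to an error probability of about 1 in 100.

```python
# Minimal sketch: call candidate SNPs from aligned reads using depth,
# base quality and minor-allele count thresholds (all hypothetical).
from collections import Counter


def candidate_snps(reads, quals, min_q=20, min_depth=4, min_minor=2):
    """Return (position, major allele, minor allele) for candidate SNPs.

    reads: equal-length aligned sequences ('-' marks a gap)
    quals: per-base quality scores, one list per read
    """
    snps = []
    for pos in range(len(reads[0])):
        # Keep only confidently called bases in this alignment column.
        column = [r[pos] for r, q in zip(reads, quals)
                  if r[pos] in "ACGT" and q[pos] >= min_q]
        if len(column) < min_depth:
            continue  # too shallow to judge
        counts = Counter(column).most_common()
        # Require the second allele to be seen in several reads, so that
        # a single miscalled base is not reported as a polymorphism.
        if len(counts) >= 2 and counts[1][1] >= min_minor:
            snps.append((pos, counts[0][0], counts[1][0]))
    return snps


reads = ["ACGTAC", "ACGTAC", "ACATAC", "ACATAC", "ACGTTC", "ACGTTC"]
quals = [[30] * 6] * 4 + [[30, 30, 30, 30, 10, 30]] * 2
print(candidate_snps(reads, quals))           # [(2, 'G', 'A')]
print(candidate_snps(reads, quals, min_q=0))  # [(2, 'G', 'A'), (4, 'A', 'T')]
```

With the quality filter switched off, the low-quality T calls at position 4 are reported as a second candidate, which shows why base quality has to enter the decision.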

Selection of a suitable method for SNP genotyping depends mostly on the objective of a project. Broadly, there are two objectives: SNP genotyping in a particular gene or set of genes, and genome-wide SNP screening. Simple methods such as restriction digestion, SSCP or other low- to medium-throughput techniques can be used to genotype a small set of SNPs in a known gene or genes. Many of the reaction formats in these methods are homogenous in nature and can be performed in 96-well plates without sophisticated automation and specialized instrumentation. Entomologists and ecologists who do not have access to high throughput genotyping facilities may rely on these methodologies. Specific examples are described here of how older techniques that were once performed in single tubes can be formatted into 96-well plates by suitable scaling-up procedures. One example is the PCR-RFLP method, which can be scaled up for multiplex genotyping of SNPs. Simple computer programs such as snpkit (Hao et al. 2002), snpicker (Niu & Hu 2004) and snp cutter (Zhang et al. 2005) can be used to identify restriction enzyme recognition sites overlapping the polymorphic bases (after the candidate SNPs in the sequences are recognized). These sites can then be used in designing pilot-scale restriction reactions in 96-well plates to process a large number of SNPs in parallel. Melting curve analysis of SNPs (McSNP; Akey et al. 2001) is one such novel SNP genotyping method. McSNP combines restriction enzyme digestion with DNA melting curve analysis. Melting curve analysis is performed by slowly heating DNA fragments in the presence of the dsDNA-specific fluorescent dye SYBR Green. As the sample is heated, fluorescence rapidly decreases when the melting temperature of a particular fragment is reached. Using this method, it is possible to determine the allele composition of DNA samples.
McSNP is well suited for medium-throughput genotyping because 96 samples can be analysed and scored in just 20 min (Akey et al. 2001). Restriction digestion followed by single base extension is another genotyping approach based on PCR-RFLP (Che & Chen 2004). In this method, a restriction site (FokI) is incorporated in the forward primer so that the PCR product can be digested with that enzyme. Upon digestion, the molecules carry a 5′-overhang that can be used for primer extension with labelled ddNTPs. The labelled products are then separated in an automated sequencer to discriminate the alleles. A similar method, carried out on a solid support, is also used for multiplex genotyping (Shapero et al. 2001). PCR amplifications are carried out on microspheres using primers designed for each polymorphic locus (one primer contains the recognition site for the restriction enzyme BbvI). BbvI is then used to digest the PCR products, and the digested DNA is used for primer extension with multicolour fluorescent probes in mini-sequencing reaction formats. Apart from the PCR-RFLP methods, other PCR-based techniques such as allele-specific PCR, AFLP and TaqMan PCR can also be used for genotyping SNPs; the principles of some of these techniques are briefly outlined in Box 2.

The genotyping techniques based on DNA conformation rely on differentiating DNA molecules by the conformational changes that arise from differences in their nucleotide sequences. When a single-stranded DNA molecule is electrophoresed in a nondenaturing gel medium, intrastrand base pairing creates a specific conformation for each molecule. Based on this principle, SSCP is used for genotyping purposes (Orita et al. 1989). Traditionally, the PCR product is denatured by heating, snap-cooled on ice and run in a nondenaturing polyacrylamide gel. The bands are then detected either by ethidium bromide staining or by autoradiography. Genotyping of a large number of samples by SSCP is preferably performed either by silver staining or by using fluorescently labelled primers (Schmalenberger & Tebbe 2003; Dong & Zhu 2005). High throughput parallel SSCP scoring can be achieved by using automated capillary sequencers with optimized electrophoretic conditions (Baba et al. 2003). Cleavage fragment length polymorphism (CFLP) is another conformation-based genotyping method. The method relies on the cleavase I enzyme, which cleaves DNA strands at the upstream junctions of any duplex structure (Lyamichev et al. 1999). Using this approach, heteroduplexes can be generated by mixing wild type and mutant DNAs, and cleavase I is then used to digest the duplex DNAs. This can reveal any sequence polymorphisms between the wild type and the mutants. Temperature and assay time are important for the reproducibility of the CFLP pattern, and thus these parameters need to be optimized so that all samples are run under parallel conditions (Oldenburg & Siebert 2000). Nucleotides labelled with different fluorescent dyes may also be used in this method.

Denaturing high-performance liquid chromatography (DHPLC) is also used for SNP genotyping. It exploits the fact that, under partially denaturing conditions, heteroduplex DNA molecules denature more readily than homoduplex molecules. Thus, when DNA samples are run through an acetonitrile gradient in a liquid chromatograph, the heteroduplex molecules show a shorter retention time than the homoduplex molecules (Underhill et al. 1997). The advantages of DHPLC over SSCP or CFLP are that no PCR amplification is required prior to genotyping and that the triethylammonium acetate medium reduces the range of melting temperatures of the amplicons (Nairz et al. 2002). Denaturing gradient gel electrophoresis (DGGE) (Fischer & Lerman 1983) exploits the differential melting behaviour of DNA molecules, which depends on their sequence composition. A single base difference between two DNA molecules can alter the way they denature, so the molecules show differential electrophoretic mobility depending on the concentration of the denaturant or on temperature. DGGE methods are particularly suitable when GC-rich sequences in the DNA pose difficulty for mutation screening.

Genome-wide SNP mapping is undertaken to construct haplotype maps or linkage blocks, to develop high resolution linkage maps and map unknown genes, or to study population structure and genetic diversity. It involves screening many thousands or even millions of SNPs present in the genome in a large number (hundreds or even thousands) of individuals. This is where high throughput reaction formats and detection systems become necessary to achieve the goal in a cost-effective way. Several high throughput methods are now available for genotyping thousands of loci in a relatively short time. Large-scale genotyping is generally carried out in two basic steps: allele discrimination and allele detection. Three basic biochemical principles are generally used for allele discrimination reactions: allele-specific hybridization, primer extension and ligation (Fig. 2). Allele-specific hybridization is a straightforward approach in which the probe hybridizes to the DNA only if the allele sequence is complementary to that of the probe. There is no hybridization if the DNA contains a mismatching sequence. In the primer extension method, an oligonucleotide probe is used that has a 3′ end base complementary to the polymorphic base in the template DNA. When chain elongation takes place, only the oligo matching the template is extended. In the primer ligation method, on the other hand, two kinds of oligos are used: one specific to each allele and another common to both alleles. The first step of the assay is performed at low annealing temperature to anneal the common oligo to the template DNA. The second step is performed at high temperature to ligate the allele-specific oligos. Ligation takes place only when the allele-specific oligo matches the SNP site. There is no ligation product in the case of a mismatching oligo.
Using these principles, either homogenous or solid phase reaction formats can be adopted for SNP genotyping. In homogenous formats, the whole assay is performed in solution from beginning to end without any need for separation or purification of reaction products. Solid phase reactions, on the other hand, are performed on the surfaces of glass slides, array beads or silicon chips, and these methods are preferred for large-scale genotyping assays. In these cases, the allele-specific oligo probes are covalently attached to the solid surfaces, and the sample DNA to be genotyped is used as the template in the allele discrimination reactions.
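The primer extension principle described above can be illustrated with a toy model in which an allele-specific primer is extended only when it is fully complementary to the template, including the 3′ terminal base that sits opposite the polymorphic site. The function names, primer length and sequences below are assumptions chosen for illustration; real assays tolerate or reject mismatches according to reaction conditions, which this sketch ignores.

```python
# Toy model of allele discrimination by primer extension. Templates are
# written 5'->3'; the primer anneals so that its 3' end pairs with the
# polymorphic base. Sequences and names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def design_primer(reference, snp_pos, allele, length=8):
    """Allele-specific primer whose 3' base pairs with the given allele."""
    site = allele + reference[snp_pos + 1: snp_pos + length]
    return reverse_complement(site)


def primer_extends(primer, template, snp_pos):
    """Extension occurs only on a perfectly matched template."""
    site = template[snp_pos: snp_pos + len(primer)]
    return reverse_complement(site) == primer


def call_genotype(haplotypes, reference, snp_pos, alleles, length=8):
    """Alleles whose specific primer is extended on at least one haplotype."""
    return [a for a in alleles
            if any(primer_extends(design_primer(reference, snp_pos, a, length),
                                  h, snp_pos) for h in haplotypes)]


ref = "GGCATAGCTTACGGA"        # carries 'A' at index 3
alt = ref[:3] + "C" + ref[4:]  # carries 'C' at index 3
print(call_genotype([ref, ref], ref, 3, "AC"))  # ['A']
print(call_genotype([ref, alt], ref, 3, "AC"))  # ['A', 'C']
```

A heterozygous sample gives extension products from both allele-specific primers, whereas a homozygote gives a product from only one.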

Figure 2. Principles of different methods used for allele discrimination. I. Allele-specific hybridization: When two alleles differing by a single nucleotide polymorphism (A to C) are hybridized to an oligonucleotide probe with ‘T’ at the position of mutation, only the allele with ‘A’ as the polymorphic base hybridizes. The other allele with ‘C’ does not hybridize. II. Primer extension method: It uses an oligonucleotide probe with a 3′-end base complementary to the polymorphic base in the template DNA. When chain elongation takes place in PCR, only the oligo matching the template is extended. The primer for the other allele is not extended. III. Allele-specific ligation: It uses two oligos. One primer is allele specific and fluorescently labelled. The second primer (unlabelled) is common to both alleles. The first step of the assay is performed at low annealing temperature to anneal the common oligo to the template DNA. The second step is performed at high temperature to ligate the allele-specific oligos. Ligation takes place only when the allele-specific oligo matches the SNP, and there is no ligation product in the case of a mismatching oligo. Monitoring the fluorescence of the product allows discrimination of the two alleles.

Common detection techniques in SNP assays are based on fluorescence, luminescence, fluorescence polarization or fluorescence resonance energy transfer (FRET). Fluorescent labelling of a nucleotide or an oligo probe is the most common approach. Incorporation of the fluorogenic material into the reaction product identifies the allele matching or mismatching the polymorphic base. Luminescence is mostly used in pyrosequencing methods, where it is a direct measure of the amount of nucleotide incorporated, read out through the ATP-dependent luciferase reaction. Pyrosequencing uses four enzymatic reactions, namely: (i) a polymerase reaction for extending DNA strands, (ii) conversion of pyrophosphate (PPi) to ATP catalysed by ATP sulphurylase, (iii) light production by a luciferase-luciferin reaction, and (iv) a dNTP degradation reaction by apyrase. If the added dNTP is complementary to the base at the SNP site and is incorporated into the hybridized sequencing primer, a peak is observed in the sequencing spectrum, identifying the allele. The principle of fluorescence polarization, in turn, is based on the physical properties of plane polarized light. Upon excitation, the light emitted from the fluorescently labelled molecule is polarized to a degree that depends directly on the molecular weight of the DNA. Thus, by monitoring the fluorescence polarization of the emitted light, the difference in molecular weight of two DNA molecules differing in a single nucleotide can be determined. The principle of FRET detection, on the other hand, is based on the close proximity of a donor and an acceptor dye whose spectra overlap (the emission spectrum of the donor with the excitation spectrum of the acceptor). When this proximity is created or destroyed in the allele discrimination reaction, the transfer of fluorescence resonance energy, and hence the fluorescence signal, changes. It is a preferred method of detection for high throughput SNP assays using primer extension or ligation (where proximity is created) or 5′ nuclease reactions (where proximity is destroyed).
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI/TOF-MS) is another attractive method for allele detection. It is based on the principle that when a temporally and spatially well-defined group of ions of differing mass/charge ratios is allowed to drift in a region of constant electric field, their transit times depend upon their mass/charge ratios. The method can discriminate single base differences between two DNA alleles based on their mass/charge ratio (Wada et al. 1999). Based on the above genotyping and detection methods, several high throughput technologies are available for assaying a large number of SNPs in parallel. Some of the array-based SNP genotyping methodologies, including Affymetrix Tag Arrays (Fan et al. 2000), Illumina's bead-array system (Gunderson et al. 2005) and arrayed primer extension (APEX) (Kurg et al. 2000), are very promising for exceptionally high throughput and multiplexing assays. Principles of some of these techniques are briefly described in Box 3.
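The pyrosequencing readout described above can be illustrated with an idealized simulation. Nucleotides are dispensed one at a time; each dispensation that matches the next base or bases to be synthesized gives a light peak proportional to the number of bases incorporated, and a non-matching nucleotide is degraded without signal. The function names, the dispensation order and the allele sequences are assumptions for illustration, and real pyrograms are noisy rather than integer-valued.

```python
# Idealized pyrogram simulation. 'to_synthesize' is the sequence added
# to the sequencing primer, starting at the SNP. All inputs are
# hypothetical examples.


def pyrogram(to_synthesize, dispensation):
    """Return a list of (nucleotide, peak height) for each dispensation."""
    peaks = []
    i = 0
    for nt in dispensation:
        incorporated = 0
        # The polymerase adds every consecutive matching base (homopolymer run).
        while i < len(to_synthesize) and to_synthesize[i] == nt:
            incorporated += 1
            i += 1
        peaks.append((nt, incorporated))  # 0 = degraded by apyrase, no light
    return peaks


def combined_pyrogram(allele_seqs, dispensation):
    """Sum the peaks of the templates present in a sample."""
    grams = [pyrogram(seq, dispensation) for seq in allele_seqs]
    return [(nt, sum(g[k][1] for g in grams)) for k, nt in enumerate(dispensation)]


order = "ACGT"
print(pyrogram("AGGT", order))                     # [('A', 1), ('C', 0), ('G', 2), ('T', 1)]
print(pyrogram("CGGT", order))                     # [('A', 0), ('C', 1), ('G', 2), ('T', 1)]
print(combined_pyrogram(["AGGT", "CGGT"], order))  # [('A', 1), ('C', 1), ('G', 4), ('T', 2)]
```

The two homozygotes differ in which of the first two dispensations gives a peak, while the heterozygote shows half-height peaks for both relative to the shared downstream bases.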

Like SNP, ‘single feature polymorphism’ (SFP) is an emerging marker system for screening genome-wide DNA polymorphisms. A whole genome microarray is used as the genotyping platform to identify these polymorphisms. Single features are fabricated using oligos (typically 25 bp) from the genome sequence. The basic principle of identifying single feature polymorphisms by this method is illustrated in Fig. 3. The method has been used successfully to identify genomic regions and QTLs governing various phenotypic traits in plants and also in yeast (Borevitz et al. 2003; Winzeler et al. 2003). It can be used to detect SFPs in insects to understand genetic diversity in closely related species or between different biotypes or strains. It has recently been used for genomic differentiation of the two sympatric forms, M and S, of the African malaria mosquito A. gambiae (Turner et al. 2005). Using an Affymetrix GeneChip containing 142 065 probes (25 bp) covering the entire genome of A. gambiae, Turner et al. (2005) performed microarray hybridization with genomic DNA from the two forms. Using this approach, they identified three specific genomic regions differentiating the M and S forms. Mutations in these three regions (∼2.8 Mb, containing 67 genes) on chromosomes 2L, 2R and X are fixed, unlike those at nearby loci. In spite of gene flow between the two forms, these three genomic regions remained differentiated between M and S, suggesting that they are speciation islands containing genes responsible for developing reproductive isolation between them (Turner et al. 2005). This method is regarded as a particularly powerful emerging technology for genomic analysis of populations undergoing recent genetic differentiation (Butlin & Roper 2005).
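The idea behind SFP detection, a drop in hybridization signal at a feature when the sample DNA differs from the feature sequence, can be sketched as a simple log-ratio filter. This is a hypothetical illustration and not the statistical procedure used in the studies cited above, which rely on replicated hybridizations and formal tests; the feature names, intensity values and the threshold `min_log2_drop` are all invented for the example.

```python
# Minimal sketch: flag features whose hybridization signal in a test
# strain is markedly lower than in the reference strain. Intensities,
# feature names and the threshold are hypothetical.
import math
from statistics import mean


def call_sfps(reference, test, min_log2_drop=1.0):
    """Return features with log2(test/reference) <= -min_log2_drop.

    reference, test: dict mapping feature name -> replicate intensities
    """
    calls = []
    for feature, ref_values in reference.items():
        log_ratio = math.log2(mean(test[feature]) / mean(ref_values))
        if log_ratio <= -min_log2_drop:
            calls.append(feature)
    return calls


reference_strain = {"f1": [1000, 1100], "f2": [800, 820], "f3": [500, 540]}
test_strain = {"f1": [980, 1050], "f2": [210, 190], "f3": [480, 530]}
print(call_sfps(reference_strain, test_strain))  # ['f2']
```

Only feature f2, whose signal falls about fourfold in the test strain, is flagged; the other two features vary within what would plausibly be replicate noise.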

Figure 3. Microarray methods used for detection of single feature polymorphisms (SFPs). Genomic DNA from three strains of a species shows differential signal intensity when hybridized to a single feature of a particular gene, compared to the reference gene signal. Mutations (either base substitutions or deletions) in the corresponding locus in the related strains give rise to differential hybridization strength when hybridized to a particular ‘feature’. This generates differential signal intensity in the microarray hybridization. A reference feature is used as a control to compare the signal intensities in samples. Arbitrary sequences are provided to explain the principle. The common sequence (top) for each strain represents a hypothetical feature. The DNA sequences of the loci corresponding to the feature in the three strains are shown below (indicated by arrows). The underlined bases show sequence differences of the target (in strain II). The deletion mutations of the target compared to the feature in strain III are shown by ‘asterisks’.

Concluding remarks

Because of the simple nature of DNA itself, we tend to assume that the technologies to detect DNA polymorphisms are not infinite. If this assumption is correct, then we must be getting near the asymptote of the sigmoidal graph of DNA techniques vs. time (let's say since 1953), and it is unlikely (although not impossible) that many more techniques will be developed in the next 20 years or so. At this point, however, the field of molecular marker technology is progressing fast by adopting new forms and innovative applications of existing genetic principles for detecting DNA polymorphism.

One example of this scenario is the use of novel approaches for generating new classes of genetic markers. Insect chromosomes are home to numerous DNA and RNA transposons, some of which are highly abundant in the genome. Hence, comparative sequence studies of the repeat elements in diverse insect species can provide useful information on how to use them for developing abundant markers in those species. Another example is the possible application of genomic tools to identify EST markers and other cDNA markers that may be derived from microRNA targets. Global analysis of ESTs using microarray technology has shown unprecedented power in revealing expression polymorphism in insects (Whitfield et al. 2002, 2003). The expression and mRNA abundance of many protein-coding genes are regulated by microRNAs. MicroRNAs are small (∼22 nt) regulatory RNA molecules that bind to short stretches in the 3′-untranslated regions (UTRs) of target mRNAs and cause post-transcriptional regulation of the target genes. A target mRNA often bears conserved sequences at several locations in the 3′-UTR where microRNAs bind (for example, the K box, Brd box and GY box in the mRNAs of Notch genes). Thus, in principle, primers designed from these conserved regions can be used for PCR amplification of the cDNA fragments corresponding to the microRNA binding regions (Boutla et al. 2003). Also, as the 5′ end of the microRNA (bases 2–8) makes perfect complementary base pairing with the mRNA, it should be possible to use PCR primers with a 3′ end (8 bases) complementary to the 5′ ends of microRNAs to fingerprint cDNA by cDNA-PCR approaches (Behura et al. unpublished observation). Because a microRNA has hundreds of targets in the genome, and each target often has binding regions for multiple microRNAs, these abundant microRNA-binding sites in messenger RNA could be used to generate thousands of cDNA markers.

The most eagerly awaited development, at least to me, is the widespread use of high throughput SNP genotyping in insect studies. These markers are highly abundant and hence may be the best marker system for genetic analyses in ecological studies. Candidate SNPs can be identified by sequencing a set of loci from a small set of individuals, and the sequence information can then be used for genotyping a large number of individuals in that population. Where genomic trace sequences and overlapping EST sequences are available, they can also be exploited computationally to identify potential SNP markers for genotyping individuals. Although SNP mapping has so far been mostly limited to Drosophila research (Berger et al. 2001; Hoskins et al. 2001; Kaminker et al. 2002), these markers have huge potential for SNP genotyping projects in other insects where genome sequences are available. The other system that holds great promise is the analysis of DNA polymorphisms on microarrays. Promising methods are also now available to isolate microsatellite sequences (Zane et al. 2002) and to genotype them using fluorescent-ISSR–PCR (Nagaraju et al. 2002). Application of whole genome microarrays represents another new approach to genomic DNA analyses in closely related insect species (Butlin & Roper 2005; Turner et al. 2005).

After all, DNA sequencing is the ultimate polymorphism detection system, and it may be that in 20 years’ time the technology will be so routine that an insect will be placed in a small tube with buffer and homogenized and, a relatively short time later, the entire genome sequence, or certainly large sections of it, will emerge as data from a sequencing machine. I realize that this is at present science fiction, but the fact may not be that far ahead. At the moment, a total of eight insect species are in genome assembly stages and another 35 are in progress for genome sequencing (NCBI genome resources; http://www.ncbi.nlm.nih.gov/genomes). These projects will help us to perform genome-wide mutation analyses and to apply high throughput genotyping technologies in insects by using the forthcoming genome sequences. Comparative genomic approaches can be applied to identify highly conserved gene families, transposons, repeat elements and noncoding conserved regulatory elements in insects that can be exploited to develop a new generation of marker systems. Application of such comparative genomic approaches in insects has just begun (Kaufman et al. 2002), and the genome sequences of D. melanogaster and Anopheles gambiae, along with the forthcoming genome sequences of other important insect species such as Apis mellifera, Bombyx mori, Aedes aegypti, Drosophila yakuba and Drosophila pseudoobscura, will very soon bring to light many aspects of the comparative genomics of these insects. This information can potentially be exploited to develop new generation marker systems based on highly conserved regions in insect genomes, and may also provide useful information towards developing generic marker systems such as universal microarrays (Belosludtsev et al. 2004) that could be readily applied to many related insect species. These technologies are expected to provide new directions for studying insect genomes in an unprecedented way in the years to come.

Footnotes
  1. Box 1 Computer software to predict single nucleotide polymorphism in DNA sequences

    The polybayes program (Marth et al. 1999) performs anchored alignment, paralog filtering and SNP detection using Bayesian methods, and reports the SNP locations and the associated probabilities in a text-formatted output file.

    The autosnp program (Barker et al. 2003) uses a gapped FASTA-formatted multiple alignment file from the cap3 assembly program to identify potential SNPs. autosnp has recently been used in a web-based SNPServer for online SNP identification (http://hornbill.cspp.latrobe.edu.au/snpdiscovery.html).

    The forage program (Unneberg et al. 2005) uses two neural networks and a training data set of SNPs to identify base discrepancies in the multiple alignments. First, the query data set (for example EST sequences) is used to perform the basic filtrations. If a site in the alignment is covered by at least two sequences and shows a discrepancy, the program represents it as a vector with component features such as the major allele frequency, the smallest error probability for each minor allele, and the SNP probability. The vector is then submitted to both neural networks, and if a unanimous quorum is reached, the site is recorded as a SNP candidate.

    The insnp program (Manaster et al. 2005) locates the first stretch of six consecutive bases in which there are no differences between the reference and any one of the trace sequences. It uses this stretch as the coordinate to anchor the sequences, and checks for base differences up to half the length of each sequence. Only those base differences are marked as SNPs.

    The novosnp software (Weckx et al. 2005) uses a cumulative scoring scheme to score each position in the alignment. The score is obtained by adding three subscores (feature score, difference score and peak shift score), determined independently from the forward and reverse reads of the aligned sequences, together with an extra ‘type’ score. The cumulative score shows how well the forward and reverse reads agree at a particular base, which forms the basis for detecting a single nucleotide polymorphism.

    snpceqer (Flood et al. 2002) is used with the Beckman CEQ2000 genotyping system. It reports discrepancies between individual sequences and the consensus sequence and detects SNPs based on individual base call values. It is freely available at http://innovation.swmed.edu.
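
The programs in Box 1 all start from aligned sequences and ask whether a base difference is a true polymorphism or a read error. The Python sketch below illustrates one simple rule of thumb only: a column is reported when a minor base is seen in more than one sequence. It is not the algorithm of any program listed above; the function name `candidate_snps`, the threshold `min_minor_count` and the toy reads are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def candidate_snps(alignment, min_minor_count=2):
    """Return (column, major_base, major_freq, minor_bases) for candidate SNP columns.

    alignment: list of equal-length strings from a gapped multiple alignment.
    A column is reported when at least one non-major base is observed in
    `min_minor_count` or more sequences (an assumed redundancy threshold).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    candidates = []
    for col in range(length):
        # Count called bases only; gaps and ambiguous calls are ignored.
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() in "ACGT"
        )
        if len(counts) < 2:
            continue  # monomorphic column
        ranked = counts.most_common()
        major_base, major_count = ranked[0]
        # Keep only minor bases that are supported by enough sequences.
        minor = {base: n for base, n in ranked[1:] if n >= min_minor_count}
        if minor:
            major_freq = major_count / sum(counts.values())
            candidates.append((col, major_base, major_freq, minor))
    return candidates


# Toy alignment (hypothetical reads, not real data).
reads = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACATTAGC",
    "ACATT-GC",
    "ACGTTAGT",  # mismatch seen in a single read only
]
for col, major, freq, minor in candidate_snps(reads):
    print(col, major, round(freq, 2), minor)
```

With the default threshold, only the column where the minor base appears in two reads is reported; the singleton mismatch in the last column is ignored. Lowering `min_minor_count` to 1 would report it as well, at the cost of admitting more sequencing errors.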

  2. Box 2 Low to medium throughput genotyping approaches

    Allele-specific PCR: The allele-specific PCR-based assay (ASP) (Saiki et al. 1986) uses conventional PCR with primers designed from SNP-flanking sequences. A common primer matching both alleles, along with two allele-specific primers whose 3′ ends match the polymorphic bases of the two alleles, is used so that allele-specific amplification occurs. The products can be detected on simple electrophoretic gels.

    TaqMan assay: The TaqMan technique (Livak 1999) utilizes the 5′→3′ exonuclease activity of Taq DNA polymerase. The oligonucleotide probe carries a fluorescent reporter dye at the 5′ end and a universal quencher dye at the 3′ end. As long as the two dyes are close to each other, the fluorescence is quenched. When the probe matches the template DNA, the exonuclease activity of the polymerase cleaves the probe and the fluorophore is released. The emitted fluorescence is then monitored to discriminate the alleles.

    On-off switch assay: A DNA polymerase with proofreading activity, used in combination with allele-specific modified primers (generally phosphorothioate, PTO, or locked nucleic acid, LNA, modifications that render the primer resistant to the exonuclease activity of the polymerase), has also been successfully exploited for SNP genotyping (Lin-Ling et al. 2005; Zhang et al. 2005). Termed the ‘on/off switch assay’, it works on the simple principle that if the modified primer matches the template SNP, the proofreading activity does not come into action and primer extension takes place. If the modified primer is mismatched, the DNA polymerase attempts to employ its exonuclease activity but fails because of the primer modification, and no primer extension takes place.

    RCA technology: SNP-specific ligation using padlock oligos followed by rolling circle amplification (RCA) of the alleles is also an attractive method for SNP genotyping (Qi et al. 2001). The method uses SYBR Gold staining of the reaction product to identify the polymorphic DNA molecules. Only ∼50 ng of starting DNA is required per assay, and the hybridization-based bi-allelic assay of the rolling circle amplification products with fluorescent dye-labelled probes makes it well suited to cost-effective genotyping. A commercial format of RCA is available from Amersham Pharmacia Biotech Inc.

    RNaseCut method: RNaseCut is an innovative method based on RNA transcripts generated from PCR fragments (Krebs et al. 2003). The transcripts are digested with guanosine-specific ribonuclease T1 to generate sequence-specific fingerprints that are then analysed by MALDI mass spectrometry.

    Phosphodiesterase: The enzyme phosphodiesterase is also used for SNP discrimination. An allele-specific product is generated from a terminally modified (methylphosphonate) oligo in the presence of ddNTPs (Sauer et al. 2000). Phosphodiesterase then digests the product, leaving behind the modified backbone of the primer portion, which is linked to the negatively charged phosphate group of the dideoxynucleotide matching the SNP. This negatively charged tag can then be monitored by MALDI to assign the genotype.

    Pyrosequencing: In a pyrosequencing reaction, the DNA polymerase incorporates a nucleotide on the template DNA and releases pyrophosphate (PPi), which is converted to ATP by ATP sulphurylase. The energy of the ATP is converted to light through the oxidation of luciferin, and the light is detected by a photon detector in order to identify the nucleotide. The template for pyrosequencing is generated by using two or more primers (a common labelled primer at one end and allele-specific primers at the other end of the SNPs) in the same reaction to first amplify the DNA containing multiple SNPs. After cleaning, the products are passed through a column of streptavidin-coated sepharose beads so that, after washing, single-stranded DNA can be isolated. This is then used for pyrosequencing reactions with the allele-specific primers to determine the genotypes of the individual samples. Recently, a multiplexed SNP assay based on pyrosequencing has been developed for high throughput genotyping (Zhou et al. 2005).

    SNPwave: This is an AFLP-based method that utilizes padlock probes followed by selective amplification (van Eijk et al. 2004). The method relies on ligation-mediated circularization of the padlock probes. Upon hybridization to the denatured genomic DNA, the 3′-OH end of the probe forms a phosphodiester bond with the adjacent 5′-PO4 end of a common sequence in the probe, resulting in a closed circular molecule. This happens only if the 3′ end of the probe matches the polymorphic base; when there is a mismatch, no closed circle is formed. In the second step, the closed circular molecules are amplified selectively with AFLP primers carrying two selective nucleotides at their 3′ ends. In this way, only those closed ligation probes that contain perfectly base-paired nucleotides adjacent to the common primer sequences are amplified. The bands are separated on a capillary sequencing machine (MegaBACE using 1 × LPA matrix) based on size differences introduced by stuffer nucleotides.

    SNaPIT technology: SNaPIT technology (Curran et al. 2002) uses the high specificity of uracil-DNA glycosylase (UDG), a DNA repair enzyme, to digest the template DNA in an allele-specific manner. In the first step, a PCR is performed to replace the thymine residues with uracils in the template DNA. UDG is then used to excise the uracil bases, generating apyrimidinic sites that are cleaved under alkaline conditions. This produces fragments of different lengths depending on the allele, which can be readily detected under denaturing conditions by PAGE or using an ABI-PRISM Genetic Analyser.

    ARMS assay: The amplification refractory mutation system (ARMS)-PCR (Newton et al. 1989) is similar to the allele-specific PCR assay. The PCR assay uses oligonucleotides with 3′ ends matching the two alleles. Two SNP-specific primers designed from SNP-flanking sequences at one end, and a common primer matching both alleles on the other side, are used to discriminate the two alleles, and the products can be detected on simple electrophoretic gels. Alternatively, SYBR Green can be used as a dsDNA-binding dye to efficiently detect large-scal.
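
The primer layout shared by allele-specific PCR and ARMS (two forward primers that differ only in their 3′ terminal base, plus one common primer on the opposite strand) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the default primer length and the `common_offset` parameter are hypothetical, the toy sequence is not real data, and practical considerations such as melting temperature and primer specificity are not modelled.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def allele_specific_primers(sequence, snp_index, alleles, primer_len=20,
                            common_offset=100):
    """Design two allele-specific forward primers and one common reverse primer.

    sequence:      top-strand sequence containing the SNP.
    snp_index:     0-based position of the polymorphic base in `sequence`.
    alleles:       the two alternative bases, e.g. ("A", "G").
    primer_len:    primer length in bases (illustrative default).
    common_offset: distance from the SNP to the far end of the amplicon
                   (illustrative default); must exceed primer_len so that the
                   common primer does not overlap the SNP.
    """
    sequence = sequence.upper()
    if common_offset <= primer_len:
        raise ValueError("common_offset must be larger than primer_len")
    start = snp_index - primer_len + 1
    end = snp_index + common_offset
    if start < 0 or end > len(sequence):
        raise ValueError("flanking sequence is too short for these settings")

    # Shared 5' portion of both forward primers: bases just upstream of the SNP.
    upstream = sequence[start:snp_index]
    # Each forward primer ends (3') on one of the two alleles.
    forward = {allele.upper(): upstream + allele.upper() for allele in alleles}
    # The common primer anneals downstream of the SNP, on the opposite strand.
    common = reverse_complement(sequence[end - primer_len:end])
    return forward, common


# Toy example with short primers so the result is easy to read.
toy = "GATTACAGGCTTACCGGATCAAGT"
forward, common = allele_specific_primers(
    toy, snp_index=8, alleles=("G", "A"), primer_len=5, common_offset=10
)
print(forward)
print(common)
```

Because the two forward primers are identical except for the last base, amplification with one primer but not the other indicates which allele is present in the template.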

  3. Box 3 High throughput genotyping technologies

    Invader assay: It uses flap endonucleases that specifically recognize the overlapping nucleotides in a DNA structure formed by hybridizing two overlapping oligos to a target template molecule (Lyamichev et al. 1999). The enzyme then cleaves the downstream oligonucleotide probe to remove the 5′ flap nucleotides of the perfectly overlapping structure. Two signal probes (one for each allele) are designed so that they overlap the polymorphic base in the template DNA. When an upstream oligo probe (the invader) is hybridized, the downstream oligo is cleaved only if the signal probe matches the polymorphic base. No cleavage takes place if the oligo is not specific to the allele. The cleaved allele-specific probe is then used in a fluorescence resonance energy transfer (FRET)-based reaction to amplify the signal so that the oligo cleavage can be easily monitored (Ledford et al. 2000).

    APEX: Arrayed primer extension (APEX) is a primer extension-based allele discrimination technique coupled with microarray-based detection. The oligonucleotide probes are immobilized on a glass surface. The target genomic DNA is digested with specific restriction enzymes and hybridized to the immobilized primers. Labelled ddNTPs are then used for primer extension, and a match or mismatch in the template DNA is detected by monitoring which labelled ddNTP is incorporated into the reaction products (Kurg et al. 2000).

    Bead array technology: The technology (Gunderson et al. 2005) uses beads carrying two 75-mer oligos as capture probes. The capture sequences of the two oligos are the same; only the 3′ end base differs. Pre-amplified genomic DNA is hybridized onto the bead array. An array-based primer extension reaction is then performed in the presence of biotin-labelled dNTPs. If the captured genomic DNA molecule matches the 3′ end of the bead probe, the biotin label is incorporated; if there is a mismatch, no label is incorporated. The incorporated biotin signal is amplified using a sandwich protocol and, after the array is washed, the signal is recorded by a suitable imaging system to score the SNPs.

    Tag Array: The technology (Fan et al. 2000) uses generic oligonucleotide arrays containing thousands of preselected 20-mer oligonucleotide tags. The genomic DNA is amplified with SNP-specific primers. The PCR product is then used for single base extension (SBE) reactions using chimeric primers whose 3′ ends are complementary to the specific SNP sites and whose 5′ ends are complementary to specific probe tags on the array. ddNTPs labelled with different coloured dyes are used in the SBE reaction. The extension primers, which terminate one base before the SNP sites, are thus extended with a different label for each of the two SNP alleles. The extended primers are then hybridized to the tag array, and the genotypes are deduced from the fluorescence intensity ratio of the colours.

    MassCode: The technology (Kokoris et al. 2000) uses small tags covalently attached to the primers used for allele-specific amplification. Each tag is attached to its primer via a photo-cleavable linker molecule. After the amplification steps, the tags are cleaved from the primers by photolysis. SNP alleles in the product are then determined from the relative ratio of the paired tags (coming from matched and mismatched alleles). Multiplexed formats using 30 different tags can be run with this approach, with the potential to genotype 40 000 SNPs in a day.
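
Several of the platforms in Box 3 end with the same decision: two allele-specific signals (for example, the two dye colours on a tag array, or the paired tags in MassCode) are compared and the genotype is deduced from their ratio. The Python sketch below shows one simple way such a call could be made. It is an illustration only, not the scoring method of any of the cited platforms; the function name `call_genotype` and all three thresholds are assumed values that would need to be calibrated against control samples.

```python
def call_genotype(signal_a, signal_b, hom_cutoff=0.8, het_halfwidth=0.15,
                  min_total=100.0):
    """Call a bi-allelic genotype from two allele-specific signal intensities.

    Returns "AA", "AB", "BB", or None when no confident call can be made.
    hom_cutoff, het_halfwidth and min_total are illustrative assumptions.
    """
    if signal_a < 0 or signal_b < 0:
        return None  # negative intensities are not meaningful
    total = signal_a + signal_b
    if total < min_total:
        return None  # signal too weak to call

    fraction_a = signal_a / total
    fraction_b = signal_b / total
    if fraction_a >= hom_cutoff:
        return "AA"  # signal comes almost entirely from allele A
    if fraction_b >= hom_cutoff:
        return "BB"  # signal comes almost entirely from allele B
    if abs(fraction_a - 0.5) <= het_halfwidth:
        return "AB"  # both alleles contribute roughly equally
    return None  # ambiguous ratio between the clusters


# Hypothetical intensity pairs for five samples.
samples = {
    "s1": (950.0, 50.0),
    "s2": (40.0, 960.0),
    "s3": (520.0, 480.0),
    "s4": (700.0, 300.0),
    "s5": (30.0, 20.0),
}
for name, (a, b) in samples.items():
    print(name, call_genotype(a, b))
```

Returning no call for weak or intermediate signals, rather than forcing every sample into one of three classes, is a deliberate design choice in this sketch: in large-scale genotyping a missing data point is usually less harmful than a wrong genotype.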

Acknowledgements

I am grateful to Dr Charles Whitfield of the University of Illinois at Urbana-Champaign for his encouragement and support. I am also thankful to Dr Jeffery J. Stuart of Purdue University, West Lafayette, Indiana, for his encouragement, and to Drs H. D. Loxdale and Louis Bernatchez for their helpful suggestions for improving the manuscript. I apologize for not being able to include other works on insect molecular markers owing to space limitations.

References

Susanta K. Behura works on molecular tagging and mapping of avirulence genes in agricultural insect pests such as the rice gall midge and the Hessian fly. Currently, he is involved in SNP mapping projects on the honeybee genome aimed at understanding population structure and the spread of killer bees in the United States. He is employing large-scale SNP genotyping methods along with microarray expression profiling of brain genes for eQTL mapping of behavioral phenotypes. He is also generating expression profiles of microRNAs in the developing bee brain to study their possible role in age-dependent behavioral plasticity, with a particular interest in foraging behavior.