Functional genomics of life history variation in a butterfly metapopulation


Christopher W. Wheat, Fax: +358 9 191 57694;E-mail:


In fragmented landscapes, small populations frequently go extinct and new ones are established with poorly understood consequences for genetic diversity and evolution of life history traits. Here, we apply functional genomic tools to an ecological model system, the well-studied metapopulation of the Glanville fritillary butterfly. We investigate how dispersal and colonization select upon existing genetic variation affecting life history traits by comparing common-garden reared 2-day adult females from new populations with those from established older populations. New-population females had higher expression of abdomen genes involved in egg provisioning and thorax genes involved in the maintenance of flight muscle proteins. Physiological studies confirmed that new-population butterflies have accelerated egg maturation, apparently regulated by higher juvenile hormone titer and angiotensin converting enzyme mRNA, as well as enhanced flight metabolism. Gene expression varied between allelic forms of two metabolic genes (Pgi and Sdhd), which themselves were associated with differences in flight metabolic rate, population age and population growth rate. These results identify likely molecular mechanisms underpinning life history variation that is maintained by extinction–colonization dynamics in metapopulations.


Genomic studies of wild populations have the potential to reveal the genetic basis of traits affecting fitness and may ultimately lead to a synthesis of population biology and genomics (Ellegren & Sheldon 2008). Historically, genetic variation affecting fitness in populations was expected to be uncommon due to fixation by selection (Fisher 1958), though opposing views were also expressed (Dobzhansky 1955; Lewontin 1974). Over time, research focus has shifted to understanding the dynamics that can maintain such variation (Wade & Goodnight 1998). Theoretical (Frank & Slatkin 2007) and empirical studies (Cain et al. 1990; Gibbs & Grant 2002) have shown that temporally varying selection due to changing environmental conditions may maintain genetic variation with large fitness effects (Bell 2010). Similarly, it is well known that selection may vary from one habitat type to another in a heterogeneous environment, thereby maintaining genetic variation at the landscape level (Levene 1953; Schaeffer 2008). Less well understood is what happens in fragmented landscapes in the absence of habitat differences but with a high rate of population turnover. Can extinction–colonization dynamics lead to diversifying selection and maintain variation with large fitness effects?

An increasing number of studies have successfully related phenotypes with known fitness effects in natural populations to specific genetic variants. Technological advances further increase the pace at which such genotype-to-fitness connections can be established across a range of species. However, the majority of such studies have focused upon conspicuous morphological phenotypes [e.g. coat colour in desert mice (Nachman et al. 2003) and beak size in Darwin’s finches (Abzhanov et al. 2006)]. Few studies have used a functional genomic approach to investigate genetic variation affecting morphologically cryptic phenotypes that interact with ecological dynamics. To study such phenotypes requires ecological knowledge to identify the phenotypes of interest and to sample them in an appropriate manner.

Here we examine the consequences of repeated local extinctions and re-establishment of new populations for the pattern of genetic variation with large fitness effects across a heterogeneous landscape. Specifically, we test the hypothesis that gene expression phenotypes and alleles associated with large phenotypic effects on fecundity and dispersal become assorted according to population age by the metapopulation dynamics (Hanski et al. 2004). We use data and material from the long-term ecological study of the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands in Finland (Hanski 1999). This large metapopulation persists in a balance between stochastic local extinctions and establishment of new populations in a network of several thousand small meadows (Hanski 1999). The extinction and colonization rates and thereby the viability of this (Hanski & Ovaskainen 2000) and other metapopulations (Ronce & Olivieri 2004) are influenced by many life history traits. Recently, a non-synonymous SNP in the glycolytic enzyme gene phosphoglucose isomerase (Pgi) (Orsini et al. 2009) was found to be strongly associated with a range of life history traits (Niitepõld et al. 2009; Saastamoinen et al. 2009) and even population growth rate in the Glanville fritillary (Hanski & Saccheri 2006), but beyond Pgi there is no knowledge of genetic mechanisms affecting life history variation in this species.

The strong Pgi effect on individual performance and fitness components in the Glanville fritillary, which is perhaps the best documented genetic polymorphism affecting coupled ecological and evolutionary dynamics (Hanski & Saccheri 2006; Zheng et al. 2009), has stimulated us to apply functional genomic tools to further investigate how genetic variation interacts with metapopulation processes. Here, we compare gene expression in butterflies from new populations (habitat patches recently colonized by females that dispersed from existing populations) with butterflies from old populations. With these data, we also systematically scan for additional polymorphic loci associated with population history. We validate findings of differentially expressed genes with physiological studies of the relevant traits using independent samples, and we examine how gene expression and flight metabolic phenotypes vary between allelic forms of Pgi and another polymorphic metabolic gene revealed by our transcriptome scan. Finally, we determine how allele frequencies at these two loci are related to metapopulation dynamics in a large independent sample.

Materials and methods


The butterflies used in the experiments were common-garden reared offspring (below) of butterflies originally sampled from the Glanville fritillary metapopulation in the Åland Islands in Finland (Hanski 1999). Sample sizes and a summary of material used in the experiments are shown in Table S1 (Supporting information). Throughout the text we refer to new and old populations. The age of local populations is known based on long-term annual surveys since 1993 (Hanski 1999; Nieminen et al. 2004). Here we compare butterflies originating from new local populations, established by dispersing butterflies, with butterflies originating from old local populations, which had persisted for at least 5 years. Populations of both age categories were scattered across the Åland Islands (50 km × 70 km).

Sample for gene expression and metabolic rate measurements.  To minimize any maternal effects, we used second-generation butterflies that had been reared under common garden conditions on the natural host plant Plantago lanceolata grown in a greenhouse. The field-collected material consisted of ∼400 final instar larvae sampled from 60 different local populations in 2005. These larvae were reared to adults under constant environmental conditions (12:12 L/D, 25 and 20 °C), and released into a large (30 m × 26 m) outdoor population cage with natural conditions (Saastamoinen 2007b). Mating and egg laying were observed and recorded (Saastamoinen 2007b). During spring 2006, ∼200 of the offspring were reared to adults under the above-described conditions. Of them, 65 unmated females were fed ad libitum honey water on day 1 post-eclosion. On day 2, the butterflies were flown in a respirometer for 10 min (see below), and then quickly (<2 min) dissected to isolate the head, thorax, and abdomen, which were immediately flash frozen in liquid nitrogen. From this set, we randomly selected an equal number of new-population and old-population butterflies to use in the gene expression study, the population origin being determined by the known matriline. Material used in the gene expression study comprised 34 thoraces (21 matrilines from 9 new and 12 old populations) and, from a subset of the same individuals, 18 abdomens (12 matrilines from 6 new and 6 old populations). We conducted multilocus single nucleotide polymorphism (SNP) genotyping of butterflies used in the microarray experiment, which showed that new and old populations do not comprise genetically distinct subgroups (P = 0.30; Fig. S1, see Supporting information for methods). This was expected based on the extensively documented biology of the metapopulation, wherein butterflies have limited dispersal distances (up to 2–3 km) and new and old populations occur throughout the large habitat network [50 km × 70 km (Hanski 1999)]. These and an additional 29 females from the same spring 2006 material described above were genotyped for a polymorphism, succinate dehydrogenase d (Sdhd), that we discovered in the gene expression study (total N = 94 butterflies from 33 matrilines).

Sample for oogenesis physiology.  To verify the results of gene expression in the abdomen, we examined reproductive physiology of females reared from larvae collected in April 2005 from 11 different local populations (6 new, 5 old). Larvae completed development on natural host plants (P. lanceolata) under common-garden conditions. Starting on the day of adult eclosion (day 0) and up to day 2, virgin females were sampled between 2 and 8 h after lights on, with no sampling time bias for the two population ages, in order to minimize potential effects of circadian rhythmicity in juvenile hormone (JH) titre that occurs in at least one insect (Zhao & Zera 2004).

Sample for additional measurements of metabolic rate.  To examine the association between metabolic rate and genotypes of interest in an independent sample, we genotyped butterflies from a previous study that examined metabolism phenotypes (Niitepõld 2010). In this case, 71 virgin females from 20 matrilines were measured for flight metabolic rate at adult day 2. The butterflies had been reared under the above-described laboratory conditions from larvae collected in the field in 2004.

Methods and analyses

In every analysis presented in this paper, family (matriline) was included as a random factor to account for relatedness.

Metabolic rate measurements.  Peak metabolic rate (PMR) and total CO2 emitted during 10 min of flight were measured using methods previously described (Haag et al. 2005). Briefly, individual butterflies were placed in a plastic jar modified for flow-through respirometry and the production of CO2 was recorded during gentle shaking of the jar as needed to stimulate continuous flight. A z-transformation (Bartholomew et al. 1981) was used to remove autocorrelation caused by diffusion in the gas stream and obtain instantaneous metabolic rates.
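The correction for washout can be illustrated with a minimal Python sketch. It assumes a single well-mixed chamber with first-order washout; the function name and the flow and effective-volume parameters are illustrative and are not taken from the measurement protocol used here.

```python
import math

def z_correct(readings, dt_s, flow_ml_per_s, eff_volume_ml):
    """Estimate instantaneous values from washout-smoothed readings.

    readings: CO2 readings sampled every dt_s seconds.
    flow_ml_per_s, eff_volume_ml: hypothetical chamber parameters; their
    ratio is the assumed first-order washout rate constant.
    """
    # Fraction of the step to a new equilibrium completed in one interval.
    z = 1.0 - math.exp(-(flow_ml_per_s / eff_volume_ml) * dt_s)
    corrected = []
    for prev, curr in zip(readings, readings[1:]):
        # Extrapolate each observed change to the value the chamber is approaching.
        corrected.append(prev + (curr - prev) / z)
    return corrected
```

Under these assumptions, a trace that rises slowly toward a constant input is mapped back to that constant input at every time step, which is how the transformation removes the lag introduced by mixing in the gas stream.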

Microarray design and annotation.  Details of the microarray design have been reported (Vera et al. 2008). Briefly, 454 GS20 sequencing was used with mRNA from a diverse tissue collection. Assembled contigs [Assembly v1.0 (Vera et al. 2008)] and singletons were annotated using Blast homology searches against predicted gene sets of silkmoth (Bombyx mori, Lepidoptera: Bombycidae) and fruit fly (Drosophila melanogaster, Diptera: Drosophilidae) genomes, and the UniProt database (Vera et al. 2008). Contigs and singletons that had blast hits (bitscore >45) were used for designing a tiled series of hybridization probes (60 mers) using the publicly available Agilent eArray tool. The best-performing probe (highest hybridization intensity) per sequence was identified from a preliminary array hybridization. 45K-feature Agilent microarrays were printed with 14 251 probes, each at least in triplicate, which corresponded to approximately 9000 unigenes. We subsequently obtained an additional ∼600 000 EST reads using 454 FLX sequencing that we assembled and annotated together with the original sequences, which formed assembly v2.0. We used the latter assembly to better annotate the microarray probes and inform our analyses. These assemblies, annotations (Melitaea cinxia transcriptome assemblies v1.0 and v2.0), and raw sequences are available at (for Assembly v1.0 data see also the NCBI Sequence Read Archive,

RNA labeling and hybridization to microarrays.  Total RNA was extracted using Trizol reagent (Life Technologies) from individual flash frozen thoraces and abdomens followed by RNA column purification (RNAeasy, Qiagen) and cDNA synthesis, with T7-(dT)24 primers. Using this template, amino-allyl-UTP was incorporated during in vitro aRNA amplification (Ampliscribe T7 kit; Epicenter Biotechnologies). Subsequently, Alexa Fluor 555 or 647 dyes were incorporated using succinimidyl ester chemistry, unincorporated dye was removed by gel filtration, and the resulting labelled RNAs were quantified before hybridization to Agilent arrays following manufacturer’s protocol. Quantification and quality control assays at various steps used both Nanodrop and Bioanalyzer to assess aRNA quality and dye incorporation. Using slides containing four arrays, we hybridized labelled RNA from one old- and one new-population butterfly onto each array. Dye was randomized and balanced across population type for a balanced incomplete block design (Kerr & Churchill 2001) that maximized the number of biological replicates with no pooling, thereby maximizing estimates of both error and population variance (Kerr 2003). Slides were scanned at 5 μm resolution, averaged over two passes, at 100% laser power, with PMT set to optimize dye channel balance. Scanned images were checked in each dye for the presence of regional effects or artefacts, with rejected arrays being repeated.

Filtering and trimming of hybridization intensities.  Log2 transformed hybridization intensities of spot median pixel intensity were filtered to flag probes having mean intensities within two standard deviations of either the mean negative or positive controls (i.e. background or saturated signal). Probes having >50% of individuals flagged within a population age, dye, and tissue category were excluded from the analysis. Coefficients of variation (CV) among replicated probes within arrays that passed quality filtering were low (95% of probes had CV <5% of their mean), indicating excellent technical performance (Table S2, Supporting information).
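The filtering rules can be sketched in Python as follows. This is an illustration only: the function names and data layout are hypothetical, and the sketch assumes that a probe is excluded when the 50% threshold is exceeded in any one category.

```python
from statistics import mean, stdev

def flag_value(x, neg_controls, pos_controls, n_sd=2.0):
    """True if x lies within n_sd standard deviations of either control mean."""
    for ctrl in (neg_controls, pos_controls):
        if abs(x - mean(ctrl)) <= n_sd * stdev(ctrl):
            return True
    return False

def keep_probe(values_by_group, neg_controls, pos_controls, max_flagged=0.5):
    """values_by_group maps a (population age, dye, tissue) key to log2 intensities.

    Assumption: the probe is dropped if more than max_flagged of the
    individuals are flagged in any single group.
    """
    for values in values_by_group.values():
        flagged = sum(flag_value(v, neg_controls, pos_controls) for v in values)
        if flagged / len(values) > max_flagged:
            return False
    return True

def cv_percent(replicates):
    """Coefficient of variation among replicate spots, as a percentage of the mean."""
    return 100.0 * stdev(replicates) / mean(replicates)
```

A probe with exactly half of its individuals flagged is retained here, because the stated criterion is strictly greater than 50%.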

Expression data analysis.  To identify expression differences in individual genes, we used the normalized expression data in linear mixed model analyses (Wolfinger et al. 2001). Log2 transformed fluorescence intensities were quantile normalized by tissue and used in mixed model analyses as implemented in JMPGenomics 3.2 (SAS Inc.), which uses the SAS PROC MIXED procedure with an adjusted (i.e. Type III) sum of squares that allows for both fixed and random factors, as well as gene-specific variance components (Wolfinger et al. 2001). Random factors account for the hierarchical structure of the experimental design (Spot + Slide + Array + Butterfly + Family + Array × Spot), including the spatial (Spot), grouped (Slide), and paired (Array + Array × Spot) effects (Gibson & Wolfinger 2004). Inclusion of butterfly in the model accounts for the technical effects captured in a dye × array interaction as well as the correlation among replicate spots within a probe (Rosa et al. 2005). Inclusion of family as a random factor accounts for correlated patterns of gene expression among sibs.
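For readers unfamiliar with quantile normalization, the following Python sketch shows the basic operation on a small set of arrays. It is not the JMPGenomics implementation; the function name is hypothetical and ties are broken by input order, which is an assumption.

```python
def quantile_normalize(columns):
    """columns: list of equal-length lists, one per array (log2 intensities).

    Returns new lists in which every array shares the same distribution.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

Each probe keeps its rank within its own array but takes its value from the shared reference distribution, so differences in overall intensity among arrays are removed before the mixed model is fitted.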

We used a two-step approach to the mixed model analysis of gene expression in each body section. Our first objective was to determine expression differences associated with population age, and so the first analysis, referred to hereafter as MM-1 (mixed model 1), included only population age (Popage) and a technical variable (Dye) as fixed effects. Our second objective was to examine the effects of factors independently from their association with population age. Hence, the second model for abdomen samples, referred to hereafter as MM-2, contained the following fixed effects: Popage, Dye, Pgi genotype (Pgi_111_AC vs. Pgi_111_AA) and Sdhd genotype (presence vs. absence of the Sdhd D allele). MM-2 analyses of thorax data included those factors along with peak metabolic rate and total CO2 production. Microarray data is available at NCBI's Gene Expression Omnibus (GEOArchive).

Enrichment analysis to detect co-varying expression of functionally related groups of genes.  Mean expression differences between population ages and other comparisons were generally small (<2-fold change), as expected for a study of individuals from a single population and an experiment that did not involve any environmental or physiological manipulation (Oleksiak et al. 2002; Crawford & Oleksiak 2007). These small differences posed a challenge for inferring significant differences while controlling type 1 error across thousands of tests, an issue that is common in studies of standing variation in gene expression (Mootha et al. 2003). Therefore, in addition to a mixed model analysis of the normalized hybridization intensities of individual genes, we used gene set enrichment tests to identify co-varying sets of functionally related genes. These tests used gene ontology (GO) annotations to detect over-representation of GO categories in the ranked list of mean expression differences (Mootha et al. 2003; Al-Shahrour et al. 2007). Gene set enrichment analysis is also an excellent tool for identifying trans-regulated genes, as enrichment analyses detect sets of co-regulated genes sharing a common biological function that are commonly trans-regulated.

Prior to performing enrichment analyses, we minimized redundancy of the microarray probes (i.e. technical replication at the level of unique mRNA transcripts) and used homology searches to assign Flybase gene IDs to probes (see Supporting information for detailed methods and Fig. S5 for a flow diagram). Flybase IDs and gene expression data were used as inputs in an enrichment analysis using Fatiscan. We used the D. melanogaster reference species option, which uses KEGG to assign Flybase IDs to gene ontology (GO) terms. The results show, at different levels of the gene ontology hierarchy, groups of functionally related genes that have a systematic bias for higher expression in either of the two population ages. Reported P values take into account the multiple testing inherent in the enrichment analysis; they are adjusted P values based on the false discovery rate (FDR) method (Benjamini & Hochberg 1995; Benjamini & Yekutieli 2001).
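As an illustration of FDR adjustment, the Benjamini & Hochberg (1995) step-up procedure can be sketched in Python. The function name is hypothetical, and the sketch covers only this one procedure; it makes no claim about how Fatiscan computes its adjusted values internally.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw P value never receives a larger adjusted value than a larger one.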

Female reproductive physiology.  Sampling involved collection of hemolymph for biochemical analyses of protein and JH followed by counting chorionated eggs in dissected ovarian follicles as described in Webb et al. (1999). Total soluble protein in hemolymph was first quantified and then separated using SDS-PAGE. Levels of both vitellogenins and a ∼78 kDa serum protein were estimated using peak intensity of their bands in digital gel images. Lipid hormones in hemolymph extracted with 80% acetonitrile were analysed as in Westerlund & Hoffmann (2004) with LC–MS/MS (Waters Micromass Quattro micro™ triple quadrupole mass spectrometer operated in positive mode). Hormone detection and identification was carried out by matching of retention times with those of authentic standards and monitoring the following diagnostic transitions: m/z 267 > 235 (JH III), 281 > 249 (JH II), 295 > 263 (JH I), 447 > 303 (ecdysone), and 463 > 301 (20-hydroxyecdysone) with a dwell time of 100 ms per channel. Only JH III was detected. Its level was quantified using the external standard method with signal-to-noise ratios of 3 and 10 to define limits of detection and quantification, respectively. Detection limit was 0.05 pmol.

Quantitative PCR.  The same target cDNA we hybridized to microarrays was used as template in qPCR assays measuring the level of mRNA for angiotensin-converting enzyme, vitellogenin, and actin (cytoplasmic), with the latter serving as the endogenous control. The Glanville fritillary transcriptome assembly v2.0 was used to design primers with which we ran sample, standard, and NTC reactions in triplicate on an ABI 7500 Fast Real-Time PCR System (Applied Biosystems). Levels of mRNA were calculated using the standard curve method with curves for each plate and amplicon deriving from Ct values for qPCRs containing template from six different amounts of a cDNA standard. See Supporting information for methods details.
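The standard curve calculation can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it assumes a linear relationship between Ct and log10 template amount and expresses the target relative to the endogenous control as a simple ratio.

```python
import math

def fit_standard_curve(amounts, cts):
    """Least-squares line Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, my - slope * mx

def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template amount from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

def relative_level(target_ct, target_curve, control_ct, control_curve):
    """Target amount divided by control amount, each read from its own curve."""
    target = quantity_from_ct(target_ct, *target_curve)
    control = quantity_from_ct(control_ct, *control_curve)
    return target / control
```

In use, one curve would be fitted per plate and amplicon from the six standard amounts, and each sample's mean Ct would then be converted to a quantity before normalizing to actin.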

Systematic scan for population-age differences in nucleotide variation.  Each of the 14 251 unique probes on the microarray is potentially sensitive to genetic variation in our sample, and our data for SNP probes (methods described in Supporting information) showed that transcripts having a single nucleotide variant from the microarray probe could cause a 2-fold or greater difference in hybridization efficiency, with potentially greater changes resulting from multi-nucleotide or indel polymorphism (Hughes et al. 2001). In contrast, transcript abundance variation is likely to be small (i.e. <2-fold) among healthy, common-garden reared individuals from the same population. To search for ecologically important genetic polymorphism that could produce high hybridization variance (e.g. single or multiple DNA differences, insertions, deletions, splice variants), we took the following approach. First, we ranked the probes based upon their absolute expression difference between population ages (new vs. old) from the mixed model analysis. Second, we ranked the probes based on their among-butterfly variance in hybridization intensity. Third, we took the sum of these two ranks, and ranked that value. This approach was performed on both the abdomen and thorax expression data.

We then took the sum of these tissue ranks, and ranked this sum. Finally, we examined the top 5 ranked probes, i.e. those with the most extreme differences between population types and variance among individuals across two tissue types. Each of these five probes was examined for potential divergent allelic variation in our EST assembly by aligning the microarray probe with the respective contig.
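The ranking procedure can be sketched in Python as follows. The function names and data layout are hypothetical, and the handling of ties (broken by input order) is an assumption, since tie handling is not specified in the procedure.

```python
def rank(values, largest_first):
    """Rank 1 is the largest value if largest_first, otherwise the smallest."""
    sign = -1 if largest_first else 1
    order = sorted(range(len(values)), key=lambda i: sign * values[i])
    ranks = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks

def tissue_rank(abs_diff, variance):
    """Combine the population-age difference rank and the variance rank for one tissue."""
    r_diff = rank(abs_diff, largest_first=True)
    r_var = rank(variance, largest_first=True)
    # A small rank sum means the probe is extreme on both criteria.
    return rank([a + b for a, b in zip(r_diff, r_var)], largest_first=False)

def top_probes(probe_ids, abdomen, thorax, n_top=5):
    """abdomen, thorax: (abs_diff, variance) tuples of per-probe lists."""
    combined = [a + t for a, t in zip(tissue_rank(*abdomen), tissue_rank(*thorax))]
    final = rank(combined, largest_first=False)
    ordered = sorted(range(len(probe_ids)), key=lambda i: final[i])
    return [probe_ids[i] for i in ordered[:n_top]]
```

Working with ranks rather than raw values keeps the two criteria on a common scale, so a probe reaches the top of the list only if it shows both a large population-age difference and high among-butterfly variance in both tissues.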

Genotyping Pgi.  A portion of the dscDNA generated prior to dye incorporation was used to PCR amplify the first 1209 bp of the Pgi gene from all individuals used on the microarray [the forward primer was the first 21 bp of the Pgi gene; the reverse primer was the degenerate primer previously reported (Orsini et al. 2009)]. The amplicon was subsequently sequenced using internal primers [mPGI-598R and mPGI-1192R (Orsini et al. 2009)]. All primers were located in regions free of SNP variation or, in the sole case of known low-frequency SNP variation, the primer was degenerate at that position.

Genotyping Sdhd.  The indel polymorphism in Sdhd was discovered initially by blasting the Sdhd microarray probe against our transcriptome assembly and discovering that there was an additional contig containing ESTs of the deletion allele that aligned partially to the probe sequence. To properly characterize this polymorphism, a portion of the 3′-UTR containing the indel was PCR amplified from genomic DNA or cDNA using fluorescently labelled forward (5′-/56-FAM/--ACTTAATGAAAAGYGTGATTG-3′) and pig-tailed reverse (5′-GTTTCTTTGTTAAAAGGTCTTGAGTTCG-3′) primers. Fragment analysis with capillary electrophoresis and GeneMapper® (Applied Biosystems Inc., Foster City, CA, USA) software was used to associate size of labelled amplicons with the indel allele. This analysis revealed three alleles: deletion (D), mini deletion (M) and insertion (I). Alleles were confirmed by cloning, sequencing and alignment (MegAlign, DNASTAR Inc.) of amplicons from both cDNA and genomic DNA.


Abdomen gene expression and fecundity phenotypes

Our primary aim was to detect possible systematic differences in the expression of functionally related groups of genes between new and old local populations. In abdomens, we found higher expression of larval serum protein genes in new-population butterflies (LSPs; GO:0005616; P = 4.5 × 10^−9; Fig. 1a; Tables S3 and S4, Supporting information). LSPs are amino acid transport and storage molecules (Burmester 2001) regulated at the transcription level by JH (Gkouvitsas & Kourti 2009) and serve as protein sources during the provisioning of developing eggs [vitellogenesis (Pan & Telfer 2001)]. Also more highly expressed in new-population butterflies were lipid transporters (lipophorins and perilipin; GO:0005319; P = 8.6 × 10^−4, Table S3, Supporting information) that are similarly involved in the mobilization and transfer of nutrients to eggs (Teixeira et al. 2003).

Figure 1.

 Differential gene expression in the abdomens of butterflies representing the new vs. old population matrilines and related reproductive physiology. (a) Volcano plot of microarray data highlighting probes for egg-provisioning genes (larval serum proteins) and angiotensin converting enzyme. (b) Transcript abundance of vitellogenin (Vg; black bars) and angiotensin converting enzyme (Ace; open bars) mRNA in the abdomen measured by quantitative PCR (N = 14, a balanced subsample of the microarray butterflies). Vg expression was ∼2-fold higher in new-population butterflies (P = 0.02). Variation in Vg expression was mirrored by differences in Ace (P = 0.001). See Table S5 in Supporting information for statistical analysis and Methods for qPCR details. Note that expression differences in Vg are confirmed at the protein level in an independent sample shown in Fig. 2.

Higher expression of genes putatively involved in the mobilization of reserves for egg development should be accompanied by higher expression of vitellogenin (Vg), the major insect egg provisioning protein. We could not assess this using the microarray because Vg transcripts were too abundant (i.e. saturated probe signals for nearly all individuals). Using quantitative PCR (qPCR) on a subset of the same material, we found that Vg transcription was ∼2-fold higher in new-population butterflies (P = 0.02; Fig. 1b; Table S5, Supporting information).

Among the individual genes that showed a significant population age difference in the abdomen microarray experiment (Table S6, Supporting information) was angiotensin converting enzyme (Ace; Fig. 1a; P = 0.0000005). ACE regulates oviposition in Lepidoptera, apparently through effects on hormone synthesis and trypsin enzymes that release proteins from the fat body (Vercruysse et al. 2004). We confirmed the microarray result for Ace expression using qPCR and found that it was higher in new populations (P = 0.001) and precisely mirrored the differences in Vg expression (Fig. 1b; Table S5, Supporting information).

These results for gene expression in the abdomen suggested that new-population females mobilize protein from fat body reserves to provision developing eggs more rapidly than old-population females. To test this hypothesis, we examined a number of oogenesis physiology phenotypes in an independent sample of 0- to 2-day-old virgin females (statistical analyses in Table S7, Supporting information). New-population females had a higher hemolymph concentration of JH III than old-population females (5.3 pmol/μL vs. 3.7 pmol/μL at the mean age; P = 0.01; Fig. 2c), which in nymphalid butterflies stimulates oogenesis (Ramaswamy et al. 1997), along with higher levels of total hemolymph protein (P < 0.0001; Fig. 2d), including LSP and two vitellogenin proteins (Apo-Vg1, Apo-Vg2; P < 0.003; Fig. 2b). New-population females also had more mature (chorionated) eggs (P = 0.02; Fig. 2a, e), in a manner positively related to total hemolymph protein (P = 0.0002; Fig. 2f).

Figure 2.

 Comparison of female butterflies representing the new vs. old population matrilines in reproductive physiology. (a) Examples of ovarioles at 0 and 2 days of age. (b) Coomassie blue-stained SDS-PAGE of proteins in hemolymph (0.1 μL volumes) and partially purified egg vitellin (Vt) from individual butterflies, and for comparison from the moth Helicoverpa zea, whose vitellogenins have been previously identified (Satyanarayana et al. 1992). From left to right, lanes 1–3 contain hemolymph collected from day 0 (D0) through 2-day-old female butterflies from a new population, lanes 4–6 contain hemolymph collected from 0 to 2 day old female butterflies from an old population, lane 7 contains Vt from butterfly eggs, lanes 8–9 contain hemolymph collected from day 1 through day 2 old male butterflies, lane 10 contains Vt from H. zea eggs, lane 11 contains hemolymph from day 2 old female H. zea, and lane 12 contains protein size standards (mk). Hemolymph from female butterflies and the moth contain two apoprotein bands identified as vitellogenins (Vg) present in egg but absent in male hemolymph. The apo-Vg-1 and apo-Vg-2 bands are about 160 and 45 kDa, respectively. Also shown is a band at about 75 kDa that is probably larval serum proteins based on size and absence from egg Vt. (c–f) Physiological measurements (mean, SE) from an independent sample of 0–2 day old virgin females from new (open circles) and old (closed circles) population matrilines: (c) juvenile hormone III titer; (d) hemolymph protein concentration; (e) chorionated eggs; (f) relationship between total hemolymph protein and number of chorionated eggs. See Table S7 in Supporting information.

Thorax gene expression and flight metabolic phenotypes

New-population females had higher peak metabolic rate during flight than old-population females (12% mean difference; P = 0.008, Fig. 3a, Table S8, Supporting information; N = 65, the sample from which butterflies were drawn randomly for the microarray experiment). This population age difference in mean PMR confirms a previous result on the Glanville fritillary (Haag et al. 2005). Having these data allowed us to directly examine how gene expression varied with peak metabolic performance in addition to our primary focus on population age.

Figure 3.

 Peak metabolic rate during flight (PMR) and thoracic gene expression for virgin female butterflies. (a) PMR (residual adjusted for thorax mass) in different families classified according to population age, with population means indicated by dashed lines. (b–c) Volcano plots highlighting expression differences for unfolded protein response and proteasome genes in relation to population age and PMR. Population age was the only fixed effect in the mixed model in (b). Population age, PMR, total CO2 emitted during 10 min flight, and Pgi and Sdhd genotypes were included as fixed effects in (c).

In our analysis of groups of functionally related genes in the thorax, we found that new-population butterflies had a significant tendency for higher expression of proteasome (core and regulatory particle) and unfolded protein response genes (chaperones; Fig. 3b; Tables S9 and S10, Supporting information). Among the individual genes most strongly associated with population age in the thorax (Table S6, Supporting information) were two protease inhibitors (serpins; Fig. 3b); these had reduced expression in new-population butterflies, consistent with elevated proteasome expression. The strongest association (P = 0.00001) between population age and expression of an individual gene in the thorax was NADH-ubiquinone reductase (syn. NADH dehydrogenase), a nuclear gene for the 24 kDa subunit of mitochondrial complex I, more highly expressed in old populations. NADH dehydrogenase affects superoxide radical formation in the presence of elevated levels of succinate (Muller et al. 2008), which may be meaningful given the population age and metabolic rate association (see below) we found for alleles of a succinate dehydrogenase gene.

Variation in gene expression in relation to flight metabolic rate and Pgi polymorphism

Here we switch to using our second mixed model approach (MM-2, see Methods) in order to examine associations between gene expression and physiological variables, independently of population age differences. In the thorax, higher expression of proteasome (core and regulatory particle) and unfolded protein response genes was positively associated with mass-adjusted peak metabolic rate during flight (PMR; Fig. 3c; Table S9, Supporting information). These same genes were associated with population age (MM-1 analyses above). This suggests that the genetic variation affecting physiological measures of flight performance is the same variation being sorted by the metapopulation dynamics. Some of this physiological variation may arise from polymorphism at Pgi, since both the abdomen and thorax showed gene expression phenotypes that varied with Pgi genotype. The Pgi_111_AC genotype associated in Glanville fritillaries with higher fecundity (Saastamoinen 2007a) and PMR (Niitepõld 2010) had higher expression of genes involved in the final step of oogenesis (chorion protein genes in the abdomen, P = 1.9 × 10^−22; Fig. S2, Supporting information; Table S3, Supporting information) and of oxidoreductase and ribosomal complex genes (in the thorax; Table S9, Supporting information).

A transcriptome scan for additional population-age associated allelic variation

In our transcriptome scan for ecologically important alleles or splicing variation, Succinate dehydrogenase d (Sdhd) showed the greatest population age-associated variability in probe intensity in both the abdomen and thorax (Fig. 4a). Hybridization intensity for Sdhd ranged among individuals in a discontinuous fashion from near saturation to indistinguishable from background (Fig. 4b). This extreme range seemed highly unlikely to reflect transcript abundance given that Sdhd encodes an essential enzyme. Hence, we examined our 454 transcriptome assembly for evidence of polymorphism and found multiple contigs (gene assemblies) containing different sequences suggestive of an indel in the gene region corresponding to the microarray probe. None of the other top five ranked genes in this analysis had as broad a range of hybridization intensities or evidence in the transcriptome assembly of polymorphism or splice variation in the probe regions. To verify the polymorphism in Sdhd, we cloned and sequenced this gene from both cDNA and genomic DNA and confirmed that the microarray probe overlapped with an insertion/deletion polymorphism in the 3′ UTR, with three distinct alleles (I, M and D; Fig. 4c). Within the indel is a predicted target site for a micro-RNA (miR-71; Fig. 4c). Individuals homozygous for the Sdhd M allele had high microarray probe intensity signals, befitting the perfect match between their genotype and the probe, whereas individuals lacking the M allele had greatly reduced probe intensity. Using qPCR probes targeted to invariant regions of the gene, we found that Sdhd expression level in the abdomen varied significantly with the indel genotype (Fig. S3, Supporting information).

Figure 4.

Systematic scan for population-age differences in alleles affecting variation in microarray probe hybridization. (a) Probes were ranked for absolute value of population-age difference in each body region and these ranks were summed with ranks for among-butterfly variance in hybridization intensity. The plot shows that Sdhd (arrow at upper right) was the top-ranked probe in both tissues. (b) Log₂ hybridization intensity for each individual in the thorax microarray (N = 34) for Sdhd and Actin. (c) Portion of the 3′ UTR in Sdhd transcripts showing the microarray probe alignment over Sdhd allelic variants (M, D, and I). The insertion (I) contains a site with a perfect match to the seed site (5′–3′ positions 2–8) and the next six bases of miR-71, a micro-RNA that is highly conserved in invertebrates, including the moth Bombyx mori (bmo-miR-71).
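The ranking logic described for panel (a) can be illustrated with a short sketch for a single tissue. This is not the authors' pipeline: the function names, the use of a simple difference in group means as the population-age difference, the population variance as the measure of among-butterfly variability, and the toy intensities are all assumptions made for illustration.

```python
from statistics import mean, pvariance

def rank_descending(values):
    """Return ranks where 1 is the largest value (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def scan_probes(intensities, is_new):
    """Rank probes by summed rank of age difference and among-individual variance.

    intensities: dict mapping probe name -> list of log2 intensities, one per butterfly.
    is_new: list of bools, True for new-population butterflies (same order).
    Returns probe names sorted so the strongest candidate comes first.
    """
    probes = list(intensities)
    diffs, variances = [], []
    for probe in probes:
        vals = intensities[probe]
        new = [v for v, flag in zip(vals, is_new) if flag]
        old = [v for v, flag in zip(vals, is_new) if not flag]
        diffs.append(abs(mean(new) - mean(old)))   # population-age difference
        variances.append(pvariance(vals))          # among-butterfly variance
    diff_rank = rank_descending(diffs)
    var_rank = rank_descending(variances)
    summed = {p: diff_rank[i] + var_rank[i] for i, p in enumerate(probes)}
    return sorted(probes, key=lambda p: summed[p])

# Hypothetical toy data: four butterflies, two from new and two from old populations.
toy = {
    "probe_A": [12.0, 11.5, 3.0, 2.5],  # large age difference and large variance
    "probe_B": [8.0, 8.1, 7.9, 8.0],    # nearly invariant
    "probe_C": [6.0, 9.0, 6.5, 9.5],    # variable, but weakly age-associated
}
is_new = [True, True, False, False]
ranking = scan_probes(toy, is_new)
```

A probe that overlaps a segregating indel, as Sdhd did, would be expected to behave like `probe_A` here: it scores highly on both criteria at once, which is why summing the two ranks brings it to the top.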

Butterflies carrying the Sdhd D (deletion) allele had higher expression of chorion genes in the abdomen (P = 1.2 × 10⁻¹¹; Fig. S2, Supporting information; Table S3, Supporting information) and carbohydrate metabolism genes in the thorax (glycolysis, pyruvate metabolism, and TCA cycle; P = 2.3 × 10⁻⁴; Fig. 5b; Table S9, Supporting information). Individuals carrying the Sdhd D allele were better able to maintain a high flight metabolic rate over 10 min of flying, hereafter referred to as endurance (P = 0.01; Fig. 5a; N = 65; Table S11A, Supporting information).

Figure 5.

 Flight performance of virgin female Glanville fritillary butterflies. (a) Flight metabolic rate in two siblings, exemplifying variation in endurance despite similar peak performance. (b) Thoracic gene expression differences among butterflies with and without the Sdhd D allele, highlighting the significant enrichment of higher expression of central metabolism genes in flight muscles of Sdhd D butterflies. (c) Flight endurance [i.e. area under the curves in panel (a)] according to Pgi_111 (AA or AC) and Sdhd (D vs. no D) genotypes, adjusted for family, mass and ambient temperature. Letters above bars indicate significant differences in a posteriori comparisons.
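Endurance is defined in panel (c) as the area under the flight metabolic rate curve. A minimal sketch of that calculation, using the trapezoid rule, is given below. The sampling times, the metabolic rate values and their units are hypothetical, and the sketch omits the adjustment for family, mass and ambient temperature that the analysis applied.

```python
def endurance_auc(times_min, metabolic_rate):
    """Area under a metabolic rate curve by the trapezoid rule.

    times_min: increasing sample times in minutes.
    metabolic_rate: metabolic rate at each sample time (arbitrary units).
    """
    if len(times_min) != len(metabolic_rate) or len(times_min) < 2:
        raise ValueError("need two or more paired samples")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        area += 0.5 * (metabolic_rate[i] + metabolic_rate[i - 1]) * dt
    return area

# Hypothetical siblings with the same peak rate but different decline over 10 min.
times = [0, 2, 4, 6, 8, 10]
sibling_1 = [0, 10, 9, 8, 8, 7]   # sustains a high rate
sibling_2 = [0, 10, 6, 4, 3, 2]   # declines quickly after the peak

auc_1 = endurance_auc(times, sibling_1)
auc_2 = endurance_auc(times, sibling_2)
```

In this toy case both siblings share a peak of 10 units, yet the first has the larger area, which is the distinction between peak performance and endurance that panel (a) illustrates.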

To further test the associations of Sdhd and Pgi with flight phenotypes, we genotyped both loci in an independent sample of 2-day-old virgin females that had been previously measured for metabolic rate (Table S1, Supporting information; Niitepõld 2010). Again we found superior endurance associated with the Sdhd D allele (P = 0.003; mean difference = 32%; N = 69; Table S11B). Peak metabolic rate was not related to Sdhd genotype but was significantly related to Pgi genotype (P = 0.01; mean difference = 23%, with Pgi_111_AC heterozygotes the highest; Table S11B). Alleles at these two loci may interact epistatically, as butterflies possessing both Pgi_111_AC and Sdhd D had the greatest flight endurance (P < 0.05 for an interaction effect; Fig. 5c; Table S11B). Analysis of Pgi and Sdhd genotype associations in our samples indicates that they do not exhibit linkage disequilibrium (P = 0.3 for association; R² = 0.008). Additionally, there is high synteny among lepidopteran chromosomes (d’Alencon et al. 2010), and in the model lepidopteran Bombyx mori, the genes Pgi and Sdhd are located on separate chromosomes [chromosomes 20 (nscaf 2780) and 26 (nscaf 1071); based on BLAST against the BGI assembly]. Hence, these two loci appear to be independently associated with different aspects of flight metabolism (peak vs. endurance; Fig. 5a).
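One simple way to quantify association between genotypes at two loci is the squared Pearson correlation between genotype indicators. The sketch below assumes a 0/1 coding (Pgi_111_AC vs. AA, and Sdhd D carrier vs. non-carrier); the text does not state how the reported squared correlation was computed, so this coding, the function name and the toy genotypes are assumptions.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two equal-length numeric genotype codings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a locus with no variation has undefined correlation")
    return sxy ** 2 / (sxx * syy)

# Hypothetical genotypes for eight butterflies (1 = carries the allele of interest).
pgi_ac = [1, 1, 0, 0, 1, 0, 1, 0]
sdhd_d = [1, 0, 1, 0, 1, 0, 0, 1]

r2_independent = genotype_r2(pgi_ac, sdhd_d)          # loci assort independently
r2_linked = genotype_r2([1, 0, 1, 0], [1, 0, 1, 0])   # complete association
```

Values near zero, as in the first toy case, are what independent assortment of loci on separate chromosomes would produce; values near one would indicate that the two loci could not be distinguished in their phenotypic associations.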

Sdhd alleles and metapopulation dynamics

Previous research on the Glanville fritillary metapopulation has shown that allelic variation at Pgi is associated with large ecological effects (Hanski & Saccheri 2006). To test the hypothesis that the newly discovered Sdhd polymorphism has similar relationships with ecological dynamics, we examined associations between Sdhd indel allele frequencies and metapopulation processes. The Sdhd D allele was more frequent in new populations (P = 0.04; N = 94 butterflies, 33 populations; Table S12, Supporting information), in agreement with the expectation that butterflies with greater flight endurance are more likely to disperse (Niitepõld et al. 2009) and colonize new habitat patches. In a separate sample used previously to demonstrate an association between Pgi alleles and the growth rate of isolated local populations (Hanski & Saccheri 2006), both Pgi and Sdhd allele frequencies had highly significant effects on growth rates of local populations (R² = 0.64; P < 0.0001; Table 1).

Table 1.   Genotype frequencies and population growth

Source                                  d.f.   F ratio   P
Patch area                              1      0.0001    0.99
Frequency Pgi F                         1      10.9      0.002
Frequency Pgi F × Patch area            1      22.5      <0.0001
Frequency Sdhd M allele                 1      19.1      0.0001
Frequency Sdhd M allele × Patch area    1      21.1      <0.0001

Population growth is defined as the regionally adjusted year-to-year change in the number of larval groups in 43 isolated populations in the Åland Island metapopulation of the Glanville fritillary (see Table 1 in Hanski & Saccheri 2006). Here we examine the effects of Pgi allozyme genotype (which corresponds closely to the Pgi_111_AC SNP genotype; Orsini et al. 2009) and Sdhd M allele frequencies on population growth. Log-transformed area of the habitat patch is included as an additional environmental variable (see Hanski & Saccheri 2006). Frequencies were arcsine transformed to achieve normality. The regression is weighted with the number of alleles sampled per population, and the unweighted model yielded a nearly identical result. The R² value for the full model is 0.64.
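The structure of the model in Table 1 (a transformed allele frequency, log patch area, their interaction, and weights equal to the number of alleles sampled) can be sketched as a weighted least-squares fit. The sketch below is deliberately reduced to a single locus and uses made-up data. It assumes the arcsine square-root form of the transform, which the table note does not specify, and all function names are hypothetical. It illustrates the model structure only and does not reproduce the reported F ratios.

```python
from math import asin, log, sqrt

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1] (assumed form)."""
    return asin(sqrt(p))

def design_row(freq, patch_area):
    """Intercept, log patch area, transformed frequency, and their interaction."""
    f = arcsine_sqrt(freq)
    a = log(patch_area)
    return [1.0, a, f, f * a]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def weighted_least_squares(rows, y, weights):
    """Coefficients minimizing the weighted sum of squared residuals."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(rows, y, weights)) for i in range(k)]
    return solve(xtwx, xtwy)

# Hypothetical populations: allele frequency, patch area, alleles sampled.
freqs = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2]
areas = [0.5, 2.0, 1.0, 4.0, 0.25, 3.0]
n_alleles = [10, 20, 14, 30, 8, 16]

# Growth values generated from known coefficients so the fit can be checked.
true_beta = [0.2, -0.1, 1.5, 0.8]
rows = [design_row(f, a) for f, a in zip(freqs, areas)]
growth = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]
beta_hat = weighted_least_squares(rows, growth, n_alleles)
```

A non-zero interaction coefficient (the last element of `beta_hat`) means that the effect of allele frequency on growth depends on patch area, which is the pattern indicated by the significant interaction terms in Table 1.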


Previous studies have shown that female Glanville fritillary butterflies inhabiting newly established vs. old populations are a non-random draw of the genotypes and phenotypes present in the metapopulation (Hanski et al. 2004). Female offspring of the founders of new populations exhibit differences in life-history traits related to dispersal (Haag et al. 2005; Hanski et al. 2006) and reproduction (Saastamoinen 2007b), including higher flight metabolic rate, earlier mating, and more frequent egg laying during the first days of adult life. Here we have identified a number of differentially expressed genes that are likely to cause these life history differences. Two-day-old virgin females from new-population matrilines have higher transcription of angiotensin-converting enzyme (Ace) and a higher JH titer (Figs 1 and 2). These two factors are likely to upregulate (Ramaswamy et al. 1997; Vercruysse et al. 2004) expression of LSP and Vg genes involved in the release of stored proteins for reproduction (Figs 1 and 2). These gene expression differences were consistent with physiological phenotypes, as the concentration of protein in the hemolymph (blood) and the number of mature eggs in new-population butterflies were approximately one day ahead of old-population butterflies (Fig. 2c–e). These differences are likely to cause the earlier mating (Hanski et al. 2006) and 1-day earlier reproductive maturity (Saastamoinen 2007a) of new-population females, a substantial difference considering that daily mortality among the adult butterflies is approximately 10% (Ovaskainen et al. 2008a).

Our analysis of gene expression in the thorax, which contains the flight musculature primarily responsible for flight metabolic rate, showed that both new-population butterflies and butterflies with higher peak metabolic rate (independently of population age) had higher expression of proteasome and unfolded protein response genes. Proteasome function affects the rate of protein turnover in muscle, which has positive effects on locomotion in insects (Haas et al. 2007), along with pleiotropic, trans-acting effects on the regulation of gene expression in general (Collins & Tansey 2006). Protein turnover within muscle cells increases in response to higher levels of circulating amino acids (Franch 2009), and hence the higher thorax proteasome gene expression and superior flight performance of new-population butterflies (Fig. 3; Haag et al. 2005; Niitepõld et al. 2009) may be causally related to their higher LSP and Vg gene expression and higher overall protein content in the hemolymph (Figs 1a and 2d). Previous research in both flies (Levenbook & Baur 1984) and moths (Huebers et al. 1988; Miller & Silhacek 1992; Wu & Tischler 1995) has shown that amino acids, iron, and riboflavin from LSPs are used to synthesize adult tissues, with nearly 50% of radiolabelled LSP amino acids incorporated in flight muscle proteins (Levenbook & Baur 1984). Positive effects of protein mobilization on both oogenesis and flight performance provide a mechanistic hypothesis for the absence of a trade-off between dispersal and fecundity in this species (Hanski et al. 2006) and other lepidopterans (Zhao et al. 2009; Jiang et al. 2010), in contrast to other species in which ovaries and flight muscles compete for protein (Roff 1977).

In addition to characterizing gene expression variation that underlies ecologically important life history traits, we have identified two polymorphic loci with apparent large effects on life history phenotypes (Pgi and Sdhd). Previous studies in the Glanville fritillary, stimulated by the strong fitness effects of Pgi polymorphism in other distantly related butterflies (Watt 2003) and other types of insects (Wheat 2009), found highly significant life history associations with Pgi alleles (Haag et al. 2005; Hanski & Saccheri 2006; Niitepõld et al. 2009; Saastamoinen et al. 2009), and evidence for long-term balancing selection acting on the coding variation at Pgi (Wheat et al. 2010). The present study extends that work by examining gene expression phenotypes related to Pgi alleles, along with the interaction between Pgi genotype and a newly discovered polymorphism in Sdhd. We found that butterflies with the Pgi_111_AC genotype, associated with earlier female fecundity and higher flight metabolic rate (Saastamoinen 2007a; Niitepõld 2010), had higher expression of chorion genes in the abdomen (Fig. S2, Supporting information), suggesting more rapid progression to the final stage of egg maturation (deposition of an egg shell rich in chorion protein).

Our scan of the microarray for additional population age-associated allelic variation revealed an indel polymorphism in the 3′ UTR of Sdhd. The deletion allele (Sdhd D) was associated with higher expression of energy metabolism genes in the thorax, chorion genes in the abdomen, and higher flight endurance. This scan was unbiased, yet identified another metabolic gene. Glanville fritillary butterflies fuel flight exclusively by consuming carbohydrates (respiratory exchange ratio = 1.0; Fig. S4, Supporting information), and thus Pgi and Sdhd are in the same aerobic pathway functioning to support active flight muscles. Sdhd alleles assort independently of Pgi alleles but these two loci appear to interact epistatically, as the highest flight performance occurred in butterflies possessing the Pgi and Sdhd alleles that are disproportionately abundant in new populations (Fig. 5c).

The indel polymorphism in Sdhd is located in the 3′ UTR, that is, outside of the amino acid coding region beyond the stop codon. This polymorphism may be in linkage with other variation within the coding region, or other parts of the gene, or potentially even the flanking chromosomal region. However, there is some evidence pointing to the 3′ UTR polymorphism itself being a target of selection, as 3′ UTRs commonly affect transcript processing and/or stability. The polymorphism in Sdhd is a potential candidate for differential effects on translational control because it contains a putative target site for a microRNA, miR-71 (Fig. 4c). Polymorphism for the miR-71 target is particularly interesting because this microRNA has 100% sequence conservation across invertebrates and is known to affect life history. In the C. elegans nematode, miR-71 was the top gene revealed by a scan for microRNAs affecting life history, with target sites in the 3′ UTR of a number of genes that affect insulin signalling and life span. Manipulation of miR-71 expression in that study caused large effects on lifespan and responses to thermal and oxidative stress (de Lencastre et al. 2010).
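The idea of an indel creating or removing a microRNA target site can be made concrete with a small sketch of seed matching: the target motif in the transcript is the reverse complement of microRNA positions 2–8. The sequences below are invented placeholders and are not the real miR-71 or Sdhd sequences; the function names are likewise hypothetical, and real target prediction considers more than an exact seed match.

```python
# RNA base-pairing partners (Watson-Crick only; G:U wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_target(mirna, start=2, end=8):
    """Return the transcript motif (5'->3') that pairs with the microRNA seed.

    start and end are 1-based, inclusive positions counted from the microRNA 5' end.
    """
    seed = mirna[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_seed_sites(utr, mirna):
    """Return 0-based start positions of exact seed matches in a 3' UTR sequence."""
    motif = seed_target(mirna)
    return [i for i in range(len(utr) - len(motif) + 1)
            if utr[i:i + len(motif)] == motif]

# Hypothetical sequences for illustration only.
mirna = "UACGGAUCCAGUUCGAAUGCAA"     # made-up microRNA, seed = ACGGAUC
utr_insertion = "AAUUGAUCCGUCCAA"    # allele carrying the inserted segment
utr_deletion = "AAUUCCAA"            # same region with the segment deleted

sites_insertion = find_seed_sites(utr_insertion, mirna)
sites_deletion = find_seed_sites(utr_deletion, mirna)
```

In this toy example only the insertion allele retains a seed match, so only transcripts of that allele would be candidates for regulation by the microRNA. That is the kind of allele-specific difference in translational control proposed above.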

Functional genomics studies in butterflies are beginning to identify alleles of large effect with the ultimate goal of integrating that knowledge with evolutionary ecology. In Heliconius butterflies, wing coloration traits involved in geographically variable and evolutionarily dynamic mimicry complexes have been mapped to specific genomic regions and are now intensively studied (e.g. Baxter et al. 2010). In the Glanville fritillary, individual-based modelling strongly suggests that the ecological metapopulation dynamics and the dynamics of Pgi allele frequency are coupled (Zheng et al. 2009). It now appears that allelic variation in both Pgi and Sdhd is associated with variation in life history and metabolism phenotypes, varies in frequency with population age, and may even affect changes in population size. In addition, there is likely to be allelic variation in other genes having large trans-acting effects on expression variation, of which the Ace gene is one good candidate.

How is genetic variation with fitness consequences maintained in the metapopulation? Spatially and/or temporally varying selection (Levene 1953; Gillespie 1991) as well as heterozygote advantage may all play a role, perhaps in concert with pleiotropic, epistatic and sex-dependent effects. While theoretical studies have examined these effects, along with G × E interactions in the maintenance of genetic variation (e.g. Barton & Whitlock 1997; Turelli & Barton 2004), we know little of their combined effects in the metapopulation context (Whitlock 2004). Empirical data on Pgi in the Glanville fritillary point to strong heterozygote advantage (Haag et al. 2005; Orsini et al. 2009), which appears to interact with temperature and habitat type (Hanski & Saccheri 2006; Ovaskainen et al. 2008b; Saastamoinen & Hanski 2008; Niitepõld 2010) and to have sex-specific effects on dispersal (Hanski et al. 2004; Niitepõld et al. in press). The Sdhd polymorphism reported here may also involve heterozygote advantage with environmental (Table 1) and genomic (Fig. 5) interactions. In addition to these different forms of balancing selection, genetic variation may be maintained by the coupling between organismal-level demographic and microevolutionary dynamics, of which Pgi in the Glanville fritillary provides a prime example (Zheng et al. 2009). Thus, our findings suggest that integrating these various mechanisms into a unified analysis of the maintenance of variation is warranted.

To summarize, we have combined functional genomics with a long-term ecological study to gain a more mechanistic understanding of life history variation affecting ecological and evolutionary dynamics. First, the long-term ecological study has allowed the identification of groups of populations that differ in their demographic history. Second, population age, which does not correlate with any morphological traits of individuals, was used as a ‘treatment’ in the functional genomics experiment. Third, gene expression differences and allelic polymorphisms were associated, across independent samples, with life history traits and population dynamics. These results demonstrate that integrating functional genomics with population ecology is a powerful way to obtain mechanistic insights into life history ecology and evolution (Ronce & Olivieri 2004) and to identify new candidate genes affecting eco-evolutionary dynamics (Saccheri & Hanski 2006). Our findings have significance for conservation biology, because the life history traits we have studied affect metapopulation persistence in fragmented landscapes (Hanski & Ovaskainen 2000).


We thank P. Auvinen at the DNA sequencing and Genomics laboratory, Institute of Biotechnology, University of Helsinki for his advice and help with array scanning, D. Crawford for his advice on microarray experimental design, M. Saastamoinen for her help with the butterfly rearing, numerous people for comments on earlier versions of this manuscript (A. Meyer, K. Elmer, K. Bargum, R. Schilder, A. Read, C. Grozinger, S. Schaeffer, A. Zera), and C. Brenner, D. Matasic, and S. Wherry for their assistance with DNA isolation and cloning, with special appreciation to A.D. Jones for analytical measurement of lipid hormones. Funding for this work was provided by the US National Science Foundation (grants EF-0412651 and IOS-0950416), the Academy of Finland [grant numbers 131155, 38604 and 44887 (Finnish Centre of Excellence Programmes 2000–2005, 2006–2011)], and AdG number 232826 from the European Research Council.

Conflicts of interest

The authors have no conflict of interest to declare and note that the funders of this research had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.