Combining molecular evolution and environmental genomics to unravel adaptive processes of MHC class IIB diversity in European minnows (Phoxinus phoxinus)


  • Helene Collin,

    Corresponding author
    1. Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, U.K
    • Department of Ecology and Evolution, Laboratory for Conservation Biology, University of Lausanne, Lausanne, Switzerland
    Search for more papers by this author
  • Reto Burri,

    1. Department of Ecology and Evolution, Laboratory for Conservation Biology, University of Lausanne, Lausanne, Switzerland
    2. Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, 75236, Uppsala, Sweden
    Search for more papers by this author
  • Fabien Comtesse,

    1. Department of Ecology and Evolution, Laboratory for Conservation Biology, University of Lausanne, Lausanne, Switzerland
    Search for more papers by this author
  • Luca Fumagalli

    1. Department of Ecology and Evolution, Laboratory for Conservation Biology, University of Lausanne, Lausanne, Switzerland
    Search for more papers by this author


Héléne Collin, Institute of Integrative Biology, Department of Ecology, Evolution and Behaviour, Biosciences Building, University of Liverpool, Crown St., Liverpool L69 7ZB, U.K.

Tel: 44 (0) 151 695 45 28; Fax: 44 (0) 151 795 44 10;



Host–pathogen interactions are a major evolutionary force promoting local adaptation. Genes of the major histocompatibility complex (MHC) represent unique candidates to investigate evolutionary processes driving local adaptation to parasite communities. The present study aimed at identifying the relative roles of neutral and adaptive processes driving the evolution of MHC class IIB (MHCIIB) genes in natural populations of European minnows (Phoxinus phoxinus). To this end, we isolated and genotyped exon 2 of two MHCIIB gene duplicates (DAB1 and DAB3) and 1′665 amplified fragment length polymorphism (AFLP) markers in nine populations, and characterized local bacterial communities by 16S rDNA barcoding using 454 amplicon sequencing. Both MHCIIB loci exhibited signs of historical balancing selection. Whereas genetic differentiation exceeded that of neutral markers at both loci, the populations' genetic diversities were positively correlated with local pathogen diversities only at DAB3. Overall, our results suggest pathogen-mediated local adaptation in European minnows at both MHCIIB loci. While at DAB1 selection appears to favor different alleles among populations, this is only partially the case in DAB3, which appears to be locally adapted to pathogen communities in terms of genetic diversity. These results provide new insights into the importance of host–pathogen interactions in driving local adaptation in the European minnow, and highlight that the importance of adaptive processes driving MHCIIB gene evolution may differ among duplicates within species, presumably as a consequence of alternative selective regimes or different genomic context.


In natural environments, heterogeneity in selection due to habitat fragmentation, climate fluctuations, and pathogen regime variation play a major role in shaping populations' genetic diversity, and eventually can lead to local adaptation (Kawecki and Ebert 2004; Sommer 2005; Barrett et al. 2008; Gandon and Nuismer 2009; Spurgin and Richardson 2010). Pathogen communities can affect various fitness traits linked to immune resistance in hosts (Summers et al. 2003; Simkova et al. 2006; Evans and Neff 2009). In return, hosts develop a wide range of immune responses to overcome pathogen infections (Acevedo-Whitehouse and Cunningham 2006). Host–pathogen interactions are therefore ideal to study local adaptation, as both host and pathogen are subjected to varying selection in time and space, modifying their reciprocal fitness.

Genes of the major histocompatibility complex (MHC) are among the best candidates to study the genetics of the adaptive immune response (e.g., Klein 1987; Kalz and Shykoff 1998; Piertney and Oliver 2006). MHC genes are the most polymorphic genes in vertebrates, and participate in the activation of the adaptive immune response (Klein 1986). While MHC class I is involved in immune response against intracellular pathogens, class II triggers the immune response against extracellular pathogens. MHC polymorphism is believed to be mainly maintained by different mutually nonexclusive forms of pathogen-mediated balancing selection. First, heterozygote advantage confers a greater resistance to MHC-heterozygote individuals, as they are able to recognize a greater array of pathogen-derived antigens than homozygotes (Doherty and Zinkernagel 1975; Hughes and Nei 1988; Bernatchez and Landry 2003; Garrigan and Hedrick 2003). Second, under rare-allele advantage, rare alleles are selected for as they provide better resistance than common genotypes to which pathogens are adapted (Takahata and Nei 1990; Bernatchez and Landry 2003; Garrigan and Hedrick 2003). Finally, pathogen communities may fluctuate over time and space, leading to temporally and spatially variable selection (Gandon and Nuismer 2009; Spurgin and Richardson 2010; Eizaguirre et al. 2012). As a consequence, various subsets of alleles are selected across pathogen fluctuation cycles and thereby high MHC polymorphism is promoted. Also, phases of positive selection may alternate with phases where neutral processes are more important in shaping MHC polymorphism (Klein 1987; Landry and Bernatchez 2001). The null hypothesis that neutral processes (mutation, genetic drift, gene flow) shape MHC diversity can be tested by contrasting patterns of genetic differentiation between MHC genes and neutral markers. Under local adaptation to spatially heterogeneous pathogen pressures, populations are expected to be more genetically differentiated at MHC genes than at neutrally evolving genetic markers. Even though pathogen-mediated selection seem to be a major force driving genetic diversity at MHC genes, neutral processes can decrease genetic differentiation at putatively selected loci (Bernatchez and Landry 2003; Van Oosterhout 2009; Miller et al. 2010). For instance, numerous cases reported bottleneck effects on MHC genes due to genetic drift in small and isolated populations (Miller et al. 2010; Bollmer et al. 2011; Spurgin et al. 2011; Strand et al. 2012). In these cases, genetic differentiation at MHC can be larger than at neutral loci as neutral genetic structure can be overestimated when population size is very low (Hedrick 1999; Bernatchez and Landry 2003; Van Oosterhout 2009). Also, differentiation at MHC genes can be lower compared to neutral loci in the face of balancing selection (including cases of overdominance and heterozygote advantage) (Schierup et al. 2000; Muirhead 2001; Bernatchez and Landry 2003; Blais et al. 2007; André et al. 2011; Bollmer et al. 2011).

MHC class I and IIB genes have been well described in fish, especially in commercially important Salmonids like the Atlantic salmon (Salmo salar). This species frequently suffers from infectious diseases, such as the salmon anemia virus, or furunculosis caused by the bacteria Aeromonas salmonicida. The polymorphism observed at MHC class I and IIB is maintained through pathogen-mediated selection (Dionne et al. 2007, 2009) with particular alleles conferring resistance to furunculosis (Langefors et al. 2001; Grimholt et al. 2003). Other studies also reported a joint role of selection and neutral processes on the evolution of local adaptation at immune relevant genes in European trout (Salmo trutta) or Atlantic salmon (S. salar) (Landry and Bernatchez 2001; Dionne et al. 2007; Keller et al. 2011). In comparison to S. salar and other Salmonids, cyprinid fish present a complex MHC architecture, with the presence of multiple duplicated MHC genes (Stet et al. 1998; Shum et al. 2001; Consuegra et al. 2005; Seifertova and Simkova 2011). More particularly, Simkova et al. (2006) found a positive relationship between pathogen species richness and MHCIIB diversity in various other European cyprinids.

The European minnow (Phoxinus phoxinus) is a cyprinid occurring in most European freshwater systems. Local adaptation in growth and survival of P. phoxinus to a specific pathogen, the trematode Diplostomum phoxini, has been highlighted in the early 1990′s (Ballabeni and Ward 1993; Ballabeni 1994). Little is known about the influence of bacterial pathogens on this species' immunogenetics, especially on patterns of genetic structure and polymorphism at MHC loci. Recently, it has been shown that P. phoxinus displayed different body shape and adaptive genetic divergence to various abiotic factors such as lake-stream habitat, landscape topography, trophic dynamics, and geography, highlighting the capacity of this species to locally adapt to its environment (Collin and Fumagalli 2011). The present study aims at identifying whether genes of the immune system (MHCIIB genes) contribute to local adaptation to pathogens in natural populations of European minnows. More specifically, we intended to identify local adaptation by (1) comparing patterns of genetic differentiation at MHCIIB exon 2 and neutral markers, and (2) identifying pathogen-mediated selection by testing the correlation of diversity of potentially pathogenic bacteria with MHCIIB polymorphism. To this end we characterized diversity of MHCIIB exon 2 in European minnows and bacterial 16S rDNA in water samples using 454 amplicon sequencing. We identified two MHCIIB gene duplicates (DAB1 and DAB3) both presenting signs of historical balancing selection, but displaying contrasted patterns of adaptive evolution. Indeed, DAB3 duplicate exhibited evidence for pathogen-mediated selection contrarily to DAB1 duplicate, demonstrating that neutral and adaptive processes' roles differ among MHCIIB duplicates in European minnows.

Material and Methods

Sampling and DNA extraction

One hundred and seventy-six European minnows were collected in nine populations from the Western Swiss Alps (Fig. 1) during summer 2008 and 2009, when bacteria communities are the most diverse. Adult fish were killed using AQUI-S® (AQUI-S New Zealand Ltd, Lower Hutt, NZ) and immediately stored in absolute ethanol for further genetic analyses. To assess bacterial diversity for each population, water samples were collected in 1 L sterile bottles and filtered using autoclaved membrane filters (0.2 μm). Filters were stored in 3 mL of lysis buffer (0.75 mol/L Sucrose, 400 mmol/L NaCl, 20 mmol/L EDTA (pH 8.0), 50 mmol/L Tris-HCl (pH 9.0) at −80°C. Filters in lysis buffer containing bacterial DNA were digested overnight at 37°C with sodium dodecyl sulfate (1%), proteinase K (60 μg mL), and CaCl2 (90 mmol/L). Bacterial DNA was extracted from 500 μL of digested lysate using phenol chloroform (pH 7.5), sodium acetate (15 μmol/L), isopropanol, and ethanol precipitation procedure. DNA was resuspended in sterile water and concentration was adjusted to 10 ng μL−1. DNA from fish was extracted using the Gentra Puregene Qiagen kit according to the manufacturer instructions. DNA concentration was adjusted to 50 ng μL−1.

Figure 1.

Map showing the locations of the nine populations sampled in the South Western Swiss Alps. Population abbreviations are indicated in brackets.

MHCIIB isolation and characterization

In order to isolate and characterize MHCIIB exon 2 of P. phoxinus, a subset of four individuals from six populations (Ta, Ic, Li, Il, Va, Lu) were used. We first amplified the 3′-end of exon 2, intron 2 and the 5′-end of exon 3 using the primers FishC12S and Fish12R from Ottova et al. (2005) (for primer description and sequences, see Fig. S1 and Table S1). Additionally, intron 1 and the 5′-end of exon 2 were amplified using the reverse complement of FishC12S (FishC12S-R), and a newly designed primer in exon 1 (FishEx12F-bis). The latter primer was designed in conserved regions of exon 1 from Cyprinus carpio, Barbus intermedius, and Danio rerio sequences (Van Erp et al. 1996; Kuroda et al. 2002; Kruiswijk et al. 2004). We then designed species-specific primers amplifying the entire exon 2, which was included in P. phoxinus the two duplicates DAB1 and DAB3 (inferred from phylogenetic analyses with other cyprinid MHCIIB exon 2 sequences, see below). PCR for the two duplicates of MHCIIB exon 2 characterization were performed in a 25 μL reaction volume containing 1.5 mmol/L MgCl2, 1× PCR buffer (Qiagen), 0.4 μ mol/L of each primer, 100 μ mol/L of each dNTP, 1 unit of Taq Polymerase (Qiagen), and 50 ng of genomic DNA. Thermal cycling was carried out on a Biometra T Professional Standard Thermocycler (Biometra, Goettingen, Germany), with an initial denaturating step of 95°C for 3 min, followed by 39 cycles of 30 sec of denaturing at 95°C, 1 min annealing at 52°C and 1 min 30 sec of extension at 72°C, and a final extension of 5 min at 72°C. PCR products were cloned into the pGEM T-easy vector (Promega, Madison, WI). Between four to eight positive clones for each duplicate of MHCIIB exon 2 were purified with the Wizard SV PCR clean-up system (Promega) and sequenced using the Big Dye 3.1 Terminator cycle sequencing kit (Applied Biosystems, Carlsbad, CA) on a ABI 3100 PRISM genetic analyzer (Applied Biosystems) using SP6 and T7 standard primers and universal procedures. Sequences from clones were read using BioEdit 7.09 (Hall 1999). In order to ensure that the two duplicates corresponded to functional genes and not pseudogenes, we also amplified P. phoxinus cDNA following the protocol mentioned above (Text S1). Start and end of each exon 2 duplicates were deduced by comparison with known cDNA and gDNA sequences of D. rerio (Kuroda et al. 2002).

454 sequencing of MHCIIB exon 2 and 16S bacterial rDNA

In order to amplify the two duplicates of MHCIIB exon 2 and bacterial 16S rDNA in a partial 454 GS FLX Titanium run (Roche, Branford, CT), we pooled 18–20 individuals per population and randomly assigned each population five base pair tagged PCR primers (Table S1, Fig. S1). These primers were then used in locus-specific PCRs to amplify DAB1 and DAB3. The same tagging strategy was used to amplify bacterial 16S rDNA from the nine populations. PCRs for both MHCIIB exon 2 and 16S rDNA were conducted on Biometra T Professional Thermocyclers (Biometra) in a final volume of 25 μL, using 1 × Q-solution (Qiagen), 1× buffer (Qiagen), 1.5 mnol/L MgCl2 (Qiagen), 0.2 mmol/L dNTPs (Peqlab, Erlanger, Germany), 0.2 μmol/L of each primer, 2 units of Taq (Qiagen), and 20 ng of genomic DNA. Cycling conditions included an initial denaturation step at 95°C for 3 min followed by 35 cycles at 95°C for 30 sec, annealing at 52°C (DAB1 and DAB3 loci) or 58°C (bacterial 16S rDNA) for 1 min, 72°C for 1.5 min, and a final extension step at 72°C for 5 min. PCR product concentrations were estimated simultaneously on agarose gel and with a Nanodrop spectrometer (Nanodrop Technologies, Wilmington, DE). PCR products with similar concentrations were pooled and purified using the MinElute PCR Purification kit (Qiagen). Purified PCR products' concentrations were estimated a second time in order to produce a final pool with equimolar quantities of PCR products (Babik et al. 2009; Kloch et al. 2010). Final pool concentration was adjusted to 20 ng μL−1 in a final volume of 30 μL and sequenced on a partial 454 GS FLX Titanium run (Microsynth AG, Switzerland).

Sequence retrieval and filtering

Bacterial 16S rDNA data

The 16S rDNA multifasta file resulting from the 454 run was sorted according to tag and primer sequences. In order to attribute each bacterial 16S rDNA sequence to a taxonomic unit (genus or species in our case), 16S rDNA sequences were submitted to the Bayesian Ribosomal Database Project (RDP) Classifier (; Wang et al. 2007; Cole et al. 2009). Sequences shorter than the expected amplicon size of 250 bp and with a bootstrap support inferior to 50% for taxon assignment were discarded to guarantee sufficient sequence length to reach a high taxonomic resolution and accuracy, respectively. A community data matrix containing the presence–absence of each species per population was built. We identified bacteria potentially pathogenic in fish using the classification established by Austin and Austin (1999).


The primer used to sequence DAB1 and DAB3 using 454 technology amplify 276 bp approximately (Fig. S1). Altogether with primers and tag, read length exceeded the length that 454 run cold sequences, resulting in reads whose length was shorter than expected. All sequences were thus truncated to 220 bp in order to maximize the number of sequences for further analyses (Spurgin et al. 2011). Sequences from the multifasta DAB1 and DAB3 files containing a complete tag and a match to the forward or reverse primer were retained for further analysis. From these files we identified authentic MHCIIB variants (true variants or TV) according to several criteria, partially following Galan et al. (2010). First, we counted for each population the number of unique variants (all sequences differing in at least one base pair or one indel). We then removed variants appearing only once, or in less than two populations in order to eliminate artifactual variants (AV) resulting from polymerase errors and sequencing errors. This step might remove some rare alleles, but avoids including AV in further analyses. As in the four to eight cloned sequences from four test populations we never observed frameshift mutations and because here we are exclusively interested in functional MHCIIB variation, the remaining sequences were aligned using ClustalW (Thompson et al. 1994) in Mega 5.0 (Tamura et al. 2011) and sequences with indels whose lengths differ from a multiple of three were excluded. Even though PCR products were included in equimolar quantities for sequencing, the number of variants represented only semiquantitative data, and the number of reads per TV might not be representative of the actual TV number in the data set. For further analyses, we thus decided to use a presence–absence matrix of TV instead of allele frequencies.

AFLP data

In order to compare patterns of genetic differentiation at neutral markers with those observed at MHCIIB, we generated amplified fragment length polymorphism (AFLP) markers using the same protocol as described in Collin and Fumagalli (2011). AFLP were automatically genotyped using Genemapper version 4.0 software (Applied Biosystems) and genotypes were manually cross-validated twice. To avoid genotyping errors and a confounding factor from genotyping plates, 38 individuals from different populations were replicated independently thrice, and all individuals within and between populations were randomized across plates from the extraction step on. All loci that were not 100% reproducible across the 38 individuals were removed from the analysis. For data analysis, populations were assumed to be under Hardy–Weinberg equilibrium and nonpolymorphic (noninformative) loci were removed, resulting in 1′930 polymorphic loci. As we were interested in comparing neutral genetic divergence with MHCIIB divergence between populations, we excluded loci potentially under selection. The latter were inferred using BAYESCAN 1.0 (Foll and Gagiotti 2008) following a method previously developed specifically for European minnows (Collin and Fumagalli 2011). This software implements two models (including and not including selection, respectively) using reversible jump Markov Chain Monte Carlo sampling. It then estimates the posterior probability for a locus to be under selection, depending on a detection level that we set to “substantial” corresponding to a posterior probability superior to 0.76. The choice of this specific detection level is arbitrary, but has been previously demonstrated to not affect conclusions (Collin and Fumagalli 2011). Also, choosing a “substantial” detection level allows the detection of loci under very weak selection to generate a fully “neutral” data set with presence–absence of 1′665 neutral AFLP loci per population.

Statistical analyses

Bacteria diversity

In order to explore nonpathogenic and pathogenic bacterial composition between sampling locations, we conducted a permutational multivariate analysis of variance based on dissimilarities, that is, beta nonpathogenic and pathogenic bacteria diversity (ADONIS), using populations as a factor. Betadiversity was calculated using the nonpathogenic bacteria and pathogens' abundance matrices (excluding two populations, Br and Ta, as no pathogenic bacteria were recorded in these two locations). We finally conducted a canonical correspondence analysis (CCA) in order to analyze the correlation between pathogen composition and various environmental factors characterizing each sampling location and which have been shown to influence bacteria distribution (Yannarell et al. 2003; Lindström et al. 2005; Yannarell and Triplett 2005; Newton et al. 2007; Shade et al. 2008; Lindström and Langenheder 2012) (dissolved oxygen, altitude, landscape slope, soil pH, mean number of annual frost days, primary productivity; see Collin and Fumagalli 2011 for a full description of environmental variables). All analyses were conducted in R using the “vegan” package (Oksanen et al. 2008).

MHCIIB diversity

We were able to amplify 55 codons (165 bp) for DAB1 and 58 codons (174 bp) for DAB3, in which peptide-binding region (PBR) represented the most polymorphic codons as they bind pathogenic antigens (Hughes and Yeager 1998). We inferred the locations of PBR and non-PBR regions from human MHCIIB molecular structure (Brown et al. 1993). Two types of indices were estimated for MHCIIB loci. First, amino acid diversity at every codon position was estimated separately for DAB1 and DAB3 using DIVAA (Rodi et al. 2004). Then DAB1 and DAB3 amino acid polymorphism was measured as the percentage of polymorphic codons for PBR, non-PBR and all codons, respectively. We tested for differences between DAB1 and DAB3 amino acid polymorphism at PBR, non-PBR, and all codons using a t-test. The relative rate of synonymous (dS) and nonsynonymous (dN) substitutions was calculated according to Nei and Gojobori (1986) using MEGA 5.0 (Tamura et al. 2011). MEGA 5.0 was also used to perform a Z-test of selection at all amino acid positions, PBR and non-PBR sites, under the null hypothesis that dN = dS. Additionally, we tested for positive selection using a maximum likelihood method implemented in CODEML (PAML v4.3 package; Yang 2007). Phylogenetic tree topologies were reconstructed using MrBayes 3.1 (Ronquist and Huelsenbeck 2003) for DAB1 and DAB3 duplicates separately using Squalius cephalus as outgroup (Genbank accession number HQ595117 for DAB1 and HQ595146 for DAB3). We then used CODEML to implement four site models: two neutral models (M1a and M7) and two models allowing for positive selection (M2a and M8) (Yang et al. 2000). The models were compared using likelihood ratio tests statistics (LRT statistics) calculated as two times the difference between the Log-likelihoods of the models allowing for selection and the neutral models. LRT statistics were compared to a chi-square distribution with 2 degrees of freedom. Positively selected codons were identified using the Bayes empirical Bayes (BEB) approach described in Yang et al. (2005).

MHCIIB and bacteria diversity

In order to provide evidence for pathogen-mediated selection acting on particular MHC sites, we tested the null hypothesis of no difference in the correlation between MHCIIB polymorphism and bacterial pathogen diversity (Dionne et al. 2007). We calculated species richnesses of pathogenic bacteria, nonpathogenic bacteria, and overall bacterial species richness per population using the R library “vegan” (Oksanen et al. 2008). We then analyzed the relationship between the percentage of amino acid polymorphism at PBR, non-PBR codons, and bacterial species richness (pathogenic, nonpathogenic, overall) using linear models. Significance was tested using analysis of variance (ANOVA).

Genetic differentiation at MHCIIB and neutral AFLPs

As our study includes population-based and not individual-based data, we used dissimilarity indices, that is, Euclidean distances between pairs of populations as an estimate for population differentiation. First, Euclidean distance matrices were estimated based on (1) TV presence–absence in each population for DAB1 and DAB3 and (2) AFLP band presence–absence in each population. DAB1, DAB3, and AFLP distance matrices were then standardized to a 0/1 scale using the following formula: (dxdmin)/(dmaxdmin), where dx represents the pairwise Euclidean distance between two populations, dmax the maximum Euclidean distance between two populations, and dmin the minimum Euclidean distance between two populations identified in the matrices. We first tested for isolation by geographical distance (IBD) with Mantel tests both for neutral AFLP and MHCIIB (separately for DAB1 and DAB3). In order to estimate if MHCIIB differentiation exceeds neutral differentiation along geographic clines, we tested whether isolation by distance (IBD) patterns in MHCIIB persisted when accounting for neutral population differentiation using a partial Mantel test. We tested for correlation between differentiation at AFLP and at MHCIIB loci (DAB1 and DAB3 separately) using a Mantel test in order to highlight a potential effect of divergent selection which would increase levels of genetic differentiation at immune loci compared to neutral markers. We also aimed at obtaining the full neutral distribution of IBD slopes, in order to directly compare this distribution to IBD slope values observed at the MHC, hence providing a cross-test of IBD not only using neutral population-based data but individual neutral data. We estimated individual-based AFLP Fst matrices between pairs of populations using a subset of randomly chosen AFLP markers among the 1665 available. The regression slopes between Fst matrices and geographical distance matrices were obtained from Mantel tests. We then compared the distribution of neutral regression slopes (minimum and maximum values of R2 obtained with AFLP markers) with values obtained from MHC loci. All Mantel and partial Mantel tests were implemented with 1000 permutations in each case using the “vegan” library in R (Oksanen et al. 2008).


MHCIIB characterization and diversity

All MHCIIB sequences obtained by cloning four individuals of six P. phoxinus populations blasted against previously published MHCIIB sequences of other cyprinids (including Abramis brama, Barbus bocagei, B. intermedius, Brachydanio rerio, C. carpio, Ladigesocypris ghigii, Rhodeus ocellatus, and S. cephalus) and salmonids (S. salar), with sequence similarities ranging from 79% to 92%. As reported also from other species (Ottova et al. 2005; Seifertova and Simkova 2011), we identified two functional duplicates, corresponding to DAB1 and DAB3. Functionality of the genes was supported by successful cDNA amplification (Text S1), and orthology with DAB1 and DAB3 of other cyprinids was confirmed by two well-separated paralog sequence clusters in a phylogenetic analysis (Appendix 1), with alleles from different species found in each duplicate cluster. We thus further studied two MHCIIB paralogs, with large differences in intron 1 and 2 lengths between the two groups of sequences strengthening the evidence for at least two gene copies (data not shown). A number of clues such as (1) successful amplification of cDNA (2) the absence of stop codon or frameshift mutations (3) the presence of many features characteristic of functional MHCIIB genes (N-linked glycosylation sites and the two conserved cysteine residues) are consistent with a functional antigen binding protein.

454 sequencing produced 7′591 DAB1 and 7′877 DAB3 reads (Text S2 and S3). Average, minimum and maximum read lengths and quality scores frequencies are reported in Appendix 2 and Fig. S2. The first step consisting in the identification of unique TV and AV resulted in 3′456 retained sequences for DAB1 and 3′811 for DAB3. After removing sequences appearing only once or less in the two populations, 162 variants for DAB1 and 177 for DAB3 were left. Sequences were aligned and partial intronic sequences were removed. After alignment and removal of sequences with frameshift indels, 19 TV were found for DAB1 (956 sequences in total, 13% of the initial reads number) and 30 TV for DAB3 (1′046 sequences in total, 13% of the initial reads number).

For both DAB1 and DAB3, amino acid diversity was highest at the 10 (DAB1) and 11 (DAB3) codons matching human PBR codons (Appendix 3). Mean DAB1 and DAB3 amino acid diversity at PBR codons were 0.117 and 0.139, respectively, and were significantly lower at non-PBR sites (0.009 and 0.059 for DAB1 and DAB3, respective t-tests: df = 8.832, t = −3.685, P = 0.005; df = 10.438, t = −4.219, P = 0.002). In concordance with these results, the Z-test of selection showed evidence for diversifying selection on PBR codons as the rate of nonsynonymous substitution (dS) over synonymous substitution (dN) was significantly higher than expected under neutral expectations (Table 1). We tested for differences in amino acid polymorphism between DAB1 and DAB3 duplicates. The t-tests revealed that DAB3 presented greater amino acid polymorphism only at PBR codons (t = 2.402, df = 15.887, P = 0.014). The maximum likelihood method implemented in CODEML revealed that models allowing for positive selection (M2a and M8) better explained DAB1 and DAB3 sequences evolution than neutral models (M1a and M8) (Table 2). Models M2a and M8 identified 13 and 14 codons, respectively, as having evolved under positive selection in DAB1, with, respectively, seven and eight of these corresponding to human PBR sites. For DAB3, five codons were inferred having evolved under positive selection, all of them corresponding to PBR sites.

Table 1. Ratio of synonymous (dS) to nonsynonymous (dN) substitutions for the two duplicates of MHC class IIB (MHCIIB) exon 2 (DAB1 and DAB3)
RegionCodondN/dS P Codon d N /d S P
  1. Bold font indicates significant departure from neutrality test at P < 0.05.

PBR 10 4.320 <0.001 11 7.038 <0.001
Total 55 2.131 0.035 581.2400.217
Table 2. Evidence of positive selection on MHC class IIB (MHCIIB) exon 2 duplicates
DuplicateModelLn LParameter estimatesPositively selected sitesLRT statistic
  1. Sites inferred as being under positive selection with a posterior probability >0.99 are also reported as numbers corresponding to their position in Fig. S4. LRT statistics were used to compare (1) M1a and M2a models and (2) M7 and M8 models to a chi-square distribution (df = 2). Log-likelihood and parameter estimates calculated by CODEML are presented.

DAB1M1a−807.21p = 0.548 (p1 = 0.452), ω = 0, ω1 = 1Not allowed 
M2a−775.92p0 = 0.501, p1 = 0.144 (p2 = 0.355), ω = 0, ω1 = 1, ω2 = 9.1216, 8, 10, 14, 22, 24, 25, 35, 36, 44, 45, 51, 5462.584 (< 0.001)
M7−807.43p = 0.005, q = 0.005Not allowed 
M8−775.93p0 = 0.643 (p1 =0.357) p = 0.005, q = 0.020, ω = 8.9656, 8, 10, 14, 22, 24, 25, 28, 35, 36, 44, 45, 50, 51, 5462.991 (< 0.001)
DAB3M1a−1015.73p = 0.663 (p1 = 0.337), ω = 0.081, ω1 = 1Not allowed 
M2a−971.28p0 = 0.454, p1 = 0.457 (p2 = 0.089), ω = 0.065, ω1 = 1, ω2 = 9.4028, 10, 24, 26, 2888.906 (< 0.001)
M7−1019.09p = 0.159, q = 0.221Not allowed 
M8−971.60p0 = 0.911 (p1 = 0.089) p = 0.023, q = 0.020, ω = 9.5358, 10, 24, 26, 2894.963 (< 0.001)

Genetic differentiation and diversity at MHCIIB and neutral AFLPs

Isolation by distance

Isolation by distance was observed at neutral loci and DAB1 (Fig. 2; Mantel test respectively, R = 0.330, P = 0.037 and R = 0.403, P = 0.025). DAB3 showed a tendency of IBD (Mantel test, R = 0.242, P = 0.086). The nonsignificance of this correlation was driven by three outlier population pairs (Ic-Or, Il-Or, Lu-Or) for which low levels of genetic differentiation was not explained by the geographic distance separating them. The three outlier population pairs are highlighted by black circles in Fig. 2C. Removing them resulted in significant IBD (Mantel test, R = 0.515, P = 0.004). The pattern of IBD at DAB1 persisted when controlling for neutral genetic differentiation (R = 0.297, P = 0.042), and stayed nonsignificant for DAB3 (R = 0.188, P = 0.140). Again, when removing the three outlier pairs of populations and controlling for neutral differentiation, significant IBD was found also at DAB3 (R = 0.459, P = 0.014).

Figure 2.

Isolation by distance patterns for genetic differentiation between populations at neutral AFLP (A), DAB1 (B), DAB3 (C). Lines represent the least square regression between genetic distance and geographical distance. R statistics from Mantel tests and their significance (* significant at 95% level, NS non-significant at the 95% level) are indicated. The three outlier pairs of populations breaking down the isolation by distance (IBD) pattern at MHC class IIB (MHCIIB) DAB3 are circled in black (C).

Patterns of genetic differentiation at MHCIIB and neutral AFLP

Patterns of differentiation at neutral and MHCIIB duplicates were significantly correlated for DAB1, but not for DAB3 (Fig. 3; Mantel test; DAB1, R = 0.469, P = 0.006 and DAB3, R = 0.207, P = 0.137). Again, removing the three outlier pairs of populations resulted in significantly correlated genetic differentiation at AFLP markers and DAB3 (Mantel test; R = 0.313, P = 0.062). The neutral distribution of regression slopes ranged from −0.263 to 0.839, which includes values observed at MHC loci (0.445 and 0.242 for DAB1 and DAB3, respectively).

Figure 3.

Correlation between the extent of genetic differentiation at neutral AFLP loci and MHC class IIB (MHCIIB) loci (DAB1: gray closed dots, Mantel test: R = 0.469, P = 0.006; DAB3: black closed dots Mantel test: R = 0.207, P = 0.137).

Bacterial and MHCIIB diversity

To investigate how pathogenic bacteria affect P. phoxinus populations' MHCIIB diversity at a local scale during summer, we first characterized pathogenic and nonpathogenic bacterial communities for each population using 454 amplicon sequencing of 16S rDNA. Classification in the Ribosomal Database Project retained sequences with a length >200 bp resulting in a total of 374 sequences of 16S bacterial rDNA out of 548 reads (see Appendix 2 for read details). Based on Bergey's nomenclature used by the classifier taxonomic assignment tool implemented in RDP, 41 different species were identified in the nine sampling sites analyzed, and 40 other species were not determined and thus treated as being different species in the data set. Seven potentially pathogenic species were identified following Austin and Austin (1999): Lactobacillus sp. (Carnobacterium), Staphylococcus sp., Flavobacterium sp., unclassified Clostridiales, unclassified Enterobacteriaceae, unclassified Pseudomonadaceae, and unclassified Oxalobacteraceae. We therefore partitioned the bacteria into three different data sets: “pathogenic bacteria,” “nonpathogenic bacteria”, and “all bacteria.” The species richness index calculated for these three data sets ranged from zero to three for pathogens, from two to 36 for nonpathogens, and from three to 39 for all bacteria. We examined the overall bacteria phylum composition for each sampling location (Fig. 4A). We found four main phyla across the nine sampling locations screened: Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria. A small proportion of unknown bacteria were recorded. Il and Va locations were characterized by the absence of the Bacteroidetes phylum and Ta by the absence of Proteobacteria. Firmicutes was the less abundant phylum in all sampling locations. No pathogenic bacteria were recorded at Br and Ta locations (Fig. 4B). Il location was characterized by Carnobacterium (Firmicute), while in other locations Flavobacterium pathogens (Bacteroidetes) predominated (Fig. 4B). ADONIS analysis revealed no significant difference in pathogenic and nonpathogenic bacteria composition between locations. CCA highlighted that 62% of the variance observed in the multivariate data set (pathogen diversity and environmental variables for each location) were explained by CCA1 and CCA2 axes (Fig. 5). As mentioned previously, Il and Lo locations were differentiated from the others as their pathogen distribution (1) was rather different, dominated by Carnobacterium and Staphylococcus, respectively, and (2) was positively associated to higher altitudinal sites and topography. Pathogen compositions at other sampling locations were characteristic of highly productive (eutrophic) sites at lower elevation.

Figure 4.

Barplot of 16S rDNA sequences relative abundance of phyla (A) and pathogen species (B) for sampling location (full population names and abbreviation are detailed in Figure 1).

Figure 5.

Canonical correspondence analysis correlation biplot showing the relationship between pathogen abundance and composition per sampling location (for full names, see Fig. 1) and environmental variables (DO, dissolved oxygen; alt, altitude; Slope, landscape slope; pH, soil pH; frost, mean number of annual days of frost; PP, primary productivity). Explanatory environmental variables are represented by gray arrows whose length indicated the strength of correlation between variables and pathogen's ordination score. Angles between variables reflect their degree of correlation. Gray numbers represent each pathogen species (1, Carnobacterium; 2, Staphylococcus; 3, unclassified Clostridiales; 4, unclassified Enterobacteriaceae; 5, unclassified Pseudomonaceae; 6, unclassified Oxalabacteraceae; 7, Flavobacterium).

We used linear models and ANOVAs to explore the relationship between amino acid polymorphism at MHCIIB and species richness of pathogenic bacteria, nonpathogenic bacteria, and all bacteria. The percentage of polymorphic amino acid sites for the two MHCIIB duplicates ranged from 80% to 100% for DAB1, 82% to 100% for DAB3 at PBR codons, and at non-PBR codons from 20% to 27% for DAB1 and 19% to 36% for DAB3. Only the percentage of polymorphic amino acid sites within PBR codons at DAB3 showed significant positive association with pathogen species richness (ANOVA: df = 7, F = 12.015, P = 0.010). ANOVA results of linear regression models are summarized in Table 3.

Table 3. General linear model results regarding the effects of bacterial richness (nonpathogen and pathogen) on the percentage of polymorphic amino acid sites for the two MHCIIB duplicates (DAB1 and DAB3)
  1. Bold font indicates a significant effect at P < 0.05.

All bacteria

(df = 7)

F = 0.536

P = 0.488

F = 0.355

P = 0.570

F = 0.440

P = 0.528

F = 3.488

P = 0.104


(df = 7)

F = 0.740

P = 0.418

F = 0.460

P = 0.521

F = 0.244

P = 0.637

F = 4.142

P = 0.081


(df = 7)

F = 0.532

P = 0.489

F = 0.148

= 0.712

F  = 12.015

P  = 0.010

F = 0.015

P = 0.905


The present study aimed at examining whether the cyprinid fish P. phoxinus exhibited MHCIIB-related local adaptation to potentially pathogenic bacterial communities to which fish populations are exposed to. Our results suggest pathogen-mediated local adaptation in European minnows at two MHCIIB loci (DAB1 and DAB3), though they appeared to be driven by different aspects of pathogen-mediated selection. Patterns of genetic differentiation at DAB1 suggest that selection favored different alleles in each population, resulting presumably in better immune defense against local pathogens. This pattern was also found for DAB3, but only partially when removing the three pairwise comparisons with largest geographic distance. Contrary to its paralog, genetic diversity at DAB3 was correlated with local pathogen diversities. This suggests that although locally adapted alleles were favored by selection also at DAB3, selection for locally adapted genetic diversity overpowered the selection for locally adapted alleles. The interplay between these different selection processes may explain the breakdown of the overall pattern of IBD at DAB3.

Bacterial community composition

In the present study, pyrosequencing provided a fast way to characterize bacterial communities with a greater sequencing depth than any other method (Sogin et al. 2006). We found common phyla of freshwater lakes, such as Bacteroidetes, Actinobacteria, and Proteobacteria (Newton et al. 2011). We found seven potentially pathogenic bacteria species that were inferred from multiple previous studies as driving factors of fish diseases (Austin and Austin 1999; Isik et al. 1999; Clements et al. 2007, 2009; Moyer and Hunnicutt 2007; Embar-Gopinath et al. 2008; Loch et al. 2008; Sekar et al. 2008; Von Siebenthal et al. 2008; Prasad and Kumar 2010; Newton et al. 2011; Mouchet et al. 2012). Their composition didn't significantly differ between locations, but was influenced by various environmental factors such as altitude, landscape topography, and productivity. These factors have been previously reported to play a role in bacteria species distribution and abundance at local and regional scales (Yannarell et al. 2003; Lindström et al. 2005; Yannarell and Triplett 2005; Newton et al., 2007; Shade et al. 2008; Lindström and Langenheder 2012). No pathogenic bacteria were recorded at two locations (Br and Ta), suggesting that other types of pathogens may be present and play a role in shaping MHC genetic diversity in these populations. Even though ADONIS analysis showed little differences in pathogen composition, the CCA analysis revealed a high heterogeneity in pathogen distribution between sites, especially for Il and Lo. Il was dominated by Carnobacterium, which was reported as a major fish pathogen associated with warm water streptococcosis (Michel et al. 2003; Mata et al. 2004; Leisner et al. 2007). Flavobacterium, Pseudomonas, Enterobacteriacae, or Clostridium are known to be pathogenic bacteria commonly colonizing digestive tract of fish (Sugita et al. 1985; Nayak 2010). Enterobacteriacae recorded in one population, Le, can be a major disease agent in fish and is probably of anthropogenic origin (Newton et al. 2011) and Le is a human managed freshwater location. ADONIS and CCA presented contrasted results suggesting that pathogen richness and distribution differ but not their composition. Also, the environmental factors identified by the CCA to have potential influence on pathogen distribution were also found to play a role in morphological adaptation to various habitats in the European minnow (Collin and Fumagalli 2011). These concordant lines of evidence suggest that European minnows are locally adapted to both abiotic and biotic factors in their respective environment.

Pathogen-mediated selection on MHCIIB loci

In the present study, we found that DAB1 amino acid polymorphism at PBR sites was not associated to pathogen richness, but patterns of differentiation at neutral loci and DAB1 were significantly correlated. These results could suggest an effect of migration and drift on DAB1 diversity. However, contrary to expectations where neutral processes alone were explaining genetic structure at DAB1, the pattern of isolation by distance at DAB1 remained significant even when controlling for neutral genetic structure. These results suggest (1) a strong spatial structure between populations and (2) a role for diversifying selection, with selection on DAB1 favoring different, locally adapted alleles in every population. DAB1 genetic diversity seems thus to be the result of diversifying selection promoting different locally adapted alleles frequencies in different populations. Similar results were observed on MHCIIB in recent studies on great snipes (Gallinago media; Ekblom et al. 2007) and MHC class I in house sparrow (Passer domesticus; Loiseau et al. 2009).

Contrarily to DAB1, a significant positive relationship between diversity of potentially pathogenic bacteria and DAB3 amino acid polymorphism was found. Also, patterns of genetic differentiation at DAB3 and neutral markers were not correlated, implying that selection shaped DAB3 diversity. The latter results suggest a role for contemporary pathogen-mediated selection in promoting local adaptation in terms of genetic diversity of P. phoxinus at DAB3. Pathogen-mediated selection linked to habitat heterogeneity has been proposed as a major mechanism shaping MHC variation (Blais et al. 2007; Ekblom et al. 2007; Alcaide et al. 2010). In fish, there are numerous empirical examples suggesting that pathogen-mediated selection is acting to promote MHC diversity through three main mechanisms (reviewed in Wegner 2008 and Eizaguirre and Lenz 2010): (1) heterozygote advantage (Rakus et al. 2008; Evans and Neff 2009; Kekäläinen et al. 2009), (2) frequency-dependent selection (Langefors et al. 2001; Croisetière et al. 2008; Dionne et al. 2009; Eizaguirre et al. 2009) and/or (3) habitat heterogeneity (Dionne et al. 2007, 2009; Eizaguirre et al. 2011, 2012). MHC differentiation observed at DAB3 is likely to be a result of habitat heterogeneity and local adaption to the local pathogen diversities. Several lines of evidence suggest that pathogen-mediated selection promotes and maintains genetic diversity at DAB3. First, IBD became only apparent for DAB3 when excluding three outlier pairs of populations and was absent otherwise. This highlights a limited role for distance-dependent gene flow in shaping the distribution of DAB3 alleles. Significant IBD after the exclusion of the three outlier pairs of populations suggests that selection (local adaptation) at DAB3 has driven allele frequency evolution in a number of populations, as for DAB1. As the outliers show low genetic differentiation at large distances, selection appears to have favored similar alleles in isolated populations. We found a significant relationship between pathogen richness and DAB3 amino acid polymorphism at PBR sites. Overall, these results suggest that the genetic diversity observed at DAB3 may be adaptive at a local scale, and that bacterial pathogen diversity represents a main selective agent. Though, we cannot exclude that only balancing selection may act on the three outlier pairs of populations at DAB3 loci. Indeed, as mentioned earlier, balancing selection usually produces pattern low of genetic differentiation at MHC loci compared to neutral markers (Hedrick et al. 2001; Bernatchez and Landry 2003; Seddon and Ellegren 2004; Van Oosterhout 2009; Evans et al. 2010; Miller et al. 2010). For example, Van Oosterhout et al. (2006) found lower levels of genetic differentiation at MHC loci than at neutral markers in the Poecilid fish Poecilia reticulata. They explain such a pattern by MHC loci being under balancing selection, increasing the effective migration rate for these genes compared to what is expected at neutral markers. Finally, the pattern of genetic differentiation observed between populations of European minnows and the presence of outlier pairs of populations could suggest an effect of fish stocking. We can rule out this hypothesis with high confidence, as the levels of genetic differentiation for outlier pairs of population are higher at AFLP loci and DAB1 than they are at DAB3 (gene flow from stocking would lead to genome-wide low differentiation).

The slight contrasting pattern of differentiation between DAB1 and DAB3 suggests that the evolution of these two duplicates is driven by different selective agents (Landry and Bernatchez 2001) and/or different pathogen-mediated selection intensity (Edwards and Hedrick 1998; Martinsohn et al. 1999; Penn and Potts 1999; Van Oosterhout 2009). Contrasting patterns of evolution of the MHCIIB duplicates similar to the ones identified here in P. phoxinus have already been demonstrated in other cyprinid species (Seifertova and Simkova 2011). For example, chubs (Squalius cephalius) display a similar MHCIIB architecture with each two of the duplicates present in European minnows highlighted in the present study. In S. cephalus DAB3 also presents a greater diversity than DAB1. Although these data support the results found in the present work, several studies on other cyprinid fish reported the inverse pattern (Ottova et al. 2005, 2007).

Combining molecular evolution and environmental genomics

Our study used a combination of molecular evolution and environmental genomics to understand how pathogenic bacteria communities and their environmental characteristics influence MHCIIB genetic diversity in nine populations of European minnow. We also used genome-wide neutral AFLP markers to compare levels of genetic differentiation at both neutral and selected markers. This study attempted to (1) characterize MHCIIB in P. phoxinus and (2) understand how pathogen-mediated selection shapes diversity at two MHCIIB copies. Despite providing important insights into these questions, our study also shows the technical limitations of this approach.

First, the bacteria diversity survey was based on water samples collected from a single water sampling event. Even though this may represent some limitation, this approach gave some important information about pathogen distribution and abundance that can be a starting point for further studies on European minnows or other species. Indeed, most studies aiming at detecting pathogen-mediated selection, often do not report diversity, spatial distribution, or abundance of pathogens (Wegner 2008). Even though in the present study potentially pathogenic bacteria were recovered from water samples and not from fish, numerous studies on bacteria associated with fish demonstrated that gut and gill microflora (including potential pathogens) is colonized by free living aquatic bacteria and/or ones associated to food (Nayak 2010; Mouchet et al. 2012). Further work on the topic should also consider temporal distribution of pathogens, as time scale can be of particular importance in shaping MHC diversity.

We demonstrated that both copies presented historical evidence for positive selection, but only one copy (DAB3) presented evidence for contemporary pathogen-mediated selection. DAB1 may also be subjected to this type of selection, but as fish and water samples have been sampled during the summer, we can hypothesize that this copy may be involved in the recognition of pathogens that were not sampled at that time of the year, or other pathogen taxa.

Our next-generation sequencing of MHCIIB copies presents several limits as compared to more classical approaches. The major limitation of our approach was the population-level tagging rather than individual-based sequencing. Even though it provided a rapid and cheap way to sequence a large number of individuals, the frequency of the identified variants could not be reliably inferred. Several hypotheses related to pathogen-mediated selection could not be tested such as a potential heterozygote advantage or frequency dependent selection. Nevertheless, several recent publications successfully used a population-based 454 sequencing approach similar to the one presented in our study to assess population-level MHC variation (Spurgin et al. 2011; Radwan et al. 2012; Sepil et al. 2012). In addition, we think that the technological progress in tagging possibilities and the longer read length now available with next-generation sequencing will soon unravel the limitations mentioned above.


Duplication events and the fate of duplicates in the face of both neutral processes and selection are still sparsely understood. In this context, the present study demonstrates how host–pathogen interactions can be the driver of genetic diversity in immune-related genes, and how alternative selective regimes may influence the processes driving the evolution of gene duplicates within the same species. Further studies with higher technical resolution examining the duplication history of MHC genes in cyprinid fish in Europe and the origin of duplicates' diversity should provide new insights into the relative role of historical and contemporary selection, and neutral processes on host–pathogen related local adaptation.


We thank Lee Marowski for providing DIVAA, T. Parchman and P. Taberlet for help with analyses, S. Antoniazza and S. A. Pavey for comments on the manuscript, P. Nosil for discussion. We also thank the “Services de la Faune” from the cantons of Valais and Vaud, L. Anex, Y. Crettenand, and J. M. Trolliet for help with sampling. This work was conducted in compliance with the Swiss laws on animal experimentation under authorization number 2049 delivered by the Swiss Veterinary Services. The study was supported by a Swiss National Science Foundation grant (3100A0-109852) to L. F. and a “Société Academique Vaudoise” grant to H. C.

Conflict of Interest

None declared.

Appendix 1

Description of the 454 sequencing run details, before read sorting and classification of the two MHC classIIB exon 2 and 16S rDNA sequences

AmpliconNumber of readsMean read lengthMinimum read lengthMaximum read lengthMean quality scoreMinimum quality scoreMaximum quality score
16S rDNA54824527655271640

Appendix 2.


Unrooted neighbor-joining tree of Phoxinus phoxinus DAB exon 2 alleles (Jukes Cantor nucleotides distances) based on a 135 bp and 174 bp for DAB1 and DAB3, respectively. DAB alleles from additional cyprinids (Abramis brama, Squalius cephalus, Cyprinus carpio, Barbus bocagei, Brachydanio rerio, Barbus intermedius, Rhodeus ocellatus, and Ladigesocypris ghigii; Genbank accession number beside allele names) and one salmonid (Salmo salar) are included. “HAP” refers to true variants of MHCIIB for DAB1 (pink branches) and DAB3 (green branches) duplicates obtained for Phoxinus phoxinus in the present study. Bootstrap values >50% (1000 replicates) are shown.

Appendix 3


Sequence logo showing amino acid diversity for each amino acid position of the two MHC class IIB duplicates examined: DAB1 (A) and DAB3 (B). Human peptide-binding region (PBR) sites (Brown et al. 1993) are indicated with an asterisk. The amino acid positions refer to the beginning of MHC class IIB (MHCIIB) exon 2.