Theory predicts that positive heterozygosity-fitness correlations (HFCs) arise as a consequence of inbreeding, which is often assumed to have a strong impact in small, fragmented populations. Yet according to empirical data, HFC in such populations seem highly variable and unpredictable. We here discuss two overlooked phenomena that may contribute to this variation. First, in a small population, each generation may consist of a few families. This generates random correlations between particular alleles and fitness (AFCs, allele-fitness correlations) and results in too liberal tests for HFC. Second, in some contexts, small populations receiving immigrants may be more impacted by outbreeding depression than by inbreeding depression, resulting in negative rather than positive HFC. We investigated these processes through a case study in tadpole cohorts of Pelodytes punctatus living in small ponds. We provide evidence for a strong family structure and significant AFC in this system, as well as an example of negative HFC. By simulations, we show that this negative HFC cannot be a spurious effect of family structure, and therefore reflects outbreeding depression in the studied population. Our example suggests that a detailed examination of AFC and HFC patterns can provide valuable insights into the internal genetic structure and sources of fitness variation in small populations.

Correlations between heterozygosity at neutral markers and fitness (heterozygosity-fitness correlations [HFC]) are widespread in wild populations (see Britten 1996; Chapman et al. 2009 for a review) and central to our understanding of subjects as varied as sexual selection (e.g., Freidenburg and Skelly 2004; Charpentier et al. 2008; Oigarden et al. 2010 for recent ideas) or the conservation of small populations (e.g., Jaquiery et al. 2009; Blomqvist et al. 2010). HFC is generally positive (more heterozygous individuals have higher fitness) and is usually considered as reflecting the deleterious effects of inbreeding on fitness traits under circumstances that create a variance in inbreeding among individuals: small populations, recent bottlenecks, nonrandom mating, or population admixture (Szulkin et al. 2010). However, empirical results show that even in relatively small populations, HFC is a rather small and inconsistent signal (Chapman et al. 2009). Part of this variability may come from sampling error, which is predicted to be high under realistic levels of inbreeding (David 1998; Szulkin and David 2011). However, we believe that other processes can generate additional variation in HFC, not accounted for in the classical HFC paradigm. In this article, our aim is to illustrate two of these processes and show how their influence can be assessed using an example.

The first process is family structure. In small or fragmented populations, family structure can be important when a few pairs of breeders produce a large number of offspring, or when a few males fertilize many females. In such cases, that is, when the ratio of population effective size to sample size is not large, individuals are sampled within a finite set of sibships (families), which may differ from each other by their allele frequencies, their heterozygosity, and their fitness traits, if the latter are heritable. Some alleles may happen to be found in relatively fit families, which generates a statistical correlation between allele occurrence and fitness (allele-fitness correlations, hereafter AFCs). Similarly, random, positive, or negative correlations between heterozygosity and fitness may emerge if the fitter families happen to be either more heterozygous or less heterozygous than the average (classical HFC theory). Tests for HFC can then be spuriously significant because related individuals are not independent data points. To our knowledge, this effect of family structure has never been considered in the HFC literature.

Second, the relationship between inbreeding and fitness in fragmented populations is often not monotonic; mating between distant parents sometimes produces relatively unfit offspring, resulting in outbreeding depression (Willi and Van Buskirk 2005; Dolgin et al. 2007; Escobar et al. 2008). This may occur when one of the ancestors is an immigrant carrying alleles that are not adapted to the local environment (e.g., Edmands 1999) or that are partially incompatible with the local alleles (Escobar et al. 2008). Outbreeding depression can result in negative HFC, reversing the classical pattern. Although outbreeding depression has been known for a long time, its impact on HFC has only recently begun to be considered (Olano-Marin et al. 2011a; Szulkin and David 2011).

Amphibian species are good models to explore these questions. They live in fragmented habitats and form small populations where inbreeding can occur (e.g., Lesbarreres et al. 2003; Andersen et al. 2004; Dixo et al. 2009). HFC have been documented in several species (Halverson et al. 2006; Lesbarreres et al. 2007; Schmeller et al. 2007). Populations may also be impacted by migration among populations adapted to different local conditions (Richter-Boix et al. 2010), a condition that increases the probability of observing outbreeding depression. Moreover, a few breeding pairs often produce a high number of larvae (Waldman and McKinnon 1993), increasing the potential for family structure.

Here we propose a way to assess the impact of family structure and outbreeding depression on HFC in small, fragmented populations that are prone to inbreeding and local adaptation. We illustrate our approach by studying relationships between microsatellite genotypes and two traits strongly linked to fitness (larval survival and larval growth, see Smith 1987; Altwegg and Reyer 2003) in natural populations of the parsley frog (Pelodytes punctatus, six populations, 31–90 individuals sampled in each). We first search for indications of inbreeding and family structure based on genotypic structure within samples. We then test the prediction that family structure should result in detectable AFC in the studied populations. We finally investigate the existence of both positive and negative HFC, reflecting the relative impacts of inbreeding and outbreeding depression, respectively. Using simulations, we evaluate the impact of family structure on the statistical tests of HFC (and conclude that in our case the results are robust).

Materials and Methods

Tadpoles were sampled in six ponds: Ruisseau, Fesq, GMP, PMP, Combe l’Escure, and Bergerie (see Jourdan-Pineau et al. 2012 for description of the study area). In GMP, PMP, Combe l’Escure, and Bergerie, the larval cohorts were sampled about 1 month after spawning. In Ruisseau, the sampled cohort was laid in spring whereas in Fesq it was laid in autumn. In these two sites, the same larval cohort was sampled twice, at the beginning (about 15 days after spawning) and at the end of the larval period before the appearance of any metamorph (respectively, 3.5 and 6 months after spawning, corresponding to similar developmental stages, as the autumn cohort in Fesq overwintered in the pond, and therefore had a much slower development than the spring cohort in Ruisseau). These ponds were regularly inspected (every 10 days), and clutches and tadpole development were systematically estimated to make sure that all tadpoles originated from the same spawning event (i.e., within a single 10-day interval between two visits) in each pond and had not yet started to metamorphose. To limit the impact of sampling on larval cohorts, the sample size per site had to be limited (Table 1). We roughly estimated the number of tadpoles at each visit as the product of pond area by density, the latter estimated using a standardized dip-netting method (Richter-Boix et al. 2007). In Ruisseau and Fesq, the first sample represents less than 10% of the larval cohort. Individuals were genotyped at six microsatellite markers: PPU2, PPU5, PPU10, PPU11, PPU15, PPU16 (Jourdan-Pineau et al. 2009).

Table 1.  Expected and observed heterozygosity (He and Hobs, respectively) and heterozygote deficiency (FIS) for all samples, at all genotyped loci. Significant values (P < 0.05) of FIS are bold numbers. “Start” and “end” refer to the two temporal samples in the Ruisseau and Fesq ponds.
Site: Ruisseau (nstart = 88, nend = 90); g2 start = 0.008 ± 0.013, g2 end = −0.005 ± 0.012
Locus   He (start)   He (end)   Hobs (start)   Hobs (end)   FIS (start)   FIS (end)
PPU2    0.774        0.786      0.796          0.767        −0.023        0.03
PPU5    0.666        0.65       0.693          0.711        −0.035        −0.089
PPU10   0.685        0.734      0.761          0.811        −0.107        −0.1
PPU11   0.392        0.333      0.352          0.311        0.106         0.072
PPU15   0.566        0.476      0.625          0.444        −0.098        0.072
PPU16   0.868        0.838      0.875          0.867        −0.002        −0.029
All     0.658        0.636      0.684          0.652        −0.033        −0.019

Site: Fesq (nstart = 42, nend = 34); g2 start = 0.017 ± 0.033, g2 end = 0.036 ± 0.039
Locus   He (start)   He (end)   Hobs (start)   Hobs (end)   FIS (start)   FIS (end)
PPU2    0.631        0.638      0.738          0.618        −0.159        0.047
PPU5    0.631        0.702      0.571          0.794        0.107         −0.117
PPU10   0.67         0.66       0.643          0.559        0.053         0.167
PPU11   0.486        0.5        0.357          0.294        0.276         0.424
PPU15   0.411        0.319      0.286          0.206        0.315         0.367
PPU16   0.774        0.764      0.714          0.853        0.09          −0.101
All     0.601        0.597      0.552          0.554        0.0935        0.087

Site    PMP (n = 31)               GMP (n = 31)               Combe l'Escure (n = 31)    Bergerie (n = 31)
g2      −0.058 ± 0.036             0.01 ± 0.034               0.012 ± 0.037              0.031 ± 0.046
Locus   He      Hobs    FIS        He      Hobs    FIS        He      Hobs    FIS        He      Hobs    FIS
PPU2    0.752   0.548   0.286      0.643   0.677   −0.037     0.664   0.677   −0.003     0.661   0.688   −0.025
PPU5    0.609   0.742   −0.203     0.643   0.613   0.063      0.512   0.452   0.134      0.577   0.656   −0.122
PPU10   0.612   0.613   0.016      0.63    0.677   −0.059     0.624   0.742   −0.172     0.517   0.375   0.289
PPU11   0.433   0.3     0.322      0.498   0.4     0.213      0.448   0.29    0.366      0.359   0.219   0.404
PPU15   0.2     0.226   −0.111     0.307   0.29    0.069      0.225   0.194   0.155      0.398   0.5     −0.24
PPU16   0.877   0.807   0.096      0.858   0.871   0.001      0.856   0.839   0.037      0.807   0.688   0.163
All     0.581   0.539   0.088      0.596   0.588   0.03       0.555   0.532   0.057      0.553   0.521   0.074

For each sample, expected heterozygosity (He), observed heterozygosity (Hobs), FIS, and FST were computed with Genetix (Belkhir et al. 1996). FIS and FST were tested with 1000 permutations. Tests of linkage disequilibrium were run with Genepop 4.0 (Raymond and Rousset 1995; Rousset 2008). We estimated inbreeding variance among individuals within samples using the parameter g2, which quantifies the covariances in homozygosity in pairs of loci that arise from inbreeding. Being based on covariances among loci, g2 is insensitive to sources of error that affect different loci independently, such as null alleles or amplification failure (David et al. 2007). As a measure of inbreeding it is therefore superior to FIS, which is very sensitive to such artifacts. Finally, we estimated the number of families in Ruisseau and Fesq by two methods: direct visual count of egg masses in the field, and indirect count based on microsatellites, using the algorithm “Modified Simpson” in KINGROUP version 2, which partitions each sample into a set of potential sibships (Queller and Goodnight 1989; Goodnight and Queller 1999; Konovalov et al. 2005).
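As an illustration of the single-locus quantities reported in Table 1, the following minimal Python sketch computes expected heterozygosity, observed heterozygosity, and a simple heterozygote-deficiency index for one locus. It is not the Genetix implementation: the function name `locus_stats`, the use of Nei's small-sample correction for He, and the estimator FIS = 1 − Hobs/He are assumptions made for this example.

```python
from collections import Counter

def locus_stats(genotypes):
    """Return (He, Hobs, FIS) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    He uses Nei's small-sample correction 2n/(2n - 1); FIS is the simple
    ratio estimator 1 - Hobs/He (an assumption of this sketch, not
    necessarily the estimator used by Genetix).
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    he = (2 * n / (2 * n - 1)) * (1.0 - sum_p2)
    hobs = sum(a != b for a, b in genotypes) / n
    fis = 1.0 - hobs / he if he > 0 else float("nan")
    return he, hobs, fis

# Hypothetical sample: 4 heterozygotes and 6 homozygotes at a two-allele locus.
sample = [(1, 2)] * 4 + [(1, 1)] * 3 + [(2, 2)] * 3
he, hobs, fis = locus_stats(sample)
```

A positive value indicates fewer heterozygotes than expected under random mating; as noted above, such deficits can also arise from null alleles, which is why g2 is preferred as a measure of inbreeding.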

We considered larval growth and larval survival as fitness traits. We know from field surveys that all tadpoles in a sample originated from the same spawning event (that may last a few days) and therefore had roughly the same age when measured. Size at sampling (snout-vent length) was therefore used to estimate growth. For Ruisseau and Fesq, we used only the size of the second temporal sample because the first sample showed very little size variance.

To investigate the occurrence of AFC on growth, we partitioned the second samples of Ruisseau and Fesq into two halves: large and small individuals. We then calculated the FST between the two groups within each site. We also tested whether mean size differed among the putative families reconstructed by KINGROUP (one-way ANOVA). We investigated AFC on survival by looking at changes in allelic frequencies between the beginning and the end of the larval development, using the FST between the first and the second samples in Ruisseau and Fesq.
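The size-class comparison described above can be sketched as follows. This is a simplified illustration rather than the procedure implemented in Genetix: it uses a basic (HT − HS)/HT differentiation index instead of the estimator computed by that software, and the function names (`split_by_size`, `fst_two_groups`, `permutation_p`) and data layout are hypothetical.

```python
import random
from collections import Counter

def expected_het(genotypes, locus):
    """1 - sum(p^2) at one locus; genotypes[i][locus] is an (a1, a2) tuple."""
    counts = Counter(a for g in genotypes for a in g[locus])
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def split_by_size(genotypes, sizes):
    """Partition individuals into the smaller and the larger half."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    half = len(order) // 2
    return ([genotypes[i] for i in order[:half]],
            [genotypes[i] for i in order[half:]])

def fst_two_groups(g1, g2):
    """Simple (HT - HS)/HT index summed over loci.

    HT is computed on the pooled sample, which matches the mean of the
    group allele frequencies only when the two groups have equal size.
    """
    hs = ht = 0.0
    for locus in range(len(g1[0])):
        hs += 0.5 * (expected_het(g1, locus) + expected_het(g2, locus))
        ht += expected_het(g1 + g2, locus)
    return (ht - hs) / ht if ht > 0 else 0.0

def permutation_p(g1, g2, n_perm=1000, seed=0):
    """One-sided permutation test: shuffle individuals between groups."""
    rng = random.Random(seed)
    observed = fst_two_groups(g1, g2)
    pooled = list(g1) + list(g2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst_two_groups(pooled[:len(g1)], pooled[len(g1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The same two functions apply unchanged to the survival comparison, with the first and second temporal samples taking the place of the two size classes.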

To assess HFC on growth, we regressed size on multilocus heterozygosity (MLH) as well as on observed heterozygosity at each locus (multivariate model). When significant, the two models (MLH and multivariate) were compared using an F-test following David (1997) and Szulkin et al. (2010) to test whether particular loci had significantly different effects on growth (test for “local effects”). Correlations between heterozygosity and survival were tested by comparing the mean heterozygosity between the first and the second samples in the Ruisseau and Fesq ponds. Because larvae are sampled before metamorphosis (i.e., before any juvenile escapes the ponds), any change in heterozygosity must reflect differential survival. To this end, we analyzed the contingency table (sampling date by MLH class) using a Poisson log-linear model and tested the interaction between sampling date and MLH (Manly 1985; David and Jarne 1997; Fahrmeir and Tutz 2001).
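The comparison between the MLH model and the multivariate model is a standard test of nested linear models, and can be sketched as below. This is an illustrative reconstruction using only the Python standard library, not the code used in the study; the function names and the data layout (one list of 0/1 heterozygosity indicators per individual) are assumptions. The sketch returns the F statistic and its degrees of freedom; obtaining a P-value would require an F distribution function from a statistics library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(design, y):
    """Residual sum of squares of a least-squares fit of y on the design columns."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def local_effects_f(het, size):
    """F statistic comparing size ~ MLH with size ~ heterozygosity at each locus.

    het: list (individuals) of lists (loci) of 0/1 heterozygosity indicators.
    Returns (F, df1, df2) with df1 = L - 1 and df2 = n - L - 1.
    """
    n, n_loci = len(het), len(het[0])
    rss_mlh = rss([[1.0, float(sum(h))] for h in het], size)
    rss_loci = rss([[1.0] + [float(v) for v in h] for h in het], size)
    df1, df2 = n_loci - 1, n - n_loci - 1
    f = ((rss_mlh - rss_loci) / df1) / (rss_loci / df2)
    return f, df1, df2
```

The MLH model is the special case of the multivariate model in which all loci share the same slope, so a large F indicates that allowing locus-specific slopes explains significantly more variance (local effects).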

We evaluated the potential of family structure to generate spurious HFC in our data by simulating datasets in which the sample was drawn from a given number of full-sibling families. Parental genotypes were simulated by randomly drawing alleles at the observed allelic frequencies, and offspring genotypes were drawn assuming random Mendelian segregation. Phenotypes were simulated as $\tfrac{1}{2}h^2 x + (1 - \tfrac{1}{2}h^2)\,y$, where $h^2$ is the heritability, $x$ is a standard normal deviate common to all members of a family, and $y$ is a standard normal deviate specific to each individual. This allowed us to generate 1000 simulated datasets for each parameter combination (numbers of families, sample sizes, $h^2$). Standard regressions of phenotype on heterozygosity were performed on each, and we recorded how often significant tests (P < 0.05, 0.01, or 0.001) were observed, and compared this to the nominal risk (0.05, 0.01, 0.001), that is, the proportion of significant tests expected by chance if all data points had been independent.
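The simulation procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for Table 2: the allele frequencies passed in the usage example are hypothetical placeholders rather than the Fesq frequencies, the phenotype weights follow the expression as printed above, and significance is assessed with a hard-coded two-sided 5% critical value of Student's t for 48 degrees of freedom (50 individuals).

```python
import math
import random

def draw_allele(freqs, rng):
    """Draw one allele index according to the given allele frequencies."""
    return rng.choices(range(len(freqs)), weights=freqs, k=1)[0]

def simulate_sample(n_families, n_per_family, h2, locus_freqs, rng):
    """Return (MLH, phenotype) lists for a sample made of full-sib families."""
    mlh, pheno = [], []
    for _ in range(n_families):
        # Two unrelated parents, each with two alleles drawn at every locus.
        mother = [(draw_allele(f, rng), draw_allele(f, rng)) for f in locus_freqs]
        father = [(draw_allele(f, rng), draw_allele(f, rng)) for f in locus_freqs]
        x = rng.gauss(0.0, 1.0)  # deviate shared by all members of the family
        for _ in range(n_per_family):
            # Mendelian segregation: one random allele from each parent per locus.
            het = sum(rng.choice(m) != rng.choice(p)
                      for m, p in zip(mother, father))
            y = rng.gauss(0.0, 1.0)  # deviate specific to the individual
            mlh.append(het)
            # Weights as printed in the text: (1/2) h2 x + (1 - (1/2) h2) y.
            pheno.append(0.5 * h2 * x + (1.0 - 0.5 * h2) * y)
    return mlh, pheno

def regression_t(xs, ys):
    """t statistic of the slope in a simple regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0:
        return 0.0
    slope = sxy / sxx
    resid = syy - slope * sxy
    if resid <= 0:
        return 0.0
    return slope / math.sqrt(resid / (n - 2) / sxx)

def false_positive_rate(n_families, n_per_family, h2, locus_freqs,
                        n_rep=400, t_crit=2.0106, seed=1):
    """Proportion of replicates with |t| above t_crit under the null hypothesis.

    t_crit = 2.0106 is the two-sided 5% threshold for 48 df and must be
    changed if the total sample size differs from 50.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        mlh, pheno = simulate_sample(n_families, n_per_family, h2,
                                     locus_freqs, rng)
        if abs(regression_t(mlh, pheno)) > t_crit:
            hits += 1
    return hits / n_rep

# Hypothetical allele frequencies: six loci with four alleles each.
freqs = [[0.4, 0.3, 0.2, 0.1]] * 6
rate_structured = false_positive_rate(5, 10, 1.0, freqs)
rate_unstructured = false_positive_rate(5, 10, 0.0, freqs)
```

Setting the heritability to zero removes the shared family component of the phenotype, so that case serves as a baseline whose false-positive rate should stay close to the nominal 5%.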

Results

Significant heterozygote deficiency occurred in several populations but at apparently random loci. Significant linkage disequilibrium was found at a large proportion of locus pairs in all sites except GMP (see Table S1). No sample showed significant evidence of inbreeding based on the g2 parameter, although six estimates out of eight were positive; the second sample of Fesq exhibited the highest g2 value (g2= 0.036 ± 0.039, P= 0.098, Table 1). On the other hand, both visual egg counts and estimates based on genetic data indicated that our samples were composed of a limited number of families, especially in Fesq (visual count: 9 and 43; modified Simpson method: 12 and 34 for Fesq and Ruisseau, respectively).

FST between temporal samples was significant in Ruisseau (FST= 0.005, P= 0.026) but not in Fesq (FST= 0.000, P= 0.5). Reconstructed families were not equally represented between the two samples in Fesq (χ211= 21.1895, P= 0.03146) although this test was marginally nonsignificant in Ruisseau (χ233= 44.7449, P= 0.08336).

We found a significant genetic differentiation between the two size classes in both sites (in Ruisseau, FST= 0.012 P= 0.011 and in Fesq, FST= 0.058 P= 0.007) but the effect of family on size (using reconstructed families) was significant only in Fesq (F9,21= 3.83, P= 0.005; Ruisseau: F30,59= 1.12, P= 0.35).

Mean heterozygosity did not change between the first and second sample in either the Ruisseau or the Fesq population (Table 1, F1,4= 0, P= 1 for Ruisseau and F1,4= 0.84, P= 0.4 for Fesq), showing no correlation between survival and MLH. We found no significant correlation between size and MLH except in one population: Fesq (Table S2). The correlation in Fesq was strongly negative and remained highly significant even after correction for multiple testing (R2= 0.54, P < 0.001, Fig. 1 and Table S2). In this case, both the univariate (MLH) and multivariate (locus-by-locus) models were significant; however, the latter did not explain significantly more variance than the former (F= 0.861, P= 0.521), indicating no significant differences in heterozygosity effects among loci (see Table S3 for locus details).

Figure 1. Size (snout-vent length) measured on individuals taken in the second sample in Ruisseau (A) and Fesq (B) ponds in relation to multilocus heterozygosity.

Our simulations showed that the existence of family structure indeed resulted in too liberal tests for HFC: the proportion of false positives (type I error) was higher than nominal P-values (Table 2). Logically, type I error increased as h2 became higher and/or as the number of families became lower (keeping total sample size constant). However, even using relatively pessimistic values (h2= 0.5 or 1; 5 or 10 families in the sample) the ratio between true and nominal P-values remains modest. In particular, P-values of less than 0.001 (such as that observed in the Fesq population) are very unlikely to emerge by chance under realistic levels of family structure.

Table 2.  Effect of family structure on regression tests. We simulated heterozygosity-phenotype regressions with specified family structures in the sample (50 individuals divided in five or 10 equally represented families) under the null hypothesis, and computed the proportion of significant tests (false positives) for various thresholds (P= 0.05–0.001). Results are given for two values of trait heritability. The simulations used observed allele frequencies from the Fesq population, and assume random mating with monogamy.
h2    N families   N indiv/family   Proportion of false positives
                                    P = 0.05   P = 0.01   P = 0.005   P = 0.001
0.5   10           5                0.065      0.0158     0.0078      0.0016
0.5   5            10               0.0744     0.0199     0.0107      0.0019
1     10           5                0.1183     0.0399     0.0254      0.0076
1     5            10               0.1773     0.0744     0.0516      0.0221

Discussion

Here we ask whether the genotypic structure of the population bears signatures of inbreeding (the classical source of HFC), family structure (a potential source of AFC and spurious variation in HFC) or both. Under inbreeding one expects heterozygote deficiencies (FIS) occurring consistently at all loci and correlations in heterozygosity among loci (g2). On the other hand, if family structure is present, one expects variable, randomly distributed heterozygote deficiencies, and linkage disequilibria (because of the Wahlund effect due to mixing groups with different allelic frequencies). Based on these predictions, parsley frog populations fulfill the expectations of family structure and show few traces of inbreeding. Indeed, FIS values were sometimes significant, but differed among loci and populations, and linkage disequilibria were consistently significant. The g2 estimates were nonsignificant although mostly positive. Given our low sample sizes, we cannot exclude that some of the populations had a low proportion of inbred individuals, but the dominant genetic signature is that of family structure. This is consistent with our field data indicating a limited number of breeding pairs in each site. Reasonably similar numbers of families were inferred from direct counts and genetic analysis (KINGROUP). It appears that in small fragmented populations of parsley frog, multilocus genetic structure is more affected by the limited number of pairs involved in an episode of reproduction than by consanguineous mating.


In both Ruisseau and Fesq, we observed significant AFC for growth, as attested by significantly different allele frequencies (FST) between the large and small individuals; we also observed significant differences in size among genetically reconstructed sibships in one of the two populations (Fesq); the nonsignificant effect in Ruisseau may be due to a lower statistical power (smaller families). AFCs are also observed for survival, as attested by significant change in allele frequencies between the two temporal samples, but the effect is weaker than for size, and only significant in the Ruisseau population. Strangely, differential survival of reconstructed sibships is significant in the Fesq population, and not in Ruisseau, but again, this probably reflects statistical power.

Although studies of associations between molecular genotypes and fitness often focus on heterozygosity only, differential survival of individuals with different electrophoretic genotypes has already been observed. For example, in Bufo boreas, Samollow (1980) observed differential mortality of tadpoles associated with electrophoretic variation (other than heterozygosity). Given the similarity in reproductive biology between these two amphibian species, it is possible that such correlations relied on differential survival of a limited number of families, as in the parsley frog.


Spurious correlations (positive or negative) emerge when three conditions are met: (1) there are few genotyped loci generating large contrasts in MLH among families by chance, even in the absence of inbreeding, (2) the sample is highly structured with very few families compared to the sample size, (3) heritability is high, resulting in highly differentiated phenotypic values among families. This possibility has never been considered before although we show that, in extreme cases, the number of false positives may be three times as high as that expected under normal conditions (independent individuals, see Table 2). We therefore suggest that future studies should check for family structure in their samples and, if needed, test their influence by simulations.


We did not expect strong HFC because of the apparent lack of inbreeding in our samples. Accordingly, we found no correlation between heterozygosity and survival (note, however, that survival here does not include early inbreeding effects during embryonic development). In addition, despite the larger dataset, no HFC on growth was found in five of the six ponds. Surprisingly, a significant negative HFC on growth was found in Fesq. There is no evidence that this effect is due to one locus in particular (local effects). Could this negative correlation be a spurious signal due to family structure, as shown previously? According to our simulations, even under the most pessimistic assumptions on growth heritability (h2= 0.5 or 1, whereas estimates of broad-sense heritability obtained in the laboratory are H2= 0.41 [Jourdan-Pineau, unpubl. data]) and number of families (five or 10), the observed P-value (2.5 × 10−6) is very unlikely to arise by chance. This remains true after allowing for an additional Bonferroni correction over the six populations studied. Therefore, the observed correlation has to be interpreted as truly significant, that is, more heterozygous individuals or families grew less in the Fesq tadpole cohort. This reverse HFC is best explained by outbreeding depression (Lynch and Walsh 1998).

Two mechanisms can contribute to outbreeding depression. The first is intrinsic incompatibilities: offspring of genetically distant mates may suffer from the breakup of the coadapted genome of the parents. The second is local adaptation: offspring of matings between immigrant and resident individuals may be less adapted to their environments than either parent is to its own environment (Waldman and McKinnon 1993). Both explanations could apply to our case. Previous analyses of the population structure of P. punctatus have shown local differentiation (FST= 0.1 on average, Jourdan-Pineau et al. 2012) among tadpole populations, indicating that all sites are relatively isolated from one another. In such contexts, outbreeding depression can occur even between geographically close populations (Edmands 1999; Dudash and Fenster 2000; Fenster and Galloway 2000). For example, in the same system of ponds as studied here, outbreeding depression has been observed in a snail species, Physa acuta (Escobar et al. 2008). Its occurrence is also plausible in amphibians. Unfortunately, although local adaptation (which can be one source of outbreeding depression) has been well described in amphibians (e.g., Berven 1982; Laugen et al. 2003; Rasanen et al. 2003), outbreeding depression itself has rarely been assessed except between two populations of common frog, Rana temporaria (Sagvik et al. 2005). Apart from amphibians, it is worth mentioning that negative HFC have been recently observed in several species (blue tits, Olano-Marin et al. 2011a; Borrell et al. 2004), and outbreeding depression has been mentioned as the most likely potential explanation (Szulkin and David 2011). Interestingly, outbreeding depression can be detected for some traits (negative HFC), even when other traits are affected by inbreeding depression (positive HFC) (Olano-Marin et al. 2011a,b; Szulkin and David 2011).


Inbreeding depression is often identified as the main genetic concern for small fragmented populations (e.g., Frankham 1998). Although populations of parsley frog live in small, often isolated, breeding ponds, inbreeding does not appear to be strong, but larval cohorts are founded by a few breeding pairs only, resulting in genetic heterogeneity within demes due to family structure. This creates associations between alleles and fitness (AFC), independent of heterozygosity per se. We also observed negative HFC, revealing a stronger impact of outbreeding depression than of inbreeding depression in these fragmented populations. AFC and negative HFC are probably more frequent than they are usually thought to be, although they are often neglected (or explained using ad hoc locus-specific effects: Hansson et al. 2004; Lieutenant-Gosselin and Bernatchez 2006; Charpentier et al. 2008). Furthermore, we demonstrated that family structure (probably a common feature in small populations) can bias HFC tests. We believe that family structure and outbreeding depression may be responsible for at least a fraction of the large variation and unpredictability of HFC typically observed in small or fragmented populations of many species (Chapman et al. 2009). We suggest that future HFC studies should routinely control for family structure in their samples and consider alternative types of genotype–phenotype associations (AFC and negative HFC).

Associate Editor: J. Kelly

Acknowledgments

This research was supported by a grant from Agence Nationale de la Recherche (SCOBIM JCJC0002), and a “Chercheurs d’Avenir” grant to PD from the Région Languedoc-Roussillon. Molecular data used in this work were (partly) produced through molecular genetic analysis technical facilities of the IFR119 « Montpellier Environnement Biodiversité». A. Nicot and V. Dupuy helped us in molecular work.