Using parentage analysis to examine gene flow and spatial genetic structure


Nolan C. Kane, Fax: 604-822-6089; E-mail:


Numerous approaches have been developed to examine recent and historical gene flow between populations, but few studies have used empirical data sets to compare different approaches. Some methods are expected to perform better under particular scenarios, such as high or low gene flow, but this, too, has rarely been tested. In this issue of Molecular Ecology, Saenz-Agudelo et al. (2009) apply assignment tests and parentage analysis to microsatellite data from five geographically proximal (2–6 km) and one much more distant (1500 km) panda clownfish populations, showing that parentage analysis performed better in situations of high gene flow, while their assignment tests did better with low gene flow. This unusually complete data set comprises multiple exhaustively sampled populations, including nearly all adults and large numbers of juveniles, enabling the authors to ask questions that in many systems would be impossible to answer. Their results emphasize the importance of selecting the right analysis, based on the underlying model and how well its assumptions are met by the populations to be analysed.

Understanding migration and gene flow between populations is one of the most important aspects of population genetics, and is vital when investigating conservation genetics, hybridization, and disease genetics. Estimates of divergence between populations (e.g. using FST or similar statistics, or coalescent methods) are useful for understanding historical gene flow and relationships between populations, but newer methods attempt to quantify the level of recent migration (Wilson & Rannala 2003) or identify the parents of particular individuals (Jones & Ardren 2003) using more sophisticated algorithms. One way to determine recent migration and gene flow history is to perform assignment tests.

There are several types of assignment methods available to researchers. Assignment tests based on likelihood algorithms are the most popular as they can estimate population structure without a priori knowledge of population limits (Pritchard et al. 2000; Corander et al. 2003; Falush et al. 2003; Manel et al. 2005). Bayesian assignment tests have received the greatest attention over the past few years, with clustering-based methods by far the most widely used. Programs like structure (Pritchard et al. 2000), baps (Corander et al. 2003), and structurama (Huelsenbeck & Andolfatto 2007) can all assign individuals to a cluster based on a probabilistic model, each with slightly different underlying assumptions and different methods of searching parameter space. The assignment method developed by Rannala & Mountain (1997), implemented in either GeneClass2 (Piry et al. 2004) or BayesAss+ (Wilson & Rannala 2003), can be used to estimate the frequency of migrants within known populations. A major advantage of this method over clustering assignment tests is that it provides a posterior probability of each individual's migrant ancestry. These various methods can complement each other, with clustering assignment tests used to determine the appropriate population substructuring, followed by assignment methods to estimate recent migration between those demes.

Parentage analyses can provide a finer scale assignment test than those that assign individuals to clusters based on genotypes. More information is needed, however, and the amount varies depending on the level of sampling possible and/or the a priori knowledge of relationships (Jones & Ardren 2003). Exclusion-based parentage analyses are the simplest form of parentage assignment, but require the most a priori information (Jones & Ardren 2003). Categorical allocation of parentage can be used when exclusion-based analyses fail to exclude all but one potential parent (Meagher & Thompson 1986). Fractional allocation is similar to categorical allocation, but it is allowed to assign only a fraction of the offspring to a potential parent (Devlin et al. 1988). Both the categorical and fractional methods use a likelihood-based approach in which a log of the odds (LOD) score is used to statistically assign parentage. Both methods perform best with exhaustive or near-exhaustive sampling of all individuals. However, not only is exhaustive sampling difficult in highly mobile species, but completing it for multiple populations is prohibitively expensive and time-consuming in most systems. Even in botanical studies, where the immobility of individuals makes complete sampling of reproducing individuals easier, there is often the issue of dormant individuals within the seed bank.
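The difference between categorical and fractional allocation can be made concrete with a minimal sketch. It assumes that a likelihood has already been computed for each non-excluded candidate parent; the function names and the example likelihoods are hypothetical and are not taken from any of the cited programs.

```python
def categorical_allocation(likelihoods):
    """Assign the whole offspring to the single most likely candidate parent."""
    return max(likelihoods, key=likelihoods.get)


def fractional_allocation(likelihoods):
    """Split the offspring among candidates in proportion to their likelihoods."""
    total = sum(likelihoods.values())
    if total == 0:
        return {parent: 0.0 for parent in likelihoods}
    return {parent: value / total for parent, value in likelihoods.items()}


# Hypothetical likelihoods for three candidate parents of one offspring.
likelihoods = {"parent_A": 0.6, "parent_B": 0.3, "parent_C": 0.1}

best_parent = categorical_allocation(likelihoods)  # one parent gets the offspring
fractions = fractional_allocation(likelihoods)     # each parent gets a share
```

Under the categorical rule the offspring counts entirely toward one candidate, whereas under the fractional rule every compatible candidate receives a share, which changes how reproductive success is tallied when several adults remain plausible.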

In this issue of Molecular Ecology, Saenz-Agudelo et al. (2009) present their work on six populations of the panda clownfish Amphiprion polymnus, a charismatic fish living in association with sea anemones in and around coral reefs in Southeast Asia (Fig. 1). They genotyped 281 adult and subadult clownfish and 171 juveniles comprising the vast majority (85–95%) of individuals found on an exhaustive search of suitable habitat in Bootless Bay, Papua New Guinea. Due to the patchiness of this coral reef environment, these individuals were divided into five subpopulations separated by 2–6 km of inhospitable territory. However, analysis of 11 microsatellite loci revealed little divergence between subpopulations, with FST between subpopulations ranging from 0 to 0.026, and only one of the five subpopulations showing significant divergence from the others.
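To give a sense of scale for FST values such as these, the sketch below computes a simple Wright-style FST, (HT - HS) / HT, for a single locus from subpopulation allele frequencies. This is an illustration only: it assumes equally sized subpopulations and known frequencies, it is not necessarily the estimator used by the authors, and the allele frequencies are invented.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def simple_fst(subpop_freqs):
    """Single-locus FST = (HT - HS) / HT, assuming equally sized subpopulations."""
    n = len(subpop_freqs)
    alleles = set().union(*[f.keys() for f in subpop_freqs])
    # HS: mean expected heterozygosity within subpopulations.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    pooled = {a: sum(f.get(a, 0.0) for f in subpop_freqs) / n for a in alleles}
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Invented frequencies for two subpopulations at one biallelic locus.
fst_value = simple_fst([{"A": 0.4, "B": 0.6}, {"A": 0.6, "B": 0.4}])
```

In this invented case a frequency difference of 0.2 between the two subpopulations gives an FST of 0.04, so values below about 0.03 correspond to very similar allele frequencies.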

Figure 1. Panda clownfish (Amphiprion polymnus) in their natural habitat in Papua New Guinea. Photo credit: Serge Planes.

A more distant comparison was made possible by the addition of an earlier data set of 85 adults and 73 juveniles collected at Schumann Island, 1500 km away (Jones et al. 2005). This population was substantially more diverged from the Bootless Bay subpopulations, with FST ranging from 0.092 to 0.111, depending on the subpopulation used. Using the genotypic data, the authors applied two additional methods to look at connectivity between populations.

Interestingly, the authors found results that implied very different levels of accuracy for the two methods they used to examine gene flow and migration. Their ‘parentage analysis’, using the software famoz (Gerber et al. 2003), was a maximum-likelihood approach that assigns parents to each juvenile if the LOD score for the hypothesis of that parent–juvenile pairing is highest among all possible pairings and also passes a ‘threshold decision value’ of significance. This approach appeared to perform well at high levels of gene flow, assigning individuals to the same parents in most cases regardless of whether the more diverged subpopulation within Bootless Bay was included in the analysis or not. However, when the Schumann Island population was included, the performance was apparently much weaker, as some individuals from Bootless Bay were assigned to parents from Schumann Island, and vice versa. While some of these individuals are probably long-distance migrants from populations outside of Bootless Bay, the specific assignments are almost certainly incorrect and are likely due to the violation of the model's assumption of panmixia.
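The decision rule described above can be sketched as follows. This is not the famoz implementation: it is a minimal single-parent LOD calculation for codominant loci that ignores genotyping error, uses natural logarithms as an assumption, and takes a hypothetical threshold (in practice such thresholds are usually derived from simulations). All genotypes, allele frequencies and names are invented.

```python
import math


def transmission_prob(offspring, parent, freqs):
    """P(offspring genotype | candidate parent), other allele drawn from the population."""
    a, b = offspring
    prob = 0.0
    for transmitted in parent:  # each parental allele is passed on with probability 1/2
        if a == b:
            if transmitted == a:
                prob += 0.5 * freqs[a]
        else:
            if transmitted == a:
                prob += 0.5 * freqs[b]
            if transmitted == b:
                prob += 0.5 * freqs[a]
    return prob


def hwe_prob(genotype, freqs):
    """P(genotype) for an unrelated individual under Hardy-Weinberg proportions."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]


def lod_score(offspring, parent, locus_freqs):
    """Sum over loci of log[P(offspring | parent) / P(offspring | unrelated)]."""
    lod = 0.0
    for o, p, f in zip(offspring, parent, locus_freqs):
        t = transmission_prob(o, p, f)
        if t == 0.0:
            return -math.inf  # incompatible at this locus: excluded
        lod += math.log(t / hwe_prob(o, f))
    return lod


def assign_parent(offspring, candidates, locus_freqs, threshold):
    """Return the highest-LOD candidate if it passes the threshold, else None."""
    scores = {name: lod_score(offspring, g, locus_freqs) for name, g in candidates.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


# Invented two-locus example.
locus_freqs = [{"A": 0.1, "B": 0.9}, {"C": 0.2, "D": 0.8}]
juvenile = [("A", "B"), ("C", "D")]
candidates = {
    "adult_1": [("A", "A"), ("C", "C")],
    "adult_2": [("B", "B"), ("D", "D")],
    "adult_3": [("B", "B"), ("C", "C")],
}
assigned, scores = assign_parent(juvenile, candidates, locus_freqs, threshold=1.0)
```

Because the unrelated-individual probability is computed from a single set of allele frequencies, the score implicitly treats all candidates as members of one panmictic population, which is the assumption that breaks down when strongly diverged populations are pooled.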

Their other approach used the software GeneClass2 (Piry et al. 2004) to perform the Bayesian assignment method of Rannala & Mountain (1997), assigning juveniles to each population or subpopulation using the adults as a reference. This analysis correctly showed that no juveniles from Bootless Bay were immigrants from Schumann Island, and only one Schumann Island juvenile was weakly assigned to Bootless Bay. However, finer-scale assignments within Bootless Bay were not possible with this approach, as 82% of Bootless Bay individuals were assigned to more than one subpopulation.
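A rough sketch of this kind of assignment is given below. It is not the GeneClass2 code: it assumes the criterion as it is usually described, with a Dirichlet prior that gives each of the K alleles at a locus a weight of 1/K, and it normalizes the genotype probabilities across reference populations to obtain a relative score. Function names and allele counts are hypothetical.

```python
def genotype_prob(genotype, allele_counts, n_alleles):
    """Posterior predictive probability of a diploid genotype in one reference population."""
    a, b = genotype
    prior = 1.0 / n_alleles             # assumed Dirichlet weight per allele
    n = sum(allele_counts.values())     # number of gene copies sampled in the reference
    denom = (n + 1.0) * (n + 2.0)
    wa = allele_counts.get(a, 0) + prior
    if a == b:
        return wa * (wa + 1.0) / denom
    wb = allele_counts.get(b, 0) + prior
    return 2.0 * wa * wb / denom


def assignment_scores(individual, reference):
    """Relative score of each reference population for a multilocus genotype."""
    raw = {}
    for pop, loci in reference.items():
        prob = 1.0
        for i, genotype in enumerate(individual):
            # K = number of distinct alleles seen at this locus in any population.
            k = len(set().union(*[ref[i].keys() for ref in reference.values()]))
            prob *= genotype_prob(genotype, loci[i], k)
        raw[pop] = prob
    total = sum(raw.values())
    return {pop: value / total for pop, value in raw.items()}


# Invented allele counts from adults in two reference populations (one locus).
reference = {
    "pop_1": [{"A": 18, "B": 2}],
    "pop_2": [{"A": 2, "B": 18}],
}
scores = assignment_scores([("A", "A")], reference)
best = max(scores, key=scores.get)
```

When reference populations have nearly identical allele frequencies, as with FST close to zero, the scores for the different populations become almost equal, which is consistent with most Bootless Bay juveniles being assigned to more than one subpopulation.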

Saenz-Agudelo et al. (2009) used two highly complementary analytical methods to test for spatial structuring in the clownfish populations. The parentage analysis was appropriate for the fine-scale, high gene flow analysis, and the assignment test performed well in the broad-scale, low gene flow analyses. The authors point out that it is difficult to know a priori which methodology will be the most appropriate. Given the rigorous levels of sampling needed for parentage analysis, we can understand why many researchers are hesitant to use parentage as a method of determining gene flow, but Saenz-Agudelo et al. (2009) demonstrate the value of this type of data set. With so many different new methods available to estimate gene flow, more of these empirical comparisons are necessary to evaluate the accuracy and appropriateness of the methods for particular situations.

Nolan Kane uses population genetic, comparative genomic and bioinformatic approaches to study the genetic basis of adaptation and speciation. Matthew King uses population genetics and bioinformatics to understand molecular evolution and diversification of sunflowers and sedges.

doi: 10.1111/j.1365-294X.2009.04110.x