Genetic data in population viability analysis: case studies with ambystomatid salamanders


Katherine R. Greenwald, Department of Evolution, Ecology, and Organismal Biology, Ohio State University, 300 Aronoff Laboratory, 318 W 12th Ave., Columbus, OH 43210, USA. Tel: +1 614 292 2891; Fax: +1 614 292 2030


Abstract

Parameterization of population viability models is a complicated task for most types of animals, as knowledge of population demography, abundance and connectivity can be incomplete or unattainable. Here I illustrate several ways in which genetic data can be used to inform population viability analysis, via the parameterization of both initial abundances and dispersal matrices. As case studies, I use three ambystomatid salamander datasets to address the following question: how do population viability predictions change when dispersal estimates are based on genetic assignment test data versus a general dispersal–distance function? Model results showed that no local population was large enough to ensure long-term persistence in the absence of immigration, suggesting a metapopulation structure. Models parameterized with a dispersal–distance function produced much more optimistic predictions than those incorporating genetic data in the dispersal estimates. Under the dispersal–distance function scenario all local populations persisted; however, using genetic assignments to infer dispersal revealed local populations at risk of extinction. Viability estimates based on dispersal–distance functions should be interpreted with caution, especially in heterogeneous landscapes. In such situations, I advocate parameterizing models with genetic assignment test data to portray real-world dispersal patterns more accurately.


Introduction

Whether isolated populations of organisms are able to persist in fragmented habitats is a question of central importance in conservation biology. Site occupancy and population persistence are affected by both demography (e.g. reproductive rate, immigration/emigration rate) and landscape (e.g. habitat patch size and isolation, and the quality of non-habitat between patches; Fahrig & Merriam, 1994; Fahrig, 2001; Ricketts, 2001; Prugh et al., 2008). In many cases, local persistence requires regional connectivity, and single fragments (patches) may not be sufficient to ensure long-term viability (Marsh, 2008). An improved understanding of the taxon-specific nature of these relationships is critical for the conservation of target organisms. Population viability analysis is an important predictive tool for assessing persistence probability and informing conservation decisions. Perhaps the most common approach to examining population connectivity and viability on a landscape scale is metapopulation modeling (Hanski & Gilpin, 1997; Hanski, 1999), although various other methods have been developed (e.g. Ray, Lehmann & Joly, 2002; Rustigian, Santelmann & Schumaker, 2003; Compton et al., 2007). Metapopulation models use demographic and dispersal parameters to predict metapopulation persistence and size, average local population size and latency to recolonization following local extirpation.

For population viability predictions to be meaningful, they must be based on accurately parameterized models. In many cases, models require input parameters such as initial abundance, dispersal rates and demographic measures (e.g. survival and fecundity), values which may be very difficult to obtain (Halley et al., 1996). Applicability of the metapopulation framework is particularly dependent on the hypothesis of limited dispersal among local populations (Smith & Green, 2005); unfortunately, dispersal is often an especially difficult parameter to estimate (Marsh, 2008). Previous work has taken numerous approaches to this problem, including basing dispersal estimates on mark–recapture data (Schtickzelle & Baguette, 2004) or expert opinion (Gilioli et al., 2008) or modeling dispersal as a constant proportion of individuals per time step (Hels & Nachman, 2002) or a distance function (e.g. Akçakaya & Atwood, 1997; Akçakaya et al., 2004). Use of a dispersal–distance function assumes that animals are able to disperse equally in any direction; however, real-world landscapes comprise a mosaic of land-cover types that may vary in permeability. Incorporating known movement patterns may lead to vastly altered predictions, a possibility that I explore here.

Genetic assignments of individuals to local populations of origin provide an ideal method for parameterizing metapopulation models, as these data can reveal current dispersal rates when immigrants or their offspring are sampled (Berry, Tocher & Sarre, 2004; Paetkau et al., 2004). This method assigns each individual to its most probable population of origin based on background allele frequencies at all sampled sites. Individuals genetically assigned to a site other than their capture location (i.e. ‘misassignments’) may be immigrants. Although assignment tests are widely applied in conservation research, I am not aware of previous studies using them to parameterize metapopulation models. Here I use three ambystomatid salamander datasets as case studies to address the question: how do viability predictions change when dispersal is parameterized using empirical genetic estimates as opposed to a general dispersal–distance function?

The systems included here are ideal case studies for two important reasons. First, research on population viability is especially important for amphibians, which have undergone global population declines due in large part to habitat loss and degradation (Wake, 1991; Alford & Richards, 1999; Houlahan et al., 2000; Stuart et al., 2004). Metapopulation models have been used frequently in amphibian studies (e.g. Gill, 1978; Sjögren-Gulve, 1994; Hecnar & M'Closkey, 1996; Driscoll, 1997; Hels & Nachman, 2002) and have been recommended as an important management tool (Semlitsch, 2000; Marsh, 2008). Management actions for amphibians (e.g. the creation of new breeding ponds) are often based on the assumption of a metapopulation dynamic (Marsh, 2008). The popularity of this framework in amphibian research is due to the ongoing creation of discrete patches by habitat fragmentation, as well as the methodological convenience of identifying breeding ponds as habitat patches (Marsh & Trenham, 2001).

Second, several of the conditions for metapopulation structure are already known to be fulfilled in these study systems. These conditions are: (1) habitat patches support local breeding populations; (2) no single population can ensure long-term survival; (3) patches are not so isolated as to prevent recolonization; (4) synchronous extinction of all sites is unlikely (Hanski, 1999; Hanski & Gaggiotti, 2004; Smith & Green, 2005). Samples included here were collected from breeding individuals and/or offspring, so it is clear that habitat patches support local breeding populations [condition (1)]. Previous research has demonstrated gene flow among patches for all three systems, so patches are sufficiently connected to allow recolonization following local extinction [condition (3); Zamudio & Wieczorek, 2007; Greenwald, Gibbs & Waite, 2009a; Greenwald, Purrenhage & Savage, 2009b; Purrenhage, Niewiarowski & Moore, 2009]. Synchronous extinction of all local populations should be unlikely, as the spatial scale of the sampling area is larger than the scale on which we would expect autocorrelation of stochastic extinction events [condition (4)]. Genetic spatial autocorrelation occurs at scales <4.8 km in one of these systems (Zamudio & Wieczorek, 2007), a scale much smaller than the sampling extent (Fig. 1). In this study, I address whether any individual population can ensure long-term persistence [condition (2)]. I also focus on the condition of limited dispersal and ask how viability predictions change when models are parameterized with distance functions versus genetic assignment test data. Persistence estimates based on empirical dispersal data should allow for more informed management based on an accurate representation of movement in fragmented habitats.

Figure 1.

Map showing locations of the ambystomatid salamander datasets Aop-OH [Ambystoma opacum, south-east Ohio; inset (a)], Amac-OH [Ambystoma maculatum, north-east Ohio; inset (b)] and Amac-NY [A. maculatum, upstate New York; inset (c)]. Insets show sampling locations (circles) along with forest (light gray) and non-forest (dark gray) land cover. Inset scale bars each represent 5 km.

Materials and methods

I analyzed landscapes and microsatellite genetic data for two ambystomatid species in three regions (Table 1, Fig. 1; dataset names reflect species and locations). Datasets consisted of 367–665 individuals from 11 to 29 sites, averaging 24 individuals per site (Table 1). Dataset Aop-OH was composed of marbled salamander Ambystoma opacum samples collected from a cluster of 21 ponds in south-eastern Ohio in spring 2005 (Greenwald et al., 2009a). The other two datasets consisted of spotted salamander Ambystoma maculatum samples from north-east Ohio (Amac-OH; Purrenhage et al., 2009) and upstate New York (Amac-NY; Zamudio & Wieczorek, 2007). For dataset Amac-OH, I used only a central subset of 11 sites sampled in 2003, as other sites were well beyond the dispersal capabilities observed for these and more vagile ambystomatid salamanders (Trenham et al., 2000; Semlitsch & Bodie, 2003; excluded sites were 15–50 km from the central cluster). Details on sampling methodology and genetic analysis can be found in the associated publications (Table 1; Greenwald et al., 2009b).

Table 1.   Three datasets included in this study of the effect of dispersal on population persistence in Ambystoma salamanders
  1. The number of individuals sampled is shown for both larvae (L) and adults (A). Spatial extent indicates the minimum and maximum pairwise distance between sampled sites for each dataset. Population size indicates the means and ranges (in parentheses) estimated by MSVAR. These values are likely overestimated due to gene flow among sites.

Dataset               Aop-OH                     Amac-NY                       Amac-OH
Species               Ambystoma opacum           Ambystoma maculatum           A. maculatum
Location              Ohio                       New York                      Ohio
Samples               478 L                      404 L/261 A                   367 A
Spatial extent (km)   1.0–68.2                   1.6–46.9                      0.1–14.1
Population size       4251 (246–18 493)          704 (150–1355)                20 165 (1698–61 094)
Reference             Greenwald et al. (2009a)   Zamudio & Wieczorek (2007)    Purrenhage et al. (2009)

Genetic analysis

Previous publications report the basic population genetic structure for these datasets (Zamudio & Wieczorek, 2007; Greenwald et al., 2009a,b; Purrenhage et al., 2009). I conducted genetic assignment tests with GeneClass2 (Piry et al., 2004), using standard assignment of individuals for datasets consisting of adults (Amac-OH; 12 sites from Amac-NY) and detection of first-generation migrants for datasets composed of larval samples (Aop-OH; 19 sites from Amac-NY; Greenwald et al., 2009a,b). I used a Bayesian computation method (Rannala & Mountain, 1997) and Monte Carlo resampling (Paetkau et al., 2004) with 1000 replicates and an α value of 0.05. The resampling procedure helps to avoid inflated Type I error (falsely identifying residents as immigrants; Paetkau et al., 2004), a serious problem when downstream population viability analysis relies on the identification of immigrants. For each dataset, I generated a pairwise matrix of all individuals assigned as migrants, which I used to parameterize dispersal matrices in metapopulation modeling. I used two approaches for migrant assignment: one stringent (migrants were counted only if the probability of assignment to a source population was ≥80% and P<0.05 from the simulation) and one lenient (migrants were assigned to the source population with maximum probability of assignment regardless of P-value).
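The two migrant-counting rules described above can be sketched in Python. This is a minimal illustration rather than a parser for GeneClass2 output. It assumes that each individual's assignment probabilities to every candidate site, together with its resampling P-value, have already been exported into a plain dictionary. The field names (`site`, `assign_prob`, `p_value`) and the function name `migrant_counts` are hypothetical.

```python
from collections import defaultdict


def migrant_counts(records, stringent=True, min_prob=0.80, alpha=0.05):
    """Count putative migrants per (source, recipient) site pair.

    Hypothetical record format:
      {"site": sampling site,
       "assign_prob": {candidate site: assignment probability},
       "p_value": resampling P-value for belonging to the sampling site}
    """
    counts = defaultdict(int)
    for rec in records:
        home = rec["site"]
        probs = rec["assign_prob"]
        # Most probable source population for this individual.
        source = max(probs, key=probs.get)
        if source == home:
            continue  # assigned to its own site: treated as a resident
        # Lenient rule: any individual whose most probable source differs from
        # its sampling site counts as a migrant. The stringent rule also
        # requires >=80% assignment probability and P < 0.05.
        if stringent and not (probs[source] >= min_prob and rec["p_value"] < alpha):
            continue
        counts[(source, home)] += 1
    return dict(counts)


records = [
    {"site": "A", "assign_prob": {"A": 0.10, "B": 0.90}, "p_value": 0.01},
    {"site": "A", "assign_prob": {"A": 0.40, "B": 0.60}, "p_value": 0.30},
    {"site": "B", "assign_prob": {"A": 0.20, "B": 0.80}, "p_value": 0.50},
]
print(migrant_counts(records, stringent=False))  # {('B', 'A'): 2}
print(migrant_counts(records, stringent=True))   # {('B', 'A'): 1}
```

Varying `min_prob` and `alpha` between these two settings would give intermediate stringencies.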

I estimated current effective population sizes using MSVAR (Storz & Beaumont, 2002), a program that uses Markov chain Monte Carlo simulation of coalescence to estimate demographic parameters (e.g. Aspi et al., 2006; Goossens et al., 2006). I assumed a generation time of 3 years, which is typical for these species (Scott, 1994; Pechmann, 1995; Petranka, 1998). I assumed an initial mutation rate for all loci of 1 × 10⁻⁴ mutations per generation and an exponential model of population decline, as it is more accurate for recent, sharp declines, which might be expected in populations recently affected by anthropogenic habitat change (Beaumont, 1999). When more than one run was necessary to achieve convergence, each subsequent run was reseeded with a large (four-digit) random number to ensure independence between runs. I used a thinning interval of 10 000 steps, resulting in a total of at least 100 000 states (sampled steps in the run) for each population. I used Tracer (Rambaut & Drummond, 2007) to examine output trace files for convergence and to calculate the effective sample size (ESS) for each variable. I ran the program until every estimated parameter converged (ESS>200). For dataset Amac-NY I used a random subset of the data (eight loci), as parameter estimates failed to converge with all 11 loci.
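The ESS criterion can be illustrated with a simple autocorrelation-based estimator, ESS = N / (1 + 2 Σ ρ_k), where ρ_k is the lag-k autocorrelation of a sampled trace. This is a minimal sketch. The sum is truncated at the first non-positive autocorrelation, which is one common rule but not necessarily the estimator Tracer implements. The function name is hypothetical.

```python
import numpy as np


def effective_sample_size(trace):
    """Approximate ESS of an MCMC trace: N / (1 + 2 * sum of autocorrelations).

    The autocorrelation sum is truncated at the first lag whose estimated
    autocorrelation is <= 0 (a simple truncation rule).
    """
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = np.random.default_rng(1)
independent = rng.normal(size=5000)  # nearly uncorrelated draws
autocorrelated = np.empty(5000)      # AR(1) chain with coefficient 0.9
autocorrelated[0] = 0.0
for t in range(1, 5000):
    autocorrelated[t] = 0.9 * autocorrelated[t - 1] + rng.normal()

print(round(effective_sample_size(independent)))     # close to 5000
print(round(effective_sample_size(autocorrelated)))  # theory: 5000 * 0.1 / 1.9, about 263
```

A strongly autocorrelated trace therefore needs many more sampled states than an independent one to pass a threshold such as ESS>200. This is also why thinning and long runs were used.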

Metapopulation parameterization and analysis

I used both geographic configuration and population dynamics in RAMAS Metapop to address whether the groups of local populations constitute a metapopulation (Akçakaya, 1998; Akçakaya et al., 2004). Specifically, I used the program to address Hanski's second condition, that no single population is large enough to ensure long-term survival. I report the metapopulation occupancy (number of local populations occupied over time) as well as the terminal extinction risk (or quasi-extinction probability), the probability that the metapopulation will fall below a given size by the end of the simulation (Stevens & Baguette, 2008). For each dataset I ran 10 000 replicate simulations over a 100-year period.
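Terminal extinction risk can be computed directly from the final abundances of the replicate simulations. For each threshold size, it is the fraction of replicates ending below that size. The sketch below is a hypothetical post-processing step, not RAMAS Metapop's own routine. It also computes the probability of decline used in Table 2 (final size below initial size). The toy abundances stand in for the 10 000 replicate end points.

```python
import numpy as np


def terminal_extinction_risk(final_sizes, thresholds):
    """P(final metapopulation size < threshold) across replicates, per threshold."""
    final = np.asarray(final_sizes, dtype=float)
    return np.array([np.mean(final < t) for t in thresholds])


def probability_of_decline(final_sizes, initial_size):
    """Fraction of replicates ending below the starting metapopulation size."""
    return float(np.mean(np.asarray(final_sizes, dtype=float) < initial_size))


# Toy final abundances from four hypothetical replicates.
final = [0, 10, 50, 100]
print(terminal_extinction_risk(final, [1, 20, 101]).tolist())  # [0.25, 0.5, 1.0]
print(probability_of_decline(final, 60))                       # 0.75
```

Plotting the risk against a fine grid of thresholds gives a terminal extinction risk curve of the kind shown in Fig. 3.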

I used a discrete, age-structured model (Leslie matrix) for demographic parameters. The four ages were larvae (year 0), 1-year-old juveniles, 2-year-old juveniles and a composite age class for adults (years 3+). The same input matrix was used for both marbled and spotted salamanders, as demographic parameters from the literature were very similar. Adults were the only reproductive stage, with an average fecundity of 4.0 hatched eggs/female. This number was calculated to incorporate both average clutch size (∼100; Mohr, 1930; Noble & Brady, 1933; King, 1935; Savage & Zamudio, 2005) and egg survival (∼0.04; Petranka, 1998; Gibbs & Shriver, 2005). I used a wide standard deviation (±3) to account for potentially high variability in reproductive success. Larval survival was density dependent, and set to 0.14±0.1 (Taylor, Scott & Gibbons, 2006). Survival for both juvenile stages was set to 0.6±0.2, while adult survival was 0.8±0.2 (Taylor & Scott, 1997; Petranka, 1998; Gibbs & Shriver, 2005; Taylor et al., 2006). Both juveniles and adults were allowed to disperse (Savage & Zamudio, 2005; Gamble, McGarigal & Compton, 2007). Stochasticity was incorporated into the dispersal estimate by allowing it to vary by a common coefficient of variation (CV=0.001). These conditions produced a relatively stable finite rate of increase (λ=1.001), allowing me to evaluate the outcome of the various dispersal scenarios described below.
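As a check on the demographic set-up, the mean vital rates above can be arranged into a 4 × 4 Leslie matrix and its dominant eigenvalue computed. The sketch relies on a structure that the text does not spell out:

- fecundity (4.0) sits on the adult class only and is applied per adult;
- larval survival (0.14) moves larvae into the first juvenile class;
- both juvenile transitions use 0.6;
- adults remain in the 3+ class with survival 0.8.

Density dependence and all stochasticity are ignored, and RAMAS Metapop's internal formulation may differ. Under these assumptions the dominant eigenvalue is approximately 1.001, in line with the reported λ. The corresponding right eigenvector gives the stable stage distribution, from which non-adult abundances can be filled in from an adult count. The function names are hypothetical.

```python
import numpy as np

# Mean vital rates from the text; stage order: larvae, juvenile 1, juvenile 2, adults (3+).
FECUNDITY = 4.0   # hatched eggs per adult (~100 eggs x ~0.04 egg survival)
S_LARVA = 0.14    # larva -> 1-year-old juvenile
S_JUV = 0.6       # juvenile survival (each juvenile year)
S_ADULT = 0.8     # annual adult survival


def leslie_matrix(f=FECUNDITY, s0=S_LARVA, sj=S_JUV, sa=S_ADULT):
    """Assumed stage structure: only adults reproduce; adults stay adults."""
    return np.array([
        [0.0, 0.0, 0.0, f],
        [s0, 0.0, 0.0, 0.0],
        [0.0, sj, 0.0, 0.0],
        [0.0, 0.0, sj, sa],
    ])


def dominant_eigen(A):
    """Return (finite rate of increase, stable stage distribution summing to 1)."""
    vals, vecs = np.linalg.eig(A)
    i = int(np.argmax(vals.real))  # Perron root of a primitive non-negative matrix
    w = np.abs(vecs[:, i].real)
    return float(vals[i].real), w / w.sum()


def initial_abundances(n_adults, A):
    """Fill non-adult stages from the stable distribution, given an adult count."""
    _, w = dominant_eigen(A)
    return n_adults * w / w[-1]


A = leslie_matrix()
lam, stable = dominant_eigen(A)
print(round(lam, 3))  # 1.001
print(np.round(initial_abundances(1000, A)).tolist())
```

Analytically, this structure gives λ³(λ − 0.8) = 4.0 × 0.14 × 0.6², whose positive root is about 1.001.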

Initial abundances and dispersal matrices were parameterized using genetic data (Fig. 2). For the former, I used effective population sizes as estimated by MSVAR as the initial abundance of adults for each local population; initial abundances for the other stages were estimated by RAMAS Metapop from the stable age distribution. A genetic estimate of population size was used because ecological data were lacking for these populations; additionally, initial population size has been shown to have no effect on the trajectory and quasi-extinction probabilities of simulated populations (Schtickzelle & Baguette, 2004). To determine the sensitivity of the model to variation in dispersal, I compared results from four model parameterizations: no dispersal, a dispersal–distance function (‘distance function’ model), and genetic assignment tests of both low stringency, resulting in many migrants (‘high dispersal’ model), and high stringency, resulting in few migrants (‘low dispersal’ model). Under the ‘no dispersal’ model, all pairwise dispersal rates and their stochasticity were set to zero. For the distance function model, the dispersal matrix was calculated from a dispersal-by-distance relationship using pairwise distances between sites. The function was a normal curve with mean 0 and standard deviation 440.1 m, as described for dispersing juvenile A. opacum by Gamble et al. (2007).
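A distance-based dispersal matrix of this kind can be sketched as follows. The text specifies only the shape of the curve (normal, mean 0, SD 440.1 m), not how it was scaled into per-step dispersal proportions. Several choices below are therefore assumptions:

- the maximum rate `a` is a hypothetical placeholder;
- the Gaussian kernel a·exp(−d²/(2σ²)) is an assumed conversion of the normal curve;
- site coordinates are assumed to be in metres.

```python
import numpy as np

SIGMA_M = 440.1  # SD (m) of the normal dispersal curve for juvenile A. opacum


def distance_dispersal_matrix(coords_m, a=0.1, sigma=SIGMA_M):
    """m[i, j] = proportion of site i dispersing to site j per time step.

    a is a hypothetical maximum rate; the kernel shape is Gaussian in distance.
    """
    xy = np.asarray(coords_m, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))           # pairwise distances (m)
    m = a * np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(m, 0.0)                        # no dispersal from a site to itself
    # Guard: total emigration from any site cannot exceed 1.
    total = m.sum(axis=1, keepdims=True)
    return m / np.maximum(total, 1.0)


sites = [(0.0, 0.0), (440.1, 0.0), (5000.0, 0.0)]   # hypothetical coordinates (m)
M = distance_dispersal_matrix(sites)
print(np.round(M, 4))
```

Because the kernel depends only on straight-line distance, the resulting matrix is symmetric and ignores land cover between sites. This is the limitation that the genetic parameterization is meant to address.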

Figure 2.

Schematic representation of the methodology used in this study. White arrows represent indirect estimation of population parameters; gray and black arrows represent metapopulation model input and output, respectively. Empirical data (top) can be used either directly (demographic data) or indirectly (molecular data) to parameterize metapopulation models. See the ‘Materials and methods’ for a detailed explanation of how empirical data were converted to model parameters.

The ‘high dispersal’ model assumed symmetrical dispersal and used low-stringency genetic assignment test results; that is, immigrants were assigned to the source population of maximum likelihood regardless of probability level. The ‘low dispersal’ model assumed asymmetrical dispersal and used high-stringency assignment test results; individuals were included only if they were assigned as immigrants with P<0.05 in the GeneClass2 simulation and with ≥80% probability of assignment to a given source population. To illustrate the difference between symmetrical and asymmetrical dispersal values, suppose sites A and B each had 25 sampled individuals. Two individuals were assigned as moving from A to B, while one individual moved from B to A. Under symmetrical dispersal (for the ‘high dispersal’ model), I summed all migrants (2+1=3) and divided by the total number of individuals sampled at both sites (25+25=50) for a dispersal rate of 0.06. This value was then used as the dispersal rate in both directions (A to B and B to A). However, under asymmetrical dispersal (for the ‘low dispersal’ model), the rate from A to B would be 2/25=0.08, while dispersal from B to A would be 1/25=0.04.
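The two rate calculations in the worked example can be written as a small function. This sketch rests on one stated assumption. In the asymmetrical case the example divides by 25, which is the sample size of both sites, so it does not reveal whether the denominator is the source or the recipient sample. The code divides by the sample size of the recipient site, where the immigrants were sampled. The function name and the dictionary layout (migrant counts keyed by (source, recipient)) are hypothetical.

```python
def dispersal_rates(migrants, n_sampled, symmetric):
    """Pairwise dispersal rates from migrant counts.

    migrants:  {(source, recipient): number of individuals assigned as migrants}
    n_sampled: {site: number of individuals sampled at that site}
    symmetric: pool both directions over both samples ('high dispersal' rule);
               otherwise divide each direction by the recipient's sample size
               (assumed 'low dispersal' rule).
    """
    rates = {}
    for i in n_sampled:
        for j in n_sampled:
            if i == j:
                continue
            if symmetric:
                pooled = migrants.get((i, j), 0) + migrants.get((j, i), 0)
                rates[(i, j)] = pooled / (n_sampled[i] + n_sampled[j])
            else:
                rates[(i, j)] = migrants.get((i, j), 0) / n_sampled[j]
    return rates


migrants = {("A", "B"): 2, ("B", "A"): 1}
n_sampled = {"A": 25, "B": 25}
print(dispersal_rates(migrants, n_sampled, symmetric=True))   # {('A', 'B'): 0.06, ('B', 'A'): 0.06}
print(dispersal_rates(migrants, n_sampled, symmetric=False))  # {('A', 'B'): 0.08, ('B', 'A'): 0.04}
```

Under this assumption and with equal sample sizes, the pooled symmetric rate equals the mean of the two directional rates.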


Results

Dispersal matrices based on low-stringency versus high-stringency assignment test results varied considerably within each dataset. With low-stringency assignment tests, the number of individuals assigned as immigrants (and included in dispersal matrices) was 198, 117 and 377 for datasets Aop-OH, Amac-OH and Amac-NY, respectively. With high-stringency assignment tests, these numbers were vastly reduced, to 32, 9 and 58, respectively. This variation considerably affected predictions for population persistence and terminal extinction risk. Current effective population sizes as estimated by MSVAR were also variable, ranging from 246 to 18 493 (mean=4251) for dataset Aop-OH, 1698 to 61 094 (mean=20 165) for dataset Amac-OH and 150 to 1355 (mean=704) for dataset Amac-NY (Table 1); however, this variation affected predictions much less than variation in dispersal regimes.

The three datasets had similar population trajectories under the extreme dispersal scenarios (no dispersal and the dispersal–distance function based on purely demographic inferences of dispersal; Fig. 3). Under no dispersal, all local populations of all three datasets eventually went extinct. I conducted longer (300-year) a posteriori simulations to estimate time to metapopulation extinction for each dataset under this scenario. With no dispersal, Aop-OH fell below one remaining population after 234 years, Amac-OH after 218 years and Amac-NY after 226 years. Within the 100-year simulation, Aop-OH and Amac-OH had at least a 90% probability of decline, while Amac-NY had a 78% probability of decline (Table 2). In contrast, the dispersal–distance function scenario resulted in persistence of all local populations and the lowest probabilities of decline of the four scenarios (Fig. 3; Table 2). The local populations do not constitute a metapopulation under these scenarios, as the probability of recolonization of local populations is either 0% (no dispersal) or 100% (dispersal–distance function).

Figure 3.

Local population occupancy (left) and terminal extinction risk (right) for datasets Aop-OH (top), Amac-OH (middle) and Amac-NY (bottom). Local population occupancy shows the number of local populations occupied across time over the 100-year simulation. Terminal extinction risk shows the probability that the metapopulation will be below a given population size at the end of the simulation. The vertical dashed lines indicate the approximate starting population sizes for each metapopulation.

Table 2.   Probability of decline (population size at end of simulation < population size at beginning of simulation) for each of three datasets under four dispersal scenarios
  1. Under ‘no dispersal,’ all pairwise dispersal values were set to 0. Low and high dispersal values were based on genetic assignment tests of different stringencies (see ‘Materials and methods’). Under the distance function, dispersal was a function of distance among sites based on empirical data (Gamble et al., 2007).

Dispersal scenario    Aop-OH         Amac-OH        Amac-NY
No dispersal          0.90 ± 0.01    0.91 ± 0.01    0.78 ± 0.01
Low dispersal         0.83 ± 0.01    0.90 ± 0.01    0.60 ± 0.01
High dispersal        0.47 ± 0.01    0.57 ± 0.01    0.13 ± 0.01
Distance function     0.43 ± 0.01    0.55 ± 0.01    0.11 ± 0.01

Patterns of population persistence were more variable when genetic data were used to parameterize dispersal matrices (Fig. 3; Table 2). Under high dispersal (using assignments with no probability cutoff), Amac-OH and Amac-NY results were nearly identical to those under the dispersal–distance function: no local populations went extinct, and the probability of decline was within 0.02 of the distance function value. However, Aop-OH had one local population that received no immigrants even under the high dispersal scenario. This population was lost and accounts for the difference between the distance function and high dispersal scenarios. The probability of decline under high dispersal ranged from 13% (Amac-NY) to 57% (Amac-OH; Table 2). Under low dispersal (using assignments with P<0.05 and ≥80% assignment probability), Amac-OH followed nearly the same extinction trajectory as under no dispersal, while Aop-OH and Amac-NY lost some local populations (four and six, respectively) but then stabilized (Fig. 3). The probability of decline under low dispersal ranged from 60% (Amac-NY) to 90% (Amac-OH; Table 2). Longer simulations under low dispersal showed that Amac-OH fell below one remaining population after 224 years, while Aop-OH and Amac-NY never fell below one local population even in 300-year simulations.


Discussion

This study demonstrates the utility of genetic data in population viability analysis (PVA), a promising approach that has been suggested but not extensively used (Storfer et al., 2007). I show that incorporating empirical dispersal data from genetic assignment tests can greatly alter population viability predictions from metapopulation models. The sensitivity of the persistence estimates to the connectivity matrix illustrates that PVA results may be misleading if dispersal is not parameterized accurately. I suggest that this point may be especially relevant for studies conducted in fragmented or otherwise heterogeneous habitats, as the assumption that an organism can move with equal probability in any direction (i.e. parameterization with a dispersal–distance function) is clearly violated under those conditions.

The case studies included here illustrate that population viability predictions change dramatically when dispersal estimates are based on genetic data as opposed to a general dispersal–distance function. Models using distance functions predicted persistence in all cases, while those with genetic data were less optimistic. For one dataset (Amac-OH), so few individuals were assigned as immigrants with high likelihood that the low dispersal model was functionally equivalent to the no dispersal model (although, as discussed below, this equivalence may be misleading). The remaining two datasets lost four and six local populations under the low dispersal model. No individuals were assigned as immigrants at these sites, and thus recolonization was impossible following stochastic local extinction. In the absence of immigration, all local populations went extinct even under relatively optimistic population size and vital rate estimates, suggesting that these local populations likely do function as a metapopulation. Altogether, these results suggest that conservation of single local populations may be insufficient to protect ambystomatid salamanders and that population connectivity is also critical for persistence, a result complementing previous research in amphibian conservation (Hels & Nachman, 2002; Cushman, 2006; Marsh, 2008).

Connectivity matrices parameterized with genetic data could be subject to two sources of error. First, the reliance on assignment tests to estimate dispersal raises the possibility of ‘false positives’, that is, individuals that are assigned as immigrants but are in fact residents. The stringent cutoff for assignment (P<0.05, assignment probability ≥80%) was intended to avoid this issue; however, such individuals may still be included and may cause dispersal values to be overly optimistic. Second, dispersal could be underestimated if the sampled region contained many unsampled (ghost) populations. Immigrants from unsampled source populations would then never be assigned with high likelihood, so there would appear to be few immigrants when in fact there could be many. This seems likely to be the case for dataset Amac-OH, as a previous analysis showed relatively high levels of gene flow in this system (Purrenhage et al., 2009). For this dataset and similar cases, the ‘high dispersal’ parameterization is likely the best approach, as metapopulation model results may be overly pessimistic with more stringent assignment tests. Additionally, the high and low dispersal scenarios presented here are points on a continuum; while they are useful for illustrating the methodology, alternative assignment methods or likelihood cutoffs might be more appropriate for other systems.

The use of genetic data to estimate population size also has potential to introduce error into persistence estimates, although I suggest that this is a less important issue. Sensitivity analyses have shown that initial population size has very little effect on persistence estimates (Schtickzelle & Baguette, 2004), so any error introduced with these estimates should not greatly skew PVA results. Indeed, a posteriori analysis of the datasets included here showed that initial population size had no qualitative effect on the simulation trajectories. Additionally, any bias introduced this way would be positive, that is populations would be represented as larger than they actually are, and persistence would then be more likely. This is because effective population size estimates are affected by migration, reflecting an estimate of the ‘genetic neighborhood’ size rather than a single sampled site. However, local extinctions still occurred in simulations despite this potentially optimistic bias.

In conclusion, these results suggest that genetic data can be an extremely valuable tool for population viability analysis. Genetic estimates of dispersal may be especially useful in heterogeneous landscapes, as dispersal–distance functions do not account for variation in landscape permeability and may therefore overestimate population persistence and viability. Genetic data can be used to identify source populations as well as those at risk of local extinction, which may aid in allocating limited conservation resources. For the datasets examined here, management on a local scale may temporarily protect functional subpopulations of ambystomatid salamanders; however, this approach does not appear sufficient to ensure long-term persistence. Steps must be taken to maintain or enhance regional connectivity, as no local population was large enough to persist in isolation. Improvement of connectivity in these and similar landscapes could help shift metapopulation dynamics further toward the ‘high dispersal’ end of the continuum, thus improving persistence and viability estimates.


Acknowledgements

I greatly thank J.L. Purrenhage and K.R. Zamudio for the use of their data and H.L. Gibbs, the Gibbs Lab group, and two anonymous reviewers for improving earlier versions of the paper. Thanks to L.S. Kubatko and M.A. Beaumont for assistance with analyses and interpretation of results. Part of this work (MSVAR analyses) was carried out using the resources of the Computational Biology Service Unit of Cornell University, which is partly funded by Microsoft Corporation.