Marked point pattern analysis on genetic paternity data for uncertainty assessment of pollen dispersal kernels


Correspondence author. Biodiversität und Klima Forschungszentrum, D-60325 Frankfurt am Main, Germany. E-mail:


1.  Obtaining accurate estimates of the pollen dispersal kernel is central to a wide range of ecological studies. Assessing their statistical uncertainty is as important, but rarely considered.

2.  We developed a new method of marked point processes for nonparametric estimation of dispersal kernels based on data of genetic paternity analysis that does not require assumptions about the shape of the dispersal distribution. This allows for construction of Monte Carlo simulation envelopes of a given null model, such as random mating, and for uncertainty assessment of the observed dispersal kernel.

3.  We applied our method to characterize spatial patterns of pollen flow in an isolated population of Populus nigra in Central Germany and to assess the associated statistical uncertainty in estimates of within-population dispersal kernels. We compared our nonparametric within-population kernel estimate with that of established methods of parametric kernel fitting including: (i) a general mating model, (ii) a simplified mating model using categorical paternity data and (iii) least-squares regression of the nonparametric kernel estimate.

4.  Our analysis showed a significant departure from the random mating null model. We found a highly significant excess of mating events at short distances (<400 m) and a weakly significant shortage of mating events at larger distances (1500–2000 m). Simulation envelopes of the null model were very wide at larger distances (>2000 m), indicating large uncertainty on the detailed shape of the kernel’s tail.

5.  Results of the point pattern analysis were consistent with kernel fits using published maximum-likelihood mating models. Model selection revealed that two-component pollen dispersal kernels were the most parsimonious functions.

6. Synthesis. Our approach of nonparametric kernel estimation could be widely applied for dispersal data from genetic paternity analysis and complements traditional kernel estimation by providing a nonparametric kernel estimate and effective methods for an uncertainty assessment in kernel estimation. Our results indicate that statistical model fitting may substantially underestimate the uncertainty in kernel estimation, especially at larger distances.


Gene flow among and within plant populations concerns evolutionary ecologists, conservationists and ecosystem managers. Spatial patterns of pollen and seed dispersal are important components of gene flow in plants which facilitate connectivity between individuals and populations and create a template on which post-dispersal processes such as selection, competition, predation and exogenous disturbances operate (Linhart & Grant 1996; Nathan & Muller-Landau 2000; Kalisz et al. 2001; Vekemans & Hardy 2004). Under current scenarios of rapid human-mediated landscape change, there is increasing interest to better understand and quantify the effects of restricted pollen dispersal and spatial genetic structuring (e.g. Koenig & Ashley 2003). This requires a precise characterization of the seed or pollen shadow, i.e. the distribution of seeds or pollen with distance from the source plant, and its uncertainty.

One important theoretical tool for characterizing seed or pollen shadows is the kernel function, defined as the probability density function of dispersal distances from individual plants (Levin & Kerster 1974). The dispersal kernel has important analytical and modelling applications, such as comparing dispersal and reproductive success parameters across populations with contrasting individual distributions (Burczyk, Adams & Shimizu 1996; Oddou-Muratorio, Klein & Austerlitz 2005; Slavov et al. 2009), quantifying metapopulation connectivity (Klein, Lavigne & Gouyon 2006) and forecasting introgression risk from crops or exotic plantations into natural ecosystems in different demographic settings (e.g. Kuparinen & Schurr 2007).

Obtaining accurate estimates of dispersal kernels remains a challenging problem (Jordano 2007; Robledo-Arnuncio & García 2007; Jones & Muller-Landau 2008). Different approaches have been used to address this problem. First, forward predictions which directly model the behaviour of the dispersal agent (e.g. wind, birds, mammals; Schurr et al. 2005; Russo, Portnoy & Augspurger 2006; Will & Tackenberg 2008) contribute to a functional understanding of dispersal and allow construction of dispersal kernels. Second, classical non-genetic estimates of seed dispersal kernels rely on inverse modelling (Ribbens, Silander & Pacala 1994; Clark et al. 1999; Nathan & Muller-Landau 2000) and reconstruct dispersal kernels and fecundity from the spatial patterns of mother trees, size of the mother trees and the spatial distribution of seeds. This approach does not require identification of the mother plant of each seed and assumes that all potential sources contribute to dispersal into a given point, proportionally to their distance and fecundity. Recent methods allow accounting for the effects of explicit spatial structures of the landscape on the movement of seeds (e.g. Schurr, Steinitz & Nathan 2008). Third, genetic paternity analysis allows for identification of individual mothers (and/or fathers) and improves the basis for estimating dispersal kernels (Oddou-Muratorio, Klein & Austerlitz 2005; Burczyk et al. 2006; Goto et al. 2006; Robledo-Arnuncio & García 2007; Jones & Muller-Landau 2008). In the case of pollen dispersal, fitting dispersal kernels requires individual genotypes from a sample of seeds of known maternal origin and from all potential pollen donors within the study area.

The simplest approach based on paternity analysis is to use the observed mating distance distribution, obtained from categorical paternity assignment, as an estimate of the dispersal kernel. However, the distribution obtained in this way (Shimatani et al. 2007; Fortuna et al. 2008) is in general not a good estimate of the true dispersal kernel (Robledo-Arnuncio & García 2007; Jones & Muller-Landau 2008) because the observed distribution of dispersal distances is an outcome of both the normally unobservable kernel function and the relative spatial distribution of pollen sources (fathers) and recipients (mothers) (Robledo-Arnuncio & Austerlitz 2006). Other potential problems of pollen kernel fitting include cases where the genotype of a pollen gamete is incompatible with that of all pollen donors within the study area (i.e. potential immigrants, genotyping errors or mutation), ambiguous paternity assignments for some seeds and, once genotyping errors have been accounted for, the treatment of immigrants from outside the study area (Jones & Muller-Landau 2008). In most cases, the spatial origin of immigrants is unknown, and thus fitted functions are within-population kernels in the sense that they do not incorporate immigrants.

The more general methods based on paternity analysis yield kernel parameter estimates that maximize the likelihood of the observed sample of seed paternal haplotypes, given the Mendelian transition probabilities and given the spatial distribution of pollen donors relative to maternal plants (Adams, Griffin & Moran 1992; Burczyk, Adams & Shimizu 1996; Oddou-Muratorio, Klein & Austerlitz 2005). These methods, usually referred to as neighbourhood models, mating models or full-probability models, do not require a categorical assignment of paternity, and are thus especially useful for low-resolution genetic assays. In recent years, the development of highly polymorphic markers and automated fragment length detection methods has substantially improved paternity assignments. Using several independent loci with large numbers of alleles increases the number of seeds for which either a single pollen donor can be assigned or all potential donors within the study area can be excluded as fathers. Such data sets thus comprise the locations of all potential pollen donors within a study area, the locations of selected mother plants and the paternal origin of seed samples harvested from these mothers. However, when there is a large number of candidate fathers, it is still not always possible to assign one or exclude all fathers for a given seed, so that the paternity of an important proportion of the sample remains ambiguous (Ashley 2010; Christie 2010).

Besides providing an individually explicit characterization of mating patterns, categorical paternity assignments may also offer potentially useful insights into pollen dispersal kernel estimation. In particular, we propose here the estimation of nonparametric within-population dispersal kernels (which we will denote ‘empirical’ kernels) within the framework of point pattern analysis (Stoyan & Wagner 2001; Illian et al. 2008) to complement traditional kernel estimation. An advantage of this approach over curve fitting is that it does not require assumptions about the shape of the dispersal kernel and therefore does not rely on proper model selection. It is well known that kernel functions with very different means and tails may fit, about equally well, the data over the spatial scale of analysis (Smouse, Robledo-Arnuncio & González-Martínez 2007). Moreover, it allows confronting the empirical estimate of the dispersal kernel to (empirical) kernels expected by null models such as the random mating model. Comparison of the empirical kernel estimate to Monte Carlo simulation envelopes of the null model (a standard approach in point pattern analysis) allows for uncertainty assessment of the derived kernel. The width of the simulation envelopes provides information about the shapes of kernels that are, for example, compatible with the random mating null model. In addition, a formal goodness-of-fit (GoF) test can be used to explore if the empirical pollen dispersal kernel differs at selected distance intervals significantly from that expected under the null model.

Thus, point pattern analysis can contribute to dispersal ecology in providing (i) techniques to derive nonparametric kernel estimates free of assumptions about the shape of the dispersal distribution and (ii) assessment of uncertainty in kernel estimates, which has received surprisingly little attention in the literature. The main goal of this article is to provide methods for these purposes. Our objectives were to characterize spatial patterns of pollen flow in a population of Populus nigra in Germany, where the species is of conservation concern, and to present new nonparametric point pattern methods to estimate pollen dispersal kernels from categorical paternity assignment and to evaluate the associated uncertainty. We then compared the results of point pattern analysis with those of established methods of parametric kernel fitting. More specifically, we fitted within-population dispersal kernels based on: (i) a general mating (neighbourhood) model, which considers spatial effects and uses all seed data including ambiguous cases, (ii) a simplified mating model which uses only data in which the offspring can be categorically assigned to the parent (Hardy et al. 2003; Robledo-Arnuncio & Gil 2005) and (iii) techniques of point pattern analysis, assessing the uncertainty in the kernel estimates using the data set used in (ii).

Materials and methods

Study species and study site

Our study species is the Eurasian black poplar (Populus nigra L.), a dioecious and wind-pollinated tree with high potential for gene flow over long distances. Black poplar faces two major threats caused by human influence. First, its habitat, the floodplain of rivers, has been reduced by river regulation and intensive agricultural use of the land next to the riverbank. This has led to increasing isolation of the fragmented P. nigra populations. A second threat derives from its hybrid form P. x canadensis (a cross between European black poplar P. nigra and American black poplar P. deltoides). This tree is planted mostly along roads and ponds for ornamental purposes and is also used as a fast-growing tree in plantations. The hybrid pollen and ovules are fertile and backcrosses with the parental species are possible (Bradshaw et al. 2000). In the long run, this unintentional genetic admixture may lead to a hybrid swarm, which threatens the genetic identity of P. nigra as a species (Csencsics et al. 2009).

Our study area comprises the floodplain forest of the Eder River next to the city of Fritzlar in Central Germany and adjacent agricultural areas (51°07′17″ N, 9°18′45″ E, Fig. 1a). Almost all of our studied trees are located in a stretch of about 3 km along the Eder River (Fig. 1b). We also sampled additional trees that are located along two smaller rivers parallel to the Eder River in the north and south and in the surrounding area that consists mainly of agricultural fields and rural villages. However, we did not sample a group of about 20 male trees located at the western edge of the studied population because they were in an inaccessible area (Fig. 1b). According to a large survey all over Germany (Kaetzel, Kramer & Tröber 2007), our study population is isolated and there are no other P. nigra populations known within 50 km of the sampled area.

Figure 1.

 Spatial distribution of Populus nigra male trees (grey dots) and sampled mother trees (crossed discs). The light grey areas are rivers or ponds, the dark grey areas are forests of other species. (a) On the left the whole study area of about 300 km², (b) on the right the core population along the Eder River and an inaccessible area indicated by the striped rectangle.

Data sampling and genetic paternity analysis

Data collection

In total, we collected leaf material of 331 black poplar trees and measured their geographic position with a differential Leica GS50 positioning system (Leica Geosystems, Heerbrugg, Switzerland). During the year 2007, we studied the flowering of sampled trees for male/female classification. We counted a total of 197 potential pollen donors, including trees that did not flower during the observation period and that were also defined as potential males. Seeds were harvested directly from the branches of six mother trees during 2006 and 2007. Mother trees were chosen for a representative coverage of the study area and of different local neighbourhoods like dense and open stands (Fig. 1b). We collected seeds from different branches of each mother tree to ensure random sampling. To get enough leaf material for genetic fingerprinting, we sowed 200 seeds per mother on Petri dishes and harvested the germinated seedlings after 4 days.

Microsatellite analysis

All potential pollen donors, mother trees and every individual seedling were genotyped at seven nSSR loci: WPMS05 and WPMS09 (Van der Schoot et al. 2000), WPMS14, WPMS18 and WPMS20 (Smulders et al. 2001) and PMGC14 and PMGC2163 (IPGC). The seven markers are located on different chromosomes (Cervera et al. 2001; Gaudet et al. 2008) and are therefore unlinked. Fragment amplification and electrophoresis protocols are described elsewhere (Rathmacher et al. 2009a). The variability of the markers used was high in our population (see Table S1 in Supporting Information; see also Rathmacher et al. 2009b). The combination of the seven loci yielded a very high probability of exclusion for the second parent PE2 = 0.9966 within the P. nigra stand, as estimated with CERVUS 3.0 software (Marshall et al. 1998; Kalinowski, Taper & Marshall 2007).

Paternity assignment

We conducted paternity assignment using all seven analysed nSSR loci through standard maximum-likelihood methods implemented in CERVUS 3.0. Critical likelihood values (LOD scores) yielding 95% and 80% confidence in assignments were obtained using simulations. We simulated 10 000 offspring assuming an average mistyping error of 0.046 per locus, equal to the observed mother–offspring mismatch rate and very similar to the estimate from a mating model (0.044; see below). We estimated the male population size using the PATRI software (Nielsen et al. 2001) and used this estimate as the number of candidate fathers required for the CERVUS simulations (Oddou-Muratorio et al. 2003). The assumed proportion of candidate fathers sampled was then computed as the ratio of the actual number of sampled fathers to the PATRI estimate. Allele frequencies used in the simulations were calculated from the whole population. To test the sensitivity of the point pattern analysis and kernel estimates to CERVUS paternity assignment assumptions, we conducted additional analyses based on different CERVUS runs assuming different error rates, numbers of candidate fathers, proportions of sampled males and confidence levels (see Results).

Point pattern analysis

Inverse approaches estimate dispersal kernels by curve fitting. In contrast, methods of point pattern analysis (Wiegand & Moloney 2004; Perry, Miller & Enright 2006; Law et al. 2009) can be used to derive nonparametric estimates of dispersal kernels. In the following, we develop new methods within the framework of marked point processes and mark correlation functions (Illian et al. 2008) that allow nonparametric estimation of dispersal kernels based on paternity analysis together with an uncertainty assessment.

Mark correlation functions

Our data set comprises (i) the spatial coordinates of sampled mother trees and of all potential pollen donors in the population and (ii) the number of seeds of each mother tree assigned to each pollen donor. Such data sets can be interpreted as complex marked point patterns which can be analysed using the framework of spatial point pattern analysis (Stoyan & Stoyan 1994; Illian et al. 2008). The locations of the potential pollen donors (indexed by f) and of the mother trees (indexed by m) represent spatial point patterns, and the marks mmf are the number of seeds of a given mother m fathered by pollen donor f.

In standard situations, mark correlation functions allow testing if the marks (e.g. size) of a point pattern (e.g. trees) are spatially correlated, conditionally on the spatial locations of points (Stoyan & Penttinen 2000; Illian et al. 2008; Law et al. 2009). A non-normalized mark correlation function ct(r) gives for two arbitrary points p and q of the pattern that are distance r apart the expectation of an appropriate test function t(mp, mq) involving the marks mp and mq of the two points. This test function may be, for example, the mark of one of the points [i.e. t(mp, mq) = mp] or the product of the marks of the two points [i.e. t(mp, mq) = mp × mq] (Illian et al. 2008). In practice, all pairs of points are visited and the average of the test function t is calculated for a given distance r. For example, if small trees are clustered, the mark product of nearby trees will be smaller than the non-spatial average of the mark product and we have a spatial dependency in the values of the marks.

However, our data structure is more complex. We therefore developed a new approach to bivariate mark correlation functions (Raventós et al. 2010; Getzin et al. 2011) that uses the results of paternity analysis to estimate empirical dispersal kernels. We have a focal pattern (i.e. the mother trees m which are analogous to ‘seed traps’ in studies of seed dispersal kernels) from which to measure the distance r to the donors, and a marked pattern (i.e. the potential donor trees) where the mark mmf is the number of seeds the donor f fathered at a given mother m. To determine how the values of the marks mmf depend on the distance r to the mother m, we developed the following bivariate mark correlation function adapted to our data structure (Fig. 2):

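$$k_{mf}(r) = \frac{1}{c_f}\,\frac{\sum_{m=1}^{n_m}\sum_{f=1}^{n_f} t(m_{mf})\,k_h(\lVert x_m - x_f\rVert - r)}{\sum_{m=1}^{n_m}\sum_{f=1}^{n_f} k_h(\lVert x_m - x_f\rVert - r)} \qquad \text{(eqn 1)}$$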

where nm and nf are the total number of mother trees and potential pollen donors, respectively, t(mmf) = mmf is the test function with the normalization constant cf (the mean of the mark mmf taken over all f and m), xf and xm are the locations of pollen donor f and mother tree m, respectively, the kernel function $k_h(\lVert x_m - x_f\rVert - r)$ yields 1/h if the point pair xf and xm is separated by a distance within the interval (r − h/2, r + h/2) and 0 otherwise, and h is called the bandwidth of the kernel function. Note that the kernel function basically defines a ring with radius r and width h centred on the mth mother tree and that the sum $\sum_{f} k_h(\lVert x_m - x_f\rVert - r)$ counts, up to the factor 1/h, the number of potential pollen donors f within such a ring.

Figure 2.

 Illustration of the estimator of the mark correlation function kmf(r). At distance (r − h/2, r + h/2) from the mother tree there are six potential father trees which contributed 2, 4, 1, 5, 6 and 0 pollen to the mother tree. Thus, the sum of the marks within the ring is 2 + 4 + 1 + 5 + 6 + 0 = 18 and the number of potential fathers within the ring is 6. If mother m were the representative mother tree, the expected number of seeds fathered by a representative male located distance r away from mother m would be 18/6 = 3. In practice, the average over all females is taken as the estimate for the representative female.

The interpretation of the estimator given in equation 1 is straightforward. Basically, we visited all mother trees m in sequence and counted the marks mmf of all potential donors f located at distance r from mother m [i.e. the inner sum $\sum_{f} t(m_{mf})\,k_h(\lVert x_m - x_f\rVert - r)$] and the number of potential donors f at distance r from the mother tree m [i.e. the inner sum $\sum_{f} k_h(\lVert x_m - x_f\rVert - r)$]. Thus, the quantity kmf(r) can be interpreted as the average number of seeds sired by male trees located distance r away from a representative mother tree, divided by its non-spatial expectation cf. This function is a bivariate r-mark correlation function (Illian et al. 2008).
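The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `mark_correlation`, the list-based data layout and the half-open ring (r − h/2, r + h/2] are assumptions made here for the example. Because the box kernel contributes the same factor 1/h to both sums, that factor cancels and plain counts suffice.

```python
import math

def mark_correlation(mothers, donors, marks, r_values, h):
    """Estimate the bivariate mark correlation function for each r.

    mothers:  list of (x, y) coordinates of mother trees
    donors:   list of (x, y) coordinates of potential pollen donors
    marks:    marks[m][f] = number of seeds of mother m sired by donor f
    r_values: ring mid-radii at which the function is evaluated
    h:        ring width (bandwidth)
    """
    n_m, n_f = len(mothers), len(donors)
    # Non-spatial expectation: mean mark over all mother-donor pairs.
    c = sum(sum(row) for row in marks) / (n_m * n_f)
    result = []
    for r in r_values:
        mark_sum = 0.0   # sum of marks of donors inside the ring
        pair_count = 0   # number of mother-donor pairs inside the ring
        for m, (xm, ym) in enumerate(mothers):
            for f, (xf, yf) in enumerate(donors):
                d = math.hypot(xm - xf, ym - yf)
                if r - h / 2 < d <= r + h / 2:
                    mark_sum += marks[m][f]
                    pair_count += 1
        # Rings without any pair are undefined and reported as NaN.
        result.append(mark_sum / pair_count / c if pair_count else float("nan"))
    return result

# Toy data modelled on Fig. 2: six donors 100 m from one mother siring
# 2, 4, 1, 5, 6 and 0 seeds, plus six donors 300 m away siring none.
mothers = [(0.0, 0.0)]
donors = [(100, 0), (-100, 0), (0, 100), (0, -100), (60, 80), (-60, -80),
          (300, 0), (-300, 0), (0, 300), (0, -300), (180, 240), (-180, -240)]
marks = [[2, 4, 1, 5, 6, 0, 0, 0, 0, 0, 0, 0]]
k = mark_correlation(mothers, donors, marks, r_values=[100, 300], h=100)
```

In this toy configuration the ring around 100 m holds 18 seeds over 6 donors (3 seeds per donor), the non-spatial mean is 18/12 = 1.5 seeds per donor, and the normalized value at 100 m is therefore 2.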

The mark correlation function kmf(r) is proportional to the pollen probability density f(r) at distance r from the location of the dispersing father tree which is often called dispersal kernel (e.g. Clark et al. 1999). Following Stoyan & Wagner (2001), the location of a single pollen grain follows in the isotropic case the probability density function f(r): the probability that the pollen grain is in an infinitesimal disc of area dxdy centred at the point (x,y) is f(r)dxdy, where r is the distance of (x,y) from the location of the pollen dispersing male. The normalization of pollen probability density f(r) is given by

$$2\pi \int_0^{\infty} f(r)\, r\, \mathrm{d}r = 1 \qquad \text{(eqn 2)}$$

For example, when fitting the empirical mark correlation function kmf(r) with an exponential function f(r) = a exp(−r/α), the normalizing constant c yields ct/(2πaα²). In this way, it is also possible to obtain a parametric estimate of the within-population dispersal kernel f(r) by least-squares regression of kmf(r) on distance, which accounts for the observed spatial arrangement of both father and mother trees.
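The factor 2πaα² in this example follows from integrating the fitted exponential function over the plane in polar coordinates. The short derivation below is added for illustration and uses only the exponential form given above:

```latex
\int_0^{\infty} 2\pi r \, a\, e^{-r/\alpha}\, \mathrm{d}r
  = 2\pi a \int_0^{\infty} r\, e^{-r/\alpha}\, \mathrm{d}r
  = 2\pi a\, \alpha^{2},
\qquad \text{since} \quad
\int_0^{\infty} r\, e^{-r/\alpha}\, \mathrm{d}r = \alpha^{2}.
```

Dividing a exp(−r/α) by 2πaα² therefore gives a function that integrates to one over the plane.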

Estimators of mark correlation functions

For estimation of the mark correlation function, we used equation 1, which, as recommended by Illian et al. (2008), does not apply edge correction. Estimation of the mark correlation function involves a decision on the bandwidth h, which basically defines what is regarded as ‘distance r’. Bandwidths h that are too small will produce jagged plots of kmf(r) because not enough points fall within the different rings. We used a ring width of 100 m in all analyses. Note that values of kmf(r) are somewhat biased for distances r < h/2 (Illian et al. 2008) because here the distance r is smaller than the bandwidth h. We therefore applied a correction: we calculated the real area of each ring and determined the real mean radius rcor corresponding to the scale r (see Fig. S1). In all graphs we used rcor.

Test of significance

The empirical mark correlation function was contrasted to that of the null model of random mating. In this null model, each potential pollen donor f has the same chance to father seeds of mother m, independent of their distance. Thus, the null model assumes that there is no spatial structure in the marks. Consequently, kmf(r) = 1 under random mating. In practice we implemented this null model by randomly shuffling the marks mmf corresponding to a given mother tree m among all potential pollen donors. We used a Monte Carlo approach for construction of simulation envelopes of the null model. Each of the 199 simulations of the point process underlying the null model generates a kmf(r) function and simulation envelopes with an approximate α = 0.05 were calculated for the test statistic using its fifth highest and fifth lowest values.
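The shuffling null model and the envelope construction can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the analysis: the names `k_mf` and `random_mating_envelopes`, the precomputed distance matrix `dist[m][f]`, the fixed random seed and the toy data are all hypothetical.

```python
import random

def k_mf(dist, marks, r_values, h):
    """Mark correlation function from a mother-by-donor distance matrix."""
    n_pairs = sum(len(row) for row in dist)
    c = sum(sum(row) for row in marks) / n_pairs  # non-spatial mean mark
    out = []
    for r in r_values:
        pairs = [(m, f) for m, row in enumerate(dist)
                 for f, d in enumerate(row) if r - h / 2 < d <= r + h / 2]
        if not pairs:
            out.append(float("nan"))
            continue
        out.append(sum(marks[m][f] for m, f in pairs) / len(pairs) / c)
    return out

def random_mating_envelopes(dist, marks, r_values, h,
                            n_sim=199, rank_k=5, seed=0):
    """Simulation envelopes of the random mating null model.

    For every simulation the marks of each mother are shuffled among all
    potential donors; the envelopes are the rank_k-th lowest and highest
    simulated values at each distance.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        shuffled = []
        for row in marks:
            row = list(row)
            rng.shuffle(row)  # random mating: distance plays no role
            shuffled.append(row)
        sims.append(k_mf(dist, shuffled, r_values, h))
    lower, upper = [], []
    for j in range(len(r_values)):
        vals = sorted(s[j] for s in sims)
        lower.append(vals[rank_k - 1])  # fifth lowest value
        upper.append(vals[-rank_k])     # fifth highest value
    return lower, upper, sims

# Toy data: two mothers, twenty donors each, all seeds sired by the
# three nearest donors of each mother.
dist = [[10.0 * i for i in range(1, 21)],
        [10.0 * i + 1 for i in range(1, 21)]]
marks = [[10, 10, 10] + [0] * 17, [10, 10, 10] + [0] * 17]
r_values, h = [50.0, 150.0], 100.0
observed = k_mf(dist, marks, r_values, h)
lower, upper, sims = random_mating_envelopes(dist, marks, r_values, h)
```

Shuffling within each mother keeps the number of seeds per mother and the non-spatial mean mark unchanged, so observed and simulated curves share the same normalization.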

Note that we cannot interpret the simulation envelopes as confidence intervals because we tested the null hypothesis at many scales r simultaneously, which may inflate the Type I error rate (Stoyan & Stoyan 1994; Diggle 2003; Loosmore & Ford 2006). To test overall departure of the data from the null model without Type I error inflation, we used a GoF test that collapses the scale-dependent information contained in the test statistics into a single test statistic ui, which represents the total squared deviation between the observed pattern and the theoretical result across the scales of interest. The ui were calculated for the observed data (i = 0) and for the data created by the i = 1,…, 199 simulations of the null model, and the rank of u0 among all ui was determined. If the rank of u0 is larger than 190, there is a significant departure from the null model with α = 0.05 over the scales of interest. Details can be found in Diggle (2003), Loosmore & Ford (2006) and Illian et al. (2008).
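The rank-based test can be illustrated with a short Python sketch. It assumes that the theoretical result is the constant value 1 expected under random mating and that ties are resolved in favour of the null model; the function name `gof_rank` and both choices are assumptions made for this example rather than details taken from the cited references.

```python
def gof_rank(observed, simulated, expected=1.0):
    """Rank of the observed statistic u_0 among all u_i (1 = smallest).

    observed:  mark correlation values at the scales of interest
    simulated: list of curves, one per simulation of the null model
    expected:  theoretical value under the null model (assumed constant)
    """
    def total_squared_deviation(curve):
        return sum((value - expected) ** 2 for value in curve)

    u0 = total_squared_deviation(observed)
    u_sim = [total_squared_deviation(curve) for curve in simulated]
    # Count only strictly smaller simulated values, so ties do not
    # push the observed pattern towards significance.
    rank = 1 + sum(1 for u in u_sim if u < u0)
    return rank, u0

# Toy example: 199 simulated curves close to 1 and one clearly
# deviating observed curve.
simulated = [[1 + 0.001 * i, 1 - 0.001 * i] for i in range(199)]
rank_far, _ = gof_rank([3.0, 3.0], simulated)
rank_null, _ = gof_rank([1.0, 1.0], simulated)
significant = rank_far > 190  # criterion stated in the text (alpha = 0.05)
```

With 199 simulations the rank runs from 1 to 200, so a rank above 190 places the observed pattern among the ten most extreme of 200 values.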

Kernel fits

To contrast the point pattern analysis with established methods of parametric kernel fitting, we estimated within-population (up to 8000 m) pollen dispersal kernels in three ways: (i) using a general mating model (MM; Burczyk et al. 2006), (ii) using a simplified mating model based on categorical paternity assignments only (CM; Hardy et al. 2003; Robledo-Arnuncio & Gil 2005) and (iii) by weighted least-squares regression of the empirical estimate of the dispersal kernel obtained from the marked point pattern analyses (PP), with the number of father–mother pairs within each distance class as weights. The CM and PP approaches were based on the subset of the total seed sample in which paternity could be unambiguously determined a priori (using CERVUS), while MM was based on the full seed sample (see below). This allowed us to assess whether such subsampling biased our kernel estimates.

The MM of Burczyk et al. (2006) assumes that the paternity of the offspring of a mother tree comes either from migrant pollen from outside the sampling area (probability m) or from local males (probability 1 – m), with relative mating success of each local male modelled as a function of the assumed dispersal kernel and the relative spatial position of all potential males relative to the focal female. The within-population kernel is estimated jointly with m following a maximum-likelihood scheme based on Mendelian transition probabilities, which does not discard ambiguous-paternity cases (see Burczyk et al. 2006 for details). We used the NM+ software (Chybicki & Burczyk 2010a) to conduct the mating model analysis, including a function that jointly estimates per-locus mistyping rates to avoid m-overestimates caused by inflated Type II errors (I. Chybicki & J. Burczyk, unpublished data). Background pollen allele frequencies were estimated from offspring pollen gamete haplotypes with no compatible father within the population, allowing for up to two father–offspring mismatches (Slavov et al. 2009).
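The structure of this model can be sketched for a single offspring as follows. The sketch is a strong simplification for illustration only and is not the NM+ implementation: the function names, the exponential kernel used in the example and the numerical transition probabilities are hypothetical, and the joint estimation of mistyping rates is omitted.

```python
import math

def mating_weights(distances, kernel):
    """Relative mating success of each local male for one mother.

    The kernel density at each mother-male distance is divided by the
    sum over all local males, so the weights add up to one.
    """
    densities = [kernel(d) for d in distances]
    total = sum(densities)
    return [density / total for density in densities]

def offspring_likelihood(m, weights, trans_local, trans_background):
    """Likelihood contribution of one offspring genotype.

    m:                probability of pollen immigration
    weights:          relative mating success of the local males
    trans_local:      Mendelian transition probability of the offspring
                      genotype for each local male (given the mother)
    trans_background: probability of the offspring genotype under the
                      background pollen allele frequencies
    """
    local = sum(w * t for w, t in zip(weights, trans_local))
    return m * trans_background + (1 - m) * local

# Hypothetical example: two local males at 100 m and 200 m, an
# exponential kernel with a scale of 100 m, and only the nearer male
# genetically compatible with the offspring.
weights = mating_weights([100.0, 200.0], lambda r: math.exp(-r / 100.0))
lik = offspring_likelihood(0.1, weights, [0.5, 0.0], 0.01)
```

In a full analysis the logarithms of such contributions would be summed over all offspring and maximized jointly over m and the kernel parameters.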

The CM model can be seen as a simplified version of classical mating models in which only seeds of unambiguously-known paternal origin (as determined in our case by the CERVUS analysis) are considered for kernel fitting, while still accounting for the spatial geometry of the population (for details, see Hardy et al. 2004; Robledo-Arnuncio & Gil 2005; Robledo-Arnuncio & García 2007; Slavov et al. 2009). We used this simplified mating model to compare with the point pattern analysis, since they are based on the same categorical paternity data.

We fit several two-dimensional within-population dispersal kernels f(r; θ), with parameter set θ, yielding the probability of pollen transport per unit area at distance r from the source. First, we considered the exponential-power family:

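$$f(r; a, b) = \frac{b}{2\pi a^{2}\,\Gamma(2/b)}\,\exp\!\left[-\left(\frac{r}{a}\right)^{b}\right] \qquad \text{(eqn 3)}$$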

Besides considering the full exponential-power model, in which both the scale (a) and shape (b) parameters were estimated, we also considered an exponential model by fixing b = 1 and estimating a only, for comparison with the many previous studies using this distribution.
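As an illustration, the two-dimensional exponential-power density in its commonly used form, b/(2πa²Γ(2/b)) exp[−(r/a)^b], can be evaluated and checked numerically as sketched below. The function names and parameter values are hypothetical, and the midpoint rule is used only to confirm that the density integrates to one over the plane.

```python
import math

def exp_power_kernel(r, a, b):
    """Two-dimensional exponential-power density at distance r.

    a is the scale parameter and b the shape parameter; b = 1 gives the
    exponential kernel and b = 2 a Gaussian-type kernel.
    """
    norm = b / (2.0 * math.pi * a ** 2 * math.gamma(2.0 / b))
    return norm * math.exp(-((r / a) ** b))

def mass_within(r_max, a, b, n_steps=20000):
    """Midpoint-rule integral of 2*pi*r*f(r) from 0 to r_max."""
    dr = r_max / n_steps
    total = 0.0
    for i in range(n_steps):
        r = (i + 0.5) * dr
        total += 2.0 * math.pi * r * exp_power_kernel(r, a, b) * dr
    return total

# Hypothetical scale of 100 m; the integration ranges are chosen wide
# enough for the tails to be negligible.
total_exponential = mass_within(5000.0, 100.0, 1.0)
total_gaussian = mass_within(1000.0, 100.0, 2.0)
```

Both totals should be very close to one, which is the normalization required of a kernel expressed per unit area.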

Second, we considered two-component dispersal kernels (except for the general mating model, since the NM+ software does not yet implement them). This kind of kernel has been used for attempting to describe better the short- and long-distance components of dispersal, under the assumption that they follow different patterns that are not well fitted by simpler probability laws (Goto et al. 2006; Geng et al. 2008; Slavov et al. 2009; Chybicki & Burczyk 2010b). We used a two-component model comprising two exponential-power functions:

$$f(r; w, a_1, b_1, a_2, b_2) = w\, f(r; a_1, b_1) + (1 - w)\, f(r; a_2, b_2) \qquad \text{(eqn 4)}$$

For each of the three estimation methods separately, we used Akaike’s Information Criterion (AIC) to select the most parsimonious pollen dispersal model among those tested. For least-squares regression (point pattern analysis), assuming normally distributed errors, we used AIC = n log(RSS/n) + 2K, where RSS is the residual sum of squares, n is the number of observations and K is the number of estimated parameters (including the intercept and RSS/n). For maximum-likelihood fits, we used AIC = −2 log(L($\hat{\theta}$)) + 2K, with L($\hat{\theta}$) being the value of the likelihood function at its estimated maximum.
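Both AIC variants are simple to compute once a model has been fitted. The sketch below assumes natural logarithms; the function names and the numerical inputs are hypothetical and serve only to show the arithmetic.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with normally distributed errors.

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated parameters
    """
    return n * math.log(rss / n) + 2 * k

def aic_max_likelihood(log_likelihood, k):
    """AIC from the maximized log-likelihood of a model."""
    return -2.0 * log_likelihood + 2 * k

# Hypothetical values for illustration only.
aic_ls = aic_least_squares(rss=10.0, n=10, k=3)
aic_ml = aic_max_likelihood(log_likelihood=-100.0, k=2)
```

Within one estimation method, the model with the lowest AIC is the most parsimonious; values from the two formulas are not comparable with each other because they are based on different data and likelihoods.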


Results

Genetic paternity analysis

The male population size was estimated by PATRI at 532 individuals (95% CI: 502–564). Although this might be an overestimate, since PATRI does not correct for typing errors, we used this conservative value as the number of candidate fathers (i.e. 35% of candidate fathers sampled) for CERVUS simulations to minimize Type I errors in paternity assignments. Of a total sample of 2565 seedlings, we could assign 632 to their fathers at the 80% confidence level, based on LOD scores, while 1849 seedlings were left unassigned and 84 seedlings were excluded from the analysis because of genotyping failure. Among the 1849 unassigned seedlings, 1754 had a most likely candidate male with a LOD score lower than the 80% threshold, while 23 (or 72) seedlings had two or more candidate males with identical LOD scores larger (or smaller) than the 80% threshold. On the other hand, 233 of the 2481 successfully genotyped seedlings were unassigned and had no compatible father within the stand, yielding an apparent pollen immigration rate of 9.4%. This could be the result of either long-distance pollen immigration or local pollen dispersal from the group of unsampled males located within the stand. Some of the assigned seeds and unassigned seeds with one or more compatible candidate fathers within the stand might be the result of cryptic pollen immigration, and thus the above apparent pollen immigration rate should be regarded as a minimum estimate, but we do not expect its bias to be large. This is because CERVUS is designed to minimize Type I errors (wrong assignments) and is known to suffer from a substantial Type II error rate (wrong unassignments), as large as 30–50% in the presence of scoring errors of 1–3% per locus (Oddou-Muratorio et al. 2003). In addition, our population is strongly geographically isolated and we used a conservative value for the number of candidate fathers. This expectation was supported by the mating-model estimate of total pollen immigration, i.e. accounting for cryptic pollen flow, which was only slightly larger than the apparent pollen immigration obtained from CERVUS (see below).
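For orientation, the seedling counts reported in this paragraph combine as follows; the block only restates the arithmetic implied by the numbers above.

```latex
\begin{aligned}
632 + 1849 + 84 &= 2565 && \text{(assigned + unassigned + genotyping failures)}\\
2565 - 84 &= 2481 && \text{(successfully genotyped seedlings)}\\
1754 + 23 + 72 &= 1849 && \text{(breakdown of unassigned seedlings)}\\
233 / 2481 &\approx 0.094 && \text{(apparent pollen immigration rate)}
\end{aligned}
```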

The genetic paternity analysis revealed rare long-distance dispersal events from fathers up to 7500 m away from the mother tree in 2007. However, for most mother trees, the most distant father is about 2000 m away. Overall, 70% of effective pollen has its origin in a distance of less than 500 m, 83% less than 1000 m and 96% less than 2000 m (Fig. 3), or 63%, 75% and 87% respectively, when accounting for the apparent 9.4% long-distance pollen immigration rate from outside the stand. The relative spatial position of individual trees in the population determined a non-uniform distribution of potential pollen dispersal distances, visually similar to the distribution of realized mating distances at intermediate and long distances (Fig. 3). At short distances, however, there was an apparent excess of mating events, relative to the potential distribution.
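The second set of percentages can be reproduced by rescaling the within-stand cumulative fractions by the share of pollen that originated inside the stand, 1 − 0.094 = 0.906. The worked arithmetic below is added for illustration:

```latex
\begin{aligned}
0.70 \times 0.906 &\approx 0.63 && (< 500\ \text{m})\\
0.83 \times 0.906 &\approx 0.75 && (< 1000\ \text{m})\\
0.96 \times 0.906 &\approx 0.87 && (< 2000\ \text{m})
\end{aligned}
```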

Figure 3.

 Distribution of potential pollen dispersal distances and observed mating distances for Populus nigra along the Eder River in Central Germany.

To assess the within-population pollen dispersal kernel in subsequent analyses, we used all 2481 successfully genotyped seedlings in the case of the general mating model, and the 632 seeds with categorical paternity assigned at the 80% level in the case of the point pattern and simplified mating-model approaches. Different assumptions on the number of candidate males (from 200 to 532), mistyping error rate (from 0 to 0.08) and confidence threshold (80% or 95%) in CERVUS analysis changed the number of seeds left unassigned, but had little influence on subsequent pollen dispersal kernel estimates or apparent pollen immigration rates (see Table S2).

Point pattern analysis

Figure 4a shows the mark correlation function kmf(r), which is proportional to the empirical kernel function f(r). The theoretical expectation for the null model of random mating yields kmf(r) = 1. Our data were not compatible with the random mating null model. Overall, the empirical pollen dispersal kernel declined with increasing distance r between father and mother trees; there was an excess of nearby mating (up to 400 m), but fewer mating events than expected occurred at larger distances (Fig. 4a). At short distances (<340 m), the mark correlation function lay above the simulation envelopes, and at larger distances (1500–2000 m) it was close to or below them (Fig. 4a).

Figure 4.

 Results of the mark correlation analyses of the joined 2006 and 2007 Populus nigra paternity data. (a) Normalized mark correlation function kmf(r), which is proportional to the empirical kernel function f(r). The mark mmf attached to potential donor f was the number of seeds of mother m fathered by donor f. In this case, the mark correlation function kmf(r) gives the expected number of seeds a potential donor would father at a representative mother tree, given it is located at distance r from the mother, divided by the non-spatial expectation. Bold black line: mark correlation function kmf(r); horizontal line: expectation under the null model where the marks were randomly shuffled over all potential fathers (i.e. random mating); grey lines: simulation envelopes, being the fifth lowest and fifth highest values taken from 199 simulations of the null model; dashed line: distance beyond which very few male–female pairs were observed (<35). (b) Same as (a) but showing the full range of distances on a logarithmic scale. (c) Number of mother–potential-donor pairs per distance class. (d–f) Examples of dispersal kernels generated by Monte Carlo simulations of the random mating null model.

To formalize these observations, we applied the GoF test. The GoF test for distances 0–500, 0–1000 and 0–2000 m yielded ranks of 200 (Table 1). This indicates a significant departure from the null model of random mating. We also found a (weakly) significant difference between the kernel function and the null model for distances between 1500 and 2000 m (Table 1). Thus, this part of the tail of the distribution can be distinguished from a null model without spatial dependence, but because stochastic effects are strong (as manifested by the relatively wide simulation envelopes) the uncertainty on the detailed shape of the decline is large (Fig. 4b). The reason is that, in our study design, the number of potential donors decreased with distance from the mother (Fig. 4c), so that stochastic effects are strong at larger distances r.

Table 1. Rank of the goodness-of-fit test for different distance intervals. The estimate of the kernel function is at a 0.05 error level significantly different from the null model of random shuffling of the father trees if rank > 190 and at the 0.01 error level if rank > 198

| Data set | 0–500 m | 0–1000 m | 0–2000 m | 0–2500 m | 1000–2000 m | 1000–2500 m | 1500–2000 m | 1500–2500 m |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2006 + 2007 | 200** | 200** | 200** | 169 | 173 | 108 | 194* | 118 |

*P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001
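The rank thresholds quoted for Table 1 can be related to Monte Carlo P-values with a one-line conversion. The sketch below assumes the usual rank-test convention, in which the observed statistic is ranked among itself and the 199 simulated statistics (200 values in total) and rank 200 is the most extreme; the function name is hypothetical.

```python
def gof_p_value(rank, n_sim=199):
    """Monte Carlo P-value for the rank of the observed statistic.

    The observed statistic is ranked among n_sim + 1 values (itself plus the
    simulations); the highest rank corresponds to the largest deviation.
    """
    total = n_sim + 1
    return (total - rank + 1) / total


for rank in (200, 199, 194, 191, 169):
    print(rank, gof_p_value(rank))
```

Under this convention, a rank of 191 gives P = 0.05 and a rank of 199 gives P = 0.01, which matches the thresholds rank > 190 and rank > 198.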

Figure 4b shows the mark correlation functions for the entire range up to 8000 m and outlines the high variability in the values of the null model at distances larger than 2100 m (dashed vertical line). This high uncertainty is caused by the low number of male–mother pairs within the different distance bands for distances larger than 2100 m (Fig. 4c). Plotting the width of the simulation envelopes against the number of male–mother pairs shows that the envelopes widen sharply (to a width above 2) when there are fewer than roughly 35 male–mother pairs (Fig. S2). To further illustrate this high uncertainty at larger distances, Figures 4d–f show three examples of mark correlation functions that arise from the random mating null model.

Kernel fits

Parameter estimates of the within-plot pollen dispersal kernel obtained with the PP were consistent with results from the MM and from the CM (Table 2). Assuming an exponential-power function, the estimated mean dispersal distance was 7652 m for PP, 9460 m for CM and 8497 m for MM, with similarly low (< 1) estimated shape parameters in all cases, indicating leptokurtic dispersal. The MM yielded a pollen immigration estimate from outside the population of 12% and an average mistyping error rate of 0.044 (Table 2). In all cases, the assumed dispersal kernel function had a strong impact on the estimated mean dispersal distance, with the best fits obtained with the two-component exponential-power, one-component exponential-power and one-component exponential function, in this order (Table 2). In some cases, the kernel’s mean was similar to or larger than the maximum distance between sampled trees. This is a logical result of extrapolating the fit beyond the spatial scale of analysis, which could reflect either biological reality or inadequate model selection.
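The sensitivity of the mean to the kernel parameters can be illustrated with the closed-form mean of an exponential-power kernel. The sketch below assumes the standard two-dimensional exponential-power form with scale a and shape b, for which the mean distance is a Γ(3/b)/Γ(2/b), and it assumes that the two exponential-power entries per method in Table 2 are a and b in that order; both are assumptions of this example rather than statements from the text.

```python
import math


def exp_power_mean(a, b):
    """Mean dispersal distance of a 2D exponential-power kernel.

    a: scale parameter (m); b: shape parameter (b < 1 gives a fat tail).
    """
    return a * math.gamma(3.0 / b) / math.gamma(2.0 / b)


# Exponential-power estimates as read from Table 2 (assumed order: a, b).
estimates = {
    "CM": (3.249, 0.2708),
    "PP": (318.2904, 0.4831),
    "MM": (15.2846, 0.3159),
}
for method, (a, b) in estimates.items():
    print(method, round(exp_power_mean(a, b)), "m")
```

If this reading is right, the printed means should fall close to the values reported above. The example also shows how a scale parameter of only a few metres can still imply a mean of several kilometres when the shape parameter is small.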

Table 2. Estimates of within-population pollen dispersal kernels obtained from a simplified mating model (CM), the point pattern analysis (PP) and the general mating model (MM). The best model for each data set is shown in bold, as indicated by Akaike’s Information Criterion (AIC). MM jointly estimates pollen immigration from outside the plot and the typing error rate. Mean indicates the mean pollen dispersal distance in metres. See text for details

| Method | Model | Parameter estimates, Component 1 | Parameter estimates, Component 2 | P | Mean (m) | Log-likelihood | AIC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CM | Exponential | = 856.8567; = 1 (fixed) | | | | | |
| CM | Exp-pow | = 3.249; = 0.2708 | | | | | |
| CM | 2C Exppow | a = 407.6768; = 1.1124 | a = 7779.3261; = 3175.7177 | | | | |
| PP | Exponential | = 591.1680; = 1 (fixed) | | | | | |
| PP | Exp-pow | = 318.2904; = 0.4831 | | | | | |
| PP | 2C Exppow | a = 387.2324; = 0.9376 | a = 7224.7012; = 3600.0700 | | | | |
| MM | Exponential (immigration = 0.118, typing error = 0.044) | = 601.9251; = 1 (fixed) | | | | | |
| MM | Exp-pow (immigration = 0.117, typing error = 0.044) | = 15.2846; = 0.3159 | | | | | |

To test the impact of different assumptions in CERVUS analysis, we repeated our analyses for 18 different scenarios, each yielding a different number of assigned seeds. Table S2 shows, however, that the number of assigned seeds had little influence on subsequent pollen dispersal kernel estimates or apparent pollen immigration rates. In addition, we applied an independent approach to estimate pollen immigration from outside the population, using the PFL software (Slavov et al. 2005) and allowing for up to three mismatches (Slavov et al. 2009). The resulting estimate (14%) was within estimation error of the one obtained with the MM model (result not shown), suggesting that the latter is not biased by mistyping.


In this paper, we used the framework of marked point processes to develop a new method that allows for nonparametric estimation of local dispersal kernels based on genetic paternity data. Our approach makes two major contributions to the field of dispersal ecology: (i) it provides kernel estimates free of assumptions about the shape of the dispersal distribution and (ii) it allows construction of Monte Carlo simulation envelopes of a given null model (such as random mating), which facilitates assessment of uncertainty in kernel estimates. Analysis of within-population pollen dispersal of an isolated population of Populus nigra L. located in Central Germany rejects the random mating null model and indicates a decrease of the probability of effective pollen dispersal with distance. We could confirm a highly significant excess of mating events at short distances (say <340 m) and a somewhat weaker, but significant, shortage of mating events at larger distances (1500–2000 m). However, the simulation envelopes of the random mating null model were very wide, pointing to higher uncertainty in kernel estimates than previously thought. In particular, the few father–mother pairs at larger distances (2100–8000 m) produce extremely wide simulation envelopes, raising serious doubts about the confidence in kernel fits at larger distances, even for a data set such as ours that comprises a relatively large number of seeds.

Contribution of empirical kernel estimation to dispersal ecology

One of the biggest challenges in kernel fitting is the problem of model selection, i.e. one needs to assume certain families of kernel functions and determine the parameters (or functions) that provide the most parsimonious fit to the data. However, it is well known that kernel functions with very different means and tails may fit the data over the limited spatial scale of analysis equally well (Smouse, Robledo-Arnuncio & González-Martínez 2007). We therefore argue that it is important to complement the analysis with a nonparametrically estimated kernel function which does not rely on assumptions about the shape of the dispersal distribution. In many cases, the kernel estimate will not be a smooth function but rather show some erratic stochastic fluctuations (see e.g. Fig. 4). Comparison of the empirical estimate with the fitted curve provides a simple visual method to assess GoF.

The second contribution of our method to dispersal ecology is that it allows confronting the empirical kernel estimate with that of Monte Carlo simulations of a given null model. Two types of null models can be used. The null model may represent a randomization of the data, which, however, maintains the data structure. This allows for a comparison of the empirical kernel with kernel functions that may emerge from stochastic effects alone, without spatial dependence. In the context of our study, the natural null model of this type is that of random mating, which randomizes for a given mother tree the individual identities (seeds) over all potential pollen donors (fathers), while keeping the spatial attributes of the remaining paternity data fixed. The simulation envelopes encircle the range of potential kernel functions compatible with random mating and show the degree to which the empirical kernel is incompatible (outside the envelopes) or compatible (inside the envelopes) with random mating. Our results showed that the simulation envelopes were very wide, indicating high uncertainty due to stochastic effects alone, although the empirical kernel indicates significant distance dependence (Fig. 4b).
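The random mating null model and its simulation envelopes can be sketched in a few lines. The example below is a simplified illustration, not the estimator used in our analysis: it approximates the mark correlation function by the mean mark per distance bin divided by the overall mean mark (no kernel smoothing or edge handling), and all function and argument names are hypothetical.

```python
import numpy as np


def mark_correlation(dist, marks, edges):
    """Mean mark per distance bin divided by the overall mean mark.

    dist, marks: arrays of shape (n_mothers, n_fathers); marks[m, f] is the
    number of seeds of mother m fathered by potential donor f.
    """
    overall = marks.mean()
    which = np.digitize(dist.ravel(), edges) - 1  # bin index of each pair
    flat = marks.ravel()
    k = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        in_bin = which == b
        if in_bin.any():
            k[b] = flat[in_bin].mean() / overall
    return k


def random_mating_envelopes(dist, marks, edges, n_sim=199, n_rank=5, seed=0):
    """Envelopes from the n_rank-th lowest and highest simulated values."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, len(edges) - 1))
    for s in range(n_sim):
        # Random mating: shuffle each mother's seed counts over all fathers,
        # keeping the positions of mothers and fathers fixed.
        shuffled = rng.permuted(marks, axis=1)
        sims[s] = mark_correlation(dist, shuffled, edges)
    ordered = np.sort(sims, axis=0)
    return ordered[n_rank - 1], ordered[-n_rank]
```

An observed curve from `mark_correlation(dist, marks, edges)` that leaves the band between the two returned arrays indicates a departure from random mating at that distance. Bins holding few mother–father pairs yield wide bands, which is the effect discussed above.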

Null models may also represent a kernel function fitted with traditional methods. We did not present this analysis here, but similar methods are frequently used in point pattern analysis (Illian et al. 2008; Wiegand, Martinez & Huth 2009). In this case, the null model would randomize the individual identities (seeds) over all potential pollen donors (fathers), but conditionally on the distance dependence described by the kernel function. This null model allows for an assessment of the magnitude of stochastic effects that arise from the given data structure (in our case the geometry of fathers and mothers) under the fitted kernel. The width of simulation envelopes indicates the uncertainty in the fit and a formal GoF test allows assessing the GoF of the assumed kernel function given the data. Such an assessment is especially important in cases where the empirical kernel function is not a very smooth function and not very well represented by the assumed family of kernel functions. However, note that the width of the simulation envelopes is directly dependent on the number of data points at a given distance interval (i.e. observed mother–father pairs; Fig. S2).
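A null model of this second type could be simulated as sketched below. The example assumes a two-dimensional exponential-power kernel with scale a and shape b, equal fecundity of all potential fathers, and no pollen immigration; these simplifications and all function names are assumptions of this illustration.

```python
import math

import numpy as np


def exp_power_kernel(r, a, b):
    """2D exponential-power kernel density at distance r (scale a, shape b)."""
    norm = b / (2.0 * math.pi * a**2 * math.gamma(2.0 / b))
    return norm * np.exp(-((r / a) ** b))


def simulate_paternity(dist, seeds_per_mother, a, b, rng):
    """Draw seed counts per (mother, father) pair under a fitted kernel.

    dist: (n_mothers, n_fathers) distances in metres.
    seeds_per_mother: (n_mothers,) number of seeds to assign per mother.
    """
    weights = exp_power_kernel(dist, a, b)
    # Each mother's seeds are distributed over fathers in proportion to the
    # kernel value at the mother-father distance.
    probs = weights / weights.sum(axis=1, keepdims=True)
    return np.vstack(
        [rng.multinomial(n, p) for n, p in zip(seeds_per_mother, probs)]
    )
```

Repeating the draw many times and recomputing the empirical kernel for each simulated data set would give simulation envelopes around the fitted curve, in the same way as for the random mating null model.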

Showing that an observed dispersal kernel agrees with a given null model beyond a certain distance could also be done with existing analytical approaches. However, the simulation envelopes provide additional information. For example, a case where an observed dispersal kernel is consistent with the random mating null model at larger distances, with narrow simulation envelopes, has a completely different interpretation from a case where the simulation envelopes are very wide. In the first case, there would be strong evidence that the data agree with the null model (i.e. narrow envelopes), whereas in the second case there is a high risk of accepting the null model even if it is not true, because of weak data and the associated wide simulation envelopes. A GoF test should therefore only be conducted over a distance interval where the envelopes do not explode because of a low number of data points, as they do in our case for distances larger than 2000 m (Fig. 4b). Plotting the width of the resulting simulation envelopes against the number of data points (as done in Fig. S2) provides a rough estimate of the number of data points required to shrink the simulation envelopes sufficiently.

Confronting the empirical dispersal kernel with those derived from Monte Carlo simulations of the null model showed that statistical fitting may substantially underestimate the uncertainty in the estimates of dispersal kernels at larger scales. This is a general dilemma in dispersal kernel studies (see Kot, Lewis & van den Driessche 1996), which deserves further theoretical consideration. One approach would be to use our method and generate artificial data to test which sampling schemes and amount of data would provide sufficient data over the distance interval of interest. However, the critical question is whether it would be feasible to collect the amount of new data required. There is certainly no simple solution since no modelling approach so far can substitute for the lack of data.

The issue of uncertainty assessment of kernel estimates has received surprisingly little attention, apart from the derivation of confidence intervals for fitted parameters and a few studies that have used nonparametric tests (e.g. Mann–Whitney–Wilcoxon) to check whether the observed distribution of mating distances significantly departs from the distribution of potential dispersal distances, i.e. whether mating occurs at random or not (see Streiff et al. 1999). The reason for this may be that kernel fitting does not provide a simple and natural way of constructing simulation envelopes, as point pattern analysis does. Kernel fitting approaches could in theory derive simulation envelopes of randomized data sets. However, the problem is the intermediate step of kernel fitting, in which significant information may be lost because the assumed family of kernels to be adjusted may fit the randomized data poorly or convergence problems may appear. This problem is circumvented in point pattern analysis because kernels are estimated empirically and not via curve fitting.

We conducted a fit of the empirical kernel to check the consistency of the point pattern analysis against established approaches of kernel fitting (Fig. 5). Of course, fitting the empirical kernel will not produce better fits than maximum-likelihood mating models, but this was not the purpose of this exercise. We rather need to confirm that the empirical kernel estimate and kernel fitting both produce the same results. Our analyses show that this was the case (Table 2).

Figure 5.

 Estimated pollen dispersal kernels for Populus nigra. The left panel shows the estimate for the point pattern model within the mark correlation function. The right panel shows the best-fit models obtained with the three methods used: PP, CM and MM. See text for details about the assumed dispersal models.

Note that the point pattern method is, in principle, able to incorporate long-distance dispersal similarly to other kernel estimation methods. However, the limiting factor is data availability. To produce reliable results, more than 35 male–female pairs are required in each distance band (Fig. S2). One possibility to increase the number of male–female pairs is to select a larger bin and bandwidth and to study the tail of the empirical dispersal kernel with low resolution (Fig. S3). This approach provides at least a rough estimate of the tail of the kernel. Although the number of male–female pairs was still too low in our case, Fig. S3 provides some idea of the shape of the tail.
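Whether a chosen binning meets this requirement can be checked before any kernel is estimated. The sketch below counts male–female pairs per distance band and flags bands that fall short of a minimum; the threshold of 35 pairs is taken from the text, while the function name and the toy distances are illustrative assumptions.

```python
import numpy as np


def pairs_per_band(dist, edges, min_pairs=35):
    """Count male-female pairs per distance band and flag adequate bands.

    dist: array of pairwise mother-father distances (any shape), in metres.
    edges: increasing band boundaries in metres.
    """
    counts, _ = np.histogram(np.ravel(dist), bins=edges)
    return counts, counts >= min_pairs


# Toy distances: many short-range pairs, few pairs in the tail.
toy = np.array([100.0] * 40 + [2500.0] * 10 + [3500.0] * 30)
print(pairs_per_band(toy, [0, 1000, 3000, 4000]))  # narrow tail bands
print(pairs_per_band(toy, [0, 1000, 4000]))        # one wide tail band
```

In the toy data, merging the two sparse tail bands into a single wide band lifts the pair count above the threshold, at the cost of resolving the tail with a single value.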

Issues of kernel fitting

A certain disadvantage of our approach is that it needs categorical data. We therefore used a simplified mating model to derive kernel fits to be compared with the fits of the empirical kernel. However, to compile a data set comprising categorical data we had to exclude a notable proportion of our data. The question arises whether this had an impact on the kernel estimate. This would be the case if the seeds excluded from the analysis had a spatial structure that differs from that of the accepted seeds. For example, for highly clustered potential fathers we may expect that nearby potential fathers are closely related and genetically similar. This may produce a clustering in ambiguous cases. However, we found that only 1.2% of the seeds with unassigned paternity had two or more potential fathers exceeding the critical LOD score at the 80% confidence level, while in most cases seeds were left unassigned because the most likely male had an LOD score smaller than the critical value. It is unlikely that these cases show a marked spatial structure; however, to exclude this possibility we also conducted kernel fits based on a general mating model that uses all seed data including ambiguous cases. We found no indications of a bias.

We did not attempt to incorporate the pollen immigration estimate from outside the study site (12%) into the kernel estimate. Instead, we decided to report the immigration estimate and fit only within-population kernels, including long-distance dispersal events of up to 8000 m. There are well-established procedures to include immigration rates in dispersal kernel estimates based on genetic paternity analysis, through modelling the probability of a propagule coming from outside the plot as a function of the kernel (Jones et al. 2005; Goto et al. 2006; Robledo-Arnuncio & García 2007; Slavov et al. 2009). However, these approaches require the strong assumptions of an infinite continuous population, and that immigration events respond to the same (explicitly assumed) function that governs near dispersal. These assumptions are adapted to situations where a small study plot is embedded within a larger continuous population (the demographic scenario that motivated the above-mentioned developments). In that case, a large proportion of immigrants arrive from individuals not so far away within the same population, driven by similar air flows and turbulences (or animal-mediated processes) as within-plot pollen dispersal.

However, in strongly isolated populations like the one studied here the population-continuity assumption is violated and immigrants have probably originated from unknown trees located many kilometres away. Pollen immigration into isolated populations results from long-distance dispersal events governed by highly stochastic processes poorly described by the simple phenomenological functions used to fit local mating patterns (Nathan 2006). We believe that in this case it is more appropriate and less misleading to report separately immigration rates on the one hand and fit within-population dispersal kernels on the other hand, unless there is reliable information about the geographical origin of long-distance immigrants, which is very rarely the case.

Two additional considerations of practical interest arise from our analysis. One, already pointed out in the context of seed dispersal (Robledo-Arnuncio & García 2007; Jones et al. 2008), is that fitting the observed mating distance distribution directly will yield biased estimates of the pollen dispersal kernel. We found strong underestimates of the mean dispersal distance using this naive direct approach (results not shown). To avoid confusion and potentially misleading comparative inferences, researchers should bear in mind that the observed mating distance distribution is an outcome of both the kernel function and the spatial geometry of individuals within the study plot. It thus contains both demographic and dispersal information, and should be plainly reported as the mating distance distribution, never as a kernel function (as still found in recent literature), which by definition describes dispersal probabilities from a single individual. The second consideration is the great sensitivity of dispersal parameter estimates, such as the mean dispersal distance, to model selection. This is mostly due to the way different models extrapolate beyond the spatial scale of analysis. Choosing the best-fit model does not alleviate the problem unless it provides an accurate enough description of the underlying mechanistic process over the relevant spatial scales (see Kuparinen et al. 2007), which is especially hard when dealing with long-distance dispersal. Even within the spatial scale of analysis and using categorical paternity information, we have shown that the uncertainty in dispersal data becomes hopelessly large beyond a few hundred metres. This should be borne in mind before interpreting and comparing long-distance dispersal estimates in different studies.

Contribution to conservation ecology

Conservation activities for the protection of endangered species mainly focus on direct and obvious threats. In our study species P. nigra, these threats are mainly reductions in suitable habitat and the resulting fragmentation of populations. However, there is also a more subtle threat, the loss of the species' genetic identity, by crossing with its hybrid form P. x canadensis and the eventual establishment of a hybrid swarm. Given that P. x canadensis is a man-made construct, this introgression would alter natural evolutionary processes of P. nigra. In addition, introgressed individuals may contain limited genetic variation, since in Germany almost all (in our study area 84%) of the hybrid males are a single clone (P. x canadensis Robusta). As a consequence, the adaptive response of one of the key species of European floodplain forests to climate or anthropogenic changes in the river system would be diminished, with unpredictable consequences for the whole ecosystem.

To assess the introgression risk for P. nigra populations, an estimate of the amount of pollen exchange between natural and hybrid stands is needed. Furthermore, it is crucial to set standards in agriculture and forestry for the necessary isolation distance between plantations and natural stands (see Sanvido et al. 2008 for genetically modified maize) to prevent crossings between the two species. These values are of particular importance in the context of the possible introduction of genetically modified P. x canadensis individuals into the landscape.

We showed that modelling pollen dispersal using genetic paternity data is subject to high uncertainty. Risk assessment for species with high seed- or pollen-dispersal potential should carefully account for this uncertainty.


We are grateful to I. Chybicki and J. Burczyk for providing an unreleased version of NM+ software that jointly estimates genotyping errors. We thank I. Chybicki, F.A. Jones and H. Muller-Landau for helpful discussion on long-distance dispersal estimation, and the editors and two anonymous referees for their constructive comments. M.N. was funded by the Federal Ministry of Education and Research, Germany (Grant ID 0313285J). J.J.R.A. was funded by a ‘Ramón y Cajal’ research fellowship and by CGL2009-09428 project from the Spanish Ministry of Science and Innovation.