1. Seed dispersal is a key biological process that remains poorly documented because dispersing seeds are notoriously hard to track. While long-distance dispersal is thought to be particularly important, seed-tracking studies typically yield incomplete data sets that are biased against long-distance movements.
2. We evaluate an analytical procedure developed by Jansen, Bongers & Hemerik (2004) to infer the tail of a seed dispersal kernel from incomplete frequency distributions of dispersal distances obtained by tracking seeds. This ‘censored tail reconstruction’ (CTR) method treats dispersal distances as waiting times in a survival analysis and censors nonretrieved seeds according to how far they can reliably be tracked. We tested whether CTR can provide unbiased estimates of long-distance movements which typically cannot be tracked with traditional field methods.
3. We used a complete frequency distribution of primary seed dispersal distances of the palm Astrocaryum standleyanum, obtained with telemetric thread tags that allow tracking seeds regardless of the distance moved. We truncated and resampled the data set at various distances, fitted kernel functions on CTR estimates of dispersal distance and determined how well this function approximated the true dispersal kernel.
4. Censored tail reconstruction with truncated data approximated the true dispersal kernel remarkably well but only when the best-fitting function (lognormal) was used. We were able to select the correct function and derive an accurate estimate of the seed dispersal kernel even after censoring 50–60% of the dispersal events. However, CTR results were substantially biased if 5% or more of seeds within the search radius were overlooked by field observers and erroneously censored. Similar results were obtained using additional simulated dispersal kernels.
5. Our study suggests that the CTR method can accurately estimate the dispersal kernel from truncated seed-tracking data if the kernel is a simple decay function. This method will improve our understanding of the spatial patterns of seed movement and should replace the usual practice of omitting nonretrieved seeds from analyses in seed-tracking studies.
A principal reason why seed dispersal remains relatively poorly understood is that dispersing seeds are notoriously hard to follow (Wang & Smith 2002). It is often difficult or impossible to individually tag and follow seeds because they are so small. Researchers typically use seed marking methods such as thread tags or radioisotopes and then attempt to relocate tagged seeds after dispersal (reviewed in Forget & Wenny 2005). Although this has resulted in a much better understanding of when and how far seeds disperse, a certain proportion of the dispersed seeds in these studies are never recovered, so their dispersal distances remain unrecorded (Wang & Smith 2002). Because seeds that disperse relatively far are the most difficult to find, long-distance dispersal (LDD) events are the least likely to be observed. The resulting bias against far-dispersed seeds is problematic because it misses the tail of the seed dispersal kernel, which is arguably the most important portion of the distribution (Bullock & Clarke 2000). Seeds that disperse far are important for a host of ecological and evolutionary processes such as the spread of invasive species, metapopulation dynamics and maintenance of diversity (Portnoy & Willson 1993; Cain, Milligan & Strand 2000; Nathan et al. 2003; Soons & Bullock 2008).
In most seed-tracking studies, researchers search for dispersed tagged seeds within a predefined radius of the point of release (Fleming & Heithaus 1981; Howe 1990; De Steven 1994; Fragoso 1997; Jansen, Bongers & Hemerik 2004; Jansen, Bongers & van der Meer 2008). Seeds that disperse further than this search radius are automatically lost. Typically, these seeds are classified as ‘missing’ and omitted from the data set, so that dispersal distances greater than the search radius are not represented. Jansen, Bongers & Hemerik (2004) developed an approach for overcoming this problem, but this method has never been validated. This method, henceforth called ‘censored tail reconstruction’ (CTR), uses survival analysis to estimate the entire dispersal kernel based on the pattern observed at the beginning of the distribution. Instead of omitting missing seeds, this analysis assumes that all missing seeds dispersed beyond the search radius. Dispersal distances are treated as waiting times, while missing seeds are treated as observations censored at the search radius. The full dispersal kernel is then estimated by fitting a cumulative function to Kaplan–Meier probability estimates of dispersal distance.
Jansen, Bongers & Hemerik (2004) used the CTR method to estimate the dispersal kernel of Carapa procera seeds dispersed by scatter-hoarding rodents in French Guiana from data on thread-tracked seeds with incomplete recovery. To date, this study and that of Jansen, Bongers & van der Meer (2008) remain the only published applications of this method. Although the CTR method is arguably superior to the common practice of simply omitting missing seeds from a data set, the accuracy of this method in providing credible estimates of dispersal distances has not been tested. Jansen, Bongers & Hemerik (2004) fitted a Weibull function to the survivorship curves, but it is unknown whether this is the best-fitting function in general. It is also unknown how sensitive the CTR method is to falsely censored seeds, for example seeds overlooked by field researchers within the search radius.
To test whether the CTR method produces accurate estimates of seed dispersal distributions, we used an existing, unpublished data set: the complete frequency distribution of seed dispersal distances for the rodent-dispersed palm Astrocaryum standleyanum (hereafter: Astrocaryum), which was obtained through seed tracking with telemetric thread tags (B.T. Hirsch, P.A. Jansen & R. Kays submitted). We truncated the data set at various distances to mimic search distances used in the field, fitted different dispersal kernel functions through CTR, used ΔAIC values to select the best-fitting function and determined how well this function approximated the observed full dispersal kernel. Additionally, we quantified the effect of function selection and falsely censored seeds on the CTR results. These tests allowed us to evaluate the overall robustness of the CTR method and make recommendations about study design. Finally, we used the CTR method in conjunction with simulated dispersal kernels to test whether it can be used in studies of plant species with differently shaped dispersal kernels.
Seed Dispersal Data
The data set with seed dispersal distances that we used for our test was collected on Barro Colorado Island (BCI), Panama, a 1560-ha island protected and administered by the Smithsonian Tropical Research Institute (9°10′N, 79°51′W). BCI is covered with primary and secondary semi-deciduous moist tropical forest. Annual rainfall averages 2600 mm with an average temperature of 27 °C. The dry season generally lasts from mid-December to May (Terrestrial-Environmental Sciences Program of the Smithsonian Tropical Research Institute).
The study species, Astrocaryum standleyanum, is a Neotropical arborescent palm occurring from Costa Rica to Ecuador. Trees annually produce 3–6 pendulous infructescences with up to 1500 ovoid fruits in total. The local fruiting period is from March to the beginning of July (De Steven et al. 1987). The fresh weight of the 2–3 cm seeds averages 9·6 g (Wright et al. 2010). Astrocaryum depends on scatter-hoarding by rodents for seed dispersal, in particular on agoutis Dasyprocta punctata (Smythe 1989; Galvez et al. 2009), 2–4 kg caviomorph rodents that bury the seeds in the soil as food reserves for periods of food scarcity (Smythe 1978, 1989).
A complete frequency distribution of seed dispersal distances of Astrocaryum was obtained by placing 589 tagged seeds at 52 seed stations across a ∼25-ha area in the centre of BCI. Each seed had a telemetric thread tag and a black nylon-coated stainless steel leader wire tied to a 3·8-g radiotransmitter with 20-cm wire antenna (Advanced Telemetry Systems, Isanti, MN, USA; Hirsch et al. submitted). Dispersal distance was measured for 417 seeds removed from seed stations by animals, and no differences in removal rate or dispersal distance were found between seeds with and without transmitters (Hirsch et al. submitted). Removed seeds were located by sight or with a hand-held telemetry receiver (Yaesu VR-500) and three-element Yagi antenna. The transmitter was occasionally bitten off the seed, but 97% of seed tags were recovered intact (Hirsch et al. submitted). Dispersal distance and direction were measured with measuring tape and a compass. If a seed dispersed more than 20 m, a hand-held GPS receiver (Garmin 60CSx GPS) was used to measure the dispersal distance. We used the primary dispersal distances obtained from the above study to formulate our empirical dispersal kernel (i.e. no secondary dispersal events were included).
Dispersal Kernel Fitting
We fitted four commonly used dispersal kernels in their one-dimensional form (i.e. probability density functions) directly to the distribution of dispersal distances from the 417 radiotracked seeds: (i) lognormal, (ii) Weibull, (iii) exponential and (iv) 1DT (Table 1). All are simple decay functions in which larger dispersal distances are less frequent than any shorter dispersal distance, as is commonly assumed in seed dispersal studies. We used the function optim in R 2.10 (R Development Core Team 2010) to search for the parameter values in each of the four probability density functions that maximized the likelihood L of the observed distances (d):
$$L(p \mid d) = \prod_{i=1}^{n} f(d_i \mid p),$$
where d is a vector of n observed dispersal distances and p is the set of parameters of one of the probability density functions f. We used the Akaike information criterion (AIC; Akaike 1974) to determine which function fitted the observed data best.
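As a minimal sketch of the likelihood fit and AIC comparison described above, the Python snippet below fits two of the four kernels (lognormal and exponential) to simulated distances. These two were chosen only because their maximum-likelihood estimates have closed forms; the study itself used optim in R for all four functions. The simulated data, the parameter values and all function names are illustrative assumptions, not the authors' code.

```python
import math
import random

def fit_lognormal(d):
    """Closed-form MLE for the lognormal kernel; returns (params, log-likelihood)."""
    logs = [math.log(x) for x in d]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    loglik = sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in d
    )
    return (mu, sigma), loglik

def fit_exponential(d):
    """Closed-form MLE for the exponential kernel; returns (params, log-likelihood)."""
    rate = len(d) / sum(d)
    loglik = sum(math.log(rate) - rate * x for x in d)
    return (rate,), loglik

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik

# Hypothetical data: 417 distances drawn from an assumed lognormal kernel.
rng = random.Random(1)
distances = [rng.lognormvariate(2.0, 1.2) for _ in range(417)]

fits = {"lognormal": fit_lognormal(distances),
        "exponential": fit_exponential(distances)}
aics = {name: aic(ll, len(params)) for name, (params, ll) in fits.items()}
best = min(aics, key=aics.get)
# Delta-AIC: difference between each model and the best-fitting model.
delta_aic = {name: value - aics[best] for name, value in aics.items()}
print(best, delta_aic)
```

The model with ΔAIC = 0 is the best-fitting candidate; the remaining values show how much worse each alternative fits.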
Table 1. Dispersal kernel functions fit to empirical seed dispersal distances in the palm Astrocaryum standleyanum, ranked by fit. ΔAIC values denote the difference in AIC scores between the current model and the best-fitting model
The CTR method (cf. Jansen, Bongers & Hemerik 2004; Jansen, Bongers & van der Meer 2008) uses survival analysis to estimate the dispersal kernel, assuming that missing seeds have travelled beyond the search radius. CTR treats the retrieval of a dispersed seed as an event, observed dispersal distances as failure times, and missing seeds as events censored at a given dispersal distance, that is, the radius of the area in which dispersed seeds were searched. Kaplan–Meier survival analysis is used to calculate the survivorship function to which a standard dispersal kernel can then be fitted and used to predict the tail of the distribution.
The steps used in CTR are:
1 Collect data on seed dispersal distance using thread tags or similar methods as appropriate for the study system. The search radius and the number of seeds lost (i.e. moving further than this distance) should be recorded.
2 Estimate the Kaplan–Meier survivorship curve, treating dispersal distance as time, and including all missing seeds as observations censored at the search radius.
3 Fit probability density functions to the K–M survivorship curve. We test four functions used in previous studies here, but any other appropriate decay function can be used.
4 Use the AIC selection procedure to determine which probability function best fits the data.
We provide an example R-code which can be used as a guide to conduct CTR analyses in Appendix S1. The R-code uses the packages ‘survival’ (Therneau & Lumley 2009) and ‘fdrtool’ (Strimmer 2011).
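Step 2 of the procedure can be illustrated with a short, self-contained Python sketch (the authors' own example is the R code in Appendix S1, which is not reproduced here). The function name kaplan_meier and the toy distances are hypothetical. Because every missing seed is censored at the same distance, the search radius, all censored seeds remain in the risk set at every retrieved distance.

```python
def kaplan_meier(retrieved, n_missing):
    """Kaplan-Meier survivorship S(d) for dispersal distances.

    retrieved: distances of seeds found within the search radius.
    n_missing: seeds not found, treated as censored at the search radius.
    Returns a list of (distance, S) pairs, one per distinct distance.
    """
    events = sorted(retrieved)
    at_risk = len(events) + n_missing   # all dispersed seeds are at risk at distance 0
    surv = 1.0
    curve = []
    i = 0
    while i < len(events):
        d = events[i]
        ties = events.count(d)          # seeds retrieved at exactly this distance
        surv *= 1.0 - ties / at_risk    # product-limit update
        at_risk -= ties
        curve.append((d, surv))
        i += ties
    return curve

# Toy example: five seeds retrieved within a 10-m radius, five seeds missing.
curve = kaplan_meier([1.0, 2.0, 2.0, 5.0, 8.0], n_missing=5)
for d, s in curve:
    print(f"{d:5.1f} m  S = {s:.2f}")
```

In this toy case the curve ends at S = 0·50 rather than zero, because half of the seeds are assumed to lie beyond the search radius; that remaining mass is what the fitted kernel is later asked to extrapolate.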
To estimate the accuracy of CTR, we estimated the difference between CTR-derived dispersal kernels fitted on truncated data and the ‘true’ empirical dispersal kernel for Astrocaryum. Truncated data were obtained by assuming that all Astrocaryum seeds dispersed beyond a given search radius were missing. We then compared the distance at the 95th percentile of the dispersal kernel between the CTR and empirical results. We used the 95th percentile criterion as a measure for LDD because of its use in previous studies (e.g. Nathan et al. 2003).
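The accuracy check described here can be sketched in Python on simulated data. The snippet draws distances from an assumed lognormal kernel, truncates them at a search radius, builds the Kaplan–Meier curve with missing seeds censored at that radius, fits a lognormal to the curve and compares the extrapolated 95th percentile with the known value. The fitting step is an assumption of this sketch: a least-squares line on the probit scale, chosen because it needs no optimiser. It is not necessarily the procedure used in Appendix S1, and the parameter values and function names are likewise hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def km_censored_at_radius(distances, radius):
    """K-M survivorship with every seed beyond `radius` censored at `radius`."""
    retrieved = sorted(d for d in distances if d <= radius)
    at_risk = len(distances)            # retrieved + missing seeds
    surv, curve = 1.0, []
    for d in retrieved:
        surv *= 1.0 - 1.0 / at_risk     # product-limit update, one seed at a time
        at_risk -= 1
        curve.append((d, surv))
    return curve

def fit_lognormal_to_km(curve):
    """Least-squares fit of probit(1 - S) = (ln d - mu) / sigma (assumed method)."""
    pts = [(math.log(d), STD_NORMAL.inv_cdf(1.0 - s))
           for d, s in curve if 0.0 < s < 1.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    slope = (sum((x - mean_x) * (z - mean_z) for x, z in pts)
             / sum((x - mean_x) ** 2 for x, _ in pts))
    intercept = mean_z - slope * mean_x
    return -intercept / slope, 1.0 / slope   # mu, sigma

def lognormal_quantile(mu, sigma, q):
    """Distance below which a proportion q of seeds falls under a lognormal kernel."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(q))

rng = random.Random(42)
true_mu, true_sigma = 2.0, 1.2                    # hypothetical kernel parameters
distances = [rng.lognormvariate(true_mu, true_sigma) for _ in range(417)]
radius = sorted(distances)[len(distances) // 2]   # about 50% of seeds retrieved

mu_hat, sigma_hat = fit_lognormal_to_km(km_censored_at_radius(distances, radius))
q95_true = lognormal_quantile(true_mu, true_sigma, 0.95)
q95_ctr = lognormal_quantile(mu_hat, sigma_hat, 0.95)
print(f"true 95th percentile {q95_true:.1f} m, CTR estimate {q95_ctr:.1f} m")
```

Only distances inside the radius enter the fit, so the 95th percentile is a genuine extrapolation beyond the observed range.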
We evaluated the sensitivity of the CTR method to three potential sources of bias: (i) the probability density function used, (ii) the size of the search radius (or proportion of seeds which fall within the search radius) and (iii) the proportion of seeds overlooked by observers within the search radius (falsely censored seeds). We estimated confidence intervals (α = 0·05) for each measure of bias (detailed below) using a nonparametric bootstrap (Efron & Tibshirani 1993). Confidence intervals were calculated as the 2·5 and 97·5 percentiles of the bootstrapped estimates.
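The percentile bootstrap described above can be sketched as follows. This is a generic illustration on hypothetical simulated distances, with the empirical 95th percentile standing in for the statistic of interest; the helper names and the default of 1000 resamples are assumptions of the sketch.

```python
import random

def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def bootstrap_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, recompute, take the tails."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [statistic([data[rng.randrange(n)] for _ in range(n)])
                 for _ in range(n_boot)]
    return quantile(estimates, alpha / 2), quantile(estimates, 1 - alpha / 2)

# Hypothetical distances; the statistic is the empirical 95th percentile.
rng = random.Random(7)
distances = [rng.lognormvariate(2.0, 1.2) for _ in range(417)]
q95 = quantile(distances, 0.95)
ci_low, ci_high = bootstrap_ci(distances, lambda d: quantile(d, 0.95))
print(f"95th percentile {q95:.1f} m (95% CI {ci_low:.1f}-{ci_high:.1f} m)")
```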
We tested the sensitivity of CTR to function selection by comparing dispersal distance estimates derived with the CTR method for each of four previously defined probability density functions. These four functions were chosen because they have commonly been used in prior studies of seed dispersal (Table 1). We compared the CTR-derived distance at the 95th percentile of the dispersal kernel for each of the four functions with the empirical value. Through bootstrapping we also tested how often AIC yielded the true (or nontruncated) dispersal model among the four candidate models fit to truncated data.
A typical search radius used in previous studies is 20 m (Howe 1990; De Steven 1994); however, it is unknown whether using such a radius with the CTR method can yield accurate results. Here we created multiple truncated data sets based on the empirical Astrocaryum dispersal kernel with search radii ranging between 1·6 and 134·5 m, which corresponded to 90–0% of seeds falling outside the search radius. We then tested the effectiveness of the CTR method using these various search radii (19 different radii were evaluated in total; each successive radius corresponded to a 5% increase in the proportion of seeds recovered). We used the difference between the observed ‘true’ 95th percentile of dispersal distance and the CTR-derived results as an estimate of bias. Bias (ε) was calculated as ε = μctr − μ, where μ is the ‘true’ observed LDD distance and μctr is the CTR-derived LDD measure. Here we report the absolute proportional bias |ε|/μ.
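A brief Python sketch of the truncation scheme and bias measure follows. The 19 recovery levels match the 5% steps described above; the simulated distances, the function names and the numbers passed to proportional_bias are made up for illustration.

```python
import random

def search_radii(distances, recovered_props):
    """Radius within which each given proportion of seeds is recovered."""
    s = sorted(distances)
    n = len(s)
    # The k-th smallest distance encloses k of the n seeds.
    return [s[max(1, round(p * n)) - 1] for p in recovered_props]

def proportional_bias(mu_ctr, mu_true):
    """Absolute proportional bias |epsilon| / mu, with epsilon = mu_ctr - mu."""
    return abs(mu_ctr - mu_true) / mu_true

# 10%, 15%, ..., 100% recovered, i.e. 90% down to 0% of seeds censored.
recovered = [p / 100 for p in range(10, 101, 5)]

rng = random.Random(3)
distances = [rng.lognormvariate(2.0, 1.2) for _ in range(417)]  # hypothetical data
radii = search_radii(distances, recovered)
print(len(radii), "radii from", round(radii[0], 1), "to", round(radii[-1], 1), "m")
print(proportional_bias(mu_ctr=50.0, mu_true=40.0))  # made-up values
```

Taking the absolute value means that over- and underestimates of the same relative size are scored identically.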
False-Censoring or Overlooking Seeds
The CTR method is based on the assumption that all seeds not recovered within the search radius were dispersed beyond this radius. To determine how robust the CTR method is to violations of this assumption, we evaluated bias when 0–50% of the seeds were overlooked (in 11 equally spaced steps, each step corresponding to a 5% increase in the proportion of overlooked seeds). This was done by randomly removing a given percentage of the seeds from a truncated data set (truncated at 20 m) and treating them as censored. We used the same measure of bias (|ε|/μ) for this analysis.
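The false-censoring procedure can be sketched in Python as below. The data are simulated and the function names are hypothetical. The sketch reports only the apparent proportion of seeds beyond the 20-m radius, which is the quantity that overlooked seeds inflate directly; it does not refit the kernel.

```python
import random

def falsely_censor(retrieved, fraction, rng):
    """Randomly mark a fraction of retrieved seeds as overlooked (censored)."""
    n_overlooked = round(fraction * len(retrieved))
    overlooked = set(rng.sample(range(len(retrieved)), n_overlooked))
    kept = [d for i, d in enumerate(retrieved) if i not in overlooked]
    return kept, n_overlooked

def survivorship_at_radius(retrieved, n_censored):
    """K-M estimate of the proportion of seeds dispersing beyond the radius.

    With all censoring at the radius this equals censored / total.
    """
    return n_censored / (len(retrieved) + n_censored)

rng = random.Random(11)
radius = 20.0
distances = [rng.lognormvariate(2.0, 1.2) for _ in range(417)]  # hypothetical data
retrieved = [d for d in distances if d <= radius]
n_missing = len(distances) - len(retrieved)

for pct in range(0, 51, 5):                      # 0%, 5%, ..., 50% overlooked
    kept, n_overlooked = falsely_censor(retrieved, pct / 100, rng)
    s_radius = survivorship_at_radius(kept, n_missing + n_overlooked)
    print(f"{pct:2d}% overlooked: apparent proportion beyond {radius:.0f} m = {s_radius:.2f}")
```

Every overlooked seed is counted as having travelled at least 20 m, so the survivorship curve is pushed upwards and any kernel fitted to it will place more mass in the tail.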
To evaluate how sensitive CTR is to the specific shape of the distribution, we ran the above analyses for a variety of simulated seed distributions that had the same sample size and scale as the empirical distribution (Appendix S2). We used a Monte Carlo type simulation to generate dispersal distributions through random number generation from each of the four tested probability density functions.
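A minimal Python sketch of generating simulated dispersal distances by random draws is given below. It covers three of the four kernel families; the 1DT kernel is omitted because its parameterisation is not given in the text. All parameter values are arbitrary assumptions chosen to give distances on a scale of tens of metres, and are not those of Appendix S2.

```python
import random
import statistics

def simulate_kernel(name, n, rng):
    """Draw n dispersal distances from one of three kernel families."""
    if name == "lognormal":
        return [rng.lognormvariate(2.0, 1.2) for _ in range(n)]   # mu, sigma of log-distance
    if name == "weibull":
        return [rng.weibullvariate(15.0, 0.8) for _ in range(n)]  # scale, shape
    if name == "exponential":
        return [rng.expovariate(1.0 / 15.0) for _ in range(n)]    # rate = 1 / mean distance
    raise ValueError(f"unknown kernel: {name}")

rng = random.Random(2024)
simulated = {name: simulate_kernel(name, 417, rng)
             for name in ("lognormal", "weibull", "exponential")}
for name, d in simulated.items():
    print(f"{name:12s} mean {statistics.mean(d):6.1f} m  median {statistics.median(d):6.1f} m")
```

Each simulated data set can then be truncated and analysed in the same way as the empirical distances.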
Primary dispersal distances of the 417 Astrocaryum seeds ranged between 0·15 and 132·5 m (mean 14·7 m, median 7·5 m). Of the four dispersal kernel functions, the lognormal fitted the observed dispersal data best (Table 1, Fig. 1). Compared with the 95th percentile distance calculated from the empirical data set (58·6 m), the lognormal distribution was more accurate (estimate = 58·4 m, 95% CI = 44·4–74·8 m) than the Weibull (34·7 m, 34·7–37·5 m), exponential (34·9 m, 32·0–37·6 m) and 1DT distributions (97·6 m, 73·9–139·8 m) (Fig. 2). These analyses demonstrate that the CTR method is highly sensitive to the mathematical function that is fitted on the Kaplan–Meier survivorship estimates. When the best-fitting model (lognormal) was used with the CTR method, the derived results were very similar to the empirical results (Fig. 2).
Given the importance of model choice, the AIC approach was an important step in selecting the best model for our CTR-derived data set. The lognormal function, which gave the best overall fit to the full observed data set and provided the least bias, was selected in 97% of cases based on its ΔAIC score (across 1000 resampled data sets). Similar results were obtained with simulated data (Table S1). In the simulation, a noncritical issue arose when the curve could be approximated equally well by two different models (for example, the Weibull with a shape parameter of 1 is equivalent to the exponential), but model predictions were essentially identical in such cases. These results indicate that AIC model selection can effectively select the CTR-derived function that corresponds with the true dispersal kernel in the evaluated cases.
The Effect of Search Radius
Bias decreased rapidly as a higher proportion of seeds was recovered (i.e. as search radius increased) up to a point, and then levelled off (Fig. 3). In fact, there was little improvement in the estimate when more than 50% of seeds were recovered, which would have been accomplished with a 7·5-m search radius for Astrocaryum seeds. The simulated results also showed acceptable bias when a large proportion of the seeds was recovered (≥50%), which suggests that this result is robust (Fig. S1).
The Effect of Overlooking Seeds (False-Censoring)
The CTR method was highly sensitive to false-censoring of seeds within the search radius (Fig. 4). The CTR method worked well when <5% of the seeds that dispersed within the search radius were overlooked, but bias was substantial when larger proportions were overlooked. For example, proportional bias was 29·4% at the 5% threshold of falsely censored seeds and increased exponentially beyond it (Fig. 4). Simulations showed that the strength of the bias due to falsely censored seeds depended on the ‘fatness’ of the tail of the distribution (the proportion of seeds dispersing long distances); ‘fat-tailed’ distributions, such as the lognormal, were relatively sensitive to overlooked seeds (Fig. S2). This shows that the assumption of the CTR method that all nonrecovered seeds dispersed beyond the search radius is critical and that the effect of violating this assumption is much greater than the effect of using a relatively small search radius.
Seed-tracking studies typically classify nonrecovered seeds as missing observations, which produces an inherent bias against longer distance seed movement. Here we used a full-dispersal data set, obtained with telemetric seed tags, to evaluate an alternative method for handling these nonrecovered seeds: CTR (Jansen, Bongers & Hemerik 2004; Jansen, Bongers & van der Meer 2008). We found that the CTR method can produce excellent approximations of the true dispersal kernel as long as 50% or more of the seeds dispersed are recovered and <5% of the seeds dispersed within the search radius are overlooked (Fig. 3). In all cases evaluated, the CTR method approximated the true dispersal kernel better than the standard practice of omitting nonretrieved seeds from the data set.
The ability to accurately predict dispersal distance at a given percentile using the CTR method was greatly affected by the choice of function that was fitted to the survival estimates. However, even when using truncated data, it was generally possible to choose the ‘correct’ function with the use of the AIC selection method. This appears to be independent of the shape of the kernel, as demonstrated in our simulation results (Table S1). We advise researchers to take care in selecting the set of dispersal models among which model selection is conducted, as appropriate functions will vary across dispersal systems. Also note that in some systems, complex multimodal kernels exist (e.g. Russo, Portnoy & Augspurger 2006). These cannot be described with commonly used simple (decay with distance) seed dispersal functions (Cousens, Dytham & Law 2008). Researchers should also be aware that problems can arise when estimating 2D seed density using the best-fitting kernel obtained from 1D data, as not all 1D models are equally appropriate for translation to two dimensions. Depending on the precise mathematical formulation, some 1D kernels (e.g. the exponential) that allow for nonzero predictions at the origin will result in infinite densities at the point zero (a/2πr = ∞ when r = 0) when translated to 2D. Kernels that can be freely translated from one to two dimensions are listed in Cousens, Dytham & Law (2008; table 5.2).
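The divergence at the origin can be seen numerically in a short Python sketch. It assumes isotropic dispersal, so that the density per unit area at distance r is the 1D distance density divided by the circumference 2πr, and it uses an exponential kernel with a hypothetical rate parameter a. The function names are illustrative.

```python
import math

def exponential_1d(r, a):
    """One-dimensional exponential kernel: probability density of distance r."""
    return a * math.exp(-a * r)

def density_2d(r, kernel_1d, **params):
    """Probability density per unit area at distance r: f(r) / (2 * pi * r)."""
    return kernel_1d(r, **params) / (2 * math.pi * r)

a = 0.1  # hypothetical rate parameter (per metre)
for r in (10.0, 1.0, 0.1, 0.01):
    value = density_2d(r, exponential_1d, a=a)
    print(f"r = {r:5.2f} m  density per m^2 = {value:.4f}")
```

As r shrinks, the numerator approaches a while the denominator approaches zero, so the 2D density grows without bound.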
The CTR method worked surprisingly well even when sampling a small part of the full dispersal kernel, producing accurate results even when 50% of seeds fell outside the search radius. For the Astrocaryum in our study, this could have been met with a 7·5-m search radius (assuming no seeds are overlooked). Given that the shape of seed dispersal kernels can vary between years and between species (Greene et al. 2004), we recommend researchers choose a search radius that includes at least 50% of their tagged seeds. We also encourage further tests of the CTR method on different plant species and in systems where dispersal occurs at different spatial scales. Even if the 50% cut-off cannot be applied to any and all study systems, our results provide guidelines for the experimental design of future seed-tracking studies. Our results suggest that the traditional search radius of 20–30 m is sufficient for use with the CTR method if seed dispersal is on a similar scale as Astrocaryum.
We found that error resulting from falsely censored seeds within the search area is a much larger concern than the proportion of seeds censored. Overlooked seeds can greatly distort the results, and the accuracy of the CTR method is extremely sensitive to these observer errors. If seeds in a given study system are easily overlooked, or if the search radius is too large to efficiently find >95% of seeds that fall within the area, the CTR method could lead to large overestimations of long-distance dispersal. In addition, if the seeds in a given study system are completely destroyed when eaten, this may have an effect similar to that of overlooked seeds. The CTR method can only be used in systems where eaten seeds can be retrieved or where seeds are never immediately consumed. Depending on how easy it is to overlook seeds with a particular tracking method, a trade-off could exist between the size of the search area and the number of overlooked seeds. We suggest that researchers choose a search radius and tracking method that yields low rates of overlooked seeds. We also feel that it would be useful for researchers to empirically test the efficiency of their field crew in detecting seeds to ensure that they are within the range recommended by our sensitivity analysis.
Comparing CTR-derived estimates of seed dispersal kernels vs. the true kernel showed that the CTR method can accurately estimate the dispersal kernel using truncated seed-tracking data. It should also be possible to reanalyse data from previously published studies to extract complete dispersal kernels, provided that the search radius is reported and that the search was full and reliable. Our results show that the CTR method can be used in conjunction with standard tagging methods to adequately approximate complete seed dispersal kernels by collecting enough data over a smaller area to characterize the scale and shape of the relationship. For example, CTR would be ideal in conjunction with radioisotope labelling because Geiger counters allow retrieval of a very high proportion of cached seeds in a given area (Vander Wall 1997). These radioisotope labels can also be used to recover the seed coat of eaten seeds. Low-tech seed-tagging methods such as thread tags and fluorescent marking are typically much more economically feasible than methods that allow the measurement of complete seed dispersal kernels, such as genetics or radiotelemetry. Using these methods along with the CTR would allow researchers around the globe to obtain credible dispersal kernels from more plant species, thus extending our understanding of seed dispersal, and plant ecology in general. CTR should entirely replace the traditional practice of simply ignoring missing seeds.
We thank Eelke Jongejans and two anonymous reviewers for valuable comments on an earlier version of the manuscript. This study was supported by funding from the National Science Foundation (NSF-DEB 0717071 to R.W.K.) and the Netherlands Organization for Scientific Research (grants W85-239 and 863-07-008 to P.A.J.). M.D.V. acknowledges funding from the Smithsonian Tropical Research Institute fellowship programme.