Data on the distribution of C. ohridella were compiled from questionnaires collected in 1997 and 1998 and from visual surveys that we carried out between 1998 and 2000 in Germany. In 1997, data were compiled from questionnaires and concerned Bavaria and a few isolated spots further north (n = 203). In 1998, data were compiled from surveys carried out in Bavaria and other parts of Germany (n = 1480), questionnaires (n = 224) and additional visual observations (n = 31). In 1999, surveys were extended to Nordrhein-Westfalen (n = 1308) in addition to Bavaria (n = 955). In 2000, only the northern part of Germany was surveyed (n = 676). Leaf damage was scored in surveys and questionnaires using six classes (0%; 0–10%; 10–25%; 25–50%; 50–75%; > 75%). Scores were estimated in the field by recording the maximum score observed at each sampling location, and the data were then grouped by city code, retaining the highest score. The numbers of surveyed city codes were thus 146, 855, 933 and 331 in 1997, 1998, 1999 and 2000, respectively (Fig. 1). Preliminary data treatment consisted of establishing the annual presence/absence distribution over the 1996–99 period. Only 276 of 1962 sampled city codes were surveyed more than once, so it was necessary to estimate infestation status (presence/absence) in years when no observations had been made. Locations surveyed more than once were thus used as a training set to derive rules for predicting past or future infestation status. Extinctions from one year to the next were very rare (1·16%). The first rule therefore assumed that year n was infested if any previous year was infested and, conversely, that year n was uninfested if any following year was uninfested (Table 1; rows 1 and 2). Moreover, there was a clear relationship between infestation status in year n and infestation level in years n+1 and n+2, because very recently infested sites generally have lower levels of infestation than sites that have been infested for a longer period. 
The second and third rules were thus based upon logistic regression models built to predict the infestation probability in year n as a function of the infestation score in year n+1 and n+2, respectively (Table 1; rows 3 and 4). A cut-off value of 0·5 was applied to the probability estimated by each logistic model to predict infestation status (0/1). Finally, no prediction was made when a site was observed to be uninfested in years n−1,2,3 or infested only in year n+3 (Table 1; rows 5 and 6). At the end of this process, the number of points with observed or predicted infestation status was 1233, 1877, 1928 and 1632 in 1996, 1997, 1998 and 1999, respectively, of which 4736 (71%) were derived from the rules summarized in Table 1. Areas with low spatial and temporal sampling intensity were excluded from further analyses.
Figure 1. Distribution of C. ohridella damage observations in Germany, summarized per city code, in 1997 (n = 146), 1998 (n = 855), 1999 (n = 933) and 2000 (n = 331). Grey shading indicates areas with low spatial or temporal sampling density that were excluded from the analysis.
Table 1. Rules used to predict past/future infestation status
|Observed past/future status||Predicted status in year n||Infestation probability||Case no.|
|Year n−1,2,3 infested||Infested||P = 0·984|| 457|
|Year n+1,2,3 uninfested||Uninfested||P = 0·016||1932|
|Year n+1 infested||Logistic model (% correct = 90·9; P < 0·001)||Logit (P) = 1·388 Inf* − 0·541||1246|
|Year n+2 infested||Logistic model (% correct = 83·1; P < 0·001)||Logit (P) = 1·743 Inf** − 3·742||1101|
|Year n+3 infested||Unknown||–||–|
|Year n−1,2,3 uninfested||Unknown||–||–|
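The rules in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the damage score is assumed to be coded as an ordinal class (0 = no damage, 1 to 5 = increasing damage), which the text does not specify, and the rules are assumed to be applied in the row order of Table 1.

```python
import math

# Coefficients (slope, intercept) copied from Table 1, rows 3 and 4.
LOGIT_N1 = (1.388, -0.541)  # uses the score observed in year n+1
LOGIT_N2 = (1.743, -3.742)  # uses the score observed in year n+2


def logistic(slope, intercept, score):
    """Inverse logit of a linear predictor."""
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))


def predict_status(scores, n, cutoff=0.5):
    """Return 1 (infested), 0 (uninfested) or None (unknown) for year n.

    `scores` maps year -> damage class (ASSUMED coding: 0 = no damage,
    1..5 = increasing damage). Unobserved years are simply absent.
    """
    if n in scores:  # observed years need no prediction
        return int(scores[n] > 0)
    # Row 1: infested in any of the three previous years.
    if any(scores.get(n - k, 0) > 0 for k in (1, 2, 3)):
        return 1
    # Row 2: uninfested in any of the three following years.
    if any(scores.get(n + k) == 0 for k in (1, 2, 3)):
        return 0
    # Rows 3 and 4: logistic models with a 0.5 cut-off.
    if scores.get(n + 1, 0) > 0:
        return int(logistic(*LOGIT_N1, scores[n + 1]) > cutoff)
    if scores.get(n + 2, 0) > 0:
        return int(logistic(*LOGIT_N2, scores[n + 2]) > cutoff)
    # Rows 5 and 6: no prediction is made.
    return None
```

Under the assumed coding, a site with a class 2 score in year n+2 is predicted uninfested in year n, whereas a class 3 score is predicted infested, which matches the idea that recently infested sites show lower damage.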
These observed and predicted infestation statuses were then interpolated using ordinary kriging (Isaaks & Srivastava 1989) to estimate infestation probabilities in unsampled locations. The global trend was modelled as a linear function of the spatial coordinates, and the residuals of the linear models were used to build the semivariance model used for ordinary kriging. The kriged residuals were then added to the trend model (see Table 2 for trend models and ordinary kriging parameters). Global trend functions, standardized semivariograms and ordinary kriging estimates were calculated using the software Surfer 8 (Golden Software Inc., Golden, USA). The interpolated distribution of infestation probabilities was used for mapping, and to delineate the most probable area of initial infestations, i.e. the area where the 1996 interpolated infestation probability was higher than 0·9.
Table 2. Global trend model and kriging model parameters used to interpolate infestation status. $r^2$ quantifies the fit between the spherical semivariance models used for ordinary kriging and the experimental standardized semivariograms
|Year||Global trend model||Nugget||Sill||Range (km)||$r^2$|
|1996||Z = −1·27 × $10^{-18}$ X − 2·21 × $10^{-18}$ Y + 1·41 × $10^{-11}$||0·54||0·49||104||0·969|
|1997||Z = −1·27 × $10^{-18}$ X − 2·96 × $10^{-18}$ Y + 1·88 × $10^{-11}$||0·56||0·46|| 68||0·963|
|1998||Z = −2·24 × $10^{-18}$ X − 5·22 × $10^{-18}$ Y + 3·32 × $10^{-11}$||0·29||0·75|| 75||0·946|
|1999||Z = −1·97 × $10^{-18}$ X − 2·90 × $10^{-18}$ Y + 1·83 × $10^{-11}$||0·34||0·70|| 69||0·977|
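To show how the nugget, sill and range in Table 2 define a semivariance model, here is a minimal Python sketch of a spherical model. It assumes that the "Sill" column is the partial sill (the value added to the nugget at the plateau), which seems plausible because nugget plus sill is close to 1 for every year, as expected for a standardized semivariogram, but the text does not state this. The function name is hypothetical and this is not the Surfer implementation.

```python
def spherical_semivariance(h_km, nugget, partial_sill, range_km):
    """Spherical semivariance model evaluated at lag distance h_km.

    ASSUMPTION: `partial_sill` is the Table 2 "Sill" value, interpreted as
    the increment above the nugget reached at and beyond the range.
    """
    if h_km <= 0.0:
        return 0.0  # semivariance at zero lag is zero by definition
    if h_km >= range_km:
        return nugget + partial_sill  # plateau beyond the range
    r = h_km / range_km
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Example with the 1996 parameters from Table 2.
gamma_half_range = spherical_semivariance(52.0, 0.54, 0.49, 104.0)
gamma_plateau = spherical_semivariance(150.0, 0.54, 0.49, 104.0)
```

In the procedure described in the text, a model of this kind is fitted to the residuals of the linear trend, and the kriged residuals are then added back to the trend surface.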
Data on human population densities were obtained from the Gridded Population of the World (GPW), Version 2 (CIESIN 2000), resampled to a 2·5-km resolution.
A spatially explicit model was built to explore simulated spread under different dispersal assumptions. The model was run on data resampled at a 2·5-km resolution, using the maximum as the summarizing function when more than one observation fell within a 2·5-km cell. The model was developed within the ArcView GIS 3·2 platform (ESRI, Redlands, CA, USA), using the Spatial Analyst extension and the Avenue programming language. We assumed three generations per year over the 4 years (as generally observed in Germany; Freise & Heitland 2001), and the algorithm we used to simulate the spread over one generation involved four steps. First, the distance of each cell to the nearest occupied cell in the previous time step (cells with value equal to 1) was calculated. Secondly, the infestation probability of each cell was estimated as a function of that distance and, in the last model, also as a function of the local human population density. Thirdly, a layer of random numbers was generated and cells with a random number lower than their infestation probability were set as occupied. Fourthly, each cell's infestation status was updated and the algorithm was reiterated. The algorithm started with the initial distribution (set as the area where the interpolated infestation probability in 1996 was > 0·9) and iterated until the 12th generation (i.e. for four years).
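The four steps of one generation can be sketched as follows. This is a minimal pure-Python illustration, not the original Avenue code: the function names, the brute-force distance search, the kernel shape and its parameter value are all assumptions made for the example. Occupied cells stay occupied here because the assumed kernel returns 1 at distance 0.

```python
import math
import random


def nearest_occupied_distance(grid, cell_km=2.5):
    """Step 1: distance (km) from each cell to the nearest occupied cell.

    Brute-force search; requires at least one occupied cell in `grid`.
    """
    occupied = [(r, c) for r, row in enumerate(grid)
                for c, v in enumerate(row) if v == 1]
    return [[min(math.hypot(r - ro, c - co) for ro, co in occupied) * cell_km
             for c in range(len(row))]
            for r, row in enumerate(grid)]


def normal_kernel(x_km, alpha=5.0):
    """HYPOTHETICAL 'diffusion' kernel: normal-shaped decay with distance."""
    return math.exp(-(x_km * x_km) / (2.0 * alpha * alpha))


def spread_one_generation(grid, prob_fn, rng):
    """Apply steps 1 to 4 once and return the updated grid."""
    dist = nearest_occupied_distance(grid)               # step 1
    new_grid = []
    for r, row in enumerate(grid):
        new_row = []
        for c, value in enumerate(row):
            p = prob_fn(dist[r][c])                      # step 2
            colonised = rng.random() < p                 # step 3
            new_row.append(1 if (value == 1 or colonised) else 0)  # step 4
        new_grid.append(new_row)
    return new_grid


def simulate(grid, prob_fn, generations=12, seed=0):
    """Iterate the generation step and keep every layer."""
    rng = random.Random(seed)
    layers = []
    for _ in range(generations):
        grid = spread_one_generation(grid, prob_fn, rng)
        layers.append(grid)
    return layers
```

For example, `simulate(start, normal_kernel)` on a small grid with a single occupied cell returns 12 layers, of which the 3rd, 6th, 9th and 12th would correspond to the end of each simulated year.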
The functions used to estimate infestation probability varied according to the model used (Fig. 2). In the first model, termed ‘diffusion’, infestation probability decreased as a function of distance following a normal curve (Fig. 2a). This type of model, used for half a century to model biological invasions, is expected to produce a travelling wave of constant velocity (Shigesada & Kawasaki 1997). However, dispersal distance distributions are frequently leptokurtic, i.e. with more propagules near the centre and in the tail than in the normal distribution, and this can have a substantial impact on spread predictions (Jeltsch et al. 1997). The second model (Fig. 2b), termed ‘leptokurtic dispersal’, used a leptokurtic function that was shown to generate accelerating travelling waves (Kot et al. 1996), as observed in some invasions. The third model, termed ‘stratified dispersal’, assumed that propagules disperse by two independent dispersal processes occurring at different spatial scales, each with normally distributed dispersal distances (Fig. 2c). Infestation probability was estimated by combining the probabilities of short-distance dispersal (P1) and long-distance dispersal (P2) with an ‘OR’ statement. In the fourth model, it was assumed that long-distance dispersal probability was a function of the distance to the nearest infested cell (P2) combined with a function of human population density (P3) by an ‘AND’ statement (Fig. 2d). P3 was estimated using a logistic function chosen so as to give a maximum probability when human population was maximum, a minimum probability when human population was minimum and a probability equal to 0·5 when ln(human population density) was equal to the observed median. P3 was estimated as
Figure 2. Functions used to predict infestation probability as a function of distance to the nearest infested cell in the four models: (a) diffusion model; (b) leptokurtic dispersal model; (c) stratified dispersal model (in bold) combining a short-scale and a large-scale diffusion model; and (d) stratified dispersal model combined with the effect of human population density. P (bold lines), P1, P2 and P3 (thin lines) are probabilities, x is the distance to the nearest infested cell, and α, β and γ are model parameters. The figure illustrates the shape of the different functions and the way they are combined and does not correspond to actual distance or human population density units.
- (eqn 1)
where h is ln(human population density), $h_m$ is the observed median ln(human population density) and φ is a scaling factor. Each model was fitted using a stochastic optimization procedure. We denote by $I_{i,t}$ the observed infestation status at location $i$ in year $t$. Each simulation generated 12 layers of occupied (1) and unoccupied (0) cells (4 years × 3 generations). The 3rd, 6th, 9th and 12th iteration layers were used as predictions for the years 1996, 1997, 1998 and 1999, respectively. For each parameter set, 500 runs were performed and the average infestation status of the 3rd, 6th, 9th and 12th iteration layers was calculated. These average layers constituted the set of simulated values $S_{i,t}$. The observations $I_{i,t}$ (0 or 1) were the raw data against which the simulations $S_{i,t}$ (decimal values ranging from 0 to 1) were evaluated, by computing the sum of squared differences between observed and simulated values [$\mathrm{SSE} = \sum_{i,t} (I_{i,t} - S_{i,t})^2$]. The model $r^2$ was estimated as a complementary measure of fit as $1 - \mathrm{SSE}/\mathrm{SST}$, with $\mathrm{SST} = \sum_{i,t} (I_{i,t} - \bar{I}_t)^2$, where $\bar{I}_t$ is the average of $I_{i,t}$. For each model, we identified the parameter values that minimized the SSE. In the fourth model, the short-scale diffusion term α from the best-fitted stratified dispersal model was used, because we assumed that human population density should not affect short-distance dispersal.
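The human population term and the goodness-of-fit measures can be illustrated with a short Python sketch. Equation 1 is not reproduced in this text, so the logistic below is only an assumed form that satisfies the stated properties (increasing with h, equal to 0·5 at the median, scaled by φ); it should not be read as the published equation. The function names and the per-year dictionary layout are likewise hypothetical.

```python
import math


def p3_logistic(h, h_median, phi):
    """ASSUMED logistic form for P3: 0.5 when h equals the median ln(density)."""
    return 1.0 / (1.0 + math.exp(-phi * (h - h_median)))


def average_runs(runs):
    """Cell-wise mean over several 0/1 simulation runs of equal length."""
    return [sum(cells) / len(runs) for cells in zip(*runs)]


def sse_and_r2(observed, simulated):
    """Return (SSE, r2) with r2 = 1 - SSE/SST.

    `observed` and `simulated` map year -> list of values per location;
    SST uses the mean of the observations within each year.
    """
    sse = 0.0
    sst = 0.0
    for year, obs in observed.items():
        sim = simulated[year]
        mean_obs = sum(obs) / len(obs)
        sse += sum((i - s) ** 2 for i, s in zip(obs, sim))
        sst += sum((i - mean_obs) ** 2 for i in obs)
    return sse, 1.0 - sse / sst
```

A parameter search would call `sse_and_r2` once per candidate parameter set, after averaging the runs with `average_runs`, and retain the set with the lowest SSE.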
In addition to the least-squares approach used to compare the predictive power of the spatial models, we wanted to compare the spatial geometry of their predicted distribution. The fractal dimension has been shown to be a useful way to characterize the geometry of a number of patchy spatial patterns (Li 2000 and references therein), especially when patterns exhibit self-similarity at multiple scales. The fractal dimension D of the distribution of C. ohridella over 1996–99 in Germany was estimated and compared to the fractal dimension of each best-fitting spatial model. D was determined using the semivariogram method (Burrough 1983). The semivariogram is defined as:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left(z_i - z_{i+h}\right)^2 \qquad \text{(eqn 2)}$$
where $N(h)$ is the number of pairs of data points separated by distance $h$, and $z_i$ and $z_{i+h}$ are the infestation statuses at locations $i$ and $i+h$, respectively (Rossi et al. 1992). D can be calculated from the slope m of the double logarithmic plot of γ(h) vs. h by $D = (4 - m)/2$ (Burrough 1983). The standard error of the fractal dimension was calculated as the standard error of the regression slope divided by 2. A fractal dimension of 1 implies strict spatial dependence, i.e. homogeneity, and a dimension of 2 implies complete spatial randomness. We wanted a single measure of fractal dimension that would characterize the distribution over the 1996–99 period. The slope m was thus estimated on a log-log plot of γ(h) vs. h by pooling the semivariogram values from the 4 years in the observed distribution, and from 4 years × 100 runs in the simulated distributions. All semivariograms were estimated with the software Surfer 8 (Golden Software Inc., Golden, USA) up to a distance of 200 km with a distance interval of 5 km.
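The semivariogram method for D can be sketched in Python as below. This is an illustrative reimplementation, not the Surfer procedure: the function names, the simple distance binning and the use of bin mid-points as lag distances are assumptions made for the example.

```python
import math


def semivariogram(points, lag_km=5.0, max_km=200.0):
    """Empirical semivariogram from (x_km, y_km, z) tuples.

    Returns a list of (lag mid-point, gamma) for bins that contain pairs.
    """
    nbins = int(max_km / lag_km)
    sums = [0.0] * nbins
    counts = [0] * nbins
    for a in range(len(points)):
        xa, ya, za = points[a]
        for b in range(a + 1, len(points)):
            xb, yb, zb = points[b]
            d = math.hypot(xa - xb, ya - yb)
            k = int(d / lag_km)
            if d > 0.0 and k < nbins:
                sums[k] += (za - zb) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_km, sums[k] / (2 * counts[k]))
            for k in range(nbins) if counts[k] > 0]


def fractal_dimension(gamma):
    """Return (D, SE of D) from the log-log slope m: D = (4 - m) / 2.

    Needs at least three (h, gamma) pairs with positive values.
    """
    pts = [(math.log(h), math.log(g)) for h, g in gamma if h > 0 and g > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    resid = sum((y - mean_y - slope * (x - mean_x)) ** 2 for x, y in pts)
    se_slope = math.sqrt(resid / (n - 2) / sxx)
    return (4.0 - slope) / 2.0, se_slope / 2.0
```

Pooling across years or runs, as described above, would amount to concatenating the (h, γ) lists before calling `fractal_dimension`.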
Finally, the spatial models with their best-fitting parameters were run at the European scale at the same resolution (2·5 km), with the initial focus set at the location of the first observation, Lake Ohrid, in 1985, so as to compare our models with the European spread reviewed in Sefrova & Lastuvka (2001). The models were run assuming three generations per year over 1985–2002 through the following countries: Albania, Austria, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Czech Republic, Denmark, France, Germany, Greece, Hungary, Italy, Luxembourg, Montenegro, Netherlands, Poland, Romania, Serbia, Slovakia, Slovenia and Switzerland. Each model was run 500 times, and the dates of first occurrence in each country over all simulations constituted our set of simulated observations.