Environmental impact prediction using neural network modelling. An example in wildlife damage

Authors


Dr François Spitz, Institut de Recherche sur les Grands Mammifères, INRA, B.P. 27, 31326 Castanet-Tolosan cedex, France. E-mail: spitz@teleirgm.toulouse.inra.fr

Abstract

1. Decision making in management of environmental impact confronts the problem of analysing relationships in highly complex ecological systems. These relationships are generally non-linear, and conventional techniques do not apply satisfactorily. The problem is equivalent to that of predicting the output of a black box. Artificial neural networks (ANN) have shown tremendous promise in modelling such situations.

2. The present work describes the development and validation of an ANN in modelling wildlife damage to farmland, a particular instance of ecological impact. An ANN approach was developed and tested using data from 200 damaged plots, and a control sample of 20 undamaged plots, described by 17 environmental characters. The dependent variable was the financial cost of impact per plot.

3. The predictive quality of the ANN models was evaluated through the ‘leave-one-out’ procedure. For 82% of predicted values, the deviation from observed values is lower than 1780 FF, i.e. 10% of the range of the observed values. The frequency of bad predictions depends on the minimum level of impact (critical effect size) considered. In France, compensation is given starting from a minimum level of 200 FF. At this level, the frequency of Type I errors (predicted impact does not occur) is 7·11%. Different strategies for the prevention of impact, using different threshold values for triggering prevention, are analysed. The frequency of Type I errors increases, and that of Type II errors (occurrence of unpredicted impact) decreases, as the threshold for prevention increases.

4. Sensitivity analysis allows the determination of the effect of seven quantitative variables on the cost of damage compensation. Proximity of a paved road, proximity and number of houses, and number of other buildings contribute negatively, and three other variables (proportion of the perimeter of the plot occupied by woody vegetation, density of the vegetation, and density of wild boar in the surrounding area) contribute positively to the predicted value.

5. Results show that the utility of impact prediction for prevention depends on the cost of errors and the absolute cost of prevention.

6. Finally, ANNs proved able to learn complex relationships between environmental variables and impact assessment, and to produce operationally relevant predictions. Good predictions can help managers to distribute their actions efficiently between prevention, protection and compensation.

Introduction

Decision makers in charge of the management of biological resources or ecosystems constantly confront the problem of predicting the short-term or long-term effects of their decisions. The ecological impact of economic or technical measures, as well as the economic impact of environmental measures, has to be assessed. Impact assessment, in most instances, does not concern the global dynamics of an ecosystem, but only some particularly significant issues, e.g. change in biodiversity in a particular area, change in the distribution of parasites of cultivated plants, or change in the yield of a particular crop. Such predictions are quite difficult to make because relationships between variables in the environmental sciences are often non-linear, whereas the usual methods are based on linear principles. Most of the statistical methods reviewed by James & McCulloch (1990) assume that relationships are smooth, continuous, and either linear or simply polynomial. Conventional techniques, generally based on multiple regression, are capable of solving many problems but sometimes show serious shortcomings. Non-linear transformations of variables (logarithmic, power or exponential functions) appreciably improve the results, but are often still unsatisfactory, and techniques based on correlation coefficients are often inappropriate (ter Braak 1986). In other words, managing biological resources is like managing a black box in which regulatory decisions are the input and the impact of those decisions is the output. For the technical reasons mentioned above, the prediction of global or local relationships in very complex systems can benefit from the use of artificial neural networks (ANN) (Rumelhart, Hinton & Williams 1986; Stern 1996), especially in ecology. For instance, Colasanti (1991) found similarities between ANNs and ecosystems and recommended the use of ANNs in ecological modelling. In a review of computer-aided research in biodiversity, Edwards & Morse (1995) underlined the considerable potential of ANNs. Relevant examples are found in very different fields of applied ecology, such as modelling the greenhouse effect (Seginer, Boulard & Bailey 1994), predicting several parameters in brown trout management (Baran et al. 1996; Lek et al. 1996a,b), predicting phytoplankton production (Scardi 1996), predicting fish diversity (Guégan, Lek & Oberdorff 1997), predicting the production/biomass (P/B) ratio of animal populations (Brey, Jarre-Teichmann & Borlich 1996), and predicting farmer risk preferences (Kastens & Featherstone 1996). Most of these studies report better performance from ANNs than from more classical modelling methods.

We selected an example of risk assessment in wildlife damage management (Slate et al. 1992) because, in this example, regulatory measures (hunting regulations and damage regulations) result in an economic cost through an ecological black box (the network of ecological, physiological and behavioural relationships between the agro-ecosystem and the wildlife populations). In France, hunting regulations consist of limitations on the number of hunting days per week, and bag limits for some larger game species. Damage regulations consist of a compensation system in which hunters contribute to paying for agricultural damage (there is no compensation system for damage to forest). Application of these regulations since the 1970s has coincided with the persistent expansion of the geographical ranges of deer (Cervus elaphus L. and Capreolus capreolus L.) and wild boar (Sus scrofa L.), a continuous increase in the density of these populations, and an increase in their impact on natural and cultivated vegetation. Deer and wild boar cause the majority of agricultural damage (wild boar alone account for ≈ 90%). Beyond the direct effect of national legislation, the expansion of large mammal populations is undeniably a continental phenomenon (Mitchell-Jones et al. 1999) linked with environmental causes such as changes in land use and landscape, and rural depopulation. Hunting organizations in France are in charge of administering the compensation. They can also pay for crop protection against wildlife rather than paying compensation for the damage; in the latter case, hunting organizations need to know which cultivated plots deserve protection and which do not. To answer this question, we investigated the feasibility of quantifying the damage expected in a plot using an ANN with a limited number of environmental descriptors as input. Quantifying damage is just a particular instance of environmental impact assessment, and our objective was also to demonstrate that ANNs can help solve impact problems in complex ecological systems.

Methods

Study site and data collection

Data were collected in Aude (southern France), an administrative district c. 5000 km2 in area. Throughout the district, observations were made by wildlife officers, in spring and autumn 1995, on 200 plots where wild boar damage had been recorded. A control sample of 20 plots without recorded damage, located in the vicinity of the damaged plots, was described using the same procedure. For each damaged plot, the wildlife officer characterized the damage by estimating the percentage of destroyed area, and converting it into an amount of production lost, or its cash equivalent. None of these conventional estimates of impact is fully satisfactory from an ecological viewpoint. The ‘percentage of area destroyed’ depends upon the surface area of the plot; the absolute area destroyed can be derived from it, but is only meaningful as an estimate of the local activity of the damaging animals. When converted into the amount of production lost, this estimate has more ecological meaning, but wildlife managers are more interested in the financial cost (depending on the market value of the crop) balanced against the cost of protecting the plot. For these reasons, the expected financial cost of the damage to a plot was the predicted variable in our model.

The input descriptors of our ANN were those used in a previous paper, with completely different data and objectives (Spitz, Lek & Dimopoulos 1996), following a field screening study of the relationships between the probability that a cultivated field would be visited by large game and numerous descriptors of the plot and its environment (Laporte 1991). Our list of descriptors is as follows:

• Elevation: mean elevation of the plot in metres a.s.l. (taken from IGN® maps).

• Topography: the observer classified the plot (or the area where it is located) as plateau, bottomland, slope, summit, valley or canyon.

• Orientation: the observer classified the plot roughly as north-facing, south-facing, east-facing, west-facing, or without any definite orientation.

• Water: the observer looked for any running or stagnant water in the vicinity of the plot (the response is presence or absence).

• Fences: the observer indicated the presence of a fence all around the plot, or on part of the perimeter, and whether electrified or not.

• Road: distance to the nearest paved road (taken from IGN® maps).

• Tracks: the observer indicated the presence of dirt tracks (used for farm work) on one or several sides of the plot.

• Houses: distance to the nearest inhabited house (taken from IGN® maps).

• Distances d to the nearest road or house were transformed into an index of proximity (a code sketch follows this list), using:

$ip = \max(d) - d$ (eqn 1)

where $\max(d)$ is the greatest distance observed.

• Inhabitants: number of inhabited houses within 500 m of the plot.

• Buildings: number of other buildings within 500 m of the plot.

• Enclavement: the observer (in the field and using maps) indicated how many sides of the plot (assumed to be rectangular) consisted of ‘natural’ vegetation, which type of vegetation it was (deciduous forest or coppice, coniferous young or old plantation, woodland, moorland, heathland, grassland), and the density of the plant cover.

• Cultivated plant: plants cultivated in the studied plot were classified into six categories (cereals, maize, grass, vine, oil-seed plants, protein-seed plants).

• Height: the observer indicated the height of the cultivation at the occurrence of damage (less than 50 cm, 50 cm to 100 cm, more than 100 cm).

• Boars: average number of wild boars killed per km2 in the administrative unit where the plot is located. This number was used after logarithmic transformation.
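As a concrete illustration, the two transformations above (the proximity index of eqn 1 and the logarithmic transformation of the cull index) might be written as follows. This is a minimal sketch, not the study's code (computing was done in Matlab®, see below); variable names and values are purely illustrative, and the choice of log1p rather than a plain logarithm is our assumption.

```python
import numpy as np

def proximity_index(d):
    """Index of proximity (eqn 1): ip = max(d) - d.

    The plot closest to the road/house gets the highest index;
    the most distant plot in the sample gets 0.
    """
    d = np.asarray(d, dtype=float)
    return d.max() - d

# Illustrative values, not taken from the study's data set
road_distance_m = np.array([120.0, 850.0, 40.0, 2300.0])
road_proximity = proximity_index(road_distance_m)

boar_cull_per_km2 = np.array([0.8, 2.5, 0.1, 5.0])
# The paper specifies a logarithmic transformation; log1p is an
# assumption here, chosen to remain defined when the cull is zero.
boar_index = np.log1p(boar_cull_per_km2)
```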

Data processing

The relationships between environmental characteristics and wildlife damage were studied with a modelling method based on one of the principles of neural networks, the algorithm of backpropagation of errors (Rumelhart, Hinton & Williams 1986). It rests on a mathematical representation of neurone function, transforming neurone activation into a non-linear response. A network with backpropagation of errors typically comprises three or more layers: an input layer, one or several hidden layers, and an output layer, each consisting of one or several neurones. Every neurone of a given layer, except those of the last layer, emits an axon to each neurone of the layer downstream; the network is then said to be fully connected. In most cases, to limit calculation time and especially when the results are satisfactory, a single hidden layer is used (Fig. 1). The input layer contains n neurones coding the n elements of information (X1 . . . Xn) at the input of the network. The number of neurones in the hidden layer is chosen by the user according to the reliability required of the results (Smith 1994; Lek, Dimopoulos & Fabre 1996; Stern 1996). Finally, the output layer comprises the neurones responsible for producing the results. Each connection is characterized by a modifiable weight. During the training phase, the network compares expected and calculated values, and modifies the connection weights so as to reduce the difference between them. Neurones of the hidden and output layers evaluate the intensity of the stimulation from the neurones of the previous layer by the following relationship:

Figure 1.

Typical three-layered feed-forward artificial neural network: input nodes (I) corresponding to the 17 independent environmental variables, five hidden-layer nodes (H) and one output node (O) corresponding to the estimate of damage compensation. Connections between nodes are shown by solid lines; they carry synaptic weights that are adjusted during the training procedure. The bias nodes, with a constant output value of 1, are also shown. The sigmoid activation function is plotted within each node.

$a_j = \sum_{i} W_{ji} X_i$ (eqn 2)

where $a_j$ is the activation of the jth neurone of the current layer, $X_i$ the output of the ith neurone of the previous layer, and $W_{ji}$ the synaptic weight of the connection between the ith neurone of the previous layer and the jth neurone of the current layer.

After computing the weighted sum (eqn 2), the way that the response of hidden and output neurones is calculated from their net input depends on the type of activation function used in the ANN. Most backpropagation ANNs use a sigmoid-type function, because of its non-linearity:

$f(a_j) = \dfrac{1}{1 + e^{-a_j}}$ (eqn 3)

The technique of backpropagation is related to supervised learning (to learn, the network has to know the reply that it should have given). With this technique, the intensity of the connection is modified to minimize the error of the response. The estimation of the error signal differs according to the layers considered. Many articles, notably by Rumelhart, Hinton & Williams (1986) and Weigend, Huberman & Rumelhart (1992) detail the algorithms of backpropagation of errors. One can use parameters such as η (learning coefficient) and α (momentum), which serve to accelerate learning while preventing the network from falling into local minima. Network learning continues until minimization of the sum of the squares of the errors (SSE) given by the relationship:

$\mathrm{SSE} = \sum_{j=1}^{N} \left( Y_j - \hat{Y}_j \right)^2$ (eqn 4)

where $Y_j$ is the expected value at the output of the network (the ‘theoretical’ value), $\hat{Y}_j$ is the value calculated by the network (neurone of the output layer), and $N$ is the number of observations. Computing was performed in a Matlab® environment for Windows®.
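For readers who want to see the procedure in executable form, the following numpy sketch implements a 17-5-1 backpropagation network of the kind described by eqns 2–4 and Fig. 1, with learning coefficient η and momentum α. This is an illustrative reconstruction, not the original Matlab® code: the values of η and α, the weight initialization and the placeholder data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    """Sigmoid activation (eqn 3)."""
    return 1.0 / (1.0 + np.exp(-a))

# 17-5-1 architecture (17 inputs, 5 hidden neurones, 1 output), as in Fig. 1
n_in, n_hid, n_out = 17, 5, 1
W1 = rng.normal(scale=0.1, size=(n_hid, n_in + 1))   # +1 column for the bias node
W2 = rng.normal(scale=0.1, size=(n_out, n_hid + 1))

eta, alpha = 0.1, 0.9          # learning coefficient and momentum (illustrative values)
dW1_prev = np.zeros_like(W1)
dW2_prev = np.zeros_like(W2)

def forward(x):
    """Forward pass: weighted sums (eqn 2) passed through sigmoids (eqn 3)."""
    x1 = np.append(x, 1.0)              # bias input = 1
    h1 = np.append(sigmoid(W1 @ x1), 1.0)
    y = sigmoid(W2 @ h1)                # output layer
    return x1, h1, y

def train_epoch(X, Y):
    """One pass over the data, updating weights by backpropagation of errors."""
    global W1, W2, dW1_prev, dW2_prev
    sse = 0.0
    for x, t in zip(X, Y):
        x1, h1, y = forward(x)
        err = t - y                                      # contributes to SSE (eqn 4)
        sse += float(err @ err)
        # Error signals (deltas), using the sigmoid derivative f' = f(1 - f)
        delta_out = err * y * (1.0 - y)
        delta_hid = (W2[:, :-1].T @ delta_out) * h1[:-1] * (1.0 - h1[:-1])
        # Weight changes with momentum applied to the previous update
        dW2 = eta * np.outer(delta_out, h1) + alpha * dW2_prev
        dW1 = eta * np.outer(delta_hid, x1) + alpha * dW1_prev
        W2 += dW2
        W1 += dW1
        dW2_prev, dW1_prev = dW2, dW1
    return sse

# Placeholder data standing in for the 220 standardized observations;
# 500 iterations, as used in the Results section.
X = rng.random((220, n_in))
Y = rng.random((220, n_out))
for epoch in range(500):
    sse = train_epoch(X, Y)
```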

The ranges of the input variables differed by orders of magnitude. To standardize the measurement scales, inputs were converted into centred, reduced (standardized) variables, and the qualitative variables were recoded into completely disjunctive (one-hot) variables. The dependent variable (damage cost) was rescaled to the range [0, 1] to match the output range of the transfer function used (the sigmoid function).
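A minimal sketch of this preprocessing, assuming the data are held in a pandas DataFrame with one column per descriptor (column names and the split into quantitative and qualitative lists are illustrative):

```python
import numpy as np
import pandas as pd

def preprocess(df, quantitative, qualitative, target):
    """Sketch of the preprocessing described above."""
    # Quantitative inputs -> centred, reduced (standardized) variables
    Xq = (df[quantitative] - df[quantitative].mean()) / df[quantitative].std()
    # Qualitative inputs -> completely disjunctive (one-hot) coding
    Xd = pd.get_dummies(df[qualitative].astype("category"))
    # Damage cost -> [0, 1], to match the output range of the sigmoid
    y = (df[target] - df[target].min()) / (df[target].max() - df[target].min())
    return pd.concat([Xq, Xd], axis=1).to_numpy(dtype=float), y.to_numpy()
```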

In the present study, our aim was to determine the predictive and explanatory performance of artificial neural networks (ANN). Modelling was carried out in two steps. In the first step (model fitting), we fitted the model with the whole sample, a matrix of 220 observations described by 17 variables. This allowed us to estimate the sensitivity of the variables and the overall performance of the model. In the second step (model testing), we performed the normal procedure for ANN use, i.e. training the model with part of the observations, then testing it with the remaining part. Two types of testing have been described (Efron 1983; Jain, Dubes & Chen 1987; Kohavi 1995): ‘holdout’ and ‘leave-one-out’. The holdout procedure requires large databases, possibly with replicate observations, one or several sets of replications being used as a training sample and the other sets constituting the test sample (Lek, Dimopoulos & Fabre 1996). Our study falls into another category, where every new observation is unique and can be added to the training sample after a posteriori validation (the amount of damage actually observed). This situation is better suited to the leave-one-out procedure, where each observation is tested using a model trained on all the other observations. In practice, we performed 220 training phases of 219 observations each, every training phase being followed by a testing phase with the single remaining observation.
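Schematically, the leave-one-out procedure can be written as below; `train_fn` and `predict_fn` stand for the training and prediction steps of the ANN and are placeholders, not the paper's code.

```python
import numpy as np

def leave_one_out(X, Y, train_fn, predict_fn):
    """Leave-one-out testing: each of the N observations is predicted by a
    model trained on the other N - 1 (here, 220 trainings on 219 plots).
    """
    preds = np.empty(len(X))
    for k in range(len(X)):
        mask = np.arange(len(X)) != k        # leave observation k out
        model = train_fn(X[mask], Y[mask])
        preds[k] = predict_fn(model, X[k])
    return preds
```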

To evaluate the quality of the models, we could have used the coefficient of determination (r²) or the correlation coefficient (r), but because high values of damage cost are scarce, we preferred a performance index based on the number of values correctly returned by the model: we counted predicted values deviating from observed values by no more than ±5% or ±10% of the range of the predicted variable (i.e. 890 FF and 1780 FF, respectively).
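This performance index is straightforward to compute; a sketch (for this data set, the 5% and 10% tolerances correspond to 890 FF and 1780 FF over a 17 800 FF range):

```python
import numpy as np

def performance_index(observed, predicted, fraction=0.10):
    """Share of predictions within ± `fraction` of the observed range."""
    tol = fraction * (observed.max() - observed.min())
    return np.mean(np.abs(predicted - observed) <= tol)
```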

The results of the leave-one-out procedure were evaluated following Mapstone (1995), who recommends giving primacy to the stipulation of a critical effect size (ES), in this instance the level of impact that brings about a decision. Stipulating the ES should not rest on an arbitrary choice, such as the critical 0·05 ‘significance level’ currently used in ecology, but on biological, ecological or socio-economic considerations. In the case of wildlife damage there are two critical ESs, because managers must consider (i) the minimum level of expected impact (or its cost) at which prevention methods should be applied (critical ES I); and (ii) the minimum level of observed impact (or its cost in local currency) at which compensation should be paid (critical ES II). Critical ES I has a corresponding Type I error, committed when an impact is predicted (i.e. the impact cost is predicted to exceed critical ES I, thereby leading to a prevention decision) but does not actually occur. Critical ES II has a corresponding Type II error, committed when an impact is not predicted (i.e. the impact cost is predicted to be lower than critical ES II) but actually occurs, thereby resulting in compensation. We used different ES levels to analyse different prediction-prevention strategies.
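Under these definitions, error frequencies of the kind reported in Table 1 below can be computed along the following lines (a sketch; in Table 1 a single threshold plays the role of both critical ESs, and frequencies are computed within each decision group):

```python
import numpy as np

def error_frequencies(observed, predicted, es):
    """Frequencies of Type I and Type II errors at a critical effect size `es`.

    Type I : prevention decided (prediction >= es) but actual impact < es.
    Type II: no prevention (prediction < es) but actual impact >= es.
    """
    protect = predicted >= es
    type1 = np.mean(observed[protect] < es) if protect.any() else np.nan
    type2 = np.mean(observed[~protect] >= es) if (~protect).any() else np.nan
    return type1, type2
```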

The cost of compensation, in the absence of prevention measures, is given by:

$A = \sum_{i=1}^{D} c_i$ (eqn 5)

where ci is the cost of compensation for plot i in a sample of D damaged plots. The cost of protection is given by:

$B = \sum_{j=1}^{P} f_j$ (eqn 6)

where fj is the cost of protection (fencing) of plot j (proportional to the perimeter of the plot in the case of electrified fencing) in a sample of P plots for which prevention was decided. We simplified this calculation by considering that all plots were square and had the average surface area (2 ha) observed in our sample, i.e. approximately 600 m in perimeter.
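In code, eqns 5 and 6 with this square-plot simplification might read as follows (a sketch; the ≈4 FF per metre fencing price is the estimate used later in the Results, after Breton 1994 and Vassant 1994, and the function names are illustrative):

```python
import numpy as np

FENCE_COST_PER_M = 4.0      # ~4 FF per metre of electrified fencing
PLOT_PERIMETER_M = 600.0    # square 2-ha plot: 4 x sqrt(20 000) ~ 566 m, rounded to 600 m

def compensation_cost(damage_costs):
    """Eqn 5: A = sum of compensation c_i over the D damaged plots."""
    return float(np.sum(damage_costs))

def prevention_cost(n_protected_plots):
    """Eqn 6 under the simplifying hypothesis of identical square plots:
    each protected plot costs the same fencing length x unit price
    (600 m x 4 FF = 2400 FF per plot)."""
    return n_protected_plots * PLOT_PERIMETER_M * FENCE_COST_PER_M
```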

Results

Model fitting and sensitivity analysis

The results of the ANN model, with 500 iterations and five neurones in the hidden layer, are presented in Fig. 2. The correlation coefficient (r) computed for the regression between observed and estimated values is 0·945. Figure 2(a) shows that the ANN gave satisfactory results over the whole range of values of the dependent variable. The points are well aligned along the perfect-fit (1:1) line. Although poorly represented, the highest values of the output variable lie along the fit line; nevertheless, a few points lie far off, and some low values are slightly underestimated. For 77·7% of predicted values, the deviation from observed values does not exceed 890 FF (5% of the range of observed values); for 94·6% of predicted values it does not exceed 1780 FF (10% of the range), as shown in Fig. 2(a). The distribution of residuals does not seem to be normal: there is an exaggerated clustering of residuals toward the centre (averaging to zero) and a straggling tail towards large positive values (Fig. 2c), so the assumption of normality may not hold. The Lilliefors test of normality of residuals (Lilliefors 1967) gives a maximum difference of 0·152 (P < 0·001). The relationship between residuals and values estimated by the model shows complete independence (Fig. 2b; r = 0·003, n = 220, P = 0·956): the points are well distributed on both sides of the horizontal line representing the average of the residuals.
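The diagnostics of this paragraph (correlation between observed and estimated values, independence and normality of residuals) could be reproduced along these lines, assuming scipy and statsmodels are available; this is a sketch, not the original Matlab® analysis:

```python
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

def evaluate_fit(observed, estimated):
    """Fit diagnostics: correlation, residual independence, residual normality."""
    r, _ = stats.pearsonr(observed, estimated)
    residuals = observed - estimated
    r_res, p_res = stats.pearsonr(residuals, estimated)  # residuals vs. estimates
    ks_stat, p_norm = lilliefors(residuals)              # Lilliefors normality test
    return {"r": r, "residual_r": r_res, "residual_p": p_res,
            "lilliefors_stat": ks_stat, "lilliefors_p": p_norm}
```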

Figure 2.

Results of fitting the model with 220 observations and a 17-5-1 network. (a) Scatter plot of estimated values vs. observed values. The solid line indicates the perfect fit; the two dashed lines are separated from it by 10% (1780 FF) of the maximum range of damage cost (17 800 FF). (b) Relationship between residuals and estimated values. (c) Distribution of residuals with normal adjustment curve.

Figure 3 shows the results of the sensitivity analysis of seven quantitative variables. Damage cost is negatively correlated with the proximity of a paved road, the proximity of a house, and the number of houses in the vicinity. A negative but less linear relationship is also observed between damage cost and the number of other buildings. There is a positive relationship, represented by sigmoid curves, between damage cost and the degree of enclosure or the density of the surrounding vegetation, which means that damage cost increases mainly over the median part of the range of these variables. Last, there is a weaker, linear positive relationship between damage cost and the number of wild boars culled, considered as an index of wild boar population density.

Figure 3.

Sensitivity profiles (or ‘responses’) of the predicted damage cost to 7 quantitative independent variables. Each independent variable is tested with the 6 other independent variables placed at one of five standard levels (minimum, 1st quartile, median, 3rd quartile, maximum). The horizontal axis is a common scale for all independent variables tested (from level 1 = minimum value to level 12 = maximum value); the vertical axis is the amount of compensation.
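The profiles of Fig. 3 can be computed with the scheme described in the caption. The sketch below assumes a fitted model exposed as a `predict_fn` callable and the quantitative inputs gathered in a matrix `X` (names illustrative, not the paper's code):

```python
import numpy as np

def sensitivity_profile(predict_fn, X, var_idx, n_scale=12):
    """Profile of one input variable, following the Fig. 3 scheme:
    the variable of interest is scanned over 12 levels of its observed
    range while all other inputs are fixed, in turn, at five standard
    levels (minimum, 1st quartile, median, 3rd quartile, maximum).

    Returns an array of shape (5, n_scale) of predicted damage costs.
    """
    scan = np.linspace(X[:, var_idx].min(), X[:, var_idx].max(), n_scale)
    levels = np.percentile(X, [0, 25, 50, 75, 100], axis=0)
    profiles = np.empty((5, n_scale))
    for q, fixed in enumerate(levels):
        for s, value in enumerate(scan):
            x = fixed.copy()
            x[var_idx] = value           # vary only the tested variable
            profiles[q, s] = predict_fn(x)
    return profiles
```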

Model testing

The predictive performance with the leave-one-out procedure is represented in Fig. 4(a), which shows the scatter plot of predicted vs. observed values. The rather low correlation coefficient of 0·27 does not reflect the particular pattern of this plot. In fact, the data fall into three categories: (i) points on the fit line (prediction = observation); (ii) overestimated values (observed values close to zero, predicted values high); and (iii) underestimated values (observed values high, predicted values close to zero). The graph is striking for its lack of intermediate cases. This particular pattern explains why the practical result of the model is far better than the correlation coefficient would suggest: 88·2% of the predictions deviate from the observed value by less than 20% of the observed range, and 82·3% by less than 10%. The study of the relationship between residuals and predicted values shows some dependence (Fig. 4b), with a significant correlation coefficient (r = −0·55, n = 220, P < 0·001). Figure 4(b) shows that this dependence results from the underestimation or overestimation of a few observations, as noted above. The histogram in Fig. 4(c) (mean = −8 FF, standard deviation = 3035 FF) shows a large majority of observations close to the average.

Figure 4.

Results of testing the model with 220 observations and a 17-5-1 network by the leave-one-out procedure. (a) Scatter plot of predicted values vs. observed values. The solid line indicates the perfect fit. Filled circles indicate predicted values deviating from observed values by more than 20%; filled squares, predicted values deviating by 10% to 20%. (b) Relationship between residuals and predicted values. (c) Histogram of the distribution of residuals with normal adjustment curve.

Influence of critical effect size on the frequency of errors and the cost of errors

In France, compensation is granted when the evaluated impact exceeds 200 FF. This is the first value we can assign to critical ES II: if a plot is predicted ‘without impact’, the prediction will be false if the actual impact exceeds 200 FF. In the sample of 200 plots where damage was declared, our ANN predicts an impact below 200 FF for three plots only, all of them with an actual impact below 200 FF. In the control sample of 20 plots without damage, all predictions that damage will be lower than a particular threshold are true by design (no Type II errors). Hunting organizations grant prevention measures (fencing) on the basis of previous experience and field evaluation by gamekeepers. In practice, this means that protection is proposed when expected damage largely exceeds critical ES II (i.e. critical ES I is certainly greater than critical ES II). We analysed the frequency of Type I errors (i.e. useless protection is given to a plot where actual impact is smaller than critical ES I) and Type II errors (i.e. protection is not given to a plot where actual impact exceeds critical ES I) in the sample of 200 damaged plots, at six levels of the threshold value for prevention, from 200 FF to 4000 FF. Table 1 shows that the frequency of Type I errors increases regularly along this sequence, from 7·11% to more than 50%. At the 1000 FF level, which presumably could be the lowest level at which prevention measures are applied, a third of the predictions are false. The frequency of Type II errors decreases from ≈26% at the 1000 FF level to ≈9% at 4000 FF; the frequency at the 200 FF level is based on only three observations. In the control sample, the proportions of predictions exceeding each level are similar to the frequencies of Type II errors in the test sample, so the frequency at the 200 FF level in a randomly selected sample of plots would presumably not be far from 50%.

Table 1. Observed frequencies of Type I (predicted impact does not occur) and Type II (occurrence of unpredicted impact) errors, assuming that the decision to protect is taken at different minimum levels of impact

Level of impact (FF)   Frequency of Type I        Frequency of Type II       Proportion of predictions
for protection         errors (test sample)       errors (test sample)       exceeding the level
                                                                             (control sample, n = 20)
200                    0·0711 (n = 197)           0·0000 (n = 3)             0·5000
500                    0·2035 (n = 172)           0·2857 (n = 28)            0·3000
1000                   0·3250 (n = 120)           0·2625 (n = 80)            0·2000
2000                   0·4468 (n = 47)            0·2092 (n = 153)           0·1000
3000                   0·4643 (n = 28)            0·1279 (n = 172)           0·1000
4000                   0·5500 (n = 20)            0·0944 (n = 180)           0·0500

Managers may be more interested in the cost of errors (economic cost, for example) than in their frequency. The cost of errors has to be considered under different strategic options. A first strategic option is one where no direct prevention measures are taken: prediction of the cost of impact would be used only to calculate the compensation budget. In this situation, the model is globally a good predictor: the total cost of impact predicted for the 200 plots of the damaged sample is 376 944 FF, not far from the observed total of 404 306 FF. This gap remains approximately the same when critical levels other than 200 FF are chosen. The other strategic options are those where prevention measures are applied when the predicted cost of impact exceeds a certain level (critical ES I). The total cost of errors is then the cost of unjustified prevention (resulting from Type I errors) plus the cost of compensation for erroneously unprotected plots (resulting from Type II errors). Table 2 compares the costs of prevention and compensation under different hypotheses of prevention measures, the compensation threshold staying at 200 FF. The cost of prevention is estimated at ≈4 FF per metre, following Breton (1994) and Vassant (1994), i.e. 2400 FF per plot under our simplifying hypothesis. The cost of prevention plus compensation drops between the 1000 FF and 2000 FF levels, then increases slowly, but the cost of errors (the difference between the observed costs and those that would have been observed if all predictions were exact) stays approximately the same under all hypotheses. Table 3 details the costs resulting from Type I and Type II errors under the hypotheses of prevention at 1000 FF and at 2000 FF: Type I errors account for about 73% of the total cost of errors at 1000 FF, and Type II errors for about 65% at 2000 FF.

Table 2. Comparison of the cost of impact, cost of prevention of impact, and error in prediction of cost of impact in five hypotheses of prevention measures

                                              Cost of prevention   Cost of compensation   Total
No prevention                                 0                    404 306                404 306

Prevention when predicted impact ≥ 1000 FF
  With observed error rates                   297 600              108 152                405 752
  Without error                               247 200              41 527                 288 727
  Total cost of error                                                                     117 025

Prevention when predicted impact ≥ 2000 FF
  With observed error rates                   117 600              240 099                357 699
  Without error                               144 000              102 963                246 963
  Total cost of error                                                                     110 736

Prevention when predicted impact ≥ 3000 FF
  With observed error rates                   72 000               286 587                358 587
  Without error                               91 200               158 252                249 452
  Total cost of error                                                                     109 135

Prevention when predicted impact ≥ 4000 FF
  With observed error rates                   50 400               318 948                369 348
  Without error                               67 200               193 183                260 383
  Total cost of error                                                                     108 965
Table 3. Respective cost of Type I and Type II errors in two hypotheses of prevention measures

                                              Cost of Type I errors   Cost of Type II errors   Total cost of errors
Prevention when predicted impact ≥ 1000 FF    85 732                  31 293                   117 025
Prevention when predicted impact ≥ 2000 FF    38 944                  71 792                   110 736
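The structure of the Table 2 calculation can be sketched as follows. Note that this is a simplified reconstruction: the paper's figures also include the 20 control plots and their predicted values, and the assumption that fencing fully prevents damage on a protected plot is implicit; the constants and names are illustrative.

```python
import numpy as np

PLOT_PROTECTION_FF = 2400.0     # 600 m of fencing x 4 FF per metre (see Methods)
COMPENSATION_FLOOR = 200.0      # compensation is paid when impact exceeds 200 FF

def strategy_costs(observed, predicted, es1):
    """Costs of a prevention strategy at threshold `es1` (critical ES I):
    protected plots cost a fixed fencing price; unprotected plots with
    impact above the 200 FF floor are compensated.
    """
    protect = predicted >= es1
    prevention = protect.sum() * PLOT_PROTECTION_FF
    compensable = (~protect) & (observed >= COMPENSATION_FLOOR)
    compensation = observed[compensable].sum()
    return prevention, compensation, prevention + compensation
```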

Discussion and Conclusion

Predicting the cost of impact

Although prediction of environmental risk has already been performed using linear functions in very complex situations (Hanratty & Stay 1994), linear modelling of environmental systems is applicable to a limited extent only. Among other reasons, this is because such models are not able to reproduce the behaviour of real systems when very low or high values of the variables are considered (Lek et al. 1996b). The data in the present work contained very few high values, and one of the specific difficulties of our task was to obtain satisfactory prediction of the dependent variable in such rare or exceptional instances.

The selection of input variables introduced into the modelling procedures, their ecological significance, and the sampling procedure appear to be important elements of the quality of the model. The backpropagation procedure results in very high correlation coefficients, especially in the training phase. In the testing phase, correlation coefficients were lower than in the training phase but still clearly significant. This difference between training and testing results is amplified in small data sets, and was unavoidable here: we worked in natural or semi-natural situations where there is no replication and each new observation contains unique information.

Except for a few overestimated or underestimated points, the leave-one-out procedure could be considered as satisfactory. It seems clear that this procedure is well suited to actual situations of risk management: the model is built from a maximum of previously verified observations, then each instance requiring prediction is introduced as a test set comprising a single observation. After validation, each new observation can be introduced in the learning set, and so on. In real situations where impact follows a yearly cycle, it should be possible to create a first model from all previously verified observations, then to give the model an additional learning set each year.

The sensitivity study clearly showed the non-linearity of the process linking the independent variables to the model output (Fig. 3). The most sensitive factor, showing a sublinear negative relationship with damage cost, was the proximity of houses (an index increasing as the house gets closer to the plot). A similar, though less marked, relationship was observed for the number of houses within 500 m and for the proximity of paved roads. These results indicate a general negative relationship between the risk of damage and the amount of human infrastructure or activity. A second group of important factors concerns the presence of shelter habitats around the cultivated plot. Two of these factors have similar sensitivity in the model: the number of ‘sheltered sides’ and the density of the vegetation in the shelter habitats. The number of sheltered sides varies from 0 (e.g. an isolated plot in an open field area) to 4 (e.g. a clearing in a forest). The sigmoid sensitivity curves indicate that the greatest increase in risk occurs between the lower-middle and upper-middle levels of both factors (i.e. between two and three sheltered sides, and between fairly sparse and fairly dense vegetation). The nature of the plant cover has much less influence than the degree of enclosure and the vegetation density. Lastly, the sensitivity to the wild boar cull (considered as an index of population density) shows a linear relationship with damage cost, but over a narrow range of the dependent variable. These results seem promising for the application of similar models to other animal-habitat systems.

Application to decision making in impact management

Strategies of decision making for the control of wildlife damage can be arranged in a sequence of steps which have been comprehensively described in a review paper by Slate et al. (1992). Knuth et al. (1992) specifically addressed the risk concept and divided it into several components (organismic, environmental, social and institutional), considering the way each component is perceived and treated. In our work, the context was greatly simplified by considering a geographical area where the perception of the risk of damage linked with a particular species (wild boar), and the compensation system, could be considered uniform. Thus, our goal was to implement a method to predict the compensation cost as a function of the local environmental factors influencing wild boar behaviour, with the aim of giving managers a tool for determining local or regional policies of damage prevention. There are few scientific papers in which prevention measures are based on the prediction of damage or the evaluation of risk. A comprehensive example can be found in the work conducted in Scandinavia on moose damage in forests (Saarenmaa et al. 1988; Saarenmaa & Nikula 1989), based on artificial intelligence modelling of moose behaviour as a function of the characteristics of landscape, plant community and forestry practices. We had a similar goal in a previous paper (Spitz, Lek & Dimopoulos 1996), in which the activity of wild boar in open areas (without particular reference to the damage problem) was analysed in relation to permanent and temporary aspects of human activity. In contrast, the present work did not try to model wild boar behaviour, but to quantify directly the impact of that behaviour as a function of the factors expected to influence it. The model is expected to tell managers where to apply prevention measures, and to what extent. The manager should collect descriptions of all plots in areas where impacts are habitually recorded in some plots, thus creating a sample of plots that are potentially more or less endangered. Such a sample, ‘naturally’ gathered by the manager, might not be exactly matched by our material, in which we assembled 200 plots with observed impact and 20 plots without impact. However, the similarity mentioned above between our two subsamples shows that our material is quite representative of a natural situation.

Analysis of the cost of impact (including the cost of prediction errors, prevention and compensation) shows that the manager has several means of reducing this cost. On the plausible assumption that the cost of actual damage will remain stable, the cost of prediction errors can be reduced by improving the efficiency of the predictive model, which can be achieved by increasing the learning sample and by a better choice of descriptive variables. Another way of reducing the cost of errors is to choose a prevention level at which a particular type of error is minimized. For instance, using the 1000 FF level will reduce the total cost if the manager can reduce the cost of prevention, because Type I errors (unjustified prevention) are the most frequent errors at this level. Reducing the cost of prevention itself (and not only its unjustified use) can be a by-product of prediction: predicting damage over a large cultivated area allows large blocks of endangered fields to be defined, and the cost of fencing a large block is notably lower than that of fencing isolated fields (16 000 FF for a 100-ha block, i.e. 50 2-ha plots, compared with 120 000 FF for 50 separate 2-ha plots under our hypothesis), even if some plots in the block do not deserve protection. Managers can also use prediction to give farmers an incentive to locate sensitive crops in less risky places, or to modify other cultural practices. The total financial resources for impact management can thus be better distributed between ecological (and sociological) prevention, physical protection of sensitive areas, and compensation. In this respect, improving the efficiency of predictive tools such as artificial neural networks is a promising objective in applied ecology.

Acknowledgements

Special thanks are addressed to the wildlife officers in the Fédération Départementale des Chasseurs de l’Aude for their active cooperation.

Received 21 November 1997; revision received 4 February 1999
