Assessing river biotic condition at a continental scale: a European approach using functional metrics and fish assemblages


D. Pont, Cemagref Unité HYAX Hydrobiologie, 3275 Route de Cézanne, Le Tholonet, 13612 Aix en Provence, France (fax +33 4 42 66 99 34; e-mail


  • 1. The need for sensitive biological measures of aquatic ecosystem integrity applicable at large spatial scales has been highlighted by the implementation of the European Water Framework Directive. Using fish communities as indicators of habitat quality in rivers, we developed a multi-metric index to test our capacity to (i) correctly model a variety of metrics based on assemblage structure and functions, and (ii) discriminate between the effects of natural vs. human-induced environmental variability at a continental scale.
  • 2. Information was collected for 5252 sites distributed among 1843 European rivers. Data included variables on fish assemblage structure, local environmental variables, sampling strategy and a river basin classification based on native fish fauna similarities accounting for regional effects on local assemblage structure. Fifty-eight metrics reflecting different aspects of fish assemblage structure and function were selected from the available literature and tested for their potential to indicate habitat degradation.
  • 3. To quantify possible deviation from a ‘reference condition’ for any given site, we first established and validated statistical models describing metric responses to natural environmental variability in the absence of any significant human disturbance. We considered that the residual distributions of these models described the response range of each metric, whatever the natural environmental variability. After testing the sensitivity of these residuals to a gradient of human disturbance, we finally selected 10 metrics that were combined to obtain a European fish assemblage index. We demonstrated that (i) when considering only minimally disturbed sites the index remains invariant, regardless of environmental variability, and (ii) the index shows a significant negative linear response to a gradient of human disturbance.
  • 4. Synthesis and applications. In this reference condition modelling approach, by including a more complete description of environmental variability at both local and regional scales it was possible to develop a novel fish biotic index transferable between catchments at the European scale. The use of functional metrics based on biological attributes of species instead of metrics based on species themselves reduced the index sensitivity to the variability of fish fauna across different biogeographical areas.


Fish populations and communities are sensitive indicators of habitat quality in rivers because they react significantly to almost all kinds of anthropogenic disturbances, including eutrophication, acidification, chemical pollution, flow regulation, physical habitat alteration and fragmentation (reviewed by Ormerod 2003). This sensitivity to the relative health of their aquatic environments and the surrounding watersheds is the basis for using biological monitoring of fishes to assess environmental degradation (Fausch et al. 1990). Over the last 30 years, a variety of fish-based biotic indices have been widely used to assess river quality, and the use of biologically based multimetric indices, inspired by the index of biotic integrity (IBI) (Karr 1981; Karr et al. 1986), has grown rapidly (Simon 1999). The main characteristic of these tools is that they employ a series of metrics based on assemblage structure and function that are integrated into a numerical index scaled to reflect the ecological health of the assemblage. Another characteristic is that they use the ‘reference condition approach’ (Bailey et al. 1998), comparing an ecosystem exposed to a potential stress with a reference system unexposed to such a stress (Hughes et al. 1998).

The accuracy of these biological assessments depends primarily on the sensitivity of these tools to natural environmental variation as opposed to human-induced disturbances of river biota. To reduce or remove the confounding effects of natural environmental variability, most authors have validated indices over a restricted range of geographical and environmental situations: particular states (Roth et al. 1998; Schleiger 2000), ecoregions and drainage areas (McCormick et al. 2001; Smogor & Angermeier 2001; Emery et al. 2003; Mebane, Maret & Hughes 2003), river sizes (Angermeier & Schlosser 1987; Simon & Emery 1995), water thermal regimes (Leonard & Orth 1986; Hughes, Howlin & Kaufmann 2004) and levels of fish diversity (Harris & Silveira 1999). Most authors also account for between-site natural variability by standardizing metrics in relation to river size. Angermeier, Smogor & Stauffer (2000) consider that multimetric indices perform best when coupled with a regional framework so that the metrics reflect region-specific attributes of natural biotic communities. Hughes, Whittier & Larsen (1990) called for a necessary compromise between the extremes of uniform nation-wide criteria and unique criteria for each waterbody. However, regardless of the approach, our capacity to distinguish between natural and human-induced variation of biological conditions at both local and regional scales remains a crucial point (Hughes et al. 1998).

The new European Union (EU) water policy, the Water Framework Directive (WFD), states that all European rivers should be assessed via a reference condition approach using bioassessment tools based on four biotic elements, including fish (EU 2000). These biological assessment tools must also indicate which functional characteristics of the biota are altered, in order to increase the probability of success of ecological river rehabilitation schemes (Pretty et al. 2003; Giller 2005; Palmer et al. 2005).

One way to attain this goal is to develop a common assessment method at the European scale using defined metrics that remain insensitive to natural environmental variability for all unimpaired sites, and that are monotonically linked to the intensity of human alteration for impaired sites. The objective of this present study was to develop a fish-based index applicable to all European rivers using a methodology already tested at a national level in France (Oberdorff et al. 2001, 2002). We had two main questions. (i) Is it possible, at the European scale, to model correctly a variety of metrics as a function of natural environmental descriptors defined at both local and regional scales, in the absence of any human disturbance? (ii) Are we able to quantify, for any tested site, its deviation from a reference condition site having similar natural environmental conditions?


site selection and pre-classification of disturbances

We used data from fish surveys of 12 European countries conducted by several laboratories and governmental environmental agencies (1978–2002). These 5252 river reaches or sites (Fig. 1) cover most of the climatic and physical conditions that occur in Europe.

Figure 1.

Map of Europe showing 11 river groups and the 5252 sites. D, Danube; E, Ebro River; MC, Mediterranean rivers from Catalunya; MF, Mediterranean rivers from France; MN, Meuse-group rivers; NP, north Portugal rivers; NE, northern European plain rivers; R, Rhône River; SE, south-west Sweden rivers; UK, United Kingdom rivers; WF, west France rivers. Symbols (circles and pluses) are only used to distinguish between river groups.

All sites had been sampled using electrofishing techniques (direct current, DC, or pulsed direct current, PDC, waveform) during low flow periods. When possible (river depth < 0·7 m), river reaches were sampled by wading (64·9% of all sites). For most of these sites, the removal method was applied (87·9%) and stop nets were not used (88·7%). In large rivers (river depth > 0·7 m) sampling was from boats (35·1% of all sites), mainly in near-shore areas. The size of each sampled site was sufficient to encompass complete sets of characteristic local river habitat. For 67·7% of all sites, the whole river width was sampled. In other cases, the river section was only partially sampled (mainly in near-shore areas). In order to standardize the sampling effort, we only considered the first passage in all cases. Although our data are subject to sampling noise, this sampling effort was sufficient to describe the fish assemblage. For sites where the removal method was applied with three successive passages (2275 sites), the mean percentage of the total number of species caught during the first passage was 91·9% (SD 16·3%) and the mean percentage of total abundance was 63·2% (SD 13·1%). Sampling effort was summarized by three variables: sampling technique (TECH; boat or wading), sampling method (METH; complete, whole river width sampling, or partial) and fished area (FISH). We only retained one fishing occasion per site.

For each site, the degree of human-induced alteration was evaluated based on available data, existing knowledge and expert judgement. Four disturbance variables were retained and rated as a function of their deviation from a natural state (from 1, no deviation, to 5, heavily degraded): hydrological disturbances (HYDR; classes 1–5, from more than 90% to less than 50% of the mean natural water level and from almost no to strong deviation from natural duration and intensity of flooding period); morphological conditions (MORPH; from negligible morphological alteration to complete channelization with most natural habitats missing); phosphorus, nitrogen and total organic carbon (NUTR; from conditions within 150% of background levels to deviation of more than 300% from national background levels); and deviation from critical values of, for example, oxygen and pH (TOX). A total disturbance assessment (DISTURB) was obtained by summing the four disturbance variables (range from 4 to 20).

Among the 5252 sites, 1608 were considered reference sites (REF) because none of their four disturbance variables was rated over 2. This definition ensured sufficient sample sizes for all countries. While not all reference sites were pristine or totally undisturbed, the degree of alteration was null or very low. Among the other sites, we distinguished between weakly disturbed (WI sites; 8 < DISTURB < 13) and heavily disturbed sites (HI sites; DISTURB > 12).
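The pre-classification rules above can be written out as a short sketch. The function name and the "unclassified" label are assumptions made for illustration: the text does not say how a non-reference site with DISTURB of 8 or less was treated.

```python
def classify_site(hydr, morph, nutr, tox):
    """Return (DISTURB, class) for one site from its four ratings (1 to 5)."""
    ratings = (hydr, morph, nutr, tox)
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 5")
    disturb = sum(ratings)  # total disturbance assessment, range 4 to 20
    if max(ratings) <= 2:
        label = "REF"  # no disturbance variable rated over 2
    elif 8 < disturb < 13:
        label = "WI"   # weakly disturbed
    elif disturb > 12:
        label = "HI"   # heavily disturbed
    else:
        # Assumption: this case is not named in the text.
        label = "unclassified"
    return disturb, label


print(classify_site(1, 1, 1, 1))  # (4, 'REF')
print(classify_site(5, 4, 3, 2))  # (14, 'HI')
```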

Each metric-specific model and the final index were validated independently by randomly dividing the reference data set into three subsets used for model calibration (REF-CAL, 1000 sites), model validation (REF-MET, 304 sites) and final index validation (REF-IND). We also randomly selected among the weakly disturbed sites two sets for metric selection (WI-MET, 958 sites) and final index validation (WI-IND, 304 sites) and among the heavily disturbed sites one set for metric selection (HI-MET, 958 sites) and one set for final index validation (HI-IND, 304 sites).
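As an illustration of this partitioning, the sketch below randomly divides 1608 reference site identifiers into calibration, model validation and index validation subsets. The function name, the integer identifiers and the fixed seed are assumptions; the original randomization procedure is not described in more detail.

```python
import random


def split_reference_sites(site_ids, n_cal=1000, n_met=304, seed=0):
    """Randomly partition reference sites into REF-CAL, REF-MET and REF-IND."""
    ids = list(site_ids)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(ids)
    ref_cal = ids[:n_cal]
    ref_met = ids[n_cal:n_cal + n_met]
    ref_ind = ids[n_cal + n_met:]  # whatever remains
    return ref_cal, ref_met, ref_ind


cal, met, ind = split_reference_sites(range(1608))
print(len(cal), len(met), len(ind))  # 1000 304 304
```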

local environmental variables

Nine abiotic variables were measured in the field or from topographical maps, or estimated using GIS at each site: altitude (ELE; 0–1950 m), distance from source (DIS; 0–990 km), basin class (CAT; < 10 km2, 10–99 km2, 100–999 km2, 1000–9999 km2, > 10 000 km2), reach slope (SLOP; 0·01–199 m km−1), wetted width (WID; 0·5–1600 m), mean annual air temperature (TEMP; −2–+16 °C), presence/absence of a natural lake upstream (LAK), geological type (GEO; calcareous, siliceous) and flow regime (FLOW; permanent or temporary). Four of these explanatory variables (ELE, SLOP, DIS, WID) and FISH were log-transformed to reduce the skewness of their distribution.

regional units

To delineate biologically relevant regional units for European fish, we first considered the complete fish fauna lists for each drainage basin unit, which represent homogeneous entities with regard to long-term dispersal (Matthews 1998) and explain a significant part of fish community variability (Pont, Hugueny & Oberdorff 2005). However, because of the lack of available data for small basins, we grouped all coastal basins smaller than 25 000 km2 and draining to a given sea coast (ICES Fishing Areas), hypothesizing that these contiguous basins were in contact during recent Holocene sea level variations. For our entire study area, we then compiled data from previous literature to establish lists of native fish species of 19 large basins and 17 groups of contiguous small basins. We examined the similarities (Jaccard Index) between these 36 fauna lists using the unweighted arithmetic average clustering method. As a cut-off value, we chose the similarity level corresponding to the best compromise between a minimal number of reference sites per cluster (at least 30) and the largest number of clusters to increase our description of the spatial variability of fish faunas at a large scale. This procedure (Fig. 1) resulted in 11 clusters or river groups (RIVG).
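The similarity measure used in this step can be sketched as follows. The two fauna lists are hypothetical and serve only to show the calculation; the clustering itself is not reproduced here (an average-linkage routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be one possible tool, which is an assumption rather than the software used in the study).

```python
def jaccard_index(fauna_a, fauna_b):
    """Jaccard similarity: shared species divided by all species in either list."""
    a, b = set(fauna_a), set(fauna_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty lists
    return len(a & b) / len(union)


# Hypothetical native fauna lists for two basins (illustration only).
basin_a = {"Salmo trutta", "Cottus gobio", "Phoxinus phoxinus", "Anguilla anguilla"}
basin_b = {"Salmo trutta", "Cottus gobio", "Rutilus rutilus"}

print(jaccard_index(basin_a, basin_b))  # 2 shared / 5 total = 0.4
```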

candidate metrics

We considered five functional attributes to define the list of candidate metrics: tolerance, trophic, reproduction, habitat and migration (Hughes & Oberdorff 1999; Oberdorff et al. 2001). The 309 fish species caught were assigned to these attributes, based on grey or published literature and completed by expert judgement when necessary (see the web site, 19 Dec 2005). Given the scale of the study and the number of species involved, natural history cannot be described in a rigorous, quantitative way for all the species. Nevertheless, our species assignments agree well (average match of 80%) with those made recently for nine species attributes in Romania (Angermeier & Davideanu 2004).


Lithophilic species (LITH) require unsilted mineral substrate to spawn and their larvae are photophobic (Balon 1975). They tend to decrease in response to human disturbances such as siltation (Berkman & Rabeni 1987) and channelization (Brookes, Knight & Shields 1996). Phytophilic species (PHYT) tend to spawn on vegetation and their larvae are not photophobic. They decrease in response to channelization but will commonly increase with aquatic vegetation in relation with eutrophication.


The water column (WATE), benthic (BENT), rheophilic (RHEO) and limnophilic (LIMN) species prefer to live and feed in their respective habitats. The abundance of species assigned to these four habitat attributes tends to decrease with increasing habitat alteration (Karr 1981; Oberdorff et al. 2002). RHEO species may also increase when river channelization increases flow velocity. Eurytopic species (EURY) are characterized by tolerance of contrasting flow conditions, and an increase would be indicative of alteration.

Trophic guilds

Obligatorily piscivorous species (PISC; more than 75% fish in the diet; Lyons et al. 1995; Goldstein & Simon 1999) and insectivorous/invertivorous species (INSE; more than 75% macro-invertebrates in the diet; Lyons et al. 1995) will tend to decrease in response to an alteration of their habitat. In contrast, a metric based on omnivorous species (OMNI; more than 25% plant material and more than 25% animal material; Schlosser 1982) will tend to increase in response to disturbance as OMNI are able to adapt their trophic regime in response to an alteration of river food webs (Karr 1981).


Tolerant (TOLE) and intolerant (INTO) groups reflect species sensitivity to any common impact related to altered flow regime, nutrient regime, habitat structure and water chemistry (Karr et al. 1986). Loss of intolerant species is a response to degradation, whereas the number of tolerant species will tend to increase with disturbance.


Potamodromous species (POTA), which migrate within the inland waters of a river system (Northcote 1999), and long-migratory (diadromous) species (LONG), which migrate across a transition zone between fresh and marine water, are expected to decrease in response to the effects induced by dams and water regulation.

The choice of how a metric is expressed is as important as the selection of the metric itself (Fausch et al. 1990; Karr & Chu 1999). Each candidate metric was therefore expressed in four units: number of species (Ns), relative number of species (%Ns; number of species divided by the total species richness), absolute densities (Ni; in number of individuals ha−1) and relative densities (%Ni). Total species richness (RICH) and total abundance (DENS) would generally decline with environmental degradation (Karr 1981). However, an increase in nutrients (eutrophication) or temperature can also cause both to increase with disturbance.
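To make the four units concrete, the sketch below computes them for one guild from a single catch, using the species list and fished area given for site 2 in Table 2. The function name and data layout are assumptions, as is the definition of %Ni as guild individuals divided by all individuals caught.

```python
def guild_metrics(catch, guild, fished_area_m2):
    """Express one guild in the four units Ns, %Ns, Ni and %Ni.

    catch: list of (species, number of individuals, set of guild codes).
    """
    if not catch or fished_area_m2 <= 0:
        raise ValueError("need a non-empty catch and a positive fished area")
    area_ha = fished_area_m2 / 10_000
    total_individuals = sum(n for _, n, _ in catch)
    members = [n for _, n, guilds in catch if guild in guilds]
    ns = len(members)
    n_ind = sum(members)
    return {
        "Ns": ns,                            # number of species
        "%Ns": ns / len(catch),              # share of total species richness
        "Ni": n_ind / area_ha,               # individuals per hectare
        "%Ni": n_ind / total_individuals,    # assumed: share of all individuals
    }


# Site 2 of Table 2 (Seine River), fished area 1440 m2.
SITE2 = [
    ("Abramis brama", 4, {"TOLE", "BENT", "OMNI", "POTA"}),
    ("Anguilla anguilla", 22, {"TOLE", "BENT", "LONG"}),
    ("Gobio gobio", 5, {"INTO", "BENT", "RHEO", "LITH", "INSE"}),
    ("Leuciscus cephalus", 13, {"RHEO", "LITH", "OMNI", "POTA"}),
    ("Perca fluviatilis", 6, {"TOLE"}),
    ("Rutilus rutilus", 159, {"TOLE", "OMNI"}),
    ("Sander lucioperca", 1, set()),
    ("Scardinius erythrophthalmus", 6, {"PHYT", "OMNI"}),
]

tole = guild_metrics(SITE2, "TOLE", 1440)
print(tole["Ns"], tole["%Ns"])  # 4 tolerant species out of 8, i.e. 0.5
```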

We also considered metrics based on non-native acclimated species, as they can play an important functional role in river ecosystems. Finally, all 309 species caught were classified into one or several guilds for which we could calculate 58 candidate metrics.

metric modelling

For metrics based on abundance data (n = 15), stepwise multiple linear regression analysis of each metric [log (x + 1) transformed] on the explanatory variables was used. The squares of each of the five quantitative variables (ELE, TEMP, DIS, SLOP, WID) were also included to allow for non-linear relationships. For metrics based on the number of species (n = 15), we used the same procedure but added an explanatory variable, FISH, as sampling area is well known to influence species richness (Angermeier & Schlosser 1989). For metrics based on the relative number of species or relative abundance (n = 28), we used stepwise logistic regression analysis. All these analyses were performed using our calibration reference data set (REF-CAL). RIVG, TECH, METH, LAK, GEO and FLOW were entered as dummy variables. Variable selection during the stepwise procedure was based on the Akaike information criterion (Hastie & Pregibon 1993).

Using the independent set of 304 reference sites (REF-MET), we validated each of the resulting metric-specific models, expecting that the intercept and the slope of the regression line of the observed vs. predicted values would not be significantly different from, respectively, 0 and 1. In addition, we arbitrarily set a minimal threshold of the variance explained by the model (determination coefficient) at 0·30.
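A minimal sketch of this validation check is given below. It regresses observed on predicted values by ordinary least squares and tests the intercept against 0, the slope against 1 and the determination coefficient against 0·30. The function name and the use of a fixed critical value of 1·96 (a large-sample approximation to Student's t) are assumptions, not the exact test settings of the study.

```python
import math


def validate_metric_model(observed, predicted, min_r2=0.30, t_crit=1.96):
    """Check that observed vs. predicted values follow a 1:1 line."""
    n = len(observed)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2
              for x, y in zip(predicted, observed))
    r2 = 1 - sse / syy
    s2 = sse / (n - 2)  # residual variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    t_slope = (slope - 1) / se_slope        # H0: slope = 1
    t_intercept = intercept / se_intercept  # H0: intercept = 0
    valid = (abs(t_slope) < t_crit and abs(t_intercept) < t_crit
             and r2 >= min_r2)
    return {"intercept": intercept, "slope": slope, "r2": r2, "valid": valid}


# Hypothetical data: predictions plus bounded noise (illustration only).
pred = [i / 10 for i in range(50)]
obs = [p + 0.3 * math.sin(i) for i, p in enumerate(pred)]
print(validate_metric_model(obs, pred)["valid"])
```

The sketch assumes the fit is not perfect; with zero residual variance the standard errors vanish and the t statistics are undefined.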

The residuals of each of the valid metrics (i.e. the deviation between the observed and the predicted value of a metric) measured the range of variation of metrics after eliminating the effects of environmental variables and in the absence of any human disturbance. These residuals were standardized through subtraction and division by, respectively, the mean and the standard deviation of the residuals of the reference calibration data set (REF-CAL), even when computed on other data sets (REF-MET, WI-MET, HI-MET, REF-IND, WI-IND, HI-IND).

Most of the metrics are expected to be negatively linked to the intensity of human perturbation. This means that the expected value of the residuals for reference sites is zero and less than zero for impacted sites. Assuming that standardized residuals are N(0,1) distributed within reference sites, it is possible to compute the probability of observing a residual value lower than the computed one. The lower this probability, the higher the probability that a site is impacted. For metrics that are expected to be positively linked to human disturbance, we estimated the probability of observing values higher than the computed ones. For metrics that are expected to respond by an increase or a decrease depending on the type of perturbation, we considered the probability of observing higher values than the computed one for positive residuals, or of observing lower values for negative residuals. Transforming residual metrics into probabilities as described above is a way of rendering them comparable. All probability metrics vary between zero and one, and decrease as human disturbance increases. The expected distribution of these probabilities for reference sites is a uniform distribution with mean = 0·5.
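The transformation described above can be sketched as follows, assuming a standard normal distribution of standardized residuals at reference sites. Function and argument names are assumptions. For metrics with a two-directional response the sketch follows the wording literally (tail probability beyond the absolute residual); the text does not state whether this value was rescaled afterwards.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal N(0,1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def residual_to_probability(observed, predicted, ref_mean, ref_sd, response):
    """Turn one metric residual into a probability that falls with disturbance.

    ref_mean, ref_sd: mean and standard deviation of the residuals in the
    calibration reference data set (REF-CAL).
    response: "-", "+" or "+/-" (expected response to human disturbance).
    """
    z = ((observed - predicted) - ref_mean) / ref_sd
    if response == "-":
        return normal_cdf(z)             # P(residual lower than the computed one)
    if response == "+":
        return 1.0 - normal_cdf(z)       # P(residual higher than the computed one)
    if response == "+/-":
        return 1.0 - normal_cdf(abs(z))  # literal reading; rescaling not stated
    raise ValueError("response must be '-', '+' or '+/-'")


# A metric expected to decrease, observed well below its predicted value.
print(round(residual_to_probability(0.5, 2.46, 0.0, 1.0, "-"), 3))  # about 0.025
```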

metric selection

Metrics were selected after validation with the REF-MET data set on the grounds of their sensitivity to human-induced disturbance, and to maximize the independence among metrics and the diversity of metric types. First each metric was calculated for the subsets of weakly (WI-MET) and heavily (HI-MET) altered sites to test the hypothesis that the mean probability value of REF-MET was higher than WI-MET, and that the mean probability value of WI-MET was higher than HI-MET (unilateral t-test). Only metrics that fulfilled these two criteria were retained hereafter.

Then, if two metrics were highly correlated (i.e. Pearson's r > 0·80 or < −0·80), we retained the metric based on a functional species attribute not yet selected or the metric demonstrating the strongest response to a high level of disturbance.
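The redundancy screen can be illustrated with a small sketch that flags metric pairs whose absolute Pearson correlation exceeds 0·80. The function names and the example values are hypothetical; the choice of which member of a flagged pair to keep follows the criteria given in the text and is not automated here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(metric_values, threshold=0.80):
    """Return (metric_a, metric_b, r) for every pair with |r| above threshold."""
    names = sorted(metric_values)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(metric_values[a], metric_values[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical probability-scale values for three metrics at five sites.
values = {
    "metric_a": [1, 2, 3, 4, 5],
    "metric_b": [2, 4, 6, 8, 10.5],
    "metric_c": [5, 1, 4, 2, 3],
}
for a, b, r in highly_correlated_pairs(values):
    print(a, b, round(r, 2))
```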

index calculation and validation

For a site, given its fish assemblage, geographical location, environmental features and the sampling method used, applying the models corresponding to each of the 10 metrics produces 10 residual values (one per metric) that are subsequently transformed into probabilities. The final index is obtained by summing up the 10 probabilities and rescaling the final score from 0 to 1 (by dividing it by 10). The index was validated on three new independent subsets: reference (REF-IND), weakly disturbed (WI-IND) and highly disturbed (HI-IND) sites. We hypothesized that (i) the mean value of REF-IND did not differ from 0·5 and (ii) REF-IND mean value > WI-IND mean value > HI-IND mean value (unilateral t-test).
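The final aggregation step amounts to averaging the 10 metric probabilities. A minimal sketch, with a hypothetical function name:

```python
def fish_index(probabilities):
    """Sum the 10 metric probabilities and rescale the score to 0-1."""
    if len(probabilities) != 10:
        raise ValueError("the index combines exactly 10 metric probabilities")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("each probability must lie between 0 and 1")
    return sum(probabilities) / 10


# A site whose metrics all sit at the reference expectation scores 0.5.
print(fish_index([0.5] * 10))  # 0.5
```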

Table 2 gives two explicit examples that convert actual site biological and environmental conditions into metric scores and a final index score. Software freely available on the web may be used for this purpose.

Table 2.  Two examples of calculation of the European fish index value from the 10 retained metrics used in the model (site 1, undisturbed site with no individual disturbance variable rated over 1; site 2, highly disturbed site with a global disturbance value (DISTURB) of 14). For each site, the list of species caught (within parentheses, number of individuals caught per species at a given sampling date and the metric set to which the species was assigned) and environmental conditions (see text for acronym signification) are given. Metrics (see text for acronym signification) are expressed in number of species (Ns), relative number of species (%Ns; number of species divided by the total species richness), absolute densities (Ni; in number of individuals ha−1) and relative densities (%Ni). Fish assemblage characteristics are converted into an observed metric. Environmental conditions are used to compute a theoretical metric value. The observed minus the predicted values are standardized and transformed into probabilities. In the absence of any disturbance, a value of 0·5 is expected. The index is obtained by summing up the 10 metrics
Site 1: River Leven (UK)
Sampling date: 23 August 2002
Fish assemblage: Anguilla anguilla (1, TOLE, BENT, LONG), Barbatula barbatula (5, BENT, RHEO, LITH), Cottus gobio (100, INTO, BENT, RHEO, LITH, INSE), Lampetra planeri (1, INTO, BENT, RHEO, LITH, POTA), Phoxinus phoxinus (55, RHEO, LITH), Salmo salar (53, INTO, RHEO, LITH, INSE, LONG), Salmo trutta fario (39, INTO, RHEO, LITH, INSE)
Environmental conditions: RIVG (United.Kingdom), CAT (< 100 km2), ELE (45 m), GEO (Siliceous), FLOW (Permanent), LAK (No), TEMP (8·5 °C), SLOP (2·75 m km−1), DIS (14 km), WID (7·3 m), METH (Whole), TECH (Wading), FISH (365 m2)
Index value: 0·80

Site 2: Seine River (FR)
Sampling date: 19 September 1996
Fish assemblage: Abramis brama (4, TOLE, BENT, OMNI, POTA), Anguilla anguilla (22, TOLE, BENT, LONG), Gobio gobio (5, INTO, BENT, RHEO, LITH, INSE), Leuciscus cephalus (13, RHEO, LITH, OMNI, POTA), Perca fluviatilis (6, TOLE), Rutilus rutilus (159, TOLE, OMNI), Sander lucioperca (1), Scardinius erythrophthalmus (6, PHYT, OMNI)
Environmental conditions: RIVG (West.France), CAT (> 10 000 km2), ELE (8 m), GEO (Calcareous), FLOW (Permanent), LAK (No), TEMP (10·5 °C), SLOP (1·0 m km−1), DIS (615 km), WID (100 m), METH (Partial), TECH (Boat), FISH (1440 m2)
Index value: 0·16

Observed values: 0·000, 7·143, 3·761, 1·386, 1·099, 0·693, 1·099, 0·060, 0·000, 0·500
Predicted values: 5·481, 3·554, 0·961, 1·929, 2·069, 1·066, 0·894, 0·236, 0·184, 0·261

* Metrics expressed in ln(x + 1).


Twenty-nine of the 58 metrics were validated (Table 1). Regressions between observed and predicted values were highly significant (R2 30·7–60·8%). The intercepts and the slopes of the corresponding regression lines did not significantly differ from zero (Student's t-test; P-values from 0·071 to 0·954) and one (P-values from 0·055 to 0·969), respectively. Residual distributions were checked graphically to verify that they were symmetrical with only a few outliers.

Table 1.  List of the 29 metrics retained after the first validation procedure of the multiple linear or logistic models. Expected metric responses to human disturbances: positive response (+), negative response (–), positive or negative response (+/–). R2, determination coefficients of the regression of observed vs. predicted metric values using the independent reference data set (REF-MET). Mean metrics values (after standardization and transformation into probabilities; see text for detailed explanations) for REF-MET and the two data sets of weakly (WI-MET) and highly (HI-MET) disturbed sites. P-values of Student's t-test comparing REF-MET to WI-MET, and WI-MET to HI-MET
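| Metrics | Expected response | R2 | Mean value (REF-MET) | Mean value (WI-MET) | Mean value (HI-MET) | P-value (REF-WI) | P-value (WI-HI) |
|---|---|---|---|---|---|---|---|
| Ni-PISC | – | 0·402 | 0·459 | 0·363 | 0·387 | < 0·00001 | 0·95150 |
| Ni-INSE | – | 0·353 | 0·560 | 0·228 | 0·066 | < 0·00001 | < 0·00001 |
| Ni-OMNI | + | 0·407 | 0·512 | 0·417 | 0·365 | < 0·00001 | 0·00040 |
| Ni-EURY | + | 0·461 | 0·465 | 0·431 | 0·434 | 0·04870 | 0·56920 |
| Ni-LONG | – | 0·383 | 0·509 | 0·293 | 0·208 | < 0·00001 | < 0·00001 |
| Ni-LITH | – | 0·308 | 0·548 | 0·258 | 0·125 | < 0·00001 | < 0·00001 |
| Ni-PHYT | + | 0·357 | 0·530 | 0·470 | 0·423 | 0·00290 | 0·00310 |
| Ni-TOLE | + | 0·446 | 0·516 | 0·466 | 0·479 | 0·00630 | 0·80870 |
| Ns-PISC | – | 0·466 | 0·461 | 0·391 | 0·406 | 0·00020 | 0·84140 |
| RICH | +/– | 0·608 | 0·481 | 0·351 | 0·339 | < 0·00001 | 0·21240 |
| Ns-OMNI | + | 0·552 | 0·517 | 0·462 | 0·415 | 0·00530 | 0·00170 |
| Ns-BENT | – | 0·481 | 0·527 | 0·338 | 0·267 | < 0·00001 | < 0·00001 |
| Ns-EURY | + | 0·555 | 0·482 | 0·480 | 0·458 | 0·47120 | 0·07470 |
| Ns-RHEO | – | 0·423 | 0·512 | 0·243 | 0·101 | < 0·00001 | < 0·00001 |
| Ns-WATE | – | 0·550 | 0·500 | 0·417 | 0·397 | 0·00010 | 0·10560 |
| Ns-LONG | – | 0·353 | 0·504 | 0·275 | 0·212 | < 0·00001 | < 0·00001 |
| Ns-POTA | – | 0·439 | 0·499 | 0·404 | 0·308 | < 0·00001 | < 0·00001 |
| Ns-LITH | – | 0·398 | 0·512 | 0·230 | 0·070 | < 0·00001 | < 0·00001 |
| Ns-TOLE | + | 0·493 | 0·514 | 0·470 | 0·464 | 0·01980 | 0·34700 |
| %Ni-EURY | + | 0·423 | 0·473 | 0·537 | 0·543 | 0·99940 | 0·64010 |
| %Ni-RHEO | – | 0·326 | 0·529 | 0·426 | 0·296 | < 0·00001 | < 0·00001 |
| %Ni-LONG | – | 0·376 | 0·515 | 0·517 | 0·541 | 0·56200 | 0·99570 |
| %Ni-LITH | – | 0·402 | 0·528 | 0·386 | 0·244 | < 0·00001 | < 0·00001 |
| %Ni-TOLE | + | 0·478 | 0·519 | 0·502 | 0·418 | 0·20010 | < 0·00001 |
| %Ns-INSE | – | 0·429 | 0·510 | 0·325 | 0·213 | < 0·00001 | < 0·00001 |
| %Ns-EURY | + | 0·346 | 0·457 | 0·607 | 0·585 | 1·00000 | 0·08360 |
| %Ns-INTO | – | 0·453 | 0·519 | 0·314 | 0·186 | < 0·00001 | < 0·00001 |
| %Ns-LITH | – | 0·389 | 0·532 | 0·260 | 0·097 | < 0·00001 | < 0·00001 |
| %Ns-TOLE | + | 0·307 | 0·538 | 0·325 | 0·286 | < 0·00001 | 0·00650 |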

Residuals were calculated and standardized (deviation between the observed and the predicted value; Table 2) for each metric and for each of the three data sets REF-MET, WI-MET and HI-MET. We then transformed these residuals into probabilities in agreement with our previously defined response hypotheses (Table 1). Among the remaining metrics, 17 demonstrated a significant difference between REF-MET and WI-MET mean values (P-values from 0·003 to < 0·000001) and between WI-MET and HI-MET mean values (P-values from 0·007 to < 0·000001). At this step, three of the five habitat-type metrics (WATE, LIMN, EURY) and the PISC and RICH metrics were excluded. As expected, the REF-MET mean values were very close to 0·5 (from 0·499 to 0·560) for all the retained metrics. The responses to degradation were in agreement with our previous hypotheses: OMNI, TOLE and PHYT metric types increased, while the seven other metric types decreased. However, metric responses varied in intensity, with the weakest deviation for metric types demonstrating a positive response to human disturbance (OMNI, TOLE and PHYT).

Five pairs among the 17 remaining metrics were strongly correlated (Ni-OMNI and Ns-OMNI, Pearson's r = 0·85; Ni-LONG and Ns-LONG, r = 0·89; Ns-RHEO and Ns-LITH, r = 0·97; %Ni-RHEO and %Ni-LITH, r = 0·81; %Ns-INSE and %Ns-INTO, r = 0·89). We finally retained 10 metrics (for regression coefficients see web site, 19 Dec 2005): two trophic-based metrics (Ni-INSE, Ni-OMNI), two reproductive guild-based metrics (%Ns-LITH, Ni-PHYT), two habitat-based metrics (Ns-BENT, Ns-RHEO), two migration status-based metrics (Ns-POTA, Ni-LONG) and two tolerance status-based metrics (%Ns-INTO, %Ns-TOLE). Three of these metrics responded positively to human disturbance (Ni-OMNI, Ni-PHYT, %Ns-TOLE).

The metric values and final index score were computed for the three independent data sets (REF-IND, WI-IND, HI-IND) (Fig. 2). The mean value of the index (0·513) in the reference data set did not differ significantly from 0·5 (t = 1·834, P = 0·067). The mean index value (0·343) in the weakly disturbed sites (WI-IND) was significantly lower than that of the reference sites (t = 16·546, P < 0·000001) and significantly higher than that of the highly disturbed sites (0·235, t = 10·36, P < 0·000001).

Figure 2.

Distribution of the index scores for REF-CAL (calibration reference sites), REF-IND (independent reference sites, n= 304), WI-IND (weakly disturbed sites, n= 304) and HI-IND (highly disturbed sites, n= 304).

By examining the percentages of well-classified reference (REF-IND) and disturbed sites (WI-IND and HI-IND) as a function of each index score value, we demonstrated that the best cut-off level for assemblage ‘impairment’ was an index value of 0·423, with 81·4% of the reference and disturbed sites correctly classified.
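The cut-off search can be sketched as a scan over candidate index values, keeping the one that maximizes the share of correctly classified sites. The function name, the candidate grid and the convention that a reference site is well classified when its score is at or above the cut-off (and a disturbed site when below) are assumptions; the scores in the example are hypothetical.

```python
def best_cutoff(reference_scores, disturbed_scores, candidates=None):
    """Return (cut-off, fraction correctly classified) maximizing accuracy."""
    if candidates is None:
        candidates = [i / 1000 for i in range(1001)]  # 0.000, 0.001, ..., 1.000
    total = len(reference_scores) + len(disturbed_scores)
    best = None
    for c in candidates:
        correct = (sum(s >= c for s in reference_scores)
                   + sum(s < c for s in disturbed_scores))
        fraction = correct / total
        if best is None or fraction > best[1]:
            best = (c, fraction)  # ties keep the lowest cut-off
    return best


# Hypothetical index scores (illustration only).
ref = [0.50, 0.60, 0.45]
dist = [0.20, 0.30, 0.44]
cutoff, accuracy = best_cutoff(ref, dist)
print(cutoff, accuracy)
```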

In order to test the independence of the index from natural environmental variability, we performed a stepwise linear regression of the index values on all 10 environmental descriptors, using the independent reference data set (REF-IND). None of the descriptors was retained and the part of the index variability explained by these descriptors was not significant (R2 = 0·115, F-test = 1·505, P = 0·064). When considering each of the 10 environmental descriptors separately (Fig. 3), Tukey's multiple comparison test showed that the index values were invariant, whatever the value of the descriptor tested, except for the two highest elevation classes.

Figure 3.

Distribution of the index score for each of the 10 environmental variables (box-plot graphs) for the two independent validation data sets (REF-MET and REF-IND, n= 608).

To evaluate the ability of any impacted site to deviate from a reference condition (i.e. a mean index value of 0·5), we regressed the index values of all independent sites (REF-IND, WI-IND, HI-IND) on the global assessment impact variable (IMPACT). The relationship (Fig. 4) was highly significant (R2 = 0·4678, n = 912, P < 0·000001). The standard error was 0·1236 and the residuals were normally distributed (goodness-of-fit Kolmogorov–Smirnov test, P = 0·6013). Residuals at disturbance level 18 were highly dispersed because of the presence of an outlier (index value of 0·44) and because of the low number of sites (seven) at this level.

Figure 4.

Regression of the fish index values on the total human disturbance index (from four to 19). Mean index values (and their 95% confidence intervals) for each disturbance index class.


This study demonstrates that at a continental scale (Europe) it was possible to develop a multimetric fish-based index that (i) remained invariant for all unimpaired sites, whatever the natural environmental conditions, and (ii) gave a significant negative linear response to a gradient of physical and chemical human disturbances (i.e. the index can be used to assess the human-induced impact on the biotic condition in rivers). To our knowledge, this is the first time this kind of index has been successfully developed at such a large spatial scale. This is particularly noteworthy given the great variability of fish composition across different European biogeographical areas. Three features of our approach may have contributed to this. First, using functional metrics instead of taxonomic metrics reduced the sensitivity of the index to the variability in fish faunas between biogeographical regions. Secondly, including the main factors known to affect fish assemblage structure in the model reduced the influence of the geographical and upstream–downstream variability of these variables. And finally, including a biologically based regionalized variable added spatial flexibility to our approach.

The advantage of considering functional metrics can be illustrated by the case of lithophilic species. Among the 63 lithophilic species included in the LITH metric type at the European scale, only two species (brown trout and rainbow trout) are common to all 11 hydrological units. Only 23 species occur in six units, while 17 species are present in only one hydrological unit. Despite this variability in species composition between river units, LITH consistently decreased in response to human disturbance across Europe as a whole, demonstrating that the retained metrics are truly functional ones.

As demonstrated by our models, functional descriptors of fish communities respond to environmental variability in several ways. However, all 10 retained metrics responded significantly to river slope. Sampling methods also significantly affected eight of the 10 metrics, as demonstrated previously by Reynolds et al. (2003).

A tenet of our approach is that the variance of the metrics not accounted for by the environmental variables included in the models should be in large part the result of human disturbance. In fact, human disturbance explains only about 50% of total index variance, suggesting that the models may be improved by adding environmental variables not considered in this study. The unexplained variance in the index may also result from imprecision in fish sampling because of the inescapable differences in fish sampling methods used between different habitats and countries. Data on different types of river modifications were not always comparable between countries, so we retained only the four most reliable and complete disturbance variables. However, other types of disturbance have to be considered, such as fishing and introduced species, which can affect biotic interactions.

Moreover, riverine fish assemblages may vary greatly over time. To check this, the variance in index scores associated with the temporal variability of fish assemblages and/or sampling variability was evaluated by computing the standard deviation associated with 12 time series (eight to 36 sampling dates) distributed among four countries (Belgium, France, Lithuania, Sweden), at sites where there were no perceivable changes in human disturbance intensity during the period sampled. The mean of the per-site standard deviations, 0·06, can be compared with the value of 0·169 observed for reference sites. As the variance of the index within reference sites is the result of non-modelled spatial variability, as well as of sampling noise and temporal variability, this suggests that sampling noise accounts for at most 35% of index variability. Hence the most probable way of improving the power of the index would appear to be improving the modelling of its spatial variability (e.g. by considering new variables or other modelling approaches).
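The arithmetic behind the 35% figure can be checked in a few lines. This is a sketch under one stated assumption: that both 0·06 and 0·169 are standard deviations expressed on the index scale. The variable names are illustrative only.

```python
# Values quoted in the text; assumed here to be standard deviations
# on the same (index) scale.
temporal_sd = 0.06      # mean of the per-site standard deviations (time series)
reference_sd = 0.169    # value observed across reference sites

sd_ratio = temporal_sd / reference_sd    # about 0.355
variance_ratio = sd_ratio ** 2           # about 0.126

print(f"ratio of standard deviations: {sd_ratio:.3f}")
print(f"ratio of variances:           {variance_ratio:.3f}")
```

Under that assumption, the quoted upper bound of 35% matches the ratio of the two standard deviations; expressed as a ratio of variances, the same numbers give roughly 13%.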

Another predictive approach based on the modelling of assemblages within reference sites as a function of environmental variables, RIVPACS, has recently been applied to fish assemblages in New Zealand (Joy & Death 2002). Besides the way assemblages are modelled (classification vs. regression), the main difference from our approach is the use of taxonomic richness instead of several metrics. In a sense, our metric RICH may be considered a RIVPACS-type descriptor inserted within a more general, multi-metric approach. As a result we expect our approach to be more powerful and more flexible. Despite the fact that we modelled functional metrics instead of taxonomic metrics and that we included some important environmental variables, the spatial variability in metric values has not been fully accounted for by our models, as exemplified by the inclusion of the regionalized variable ‘river group’ in eight of the 10 retained models. The two metric types that are insensitive to the regional classification of fish fauna are omnivorous species (Ni-OMNI) and tolerant species (%Ns-TOLE), i.e. metrics composed of generalist species able to colonize a wide spectrum of river environments. Of the 29 characteristically omnivorous species, 21 are common to at least six of the 11 river groups.

In previous work, most authors have considered that assessment criteria must be region-specific (Angermeier, Smogor & Stauffer 2000). Ecologically defined regions are thereby considered to be relevant entities even if there is still considerable debate regarding how they should be defined (Omernik & Bailey 1997; Van Sickle & Hughes 2000). In our approach, we explicitly addressed this question by including in our models an environmental variable acting at regional scales (river groups), in accordance with current views emphasizing regional influences on biodiversity (Ricklefs & Schluter 1993). Our regional classification is generally in agreement with the classical biogeographical history of Europe (Banarescu 1992). Fish faunas from the Netherlands, northern Germany, Poland, Lithuania and northern Sweden (European North Plain river group) appear similar, reflecting their common recolonization after the glacial periods. The Baltic Sea was oligohaline and did not represent an ecological barrier to dispersal. The ‘river group’ variable participates additively in our models, meaning that the models are qualitatively consistent over Europe. For instance, a metric that is positively correlated with river slope in a given region will also be positively correlated with river slope in other European regions. Hence the models are partly transferable between regions but some regional adjustments are needed. Interregional variations may be linked to variation in taxonomy and phylogenetic history that in turn affect metric distribution within faunas, and also to spatial variation in environmental constraints not included in the models but which can also affect functional characteristics of fish assemblages (Smogor & Angermeier 2001). Clearly further work is needed to identify the factors underlying the regional component of our models.
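What an additive ‘river group’ term implies can be illustrated with a toy model in which each group shifts the predicted metric by a constant while the slope effect is shared. The coefficients, group names and the function `predict_metric` below are hypothetical and chosen only for illustration; they are not fitted values from the study, and the actual models include further predictors.

```python
# Hypothetical coefficients, for illustration only (NOT fitted values).
GROUP_OFFSET = {"group_a": 0.10, "group_b": -0.05}  # additive regional term
SLOPE_COEF = 0.30                                   # shared effect of river slope

def predict_metric(river_group, slope):
    """Additive model: regional offset plus a slope effect common to all groups."""
    return GROUP_OFFSET[river_group] + SLOPE_COEF * slope

for group in GROUP_OFFSET:
    low = predict_metric(group, 1.0)
    high = predict_metric(group, 2.0)
    # The change with slope is identical in every group; only the level differs.
    print(group, round(low, 2), round(high, 2), round(high - low, 2))
```

Because the regional term enters without an interaction, the direction and size of the slope effect are the same in every group, which is the sense in which such models are qualitatively consistent across regions while still needing a regional adjustment of level.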

The present approach is based on modelling fish assemblage structure in reference sites. Thus the definition and quality of the reference data set are key issues. We collected information from a very large number of sites distributed among 1843 European rivers, compiling a database unique in Europe. Although sites are not evenly distributed, they cover virtually all the environmental situations a European fish species can encounter within its area of distribution. However, natural large flood plain rivers were lacking in our reference data set, because of their rarity in western Europe, and the ability of our index to assess such environments needs to be improved in the future. We did not restrict our selection of reference sites to undisturbed sites but also considered slightly impacted sites. However, the mean index value for undisturbed sites is slightly but significantly higher than for slightly impacted sites (5 ≤ DISTURB ≤ 8) (0·518 against 0·502; t-test value = 2·349, P = 0·019), suggesting that distinguishing between these two groups in the future would increase the power of the index.
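A comparison of two group means of this kind can be sketched with a two-sample t statistic. The text does not state which variant of the t-test was used, so the pooled-variance (Student) form below is an assumption, and the function name `pooled_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * variance(sample_a)
                  + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / na + 1 / nb))

# Made-up index values for illustration only (NOT data from the study).
undisturbed = [0.55, 0.49, 0.53, 0.51, 0.52]
slightly_impacted = [0.50, 0.48, 0.52, 0.49, 0.51]
t_value = pooled_t(undisturbed, slightly_impacted)
```

A positive `t_value` indicates a higher mean in the first group; converting it to a P value requires the t distribution with `na + nb - 2` degrees of freedom, which is omitted here.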

In conclusion, the need to define sensitive biological measures of aquatic ecosystem integrity transferable to other catchments or regions at continental scales is now clear, especially in Europe with the implementation of the WFD. The solution we have used to meet this goal is to include in our reference condition modelling approach a more complete description of abiotic and biotic environmental variability at both local and regional scales. This sort of tool has never been developed before at a continental scale. Using this approach, our models and the final index are transferable, but only for sites and rivers belonging to the area considered in our previous calibration data set (REF-CAL). A generalization of our method to an even larger area is possible but only by collecting new data covering these new areas and by recalibrating our models. This methodology could also be improved by including better biological knowledge in the definition of the metric types, improving disturbance assessment and by using new statistical techniques. Lastly, the principles of our methodology could be applied to a wide variety of biological groups.


This work was funded by the European Commission under the Fifth Framework Programme (FAME project, contract number EVK1-CT-2001-00094). Site data and many insights for this study were provided by numerous biologists: W. Andrzejczak, J. Backx, R. Barbieri, C. Belpaire, R. Berg, R. Berrebi, J. Bochechas, J. Bocian, J. Böhmer, R. Borrough, J. Breine, T. Buijse, N. Caiola, F. Casals, J. de Leeuw, E. Degerman, T. Demol, U. Dussling, A. Economou, T. Ferreira, C. Frangez, I.G. Cowx, P.D. Gerard, S. Giakoumi, G. Grenouillet, R. Haberbosch, R. Haunschmid, A. Jagsch, K. Karras, V. Kesminas, P. Kestemont, M. Lapinska, J. Molis, A. Mühlberg, J. Oliveira, J.M. Olivier, J. Pettersson, Paul Quataert, Y. Reyols, H. Schmid, I. Simoens, A. Sostoa, A. Starkie, M. Stoumboudi, G. Verhaegen, T. Virbickas, E. Winter, H. Wirlöf, M. Zalewski, S. Zogaris. We are particularly grateful to Gertrud Haidvogl for the management of the FAME project. Two anonymous referees supplied helpful comments on the paper.