Evaluating fluctuating asymmetry in a stream-dwelling insect as an indicator of low-level thermal stress: a large-scale field experiment

Authors


Ian Hogg, Centre for Biodiversity and Ecology Research, Department of Biological Sciences, University of Waikato, Private Bag 3105, Hamilton, New Zealand (fax + 64 7838 4324; e-mail hogg@waikato.ac.nz).

Summary

  • 1. We examined fluctuating asymmetry (FA) among individuals of the stream-dwelling stonefly Nemoura trispinosa (Plecoptera: Nemouridae) to determine whether individuals exposed to an increase of 2–3·5 °C in water temperature would show greater FA than reference (control) individuals.
  • 2. Mature nymphs were collected from two adjacent channels (one experimental, one control) in a longitudinally divided stream both before and during a 2-year temperature manipulation. No consistent differences were found between the experimental and control channels for any measure of FA.
  • 3. Four additional reference sites were studied to estimate ‘natural’ variation in FA, and to assess any relationship between FA and population genetic structure (e.g. heterozygosity). Variation in FA among these sites was greater than that resulting from the manipulation. Allozyme analysis indicated low to moderate levels of genetic differentiation among sites (Wright’s FST mean = 0·06, maximum = 0·13) and there were negative correlations between FA and heterozygosity (Hexp).
  • 4. We surveyed experimental studies published since 1996 to evaluate the generality of our results. Of 44 comparisons examining an association between an experimental stress and FA, 19 (43·2%) failed to detect any relationship. This pattern did not depend on the taxonomic group or the number or type of traits, although some stressors appeared to be more likely to produce an increase in FA than others.
  • 5. We conclude that FA may be unreliable for detecting subtle biological changes resulting from small temperature shifts, and concur with others that the technique should be viewed with extreme caution as a monitoring tool.

Introduction

Evaluating the consequences of slow, gradual, environmental change on natural ecosystems (e.g. rising global temperatures, habitat fragmentation, urbanization) requires measures that are sufficiently sensitive to detect subtle changes that may occur within resident populations. Population densities of individual species, although usually the parameter of interest, are often too variable spatially and temporally to provide a reliable measure of gradual and/or non-lethal environmental perturbations (Allan 1982). Moreover, the sampling effort required to obtain robust estimates of population size or population trends can be prohibitive. Consequently, researchers have often used changes in life-history parameters of selected target species (e.g. life-span, body size at maturity) to evaluate population-level responses to environmental perturbations (Odum 1985; Rosenberg 1992). However, concerns have been raised over whether such ‘end-point’ life-history parameters can provide a sensitive early warning of a stressed ecosystem (Petersen & Petersen 1983; Karr 1991; Osenberg et al. 1994). By the time changes in life-history traits are manifest, the population may already have been severely, and perhaps irrevocably, impacted.

An alternative approach to assess non-lethal or subtle environmental impacts is to evaluate developmental changes at the level of the individual (Hare & Carter 1976; Simpson 1980; Warwick 1985). The tendency of bilaterally symmetrical organisms to show small, random, differences between the left and right sides of their bodies (fluctuating asymmetry; FA) has become a popular method to evaluate levels of stress experienced by individuals within natural ecosystems (Valentine, Soulé & Samollow 1972; Clarke 1993; Hardersen, Wratten & Frampton 1999). Theoretically, morphological characters on the two sides of a bilaterally symmetrical organism should develop identically because they are products of the same genome (Clarke, Brand & Whitten 1986). Hence, any deviation from that which is genetically programmed may indicate stress during development, which in turn may be related to environmental quality (Leary & Allendorf 1989). Measuring FA has become an attractive technique because it is relatively simple, inexpensive and provides a quantifiable measure that enables comparison between both spatially and temporally separated individuals.

Analyses of FA, however, are not without problems. For example, it is often not possible to identify, a priori, appropriate control sites, unaffected by the stressor of interest. More critically, comparisons of FA among populations assume that the individuals and populations used for comparisons are genetically similar and hence will respond similarly to the same stressor. This is an important, yet rarely tested, assumption and there is increasing evidence to suggest that it may often be incorrect. For example, several studies have demonstrated positive relationships between genetic heterozygosity and developmental stability (Kat 1982; Leary, Allendorf & Knudsen 1983) such that homozygous individuals exhibit greater FA than more heterozygous individuals when exposed to the same stress. In addition, recent evidence indicates that aquatic invertebrates may consist of genetically distinct populations each adapted to local conditions (Jackson & Resh 1992; Hogg et al. 1998). Accordingly, susceptibility to a particular stress, and propensity to deviate from symmetry, may differ among individuals and populations in different locations. An assessment of both heterozygosity and genetic differentiation among sites is therefore essential to any thorough analysis of FA.

In this study, we examined the suitability of measures of FA to detect subtle changes resulting from small temperature increases in a freshwater stream. Our target species, Nemoura trispinosa Claassen (Plecoptera: Nemouridae), is widely distributed in eastern North America (Harper 1973) and is associated with cool water habitats (< 20 °C). Thus, it may be sensitive to thermal warming of stream habitats. In a previous study (Hogg & Williams 1996), N. trispinosa exhibited decreased size at maturity and earlier onset of emergence in response to experimentally increased water temperatures in a natural stream system. Given that strong life-history responses to the temperature manipulation were apparent, this would seem to be an appropriate system to examine for concomitant alterations in developmental stability (i.e. increases in FA) in response to sublethal thermal warming. Furthermore, previous studies on fish (Leary, Allendorf & Knudsen 1992) and terrestrial insects (Clarke & McKenzie 1992; Imasheva et al. 1997) have demonstrated that even small shifts in environmental temperatures (e.g. 3 °C) during development can result in measurable increases in FA. Accordingly, we tested the hypothesis that individuals of N. trispinosa exposed to elevated temperatures during embryonic and nymphal development would show greater FA than reference (control) individuals.

Our approach was threefold. First, we analysed FA in individuals from two adjacent stream channels (one control, one warmed) before and during a 2-year experimental temperature manipulation. Secondly, we estimated ‘natural’ variation in levels of FA using individuals from unmanipulated reference sites. For these individuals, we analysed both phenotypic (FA) and genotypic (allozyme) variability to estimate levels of differentiation among sites and levels of heterozygosity within sites relative to differences in FA. Finally, we compiled a survey of recent studies that have tested experimentally for an association between FA and various stressors to better assess the utility of FA in detecting sublethal environmental impacts.

Materials and methods

STUDY SITES AND COLLECTION OF N. TRISPINOSA

Our primary study site, Valley Spring (site 1), is a small cold-water springbrook (first-order stream) located in southern Ontario, Canada (Fig. 1, inset). The stream was divided longitudinally at its source into one control and one experimental channel (sites 1C and 1E, respectively) as part of a large-scale field experiment designed to test the ecological effects of increased water temperatures on communities and populations of stream invertebrates. Complete details of this field site are provided in Hogg & Williams (1996). Briefly, water temperatures in the experimental channel were increased by 2–3·5 °C relative to the control channel over a 2-year period (June 1991–May 1993). The manipulation was preceded by one pre-manipulation year (June 1990–May 1991), thus providing a ‘before and after’ comparison of control and experimental (impact) channels (BACI design; sensu Stewart-Oaten, Murdoch & Parker 1986). However, we caution that our design lacked true replication, a problem common to many large-scale field experiments (Eberhardt & Thomas 1991). Accordingly, we cannot state categorically that any observed differences were the result of our manipulation.
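The logic of a ‘before and after’, control vs. impact comparison can be illustrated with a minimal sketch. The function name and the FA scores below are hypothetical and are not values from this study; the sketch simply shows that the quantity of interest is the change in the impact channel over and above the change in the control channel.

```python
def baci_contrast(control_before, control_after, impact_before, impact_after):
    """Change in the impact channel minus change in the control channel."""
    return (impact_after - impact_before) - (control_after - control_before)

# Hypothetical mean FA scores, for illustration only (not values from this study)
contrast = baci_contrast(control_before=1.20, control_after=1.30,
                         impact_before=1.10, impact_after=1.25)
print(f"BACI contrast: {contrast:.2f}")
```

A contrast near zero indicates that both channels changed by a similar amount between periods. Without replication, as noted above, a non-zero contrast cannot be attributed unambiguously to the manipulation.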

Figure 1.

Location of study sites within southern Ontario, Canada (inset).

Nymphs of N. trispinosa were collected from both control and experimental channels in May of each of the 3 study years. Nemoura trispinosa is univoltine (one generation per year) in Valley Spring and shows continuous development throughout the year, with adult emergence occurring from early May to late June (Williams & Hogg 1988). We sampled in early May to correspond with the timing of maximum nymphal size for N. trispinosa (Williams, Williams & Hogg 1995). We reasoned that any effects of the manipulation detectable at this stage in the life cycle would be most critical to the overall fitness of the organism. In an effort to minimize handling times (thus enabling genetic analyses below), nymphs were not sexed. However, sex ratios of N. trispinosa were similar in both experimental and control channels of Valley Spring (chi-square test P = 0·88, following Yates’ correction; Hogg & Williams 1996). Accordingly, differences in sex ratios were unlikely to have biased our analyses.

To evaluate variation in FA among N. trispinosa in the absence of any designated manipulation, and to evaluate the relationship between population genetic structure and FA, we collected a further 20–30 nymphs from both experimental and control channels as well as four additional sites (Fig. 1, sites 2–5). Additional sites were of similar habitats (cold-water springs) varying in distance from Valley Spring (1, 40, 150 and 250 km, respectively). Collections were made at similar distances from the spring’s sources (within 10 m), and thus thermal regimes (including annual and diurnal ranges) were expected to be similar (for detailed temperature records at site 1, see Hogg & Williams 1996). Qualitative estimates of temperature suggested that the more northerly locations (sites 4 and 5) may have had slightly (e.g. < 1 °C) cooler mean annual temperatures. Individuals from sites 1–4 were retained for the genetic analysis (see below).

ANALYSIS OF FLUCTUATING ASYMMETRY

To assess FA, we used the numbers of spines located on the anterior margin of the third tarsal segment of each leg (Fig. 2). Selection of this feature was based on: (i) previous studies that had successfully used similar (meristic) measures to demonstrate a relationship between FA and environmental conditions (Clarke, Brand & Whitten 1986; Clarke & McKenzie 1992; Leary, Allendorf & Knudsen 1992); and (ii) our own preliminary investigation showing that the spines were easily counted (range 12–22), variable in number both within and between individuals, and robust, with easily discernible follicles (Fig. 2), such that any damage due to handling would not bias our measures. Furthermore, removal of legs from the frozen animals would permit immediate processing of the specimen for allozyme analysis (see below). We focused solely on meristic measures for our comparison of control and experimental channels to avoid problems of measurement error associated with metric measures (Clarke & McKenzie 1992). However, we caution that our reliance on a single type of trait (i.e. leg spines on each leg) was intended only to determine whether these characters could be used as simple biological indicators of stress; we did not intend our analysis to provide a thorough evaluation of all possible measures of FA (Leung & Forbes 1997).

Figure 2.

Third tarsal segment (front, left leg) of nymphal Nemoura trispinosa showing spines used as a measure of fluctuating asymmetry.

To evaluate more fully any relationship between genetic structure of populations and FA, we also included a second measure, the diagonal length of the third tarsal segment of each leg. Leg pairs were dissected from between 20 and 50 animals for each treatment/location, and each set was permanently mounted on a slide. Spines were counted with the aid of a compound microscope at ×250 magnification, and leg lengths (sites 1–4) were taken with the aid of an image analyser attached to an inverted microscope (Axiovert 100; Carl Zeiss, Oberkochen, Germany) and calibrated at ×100 magnification.

GENETIC VARIABILITY VS. FLUCTUATING ASYMMETRY

In May 1993, 20–30 nymphs from sites 1–4 (above) were kept frozen at −74 °C for allozyme analysis. All individuals were of the same life-history stage/size range (head width > 0·95 mm) and were morphologically confirmed to be the same species (Stewart & Stark 1993). Following removal of legs for the asymmetry analyses, frozen carcasses were homogenized in 5 µl of Tris–glycine buffer and screened for polymorphic (variable) loci using cellulose acetate electrophoresis (Hebert & Beaton 1989) for the following enzymes: amylase (AMY), aldehyde oxidase (AO; EC 1.2.3.1), arginine kinase (ARK; EC 2.7.3.3), glucose-6-phosphate isomerase (GPI; EC 5.3.1.9), glyceraldehyde-3-phosphate dehydrogenase (G3PDH; EC 1.2.1.12), hexokinase (HEX; EC 2.7.1.1), lactate dehydrogenase (LDH; EC 1.1.1.27), mannose phosphate isomerase (MPI; EC 5.3.1.8) and phosphoglucomutase (PGM; EC 2.7.5.1). Following initial screening, four of the enzyme loci (ARK-1, GPI-1, MPI-1, PGM-1) were found to be polymorphic (showed allelic variation) at one or more sites and were selected for further analysis. For each site, we calculated the frequencies of alleles for each locus, as well as the proportion of heterozygotes (Hexp), based on Hardy–Weinberg equilibrium. To assess genetic differentiation among sites we used Wright’s (1978) FST and Nei’s (1978) unbiased genetic distance. All calculations were performed using BIOSYS-1 (version 1.7; Swofford & Selander 1981).
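As an illustration of the two genetic summaries used here, the sketch below computes expected heterozygosity under Hardy–Weinberg equilibrium and a simple FST for one locus. It uses the textbook forms Hexp = 1 − Σp² and FST = (HT − HS)/HT with equal weighting of sites; BIOSYS-1 may apply sample-size corrections that are not reproduced here, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Hardy-Weinberg expected heterozygosity for one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in allele_freqs)

def fst(site_allele_freqs):
    """Simple F_ST for one locus: (H_T - H_S) / H_T, sites weighted equally."""
    n_sites = len(site_allele_freqs)
    # H_S: mean within-site expected heterozygosity
    h_s = sum(expected_heterozygosity(f) for f in site_allele_freqs) / n_sites
    # H_T: expected heterozygosity computed from the mean allele frequencies
    mean_freqs = [sum(col) / n_sites for col in zip(*site_allele_freqs)]
    h_t = expected_heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies at one locus for two sites
sites = [[0.5, 0.5], [0.8, 0.2]]
h_per_site = [expected_heterozygosity(f) for f in sites]
fst_value = fst(sites)
```

In practice, values would be averaged over the polymorphic loci, and unequal sample sizes among sites would need to be taken into account.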

STATISTICAL ANALYSES

For each measure of FA (i.e. front, middle and back leg pairs) we calculated for each individual: (i) the signed difference between left and right (l − r); (ii) the absolute difference between left and right (|l − r|); and (iii) mean character size [(l + r)/2]. In some cases, it was not possible to assess all measures on a single specimen (e.g. missing legs). Accordingly, sample sizes varied slightly among the individual measures. There was no meaningful correlation between mean character size [(l + r)/2] and measures of FA (r² < 0·01, P = 0·5); accordingly, measures were not scaled for size.
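The three per-individual quantities can be written out as a short sketch. The function name and the spine counts are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def fa_indices(left, right):
    """Return (signed difference, absolute difference, mean character size)."""
    signed = left - right
    return signed, abs(signed), (left + right) / 2

# Hypothetical (left, right) spine counts for the front legs of four nymphs
counts = [(15, 14), (17, 17), (13, 16), (18, 17)]
indices = [fa_indices(l, r) for l, r in counts]

# Mean absolute asymmetry for this leg pair, i.e. the sum of absolute
# differences divided by the number of individuals
mean_abs_fa = mean(absolute for _, absolute, _ in indices)
```

Individuals with a missing leg would simply be omitted from the list for that leg pair, which is why sample sizes can differ among measures.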

To evaluate the possibility of anti-symmetry (i.e. a tendency away from bilateral symmetry), a Wilk–Shapiro statistic was used to test for normality. To test for any directional asymmetry (i.e. biased to one side), a paired t-test was used to determine whether the mean of signed differences between left and right sides was significantly different (P < 0·05) from zero. To test for differences between treatments or sites we used both parametric and non-parametric (Kruskal–Wallis) ANOVA applied to: (i) the absolute differences (|l − r|) for each treatment/site (Palmer & Strobeck 1986; Leung, Forbes & Houle 2000); and (ii) a composite fluctuating asymmetry (CFA) score derived by summing FA characters for each individual prior to analysis (Leung, Forbes & Houle 2000).
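A minimal sketch of two of these steps follows, assuming hypothetical data. A paired t-test of left vs. right is equivalent to a one-sample t-test of the signed differences against zero, so only the differences are needed; the resulting statistic would be compared with a t distribution on n − 1 degrees of freedom. The composite score is the sum of absolute asymmetries across traits for one individual. Function names are ours, not those of the software used in the study.

```python
from math import sqrt
from statistics import mean, stdev

def directional_asymmetry_t(signed_diffs):
    """t statistic for H0: mean signed (left - right) difference equals zero."""
    n = len(signed_diffs)
    return mean(signed_diffs) / (stdev(signed_diffs) / sqrt(n))

def composite_fa(abs_diffs_by_trait):
    """Composite FA: sum of absolute asymmetries over traits for one individual."""
    return sum(abs_diffs_by_trait)

# Hypothetical signed differences in spine counts for eight individuals
signed = [1, 0, -3, 1, -1, 2, 0, -1]
t_stat = directional_asymmetry_t(signed)

# Hypothetical absolute asymmetries for front, middle and back legs of one nymph
cfa = composite_fa([1, 0, 2])
```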

Correlation analyses (Pearson’s) were used to test for relationships between: (i) each of the asymmetry measures (i.e. concordance); and (ii) FA and mean levels of heterozygosity (Hexp) for each site using both absolute values (|l − r|) and Kruskal–Wallis ranks. All tests were performed using Statistix for Windows (Version 2, Analytical Software, Tallahassee, FL).
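The site-level correlation between heterozygosity and FA can be sketched as follows. The values are hypothetical (one pair per site) and are chosen only to show a negative relationship; they are not the data analysed here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical site means: expected heterozygosity and mean absolute FA
hexp = [0.10, 0.15, 0.20, 0.25, 0.30]
fa = [1.6, 1.5, 1.2, 1.1, 0.9]
r = pearson_r(hexp, fa)
```

With only a handful of sites, even a strong coefficient has wide uncertainty, so significance should be judged against the small number of degrees of freedom.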

SURVEY OF THE LITERATURE

To assess the broader utility of FA as a method to detect sublethal environmental impacts, we surveyed the published literature for all experimental studies published after 1996 (i.e. the last 5 years). We used this time period to supplement Leung & Forbes’ (1996) compilation of studies up to 1996. We did not undertake a formal meta-analysis because our goal was not to determine whether an overall relationship between FA and stress was detectable statistically (Leung & Forbes 1996; Leung, Forbes & Houle 2000). Rather, our goal was to assess the frequency with which individual studies were successful in detecting an association between FA and stress. We felt this was a more appropriate analysis for field workers and managers who might wish to evaluate the use of FA as an early indicator of environmental conditions.

We searched Biological Abstracts (BIOSIS Inc., Philadelphia, PA) and Current Contents (Institute for Scientific Information, Philadelphia, PA) for articles published between 1997 and 2001 using combinations of the key words ‘stress’, ‘environmental stress’, ‘fluctuating asymmetry’ and ‘developmental instability’. For each study, we recorded: (i) the taxon; (ii) whether an association between stress and FA was detected; (iii) the number and type (metric or meristic) of traits analysed; (iv) the type of stressor; (v) the design of the experiment; (vi) whether FA was examined separately for each trait or as a composite measure; and (vii) whether the authors had examined the genetic structure of the study population. A few studies examined more than one source of stress; we considered the results for each separately.

Results

CONTROL VS. EXPERIMENTAL CHANNELS

For a total of 641 FA measures (individual front, middle and back leg pair spine counts) taken from the control and experimental channels in Valley Spring (Fig. 3), signed left vs. right differences were normally distributed (Wilk–Shapiro) and differences between left vs. right were not significantly different from zero (paired t-test, P > 0·05 in all cases, corrected for multiple comparisons). For those individuals having all three leg pairs intact (n = 172), there was no concordance between the signed (l − r) or absolute differences (|l − r|) for one measure vs. another (r² < 0·01, P > 0·05 in all cases). Accordingly, we considered all measures to be independent of each other.

Figure 3.

The frequency of signed left minus right differences in tarsal spine counts taken from individual leg pairs of Nemoura trispinosa in Valley Spring, Canada, (n = 641).

For individual N. trispinosa, differences in FA between control and experimental channels of Valley Spring varied depending on the character chosen and study year (Fig. 4). However, none was significant (Table 1) or showed consistent trends in response to the manipulation. Combining the character measures for each individual prior to analysis (Leary & Allendorf 1989; Leung, Forbes & Houle 2000) also provided no obvious trend in FA between the control and experimental channels (Fig. 5).

Figure 4.

Fluctuating asymmetry scores (Σ|li − ri|/n ± 1 SE) for spines on the front, middle and back leg pairs of Nemoura trispinosa collected from control (ambient; open bars) and experimental (ambient + 2–3·5 °C; stippled bars) channels of Valley Spring, Canada, before (year 1) and during (years 2 and 3) the temperature manipulation. Sample sizes are provided above the standard error bars.

Table 1. ANOVA of mean fluctuating asymmetry (FA) values for nymphal Nemoura trispinosa between control and experimental channels of Valley Spring, Canada (site 1), before (year 1) and after (years 2 and 3) the temperature manipulation (see text for details), for each of three traits (tarsal spines on front, middle and back legs), and a composite FA score (CFA; values for front, middle and back legs were summed for individuals having all leg pairs intact prior to analysis)

Year     Trait         d.f.    MS      F       P
Year 1   Front legs    1,81    1·11    0·60    0·44
         Middle legs   1,85    3·72    2·80    0·10
         Back legs     1,87    0·01    0·00    0·95
         CFA           1,64    0·00    0·00    0·98
Year 2   Front legs    1,44    0·23    0·27    0·61
         Middle legs   1,45    2·58    2·86    0·10
         Back legs     1,42    3·61    2·05    0·16
         CFA           1,39    0·05    0·01    0·91
Year 3   Front legs    1,81    1·96    0·86    0·36
         Middle legs   1,88    5·08    2·29    0·13
         Back legs     1,87    4·01    0·95    0·33
         CFA           1,63    8·40    1·08    0·30

Figure 5.

Composite fluctuating asymmetry (CFA) scores (absolute asymmetry values |l − r| for spines on front, middle and back leg pairs combined for each individual) for Nemoura trispinosa collected from control (open bars) and experimental (stippled bars) channels of Valley Spring, Canada, before (year 1) and during (years 2 and 3) the temperature manipulation. Sample sizes are provided above standard error bars.

REFERENCE SITES

Individuals collected from our ‘reference’ sites showed considerable variation in FA among locations (Fig. 6). The highest levels of FA were found in sites nearest Toronto (a major urban centre), while generally lower levels of FA were observed at the more northerly sites (numbers 4 and 5; Fig. 6). Differences were marginally significant for both the back leg FA and the composite FA (Table 2).

Figure 6.

Composite fluctuating asymmetry (CFA) scores (absolute asymmetry values |l − r| for spines on front, middle and back legs combined for each individual) for Nemoura trispinosa collected from six sites in southern Ontario, Canada, in May 1993 (year 3 of the study). Site numbers correspond to Fig. 1; 1C and 1E refer to control and experimental channels of Valley Spring, Canada, respectively.

Table 2. ANOVA of mean fluctuating asymmetry (FA) values for nymphal Nemoura trispinosa collected from six reference sites (see text for details), for each of three traits (tarsal spines on front, middle and back legs), and a composite FA score (CFA; values for front, middle and back legs were summed for individuals having all leg pairs intact prior to analysis)

Trait         d.f.     MS       F       P
Front legs    5,118    1·75     0·75    0·59
Middle legs   5,120    2·58     1·60    0·16
Back legs     5,119    9·64     2·38    0·04
CFA           5,93     22·33    2·19    0·06

Results of the allozyme analysis indicated ‘weak to moderate’ (sensu Wright 1978) levels of genetic differentiation for sites within 150 km (Wright’s FST mean = 0·07, maximum = 0·13). Values for Nei’s (1978) unbiased genetic distance were low (< 0·05 in all cases) but generally increased with increasing geographical distance between sites (Fig. 7).

Figure 7.

Dendrogram of Nei’s (1978) unbiased genetic distances for Nemoura trispinosa collected from five sites in southern Ontario, Canada, in May 1993.

For each measure of FA there was a negative correlation between mean levels of heterozygosity (Hexp) at a site and levels of FA [r2 range: (−)0·20 − (−)0·99; Table 3]. Because of the small sample size for number of sites (n = 6), many of these correlations were not statistically significant after Bonferroni correction for multiple tests. However, in every comparison correlation coefficients were negative, a pattern that would not be expected by chance alone.

Table 3. Matrix of correlations (r-values) between expected heterozygosity (Hexp) for five populations of Nemoura trispinosa in southern Ontario, Canada, and levels of fluctuating asymmetry in the number of tarsal spines and lengths of tarsal segments for each leg pair. Two methods of calculating fluctuating asymmetry (FA) were used, mean absolute differences (Σ|li − ri|/n) and mean Kruskal–Wallis (K−W) ranks. Differences significant at the P < 0·10, 0·05 and 0·01 levels (adjusted for multiple comparisons) are indicated with asterisks (*, **, ***, respectively)

FA calculation      Front legs           Middle legs           Back legs            Total
                    Spines    Length     Spines    Length      Spines    Length     Spines    Length
Σ|li − ri|/n   r    −0·76     −0·53      −0·81*    −0·94**     −0·64     −0·90**    −0·80     −0·64
K−W ranks      r    −0·82*    −0·32      −0·88**   −0·99***    −0·66     −0·88**    −0·90**   −0·73

LITERATURE SURVEY

We found 39 studies published since 1996 that used experimental methods to test for an association between FA and stress (Table 4). A few of these studies examined > 1 stressor, yielding a total of 44 comparisons (see the Appendix). Of these, 25 (56·8%) found an association of FA with stress, while 19 (43·2%) did not (Table 4). The taxonomic group did not influence whether an association between FA and stress was detected (G-test with Williams’ correction, Gw = 1·04, d.f. = 2, P = 0·59, pooling taxa into three categories: terrestrial vertebrates, invertebrates and plants). However, only 10 of 22 (45·5%) experimental studies on insects recorded a significant effect of stress on FA (Table 4).

Table 4. Summary of recent (post-1996) experimental studies examining the relationship between fluctuating asymmetry (FA) and stress. Empty cells in the original table are shown as –

                                      Association of FA with stress
Group        Stressor                 Yes    No     Total
Mammal       Habitat change           1      –      1
             Toxic                    –      1      1
Bird         Competition              1      –      1
             Food                     –      1      1
             Parasites/pathogens      2      –      2
Lizard       Toxic                    –      1      1
             Temperature              1      –      1
Fish         Competition              1      –      1
             Toxic                    1      1      2
             Temperature              1      –      1
Insect       Competition              1      1      2
             Food                     2      6      8
             Genetic                  –      1      1
             Parasites/pathogens      1      –      1
             Toxic                    1      3      4
             Temperature              5      1      6
Crustacean   Temperature              1      –      1
Plant        Competition              1      –      1
             Genetic                  1      –      1
             Parasites/pathogens      –      1      1
             Predation                2      1      3
             Toxic                    2      1      3
Grand total                           25     19     44

Some sources of stress were more likely to influence FA (Table 4). Eight of nine (88·9%) studies using temperature and nine of 12 (75%) studies using biological ‘antagonists’ (parasites, pathogens, competitors) detected relationships with FA, whereas only two of nine (22·2%) studies using food and four of 11 (36·4%) studies using toxic stress detected an association with FA (Gw = 10·51, d.f. = 3, P = 0·02). There was no effect of the type of trait (metric, meristic or both, Gw = 1·16, d.f. = 2, P = 0·56) or the number of traits on the likelihood that an association of stress with FA was found (mean number of traits for studies detecting an association: 3·60 ± 0·57 SE; mean number of traits for studies with no association: 4·32 ± 0·82, t-test with unequal variances, t = 0·71, d.f. = 34, P > 0·47). Of the 44 comparisons reported, only 25 (56·8%) examined or controlled some aspect of the genetic structure of the populations under study (see the Appendix).
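For readers unfamiliar with the G-test, the sketch below computes G for a contingency table and applies Williams’ correction in its usual textbook form. The counts are our reading of the four stressor categories named above (association detected, not detected); the authors’ exact pooling and software settings are not fully described, so the result need not reproduce the reported Gw exactly.

```python
from math import log

def g_test_williams(table):
    """Return (Williams-corrected G, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with zero counts contribute nothing
                expected = row_totals[i] * col_totals[j] / n
                g += observed * log(observed / expected)
    g *= 2
    n_rows, n_cols = len(row_totals), len(col_totals)
    df = (n_rows - 1) * (n_cols - 1)
    # Williams' correction factor q; the corrected statistic is G / q
    q = 1 + ((n * sum(1 / t for t in row_totals) - 1)
             * (n * sum(1 / t for t in col_totals) - 1)) / (6 * n * df)
    return g / q, df

# Rows: temperature, biological antagonists, food, toxic
# Columns: association detected, no association detected
stressor_counts = [[8, 1], [9, 3], [2, 7], [4, 7]]
g_w, df = g_test_williams(stressor_counts)
```

The corrected statistic is compared with a chi-square distribution on the returned degrees of freedom.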

Discussion

The application of individual-based assessment techniques such as measurements of FA to detect ecosystem stress has become increasingly popular (Leary & Allendorf 1989). Indeed, Clarke (1993) has suggested that when an analysis of FA indicates no biological stress, it would be reasonable to assume that the population (and hence the ecosystem) is not being adversely affected. In stark contrast, our analysis of N. trispinosa failed to consistently detect changes in FA in response to the temperature manipulation, despite earlier studies indicating that this species was affected by the thermal manipulation (Hogg & Williams 1996; see below). No obvious or interpretable response of FA to the temperature manipulation was derived from any single character measure, nor were consistent patterns observed among characters or years (Fig. 4). Combining measures, although likely to have provided a more reliable assessment (Leary & Allendorf 1989; Leung, Forbes & Houle 2000), still revealed no clear pattern (Fig. 5).

Why were we unable to consistently detect responses in FA to the temperature manipulation? One explanation may be that the temperature increase (2–3·5 °C) was not highly stressful to N. trispinosa populations. Soulé (1982) has questioned whether small changes in environmental temperature are stressful to natural populations, given the wide range of temperatures to which organisms are exposed during development. However, despite the wide range in annual temperatures characteristic of many aquatic systems, most follow temporally predictable patterns (Vannote & Sweeney 1980), and hence deviation from the predictable may be stressful. Our sustained (2 years) manipulation of temperature resulted in measurable and consistent (both years, both males and females) responses in life-history end-points indicative of reduced fitness (e.g. reduced body size/body mass at maturity; Hogg & Williams 1996), demonstrating that the manipulation had a negative effect on N. trispinosa.

A second possibility is that our reliance on a small number of traits (i.e. leg spines on front, middle and back legs) may have lacked sufficient power to detect a change in FA. However, Clarke, Brand & Whitten (1986), Clarke & McKenzie (1992) and Leary, Allendorf & Knudsen (1992) successfully used similar (meristic) measures to demonstrate a relationship between FA and environmental conditions. Moreover, we were able to detect significant variation in FA among sites (Table 2) using these characters: if an effect of our temperature manipulation had been apparent, we should have been able to detect it. Further, our survey of a large number of experimental studies indicated that the likelihood of finding an association between FA and stress did not depend on the number of traits examined (Table 4).

A third explanation for our results may be that both the control and experimental populations already exhibited high levels of FA as a consequence of pre-existing conditions at the site (e.g. water temperatures, groundwater contamination) and hence we may have been unable to separate the effects of the manipulation from background levels. There is some evidence of elevated levels of FA in Valley Spring (sites 1C, 1E) compared with the more northern sites (i.e. sites 4 and 5; Fig. 6) although the effect this might have on the analysis is uncertain. In any event, these results question the usefulness of FA for discerning concomitant low-level environmental perturbations. Of the previous studies that have successfully demonstrated a link between FA and environmental stress, many have dealt with the effects of ‘toxic’ compounds such as industrial effluent, pesticides, and heavy metals (Valentine, Soulé & Samollow 1972; Ames, Felley & Smith 1979; Clarke 1993; Hogg, de Lafontaine & Eadie 1997; however, see also Leary, Allendorf & Knudsen 1992).

Finally, and perhaps more problematic, is that overall levels of developmental stability and hence FA may have been influenced by differences in the genetic structure of populations at the different sites. We found consistent negative correlations between mean levels of heterozygosity at a site (Hexp) and FA (Table 3), in accord with previous studies indicating that developmental stability may be lower when genomic heterozygosity is reduced (Eanes 1978; Kat 1982; Leary, Allendorf & Knudsen 1983). Moreover, while differences in allelic composition among our sites were generally low, genetic distance increased with increasing geographical distance (Fig. 7). The extent to which the limited number of loci used in our study is likely to be representative of the entire genome is uncertain. However, other benthic taxa also show considerable genetic differences among sites, even when considered over smaller geographical distances (Gooch & Hetrick 1979; Hogg et al. 1998), and some differentiation among populations is to be expected for most lotic species over their geographical ranges (Hogg, Eadie & de Lafontaine 1998).

These results have important implications for interpreting patterns of FA relative to the impact of environmental stress. For example, populations with high levels of genetic heterozygosity may be more ‘buffered’ from the effects of an environmental perturbation and so would exhibit a reduced response as measured by FA. Conversely, populations with low levels of heterozygosity may be more likely to exhibit increased levels of FA in response to the stress; in the extreme, these populations may already exhibit such high levels of FA that any additional effect of a new perturbation would be obscured. In any case, it is clear that evaluation and interpretation of differences in FA among sites (or among control and experimental populations) cannot be undertaken without a thorough evaluation of the genetic structure of the target populations. Interestingly, of the previous studies that have evaluated FA as a measure of environmental stress, only slightly more than half (57%) have assessed simultaneously the genetic structure of the study populations (see the Appendix).

The large variation in FA that we observed among the spatially separated reference sites also suggests that great care must be taken in selecting appropriate control sites. For example, had we chosen sites 4 or 5 as controls in our study (Fig. 6), we might well have concluded that the temperature manipulation had a significant effect on the magnitude of FA. Accordingly, the use of ‘before and after’ and/or paired-systems sampling approaches is essential for interpreting FA data.

Our review of recent experimental studies indicates that the results of our study are not unique. Of the 44 comparisons published since 1996 that used an experimental approach to test for an association of FA with stress, almost half (43%) failed to detect any relationship, despite carefully controlled conditions. This result did not depend on the taxonomic group, although it is interesting to note that studies of insects were less likely to detect an association between stress and FA (45·5% of studies) than studies of any other taxon (Table 4). The type of stressor did, however, influence whether an association of stress and FA was found, suggesting that FA may be a good indicator for some types of stress but not for others. However, this does not explain the results of our present study, as temperature stress reliably induced FA in almost all studies (eight of nine).
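
The proportions quoted in this paragraph can be checked with simple arithmetic. The sketch below is illustrative only: it uses the counts stated in the text (19 of 44 comparisons, as given in the Summary, and eight of nine temperature studies) rather than re-deriving them from the Appendix, and the function and variable names are hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts as reported in the text (assumed here, not recounted from the Appendix).
total_comparisons = 44             # experimental comparisons surveyed
no_association = 19                # comparisons that found no stress-FA relationship
temperature_total = 9              # comparisons using temperature as the stressor
temperature_with_association = 8   # of those, comparisons reporting an association

pct_no_association = percent(no_association, total_comparisons)
pct_temperature = percent(temperature_with_association, temperature_total)

print(pct_no_association)  # share of comparisons with no detectable relationship
print(pct_temperature)     # share of temperature comparisons reporting an association
```

The first value corresponds to the "almost half" figure in the text; the second expresses "eight of nine" as a percentage.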

We concur with Leary & Allendorf (1989) that FA is not likely to be useful for detecting all environmental perturbations, and that studies such as ours are necessary to determine which species and characters may be useful indicators of potential environmental stress. Specifically, our results suggest that FA may be unreliable in detecting biological change resulting from low-level temperature shifts for N. trispinosa; we caution against relying on this technique for assessing subtle temperature shifts in other species or systems (e.g. to assess impacts of global warming).

Given the range of environmental pressures now being experienced by many aquatic and terrestrial ecosystems, there is clearly a need to develop rapid, reliable and efficient methods to assess potential environmental stresses before large-scale demographic or fitness consequences are evident. Measures of developmental stability may offer some promise, but clearly require further evaluation in the context of the genetic structure and developmental architecture of the species of interest before widespread use of these indices is encouraged.

Acknowledgements

We are grateful to S.J. Ormerod and two anonymous referees for their thoughtful and constructive comments. We thank N.C. Collins and W.G. Sprules for helpful comments on the study and an earlier draft of the manuscript, S. Butt, P. Mathur, A. Quin and M. Stutman for assistance in processing the samples, and A. Tavares for providing the drawing for Fig. 2. Access to private property was kindly provided by H. Atwood and S. Barker. Funding was provided through Natural Sciences and Engineering Research Council (NSERC) of Canada operating grants to D.D. Williams and J.M. Eadie. Additional support was provided through an Ontario Graduate Scholarship, a University of Toronto Open Fellowship, a Visiting Fellowship in a Canadian Government Laboratory to I.D. Hogg, and an NSERC summer scholarship to S.A. Butt. Additional logistic support was provided by the Centre Saint-Laurent, Environment Canada.

Appendix

Appendix table. Summary of recent (post-1996) experimental studies examining the association between a specified stress and levels of FA. I = individual; C = composite

| Organism | Stressor(s) | Number/type of traits | Experimental design | Analyses (individual/composite) | Genetic analysis | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| No association | | | | | | |
| Insect (hemipteran) | Food | 16 metric lengths/shapes of legs/spines, genitalia | Laboratory replicated | I, C | Breeding programme | Arnqvist & Thornhill 1998 |
| Insect (dipteran) | Food | 3 metric eyestalk lengths, wing lengths | Laboratory replicated | I | No | David et al. 1998 |
| Insect (coleopteran) | Diet (*association, but not consistent) | 1 metric pronotum | Laboratory replicated | I | No | Preziosi et al. 1999 |
| Insect (dipteran) | Food | 3 metric eyestalk lengths, wing lengths | Laboratory replicated | I, C | Breeding programme | Bjorksten et al. 2000 |
| Insect (dipteran) | Food, ethanol, cold shock (*association for one trait: orbital bristles) | 3 metric wings; 2 meristic stenopleural and orbital bristles | Laboratory repeated-measures | I, C | Yes | Woods et al. 1999 |
| Insect (dipteran) | Rearing density | 1 metric wings | Laboratory replicated | I | Yes | Chapman & Goulson 2000 |
| Insect (dipteran) | Food, inbreeding | 4 metric wing lengths, tibial lengths | Laboratory repeated-measures | I | Yes | Hosken, Blanckenhorn & Ward 2000 |
| Insect (dipteran) | Pesticide (ivermectin) | 5 metric wings | Laboratory replicated | I | No | Floate & Fox 2000 |
| Insect (odonate) | Insecticide (carbaryl) | 6 meristic wing cells; 2 metric wing lengths | Field mesocosms replicated | I, C | No | Hardersen, Wratten & Frampton 1999 |
| Insect (odonate) | Insecticide (carbaryl) (*association for meristic traits) | 3 meristic wing cells; 1 metric wing lengths | Laboratory replicated | I, C | No | Hardersen 2000 |
| Insect (plecopteran) | Temperature | 3 meristic tibial spines | Field ecosystem BACI | I, C | Yes | Present study |
| Lizard | Hormone (testosterone) | 2 metric colour patterns | Field repeated-measures | I | No | Veiga et al. 1997 |
| Fish (grayling) | Neurotoxin (methylmercury) | 4 meristic fin rays, gill rakers; 3 metric eye diameters, jaw length | Laboratory unreplicated | I, C | No | Vøllestad et al. 1998 |
| Bird (tree swallow) | Food | 4 metric tarsus, 9th primaries, outer rectrices, mass | Field replicated | I | No | Hovorka & Robertson 2000 |
| Mammal (mice) | Insecticide (methoxychlor) | 10 metric mandible | Laboratory replicated | I, C | Parent–offspring | Leamy, Doster & Huet-Hudson 1999 |
| Plant (soybean) | Salinity | 4 metric leaf veins | Laboratory replicated | I, C | No | Anne et al. 1998 |
| Plant (willow) | Water, pathogen, competition | 1 metric leaf veins | Field split-plot | I | Mention | Hochwender & Fritz 1999 |
| Plant (birch) | Defoliation | 1 metric leaf width | Field experiment | I | Cloned | Lappalainen et al. 2000 |
| Association | | | | | | |
| Insect (dipteran) | Temperature | 4 metric wing lengths, tibial lengths | Laboratory repeated-measures | I | Yes | Hosken, Blanckenhorn & Ward 2000 |
| Insect (dipteran) | Temperature | 1 metric wing lengths; 2 meristic stenopleural chaetae, arista branch | Laboratory replicated | I | Mention | Imasheva et al. 1997 |
| Insect (dipteran) | Food | 1 metric wing lengths; 2 meristic stenopleural chaetae, arista branch | Laboratory replicated | I | Mention | Imasheva, Bosenko & Bubli 1999 |
| Insect (coleopteran) | Food | 2 metric horn lengths, elytra lengths | Laboratory replicated | I | No | Hunt & Simmons 1997 |
| Insect (dipteran) | Temperature | 4 meristic wing bristles, head stripe | Laboratory replicated | I, C | Yes | McKenzie 1997 |
| Insect (dipteran) | Ectoparasites | 2 meristic thoracic bristles | Laboratory replicated | I | Laboratory stock | Polak 1997 |
| Insect (odonate) | Insecticide (carbaryl) | 2 metric wing lengths; 6 meristic wing cells | Laboratory replicated | I, C | No | Hardersen 2000 |
| Insect (dipteran) | Competition | 7 metric wings, tibia, femur, setae | Laboratory replicated | I | Laboratory stock | Blanckenhorn, Reusch & Mühlhäuser 1998 |
| Insect (dipteran) | Temperature | 1 meristic stenopleural bristles | Laboratory replicated | I | Yes | Bubli, Loeschcke & Imasheva 2000 |
| Insect (dipteran) | Temperature | 1 metric wings | Laboratory replicated | I | Yes | Chapman & Goulson 2000 |
| Crustacean (isopod) | Temperature | 1 metric antennae; 1 meristic antennae | Laboratory replicated | I | No | Savage & Hogarth 1999 |
| Fish (coho salmon) | Hatchery conditions | 1 meristic fin rays | Hatchery replicated | I, C | ?He measured | Campbell 2000 |
| Fish (coho salmon) | Temperature | 5 meristic mandibular pores, gill rakers, fin rays | Laboratory replicated | I, C | Breeding programme | Campbell, Emlen & Hershberger 1998 |
| Fish (mosquitofish, sand shiner) | Insecticide (parathion, lindane) | 7 metric external morphology | Laboratory replicated | I, C | No | Allenbach, Sullivan & Lydy 1999 |
| Lizard | Temperature | 5 metric head features, limbs; 3 meristic ventral plates, scales | Laboratory repeated-measures | I | No | Braña & Ji 2000 |
| Bird (swallow) | Weather, increased brood size (competition) | 3 metric tarsus, tail, wing | Field experimental (cross-fostering) | I, C | Sib-analysis | Cadée 2000 |
| Bird (swallow) | Parasitism | 1 metric tail lengths | Field manipulation | I | No | Shykoff & Møller 1999 |
| Bird (quail) | Non-pathogenic antigens | 10 metric feathers | Laboratory replicated | I | No | Fair, Hansen & Ricklefs 1999 |
| Mammal (shrew) | Habitat modification | 8 metric mandibles | Field experimental | I | No | Badyaev, Foresman & Fernandes 2000 |
| Plant (wild mustard) | Boron, salt, water, light, nutrients | 5 metric cotyledons, leaf shape, petals, seed pods | Laboratory replicated | I | Mention | Roy & Stanton 1999 |
| Plant (birch) | Nitrogen fertilization | 1 metric leaf widths | Field experiment | I | Cloned | Lappalainen et al. 2000 |
| Plant (Lychnis) | Hybridization | 1 metric petal lengths | Field and greenhouse experimental | I | Cloned | Siikamaki 1999 |
| Plant (poplar) | Competition | 1 metric leaf widths | Field experimental | I | Clones | Rettig et al. 1997 |
| Plant (willow) | Defoliation | 1 metric leaf widths | Field experimental | I | No | Zvereva, Kozlov & Haukioja 1997a |
| Plant (willow) | Herbivory, plant density | 1 metric leaf widths | Field experimental | I | No | Zvereva et al. 1997b |
