Monitoring a marine ecosystem using responses of upper trophic level predators


  • I. L. Boyd (corresponding author)
    British Antarctic Survey, NERC, High Cross, Madingley Road, Cambridge CB3 0ET, UK
  • A. W. A. Murray
    British Antarctic Survey, NERC, High Cross, Madingley Road, Cambridge CB3 0ET, UK

Correspondence: Prof. I.L. Boyd, Sea Mammal Research Unit, Gatty Marine Laboratory, University of St Andrews, Fife KY16 8LB, UK. Tel. and Fax: 44-1334-462630.


  • 1. This study examined the changing status of the marine ecosystem at the island of South Georgia (Southern Ocean) using up to 27 variables measured over 22 years from three upper trophic level predators that specialize in foraging upon krill (Euphausia superba Dana). These variables included population size, breeding performance, offspring growth rate, foraging behaviour and diet. A method was developed for reducing these multivariate time-series to a single vector, called a combined standardized index (CSI).
  • 2. Sensitivity analyses showed that missing values had a large effect upon the accuracy of the CSI but this effect was reduced if the individual variables were highly correlated. The level of correlation and proportion of missing values within the empirical data set were within the acceptable range. Individual variables had widely varying influence upon the CSI but, in general, those with longer time-series had the greatest influence.
  • 3. Principal components analysis showed that variables representing offspring growth tended to explain the greatest proportion of the variability in the CSI and this was followed by variables representing diet.
  • 4. There were 3 years in which the CSI showed extreme and significantly low values. There was a significant non-linear functional response (similar to the Holling Type II functional response) between the overall CSI and krill biomass and a similar relationship existed when the CSI was calculated for each species individually.
  • 5. Separate analysis of variables that were likely to be representative of changing population size showed the presence of a significant decline between 1977 and 1998. There was no trend in the CSI from variables representative of foraging conditions during the summer breeding season. The study has shown that the marine ecosystem at South Georgia shows acute but transient variability that is amplified in the response of upper trophic-level predators. There is less certainty that trends in populations are a consequence of shifts in the degree to which the ecosystem can support krill-feeding seals and penguins.


Ecosystems may be viewed as complex systems which exhibit some degree of self-organization and that may have multiple stable states (Kay 1991; Schneider & Kay 1994; Bellamy & Lowes 1999). In practice the complexity of ecosystems means that their status can normally only be represented by proxy measurements, such as the abundance of selected species. Interpretation then becomes difficult when a large number of variables are included in the assessment. In general, the larger the number of variables the more complete will be the view that is generated of the ecosystem. In these circumstances it may be possible to generate an index, or groups of indices, that reduce this multidimensional problem to a smaller number of dimensions. For example, stock market indices are used to reduce the complex variability within a range of individual stocks into a manageable single dimension (Bonanno, Vandewalle & Mantegna 2000). When such an index is generated consistently through time it may be used to show trends or variability. In the context of ecosystems this has the potential to be used to interpret ecosystem stability.

This study examined the problem of how to generate and interpret an index of ecosystem variability from multiple time-series of variables. This is a particularly difficult technical problem because standard multivariate techniques do not take account of autocorrelation and therefore do not lend themselves to the analysis of time-series. An advantage of combining variables into a single index is that it permits extension of the multivariate time-series in cases where there are missing values in a subset of variables. It can also integrate biological variability across space and time scales to give a more robust summary than is possible using a single variable. In this case, we examined multiple time-series measured from predators at the top of the Antarctic marine food chain. Following the methods of De la Mare & Constable (2000), we derived a single combined time-series, which we have called the combined standardized index (CSI), from these multiple series. The principle underlying the study is that these predators will tend to exhibit changes in their populations, diet, reproductive performance and foraging behaviour that reflect the status of their food supply (Croxall et al. 1988). In this case the common food supply for these predators was Antarctic krill, a key component of the Antarctic marine food chain (Everson 1984), and all the predators depended to a greater or lesser extent on krill within the same geographical region (Boyd et al. 1998; Trathan et al. 1998). We then examined the form of the relationship between the index derived from predators and a direct measure of krill density in this region to test the hypothesis that the derived index may be useful for examining the functional responses of predators to prey abundance.

In large-scale marine ecosystems temporal variability is likely to take place across scales of years to decades (Alverson 1992; Latif & Barnett 1994; Levitus & Antonov 1995; Polovina, Mitchum & Evans 1995; Sugimoto & Tadokoro 1998). It is difficult to maintain continuous and consistent observation of these ecosystems across such long time periods. For a variety of reasons, data sets used to monitor ecological processes over extended periods are often incomplete. They may also be composed of variables made up from data with different distributional characteristics (e.g. counts, proportions, masses). These problems arise because of the need to gather data about a wide range of different processes and also because of the need to keep data collection procedures simple and standardized when data are being gathered over many years (Quetin & Ross 1992; Agnew 1997). Therefore, a further aim of the present study was to examine the practical problems created by having incomplete data sets. This was performed by simulating a time-series similar to the empirical series for Southern Ocean fur seals and penguins and then investigating the effects that the introduction of missing data had upon the performance of indices derived from these data.

Materials and methods

The variables being considered here took the form of time-series of variables which we refer to as column vectors. We examined several possible ways of combining vectors. Analyses were based upon a data matrix X made up of N vectors at least one of which had data present across M years. The variable types and the data matrix structure for the example being examined in the present study are given in Tables 1 and 2, respectively. A data matrix Y was used to define the presence and absence of missing values in X. It was composed of N vectors of order M which took the value of unity for each measurement present in X and zero for each absent value.
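The relationship between X and Y can be sketched as follows. This is a minimal illustration rather than the program used in the study: the small matrix is hypothetical, and coding missing measurements as NaN is our assumption.

```python
import numpy as np

# Hypothetical 4-year x 3-variable data matrix X (rows = years, columns = vectors).
# np.nan marks a measurement that was not made; this coding is an assumption.
X = np.array([
    [692.0, np.nan, 0.48],
    [np.nan, np.nan, 0.25],
    [508.0, 0.03, 0.60],
    [520.0, 0.10, np.nan],
])

# Y has the same shape as X: unity where a measurement is present, zero where absent.
Y = (~np.isnan(X)).astype(int)

# Proportion of the data matrix that is missing.
proportion_missing = 1.0 - Y.mean()
```

Keeping Y separate from X makes it straightforward to report the proportion of missing values and to select, for any year, the subset of vectors that contributes to an index.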

Table 1.  Indices measured in three species of krill predators from Bird Island, South Georgia. Transformations applied to each variable reflect both the distributional characteristics of the variable and the underlying premise that high values reflect relatively good environmental conditions and low values reflect relatively poor environmental conditions. Logit transformations involving proportions of 0 or 1 were undertaken after adding 1 × 10⁻³ to, or subtracting it from, these values
Variable | Index (Mhk) | Species | Transformation | Method
1·1 | Meal mass | Macaroni penguin | Mhk | Mass of stomach contents of adults returning to feed chicks
1·2 | Frequency of fish | Macaroni penguin | ln[Mhk/(1−Mhk)] | Proportion of stomachs sampled in variable 1·1 containing fish
1·3 | Percentage krill | Macaroni penguin | ln[Mhk/(1−Mhk)] | Proportion by mass of krill in the stomach samples in variable 1·1
1·4 | Frequency of krill | Macaroni penguin | ln[Mhk/(1−Mhk)] | Proportion of stomachs sampled in variable 1·1 containing krill
1·5 | Fledging mass | Macaroni penguin | Mhk | Mass of chicks at a specific date after the median date of egg laying
1·6 | Breeding success | Macaroni penguin | ln[Mhk/(1−Mhk)] | Proportion of nests that failed between courtship and incubation
1·7 | Meal mass | Gentoo penguin | Mhk | Mass of stomach contents of adults returning to feed chicks
1·8 | Frequency of fish | Gentoo penguin | ln[Mhk/(1−Mhk)] | Proportion of stomachs sampled in variable 1·7 containing fish
1·9 | Percentage krill | Gentoo penguin | ln[Mhk/(1−Mhk)] | Proportion by mass of krill in the stomach samples in variable 1·7
1·10 | Frequency of krill | Gentoo penguin | ln[Mhk/(1−Mhk)] | Proportion of stomachs sampled in variable 1·7 containing krill
1·11 | Fledging mass | Gentoo penguin | Mhk | Mass of chicks at a specific date after the median date of egg laying
1·12 | Breeding success | Gentoo penguin | ln[Mhk/(1−Mhk)] | Proportion of nests that failed between courtship and incubation
1·13 | Frequency of fish | Antarctic fur seal | ln[Mhk/(1−Mhk)] | Proportion of scats collected during lactation that contained fish
1·14 | Frequency of krill | Antarctic fur seal | ln[Mhk/(1−Mhk)] | Proportion of scats collected during lactation that contained krill
1·15 | Foraging trip duration | Antarctic fur seal | −ln Mhk | Duration of first 6 foraging trips to sea by females after parturition
1·16 | Pup growth (female) | Antarctic fur seal | ln Mhk | Random sample of 50 pups weighed once a month during the 4 months of lactation. Inclusive of birth mass.
1·17 | Pup growth difference | Antarctic fur seal | ln Mhk | Random sample of 50 pups weighed once a month during the 4 months of lactation. Inclusive of birth mass. Difference between male and female
1·18 | Weaning mass (female) | Antarctic fur seal | Mhk | Mass of pups at 3 months of age
1·19 | Weaning mass difference | Antarctic fur seal | Mhk | Difference between the weaning mass of male and female pups
1·20 | Post-partum pup survival | Antarctic fur seal | ln[Mhk/(1−Mhk)] | Proportion of pups surviving the first month of lactation
2·1 | Breeding population size | Macaroni penguin | Mhk | Number of breeding pairs in a defined colony
2·2 | Breeding population size | Gentoo penguin | Mhk | Number of breeding pairs in a defined colony
2·3 | Timing of reproduction | Gentoo penguin | Mhk | Mean date before 31 Dec. on which pairs laid their first egg
2·4 | Timing of reproduction | Antarctic fur seal | Mhk | Date before 31 Dec. when 50% of pups were born
2·6 | Number of pups born | Antarctic fur seal | Mhk | Total number of pups born in a defined colony
2·7 | Pregnancy rate | Antarctic fur seal | ln[Mhk/(1−Mhk)] | Proportion of adult females pregnant in a defined colony
2·8 | Survival rate | Antarctic fur seal | ln[Mhk/(1−Mhk)] | Proportion of adult females surviving the previous winter
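The transformations listed in Table 1 can be sketched in a few lines. This is an illustrative sketch only: the function names are ours, and the only detail taken from the table caption is the adjustment of 1 × 10⁻³ applied to proportions of exactly 0 or 1 before the logit is taken.

```python
import numpy as np

EPS = 1e-3  # adjustment for proportions of exactly 0 or 1 (Table 1 caption)

def logit(p):
    """ln[p / (1 - p)], after nudging exact 0 and 1 away from the boundary."""
    p = np.asarray(p, dtype=float)
    p = np.where(p == 0.0, p + EPS, p)
    p = np.where(p == 1.0, p - EPS, p)
    return np.log(p / (1.0 - p))

def neg_log(x):
    """-ln(x): used where a large raw value (e.g. a long trip) indicates poor conditions."""
    return -np.log(np.asarray(x, dtype=float))

# Hypothetical proportions, including the two boundary cases.
transformed = logit([0.0, 0.5, 0.82, 1.0])
```

The sign reversal in `neg_log` keeps the convention that high transformed values correspond to relatively good conditions.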
Table 2.  Data matrices used in the current analysis. (a) Matrices of summer variables and (b) matrices of population variables. See Table 1 for a description of the variables and the transformations applied to each. Thirty-six percent of values in the matrix formed by combining (a) and (b) are missing
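(a) Year | 1·1 | 1·2 | 1·3 | 1·4 | 1·5 | 1·6 | 1·7 | 1·8 | 1·9 | 1·10 | 1·11 | 1·12 | 1·13 | 1·14 | 1·15 | 1·16 | 1·17 | 1·18 | 1·19 | 1·20
1977 | 692 | * | * | * | * | 0·48 | 857 | * | * | * | * | 0·60 | * | * | * | * | * | * | * | *
1978 | * | * | * | * | * | 0·25 | * | * | * | * | * | 0·01 | * | * | * | * | * | * | * | *
1979 | * | * | * | * | * | 0·47 | * | * | * | * | * | 0·29 | * | * | 130 | * | * | 11·7 | 1·6 | 0·69
1980 | 508 | * | * | * | * | 0·60 | 790 | * | * | * | * | 0·58 | * | * | * | * | * | * | * | *
1981 | * | * | * | * | * | 0·53 | * | * | * | * | * | * | * | * | 89 | * | * | 11·9 | 2·7 | 0·83
1982 | * | * | * | * | * | 0·51 | * | * | * | * | * | 0·05 | * | * | 75 | * | * | * | * | 0·78
1983 | * | * | * | * | * | 0·49 | * | * | * | * | * | 0·51 | * | * | * | * | * | * | * | *
1984 | * | * | * | * | * | 0·09 | * | * | * | * | * | 0·29 | * | * | 164 | 0·056 | 0·008 | 9·7 | 1·5 | 0·71
1985 | 520 | * | * | * | * | 0·48 | 830 | * | * | * | * | 0·43 | * | * | 78 | 0·093 | 0·013 | 12·3 | 1·8 | 0·86
1986 | 448 | * | * | * | * | 0·50 | 878 | * | * | * | * | 0·42 | * | * | 79 | 0·089 | 0·022 | 12·6 | 2·3 | 0·78
1987 | * | * | * | * | * | 0·36 | 604 | * | * | * | * | 0·43 | * | * | 178 | 0·084 | 0·020 | 12·3 | 2·3 | 0·79
1988 | * | * | * | * | * | 0·36 | 950 | * | * | * | * | 0·47 | * | * | 107 | 0·080 | 0·021 | 11·6 | 3 | 0·77
1989 | 362 | * | 0·97 | * | 3447 | 0·61 | 683 | * | 0·84 | * | 5464 | 0·46 | * | * | 124 | 0·080 | 0·024 | 11·8 | 2·9 | 0·78
1990 | 318 | 0·03 | 1·00 | 1·00 | 3238 | 0·59 | 452 | 0·63 | 0·59 | 0·85 | 5800 | 0·36 | * | * | 79 | 0·072 | 0·015 | 11·2 | 1·9 | 0·77
1991 | 212 | 0·03 | 0·69 | 0·90 | 3112 | 0·67 | 317 | 0·89 | 0·18 | 0·62 | 5043 | 0·01 | 0·38 | 0·85 | 203 | 0·070 | 0·010 | 11·5 | 1·2 | 0·74
1992 | 464 | 0·10 | 0·99 | 1·00 | 3507 | 0·41 | 643 | 0·80 | 0·50 | 0·98 | 5792 | 0·63 | 0·90 | 1·00 | 97 | 0·087 | 0·013 | 12·8 | 2 | 0·93
1993 | 371 | 0·28 | 0·82 | 0·98 | 3318 | 0·55 | 422 | 0·33 | 0·85 | 0·98 | 5482 | 0·80 | 0·60 | 1·00 | 123 | 0·085 | 0·020 | 12·5 | 2·5 | 0·81
1994 | 276 | 0·45 | 0·11 | 0·40 | 2913 | 0·46 | 82 | 0·96 | 0·13 | 0·42 | 5065 | 0·04 | 0·37 | 0·78 | 185 | 0·060 | 0·001 | 10·7 | 1·2 | 0·67
1995 | 212 | 0·40 | 0·55 | 0·83 | 3026 | 0·51 | 679 | 0·80 | 0·54 | 0·75 | 5239 | 0·58 | 0·55 | 0·95 | 103 | 0·075 | 0·025 | 11·2 | 2·7 | 0·87
1996 | 211 | 0·06 | 1·00 | 1·00 | 3173 | 0·45 | 720 | 1·00 | 0·25 | 0·53 | 5942 | 0·79 | 0·71 | 0·98 | 90 | 0·084 | 0·017 | 11·8 | 2·5 | 0·83
1997 | 524 | 0·05 | 0·96 | 0·97 | 3295 | 0·48 | 828 | 0·93 | 0·39 | 0·79 | 5964 | 0·50 | 0·45 | 1·00 | 97 | 0·084 | 0·024 | 11·9 | 3·1 | 0·86
1998 | 292 | 0·33 | 0·69 | 0·90 | 3406 | 0·45 | 382 | 0·33 | 0·82 | 0·95 | 4810 | 0·00 | 0·48 | 0·99 | 157 | 0·080 | 0·004 | 11·6 | 0·9 | 0·72

(b) Year | 2·1 | 2·2 | 2·3 | 2·4 | 2·5 | 2·6 | 2·7
1988 | 946 | 3014 | 5 | 22 | 795 | 0·82 | 0·77
1990 | 993 | 4976 | 27 | 26 | 822 | 0·85 | 0·69
1991 | 835 | 2244 | 6 | 23 | 535 | 0·71 | 0·77
1992 | 952 | 3606 | 15 | 21 | 714 | 0·82 | 0·84
1993 | 945 | 3405 | 45 | 20 | 762 | 0·87 | 0·81
1994 | 912 | 2796 | 16 | 22 | 707 | 0·88 | 0·62
1995 | 642 | 2152 | 16 | 18 | 584 | 0·76 | 0·77
1996 | 640 | 2404 | 20 | 21 | 797 | 0·86 | 0·83
1997 | 620 | 2813 | 33 | 26 | 732 | 0·86 | 0·75
1998 | 580 | 2514 | 17 | 18 | 501 | 0·86 | *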

The computer program used in this analysis is available at

Combined standardized index (CSI)

The combined standardized index was based on the principle that measurements collected at a common time base can be combined, after suitable transformation and standardization, by expressing them as the deviation of the combined values from a long-term mean (Gaston & McArdle 1994). In this study, we examined the performance of three approaches to calculation of a combined standardized index and each of these is described formally in Appendix I. The general approach is similar to that adopted by De la Mare & Constable (2000).

The first, and simplest, approach involved calculating the mean of all the observations in a particular time interval. There may be disadvantages associated with doing this if there are missing values in the time-series because, depending on the pattern of missing values, this could result in a bias in the index. The second approach was to calculate the covariance matrix for all the variables from that part of the data matrix in which there were no missing values. This could then be used to calculate the variance associated with each time interval. This variance produced a standard deviation which could then be used to re-standardize the sum of all the observations in a particular time interval. This was what we called a combined standardized index (CSI) and in the present study we called this particular index CSI(1). However, the necessity of having to use only the portion of the data matrix which was complete could result in reduced utility of the index because complete data may only be available from a relatively small part of the time-series. This is a statistical restriction because of the necessity for all of the eigenvalues of the covariance matrix to be positive. Therefore, the third approach was to calculate a covariance matrix using all data and then to smooth the covariance matrix to ensure that the eigenvalues were positive. The combined standardized index was then recalculated using the standard deviation derived from the smoothed matrix. In the present study we called this CSI(2).
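The three approaches can be sketched as follows. The formal definitions are given in Appendix I, which is not reproduced here, so this sketch rests on stated assumptions: each variable is standardized by its own mean and standard deviation, the index in a given year is the sum of the standardized values present divided by the standard deviation implied by the covariance matrix for those same variables, and "smoothing" is implemented by clipping eigenvalues to a small positive floor. All function names and the simulated data are hypothetical.

```python
import numpy as np

def standardize(X):
    """Centre and scale each column by its own mean and SD, ignoring NaN."""
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)

def pairwise_cov(Z):
    """Covariance of each pair of columns over the years in which both are present."""
    n = Z.shape[1]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ok = ~np.isnan(Z[:, i]) & ~np.isnan(Z[:, j])
            if ok.sum() > 1:
                S[i, j] = np.cov(Z[ok, i], Z[ok, j])[0, 1]
    return S

def smooth_cov(S, floor=1e-6):
    """Assumed smoothing rule: clip eigenvalues so that all are positive."""
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, floor, None)) @ V.T

def mean_vector(X):
    """Approach 1: mean of the standardized observations in each year."""
    return np.nanmean(standardize(X), axis=1)

def csi(X, S):
    """Sum of standardized values per year, re-standardized using covariance matrix S."""
    Z = standardize(X)
    a = (~np.isnan(Z)).astype(float)           # indicator of values present
    variance = np.einsum("ti,ij,tj->t", a, S, a)
    return np.nansum(Z, axis=1) / np.sqrt(variance)

# Simulated example: 22 years, 5 correlated variables, three of which start late.
rng = np.random.default_rng(0)
common = rng.normal(size=(22, 1))
X = common + 0.5 * rng.normal(size=(22, 5))
X_gappy = X.copy()
X_gappy[:8, 2:] = np.nan

Z_gappy = standardize(X_gappy)
complete = ~np.isnan(Z_gappy).any(axis=1)

index_full = csi(X, smooth_cov(pairwise_cov(standardize(X))))
csi_1 = csi(X_gappy, np.cov(Z_gappy[complete], rowvar=False))   # approach 2
csi_2 = csi(X_gappy, smooth_cov(pairwise_cov(Z_gappy)))         # approach 3
index_gappy = csi_2
```

With complete data the index has unit standard deviation by construction, which is what allows values from years with different numbers of variables to be compared on a common scale.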

Empirical data set

All the data used in the present study relate to the summer foraging and breeding success of three species that breed at Bird Island, South Georgia (54°S, 38°W). The species were Antarctic fur seal (Arctocephalus gazella Peters), gentoo penguin (Pygoscelis papua Forster) and macaroni penguin (Eudyptes chrysolophus Brandt). All these species have Antarctic krill as the dominant constituent of their diets (Reid & Arnould 1996; Croxall et al. 1999). Foraging ranges (both geographical and in the water column) overlap during the summer when all three species are rearing young. Antarctic fur seals and the two penguins regularly forage to depths of > 50 m (Boyd & Croxall 1992; Williams et al. 1992; Croxall et al. 1993).

The variables (Table 1) were chosen to represent two types of process: indicators of performance that relate to the concurrent environmental conditions, i.e. only those that are considered to be responses to the food availability within the same breeding season (Table 2a), and those, such as the number of breeding pairs or offspring production, which represent population or demographic processes that may have been influenced by events in previous years (e.g. Croxall et al. 1988; Croxall & Rothery 1991; Lunn & Boyd 1993). The former types of variables were called summer variables and the latter were called population variables. The methods used to measure each index are published by the Convention on the Conservation of Antarctic Marine Living Resources (CCAMLR 1997) and they have remained unchanged throughout the study. Each index is described briefly in Table 1; the data are given in Table 2.

Randomization tests (Manly 1991; Appendix I) were used to estimate the probabilities of specific values of a CSI occurring by chance. The CSI was related to krill abundance in the region close to Bird Island where these species have been shown to forage (e.g. Boyd et al. 1998). Krill abundance estimates were obtained using hydroacoustic surveys during the summer which is concurrent with the breeding season of the predator species examined in the present study (Brierley et al. 1999).
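One way in which such a randomization test might be set up is sketched below. The procedure actually used is described in Appendix I; here we assume that the observed values of each variable are shuffled among the years in which that variable was measured, that the index is recalculated for each shuffle, and that the probability for a year is the proportion of shuffles giving an index at least as low as the one observed. For brevity the index is the mean of the standardized variables rather than CSI(2); the function names and simulated data are hypothetical.

```python
import numpy as np

def mean_index(X):
    """Mean of the standardized variables available in each year."""
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)
    return np.nanmean(Z, axis=1)

def randomization_p(X, index_fn=mean_index, n_perm=999, seed=0):
    """Per-year probability of an index value as low as observed arising by chance."""
    rng = np.random.default_rng(seed)
    observed = index_fn(X)
    count_low = np.zeros_like(observed)
    for _ in range(n_perm):
        Xp = X.copy()
        for j in range(X.shape[1]):
            ok = ~np.isnan(X[:, j])
            # Shuffle within the years in which variable j was measured,
            # so the pattern of missing values is preserved.
            Xp[ok, j] = rng.permutation(X[ok, j])
        count_low += index_fn(Xp) <= observed
    return (count_low + 1) / (n_perm + 1)

# Simulated example: 22 years, 6 variables, with every variable depressed in one year.
rng = np.random.default_rng(1)
X = rng.normal(size=(22, 6))
X[10, :] -= 5.0
p = randomization_p(X)
```

Counting the observed arrangement as one of the permutations (the `+ 1` terms) prevents an estimated probability of exactly zero.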

Testing between methods of calculating a combined standardized index

Comparisons of the three methods for calculating a combined index, namely the mean vector, CSI(1) (where only the portion of the data matrix with no missing values was used) and CSI(2) (where the whole data matrix was used), were made for a wide range of scenarios in which there were missing values. Simulated data vectors (Appendix I) were examined using 1000 randomly selected scenarios of missing values. Scenarios with relatively high levels of correlation between vectors were compared with scenarios in which there was relatively low correlation (Appendix I). The scenarios were designed to simulate the pattern of missing values that would be expected for multivariate time-series in which there has been an increase in the number of parameters being measured through time. Such a pattern is illustrated in the empirical data examined in this study (Table 2). Both the minimum number of complete vectors and the minimum number of data points present in each reduced vector that had to be present for a valid test were set to be 2. We calculated the correlation coefficient between the CSIs resulting from the reduced data set and the CSI from the full data set. The median, 95th and 5th percentile of these correlation coefficients were calculated for different combinations of incomplete vectors and missing values.
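The structure of this sensitivity analysis can be sketched as below. The simulation procedure used in the study is given in Appendix I; this sketch assumes instead that correlated vectors are generated from a shared signal plus independent noise, that missing values are created by blanking the early years of randomly chosen vectors, and that the index is the mean of the standardized variables. The function names and parameter values are hypothetical.

```python
import numpy as np

def mean_index(X):
    """Mean of the standardized variables available in each year."""
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)
    return np.nanmean(Z, axis=1)

def simulate(rng, n_years=22, n_vars=20, rho=0.5):
    """Vectors sharing a common signal, giving pairwise correlation of about rho."""
    common = rng.normal(size=(n_years, 1))
    noise = rng.normal(size=(n_years, n_vars))
    return np.sqrt(rho) * common + np.sqrt(1.0 - rho) * noise

def delete_early(X, n_incomplete, rng):
    """Blank the first k years of n_incomplete randomly chosen vectors."""
    Xm = X.copy()
    n_years, n_vars = X.shape
    cols = rng.choice(n_vars, size=n_incomplete, replace=False)
    for j in cols:
        k = rng.integers(1, n_years - 1)   # k <= n_years - 2, so >= 2 points remain
        Xm[:k, j] = np.nan
    return Xm

def sensitivity(n_incomplete, rho, n_scen=1000, seed=0):
    """5th, 50th and 95th percentiles of r between full and reduced indices."""
    rng = np.random.default_rng(seed)
    r = np.empty(n_scen)
    for s in range(n_scen):
        X = simulate(rng, rho=rho)
        Xm = delete_early(X, n_incomplete, rng)
        r[s] = np.corrcoef(mean_index(X), mean_index(Xm))[0, 1]
    return np.percentile(r, [5, 50, 95])

# 18 of 20 vectors incomplete, leaving the minimum of 2 complete vectors.
low = sensitivity(18, rho=0.05, n_scen=200)
high = sensitivity(18, rho=0.6, n_scen=200)
```

Comparing `low` with `high` shows, for this simplified setting, how correlation among vectors changes the agreement between the reduced and the full index.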

Testing the influence of different variables

Jackknife correlation analysis (Manly 1991), a non-parametric influence function (Davison & Hinkley 1997) and principal components analyses were used to examine the relative contribution that different variables made to the CSI. The method used to examine the relative influence of each variable is described in Appendix I.
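The jackknife correlation can be illustrated with a short sketch in which each variable in turn is removed from the matrix, the index is recalculated without it, and the removed variable is then correlated with that index. The details of the method used in the study are in Appendix I; the use of a simple mean index in place of CSI(2), the function names and the simulated data are assumptions made for illustration.

```python
import numpy as np

def mean_index(X):
    """Mean of the standardized variables available in each year."""
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)
    return np.nanmean(Z, axis=1)

def jackknife_correlations(X, index_fn=mean_index):
    """Correlate each variable with the index calculated without that variable."""
    n_vars = X.shape[1]
    r = np.empty(n_vars)
    for h in range(n_vars):
        index_without_h = index_fn(np.delete(X, h, axis=1))
        ok = ~np.isnan(X[:, h]) & ~np.isnan(index_without_h)
        r[h] = np.corrcoef(X[ok, h], index_without_h[ok])[0, 1]
    return r

# Simulated example: four variables follow a common signal, the fifth opposes it.
rng = np.random.default_rng(2)
common = rng.normal(size=(22, 1))
X = common + 0.3 * rng.normal(size=(22, 5))
X[:, 4] = -common[:, 0] + 0.3 * rng.normal(size=22)
r = jackknife_correlations(X)
```

Leaving the variable out before correlating avoids the inflation that would arise if a variable were correlated with an index that already contained it.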


Results

Effects of missing values where variables were relatively uncorrelated

In cases when there was a low level of correlation between variables (Fig. 1a), the correlation between the actual index and the observed index declined as a greater proportion of data values were excluded from the data set (Fig. 2). All the methods used to calculate the index produced similar results, although there was a suggestion that the CSI(1) method was less robust to the effects of missing values when there were few vectors in the data set (Fig. 2a,iv). In terms of the median correlation coefficient it appeared that the performance of the CSI(2) method was either similar to or better than the other two methods. The mean vector method was particularly sensitive to missing values when the number of vectors exceeded 15. In addition, the range of the correlation coefficients differed between methods. When there was a small number of vectors in the data matrix the mean vector method appeared to give the least variable results and the CSI(1) index was most variable. However, the mean vector method showed greater variability when there was a large number of vectors in the matrix (Fig. 2).

Figure 1.

Frequency distributions of the correlation coefficients produced from bivariate correlation between all variables in a simulated data matrix with relatively low levels of correlation (a), relatively high levels of correlation (b) and for the empirical data matrix of variables from three upper trophic level predators measured during the summer breeding season (c).

Figure 2.

Plots of the relationship between the median correlation coefficient (a) for correlations between the CSI or mean vector calculated for a data matrix with missing values and for the complete data matrix. Also shown are the 95th percentile (b) and the 5th percentile (c). Examples are shown for cases with different numbers of variables (n). The amount of missing data is represented as the percentage of points missing from the data matrix. These plots are based upon a simulated data matrix in which there was a relatively low level of correlation between the variables as defined in Fig. 1a. The data given in each panel represent results using the mean vector (dotted line), CSI(1) (dashed line) and CSI(2) (solid line) methods of calculating a combined index. There are 22 time steps in the simulated data matrix as in the empirical time-series. The correlation coefficients are for the relationship between each index from the complete data matrix and the index from the data matrix after random deletion of data to simulate the pattern of missing data in the empirical time-series. Each diagram is compiled from 1000 simulations. A grey horizontal bar shows the correlation coefficient (d.f. = 21) above which correlations were considered to be significant at P < 0·05. Wrinkles in the curves represent stochastic variability in the estimation procedure.

Irrespective of which method was used to calculate a combined index, this analysis showed that, when there was a low level of correlation between variables, the index was highly sensitive to the presence of missing values and that this sensitivity changed little with increasing numbers of vectors in the data matrix. The potential range of correlations given by the 95th and 5th percentiles (Fig. 2) shows clearly that a combined index can be poorly correlated with the real combined time-series even when there is a relatively small number of missing values.

Effects of missing values where variables were correlated

When individual vectors within the data matrix tend to be correlated (Fig. 1b), each data vector carries information about other vectors that may have missing values. Therefore the presence of correlations between variables has the potential to mitigate the effect of missing values. We constructed data sets that had a similar degree of internal correlation to the empirical data matrix (Fig. 1b,c) using the method described in Appendix I. This showed that there was reduced sensitivity to missing values as a result of these internal correlations. Comparing Fig. 2a–c,i with Fig. 3a–c shows a marked improvement, in any particular scenario of missing values, in the median correlation coefficient for all three forms of the CSI. However, CSI(1) showed an increased range of variability and the mean vector appeared to have a greatly reduced range of variability, as judged by the changes in the 5th percentile when increasing numbers of points were excluded (Fig. 3c).

Figure 3.

Plots as in Fig. 2 for simulated data matrices with 20 variables and over 22 years but where there was a relatively high degree of correlation between the variables as defined in Fig. 1b. The shaded horizontal bar shows the correlation coefficient (d.f. = 21) above which correlations were considered to be significant at P < 0·05.

In the present study, we used a time-series lasting 22 years and the main analysis using summer variables involved 20 variables. The critical value for a significant correlation at this sample size will be 0·4, which shows that, using the CSI(2) method of calculating the combined index, and based upon the 5th percentile in Fig. 2c,i the analysis will be robust to up to 40–50% of missing values, irrespective of the level of correlation between vectors. The data set of summer variables used in the present study had 39% of its values missing. Therefore, it is unlikely that the CSI(2) from the summer data set will be radically different from the true value had there been no missing values and this conclusion is strengthened by the relatively high degree of correlations between the variables (Fig. 1c).

Structure within the matrix of penguin and seal data

CSI(2) was used to examine the structure within the matrix of empirical data. The plot of CSI(2) calculated for the two subsets of the data is shown in Fig. 4a. These were represented by (1) those variables representing changes in the population size and demographics of the three species of predators and variables that are likely to have been most influenced by feeding conditions in previous years (the population time-series, Table 2b), and (2) those representing changes in the feeding conditions experienced by the three species and their breeding performance (the summer time-series, Table 2a). Whereas the first set of variables represented those indices that are likely to have been influenced by events in the past, the second set represented variables that were likely to be most representative of environmental conditions concurrent with the breeding season in each year. In only one year (1994) was there congruence of extreme values in both sets of data (Fig. 4a). The value of the CSI for the population data declined significantly with time (linear regression, H0: slope = 0, t = 3·16, P < 0·005) whereas the value for the summer data did not decline. Lagged cross-correlations that included positive and negative lags of up to 10 years showed no significant relationships between the summer and population time-series.
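A lagged cross-correlation of the kind described above can be sketched as follows. This is an illustration under our own conventions, not the procedure used in the study: a positive lag is taken to mean that the second series follows the first, and the two series are simulated so that the second is a copy of the first delayed by 2 years.

```python
import numpy as np

def lagged_correlations(a, b, max_lag):
    """Correlation between a[t] and b[t + lag] for lags from -max_lag to +max_lag."""
    n = len(a)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[: n - lag], b[lag:]
        else:
            x, y = a[-lag:], b[: n + lag]
        out[lag] = np.corrcoef(x, y)[0, 1]
    return out

# Simulated 22-year series; `population` repeats `summer` with a 2-year delay.
rng = np.random.default_rng(3)
summer = rng.normal(size=22)
population = np.concatenate([rng.normal(size=2), summer[:-2]])

cc = lagged_correlations(summer, population, max_lag=10)
best_lag = max(cc, key=cc.get)
```

With 22 years of data a lag of 10 years leaves only 12 paired values, so correlations at the longest lags are estimated from few points.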

Figure 4.

Plots of the CSI(2) for krill predators from Bird Island using the population and summer data sets (a) and for the summer data set alone (b). The thick line in (b) represents the CSI(2) calculated using all the available data. The dashed, thin and dotted lines represent the CSI(2) calculated using only the most important variables (eigenvector coefficient > 0·2) from principal components 1, 2 and 3. Circles of different size represent different probabilities of occurrence by chance derived from a randomization test.

Randomization testing of the summer time-series provided estimated probabilities associated with the specific values of the CSI(2) index. This showed that the extreme values observed during 1978, 1984 and 1994 had probabilities of occurring by chance that were in the range 0·005–0·01 (Fig. 4b).

Considering the summer time-series, 16 of the 20 variables used in the analysis provided a significant (P < 0·05) positive correlation with the CSI(2) (Table 3). Only two of the variables had a negative correlation coefficient with the CSI(2). Both were measures of the frequency of fish in the diet of penguins (Table 3) but the correlations were not significant. All four variables that were not significantly correlated with the CSI(2) measured variation in the composition of the diet.

Table 3.  Jackknife correlation coefficients (r) and bias approximations (bh) for the relationship between each vector from the matrix of predator performance data from Bird Island and the CSI(2) when calculated without the specified variable in the matrix
Empirical time-series
Variable | r | P | bh | 
1·1 | 0·697 | 0·006 | −0·565 | 1·353
1·2 | −0·299 | 0·435 | −0·150 | 1·180
1·3 | 0·815 | 0·004 | −0·238 | 1·165
1·4 | 0·854 | 0·003 | −0·279 | 1·154
1·5 | 0·711 | 0·021 | −0·235 | 1·169
1·6 | 0·473 | 0·026 | 0·030 | 1·537
1·7 | 0·831 | < 0·001 | 0·087 | 1·213
1·8 | −0·115 | 0·768 | −0·168 | 1·184
1·9 | 0·501 | 0·140 | −0·234 | 1·180
1·10 | 0·588 | 0·096 | −0·241 | 1·164
1·11 | 0·764 | 0·010 | −0·219 | 1·169
1·12 | 0·730 | < 0·001 | 0·813 | 1·451
1·13 | 0·750 | 0·032 | −0·258 | 1·150
1·14 | 0·841 | 0·010 | −0·270 | 1·140
1·15 | 0·744 | < 0·001 | −0·665 | 1·329
1·16 | 0·912 | < 0·001 | −0·219 | 1·199
1·17 | 0·736 | 0·002 | −0·353 | 1·133
1·18 | 0·824 | < 0·001 | −0·127 | 1·251
1·19 | 0·702 | 0·002 | −0·299 | 1·200
1·20 | 0·728 | < 0·001 | 0·143 | 1·212

The estimate of the influence of each variable on the CSI(2) suggested that the four most important variables were 1·12 (gentoo penguin breeding success), 1·15 (fur seal foraging trip duration), 1·1 (macaroni penguin meal mass) and 1·17 (fur seal pup growth), respectively. These covered all three species and include variables that reflect the behaviour, foraging success and breeding success of the species that were monitored. However, in general they also represented those variables with the most complete time-series (Table 2). The influence exerted by variables 1·15, 1·1 and 1·17 was also opposite to variable 1·12. Whereas variables 1·15, 1·1 and 1·17 tended to make the index more negative, variable 1·12 tended to make it more positive. This contrast was not due to low correlation between variable 1·12 and the CSI(2) (Table 3).

There appeared to be little congruence between the results of the jackknife correlation and the influence function except that values which were insignificant (P > 0·05) in the jackknife correlation also had relatively low absolute values of the influence statistic bh. However, low absolute values of bh did not necessarily also have low or insignificant values in the jackknife correlation. There was also little congruence between the influence function and the principal components analysis. For example, fur seal foraging trip duration (variable 1·15, Table 1a) was the second most important vector based upon the influence function (Table 3) but made only a modest contribution to the first two principal components of the covariance matrix.

Fig. 5 shows the CSI calculated separately for each species and demonstrates the reason why variables from all species had a high level of influence on the overall CSI. Overall, there was a high degree of congruence in the pattern of variability among species, especially in years of extreme values. Only in 1984 did there appear to be a large divergence in the response of one species (gentoo penguin) from the other two species.

Figure 5.

The combined standardized index, CSI(2), plotted against year for the three species examined in this analysis. The time-series of data from Antarctic fur seals did not begin until 1983.

The first principal component of the smoothed covariance matrix for the summer data explained 48% of variation in the data and the first five principal components explained 86% of the total variation (Table 4). Based upon those eigenvector coefficients with absolute values > 0·2, which are underlined in Table 4, all the species made important contributions to both of the first two principal components. Comparing PC1 and PC2, variables 1·6 and 1·14 (macaroni breeding success and frequency of krill in the diet of fur seals) both made important contributions to PC1 but had almost no influence on PC2. Variable 1·8 (frequency of fish in the diet of gentoo penguins) was important in both PC1 and PC2. Variables 1·2 and 1·4 (frequency of fish and krill, respectively, in the diet of macaroni penguins) were particularly characteristic of PC2 when compared with PC1 (Table 4). Therefore, it appears that the main variables separating PC1 and PC2 involve contrasts in the diet of the three species. This may reflect the idea that, in this group of predators, fish replaces krill in the diet when krill are scarce (Everson et al. 1999).

Table 4.  The first five principal components (PC) of the smoothed dispersion matrix from summer predator performance data. Each variable in Table 1 is matched with the corresponding eigenvector coefficient. The percentage (%) of variation explained by each eigenvector is given. Underlining has been used to highlight those coefficients that contributed most to PC1 and PC2 defined as those with eigenvector coefficients > 0·2
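Columns 1 to 20 are the eigenvector coefficients, grouped in the order macaroni penguin, gentoo penguin, Antarctic fur seal.

| PC | Eigenvalue | % | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 10·06 | 47·7 | 0·21 | 0·07 | 0·12 | −0·05 | 0·12 | 0·66 | −0·09 | −0·46 | 0·04 | 0·14 | 0·00 | 0·07 | −0·19 | 0·31 | 0·10 | 0·08 | 0·08 | 0·24 | 0·09 | 0·09 |
| 2 | 3·56 | 16·9 | −0·12 | 0·27 | 0·32 | 0·43 | 0·34 | −0·03 | −0·17 | 0·26 | −0·12 | 0·03 | 0·02 | −0·03 | −0·40 | −0·04 | −0·05 | 0·25 | −0·19 | 0·16 | −0·33 | −0·01 |
| 3 | 1·92 | 9·2 | 0·27 | 0·15 | −0·01 | −0·24 | −0·23 | −0·16 | −0·05 | 0·07 | −0·08 | 0·03 | −0·04 | 0·37 | −0·47 | −0·43 | 0·27 | 0·23 | 0·05 | 0·24 | 0·15 | −0·02 |
| 4 | 1·46 | 7·0 | 0·28 | −0·06 | −0·05 | −0·19 | −0·21 | −0·23 | −0·01 | 0·02 | −0·16 | −0·05 | 0·20 | −0·25 | −0·17 | 0·50 | −0·26 | 0·26 | −0·37 | 0·07 | 0·11 | −0·29 |
| 5 | 1·12 | 5·3 | 0·24 | 0·26 | 0·02 | −0·21 | −0·13 | 0·08 | 0·27 | 0·00 | −0·07 | −0·14 | −0·28 | 0·41 | 0·26 | 0·02 | −0·22 | 0·06 | −0·20 | 0·08 | −0·54 | 0·08 |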

In order to reduce the analysis to a smaller number of variables, we used the variables which made the largest contribution to each principal component (those for which the absolute value of the eigenvector coefficient was > 0·2) to recalculate the CSI(2) for PC1, PC2 and PC3. The values for CSI(2) calculated using these important variables are plotted in Fig. 4b. This showed that the main variables contributing to each of the first three principal components all tracked the overall CSI(2). Of the three significant extreme values, PC1 was least congruent with the one that occurred in 1994, but it suggested a large positive anomaly in 1996. The other two principal components did not recognize the significant negative anomaly in 1984. Therefore, it appears that one reason for the division between principal components may be their differing ability to detect large anomalies.
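The selection rule described above can be sketched in a few lines. This is only an illustration: the coefficients are copied from the PC1 and PC2 rows of Table 4 in column order, the function name `important_positions` is hypothetical, and the assumption that column position corresponds to the variable number in Table 1 is ours (it is consistent with the references to variables 1·6, 1·8 and 1·14 in the text, but the recalculation of CSI(2) itself is not shown).

```python
# Eigenvector coefficients for PC1 and PC2, copied from Table 4 in column order.
PC1 = [0.21, 0.07, 0.12, -0.05, 0.12, 0.66, -0.09, -0.46, 0.04, 0.14,
       0.00, 0.07, -0.19, 0.31, 0.10, 0.08, 0.08, 0.24, 0.09, 0.09]
PC2 = [-0.12, 0.27, 0.32, 0.43, 0.34, -0.03, -0.17, 0.26, -0.12, 0.03,
       0.02, -0.03, -0.40, -0.04, -0.05, 0.25, -0.19, 0.16, -0.33, -0.01]


def important_positions(coefficients, threshold=0.2):
    """Return 1-based column positions whose absolute coefficient exceeds threshold."""
    return [i + 1 for i, c in enumerate(coefficients) if abs(c) > threshold]


pc1_positions = important_positions(PC1)
pc2_positions = important_positions(PC2)
# Positions that pass the threshold on both components.
shared = sorted(set(pc1_positions) & set(pc2_positions))

print("PC1:", pc1_positions)
print("PC2:", pc2_positions)
print("Both:", shared)
```

Under the positional assumption, the only column selected for both components is the eighth, which agrees with the statement that variable 1·8 was important in both PC1 and PC2.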

Relationship between the CSI and krill abundance

Eleven independent measurements of krill biomass were available from the summer season and from the vicinity where the predators foraged (Brierley et al. 1999). These were obtained from hydroacoustic surveys made simultaneously with the measurements of predators in the vicinity of Bird Island, although they were made over a shorter time period (4–6 weeks) than the whole summer season (16 weeks). There was a significant positive, non-linear relationship between krill biomass and the CSI(2) for the summer data set from the predators (Fig. 6a). A linear model fitted to the same data was not significant (adjusted r2 = 0·423, P > 0·05). Although both exponential and three-parameter power functions gave significant fits to the data, we found that a three-parameter hyperbolic function provided the best fit to the data. There was no significant linear or non-linear relationship between the CSI(2) from the most important variables (coefficient > 0·2) in the first principal component and krill abundance (Fig. 6b). However, there was a significant non-linear relationship between the most important variables in PC2 and krill abundance (Fig. 6c), whereas a linear regression fitted to these data was not significant (adjusted r2 = 0·353, P > 0·05).

Figure 6.

The relationship between the CSI(2) and krill biomass estimated from acoustic survey in the region around Bird Island (Brierley et al. 1999) (a). The same relationship is illustrated in (b) and (c) for the CSI(2) calculated using the most important variables contributing to the 1st (b) and the 2nd (c) principal components of the covariance matrix for the complete summer data set. Note that, when Brierley et al. (1999) provided a choice, biomass for the region closest to Bird Island was used. The Marquardt–Levenberg method was used to fit the non-linear models to each relationship.

CSI(2) was significantly related to krill abundance for all three species considered individually (Fig. 7). Again, a non-linear asymptotic model (hyperbolic function) gave a better fit to the data than either a linear model or a power function in each case. Of the three species examined, the response of macaroni penguins showed the tightest relationship with krill abundance measured in hydroacoustic surveys.

Figure 7.

The relationship between the CSI(2) and krill biomass estimated from acoustic surveys around Bird Island (Brierley et al. 1999). The CSI was calculated separately for Antarctic fur seals (a), macaroni penguins (b) and gentoo penguins (c). The Marquardt–Levenberg method was used to fit the non-linear models to each relationship.


The advantages of combining variables as a CSI are that it permits extension of the multivariate time-series in cases where there are missing values and it can integrate biological variability across space and time scales, which may be impossible using a single variable. However, this integration can also potentially blur important subgroupings within the data in which, for example, one group of variables has a different functional response to environmental forcing factors from that of others, or where there are multiple forcing factors that act differently upon different groups of variables. The most obvious natural subgrouping of variables was by species, but the analysis showed that the overall pattern of variability in the CSI for summer variables drew roughly equally from the three species. It also showed that there was no particularly strong effect of variables of a specific type because those representing diet composition, demography and behaviour all provided important information. This reflects the high correlation between variables, an effect that was also observed by De la Mare & Constable (2000).

Method for creating an index

All the methods for calculating an index produced an index that was sensitive to missing values and there was relatively little to choose between the methods. However, where there was a low level of correlation between variables, the mean vector method appeared to show greater potential variability in response to missing values than the other two methods. The most important feature of data sets that appeared to alleviate the effects of missing values was the presence of high levels of correlation between variables. This was a characteristic of the empirical data set examined in the present study. Nevertheless, relatively high levels of correlation can bring with them statistical problems, in that the lack of independence implied by the correlations suggests that the CSI is likely to be weighted by the selection of variables to be included.

The CSI(2) index, which uses all the available data in the matrix, would appear to have the greatest utility for calculating indices of environmental variability by combining time-series vectors that have differing durations. This is because it appeared to be generally less sensitive to the presence of missing values in the data set than the CSI(1) method, which is based upon only the complete rows in the data matrix. The ability to perform principal components analysis on the covariance matrix when using this method means that it has greater utility for carrying out multivariate analyses than the mean vector method.
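The difference between using only complete rows and using all available data can be illustrated with a small sketch. It assumes, following the verbal definitions in Appendix I, that the index for a year is the sum of the available standardized values divided by the square root of the variance of that sum, with the variance taken from the correlation matrix S. The function names, the three-variable correlation matrix and the data values are hypothetical, and S is simply supplied rather than estimated from incomplete data.

```python
import math


def complete_rows(matrix):
    """Rows (years) with no missing values, as used for a CSI(1)-style index."""
    return [row for row in matrix if all(v is not None for v in row)]


def csi_for_year(x_row, corr):
    """Index for one year: y'x / sqrt(y'Sy).

    x_row: standardized values for one year, None where a value is missing.
    corr:  n x n correlation matrix S as a list of lists (assumed known here).
    """
    n = len(x_row)
    y = [0.0 if v is None else 1.0 for v in x_row]   # 1 = present, 0 = missing
    x = [0.0 if v is None else v for v in x_row]
    total = sum(yi * xi for yi, xi in zip(y, x))      # y'x
    variance = sum(y[i] * corr[i][j] * y[j]           # y'Sy
                   for i in range(n) for j in range(n))
    if variance <= 0.0:
        raise ValueError("variance of the sum must be positive")
    return total / math.sqrt(variance)


# Hypothetical correlation matrix and two years of standardized data.
S = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
data = [[-1.0, -2.0, -0.5],   # all three variables measured
        [-1.0, None, -0.5]]   # second variable missing

csi_full = csi_for_year(data[0], S)
csi_partial = csi_for_year(data[1], S)
print(csi_full, csi_partial, len(complete_rows(data)))
```

In this toy case the year with a missing value still yields an index, whereas a complete-rows approach would discard it entirely.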

Assessing the relative contribution of different data vectors

The jackknife correlation analysis and the influence function illustrate one-dimensional approaches to examining the contribution made to the CSI by different variables. The lack of congruence between these approaches and the principal components (PC) analysis probably reflects the greater number of dimensions taken into account in the PC analysis. The influence function shows how much individual variables contribute to the CSI but the PC analysis shows that more of the variability in the CSI is contained within a range of variables. The contrast between the results derived from these methods for assessing the importance of different variables could also have resulted from non-linear responses in individual variables. Since fur seal foraging trip duration has a highly non-linear response to variability in environmental conditions (Boyd 1999), it is possible that the influence function may provide a more robust representation of variables that have non-linear responses to environmental variation.

Overall, as suggested by Brooks (1994), the importance of a particular variable may be assessed most satisfactorily by a combination of principal components analysis and influence functions. In the case of the example given in this study, it was important to use both forms of analysis to assess the importance of a variable.

Interpretation of the CSI and principal components

The CSI representation of variability within a biological system is likely to represent the response to the physical and biological forcing factors operating on the measured components of the ecosystem. The relationship between the summer CSI and krill abundance suggests that the overall performance of these predators during the breeding season was, at least in part, driven by food availability. Although an effect of changes in krill abundance on predator performance has been recognized for some time within this group of predators (Croxall et al. 1988; Croxall, Reid & Prince 1999), it has not been quantified previously and this is the first characterization of a functional response between penguins or seals and their food supply. The non-linear form of the functional response is typical of a Holling type II functional response (Gurney & Nisbet 1998). This non-linearity of functional response has the effect of exaggerating the response to low levels of krill abundance which produced the extreme responses observed in 1978, 1984 and 1994 (Fig. 4b).
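The exaggeration of the response at low krill abundance can be seen from the shape of a hyperbolic curve. The sketch below assumes one common three-parameter form, y = y0 + a·x / (b + x); the paper does not give its exact parameterization, and the parameter values and biomass figures used here are hypothetical, not the fitted values.

```python
def hyperbolic_response(krill, y0, a, b):
    """Three-parameter hyperbola (assumed form): rises steeply, then saturates at y0 + a."""
    return y0 + a * krill / (b + krill)


# Hypothetical parameters, chosen only to illustrate the shape of the curve.
Y0, A, B = -3.0, 4.0, 10.0

# The same absolute loss of krill (10 units) at high and at low abundance.
drop_high = hyperbolic_response(100.0, Y0, A, B) - hyperbolic_response(90.0, Y0, A, B)
drop_low = hyperbolic_response(15.0, Y0, A, B) - hyperbolic_response(5.0, Y0, A, B)

print("change in index, 100 -> 90:", round(drop_high, 3))
print("change in index,  15 ->  5:", round(drop_low, 3))
```

With these illustrative values the same reduction in krill produces a far larger fall in the index near the low end of the curve than near the asymptote, which is the behaviour invoked to explain the extreme years.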

The present analysis allows a closer examination of those variables that contribute most to the relationship between predator foraging and reproductive performance and krill abundance. While the important variables within PC1 showed there was no significant relationship between krill abundance and the CSI resulting from using them alone, we found that the important variables in PC2 did have a significant relationship with krill abundance. This suggests that there may be some groups of variables that are more sensitive than others to changes in krill abundance. In the present study the composition of the diet was, not surprisingly, indicative of krill abundance. Those summer variables that were aligned with PC1 and that had positive effects (shown by positive eigenvector coefficients in Table 4), were generally those variables that indicated growth or breeding success in macaroni penguin and Antarctic fur seal offspring. Conversely, those represented within PC2 were generally those related to diet. Therefore, the principal components analysis appeared to have the capability of distinguishing which variables indicate specific processes.

A similar effect may also have been present in the comparison between the summer and the population indices (Fig. 4a). The principal components analysis of the complete data set involving both the summer and population variables did not show any strong distinction between these two groupings of variables, but the apparent trend towards congruence in the two groups of variables may be indicative of an ecological process. While the summer variables are likely to represent responses to the food availability within summer seasons, the population variables, which showed no significant relationship with summer krill abundance, are likely to represent the accumulated effects of long-term change. It is possible that the increased congruence of the two sets of variables during the second half of the time-series represents the effect of short-term variations in the availability of food in summer having longer-term effects upon the populations of the predators and their breeding schedules.


Many of the time-series measured in the present study were first established in order to use upper trophic level predators in pelagic marine food chains to monitor environmental variability (Croxall & Prince 1979; Agnew 1997). Monitoring of this type has been applied in several marine food chains (Cairns 1987; Montevecchi 1993; Ainley, Sydeman & Norton 1995; Monaghan 1996) but coordinated monitoring of top marine predator performance is operational only in the Southern Ocean (Agnew 1997; Constable et al. 2000). This has the objective of providing information about potential effects of fisheries on predators that are largely dependent on krill. This study has demonstrated that it is possible to combine time-series of variables that have a wide range of distributional characteristics and varying degrees of independence to provide a realistic view of ecosystem variability and long-term trends in the status of an ecosystem. The problem of missing data in these types of analyses presents some difficulties but we have shown that this can be tackled by using appropriate sensitivity analyses and by measuring variables that tend to be correlated through time. This also demonstrates the value of ‘redundancy’ within a monitoring programme in order to cushion subsequent analyses against the loss of data in critical variables.

Through this analysis we have shown that the marine ecosystem at South Georgia is subject to periodic fluctuations. Based upon the analyses of the responses of upper trophic-level predators these fluctuations would appear as extremes. However, we have shown that these extremes are likely to be exaggerated because of the non-linear functional response between the predators and their food supply. We have also shown that there is little evidence of a long-term trend in the status of the marine ecosystem at South Georgia. Apparent changes in predator populations are as likely to be indicative of processes other than those driven by ecosystem change.


The data used in this study were collected over many years by staff of the British Antarctic Survey. This involved enormous dedication and we are grateful to all our past and present colleagues for their efforts and for the opportunity to participate in this long-term effort. We thank in particular Dr A. Constable, Prof. J.P. Croxall and K. Reid for their comments and criticisms of early drafts. We also thank Dr W.K. de la Mare and other members of the CCAMLR Statistics Working Group whose discussions were the stimulus for much of the current study.

Received 3 July 2000; revision received 3 April 2001


Appendix I

Calculation of combined standardized indices

Data within each vector i, representing different biological variables measured in each year k, where the time step in this case is 1 year, were transformed to approach normality (Table 1) and standardized to a mean of 0 and a standard deviation of 1. The transformation and standardization were necessary in order to reduce biases that would result from combining different data types (e.g. proportions, frequencies or rates) and also from differences in the relative magnitudes of the values using different units of measurement in each vector. Transformations help to normalize the distributional characteristics of the data. The sum of the parameters in year k is given by:

$$z_k = \mathbf{y}_k^{T}\mathbf{x}_k$$

where $\mathbf{x}_k$ is the vector of observed values from the data matrix X in year k and $\mathbf{y}_k^{T}$ is the transpose of the equivalent vector from Y, in which missing data in X are given a value of 0 in Y and, when data are present in X, the equivalent value in Y is set to unity. All of the n data vectors have length m years. The mean vector is therefore defined by:

$$\bar{z}_k = \frac{\mathbf{y}_k^{T}\mathbf{x}_k}{\mathbf{y}_k^{T}\mathbf{y}_k}, \qquad k = 1, \ldots, m$$

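The sum of each vector is a scalar and the variance of the sum for a positive definite matrix would be:

$$\operatorname{Var}\left(\mathbf{y}_k^{T}\mathbf{x}_k\right) = \mathbf{y}_k^{T}\mathbf{S}\,\mathbf{y}_k$$
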
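where S is the correlation matrix for all the n variables. Note that in this study all of the variances were positive. The combined standardized index (CSI) for each year is therefore:

$$\mathrm{CSI}_k = \frac{\mathbf{y}_k^{T}\mathbf{x}_k}{\sqrt{\mathbf{y}_k^{T}\mathbf{S}\,\mathbf{y}_k}}$$
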
However, all the eigenvalues of S must be positive to satisfy the requirement that the matrix is positive definite (or non-negative for it to be at least positive semidefinite). This condition was rarely satisfied when there were missing values in X. It was therefore possible to develop indices from covariance matrices derived by using only the portions of the data matrix which had no missing values. This computation was used to derive CSI(1). In order to derive CSI(2), which made use of the complete data matrix, we applied a computationally updated version of the algorithm proposed by Huseby, Schwertman & Allen (1980) to compute the mean vector and covariance matrix for incomplete multivariate data. This followed the algorithm outlined by Sparks & Todd (1973) but applied routines in ANSI C from Vetterling et al. (1996). In brief, this procedure involved reduction of the correlation matrix S to tridiagonal form and eigenvalues and eigenvectors were then calculated using the Jacobi procedure (Stewart 1998). The correlation matrix was checked for being at least positive semidefinite (defined by all the eigenvalues being non-negative) and, if it was not, then the Schwertman & Allen (1981) smoothing procedure was applied. In effect, this sets all negative eigenvalues to a small positive value and recalculates the covariance matrix from:

$$\mathbf{S}^{*} = \sum_{i=1}^{n} \lambda_i \boldsymbol{\beta}_i \boldsymbol{\beta}_i^{T}$$

where λi’s were the eigenvalues and βi’s were the associated eigenvectors of S. Using simulated data matrices, comparisons were made between the estimates of CSIs with or without a complete data matrix. The resulting eigenvalues and eigenvectors from the revised covariance matrix S* were used to provide a principal components analysis.
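The smoothing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ANSI C implementation: it uses a plain cyclic Jacobi rotation scheme written in pure Python, the function names and the floor value of 1e-6 are hypothetical, and no rescaling of the smoothed matrix back to a unit diagonal is attempted.

```python
import math


def jacobi_eigen(matrix, max_sweeps=100, tol=1e-22):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q], restricted to |theta| <= pi/4.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):          # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rows p and q: A <- J' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):          # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def smooth_matrix(matrix, floor=1e-6):
    """Replace non-positive eigenvalues by a small positive floor and rebuild the matrix."""
    values, vectors = jacobi_eigen(matrix)
    values = [lam if lam > 0.0 else floor for lam in values]
    n = len(matrix)
    return [[sum(values[k] * vectors[i][k] * vectors[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# A hypothetical "correlation" matrix that is not positive semidefinite.
S = [[1.0, 0.9, -0.9],
     [0.9, 1.0, 0.9],
     [-0.9, 0.9, 1.0]]
eigenvalues_before, _ = jacobi_eigen(S)
S_star = smooth_matrix(S)
eigenvalues_after, _ = jacobi_eigen(S_star)
print(sorted(eigenvalues_before), sorted(eigenvalues_after))
```

The example matrix has one negative eigenvalue, of the kind that can arise when correlations are estimated from different subsets of years; after smoothing, all eigenvalues are positive and the matrix can be used for a principal components analysis.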

Confidence limits on the estimated combined standardized index

We wished to estimate the probability that a particular CSI could occur by chance. This was performed by reconstructing the data matrix using randomization tests (Manly 1991). This involved the random resampling of the individual data vectors. Each of the data vectors which was used to construct the CSI had been transformed to approach normality (Table 1). We used two resampling methods to assess the probability of each CSI occurring by chance. One method resampled with replacement from the residuals of the empirical distribution of values within each data vector and the other resampled from the residuals of a normal distribution. Random selection was carried out within vectors and followed a resampling scheme that took into account potential serial dependence in each time-series (Davison & Hinkley 1997). Therefore, each data vector was reconstructed while still maintaining the distribution of missing values defined by Y. This gave a new data matrix and a recalculation of zt for each iteration. To preserve the autocorrelation structure of zt, we selected only those time-series of zt in which there was a significant (P < 0·05) fit to the autocorrelation structure of the original time-series. Therefore, confidence intervals were based upon 10 000 of these selected simulations with congruent autocorrelation structures.
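A much simplified version of the resampling idea is sketched below. It is an assumption-laden illustration only: it resamples the observed values of each variable directly (not residuals), it uses the mean of the available values in a year as a stand-in for the CSI, and it omits both the handling of serial dependence and the selection of simulations by autocorrelation structure. The function names and the data matrix are hypothetical.

```python
import random


def year_index(row):
    """Mean of the standardized values available in one year (None = missing)."""
    present = [v for v in row if v is not None]
    return sum(present) / len(present)


def resample_matrix(matrix, rng):
    """Resample each variable (column) with replacement from its own observed
    values, leaving missing values (None) in their original positions."""
    n_years, n_vars = len(matrix), len(matrix[0])
    new = [[None] * n_vars for _ in range(n_years)]
    for j in range(n_vars):
        pool = [matrix[i][j] for i in range(n_years) if matrix[i][j] is not None]
        for i in range(n_years):
            if matrix[i][j] is not None:
                new[i][j] = rng.choice(pool)
    return new


def chance_probability(matrix, year, n_iter=2000, seed=1):
    """Proportion of resampled matrices whose index in `year` is at least as low as observed."""
    rng = random.Random(seed)
    observed = year_index(matrix[year])
    hits = sum(year_index(resample_matrix(matrix, rng)[year]) <= observed
               for _ in range(n_iter))
    return hits / n_iter


# Hypothetical standardized data: rows are years, columns are variables.
data = [[0.3, 0.8, None],
        [-2.0, -1.8, -2.2],   # a year in which every variable is at its minimum
        [0.0, None, 0.5],
        [0.8, 0.2, 0.9],
        [-0.5, 0.4, 0.1],
        [1.4, 0.6, 0.7]]

p_extreme = chance_probability(data, year=1)
p_typical = chance_probability(data, year=4)
print(p_extreme, p_typical)
```

Because the pattern of missing values is copied into every resampled matrix, years with few observations are compared only against simulated years with equally few observations.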

Influence function

In order to examine the relative contribution of each data vector to the CSI, an influence function was calculated using a non-parametric estimate of bias attributable to each of the parameters measured. This involved the use of a jackknife procedure (Davison & Hinkley 1997) in which the empirical influence (lk) of a variable on the CSI was approximated by:


where CSIs* was the estimated CSI calculated with xi omitted from the data where i = 1 to N. The jackknife approximation of bias (bi) and its variance (νi, Davison & Hinkley 1997) are given by:
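A minimal sketch of a leave-one-variable-out calculation of this kind is given below. It assumes the standard jackknife forms in Davison & Hinkley (1997): empirical influence l_i = (N − 1)(T − T_{−i}), bias b = −(1/N) Σ l_i, and variance v = (Σ l_i² − N b²) / (N(N − 1)). Whether these match the indexing used for the influence statistic in Table 3 is not certain. The statistic used here, the sum of the values divided by the square root of their number, corresponds to a CSI only under the simplifying assumption of uncorrelated variables, and all names and data values are hypothetical.

```python
import math


def csi_independent(values):
    """Sum of standardized values over the SD of the sum, assuming S is the identity."""
    return sum(values) / math.sqrt(len(values))


def jackknife_influence(values, statistic):
    """Leave-one-out influence values with jackknife estimates of bias and variance."""
    n = len(values)
    full = statistic(values)
    influence = [(n - 1) * (full - statistic(values[:i] + values[i + 1:]))
                 for i in range(n)]
    bias = -sum(influence) / n
    variance = (sum(l * l for l in influence) - n * bias ** 2) / (n * (n - 1))
    return influence, bias, variance


# Hypothetical standardized values of four variables in a single poor year.
year_values = [-1.0, -2.0, -0.5, 0.5]
influence, bias, variance = jackknife_influence(year_values, csi_independent)
print([round(l, 3) for l in influence], round(bias, 3), round(variance, 3))
```

In this toy example the variable with the most negative value has the most negative influence, and the single positive variable is the only one whose removal would make the index lower.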


Simulated data set

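With the observation at time t denoted Yt, data vectors were simulated from the autoregressive function:

$$Y_t - \mu = \alpha_1 (Y_{t-1} - \mu) + \alpha_2 (Y_{t-2} - \mu) + \cdots + \alpha_m (Y_{t-m} - \mu) + W_t$$
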
where α1, α2 to αm are the autoregressive coefficients and Wt was a random term which was independent of the previous values in the series and followed a normal distribution with mean 0 and standard deviation 1. For the purpose of the present study, µ, the mean of the series was also set to 0. Each autoregressive coefficient was chosen randomly from a beta distribution β(2,4). A seed value for each time-series was defined as coming from a normal distribution with mean 0 and standard deviation 1. This resulted in production of relatively uncorrelated time-series (Fig. 1a). Time-series with higher levels of correlation were produced by selecting values of α from an initial set of values chosen randomly from the beta distribution described above. For each vector these were modified by adding random variation with a standard deviation of ± 0·2 (Fig. 1b).
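The simulation can be sketched as below, using only the Python standard library. Several details are assumptions rather than facts from the study: the order of the autoregression (set to 2 here), the use of one normal seed value per lag, the series length of 22 years, and the function names. Coefficients drawn from a beta(2, 4) distribution are not guaranteed to give a stationary series, which this sketch does not check.

```python
import random


def draw_coefficients(order, rng):
    """Autoregressive coefficients drawn from a beta(2, 4) distribution."""
    return [rng.betavariate(2, 4) for _ in range(order)]


def perturb_coefficients(base, rng, sd=0.2):
    """Coefficients for a correlated series: a shared base set plus normal noise (SD 0.2)."""
    return [a + rng.gauss(0.0, sd) for a in base]


def simulate_series(alphas, length, rng, mu=0.0):
    """Simulate Y_t - mu = sum_j alpha_j (Y_{t-j} - mu) + W_t with W_t ~ N(0, 1)."""
    order = len(alphas)
    series = [rng.gauss(0.0, 1.0) for _ in range(order)]   # seed values
    while len(series) < length:
        lagged = sum(a * (series[-j - 1] - mu) for j, a in enumerate(alphas))
        series.append(mu + lagged + rng.gauss(0.0, 1.0))
    return series[:length]


rng = random.Random(42)

# Relatively uncorrelated vectors: independent coefficients for each vector.
uncorrelated = [simulate_series(draw_coefficients(2, rng), 22, rng) for _ in range(5)]

# More highly correlated vectors: every vector shares one base set of coefficients.
base = draw_coefficients(2, rng)
correlated = [simulate_series(perturb_coefficients(base, rng), 22, rng) for _ in range(5)]

print(len(uncorrelated), len(correlated), len(correlated[0]))
```

Missing values can then be imposed on the simulated matrices to examine how each index responds, as in the sensitivity analyses.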