Effects of missing values where variables were relatively uncorrelated
When there was a low level of correlation between variables (Fig. 1a), the correlation between the actual index and the observed index declined as a greater proportion of data values was excluded from the data set (Fig. 2). All the methods used to calculate the index produced similar results, although the CSI(1) method appeared to be less robust to missing values when there were few vectors in the data set (Fig. 2a,iv). In terms of the median correlation coefficient, the CSI(2) method performed as well as or better than the other two methods. The mean vector method was particularly sensitive to missing values when the number of vectors exceeded 15. In addition, the range of the correlation coefficients differed between methods. When there was a small number of vectors in the data matrix, the mean vector method appeared to give the least variable results and the CSI(1) index was the most variable. However, the mean vector method showed greater variability when there was a large number of vectors in the matrix (Fig. 2).
Figure 1. Frequency distributions of the correlation coefficients produced from bivariate correlation between all variables in a simulated data matrix with relatively low levels of correlation (a), relatively high levels of correlation (b) and for the empirical data matrix of variables from three upper trophic level predators measured during the summer breeding season (c).
Figure 2. Plots of the relationship between the median correlation coefficient (a) for correlations between the CSI or mean vector calculated for a data matrix with missing values and for the complete data matrix. Also shown are the 95th percentile (b) and the 5th percentile (c). Examples are shown for cases with different numbers of variables (n). The amount of missing data is represented as the percentage of points missing from the data matrix. These plots are based upon a simulated data matrix in which there was a relatively low level of correlation between the variables as defined in Fig. 1a. The data given in each panel represent results using the mean vector (dotted line), CSI(1) (dashed line) and CSI(2) (solid line) methods of calculating a combined index. There are 22 time steps in the simulated data matrix as in the empirical time-series. The correlation coefficients are for the relationship between each index from the complete data matrix and the index from the data matrix after random deletion of data to simulate the pattern of missing data in the empirical time-series. Each diagram is compiled from 1000 simulations. A grey horizontal bar shows the correlation coefficient (d.f. = 21) above which correlations were considered to be significant at P < 0·05. Wrinkles in the curves represent stochastic variability in the estimation procedure.
Irrespective of which method was used to calculate a combined index, this analysis showed that, when there was a low level of correlation between variables, the index was highly sensitive to the presence of missing values and that this sensitivity changed little with increasing numbers of vectors in the data matrix. The range of correlations between the 5th and 95th percentiles (Fig. 2) shows clearly that a combined index could be poorly correlated with the real combined time-series even when there was a relatively small number of missing values.
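The deletion experiment summarized in Fig. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the mean vector is assumed to be the yearly mean of the standardized variables that were observed, the variables are drawn as independent Gaussian noise, and values are deleted completely at random within each variable rather than following the empirical pattern of missing data. All function names are hypothetical, and the CSI(1) and CSI(2) methods are not reproduced.

```python
import math
import random
import statistics

def standardise(column):
    """Z-score the observed values of one variable; None marks a missing value."""
    present = [v for v in column if v is not None]
    mu, sd = statistics.mean(present), statistics.stdev(present)
    return [None if v is None else (v - mu) / sd for v in column]

def mean_vector(matrix):
    """Yearly mean of the standardised variables observed in that year (assumed definition)."""
    z = [standardise(col) for col in matrix]  # matrix holds one list per variable
    index = []
    for year_values in zip(*z):
        seen = [v for v in year_values if v is not None]
        index.append(statistics.mean(seen) if seen else None)
    return index

def pearson(x, y):
    """Pearson r over the years in which both series are defined."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def delete_at_random(matrix, fraction, rng):
    """Blank the same share of years in every variable, keeping at least two values."""
    n_years = len(matrix[0])
    k = min(round(fraction * n_years), n_years - 2)
    out = []
    for col in matrix:
        gone = set(rng.sample(range(n_years), k))
        out.append([None if t in gone else v for t, v in enumerate(col)])
    return out

def simulate(n_vars, fraction, n_years=22, n_sims=1000, seed=0):
    """Return one correlation per run between the complete and the gappy index."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        matrix = [[rng.gauss(0, 1) for _ in range(n_years)] for _ in range(n_vars)]
        full = mean_vector(matrix)
        gappy = mean_vector(delete_at_random(matrix, fraction, rng))
        results.append(pearson(full, gappy))
    return results

def percentile(values, q):
    """Nearest-rank percentile for q between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

# Median, 5th and 95th percentiles for 20 uncorrelated variables with 40% missing.
rs = simulate(n_vars=20, fraction=0.4, n_sims=200)
print(statistics.median(rs), percentile(rs, 5), percentile(rs, 95))
```

Repeating the call over a range of `fraction` and `n_vars` values gives curves of the kind plotted in Fig. 2, although the exact values will differ from the published ones because of the simplifications listed above.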
Effects of missing values where variables were correlated
When individual vectors within the data matrix tend to be correlated (Fig. 1b), each data vector carries information about other vectors that may have missing values. Therefore, correlations between variables have the potential to mitigate the effect of missing values. We constructed data sets that had a similar degree of internal correlation to the empirical data matrix (Fig. 1b,c) using the method described in Appendix I. This showed that there was reduced sensitivity to missing values as a result of these internal correlations. Comparing Fig. 2a–c,i with Fig. 3a–c shows a marked improvement, in any particular scenario of missing values, in the median correlation coefficient for all three methods of calculating the combined index. However, CSI(1) showed an increased range of variability and the mean vector appeared to have a greatly reduced range of variability, as judged by the changes in the 5th percentile when increasing numbers of points were excluded (Fig. 3c).
Figure 3. Plots as in Fig. 2 for simulated data matrices with 20 variables and over 22 years but where there was a relatively high degree of correlation between the variables as defined in Fig. 1b. The shaded horizontal bar shows the correlation coefficient (d.f. = 21) above which correlations were considered to be significant at P < 0·05.
In the present study, we used a time-series lasting 22 years, and the main analysis using summer variables involved 20 variables. The critical value for a significant correlation at this sample size is approximately 0·4. Using the CSI(2) method of calculating the combined index, and based upon the 5th percentile in Fig. 2c,i, this shows that the analysis will be robust to up to 40–50% of missing values, irrespective of the level of correlation between vectors. The data set of summer variables used in the present study had 39% of its values missing. Therefore, it is unlikely that the CSI(2) from the summer data set will be radically different from the true value had there been no missing values, and this conclusion is strengthened by the relatively high degree of correlation between the variables (Fig. 1c).
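The critical value of about 0·4 can be checked with the standard conversion from a critical Student's t to a critical correlation coefficient, r = t / sqrt(t² + d.f.). The sketch below is an illustration only: the function name is hypothetical, a two-tailed test is assumed, and the t values are taken from standard tables. The figure captions give d.f. = 21, whereas a correlation over 22 years would conventionally have n − 2 = 20 degrees of freedom, so both are shown.

```python
import math

def critical_r(t_crit, df):
    """Smallest |r| that is significant, given the critical t for df degrees of freedom."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-tailed 5% critical values of Student's t from standard tables (assumed test).
T_CRIT = {20: 2.086, 21: 2.080}

for df, t in T_CRIT.items():
    print(df, round(critical_r(t, df), 3))
# prints:
# 20 0.423
# 21 0.413
```

Either reading of the degrees of freedom gives a threshold close to the 0·4 quoted in the text.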
Structure within the matrix of penguin and seal data
CSI(2) was used to examine the structure within the matrix of empirical data. The CSI(2) calculated for two subsets of the data is plotted in Fig. 4a. The subsets were (1) variables representing changes in the population size and demographics of the three species of predators, together with variables that are likely to have been most influenced by feeding conditions in previous years (the population time-series, Table 2b), and (2) variables representing changes in the feeding conditions experienced by the three species and their breeding performance (the summer time-series, Table 2a). Whereas the first set represented indices that are likely to have been influenced by events in the past, the second set represented variables that were likely to be most representative of environmental conditions concurrent with the breeding season in each year. In only one year (1994) was there congruence of extreme values in both sets of data (Fig. 4a). The value of the CSI for the population data declined significantly with time (linear regression, H0: slope = 0, t = 3·16, P < 0·005), whereas the value for the summer data did not decline. Lagged cross-correlations that included positive and negative lags of up to 10 years showed no significant relationships between the summer and population time-series.
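A lagged cross-correlation of the kind described above can be sketched as follows. This is an illustrative implementation with hypothetical function names, assuming two complete series of equal length. It does not reproduce how the study handled missing years or assessed significance.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_cross_correlation(x, y, max_lag=10):
    """Return {lag: r}, where a positive lag pairs x[t] with y[t + lag]."""
    n = len(x)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        if len(xs) >= 3:  # too few overlapping years give a meaningless r
            out[lag] = pearson(xs, ys)
    return out
```

With 22 years and a lag of 10, only 12 pairs of years overlap, so the critical value of r rises with the size of the lag.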
Figure 4. Plots of the CSI(2) for krill predators from Bird Island using the population and summer data sets (a) and for the summer data set alone (b). The thick line in (b) represents the CSI(2) calculated using all the available data. The dashed, thin and dotted lines represent the CSI(2) calculated using only the most important variables (eigenvector coefficient > 0·2) from principal components 1, 2 and 3, respectively. Circles of different size represent different probabilities of occurrence by chance derived from a randomization test.
Randomization testing of the summer time-series provided estimated probabilities associated with the specific values of the CSI(2) index. This showed that the extreme values observed during 1978, 1984 and 1994 had probabilities of occurring by chance that were in the range 0·005–0·01 (Fig. 4b).
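One way such a randomization test could work is sketched below. The scheme is an assumption, since the procedure is not described in this section: each variable is shuffled across years independently, the index is recomputed, and the probability for a given year is the share of shuffles in which the index is at least as extreme as the observed value. The index used here is a simple mean of standardized variables standing in for CSI(2), the matrix is assumed to be complete, and all names are hypothetical.

```python
import random
import statistics

def combined_index(matrix):
    """Stand-in index: yearly mean of z-scored variables (not the paper's CSI(2))."""
    z = []
    for col in matrix:  # matrix holds one list per variable
        mu, sd = statistics.mean(col), statistics.stdev(col)
        z.append([(v - mu) / sd for v in col])
    return [statistics.mean(vals) for vals in zip(*z)]

def randomization_p(matrix, n_perm=2000, seed=0):
    """Per-year probability that a shuffled index is at least as extreme as observed."""
    rng = random.Random(seed)
    observed = combined_index(matrix)
    counts = [0] * len(observed)
    for _ in range(n_perm):
        # Shuffle every variable across years independently (assumed null model).
        shuffled = [rng.sample(col, len(col)) for col in matrix]
        null = combined_index(shuffled)
        for t, obs in enumerate(observed):
            if abs(null[t]) >= abs(obs):
                counts[t] += 1
    # The add-one correction keeps estimated probabilities above zero.
    return [(c + 1) / (n_perm + 1) for c in counts]
```

Years in which many variables depart from their means in the same direction receive small probabilities under this scheme, because shuffling breaks the alignment between variables.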
Considering the summer time-series, 16 of the 20 variables used in the analysis had a significant (P < 0·05) positive correlation with the CSI(2) (Table 3). Only two of the variables had a negative correlation coefficient with the CSI(2). Both were measures of the frequency of fish in the diet of penguins (Table 3), but the correlations were not significant. All of the variables that were not significantly correlated with the CSI(2) were measures of variation in the composition of the diet.
Table 3. Jackknife correlation coefficients (r) and bias approximations (bh) for the relationship between each vector from the matrix of predator performance data from Bird Island and the CSI(2) when calculated without the specified variable in the matrix
|1·1|| 0·697|| 0·006||−0·565||1·353|
|1·3|| 0·815|| 0·004||−0·238||1·165|
|1·4|| 0·854|| 0·003||−0·279||1·154|
|1·5|| 0·711|| 0·021||−0·235||1·169|
|1·6|| 0·473|| 0·026|| 0·030||1·537|
|1·7|| 0·831||< 0·001|| 0·087||1·213|
|1·9|| 0·501|| 0·140||−0·234||1·180|
|1·10|| 0·588|| 0·096||−0·241||1·164|
|1·11|| 0·764|| 0·010||−0·219||1·169|
|1·12|| 0·730||< 0·001|| 0·813||1·451|
|1·13|| 0·750|| 0·032||−0·258||1·150|
|1·14|| 0·841|| 0·010||−0·270||1·140|
|1·15|| 0·744||< 0·001||−0·665||1·329|
|1·16|| 0·912||< 0·001||−0·219||1·199|
|1·17|| 0·736|| 0·002||−0·353||1·133|
|1·18|| 0·824||< 0·001||−0·127||1·251|
|1·19|| 0·702|| 0·002||−0·299||1·200|
|1·20|| 0·728||< 0·001|| 0·143||1·212|
The estimate of the influence of each variable on the CSI(2) suggested that the four most important variables were, in decreasing order, 1·12 (gentoo penguin breeding success), 1·15 (fur seal foraging trip duration), 1·1 (macaroni penguin meal mass) and 1·17 (fur seal pup growth). These covered all three species and included variables that reflect the behaviour, foraging success and breeding success of the species that were monitored. However, in general they also represented those variables with the most complete time-series (Table 2). The influence exerted by variables 1·15, 1·1 and 1·17 was also opposite to that of variable 1·12. Whereas variables 1·15, 1·1 and 1·17 tended to make the index more negative, variable 1·12 tended to make it more positive. This contrast was not due to low correlation between variable 1·12 and the CSI(2) (Table 3).
There appeared to be little congruence between the results of the jackknife correlation and the influence function, except that variables with non-significant (P > 0·05) jackknife correlations also had relatively low absolute values of the influence statistic bh. However, variables with low absolute values of bh did not necessarily have low or non-significant jackknife correlations. There was also little congruence between the influence function and the principal components analysis. For example, fur seal foraging trip duration (variable 1·15, Table 1a) was the second most important vector based upon the influence function (Table 3) but made only a modest contribution to the first two principal components of the covariance matrix.
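The jackknife correlation reported in Table 3 correlates each variable with the index calculated without that variable. A minimal sketch is given below, assuming a complete data matrix and using a simple mean of standardized variables as a stand-in for CSI(2). The function names are hypothetical, and the influence statistic bh is not reproduced because its definition is not given in this section.

```python
import math
import statistics

def zscore(column):
    """Standardise one variable to zero mean and unit standard deviation."""
    mu, sd = statistics.mean(column), statistics.stdev(column)
    return [(v - mu) / sd for v in column]

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def jackknife_correlations(matrix):
    """Correlate each variable with the index built from all the other variables."""
    z = [zscore(col) for col in matrix]  # matrix holds one list per variable
    result = []
    for j, col in enumerate(z):
        others = [c for k, c in enumerate(z) if k != j]
        # Stand-in index: yearly mean of the remaining standardised variables.
        index = [statistics.mean(vals) for vals in zip(*others)]
        result.append(pearson(col, index))
    return result
```

Leaving the variable out before correlating avoids the inflation that would arise if the variable were correlated with an index that already contains it.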
Fig. 5 shows the CSI calculated separately for each species and demonstrates the reason why variables from all species had a high level of influence on the overall CSI. Overall, there was a high degree of congruence in the pattern of variability among species, especially in years of extreme values. Only in 1984 did there appear to be a large divergence in the response of one species (gentoo penguin) from the other two species.
Figure 5. The combined standardized index, CSI(2), plotted against year for the three species examined in this analysis. The time-series of data from Antarctic fur seals did not begin until 1983.
The first principal component of the smoothed covariance matrix for the summer data explained 48% of variation in the data and the first five principal components explained 86% of the total variation (Table 4). Based upon those eigenvector coefficients with absolute values > 0·2, which are underlined in Table 4, all the species made important contributions to both of the first two principal components. Comparing PC1 and PC2, variables 1·6 and 1·14 (macaroni breeding success and frequency of krill in the diet of fur seals) both made important contributions to PC1 but had almost no influence on PC2. Variable 1·8 (frequency of fish in the diet of gentoo penguins) was important in both PC1 and PC2. Variables 1·2 and 1·4 (frequency of fish and krill, respectively, in the diet of macaroni penguins) were particularly characteristic of PC2 when compared with PC1 (Table 4). Therefore, it appears that the main variables separating PC1 and PC2 involve contrasts in the diet of the three species. This may reflect the idea that, in this group of predators, fish replaces krill in the diet when krill are scarce (Everson et al. 1999).
Table 4. The first five principal components (PC) of the smoothed dispersion matrix from summer predator performance data. Each variable in Table 1 is matched with the corresponding eigenvector coefficient. The percentage (%) of variation explained by each eigenvector is given. Underlining has been used to highlight those coefficients that contributed most to PC1 and PC2 defined as those with eigenvector coefficients > 0·2
|PC||Eigen- value||%||Eigenvector coefficients|
|Macaroni penguin||Gentoo penguin||Antarctic fur seal|
|1||10·06||47·7|| 0·21|| 0·07|| 0·12||−0·05|| 0·12 || 0·66||−0·09||−0·46|| 0·04|| 0·14|| 0·00|| 0·07||−0·19|| 0·31|| 0·10||0·08|| 0·08||0·24|| 0·09|| 0·09|
|2|| 3·56||16·9||−0·12|| 0·27|| 0·32|| 0·43|| 0·34||−0·03||−0·17|| 0·26||−0·12|| 0·03|| 0·02||−0·03||−0·40||−0·04||−0·05||0·25||−0·19||0·16||−0·33||−0·01|
|3|| 1·92|| 9·2|| 0·27|| 0·15||−0·01||−0·24||−0·23||−0·16||−0·05|| 0·07||−0·08|| 0·03||−0·04|| 0·37||−0·47||−0·43|| 0·27||0·23|| 0·05||0·24|| 0·15||−0·02|
|4|| 1·46|| 7·0|| 0·28||−0·06||−0·05||−0·19||−0·21||−0·23||−0·01|| 0·02||−0·16||−0·05|| 0·20||−0·25||−0·17|| 0·50||−0·26||0·26||−0·37||0·07|| 0·11||−0·29|
|5|| 1·12|| 5·3|| 0·24|| 0·26|| 0·02||−0·21||−0·13 || 0·08|| 0·27|| 0·00||−0·07||−0·14||−0·28|| 0·41|| 0·26|| 0·02||−0·22||0·06||−0·20||0·08||−0·54|| 0·08|
In order to reduce the analysis to a smaller number of variables, we used the variables that made the largest contribution to each principal component (those with an absolute eigenvector coefficient > 0·2) to recalculate the CSI(2) for PC1, PC2 and PC3. The values for CSI(2) calculated using these important variables are plotted in Fig. 4b. This showed that the main variables contributing to the first three principal components all tracked the overall CSI(2). Of the three significant extreme values, PC1 was least congruent with the one that occurred in 1994, but it suggested a large positive anomaly in 1996. The other two principal components did not recognize the significant negative anomaly in 1984. Therefore, it appears that one reason for the division between principal components may be their differing ability to detect large anomalies.
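The selection rule can be illustrated with the PC1 and PC2 coefficients from Table 4. The sketch assumes that the k-th coefficient in each row corresponds to variable 1·k, which is consistent with the variables named in the text but is not stated explicitly in the table as reproduced here. The function name is hypothetical.

```python
# Eigenvector coefficients for PC1 and PC2, transcribed from Table 4.
PC1 = [0.21, 0.07, 0.12, -0.05, 0.12, 0.66, -0.09, -0.46, 0.04, 0.14,
       0.00, 0.07, -0.19, 0.31, 0.10, 0.08, 0.08, 0.24, 0.09, 0.09]
PC2 = [-0.12, 0.27, 0.32, 0.43, 0.34, -0.03, -0.17, 0.26, -0.12, 0.03,
       0.02, -0.03, -0.40, -0.04, -0.05, 0.25, -0.19, 0.16, -0.33, -0.01]

def important_variables(coefficients, threshold=0.2):
    """Return 1-based positions whose absolute coefficient exceeds the threshold."""
    return [i + 1 for i, c in enumerate(coefficients) if abs(c) > threshold]

print(important_variables(PC1))  # [1, 6, 8, 14, 18]
print(important_variables(PC2))  # [2, 3, 4, 5, 8, 13, 16, 19]
```

Under that assumed mapping, variables 1·6 and 1·14 are selected only for PC1, variable 1·8 for both components, and variables 1·2 and 1·4 only for PC2. Because the table reports coefficients to two decimal places, a value printed as 0·20 cannot be classified with certainty.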
Relationship between the CSI and krill abundance
Eleven independent measurements of krill biomass were available from the summer season and from the vicinity where the predators foraged (Brierley et al. 1999). These were obtained from hydroacoustic surveys made simultaneously with the measurements of predators in the vicinity of Bird Island, although they were made over a shorter time period (4–6 weeks) than the whole summer season (16 weeks). There was a significant positive, non-linear relationship between krill biomass and the CSI(2) for the summer data set from the predators (Fig. 6a). A linear model fitted to the same data was not significant (adjusted r² = 0·423, P > 0·05). Although both exponential and three-parameter power functions gave significant fits to the data, we found that a three-parameter hyperbolic function provided the best fit to the data. There was no significant linear or non-linear relationship between the CSI(2) from the most important variables (coefficient > 0·2) in the first principal component and krill abundance (Fig. 6b). However, there was a significant non-linear relationship between the most important variables in PC2 and krill abundance (Fig. 6c), whereas a linear regression fitted to these data was not significant (adjusted r² = 0·353, P > 0·05).
Figure 6. The relationship between the CSI(2) and krill biomass estimated from acoustic survey in the region around Bird Island (Brierley et al. 1999) (a). The same relationship is illustrated in (b) and (c) for the CSI(2) calculated using the most important variables contributing to the 1st (b) and the 2nd (c) principal components of the covariance matrix for the complete summer data set. Note that, when Brierley et al. (1999) provided a choice, biomass for the region closest to Bird Island was used. The Marquardt–Levenberg method was used to fit the non-linear models to each relationship.
CSI(2) was significantly related to krill abundance for all three species considered individually (Fig. 7). Again, a non-linear asymptotic model (hyperbolic function) gave a better fit to the data than either a linear model or a power function in each case. Of the three species examined, the response of macaroni penguins showed the tightest relationship with krill abundance measured in hydroacoustic surveys.
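A three-parameter hyperbolic fit of this kind can be sketched as follows. The functional form y = y0 + a·x / (b + x) is an assumption, since the equation is not given in this section. In place of the Marquardt–Levenberg method named in the figure captions, the sketch scans a grid of candidate values of b and solves for y0 and a by ordinary least squares at each one, which is possible because the model is linear in y0 and a once b is fixed. The function name and the data are hypothetical. The numbers are synthetic and are not the krill biomass estimates.

```python
def fit_hyperbola(x, y, b_grid):
    """Fit y = y0 + a * x / (b + x) by a grid search on b with closed-form y0 and a."""
    n = len(x)
    best = None
    for b in b_grid:
        u = [xi / (b + xi) for xi in x]  # the model is linear in u for a fixed b
        u_bar, y_bar = sum(u) / n, sum(y) / n
        suu = sum((ui - u_bar) ** 2 for ui in u)
        suy = sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
        a = suy / suu
        y0 = y_bar - a * u_bar
        sse = sum((yi - (y0 + a * ui)) ** 2 for ui, yi in zip(u, y))
        if best is None or sse < best["sse"]:
            best = {"y0": y0, "a": a, "b": b, "sse": sse}
    return best

# Synthetic, noise-free example generated with y0 = -2, a = 3 and b = 10.
x = [2, 5, 10, 20, 40, 60, 80, 100, 120, 150, 200]
y = [-2 + 3 * xi / (10 + xi) for xi in x]
fit = fit_hyperbola(x, y, b_grid=[0.5 * k for k in range(1, 201)])
print(fit["y0"], fit["a"], fit["b"])
```

The fitted curve rises steeply at low x and approaches the asymptote y0 + a at high x, which is the asymptotic shape described in the text. A gradient-based optimizer such as Marquardt–Levenberg would refine b continuously rather than on a grid.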
Figure 7. The relationship between the CSI(2) and krill biomass estimated from acoustic surveys around Bird Island (Brierley et al. 1999). The CSI(2) was calculated separately for Antarctic fur seals (a), macaroni penguins (b) and gentoo penguins (c). The Marquardt–Levenberg method was used to fit the non-linear models to each relationship.