Predicting fish species distributions
While we had detailed fish catch data for 96 commonly caught, demersal (bottom-dwelling) fish species from more than 21,000 research trawls (Figure 1A), the patchy spatial distribution of these trawls required a robust interpolation procedure to provide geographically comprehensive descriptions of fish distributions. We achieved this using Boosted Regression Trees (BRT), a recently developed technique that uses stochastic gradient boosting to fit a model (Friedman et al. 2000; Friedman 2002), enabling sophisticated regression analyses of complex responses optimized for high predictive performance (Elith et al. 2006; Elith et al. 2008). This method differs from conventional regression in that, rather than fitting a single "best" model, it fits an ensemble of simple regression tree models. As a consequence, BRT combines the strengths of regression trees (their ability to handle continuous and categorical predictors, to ignore extraneous predictors, to accommodate missing values in the predictors, and to fit interactions between predictors) with boosting, the adaptive fitting of multiple models, which overcomes their tendency to instability and lack of accuracy (Friedman & Meulman 2003).
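The core idea of stochastic gradient boosting can be illustrated with a minimal sketch (in Python rather than the R implementation used here): each simple tree, reduced below to a single-split stump, is fitted to the residuals of the ensemble so far on a random subsample, then added with a small learning rate. Function names and the toy data are illustrative only.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree: find the threshold on a single
    predictor that minimizes the squared error of the two side means."""
    best = None
    for s in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, s, ml, mr)
    _, s, ml, mr = best
    return lambda x: ml if x <= s else mr

def boost(xs, ys, n_trees=200, lr=0.05, bag=0.7, seed=0):
    """Stochastic gradient boosting for least squares: each stump is fitted
    to the current residuals on a random subsample (the 'stochastic' part),
    then added to the ensemble with shrinkage lr."""
    rng = random.Random(seed)
    trees = []
    pred = [0.0] * len(xs)
    for _ in range(n_trees):
        resid = [y - p for y, p in zip(ys, pred)]
        idx = rng.sample(range(len(xs)), max(2, int(bag * len(xs))))
        stump = fit_stump([xs[i] for i in idx], [resid[i] for i in idx])
        trees.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * t(x) for t in trees)
```

On a toy step-shaped response (e.g., absence at shallow values, presence at deep ones), the ensemble converges to the step even though every individual stump is crude; this additive accumulation of weak trees is what the "gbm" library does at scale.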
Figure 1. Geographic distribution of a typical species (Mora moro). Shown are: A. The actual presences and absences in the research trawls—note the uneven geographic distribution of these. B. The predicted catch per unit effort. In A, dashed lines indicate the 200 and 1950 m depth contours, which define the limits within which analyses were performed.
To maximize the predictive performance of our BRT models, we chose environmental predictors that were functionally relevant to fish (Table S1; Leathwick et al. 2006). They included estimates of trawl depth, temperature and salinity at the sea floor, primary productivity at the ocean surface, and zones of ocean mixing and tidal currents. Estimates of sea-floor water temperature and salinity were based on the World Ocean Atlas (Boyer et al. 2005). Estimates of suspended particulate matter, dissolved organic matter, and chlorophyll-a concentration were all derived from satellite imagery (Pinkerton & Richardson 2005). Data describing trawl distance, speed, and mesh size were included to allow standardization of catch success by taking account of differences in trawl parameters. While it would have been desirable to calculate the area swept by each trawl, the lack of consistently collected data describing net door spread and headline height precluded this.
Because of the large number of trawls, we randomly split them into two sets: the first contained 17,000 trawls and was used for model fitting, while the second (4,314 trawls) was used solely for validation. Species catch data described the weight in kilograms of each species caught in 1% or more of trawls. Given the highly skewed (zero-inflated) distribution of the catch data, we fitted two BRT models for each species and combined them using a delta-log-normal approach (Venables & Dichmont 2004). The first model for each species predicted its probability of catch using presence/absence-transformed data from all trawls in the training dataset, assuming binomial errors. The second was fitted only to data from those trawls in which a catch of the species occurred, and predicted the log of the catch, assuming normally distributed errors.
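The delta-log-normal combination can be illustrated with a minimal Python sketch. In place of the two BRT models used in the analysis, it uses the simplest possible estimator of each component, the observed catch frequency and the log-normal mean of the positive catches; `delta_lognormal_mean` is a hypothetical name, and at least two positive catches are assumed.

```python
import math

def delta_lognormal_mean(catches):
    """Two-part (delta-log-normal) estimate of expected catch from
    zero-inflated data: P(catch > 0) multiplied by the log-normal mean
    of the positive catches, exp(mu + sigma^2 / 2)."""
    positives = [c for c in catches if c > 0]
    p = len(positives) / len(catches)        # presence/absence component
    logs = [math.log(c) for c in positives]  # positive-catch component
    mu = sum(logs) / len(logs)
    var = sum((l - mu) ** 2 for l in logs) / (len(logs) - 1)
    return p * math.exp(mu + var / 2)        # combined expectation (kg)
```

In the actual analysis each component is a fitted BRT surface rather than a single constant, but the two parts are combined in exactly this multiplicative way.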
All regressions were fitted in R (R Development Core Team 2006) with the "gbm" library (Ridgeway 2006), using a tenfold cross-validation procedure to optimize model complexity for prediction (Elith et al. in press). The predictive performance of the final regression models was evaluated in two ways. First, estimates of predictive performance were calculated as part of the cross-validation procedure used to optimize model complexity. Second, we independently estimated the performance of all models by predicting, for the evaluation data set, both the probability of occurrence of each species (all trawls, n = 4,314) and its catch (trawls in which each species was caught). These predictions were then compared with the actual values using the area under the Receiver Operating Characteristic curve (AUC) statistic (Fielding & Bell 1997) for the predictions of occurrence, and Pearson's correlation coefficient for the catch estimates.
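Both evaluation statistics are standard and can be computed directly; the sketch below (Python, with illustrative function names) shows the rank-based identity behind the AUC, the probability that a randomly chosen presence scores higher than a randomly chosen absence, alongside the usual Pearson formula.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney identity:
    the fraction of (presence, absence) pairs ranked correctly,
    counting ties as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pearson(x, y):
    """Pearson's correlation coefficient between observed and
    predicted catch values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```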
Finally, the presence/absence and catch models were used to make environment-based predictions of catch per unit effort for each species across 1.59 million grid cells, each of 1 km2 (Figure 1B). In making these predictions, we assumed fixed trawl parameters: a trawl distance of 4.26 km, a speed of 5.92 km/h, and a codend mesh size of 75 mm. These predictions covered all of New Zealand's Exclusive Economic Zone with depths between 200 and 1950 m, including the 1.57 million grid cells for which no trawl data were directly available. Separate predictions were made of the probability and the amount of catch for each species, with the latter back-transformed with a bias correction (Duan 1983) so that final values were in kilograms. Probability and catch predictions were then multiplied together to form one predictive data layer (kg/standard trawl) for each species.
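This combination step can be sketched as follows (Python; function names are illustrative). Duan's smearing estimator corrects the bias introduced by back-transforming a log-scale prediction, multiplying by the mean of the exponentiated log-scale residuals from the training data.

```python
import math

def smearing_factor(log_obs, log_pred):
    """Duan's (1983) nonparametric smearing correction: the mean of the
    exponentiated residuals from the log-scale catch regression."""
    return sum(math.exp(o - p) for o, p in zip(log_obs, log_pred)) / len(log_obs)

def cpue(prob, log_catch_pred, smear):
    """Expected catch per standard trawl (kg): probability of capture times
    the bias-corrected, back-transformed catch prediction."""
    return prob * math.exp(log_catch_pred) * smear
```

With zero residuals the smearing factor is exactly 1 (no correction); skewed residuals push it above 1, inflating the back-transformed catch to an unbiased expectation.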
MPA design and evaluation
In the second phase of our analysis, we used the reserve selection software Zonation (Moilanen et al. 2005, Appendix S1 in Supplementary Material) to design and evaluate a range of potential MPA configurations, based on the predicted fish distribution data layers created in the first phase of our analysis. Zonation is based on the specification of priorities and connectivity responses for biodiversity features (Moilanen 2007) rather than on setting conservation targets as for most other conservation planning methods (Sarkar et al. 2006). It is particularly suited to the analysis of very large data sets (Kremen et al. in press) and provides solutions that have both high conservation value and are well balanced with respect to representation levels, connectivity, and spatial patterns for species (Moilanen 2007).
The Zonation meta-algorithm (Moilanen et al. 2005; Moilanen 2007) starts by assuming that the full landscape is protected, and proceeds by progressively identifying and removing the cells that cause the smallest marginal loss in conservation value. Because the grid cells of least conservation value are removed first, the areas of highest value remain until last, and these areas are the most relevant for conservation. The critical part of the algorithm is the definition of marginal loss, which also allows species weightings and species-specific connectivity considerations to be applied. Here, we used the core-area definition of marginal loss (Moilanen et al. 2005; Moilanen 2007), which in simple terms embodies the following principles: (1) of two otherwise equal locations, that with the lower occurrence of the most important species is removed first; (2) of two locations with equal occurrence levels, that occupied by a lower-weight species is removed before that occupied by a high-priority species; (3) of two identical locations with identical original occurrence levels for two different species, the one whose species has lost more of its distribution is retained longer; (4) of two otherwise identical locations, that with the higher cost is removed first. Mathematically, marginal loss in core-area Zonation is defined as
$$\delta_i = \max_j \frac{w_j \, Q_{ij}(S)}{C_i} \qquad (1)$$

where $w_j$ is the weight of species $j$ and $C_i$ is the cost of adding cell $i$ to the reserve network. The critical part of equation (1) is $Q_{ij}(S) = p_{ij} / \sum_{k \in S} p_{kj}$, the proportion of the remaining distribution of species $j$ located in cell $i$, where $p_{ij}$ is the occurrence level of species $j$ in cell $i$ and $S$ is the set of remaining cells. When part of the distribution of a species is removed by cell removal, the proportion located in each remaining cell goes up. In this manner, Zonation tries to retain high-quality core areas for all species until the end of cell removal, even if a species is initially widespread and common (Moilanen et al. 2005). Other variants of Zonation cell removal implement conservation planning based on additive value (Arponen et al. 2005) and the specification of targets (Moilanen 2007), but we used core-area Zonation because it guarantees the retention of high-quality areas for all species, including those that occur in otherwise species-poor areas.
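A toy version of core-area cell removal follows directly from this definition of marginal loss. The sketch below (Python; the real Zonation software is far more elaborate and efficient, and this ignores weights other than those supplied, boundary quality penalties, and connectivity) simply removes the cell with the smallest marginal loss at each step.

```python
def zonation_core_area(occ, weights, cost):
    """Greedy core-area removal sketch. occ[i][j] is the occurrence of
    species j in cell i; marginal loss of cell i is
    max_j [ w_j * p_ij / (sum over remaining cells k of p_kj) ] / c_i.
    Returns cells in removal order; the last entries are the
    highest-priority core areas."""
    remaining = set(range(len(occ)))
    totals = [sum(occ[i][j] for i in remaining) for j in range(len(weights))]
    order = []
    while remaining:
        def loss(i):
            # Q_ij(S) = occ[i][j] / totals[j] over the remaining set S
            return max(w * occ[i][j] / t / cost[i]
                       for j, (w, t) in enumerate(zip(weights, totals)) if t > 0)
        worst = min(remaining, key=loss)  # smallest marginal loss goes first
        remaining.remove(worst)
        for j in range(len(weights)):
            totals[j] -= occ[worst][j]
        order.append(worst)
    return order
```

Note how a cell holding a large share of any one species' remaining distribution acquires a large marginal loss and therefore survives removal, which is exactly the core-area behavior described above.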
All Zonation analyses used the predicted distributions of the 96 fish species as their primary input. Nineteen endemic species were given higher priority in all analyses by allocating them a weight of five, while all other species were given a weight of one. An analysis of the sensitivity of outcomes to the use of differential species weightings is provided in Appendix S1. To take account of the likely impacts of fragmentation on the species protection provided by MPAs, we applied boundary quality penalties, which allow the value of a target cell for a particular species to be reduced as cells in some surrounding neighborhood are removed (Moilanen & Wintle 2007). Losses were assessed in neighborhoods of varying size and at varying rates (Figure 2), depending on the known habits of species. For species living predominantly on the sea floor, mostly flatfish and eels, we used a 3 by 3 cell neighborhood and a relatively slow loss of value (curve 1 in Figure 2): 50% of the surrounding cells can be removed without loss of value in the target cell, but beyond this, removal of surrounding cells results in a linear decline to a value of 0.2 when all surrounding cells are removed. For species living immediately above the sea floor but caught largely as solitary individuals, we used a 5 by 5 cell neighborhood and a slightly steeper loss curve (curve 2 in Figure 2). For species living above the sea floor but caught frequently in schooling aggregations, we used a 7 by 7 cell neighborhood and a loss curve declining linearly to zero when all neighboring cells are removed (curve 3 in Figure 2). Finally, for the most mobile, semi-pelagic, and schooling species, we used a 9 by 9 cell neighborhood and a loss curve in which the value of a target cell diminishes to 20% when 50% of the surrounding cells are removed, and to zero when all surrounding cells are removed (curve 4 in Figure 2).
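Because all four loss curves are piecewise linear, they can be represented with a small interpolation helper; the sketch below (Python, with hypothetical names) encodes curve 1 as described above.

```python
def bqp_curve(breakpoints):
    """Piecewise-linear boundary quality penalty curve: maps the fraction
    of neighborhood cells removed to the remaining value of the focal
    cell. breakpoints are (fraction_removed, remaining_value) pairs."""
    def value(frac):
        for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
            if x0 <= frac <= x1:
                return y0 + (y1 - y0) * (frac - x0) / (x1 - x0)
        return breakpoints[-1][1]
    return value

# Curve 1 (sea-floor species, 3x3 neighborhood): no penalty until half the
# neighbors are gone, then a linear decline to 0.2 with all neighbors removed.
curve1 = bqp_curve([(0.0, 1.0), (0.5, 1.0), (1.0, 0.2)])
```

Curves 2 through 4 differ only in their breakpoints (and in the neighborhood size over which the removed fraction is measured), so the same helper covers all four species groups.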
A sensitivity analysis demonstrating the effects of these settings is provided in Appendix S1. One major algorithmic option in the software, uncertainty analysis (Moilanen et al. 2006), was not used here, but provides the capability to adjust the optimization process where greater uncertainty is associated with species predictions for some regions, for example, in the far north-east of our study area where there are only a few trawls.
Figure 2. Species-group responses describing the decline in conservation value of focal cells as cells in their neighborhood are removed from the solution.
With these commonalities, we carried out four analyses that explored conservation benefits and their costs under varying conditions:
Unconstrained or “no cost constraint” analysis.
Equal costs were used for all cells, that is, this analysis was driven solely by consideration of species distributions, connectivity and conservation value with no regard to potential costs to fishers.
“Full cost constraint” analysis.
Species weightings and boundary quality penalties were applied as in the previous analysis, but costs for grid cells varied depending on fishing intensity as recorded by fishers during the 2005 calendar year. The fishing intensity or “cost” data layer (Figure S1) was created by applying a kernel smoother with a 20 km smoothing neighborhood to the start locations of a completely independent set of 47,700 commercial trawls conducted during 2005. The resulting spatial data layer was then scaled to describe relative fishing intensity, with values ranging from zero for no fishing to 100 for maximum fishing intensity. Because Zonation requires all cost estimates to be greater than zero, we allocated nonfished grid cells a nominal value of 1.0e-6 for this analysis.
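A toy version of this cost-layer construction might look as follows (Python; a Gaussian kernel on a small grid stands in for the actual 20 km kernel smoother over 1 km2 cells, and all names are illustrative).

```python
import math

def fishing_cost_layer(grid_shape, trawl_starts, bandwidth=20.0, floor=1e-6):
    """Sketch of the fishing-intensity cost layer: smooth trawl start
    points over the grid with a Gaussian kernel, rescale to 0-100, and
    give never-fished cells a tiny positive cost, since Zonation
    requires all costs to be greater than zero."""
    rows, cols = grid_shape
    raw = [[0.0] * cols for _ in range(rows)]
    for (tr, tc) in trawl_starts:
        for r in range(rows):
            for c in range(cols):
                d2 = (r - tr) ** 2 + (c - tc) ** 2
                raw[r][c] += math.exp(-d2 / (2 * bandwidth ** 2))
    peak = max(max(row) for row in raw)
    return [[max(100.0 * v / peak, floor) for v in row] for row in raw]
```

Rescaling to a fixed 0-100 range makes the cost layer comparable across analyses, while the small floor value keeps unfished cells removable without making them free.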
“Modified cost constraint” analyses.
Three modified cost analyses were run, each of which used a different modification of the cost layer to alter the balance between costs for fished and nonfished cells, allowing a more comprehensive exploration of scenarios intermediate between our "no cost constraint" and "full cost constraint" scenarios. The modified cost estimates were calculated as:
where a is a parameter used to tune the influence of cost, and Ci is the true fishing opportunity cost estimate for the cell. Three analyses were carried out using modified cost layers with a set to 1, 2, and 5; higher values decrease the importance given to protecting sites preferred by fishers when selecting optimal sets of sites for protection.
In a final analysis, we used Zonation to assess the costs and benefits of a set of benthic protection areas (BPAs) recently implemented at the request of fishers, which provide partial protection to benthic species through the exclusion of bottom trawling (Ministry of Fisheries 2007). These areas were selected by fishers to protect a representative range of ecosystems, based on a broad-scale environmental classification of New Zealand's marine environments, along with a number of particular high-value sites. They avoided areas fished either currently or in the past (Seafood Industry Council 2008). Although the BPAs encompass 23.5% of New Zealand's EEZ, substantial parts of them (72.2%) are located in waters that are too deep to trawl with current technologies (> circa 2000 m). We therefore restricted our analysis to those parts of the BPAs that lie in offshore waters of trawlable depths (200–1950 m), where they comprise 16.6% of the geographic area. In this replacement cost analysis (Cabeza & Moilanen 2006), we required Zonation to retain the cells falling within the BPAs until all other cells had been removed, enabling assessment of their conservation returns using the same criteria as in our other analyses.
In summarizing the conservation outcomes for each of the scenarios described above, we mostly use as a measure of performance the average percentage of species distributions protected in a given fraction of geographic area. We stress that this is an aggregate measure of performance that summarizes statistics describing the quality, extent, and spatial distributions of individual species (Moilanen 2007). Similarly, we also report the costs as a function of the fraction of the geographic area protected for each scenario: for the “full cost constraint” scenario, these costs were calculated as an integral part of the analysis, while costs for the remaining scenarios were calculated retrospectively using the fishing cost layer described above.
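This headline performance measure can be sketched as a short function (Python, with illustrative names): given a Zonation removal order, it reports the average percentage of species distributions falling inside the best fraction of cells.

```python
def mean_protection(occ, removal_order, fraction):
    """Average percentage of species distributions retained when the
    last-removed `fraction` of cells (the highest-priority areas) is
    protected. occ[i][j] is the occurrence of species j in cell i."""
    n_keep = round(fraction * len(removal_order))
    kept = set(removal_order[len(removal_order) - n_keep:])
    n_species = len(occ[0])
    pct = []
    for j in range(n_species):
        total = sum(occ[i][j] for i in range(len(occ)))
        inside = sum(occ[i][j] for i in kept)
        pct.append(100.0 * inside / total)
    return sum(pct) / n_species
```

Averaging across species (here unweighted, for simplicity) yields a single curve of protection against area protected, which is the form in which the scenario comparisons are reported.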