Dynamic species distribution models from categorical survey data

Summary

  1. Species distribution models are static models for the distribution of a species, based on Hutchinson's niche concept. They make probabilistic predictions about the distribution of a species, but do not have a temporal interpretation. In contrast, density-structured models based on categorical abundance data make it possible to incorporate population dynamics into species distribution modelling.
  2. Using dynamic species distribution models, temporal aspects of a species' distribution can be investigated, including the predictability of future abundance categories and the expected persistence times of local populations, and how these may respond to environmental or anthropogenic drivers.
  3. We built density-structured models for two intertidal marine invertebrates, the Lusitanian trochid gastropods Phorcus lineatus and Gibbula umbilicalis, based on 9 years of field data from around the United Kingdom. Abundances were recorded on a categorical scale, and stochastic models for year-to-year changes in abundance category were constructed with winter mean sea surface temperature (SST) and wave fetch (a measure of the exposure of a shore) as explanatory variables.
  4. Both species were more likely to be present at sites with high SST, but differed in their responses to wave fetch. Phorcus lineatus had more predictable future abundance and longer expected persistence times than G. umbilicalis. This is consistent with the longer lifespan of P. lineatus.
  5. Where data from multiple time points are available, dynamic species distribution models of the kind described here have many applications in population and conservation biology. These include allowing for changes over time when combining historical and contemporary data, and predicting how climate change might alter future abundance conditional on current distributions.

Introduction

Species distribution models are static, probabilistic models of the relationship between the distribution or abundance of species and environmental variables (Guisan & Zimmermann 2000). Often, they predict the probability of presence of a species from presence/absence data using logistic regression. Such models rely on the Hutchinson niche concept (Guisan & Thuiller 2005), in which niche space is defined as a real space whose axes are environmental variables. The Hutchinson fundamental niche is the set of points in this niche space at which the species is able to persist indefinitely, and is bounded by the set of points on which the growth rate is zero (Hutchinson 1957). Any point in physical space maps to a point in niche space (Colwell & Rangel 2009), although a point in niche space may correspond to one, more than one, or no point in physical space. If the boundary of the Hutchinson niche is determined by experimental or observational measurements of population growth rate (e.g. Birch 1953; Doak & Morris 2010), then the set of points in physical space that map to points inside the Hutchinson niche can be determined, and may represent a likely distribution for the species.

More often, presence/absence and environmental data are recorded from points in physical space, and used to build a species distribution model which predicts the probability of presence at points in niche space. It is generally assumed that points at which a species occurs are contained in a subset of the Hutchinson fundamental niche known as the realized niche (Guisan & Thuiller 2005), with caveats discussed below. The resulting predictions can then be mapped back to points in physical space. The output of such a model is probabilistic for several reasons. First, a species may be absent from suitable sites due to chance events or dispersal limitation. Second, a typical model is based on the projection of niche space onto a relatively small number of environmental variables, so that an apparently suitable site may have an unsuitable value of an unmeasured variable. Third, a species may be present in unsuitable sites as a result of dispersal from nearby suitable sites.

The output of a species distribution model is static, in the sense that it has no explicit temporal interpretation. A predicted probability of presence can be interpreted as the probability that a species will be present at a randomly-chosen site having specified environmental conditions, but not as the expected proportion of time for which the species will be found at a single site. In addition, such models cannot make conditional predictions, such as the probability that a species will be present at a site in the future, given that it is present now. Thus, although ideas about both spatial distributions and population dynamics underpin the Hutchinson niche concept, species distribution models have retained only the spatial component of this concept.

One reason for the relative neglect of population dynamics in this context is that it is very difficult to build a plausible model for environmental effects on population dynamics. A plausible model needs to account for both density dependence and stochasticity, and how they are affected by the environment. Developing and fitting such a model requires a great deal of biological knowledge, and large amounts of data on how abundances change over time under a range of environmental conditions. These data are rarely available. However, a new class of density-structured models makes it easier to include population dynamics in species distribution models. Density-structured models are structured population models in which the population is structured by density, rather than by age or life stage (Taylor & Hastings 2004; Freckleton et al. 2011), in the same way that multivariate abundance data are sometimes discretized into community states (e.g. Johnson 2005). Such models can be based on categorical estimates of abundance (e.g. ‘not seen’, ‘rare’, ‘occasional’, …), which are faster and cheaper to obtain than precise estimates of abundance. This allows larger numbers of sites to be surveyed for the same effort. For example, Queenborough et al. (2010) estimated the probabilities of transitions between ordered categorical abundance classes for arable weeds from surveys of 500 agricultural fields over 3 years. Density dependence is built into these models in a crude way: the probability of a transition into a given destination class can be different for each source class, and can therefore approximate arbitrary relationships between abundance and population growth (Freckleton et al. 2011). Such models are also stochastic: both environmental and demographic stochasticity may contribute to the estimated transition probabilities. Lowe et al. (2011) described an extension of these models in which transition probabilities are simple parametric functions of environmental variables. 
Since only categorical abundance data are needed, it becomes possible to survey large numbers of sites with a range of environmental conditions, and thus to study how these conditions affect population dynamics. Density-structured models with environmental explanatory variables can be used to construct dynamic species distribution models, whose predictions are conditional probabilities of future abundance categories, given current abundance categories. There are very close parallels between these models and dynamic occupancy models, used to describe changes in the occupancy of sites over time (e.g. Erwin et al. 1998; Royle & Kéry 2007; MacKenzie et al. 2003, 2006, 2009). In fact, it is only terminological differences that divide the occupancy literature from the species distribution model literature.

The stochasticity in dynamic species distribution models has an explicit temporal interpretation. In particular, the stationary probability of presence under given environmental conditions can be obtained. This stationary probability can be interpreted both as the probability that the species will be present at a randomly-chosen site with the specified conditions, and as the expected proportion of time for which the species will be present at a particular site with these conditions. The dynamic nature of these models means that they can be used to investigate temporal features of population dynamics. For example, there are plausible mechanisms which could lead to either higher or lower temporal variability close to the limits of a species' range (e.g. Williams, Ives & Applegate 2003). From a density structured model, it is possible to calculate the normalized entropy (Hill, Witman & Caswell 2004), which measures uncertainty in abundance category one time step in the future for a randomly-chosen site with given environmental conditions. This can be used to examine the relationship between unpredictability of abundance (on a categorical scale) and the suitability of a site. In addition, it is likely that populations at some sites will persist for a long time, while others will quickly go extinct. The expected persistence time at a site can be determined from a density-structured model.

Here, we develop density-structured population models for two intertidal invertebrates, the gastropods Phorcus lineatus and Gibbula umbilicalis, based on large amounts of categorical abundance data from the same sites in consecutive years. We use these models to obtain the predicted probability of presence for each species under a range of environmental conditions (mean winter sea surface temperature and wave fetch), and to examine how normalized entropy and expected persistence time change with environmental conditions. We explain how these models can be viewed as dynamic species distribution models.

Materials and methods

Study Species

Phorcus lineatus and G. umbilicalis (da Costa, 1778) are closely-related prosobranch gastropods whose distributions extend along the western seaboard of the Atlantic. The MarClim project has documented rapid extensions of the northern range edges of both species in the UK since the onset of global climate warming in the mid-1980s, with rates of up to 50 km per decade being some of the fastest recorded in any system globally (Mieszkowska et al. 2006, 2007). The current northern range limits are Portballintrae (Northern Ireland), Great Orme (North Wales) and Kimmeridge (Dorset) for P. lineatus, and Murkle Bay (Scotland) and North Foreland (Thanet) for G. umbilicalis (Crisp & Southward 1958; Lewis 1964; Garwood & Kendall 1985; Kendall & Lewis 1986; Mieszkowska et al. 2007; Mieszkowska 2012, J. Nunn, National Museums Northern Ireland, pers. comm.).

Phorcus lineatus and G. umbilicalis co-occur on rocky shores across much of their geographic ranges. They have overlapping vertical distributions, with P. lineatus occupying open rock, boulder and crevice habitats in the high to midshore and G. umbilicalis being most abundant in rockpools, cracks, under boulders and amongst fucoid holdfasts in the mid to low shore. The lower thermotolerance limits for both species are thought to occur between 4 and 7 °C for the juvenile lifestage, with adults going into a state of torpor around –2 °C. Environmental temperatures in UK coastal waters between the 1960s and early 1980s frequently approached the lower thermotolerance window, and recruitment success was extremely low at sites across the northern third of the biogeographic range (Garwood & Kendall 1985; Kendall, Williamson & Garwood 1987). Since the onset of climate warming in the mid-1980s recruitment success has dramatically increased due to higher survival during warmer, shorter winters, and population abundances are showing an exponential increase at long-term MarClim survey sites (Mieszkowska et al. 2007). Phorcus lineatus can live between 15 and 20 years (Kendall 1987), while G. umbilicalis may live up to 10 years.

Data

Abundance data

Abundance data were recorded on the ACFOR scale (Crisp & Southward 1958), which is based on approximately logarithmic categories (Table 1) and has been used since the inception of the Marine Biological Association's long-term rocky intertidal studies in the 1950s. It facilitates rapid assessment of the abundance and distributions of a wide range of species across large geographical areas. The primary data collectors (NM and MTB) were trained and cross-calibrated by Southward in the field to ensure consistency. For both species, the Rare, Occasional, and Frequent categories did not occur very often, so were pooled for statistical analysis. Thus, our analysis was based on four ordered categories (Table 1), from 1 (not seen) to 4 (most abundant).

Table 1. The ACFOR abundance scale, and the pooled categories used for statistical analysis

Category      Definition                                                       Pooled category
Not seen      No individuals found                                             1
Rare          A few individuals encountered in a 30 minute search              2
Occasional    Fewer than 1 individual per m²                                   2
Frequent      Fewer than 1 individual per m², but locally sometimes more       2
Common        1 to 10 individuals per m²                                       3
Abundant      More than 10 individuals per m²                                  4

Four hundred and thirty-two sites spanning the English, Welsh, northwest and north coasts of Scotland (and two sites in northwest France) were surveyed between 2002 and 2010, although not every site was visited every year. Sites were surveyed at the same time every year to avoid potential seasonal influences on abundances. Because we wanted to model year-to-year changes in abundance, we included only pairs of observations at the same site in consecutive years. This gave a total of 328 pairs from 84 sites for P. lineatus (Fig. S1, Supporting information), and 382 pairs from 104 sites for G. umbilicalis (Fig. S2, Supporting information).
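
The selection of consecutive-year pairs and the pooling of ACFOR categories can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the record layout `(site, year, category)`, the function name `transition_counts` and the example records are all hypothetical. Only the pooling in `POOLED` and the consecutive-year rule come from the text.

```python
from collections import defaultdict

# ACFOR category -> pooled category used in the analysis (Table 1).
POOLED = {
    "Not seen": 1,
    "Rare": 2,
    "Occasional": 2,
    "Frequent": 2,
    "Common": 3,
    "Abundant": 4,
}

def transition_counts(records, n_states=4):
    """Count year-to-year transitions from (site, year, category) records.

    Returns counts[i][j], the number of pairs that moved from pooled
    category j + 1 (source, columns) to pooled category i + 1
    (destination, rows). Only observations at the same site in
    consecutive years form a pair.
    """
    by_site = defaultdict(dict)
    for site, year, category in records:
        by_site[site][year] = POOLED[category]
    counts = [[0] * n_states for _ in range(n_states)]
    for years in by_site.values():
        for year, source in years.items():
            if year + 1 in years:  # skip gaps in the survey record
                dest = years[year + 1]
                counts[dest - 1][source - 1] += 1
    return counts

# Hypothetical records: site "A" was not visited in 2004, so its
# 2003 and 2005 observations do not form a pair.
records = [
    ("A", 2002, "Rare"), ("A", 2003, "Common"), ("A", 2005, "Abundant"),
    ("B", 2002, "Abundant"), ("B", 2003, "Abundant"), ("B", 2004, "Common"),
]
counts = transition_counts(records)
```

With source categories in columns and destination categories in rows, the result has the same orientation as the transition count tables reported in the Results.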

Sea Surface Temperature

Sea temperature exerts a strong control over species distributions at the biogeographic scale (Hutchins 1947; Southward 1958; Brown 1984; Pörtner 2001), and over the timing and success of reproduction both within populations (Kendall 1987; Mieszkowska et al., 2007) and across geographic ranges for marine intertidal invertebrates (Orton 1920; Lewis et al. 1982; Southward, Hawkins & Burrows 1995; Mieszkowska et al. 2007). Monthly mean sea surface temperature (SST) measurements were downloaded on a 4 km grid for October–April for the winters of 2002–2003 through to 2009–2010 from http://poet.jpl.nasa.gov/. Night-time measurements from the MODIS/Aqua source with minimum quality 0 (best) and the Far-IR algorithm (Physical Oceanography DAAC 2010) were used. An overall winter mean SST was calculated for each grid cell, ignoring any missing values (Fig. S3, Supporting information). The 0·2% of grid cells having non-missing monthly means for fewer than 40 out of a possible 56 months were discarded, because a small number of grid cells with very few monthly means had outlying overall winter means. For each site with abundance data, the mean winter SST was obtained for the closest grid cell in terms of great circle distance on the WGS84 ellipsoid, using the R package sp version 0.9-83 (Pebesma & Bivand 2005). For all species, at least 90% of the distances from sites to the closest SST grid cell were <12 km (large values were usually due to the coarseness of the coastline mask).
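
The two processing steps above (averaging monthly means while ignoring missing values, and matching each site to its closest grid cell) can be sketched as below. This is an illustration under stated assumptions, not the original workflow: the analysis used the R package sp and distances on the WGS84 ellipsoid, whereas this sketch uses the haversine formula on a sphere of radius 6371 km, which gives slightly different distances. The function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation, not WGS84

def winter_mean(monthly, min_months=40):
    """Mean of the non-missing monthly values (missing values are None).

    Returns None when fewer than min_months values are available,
    mirroring the rule of discarding sparsely observed grid cells.
    """
    present = [v for v in monthly if v is not None]
    if len(present) < min_months:
        return None
    return sum(present) / len(present)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_cell(site, cells):
    """Return (distance_km, cell) for the grid cell closest to the site.

    site is (lat, lon); cells is an iterable of (lat, lon, value).
    """
    return min(
        ((haversine_km(site[0], site[1], c[0], c[1]), c) for c in cells),
        key=lambda pair: pair[0],
    )

# Hypothetical grid cells (lat, lon, mean winter SST) and survey site.
cells = [(50.0, -4.0, 10.1), (50.5, -4.0, 9.8)]
distance_km, cell = nearest_cell((50.1, -4.0), cells)
```

A brute-force search over all cells is adequate for a few hundred sites; a spatial index would only be needed for much larger problems.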

Wave Fetch

Wave exposure influences the structure of rocky intertidal communities (Ballantine 1961; Lewis 1964). Topographical indices of wave exposure can provide predictive insights into the abundance of structurally and functionally important species of invertebrates and macroalgae at a spatial scale relevant to individual shores and communities (Burrows, Harvey & Robb 2008). Wave fetch values for a 200 m coastline grid, calculated as described in Burrows, Harvey & Robb (2008), were downloaded from http://www.sams.ac.uk/michael-burrows (Fig. S4, Supporting information). For each site with abundance data, the wave fetch for the closest grid cell (measured as above) was obtained. For all species, 90% of the distances from sites to the closest wave fetch cell were <0·23 km.

Wave exposure in the UK is often measured on the Ballantine scale, which is based on descriptions of typical communities (Ballantine 1961). For comparison with this scale, the geometric mean wave fetch values in km for the sites used by Ballantine are: very sheltered 8·5; sheltered 8·7; fairly sheltered 43; semi-exposed 166; exposed 224; very exposed 676.

Model

We used a first-order Markov model for transitions among abundance categories at each site:

$$\mathbf{x}_m(t+1) = \mathbf{P}_m \, \mathbf{x}_m(t) \qquad \text{(eqn 1)}$$

where $\mathbf{x}_m(t)$ is a vector of probabilities that site m is in each abundance category at time t, and $\mathbf{P}_m$ is a matrix whose entries $p_{ij,m}$ are the probabilities of one-step transitions from category j to category i at site m. We modelled the transition probabilities out of each state as functions of mean winter SST and wave fetch, using a baseline-category logit model (Agresti 2002, section 7.1). We estimated the parameters using Markov Chain Monte Carlo, and carried out a range of checks on convergence and model adequacy. Full details are given in Appendix S1 (Supporting information).
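
To make the baseline-category logit concrete, the sketch below turns a set of regression coefficients into the transition probabilities out of one source category. In this kind of model the baseline destination has a linear predictor of zero, and each probability is the exponentiated linear predictor divided by the sum over all destinations. The coefficient values, the choice of baseline, the covariate scaling and the function name are hypothetical; the fitted model is specified in Appendix S1.

```python
import math

def transition_column(coefs, covariates, baseline):
    """Probabilities of moving from one source category to each destination.

    coefs maps each non-baseline destination to a tuple
    (intercept, slope_1, slope_2, ...). The baseline destination has
    linear predictor 0. Returns a dict destination -> probability.
    """
    eta = {baseline: 0.0}
    for dest, (intercept, *slopes) in coefs.items():
        eta[dest] = intercept + sum(b * x for b, x in zip(slopes, covariates))
    top = max(eta.values())  # subtract the maximum for numerical stability
    weights = {dest: math.exp(value - top) for dest, value in eta.items()}
    total = sum(weights.values())
    return {dest: w / total for dest, w in weights.items()}

# Hypothetical coefficients for transitions out of category 1 (Not seen),
# with category 1 as the baseline destination. The covariates are assumed
# to be centred mean winter SST and centred wave fetch.
coefs_from_1 = {
    2: (-1.0, 0.8, 0.1),
    3: (-2.5, 0.5, 0.0),
    4: (-3.0, 0.4, -0.2),
}
probs = transition_column(coefs_from_1, covariates=(1.0, 0.5), baseline=1)
```

One such column is needed for each source category; stacking the four columns side by side gives the transition matrix for a site with the given covariates.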

Our approach is closely related to loglinear models for transition probabilities among species present at a point in space (Hill, Witman & Caswell 2002; Nelis & Wootton 2010), the only difference being that we work with continuous rather than categorical explanatory variables. A hierarchical model with site-specific constant terms might explain additional variability in the observations. We did not pursue this approach because for many sites, there were no or few observed transitions out of some categories, due to relatively stable abundances. In such cases, the site-specific constants would be unidentifiable or only poorly constrained by the data. Models for multistate occupancy data over multiple seasons (MacKenzie et al. 2009) are equivalent to the ones described here, except that they also include estimates of observation error (which is not identifiable in our data). For example, MacKenzie et al. (2012) modelled transitions between absence, presence without reproduction, and presence with reproduction in potential territories of the California spotted owl as functions of the Southern Oscillation Index, precipitation, and estimated energy expenditure by owls. Static species distribution models based on abundance categories are also closely related (Burrows, Moore & Hawkins 2006; Burrows 2012). In such models, ordinal logistic regression is used. Even though our categories are ordered, the transition probabilities out of a given category will not necessarily follow the same order, and so ordinal models may not be appropriate for transition probabilities. One important difference between static and dynamic models is that a static model cannot describe temporal dependence among observations, and so cannot make full use of data such as ours, in which there are observations at the same sites from multiple time points.

Summary Statistics

Three summary statistics of ecological interest were calculated. The stationary probability of presence can be interpreted either as the long-run proportion of time for which a species will be present at a particular site m, or as the proportion of sites with the same environmental conditions as site m at which the species will be present at a particular time. This can be thought of as a measure of site suitability. The normalized entropy (Hill, Witman & Caswell 2004) measures the uncertainty about the abundance category 1 year in the future, given that the current abundance category is drawn at random after the model has been running for a long time. The expected persistence time is the expected time until the first visit to the absent category, conditional on currently being in a non-absent category (drawn at random after the model has been running for a long time). We calculated each of these statistics on a 10 × 10 grid of values of mean winter SST and wave fetch, equally spaced over the observed range. We plotted contours of stationary probability of presence and expected persistence time directly on this grid. We plotted normalized entropy against stationary probability of presence for values calculated on this grid, because our main interest was in how predictability of dynamics changed with site suitability. Technical details are in Appendix S1 (Supporting information).
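
The three statistics can be computed from a single transition matrix as in the sketch below. It is a point calculation for one hypothetical matrix, not the posterior calculation described in Appendix S1. Two assumptions are made here: the entropy of each column is weighted by the stationary distribution and divided by the log of the number of categories, and the persistence time is the stationary-weighted mean of the expected hitting times of the absent category. The matrix has sources in columns and destinations in rows, and all function names are hypothetical.

```python
import math

def stationary(P, max_iter=100000, tol=1e-14):
    """Stationary distribution of a column-stochastic matrix by power
    iteration (assumes the chain is irreducible and aperiodic)."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    return x

def normalized_entropy(P, pi):
    """Entropy of next year's category, averaged over the stationary
    distribution and scaled to lie between 0 and 1."""
    n = len(P)
    h = 0.0
    for j in range(n):
        for i in range(n):
            if P[i][j] > 0:
                h -= pi[j] * P[i][j] * math.log(P[i][j])
    return h / math.log(n)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def expected_persistence(P, pi, absent=0):
    """Expected time to first reach the absent state, starting from a
    present state drawn from pi conditioned on presence."""
    present = [j for j in range(len(P)) if j != absent]
    # Hitting times satisfy tau_j = 1 + sum_i P[i][j] * tau_i (i, j present).
    A = [[(1.0 if i == j else 0.0) - P[i][j] for i in present] for j in present]
    tau = solve(A, [1.0] * len(present))
    weight = sum(pi[j] for j in present)
    return sum(pi[j] * t for j, t in zip(present, tau)) / weight

# Hypothetical transition matrix for the four pooled categories.
P = [
    [0.70, 0.10, 0.00, 0.00],
    [0.20, 0.60, 0.20, 0.05],
    [0.05, 0.20, 0.50, 0.10],
    [0.05, 0.10, 0.30, 0.85],
]
pi = stationary(P)
presence = 1.0 - pi[0]            # stationary probability of presence
entropy = normalized_entropy(P, pi)
persistence = expected_persistence(P, pi)
```

Repeating the calculation for the matrix implied by each combination of SST and wave fetch would give one value of each statistic per grid point.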

Stationary probabilities of presence were projected onto a map of England, Wales, and Scotland. For every 10th point on the 200 m wave fetch grid described above, the closest SST value was selected. Both species require hard substrate, so any points without hard substrate (according to the MAGIC database, http://magic.defra.gov.uk/) were discarded. The posterior mean stationary probability of presence for each remaining point was estimated by linear interpolation on the 10 × 10 grid of combinations of these variables. Combinations outside the range of explanatory variables used to develop the model were discarded rather than extrapolated (but most such combinations were in areas without hard substrate, so this had little effect on the maps).
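
The interpolation step can be sketched as below, assuming that linear interpolation on the grid means bilinear interpolation between the four surrounding grid points (the text does not specify this). Points outside the grid return `None`, matching the rule of discarding rather than extrapolating. The function name and the small example grid are hypothetical.

```python
from bisect import bisect_right

def interpolate(xs, ys, values, x, y):
    """Bilinear interpolation of values[i][j], defined at (xs[i], ys[j]).

    xs and ys must be strictly increasing. Returns None when (x, y)
    lies outside the grid, so that nothing is extrapolated.
    """
    if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
        return None
    # Index of the lower-left corner of the cell containing (x, y).
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect_right(ys, y) - 1, len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * values[i][j]
            + tx * (1 - ty) * values[i + 1][j]
            + (1 - tx) * ty * values[i][j + 1]
            + tx * ty * values[i + 1][j + 1])

# Hypothetical 2 x 2 grid holding the plane f(x, y) = x + y.
xs = [0.0, 1.0]
ys = [0.0, 2.0]
values = [[0.0, 2.0], [1.0, 3.0]]
inside = interpolate(xs, ys, values, 0.5, 1.0)
outside = interpolate(xs, ys, values, 1.5, 1.0)
```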

Results

Environmental Conditions and Abundance Categories

Mean winter SST was between 8·3 and 11·2 °C, and wave fetch was between 2·5 and 1349 km. Sites were distributed fairly evenly within these ranges, except that there were no sites with low mean winter SST and high wave fetch (points on Figs 1a and 2a: the highest latitude sites, in Scotland, were predominantly on the moderately exposed to sheltered coastline typical of the region). For both species, there was an apparent tendency for sites with high mean winter SST to have higher modal abundance category (darker points on Figs 1a and 2a) than sites with low mean winter SST. The effects of wave fetch on modal abundance class were not obvious on inspection.

Figure 1.

Observed distribution and stationary probability of presence for Phorcus lineatus as a function of mean winter sea surface temperature (°C) and wave fetch (km). On (a), each point is located at the values of mean winter sea surface temperature (°C) and wave fetch (km) for a single site, and its colour represents the modal abundance category for the species at that site (darker colours are higher categories). Where more than one category had the modal frequency, the highest category is shown. Contours are posterior mean stationary probability of presence. (b) Posterior standard deviation of stationary probability of presence.

Figure 2.

Observed distribution and stationary probability of presence for Gibbula umbilicalis as a function of mean winter sea surface temperature (°C) and wave fetch (km). On (a), each point is located at the values of mean winter sea surface temperature (°C) and wave fetch (km) for a single site, and its colour represents the modal abundance category for the species at that site (darker colours are higher categories). Where more than one category had the modal frequency, the highest category is shown. Contours are posterior mean stationary probability of presence. (b) Posterior standard deviation of stationary probability of presence.

Observed Transition Counts

For both species, the raw transition counts (Tables 2 and 3) revealed three features of the dynamics. First, the diagonal elements of the transition count matrices were almost always the largest in each column. Thus, pooled over all sites and times, both species remained in the same abundance category from year to year more often than they changed categories. This is consistent with the long lifespans of both species [15–20 years for P. lineatus (Kendall 1987), up to 10 years for G. umbilicalis] relative to the annual time scale of the model. The only exception was transitions out of category 3 (Common) for G. umbilicalis, where there were 26 observations of persistence in category 3, and 35 of a transition to category 4 (Abundant). Second, elements close to the diagonal were generally larger than those far from the main diagonal. Thus, when changes in abundance occurred, they usually involved moving to an adjacent category, rather than a sudden population explosion or crash. Third, the elements below the diagonal were almost always larger than the corresponding elements above the diagonal. Thus, pooled over all sites and times, populations of both species generally moved upwards from a given abundance category more readily than they moved downwards. The only exceptions were that for G. umbilicalis, transitions from 3 (Common) to 2 (Rare, Occasional, or Frequent) were more frequent (20 observations) than those from 2 to 3 (18 observations), and transitions from 4 (Abundant) to 3 (Common) were more frequent (40 observations) than those from 3 to 4 (35 observations).

Table 2. Observed transition counts for Phorcus lineatus, pooled over all sites and times. Source states are columns, destination states are rows. Categories are: 1, Not Seen; 2, Rare, Occasional, or Frequent; 3, Common; 4, Abundant, as defined in Table 1

                       Source
Destination      1      2      3      4
1               49      6      0      0
2               14     41     14      6
3                2     21     19     11
4                1      8     17    119

Table 3. Observed transition counts for Gibbula umbilicalis, pooled over all sites and times. Source states are columns, destination states are rows. Categories are: 1, Not Seen; 2, Rare, Occasional, or Frequent; 3, Common; 4, Abundant, as defined in Table 1

                       Source
Destination      1      2      3      4
1               18     12      5      0
2               13     27     20      6
3                7     18     26     40
4                0     12     35    144
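
As a rough check on the patterns described above, the counts in Tables 2 and 3 can be converted to pooled transition proportions by dividing each column by its total. The sketch below does this and tests whether the diagonal element is the largest in each column. These pooled proportions ignore SST and wave fetch, so they are not the fitted transition probabilities; the function and variable names are hypothetical.

```python
# Observed transition counts (sources in columns, destinations in rows).
lineatus = [
    [49, 6, 0, 0],
    [14, 41, 14, 6],
    [2, 21, 19, 11],
    [1, 8, 17, 119],
]
umbilicalis = [
    [18, 12, 5, 0],
    [13, 27, 20, 6],
    [7, 18, 26, 40],
    [0, 12, 35, 144],
]

def column_proportions(counts):
    """Divide each column by its total, giving pooled proportions of
    transitions from each source category to each destination."""
    n = len(counts)
    totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [[counts[i][j] / totals[j] for j in range(n)] for i in range(n)]

def diagonal_is_largest(counts):
    """For each source category, is staying put the most frequent outcome?"""
    n = len(counts)
    return [counts[j][j] == max(counts[i][j] for i in range(n)) for j in range(n)]

lineatus_props = column_proportions(lineatus)
umbilicalis_props = column_proportions(umbilicalis)
lineatus_diag = diagonal_is_largest(lineatus)        # all True
umbilicalis_diag = diagonal_is_largest(umbilicalis)  # False for category 3
```

The single `False` entry corresponds to the exception noted in the text: for G. umbilicalis, sites in category 3 (Common) moved to category 4 (Abundant) more often than they stayed in category 3.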

Stationary Probability of Presence

For both species, the posterior mean stationary probability of presence increased with increasing mean winter SST (Figs 1a and 2a), and was relatively high for all the environmental conditions examined (P. lineatus, range 0·69–0·96, G. umbilicalis, range 0·68–0·98). However, the two species responded differently to wave fetch. For P. lineatus, the stationary probability of presence was generally higher at low than at high wave fetch (Fig. 1a), while for G. umbilicalis, over most of the range of mean winter SST, the stationary probability of presence was higher at high than at low wave fetch (Fig. 2a). For both species, the posterior mean stationary probability of presence changed rapidly with changes in wave fetch at low mean winter SST, but only slowly with changes in wave fetch at high mean winter SST. Thus, wave fetch is only likely to be important in determining occurrence at low mean winter SST (although there were no data from the extreme of high fetch and low SST). For both species, the posterior standard deviation of stationary probability of presence responded to mean winter SST and wave fetch in the opposite way to the posterior mean (Figs 1b and 2b). Thus, the stationary probability of presence was most certain when close to 1. Uncertainty was high for low SST and high fetch, where there were few observations. Contours of the stationary probability of each category against SST and fetch are given in Figs S13 and S14 (Supporting information).

When projected onto a map, the posterior mean stationary probabilities of presence were strikingly different between the two species (Fig. 3). For P. lineatus (Fig. 3a), the stationary probability of presence was high at almost all sites with suitable substrate. This is because, although P. lineatus had low stationary probability of presence at sites with low SST and high fetch (Fig. 1a), there are few such sites with suitable substrate in the UK. In particular, the west coast of Scotland, with relatively low SST and low fetch, appears suitable for P. lineatus, although it does not now occur there. For G. umbilicalis (Fig. 3b), stationary probabilities of presence were generally lower in Scotland than in England and Wales. This is because G. umbilicalis has relatively low predicted probability of presence at cold sites with low fetch (Fig. 2a), and such sites are common in Scotland. Nevertheless, even the least suitable sites had predicted probabilities of presence above 0·6, consistent with observations of G. umbilicalis in Scotland.

Figure 3.

Posterior mean stationary probabilities of presence for (a) Phorcus lineatus and (b) Gibbula umbilicalis projected onto a map of England, Wales, and Scotland. Locations without hard substrate, or with environmental variables outside the ranges used to develop the models, are not plotted.

Normalized Entropy

Overall, normalized entropy was lower for P. lineatus (Fig. 4, open symbols, mean 0·39 over the entire grid of SST and fetch) than for G. umbilicalis (Fig. 4, filled symbols, mean 0·56 over the entire grid of SST and fetch). Thus, annual changes in abundance are more predictable for P. lineatus than for G. umbilicalis. This is reflected in the transition probabilities, which were generally closer to either 1 or 0 in P. lineatus (Figs S5–S8, Supporting information) than in G. umbilicalis (Figs S9–S12, Supporting information). A likely explanation is that P. lineatus, being larger and longer-lived (Kendall 1987), has a slower timescale for population dynamics than G. umbilicalis. The overall relationship between normalized entropy and stationary probability of presence differed between the two species (Fig. 4). For both species, the sharp decline in entropy at high probability of presence was determined by the stationary probability of state 4 (Abundant) and the transition probabilities out of state 4. At high SST, state 4 had a much higher stationary probability than the other states (Figs S13g and S14g, Supporting information), and a site in state 4 was much more likely to remain in state 4 than to move to a different state (Figs S8g and S12g, Supporting information). Thus, there is little uncertainty about the future state of such a site. For P. lineatus, entropy was also low when stationary probability of presence is low (Fig. 4, open symbols), at low SST and high fetch. Under these conditions, states 1 (Not seen), 2 (Rare, Occasional, or Frequent) and 3 (Common) were all moderately likely (Fig. S13, Supporting information). Sites in states 1 and 2 were much more likely to remain in these states than to change state when SST is low (Figs S5 and S6, Supporting information), so there is little uncertainty about the future state of such a site. 
Because no sites had a combination of very low SST and very high fetch, patterns involving this combination should not be over-interpreted (for the relationships between entropy and explanatory variables, see Figs S15 and S16, Supporting information).

Figure 4.

Normalized entropy against stationary probability of presence for Phorcus lineatus (open symbols) and Gibbula umbilicalis (filled symbols). Each point is the posterior mean for a single combination of mean winter sea surface temperature and wave fetch.

Expected Persistence Times

For P. lineatus, expected persistence times were high for high SST, and low for moderate SST (Fig. 5a). Somewhat surprisingly, expected persistence times were also high for the combination of low SST and high fetch, although with high uncertainty (Fig. S17a, Supporting information). Thus, although P. lineatus is relatively unlikely to be present at low SST and high fetch, if it is present, it is likely to persist for a long time. Back-transforming the log expected persistence times, typical persistence times ranged from around 100 years at sites with moderate SST and low fetch, to around 220 years for sites with low SST and high fetch.

Figure 5.

Posterior mean natural log persistence time for (a) Phorcus lineatus and (b) Gibbula umbilicalis as a function of mean winter sea surface temperature (°C) and wave fetch (km).

For G. umbilicalis, expected persistence times were high for high SST, and low for the combination of low SST and low fetch (Fig. 5b). Uncertainty was high for the combination of low SST and high fetch, but also for high SST and low fetch (Fig. S17b, Supporting information). Typical persistence times ranged from around 12 years at sites with low SST and low fetch, to around 150 years at sites with high SST and low fetch. The lower expected persistence times for G. umbilicalis than for P. lineatus are likely to be a consequence of the shorter lifespan of G. umbilicalis.

Discussion

A dynamic species distribution model could be defined as a statistical model for short-term changes in abundance as a function of environmental variables, based on abundance data from large numbers of sites and more than one moment in time, and from which projections of long-term and large-scale aspects of distribution and abundance can be obtained. The density-structured models described here are a simple example, based on categorical abundance data.

Categorical abundance data (which could be as simple as presence/absence time series) offer some advantages for dynamic species distribution models. It is important to have data from large numbers of sites if the model is to be informative about large geographical scales, and from more than one time point if the model is to be dynamic. Collecting the detailed demographic information needed for a traditional model of population dynamics can be very slow and expensive, which may limit such models to small spatial scales (Queenborough et al. 2010). In contrast, collecting categorical abundance data is relatively fast.

If the population is not at equilibrium, a density-structured model based on categorical data will not be exactly correct, because transition probabilities will depend on the underlying true abundance distribution, which will be changing over time. In other words, the underlying process is unlikely to be lumpable (in the sense of Kemeny & Snell 1960, definition 6.3.1). Nonstationary models for transition probabilities (e.g. MacKenzie et al. 2003, 2006, 2009) might be useful in such cases. However, it is important to distinguish between applications to unaggregated and aggregated states. For example, MacKenzie et al. (2009) studied the dynamics of territory occupancy for California Spotted Owls. Each territory was classified as unoccupied, occupied without reproduction, or occupied with reproduction. It is reasonable to think of these as unaggregated states, for which nonstationary models present no difficulties. On the other hand, if aggregated states of a nonstationary process are modelled, every new time point will have a new set of transition probabilities. In addition, if different sites have different histories (as is likely in our case), then every combination of site and time will require its own set of transition probabilities. No interesting inferences can be made about the long-term behaviour of such models, and care is needed to avoid the pathological situation in which the number of parameters goes to infinity with the number of observations. Because of these difficulties, we have assumed stationarity, knowing that our results will not be exact. If the environment is changing only slowly and the process has been running for a long time, this approach may allow us to get approximate answers to questions about long-term behaviour that are outside the scope of static models. Nevertheless, it would be useful to know more about how the errors in an aggregated model depend on how far the true process is from stationarity.
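
The idea of lumpability can be made concrete with a small check. The Python sketch below tests the standard condition associated with Kemeny & Snell (1960) for a finite chain: the aggregated process is Markov for every starting distribution only if, for each pair of blocks, the probability of moving into the destination block is the same from every state in the source block. The three-state matrices and the partition are hypothetical toy examples, far simpler than a true abundance process, and the function name is our own.

```python
import numpy as np

def is_lumpable(P, partition, tol=1e-12):
    """Check the lumpability condition for a column-stochastic matrix P.

    P[i, j] = Pr(state i next | state j now). 'partition' is a list of
    lists of state indices (the aggregated categories).
    """
    for source in partition:
        for dest in partition:
            # Probability of entering 'dest', one value per state in 'source'.
            into_dest = P[np.ix_(dest, source)].sum(axis=0)
            if into_dest.max() - into_dest.min() > tol:
                return False
    return True

blocks = [[0], [1, 2]]  # aggregate states 1 and 2 into one category

# States 1 and 2 both move to state 0 with probability 0.2: lumpable.
P_ok = np.array([[0.8, 0.2, 0.2],
                 [0.1, 0.5, 0.1],
                 [0.1, 0.3, 0.7]])

# State 2 now moves to state 0 with probability 0.4: not lumpable.
P_bad = np.array([[0.8, 0.2, 0.4],
                  [0.1, 0.5, 0.1],
                  [0.1, 0.3, 0.5]])

print(is_lumpable(P_ok, blocks), is_lumpable(P_bad, blocks))
```

When the condition fails, the transition probabilities between aggregated categories depend on how probability is distributed among the underlying states within each category, which is why they change over time if the population is not at equilibrium.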

The stationary probability of presence is the most obvious summary of a dynamic species distribution model. In our study, both species had higher stationary probabilities of presence at higher winter mean SST, but differed in their responses to wave fetch. In addition, the stationary probability of presence changed much more slowly with wave fetch at high than at low SST (Figs 1a and 2a). This suggests that if SST increases in future, wave fetch might become less important as a limiting factor in the distributions of P. lineatus and G. umbilicalis. For P. lineatus, the west coast of Scotland appeared to be suitable habitat, based on the stationary probability of presence (Fig. 3a). However, P. lineatus does not currently occur in Scotland (Mieszkowska 2012). Absence of suitable substrates on the north-west coast of England may have prevented P. lineatus from extending its distribution to the west coast of Scotland with recent climate warming. On the other hand, G. umbilicalis has crossed this dispersal barrier. The reason for this difference is unknown, although the faster timescale for population dynamics in G. umbilicalis may have been a factor.

The key difference between the stationary probability of presence from a dynamic species distribution model, and the predicted probability of presence from a static species distribution model, is that the former has an explicit temporal interpretation as the proportion of time a species is expected to be present at a site. With this temporal interpretation, it becomes clear that the stationary probability of presence alone does not contain all the information that might be useful in the study of a species' distribution. For example, in a dynamic model for presence/absence data, the stationary probability of presence is the ratio of the probability of colonization to the sum of the probabilities of colonization and extinction. A species with rapid turnover and a species with very slow turnover might have the same predicted probability of presence, but very different patterns of change over time. Aspects of population dynamics such as normalized entropy and expected persistence time can therefore be as important as predictions of presence/absence.
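
The presence/absence example can be illustrated numerically. In the Python sketch below, the colonization and extinction probabilities are hypothetical values chosen only to show that two species can share a stationary probability of presence while differing tenfold in turnover; the function names are our own.

```python
def stationary_presence(colonization, extinction):
    """Stationary probability of presence in a two-state (absent/present) chain."""
    return colonization / (colonization + extinction)

def expected_years_present(extinction):
    """Expected length of a run of consecutive years of presence."""
    return 1.0 / extinction

# Hypothetical annual probabilities (not estimates from the data).
fast = {"colonization": 0.4, "extinction": 0.1}
slow = {"colonization": 0.04, "extinction": 0.01}

for name, rates in (("fast turnover", fast), ("slow turnover", slow)):
    p = stationary_presence(rates["colonization"], rates["extinction"])
    run = expected_years_present(rates["extinction"])
    print(f"{name}: stationary presence = {p:.2f}, expected run = {run:.0f} years")
```

Both hypothetical species are present about 80% of the time in the long run, but a local population of the first is expected to last about 10 years and one of the second about 100 years.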

For both P. lineatus and G. umbilicalis, sites with a high predicted probability of presence had low normalized entropy. Populations at such sites were likely to be in the highest abundance category, and if in that category, to remain there. Although there were no sites with very low predicted probability of presence, any such site would be likely to be in the lowest abundance category, and to remain in that category. Thus, normalized entropy is likely to be highest at intermediate predicted probability of presence, when transitions to more than one abundance category are at least moderately likely.

Temporal variability in abundance of a species might change in populations at the edge of its range, although either increases or decreases in variability at range edges are plausible (Gaston 1990; Williams, Ives & Applegate 2003). Population variability is usually estimated from quantitative abundance data on a proportional scale (McArdle & Gaston 1995), such that a given proportional change in density is treated as equivalent whatever the starting density. This is the natural choice because population growth is multiplicative rather than additive. If (as in our analysis) abundance categories are approximately logarithmic, then moving from any category to the next one up represents approximately the same proportional change. However, if there are relatively few categories, a population might have relatively high proportional variability, while being almost certain to remain in a single category, and thus have low normalized entropy. There may be applied situations in which knowing that a population will almost certainly remain rare is more important than knowing that it shows large proportional variability. In such situations, the crude information in normalized entropy is valuable.

Phorcus lineatus is larger and longer-lived than G. umbilicalis, and this is reflected in its lower normalized entropy and longer expected persistence times. One general consequence of such life history differences is that historical information on distributions is more useful for some species than for others. Knowing that a species was present 5 years ago is more informative about its current probability of presence than knowing that it was present 50 years ago, but there is no straightforward way to capture this idea in a static species distribution model. Dynamic species distribution models can quantify the rate at which the usefulness of historical data on distributions decays. For example, we estimated site-specific annual transition probability matrices $\mathbf{P}$. The transition probabilities over k years are given by $\mathbf{P}^k$, whose jth column contains the conditional probabilities of each destination state after k years, given initial state j. The rate at which these conditional probabilities ‘forget’ the initial state and approach the stationary distribution is determined by the eigenvalues of $\mathbf{P}$ (Caswell 2001, pp. 95–96), and thus ultimately by life history. In a dynamic species distribution model, it would be possible to use both historical and contemporary data, while accounting for the loss of information over time.

Another way in which dynamic information may be important for applications of species distribution models is in assessing the conservation value of sites for which we currently have data. For example, consider a set of sites at which a species is known to be present, and for which a static species distribution model is available. Some of these sites will have higher predicted probabilities of presence than others, but the knowledge that the species is present now does not modify our predictions about whether it will be present in 10 years. Dynamic species distribution models can provide this information, and thus allow a more detailed assessment of the future value of sites.
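
The decay of historical information can be sketched numerically. The Python example below takes an annual transition matrix (called P here), raises it to the kth power to obtain k-year transition probabilities, and uses the modulus of the second-largest eigenvalue as a rough measure of how quickly the initial state is forgotten. The two-state matrices are hypothetical, with rates chosen so that both have a stationary probability of presence of 0.8; they are not the fitted four-state models, and the function names are our own.

```python
import numpy as np

def k_step_transitions(P, k):
    """k-year transition probabilities for a column-stochastic annual matrix P."""
    return np.linalg.matrix_power(P, k)

def subdominant_modulus(P):
    """Modulus of the second-largest eigenvalue; closer to 1 means slower forgetting."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(moduli[1])

def two_state(colonization, extinction):
    """Column-stochastic matrix with states ordered (absent, present)."""
    return np.array([[1 - colonization, extinction],
                     [colonization, 1 - extinction]])

fast = two_state(0.4, 0.1)    # hypothetical short-lived species
slow = two_state(0.04, 0.01)  # hypothetical long-lived species

for name, P in (("fast", fast), ("slow", slow)):
    p5 = k_step_transitions(P, 5)[1, 1]    # Pr(present now | present 5 years ago)
    p50 = k_step_transitions(P, 50)[1, 1]  # Pr(present now | present 50 years ago)
    print(name, round(subdominant_modulus(P), 3), round(p5, 3), round(p50, 3))
```

For the hypothetical fast species, a record of presence 5 years ago already adds almost nothing beyond the stationary probability, whereas for the slow species it remains informative and is only largely forgotten after about 50 years.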

In conclusion, species distribution models are one of the major applications of Hutchinson's concept of the niche (Holt 2009). Making these models dynamic will help to bring information about population dynamics and life history back into the picture. Applications include estimating population persistence times for species of conservation importance and indicator species (including P. lineatus and G. umbilicalis), predicting the spread of invasive species, studying geographical patterns in population variability, and planning protected areas. Density-structured models for categorical data are one simple way of making species distribution models dynamic.

Acknowledgements

This work was funded by a Small Ecological Project Grant from the British Ecological Society, and by the Marine Biological Association. N.M. was supported by a Marine Biological Association Fellowship. Funding for the MarClim project was provided from Natural England, Countryside Council for Wales, Scottish Natural Heritage, Scottish Government (Scottish Executive), Defra, JNCC, The Crown Estate, States of Jersey and WWF. This work would not have been possible without Steve Hawkins, who established the MarClim project. Parts of this work were done while M.S. was a visitor at the Department of Mathematics and Statistics, University of Otago, New Zealand, and while M.S. was a Sabbatical Fellow at the National Institute for Mathematical and Biological Synthesis, an Institute sponsored by the National Science Foundation, the U.S. Department of Homeland Security, and the U.S. Department of Agriculture through NSF Award #EF-0832858, with additional support from The University of Tennessee, Knoxville. We are grateful to Daryll MacKenzie and an anonymous reviewer for constructive criticism of the manuscript.

Data accessibility

Sample locations and abundance data: held by Data Archive for Seabed Species and Habitats (DASSH), access conditions described at http://www.dassh.ac.uk/data/data_policy.html. Metadata available on the National Biodiversity Network Gateway at http://data.nbn.org.uk/datasetInfo/taxonDataset.jsp?refID=7&orgKey=11&dsType=T&dsKey=GA000445&grpType=2&.

R code: uploaded as online supporting information. Also available from http://www.liv.ac.uk/~matts/densitystructure.html
