## Introduction

When surveying biological populations, it is increasingly common to record spatially referenced data, for example coordinates of observations, habitat type, elevation or (if at sea) bathymetry. Spatial models allow for vast databases of spatially referenced data (e.g. OBIS-SEAMAP, Halpin *et al*. 2009) to be harnessed, enabling investigation of interactions between environmental covariates and population densities. Mapping the spatial distribution of a population can be extremely useful, especially when communicating results to non-experts. Recent advances in both methodology and software have made spatial modelling readily available to the non-specialist (e.g., Wood 2006; Rue *et al*. 2009). Here, we use ‘spatial model’ to refer to any model that includes any spatially referenced covariates, not only those models that include explicit location terms. This article is concerned with combining spatial modelling techniques with distance sampling (Buckland *et al*. 2001, 2004).

Distance sampling extends plot sampling to the case where detection is not certain. Observers move along lines or visit points and record the distance (*y*) from the line or point to each object of interest. These distances are used to estimate the *detection function*, *g*(*y*) (e.g., Fig. 1), by modelling the decrease in detectability with increasing distance from the line or point (conventional distance sampling, CDS). The detection function may also include covariates that affect its scale (multiple covariate distance sampling, MCDS; Marques *et al*. 2007). From the fitted detection function, the average probability of detection can be estimated by integrating out distance. The estimated average probability that animal (or cluster) *i* is detected, given that it is in the area covered by the survey, $\hat{p}_i$, can then be used to estimate abundance as

$$\hat{N} = \frac{A}{a}\sum_{i=1}^{n}\frac{s_i}{\hat{p}_i} \qquad (1)$$

where *A* is the area of the study region, *a* is the area covered by the survey (i.e. the sum of the areas of all of the strips/circles) and the summation takes place over the *n* observed clusters, each of size $s_i$ (if individuals are observed, $s_i = 1$) (Buckland *et al*. 2001, Chapter 3). Often up to half the observations in a plot sampling data set are discarded to ensure that the assumption of certain detection is met. In contrast, distance sampling uses the observations that would have been discarded to model detection (although typically detections beyond a given *truncation distance* are discarded during analysis).
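To make the steps above concrete, here is a minimal sketch in Python. It assumes line transects, a half-normal detection function with no covariates (so $\hat{p}_i = \hat{p}$ for every observation), distances already truncated at *w*, and a covered area $a = 2wL$ for total line length *L*. All function names are hypothetical; this is an illustration of eqn 1, not the implementation used in Distance or dsm.

```python
import math


def halfnormal_g(y, sigma):
    """Half-normal detection function g(y) = exp(-y^2 / (2 sigma^2)), with g(0) = 1."""
    return math.exp(-y * y / (2.0 * sigma * sigma))


def integral_g(sigma, w):
    """Closed form of the integral of the half-normal g(y) from 0 to w."""
    return sigma * math.sqrt(math.pi / 2.0) * math.erf(w / (sigma * math.sqrt(2.0)))


def average_detection_prob(sigma, w):
    """Average detection probability in a line-transect strip of half-width w:
    p = (1 / w) * integral_0^w g(y) dy, i.e. distance is integrated out."""
    return integral_g(sigma, w) / w


def fit_sigma(distances, w, lo=1e-3, hi=None, tol=1e-8):
    """Maximum-likelihood sigma for a half-normal truncated at w (line transects).

    Assumes every distance is <= w. The pdf of observed distances is
    g(y) / integral_g(sigma, w); its negative log-likelihood is minimised
    by golden-section search on [lo, hi]."""
    if hi is None:
        hi = 10.0 * w
    n = len(distances)
    sum_sq = sum(y * y for y in distances)

    def nll(sigma):
        return sum_sq / (2.0 * sigma * sigma) + n * math.log(integral_g(sigma, w))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if nll(c) < nll(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


def abundance_estimate(cluster_sizes, p_hat, study_area, covered_area):
    """Eqn 1 with a common p_hat: N = (A / a) * sum_i s_i / p_hat."""
    return (study_area / covered_area) * sum(s / p_hat for s in cluster_sizes)
```

In use, one would compute `p = average_detection_prob(fit_sigma(distances, w), w)` and then `abundance_estimate(sizes, p, A, 2 * w * L)`. Golden-section search is used only to keep the sketch dependency-free; any one-dimensional optimiser would do.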

Estimators such as eqn 1 rely on the design of the study to ensure that abundance estimates over the whole study area (scaling up from the covered region) are valid. In contrast, this article focusses on *model-based* inference to extrapolate to a larger study area. Specifically, we consider the use of spatially explicit models to investigate the response of biological populations to biotic and abiotic covariates that vary over the study region. A spatially explicit model can explain the between-transect variation (which is often a large component of the variance in design-based estimates), so a model-based approach can yield abundance estimates with smaller variance than design-based estimates. Model-based inference also enables the use of data from opportunistic surveys, for example incidental data arising from ‘ecotourism’ cruises (Williams *et al*. 2006).

Our aims in creating a spatial model of a biological population are usually twofold: (i) estimating overall abundance and (ii) investigating the relationship between abundance and environmental covariates. As with any predictions outside the range of the data, the usual warnings regarding extrapolation apply. For example, if a model contains elevation as a covariate, predictions at high, unsampled elevations are unlikely to be reliable. Maps of abundance or density are frequently required, and spurious predictions can be spotted visually on such maps or by plotting a histogram of the predicted values. Defining the region of interest sensibly helps to avoid prediction outside the range of the data.
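The checks described above can be partly automated. The Python sketch below flags prediction cells whose covariate value lies outside the range observed at the surveyed locations, and bins predicted values into a crude text histogram. It assumes each covariate is stored as a plain list of numbers; the function names and elevation values are hypothetical.

```python
def outside_sampled_range(sampled, prediction):
    """Indices of prediction cells whose covariate value falls outside
    [min(sampled), max(sampled)], i.e. where the model would extrapolate."""
    lo, hi = min(sampled), max(sampled)
    return [i for i, x in enumerate(prediction) if x < lo or x > hi]


def histogram_counts(values, n_bins):
    """Counts of values in n_bins equal-width bins spanning [min, max];
    a crude stand-in for plotting a histogram of predicted densities."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    return counts


# Hypothetical elevations (m) at surveyed segments and at prediction cells.
sampled_elev = [120.0, 340.0, 560.0, 410.0]
grid_elev = [90.0, 200.0, 480.0, 1500.0]
print(outside_sampled_range(sampled_elev, grid_elev))  # prints [0, 3]
```

A per-covariate range check is deliberately simple: it will not catch cells whose individual covariate values were each sampled but whose combination was not, so it complements rather than replaces inspecting the prediction map.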

In this article, we review the current state of spatial modelling of detection-corrected count data, illustrating some recent developments useful to applied ecologists. The methods discussed have been available in Distance software (Thomas *et al*. 2010) for some time, but the recent advances covered here have been implemented in a new R package, dsm (Miller *et al*. 2013), and are to be incorporated into Distance.

Throughout this article, a motivating data set is used to illustrate the methods. These data are sightings of pantropical spotted dolphins (*Stenella attenuata*) during April and May of 1996 in the Gulf of Mexico. Observers aboard the NOAA vessel Oregon II recorded sightings and environmental covariates (see http://seamap.env.duke.edu/dataset/25 for survey details). A complete example analysis is provided in Appendix S1. The data used in the analysis are available as part of the dsm package and Distance.

The rest of the article reviews approaches for the spatial modelling of distance sampling data before focussing on the density surface modelling approach of Hedley & Buckland (2004) to estimate abundance and uncertainty. We then describe recent advances and provide practical advice regarding model fitting, formulation and checking. Finally, we discuss future directions for research in spatially modelling detection-corrected count data.