Keywords:

  • goodness-of-fit;
  • habitat model;
  • MAXENT;
  • point process models;
  • pseudo-absences;
  • resource selection function;
  • species distribution model;
  • use-availability design

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Implications of equivalence
  5. It's not what you use, it's how you use it
  6. Which way should you apply a given method?
  7. Discussion
  8. Acknowledgements
  9. References
  10. Supporting Information

1. The problems of analysing used-available data and presence-only data are equivalent, and this paper uses this equivalence as a platform for exploring opportunities for advancing analysis methodology.

2. We suggest some potential methodological advances in used-available analysis, made possible via lessons learnt in the presence-only literature, for example, using modern methods to improve predictive performance. We also consider the converse – potential advances in presence-only analysis inspired by used-available methodology.

3. Notwithstanding these potential advances in methodology, perhaps a greater opportunity is in advancing our thinking about how to apply a given method to a particular data set.

4. It is shown by example that strikingly different results can be achieved for a single data set by applying a given method of analysis in different ways – hence having chosen a method of analysis, the next step of working out how to apply it is critical to performance.

5. We review some key issues to consider in deciding how to apply an analysis method: apply the method in a manner that reflects the study design; consider data properties; and use diagnostic tools to assess how reasonable a given analysis is for the data at hand.


Introduction

Technological and data storage advances have led to an enormous increase in data on the location and habitat preferences of species. In this paper, we specifically focus on examples where we have data on where a species has been found, without corresponding data on where it was not found. A model for such data is most commonly fitted by comparing the environmental conditions at the presence locations with those elsewhere in the study region, sometimes referred to as ‘used-available’ data or a ‘use-availability’ design (Manly et al. 2002). A model fitted in this way is commonly referred to as a resource selection function (Boyce & McDonald 1999; Manly et al. 2002; Fig. 1).

The ‘classical’ approach to resource selection function estimation is to apply logistic regression to used-available data – the response variable takes the value one or zero depending on whether a point is used (a known presence) or available (typically randomly selected points). The resource selection function is then defined as the exponential function of the linear predictor from the fitted logistic regression model (Boyce & McDonald 1999; Manly et al. 2002; Johnson et al. 2006). The literature on resource selection functions has its origins over thirty years ago (Johnson 1980), and the seminal text on the topic was first published twenty years ago (Manly et al. 2002, first edition 1992).
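The classical recipe can be sketched in a few lines. This is only an illustration on simulated data, not the analysis used in this paper: the single covariate, the sample sizes, the helper names `fit_logistic` and `rsf`, and the choice to drop the intercept (which only rescales a relative measure) are all assumptions of the sketch.

```python
import math
import random

def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1) = expit(b0 + b1 * x) by Newton-Raphson (single covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated (hypothetical) covariate values: used points are shifted
# towards higher values relative to randomly chosen available points.
rng = random.Random(1)
used = [rng.gauss(1.0, 1.0) for _ in range(500)]
available = [rng.gauss(0.0, 1.0) for _ in range(500)]
x = used + available
y = [1] * len(used) + [0] * len(available)   # 1 = used, 0 = available

b0, b1 = fit_logistic(x, y)

def rsf(value):
    """Relative selection strength: exponential of the linear predictor (intercept dropped)."""
    return math.exp(b1 * value)
```

In this simulation the fitted slope should be close to one, so `rsf` increases with the covariate; only ratios of `rsf` values are meaningful, in keeping with the ‘relative’ interpretation discussed below.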

A related topic has recently emerged in the species distribution modelling literature, on methods for analysing presence-only data (Pearce & Boyce 2006; Elith & Leathwick 2009). Presence-only data typically consist of digitized opportunistic sightings or museum records of where a species occurs, over a broad geographic scale. By coupling these data with maps of environmental variables, we can model the spatial distribution of a species using generalizations of logistic regression (Pearce & Boyce 2006) or modern methods of classification (Phillips et al. 2006; Elith et al. 2006).

Although these approaches are called presence-only methods, they rely on environmental background layers, which act as a set of ‘pseudo-absence’ points (Pearce & Boyce 2006). Typically, the response variable in analyses is an indicator which takes the value one for presence points and zero for pseudo-absences. Presence-only analysis is a relatively recent topic in the species distribution modelling literature, which has exploded in recent times – ISI Essential Science Indicators rate it as one of the fastest-moving research fronts in the environmental sciences (accessed December 2012).

This paper is a contribution to a Journal of Animal Ecology Special Feature that arose from recognition that the two problems described above – estimating a resource selection function from used-available data and estimating a species distribution model from presence-only data – are equivalent. The equivalence of these two problems has been known for some time (Ferrier et al. 2002, for example), but despite this, the used-available and presence-only literatures appear to have developed largely in parallel with little cross-fertilization of ideas. The following section (Implications of equivalence) considers the question: how can we advance one literature by leveraging lessons learnt in the other? Then (in It's not what you use, it's how you use it), we consider more broadly the question of how to advance current practice in used-available and presence-only data modelling, in particular, the importance of thinking about how best to apply a given analysis method to the data set at hand. Finally (Which way should you apply a given method?), we review three key considerations when applying a given analysis method – study design, data properties and using goodness-of-fit tools to inform analyses.

Implications of equivalence

In this section, we will study recent trends in each of the resource selection and species distribution modelling literatures. A particular focus will be seeking opportunities where ideas developed in one literature can be transferred to advance the other.

Trends in the use-availability literature

Resource selection studies were initially aimed at understanding which resources were selected preferentially by the study species, via comparing use and availability of discrete resources (Manly et al. 2002). With the increasing use of geographic information systems (GIS) and detailed data on species distributions, the approaches have slowly moved towards explaining and predicting species distributions (Boyce & McDonald 1999), and the term ‘resource’ has been interpreted more broadly; for example, topographic variables such as aspect and elevation are often considered to represent resources (Manly et al. 2002).

There has been some conjecture in the use-availability literature on how a resource selection function should be interpreted and how to specify a valid data model for estimating a resource selection function. Manly et al. (2002) and Johnson et al. (2006) argue that a resource selection function computed from used-available data does not give probabilities of use, rather it is proportional to this probability (‘relative probabilities’). However, Keating & Cherry (2004) criticized logistic regression and the ‘classical’ exponential function approach to estimation (Boyce & McDonald 1999; Manly et al. 2002), arguing that the probabilities so calculated are not valid (because they can exceed one) and that they can be biased by ‘contaminated controls’ (Lancaster & Imbens 1996), that is, points designated as available may have actually been used. However, a valid model for use-availability designs can be derived as a weighted distribution (Johnson 1980; Johnson et al. 2006; Lele & Keim 2006), and slope parameters obtained using logistic regression approximate those of the weighted distribution approach. Aarts et al. (2012) noticed that the weighted distribution likelihood is identical to a type of Poisson point process likelihood, from which they deduced that the resource selection function actually models intensity (or density of observations) rather than a probability. This alleviates difficulties interpreting values that exceed one and perhaps clarifies what was previously (Manly et al. 2002; Johnson et al. 2006) intended by a ‘relative’ probability. Another important point to note is that both a point process approach and weighted distribution theory clarify the role of available points – they should not be viewed as true zeros, but as a numerical trick to approximate the integral of the likelihood function (Warton & Shepherd 2010).
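To make the role of the available points concrete, the general form of a Poisson point process log-likelihood with log-linear intensity can be written as below. The notation is introduced here for illustration and does not appear in the original text: s_1, ..., s_n are the used locations, A is the study region, x(s) is the vector of environmental variables at location s, and s_1*, ..., s_m* are the available (quadrature) points with weights w_j.

```latex
\ell(\beta) \;=\; \sum_{i=1}^{n} \log \lambda(s_i) \;-\; \int_{\mathcal{A}} \lambda(s)\, ds,
\qquad \log \lambda(s) = x(s)^{\top} \beta ,

\int_{\mathcal{A}} \lambda(s)\, ds \;\approx\; \sum_{j=1}^{m} w_j \, \lambda(s_j^{*}) .
```

Written this way, the available points enter only through the second line, as a device for evaluating the integral; they are not observations of absence.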

When modelling used-available data, decisions on what entails used and available have to be made. It will be seen later that similar issues arise when modelling species distributions based on presence-only data. For habitat analysis, used is most often defined as the relative time spent in different habitats, in which case it is important to consider location detection bias (Nielson et al. 2009; Frair et al. 2010). Defining available is perhaps less straightforward. First of all, not all regions in space are equally accessible, and this changes when an animal moves from one place to the next. Incorrectly defining available may have important repercussions on the fitted function; for example, Beyer et al. (2010) showed that changes in the definition of available may even lead to changes in the sign of model coefficients. Several solutions have been put forward, such as considering only the habitats in the direct vicinity of the animal (e.g. using discrete choice models – McCracken et al. 1998), sampling available points based on a null model of usage, incorporating accessibility as a covariate (Manly et al. 2002; Aarts et al. 2008), or analysing used-available data as a movement model (Moorcroft & Barnett 2008).

The influence of resource availability on the estimated resource selection function is poorly understood. Changes in the availability of resources may lead to drastic changes in the perceived preference for those resources (Mysterud & Ims 1998), even when an organism uses the same movement rule to explore and exploit space, known as functional response in habitat use (Mauritzen et al. 2003; Matthiopoulos et al. 2011). When data can be obtained on individuals or subpopulations that experience very different environmental conditions, the effect of changes in availability on resource selection can be accounted for in population-level models. Hebblewhite & Merrill (2008) proposed a mixed-effects approach to capture variations in the resource selection function and related these to variation in resource availability. Matthiopoulos et al. (2011) extended the method to directly incorporate as covariates the available amount of each resource to each individual at each sampling time.

A theme in the original text on resource selection function estimation (Manly et al. 2002) was that of model-based vs. design-based inference. That is, one can test hypotheses and assess uncertainty in parameter estimates either using a model-based approach that assumes the model is correct (e.g. using a Z-test of a model coefficient) or using a design-based approach that makes use of independent sampling units in the study design to make valid inferences despite possible model misspecification (e.g. bootstrapping individuals to estimate the sampling error of a model coefficient). Boyce et al. (2002) and Manly et al. (2002) specifically discussed the jackknife and bootstrap, design-based methods which can be used to make inferences about resource selection functions that are robust to model misspecification. While design-based inference is on occasion implemented in the use-availability literature (e.g. Matthiopoulos et al. 2011), such methods are especially rare in presence-only analysis.
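The idea of bootstrapping individuals can be sketched as follows. The data, the function name `bootstrap_se` and the choice of statistic (a simple mean) are hypothetical and serve only to show the resampling scheme: whole individuals, rather than single locations, are resampled with replacement.

```python
import math
import random
import statistics

def bootstrap_se(groups, statistic, n_boot=500, seed=0):
    """Bootstrap standard error of `statistic`, resampling whole individuals."""
    rng = random.Random(seed)
    ids = list(groups)
    estimates = []
    for _ in range(n_boot):
        # Draw as many individuals as were sampled, with replacement
        sample_ids = [rng.choice(ids) for _ in ids]
        pooled = [v for i in sample_ids for v in groups[i]]
        estimates.append(statistic(pooled))
    return statistics.stdev(estimates)

# Hypothetical covariate values at the used points of three animals
tracks = {
    "a": [1.0, 1.2, 0.9, 1.1],
    "b": [2.0, 2.1, 1.9, 2.2],
    "c": [3.0, 2.8, 3.1, 2.9],
}

se_individuals = bootstrap_se(tracks, statistics.mean)

# Naive standard error that wrongly treats every location as independent
all_values = [v for values in tracks.values() for v in values]
naive_se = statistics.stdev(all_values) / math.sqrt(len(all_values))
```

Because locations from the same animal are similar, the individual-level bootstrap gives a larger standard error than the naive calculation in this toy example. In practice the same scheme would be applied to a model coefficient, refitting the model on each resample.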

Implications for presence-only analysis

By taking the above trends and reframing them in terms of species distribution modelling, we can develop ‘new’ ideas for ways to advance current practice in presence-only modelling. Below is a summary of some subsequent opportunities for advancing presence-only analysis.

Accounting for heterogeneous sampling and detection bias in presence-only records

The problem of accounting for detection bias has received some recent attention (Phillips et al. 2009; Dorazio 2012) in presence-only modelling, but ideas from the use-availability literature are worthy of consideration, for example modelling such bias via additional covariates (Manly et al. 2002; Aarts et al. 2008).

Methods for studying drivers of change in species distribution

A key concern in species distribution modelling is predicting the response of species distribution to a changing environment, whether such change is driven by habitat fragmentation, climate change or some other cause (Elith & Leathwick 2009). Fitting a species distribution model in a changing environment corresponds directly to the problem of estimating a resource selection function under changing resource availability (Matthiopoulos et al. 2011). The ideal approach would be to model species presence-only data that arose in a range of environmental conditions and to model it as a function, not just of environmental variables, but also as a function of the availability of these environments. Location-only data are sometimes collected under a broad variety of environmental conditions, for example, when the distribution of a species has been fragmented into several spatially isolated populations, or when long-term data give information on species distribution under a range of different sets of climatic conditions. Such circumstances present an opportunity to directly study the effect on species distribution of changes in availability of different environments, using the methods of Matthiopoulos et al. (2011).

Design-based inference for species distribution models

While the focus in species distribution modelling has been largely on obtaining accurate predictions of species distribution, design-based tools (Manly et al. 2002) offer the opportunity to assess uncertainty in predictions, in model parameters and in differences in predictive performance between competing analysis methods. For example, the block bootstrap (Lahiri 2003) has potential – it can be applied to large spatial blocks to make approximately valid inferences despite spatial autocorrelation (assuming weaker dependence between observations separated by larger distances).

Trends in the presence-only literature

Presence-only analysis is a relatively new and fast-moving literature, with rapid development and uptake of new analysis methods (for example Phillips et al. 2006, cited over 2000 times within 6 years). In comparisons of the predictive performance of different analysis methods (such as Elith et al. 2006), the more successful methods tend to be those with some form of regularization (Phillips et al. 2006; Reineking & Schröder 2006), community-level approaches (Ferrier & Guisan 2006; Elith & Leathwick 2007; Ovaskainen & Soininen 2011) or ensemble approaches where predictions are averaged across competing models (Araújo & New 2007; Elith et al. 2008). Having said this, it must be emphasized that the results of methodological comparisons depend heavily on how the comparisons are made, and if not designed carefully, comparisons can be biased towards more complex models (Wenger & Olden 2012; Hijmans 2012).

Such a range of different methods has been proposed that it is an ongoing challenge to understand the properties of different methods and their interrelationships. Indeed, there have been calls for conceptual unification (Elith & Leathwick 2009; Aarts et al. 2012). Particular points at issue have been the lack of a formal model framework for presence-only data, and the question of how to select pseudo-absences, a problem known to have important implications for model outcomes (Chefaoui & Lobo 2008, for example) and which tends to be investigated via an ad hoc simulation perspective (Barbet-Massin et al. 2012, for example) as opposed to via a more systematic approach. This issue is very much related to the challenge of defining available points in used-available analysis.

We think that a significant step in addressing the above issues is the proposal of point process models (Warton & Shepherd 2010; Baddeley et al. 2010; Aarts et al. 2012) as a natural statistical framework for modelling presence-only data, and the demonstration that in the Poisson case this is equivalent to MAXENT (Renner & Warton 2013), some used-available methods (Aarts et al. 2012) and (approximately) logistic regression on randomly chosen pseudo-absences (Warton & Shepherd 2010). These results are an important step forward for two reasons. Firstly, point process models offer a framework for choosing the number and location of pseudo-absences – treating pseudo-absences not as false absences but instead as a mathematical construct which we use to help estimate the likelihood function (via numerical integration, Warton & Shepherd 2010). Secondly, point process models come with a set of tools for inference about parameters and for assessing goodness-of-fit (Diggle 2003; Cressie 1993). MAXENT and related methods have previously lacked such tools, but their equivalence with point process models implies that point process methods can be applied directly.
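One practical consequence of viewing pseudo-absences as quadrature points is that their number can be judged by whether the numerical integral has converged. The sketch below illustrates this on a deliberately simple, hypothetical example: a one-dimensional region [0, 1], an assumed intensity exp(beta0 + beta1 * s) and a midpoint rule with equal weights. None of these choices come from the analyses in this paper.

```python
import math

def quadrature_integral(intensity, n_points):
    """Midpoint-rule approximation of the integral of `intensity` over [0, 1]."""
    weight = 1.0 / n_points                      # equal quadrature weights
    return sum(weight * intensity((j + 0.5) * weight) for j in range(n_points))

# Hypothetical log-linear intensity along a one-dimensional gradient
beta0, beta1 = 2.0, 3.0

def intensity(s):
    return math.exp(beta0 + beta1 * s)

# Closed-form integral, available here only because the example is so simple
exact = math.exp(beta0) * (math.exp(beta1) - 1.0) / beta1

# Relative error of the approximation for increasing numbers of points
relative_error = {
    m: abs(quadrature_integral(intensity, m) - exact) / exact
    for m in (10, 100, 1000)
}
```

The relative error shrinks as points are added. With real data the exact integral is unknown, so one would instead increase the number of points until the fitted likelihood or the parameter estimates stabilize.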

Implications for used-available analysis

As before, we can take the above trends and reframe them in terms of resource selection function estimation, in order to identify some potential ways to advance current practice in used-available modelling.

Modern methods of estimating resource selection functions

There is the potential to improve predictive performance in used-available analyses, using modern tools that have had success in the presence-only literature. Particular examples include methods involving regularization (such as the LASSO, Phillips et al. 2006; Reineking & Schröder 2006), or some form of model averaging (Araújo & New 2007; Elith et al. 2008). Such ideas are explored elsewhere in this special issue.

Community-level used-available models

In cases where there are used-available data for multiple species in the same study region, one might be able to ‘borrow strength’ across species to improve predictions, in much the same way as community-level species distribution models (Ferrier & Guisan 2006; Ovaskainen & Soininen 2011).

A point process framework for used-available analysis

Point process models, as a valid approach to modelling any form of point event data (Cressie 1993), have considerable potential to assist in constructing readily interpretable models and can inform choice of the number and location of available points (Warton & Shepherd 2010; Aarts et al. 2012).

It's not what you use, it's how you use it

The above suggestions, and more broadly, any advance in methods for analysing used-available or presence-only data, have the potential to improve predictive performance. It is well understood that applying different analysis methods to a data set can give very different results and very different predictive performance (Elith et al. 2006). However, it should also be understood that when different methods are applied in a similar way, differences in performance are typically more modest, and in some instances, seemingly different analysis methods can lead to near-identical results.

These ideas are illustrated on a bighorn sheep data set (T. Ryder, Wyoming Game and Fish Department, unpublished data). Five bighorn sheep (Ovis canadensis) were GPS-collared, and their location (in the Seminoe Mountains, Wyoming, USA) recorded hourly between 1 January and 15 April 2010 (Fig. 1). Maps of five environmental variables were also available over the whole study region, giving information on topography (aspect, slope, elevation) and exposure (distance from nearest tree, and distance from nearest ‘escape terrain’). It was of interest to estimate the resource selection function characterizing the behaviour of these five sheep and estimate the relative importance of the five environmental variables to the sheep.

We applied three quite different analysis methods to the bighorn sheep data – logistic regression (Warton & Shepherd 2010), maximum entropy (MAXENT, Phillips et al. 2006) and multivariate adaptive regression splines (MARS, Elith & Leathwick 2007). These three methods were applied in similar ways – using the same random set of 2617 pseudo-absences and the same five environmental variables, included as linear, quadratic and interaction terms. Results were very similar across methods, as seen from inspection of maps of predicted values (Fig. 2) or from consideration of the relative importance of different environmental variables (Table 1).

Table 1. Different methods can give not-so-different results: the relative importance of different environmental variables (reported as % of explained deviance estimated via a leave-one-out approach) in (a) Logistic regression; (b) MAXENT; (c) MARS; when modelling the bighorn sheep data as in Fig. 2. Note that results are broadly similar across models; for example, the rank order of the five environmental variables is unchanged across the three models

Variable              (a) Logistic regression   (b) MAXENT   (c) MARS
Aspect                7·7                       8·3          12·8
Distance to escape    20·7                      13·9         14·5
Slope                 0·9                       0·7          0·1
Elevation             34·4                      49·7         41·9
Distance to tree      31·5                      18·2         18·0

However, precisely how a method is applied to data can have dramatic effects on results and predictive performance. For example, consider analyses of the bighorn sheep data using the same method, a Poisson point process model, with the same five environmental variables (as listed in Table 1). However, we have applied this analysis method in three different ways:

  1. As a static model – using the five environmental variables only, and not accounting for the time-sequencing of the data in any way.
  2. As a movement model (described below) – using raw data without transformation.
  3. As a movement model – using transformed data, where appropriate.

The movement model was fitted using the same modelling framework as the static model, but it additionally included three ‘movement variables’ in analyses, and pseudo-absences or ‘quadrature points’ (Warton & Shepherd 2010) were chosen in a different way. The three additional movement variables were a function of a sheep's last known sighting (distance from last location, direction of movement, time-of-day). By including these terms, the interpretation of model output changes – instead of modelling where a sheep is, we are modelling where a sheep will go next (given where it last was and when it was there). Pseudo-absences were chosen in the neighbourhood of a sheep's current location (similar to Forester et al. 2009), whereas for the static model, a regularly-spaced 30×30 m grid consisting of 78 182 pseudo-absences was used (Warton & Shepherd 2010). Further details are included in Appendix S1.
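The construction of movement variables and local pseudo-absences can be sketched as follows. The exact definitions used for the sheep analysis are given in Appendix S1 and are not reproduced here, so everything in this sketch is an assumption: planar coordinates in metres, bearing measured clockwise from north, and pseudo-absences drawn uniformly within a fixed radius of the previous location.

```python
import math
import random

def movement_covariates(prev, curr):
    """Distance and bearing (degrees clockwise from north) of the step prev -> curr."""
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance, bearing

def local_pseudo_absences(prev, radius, n, rng):
    """Sample n points uniformly within `radius` of the previous location."""
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())     # sqrt gives uniform density over the disc
        theta = 2.0 * math.pi * rng.random()
        points.append((prev[0] + r * math.cos(theta), prev[1] + r * math.sin(theta)))
    return points

# Hypothetical pair of consecutive hourly fixes (easting, northing in metres)
previous_fix, current_fix = (0.0, 0.0), (3.0, 4.0)
step_distance, step_bearing = movement_covariates(previous_fix, current_fix)

# Candidate locations the animal could have moved to instead
rng = random.Random(0)
candidates = local_pseudo_absences(previous_fix, radius=500.0, n=50, rng=rng)
candidate_distances = [movement_covariates(previous_fix, p)[0] for p in candidates]
```

Each used point and each candidate point then carries the same movement covariates, measured relative to the previous fix, alongside the environmental variables at that point.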

When a point process model was applied to the bighorn sheep data in the above three ways, results differed substantially, both in terms of the appearance of maps of predicted sheep intensity (Fig. 3) and the relative importance of different variables (Table 2).

Table 2. Changing how you apply a method can dramatically affect results. The relative importance of different explanatory variables (reported as % of explained deviance estimated via a leave-one-out approach) when using a (a) static model; (b) movement model on raw data; (c) movement model on suitably transformed data. BIC of the fitted models is also included as a measure of goodness-of-fit

Variable                        (a) Static model   (b) Movement model, raw data   (c) Movement model, transformed data
Static variables
  Aspect                        32·5               0·29                           0·21
  Distance to escape            1·5                0·04                           0·02
  Slope                         0·1                0·03                           0·04
  Elevation                     3·1                0·04                           0·04
  Distance to tree              0·5                0·02                           0·01
Movement variables
  Distance moved                -                  78·3                           76·3
  Direction of movement         -                  0·05                           0·15
  Distance moved × Time of day  -                  0·96                           3·85
BIC                             657 713            500 261                        496 544

While we can advance the methodology for presence-only and used-available analysis, and in so doing make some performance gains, there is clearly significant potential to make gains by advancing our thinking about how to use a given analysis method. This potential is evident in comparing results in Figs 2 and 3 or Tables 1 and 2 – while changing the analysis methodology had some effects on results, changing how a given method was applied had a more substantial effect. This raises the question: once you have chosen an analysis method, how should you decide how to apply it?

Which way should you apply a given method?

Given that the way a method is applied substantially influences results, it is important to consider carefully how to apply any chosen analysis method to the data at hand. A key consideration when analysing data in ecology is that the assumptions and approach make ecological sense (Austin 2002), a consideration which has implications for the choice of variables for analysis, and their form of inclusion in the model. In addition, there are some important statistical considerations:

  • Study design – match the analysis method to the method by which the data were collected.
  • Data properties – study the properties of data to be analysed and ensure variables are analysed on the appropriate scale.
  • Goodness-of-fit – apply diagnostic tools to assess how well the given method of analysis fits the data.

These points are illustrated by example below, using the bighorn sheep data.

Study design

How data are analysed needs to be directly related to how they were collected. What were the independent sampling units (subjects) that were sampled? How were data collected on these subjects? See Cressie et al. (2009) for some additional considerations, including analysis when subjects had unequal probabilities of being sampled.

In the case of the bighorn sheep data (Fig. 1), the study involved putting radiocollars onto five sheep, and using GPS to track sheep movement at hourly intervals. Hence, the five sheep were the independent sampling units, and the data that were collected are best described as movement data. The data did not arise as some ‘static’ list of points where the sheep have been located. This means that static models (as in Figs 2 and 3a) which do not take account of the time-sequencing in the data are inappropriate, due to a mismatch between the way the data were collected and the way the data are treated in analysis. A high level of temporal autocorrelation has been introduced through repeated sampling, which needs to be accounted for to validly infer the nature of the environmental association (Patterson et al. 2008).

Figure 1. Example used-available data: (a) the hourly locations of five bighorn sheep; (b) elevation (in metres), one of five measured environmental variables considered in analyses; (c) an estimated resource selection function, as an exponential function of the five environmental variables (more specifically, the exponential of the map produced in Fig. 3a). The question considered in this paper is how to improve on current practice in how to produce a model as in (c).

A more natural model for the bighorn sheep data aims to predict future sheep locations not only as a function of environmental variables, but also as a function of previous location(s) as in Fig. 3b,c. This changes the interpretation of the model from a static model, of where the sheep is standing, to a movement model, of where the sheep is going. Hence, we were able to directly model the resource selection decisions that a sheep was making. Distance from last known location proved to be by far the most important predictor of where a sheep was next seen, accounting for over 75% of explained deviance in the movement models (Table 2b,c). This is not a particularly surprising result – obviously the best place to look for a sheep is where you last saw it! However, given how important previous location was in predicting a sheep's future location, it was important to incorporate this information into analysis.

Movement might be expected to show some diurnal variation, so because sheep were tracked at hourly intervals, time-of-day should also be incorporated into the model. In fact, analyses suggested that time-of-day (and its interaction with distance moved) was the second most important variable in the model. Further inspection suggested that sheep were most active at night and least active early in the morning.

The identities of the five different sheep were not used in the analyses of Fig. 2, and incorporating that knowledge could further improve models. One way to make use of sheep identity is to make design-based inferences (Manly et al. 2002) about predictive performance. For example, we can use a leave-one-out approach to consider how well a model predicts the movement of sheep i, when the model was constructed using all data except that for sheep i. Such use of independent ‘test data’ to assess predictive performance is an important idea in model validation (Boyce et al. 2002), and using different sheep as the test data, we can assess how well the model transfers from one sheep to the next.
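A leave-one-animal-out scheme of this kind can be sketched as below. The records, the function name `leave_one_out_splits` and the stand-in ‘model’ (simply the training mean) are hypothetical; in a real analysis the model of interest would be refitted on each training set and scored on the held-out animal.

```python
def leave_one_out_splits(records):
    """Yield (held_out_id, training_values, test_values), one animal held out at a time."""
    ids = sorted({animal for animal, _ in records})
    for held_out in ids:
        train = [value for animal, value in records if animal != held_out]
        test = [value for animal, value in records if animal == held_out]
        yield held_out, train, test

# Hypothetical (animal id, response) records
records = [
    ("s1", 1.0), ("s1", 1.2),
    ("s2", 2.0), ("s2", 2.2),
    ("s3", 3.0), ("s3", 3.2),
]

prediction_error = {}
for held_out, train, test in leave_one_out_splits(records):
    prediction = sum(train) / len(train)      # stand-in model: the training mean
    prediction_error[held_out] = sum((v - prediction) ** 2 for v in test) / len(test)
```

Comparing the entries of `prediction_error` shows how well a model built on the other animals transfers to each one; large differences between animals would suggest individual variation that the model does not capture.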

Figure 2. Different methods can give not-so-different results: predicted bighorn sheep distribution (on the scale of the linear predictor) for (a) logistic regression; (b) MAXENT; (c) MARS; all models were fitted to the bighorn sheep data as quadratic functions of the same set of environmental variables, using the same set of 2617 random pseudo-absences. Redder (darker) regions indicate higher sheep density.

Data properties

Some variables should not be analysed in their raw form, but instead routinely require transformation. A common example in biology is size variables, which tend to be the outcome of multiplicative processes and hence are quite naturally interpreted on a logarithmic scale (Kerkhoff & Enquist 2009). A different example, encountered in the bighorn sheep data set, is circular variables (Fisher 1993). Aspect is a circular variable measured from 0 to 360 degrees, with 0 and 360 both meaning the same thing – a due north aspect. Hence, this variable makes little sense when analysed on an arithmetic scale (Fig. 4a), but should be transformed in some way to reflect its circularity. A second circular variable in the sheep data set is time-of-day, which is circular in time (0 to 24 h).

A simple way to modify a circular variable for regression analyses is to include its sine- and cosine-transformations as predictors – $\sin(2\pi x/K)$ and $\cos(2\pi x/K)$, where x is the circular variable and K is its periodicity (K = 360 and K = 24, respectively, for aspect measured in degrees and time-of-day measured in hours). The sine and cosine functions map a circular variable onto the unit circle such that it can be interpreted as a directional quantity (Fisher 1993). Figure 4b plots the aspects at which sheep were located, with ‘jittering’, such that a high density of points suggests an aspect highly favoured by sheep. Contrary to Fig. 4a, a pattern can be seen – the sheep tend to favour southerly aspects.
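The sine and cosine transformation of a circular variable can be sketched in a few lines. The function name `circular_terms` is introduced here for illustration; the periods 360 and 24 are those stated in the text for aspect and time-of-day.

```python
import math

def circular_terms(x, period):
    """Return (sin, cos) of a circular variable x with the given period."""
    angle = 2.0 * math.pi * x / period
    return math.sin(angle), math.cos(angle)

# Aspect in degrees: 0 and 360 both mean due north and map to the same point
north_a = circular_terms(0.0, 360.0)
north_b = circular_terms(360.0, 360.0)
south = circular_terms(180.0, 360.0)

# Time-of-day in hours: midnight written as 0 or as 24 is the same time
midnight_a = circular_terms(0.0, 24.0)
midnight_b = circular_terms(24.0, 24.0)
```

Both terms are then included as predictors in place of the raw variable. With this convention the cosine term for aspect is +1 for due north and -1 for due south, so a preference for southerly aspects would appear as a negative coefficient on the cosine term.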

With reference to the point process models introduced previously (Fig. 3), model (b) was fitted using aspect and time-of-day without transformation. On more careful consideration of data properties, both variables should have been sine- and cosine-transformed, as in model (c). The implications of these changes for the results were relatively modest: the maps of predicted sheep intensity were broadly similar in Fig. 3b,c. In Table 2b,c, the main gains seemed to come from treating time-of-day as a circular variable, which captured an additional 3% of deviance.

Goodness-of-fit

The precise diagnostic tools that can be used to assess goodness-of-fit depend on the method of data analysis. Generalized linear models, for example, can typically be diagnosed using residual plots (Dunn & Smyth 1996) and information criteria. A conspicuous shortcoming of some machine learning methods is the lack of diagnostic tools – having fitted a support vector machine (Hastie et al. 2009) to data, for example, what model assumptions were made, and how can we check that they were reasonable?

A range of diagnostic tools have been developed for point process models (Baddeley et al. 2000; Diggle 2003; Baddeley & Turner 2005). We can use information criteria: for example, applying BIC to the three point process models of Table 2 suggests that model (c) is the most appropriate. Graphical tools can also be applied, for example K-functions (Baddeley et al. 2000; Diggle 2003). For Fig. 5, the cumulative conditional intensity of latitude, Λ(y) (Cressie 1993), was calculated separately for each sheep using a leave-one-out approach, then plotted as a function of time. If the model fitted were valid, values of Λ(y) at points where sheep were observed would be approximately uniformly distributed, and importantly, they would be independent of time (or any other variable). This is evidently not the case for the static model (Fig. 5a), where each sheep's location in a north-south direction ‘drifts’, because of the dependence between a sheep's current location and its most recent known location. Models (b) and (c) account for this dependence sufficiently well that spatial dependence is no longer detectable (Fig. 5b,c).
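The two properties described above (uniformity, and independence of time) can also be summarised numerically. The sketch below is a possible companion to the graphical check, not the procedure used to produce Fig. 5. It assumes the Λ(y) values have been rescaled to lie between 0 and 1 and are listed in time order, and the helper names `ks_uniform` and `lag1_autocorr` are our own.

```python
def ks_uniform(values):
    """Kolmogorov-Smirnov distance between the values and Uniform(0, 1)."""
    u = sorted(values)
    n = len(u)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))


def lag1_autocorr(series):
    """Lag-one autocorrelation of a series listed in time order."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom


# Hypothetical rescaled values for one animal that drift steadily over time.
drifting = [i / 20 + 0.025 for i in range(20)]
ks = ks_uniform(drifting)        # small: the values look uniform overall
rho = lag1_autocorr(drifting)    # large: but they depend strongly on time
```

The example illustrates why the plot against time matters: a drifting series can look perfectly uniform when time is ignored, so a check of uniformity alone would not reveal the lack-of-fit.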

Discussion

The equivalence of the problems of analysing used-available and presence-only data presents some opportunities to advance on current practice in each discipline, by leveraging ideas developed in their corresponding contexts. Several ideas on this front have been suggested in this paper, by noticing ideas developed in one literature that have seemingly been under-developed in the other – some particularly interesting examples include the application of modern analysis methods (Elith et al. 2006) in used-available analysis, and the development of presence-only models of functional response to environment (Matthiopoulos et al. 2011) as a means to study the potential effects of a changing environment on species distribution.

While there are some ideas that have been developed in one literature and not the other, some insight can also be gained by studying commonalities. An important common theme in the used-available and location-only literatures is that of how to select available points or pseudo-absences to include in analyses (Chefaoui & Lobo 2008; Forester et al. 2009; Barbet-Massin et al. 2012). The significant potential influence of this decision is evident in the analyses of Tables 1 and 2, where method of pseudo-absence choice seemed the most striking source of differences in results. In the models of Table 1, a random set of 2617 pseudo-absences was analysed. This gave completely different results to Table 2a, which analysed 78 182 pseudo-absences in a uniform rectangular grid at a fine spatial resolution. The latter is a natural and effective sampling scheme for a static model (Warton & Shepherd 2010), and the substantial differences in results suggest that for this data set, 2617 random pseudo-absences was grossly insufficient. Results were different again in Table 2b,c, where for each presence point, a set of pseudo-absences was chosen in a radial design around the last known location (similar to Forester et al. 2009). On the question of precisely how to choose the number and location of pseudo-absences, point process models (Warton & Shepherd 2010) and animal movement models (Moorcroft & Barnett 2008; Forester et al. 2009) seem to have particular potential – when the role of the pseudo-absences is implicit in the modelling framework, there is no need to make ad hoc decisions to specify their number and location.
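To make the contrast between designs concrete, the sketch below generates pseudo-absences on a regular rectangular grid and in a radial pattern around a last known location. It is illustrative only: the function names, spacing, radii and number of angles are hypothetical and are not the settings behind Tables 1 and 2.

```python
import math


def grid_points(xmin, xmax, ymin, ymax, spacing):
    """Regular rectangular grid of points placed at cell centres."""
    nx = int((xmax - xmin) / spacing)
    ny = int((ymax - ymin) / spacing)
    return [
        (xmin + (i + 0.5) * spacing, ymin + (j + 0.5) * spacing)
        for i in range(nx)
        for j in range(ny)
    ]


def radial_points(centre, radii, n_angles):
    """Points on concentric circles around a last known location."""
    cx, cy = centre
    return [
        (cx + r * math.cos(2 * math.pi * k / n_angles),
         cy + r * math.sin(2 * math.pi * k / n_angles))
        for r in radii
        for k in range(n_angles)
    ]


# Static design: one grid covering the whole (hypothetical) study region.
grid = grid_points(0, 10, 0, 6, 2)
# Movement design: a fresh set of points for each presence, centred on
# the animal's previous fix.
radial = radial_points((1.0, 1.0), [1.0, 2.0], 8)
```

A finer `spacing` increases the number of grid points rapidly, which is one way to see how a fine grid can yield tens of thousands of pseudo-absences where a random sample yields only a few thousand.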

Although not the focus of this paper, it should be noted that there is a vast and growing body of literature on methods for modelling animal movement (Patterson et al. 2008; Moorcroft & Barnett 2008, for example), suitable for data such as the bighorn sheep example considered here. The Poisson point process approach considered in Fig. 3 was not a typical animal movement modelling approach, rather it was an ‘omnibus’ approach for analysing point patterns adapted to the problem of modelling movement. However, the approach was sufficiently flexible that it could construct quite reasonable regression models of presence-only data in either the static or movement context, to illustrate some key ideas.

The key idea demonstrated in Table 2 and Fig. 3 was that the most important consideration in analysis is perhaps not which analysis method to use, but how to apply any given method in a manner that is appropriate for the data at hand. Both the used-available and presence-only literatures are rife with papers proposing advances in analysis methodology (for example, Phillips et al. 2006; Elith & Leathwick 2007; Elith et al. 2008; Matthiopoulos et al. 2011), but less attention tends to be paid to the perhaps more important question of how to apply a method appropriately. Key statistical considerations are as follows: analyse data in a manner which reflects the study design; consider data properties; use diagnostic tools to assess how reasonable a given analysis is for the data at hand. Yet, some methods of analysis lack the flexibility to handle different study designs (e.g. incorporating animal movement), and some are seriously deficient in diagnostic tools for assessing goodness-of-fit. Perhaps, this is where the greatest gains can be made in advancing methods for used-available and presence-only analysis.

Figure 3. Changing how you apply a method can dramatically affect results, as demonstrated here when predicting intensity of bighorn sheep (on the log-scale) using a Poisson point process model fitted as a: (a) static model; (b) movement model on raw data; (c) movement model on suitably transformed data. Redder (darker) regions indicate higher density.

Figure 4. Aspect is a circular variable that should be transformed for analysis. Rather than being treated as quantitative as in (a), values should be sine- and cosine-transformed prior to analysis, to map them onto a circle as in (b), which plots the aspect at all locations where a bighorn sheep was recorded as present. Note that (b) shows that sheep are found especially often on slopes with a southerly aspect.

Figure 5. Goodness-of-fit of the different models to the bighorn sheep data. Cumulative intensity of latitude, Λ(y), is presented for a point process model fitted as a (a) static model; (b) movement model on raw data; (c) movement model on suitably transformed data. Λ(y) is plotted against time using a different colour for each of the five sheep. A model which fits the data well would have a cumulative intensity that takes uniformly random values which are independent over time – the ‘drift’ observed in (a) is clear evidence of lack-of-fit, suggesting that we need to use a movement model which predicts sheep location as a function of last known location (b) and (c).

Acknowledgements

Thanks to Wayne Thogmartin, Lyman McDonald, Bryan Manly and Falk Huettmann for the invitation to present a paper in The Wildlife Society Symposium ‘Location-Only and Use-Availability Data: Analysis Methods Converge’, Waikoloa, Hawaii, November 5–10, 2010. Thanks to the associate editor, Lyman McDonald and an anonymous reviewer for helpful suggestions. DIW was supported by the Australian Research Council Discovery Projects Scheme (DP0985886). Ian Renner provided helpful R code for figures.

Supporting Information

Filename: jane12071-sup-0001-Appendix.pdf
Format: PDF document
Size: 139K
Description: Appendix S1. This appendix gives further details on the data analyses leading to Figs 2 and 3.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.