Volume 40, Issue 8
Review

Cross‐validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure

David R. Roberts (droxroberts@gmail.com; orcid.org/0000‐0002‐3437‐2422), Dept of Biometry and Environmental System Analysis, Univ. of Freiburg, Freiburg, Germany
Volker Bahn, Dept of Biological Sciences, Wright State Univ., Dayton, OH, USA
Simone Ciuti, Dept of Biometry and Environmental System Analysis, Univ. of Freiburg, Freiburg, Germany
Mark S. Boyce, Dept of Biological Sciences, Univ. of Alberta, Edmonton, AB, Canada
Jane Elith, School of BioSciences, Univ. of Melbourne, Melbourne, Australia
Gurutzeta Guillera‐Arroita, School of BioSciences, Univ. of Melbourne, Melbourne, Australia
Severin Hauenstein, Dept of Biometry and Environmental System Analysis, Univ. of Freiburg, Freiburg, Germany
José J. Lahoz‐Monfort, School of BioSciences, Univ. of Melbourne, Melbourne, Australia
Boris Schröder, Inst. of Geoecology, Div. of Landscape Ecology and Environmental Systems Analysis, Technische Univ. Braunschweig, Braunschweig, Germany, and Berlin‐Brandenburg Inst. of Advanced Biodiversity Research (BBIB), Berlin, Germany
Wilfried Thuiller, Lab. d'Écologie Alpine, Univ. Joseph Fourier, Grenoble, France
David I. Warton, School of Mathematics and Statistics and Evolution and Ecology Research Centre, The Univ. of New South Wales, Sydney, Australia
Brendan A. Wintle, School of BioSciences, Univ. of Melbourne, Melbourne, Australia
Florian Hartig, Dept of Biometry and Environmental System Analysis, Univ. of Freiburg, Freiburg, Germany, and Theoretical Ecology, Univ. of Regensburg, Regensburg, Germany
Carsten F. Dormann, Dept of Biometry and Environmental System Analysis, Univ. of Freiburg, Freiburg, Germany

First published: 08 December 2016

Abstract

Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches are increasingly accounting for such dependencies. However, when performing cross‐validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause of the poor performance of uncorrected (random) cross‐validation, often noted by modellers, is the presence of dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provide ample opportunity for overfitting with non‐causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross‐validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects, or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolation by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non‐random and blocked cross‐validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross‐validation is nearly universally more appropriate than random cross‐validation if the goal is predicting to new data or predictor space, or selecting causal predictors. We recommend that block cross‐validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.

The problem of structured data

Ecological data often show internal dependence structures: the tendency for nearby observations to be more similar than distant ones is widespread, if not pervasive. It can be found at every spatial scale from microhabitats to continents (spatial structure; Legendre 1993, Koenig 1999, Dormann et al. 2007) and within sequentially timed observations (temporal structure), such as animal telemetry data (Rooney et al. 1998, Otis and White 1999) or population size estimates (Lundberg et al. 2000, Bjørnstad and Grenfell 2001). In behavioural ecology, individuals may form groups (herds, flocks, schools, packs) with synchronised activity or movement (hierarchical structure; Wu and David 2002, Sumpter 2006). In multi‐species analyses or analyses of genetic populations, evolutionary relatedness may also lead to dependence between species (phylogenetic structure) or between populations of a species (genetic structure): those with more recent divergence will tend to be more similar than those that diverged longer ago (Harvey and Pagel 1991).

While such underlying structures in the data are not fundamentally problematic for statistical analyses, they tend to create two undesirable outcomes. First, model error, as well as neglected processes and variables connected to these structures, often leads to dependence structures in the model residuals, which violates the critical assumption of independence present in many models and methods (Legendre and Fortin 1989, Miller et al. 2007). Second, because predictor variables are often correlated with underlying dependence structures (e.g. climate with space), models may use predictors to overfit the residual dependence structure and thereby remove it, partially or completely.

The standard statistical answer to this problem is the use of appropriate parametric models that include the respective dependence structure (Table 1), such as spatial or temporal autoregressive models, mixed models, or phylogenetic least squares regressions. In principle, these models solve the problem of non‐independence and should allow the use of standard parametric methods for evaluating model fit and model selection (Dormann et al. 2007, Miller et al. 2007). In practice, however, specification errors as well as the problem of structural overfitting, described above, can lead to poor performance of these parametric model evaluations. Moreover, many popular machine‐learning methods such as random forests or neural networks cannot explicitly account for such dependence structures. For all these reasons, it is crucial to have robust nonparametric methods for validation, selection, and assessment of predictive accuracy when models are applied to ecological data with internal dependence structures.

Table 1. Guidelines for achieving reliable error estimates in consideration of modelling objectives (extrapolation vs interpolation) and cross‐validation approaches that may block in predictor space, in structure, in both, or in neither.

                                           Cross‐validation structure
                                           Random                                  Blocked
Cross‐validation    Random     Correct interpolation error            Correct interpolation error
predictor space                without random structure               with random structure
                    Blocked    Correct extrapolation error            Correct extrapolation error
                               without random structure               with random structure

Ideally, model validation, selection, and predictive error should be calculated using independent data (Araújo et al. 2005). For example, validation may be undertaken with data from different geographic regions (or spatially distinct subsets of a region), or from different time periods, such as historic species records from the recent past or from fossil records. Most commonly, however, either no such independent data exist or they do not meet assumptions of independence (Araújo et al. 2005). Further, changes in biological relationships, community structures, or evolutionary changes may affect species responses in different regions or time periods (Fielding and Bell 1997, Maguire et al. 2015). Because of these difficulties, predictive error on new data is commonly approximated by cross‐validation, in which data are (repeatedly) split into two subsets, one used for model training and the other for model testing (see Supplementary material Appendix 1 Table A1.1 for an overview of specific approaches and Table A1.2 for compiled references). This principle of data splitting is central to many of today's statistical algorithms and workflows, in particular all predictive modelling frameworks in ecology (Hastie et al. 2009). The central assumption is that training and evaluation data are independent. If they are not, error estimates will be too optimistic, and model selection will favour overly complex models.
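
To make the data‐splitting principle concrete, the following is a minimal sketch, in R, of conventional random k‐fold cross‐validation, the baseline that the remainder of this paper argues is inadequate for structured data. The data frame `dat`, its response column `y_obs`, the two covariates `env1` and `env2`, and the use of a simple linear model are all hypothetical placeholders.

    # Random k-fold cross-validation (the conventional, structure-blind baseline).
    # 'dat' is a hypothetical data frame with response 'y_obs' and covariates 'env1', 'env2'.
    set.seed(1)
    k     <- 5
    folds <- sample(rep(1:k, length.out = nrow(dat)))   # random fold labels
    rmse  <- numeric(k)
    for (i in 1:k) {
      train   <- dat[folds != i, ]
      test    <- dat[folds == i, ]
      fit     <- lm(y_obs ~ env1 + env2, data = train)  # any model could be used here
      pred    <- predict(fit, newdata = test)
      rmse[i] <- sqrt(mean((test$y_obs - pred)^2))
    }
    mean(rmse)  # optimistic when 'dat' carries spatial, temporal, or group structure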

Early in their development, statistical models were typically assessed on their fit to the training data alone (euphemistically referred to as 'resubstitution'), representing an extreme case of non‐independence of the hold‐out. Any such dependence between validation and training data will favour overfitted models (Larimore and Mehra 1985, Hawkins 2004), resulting in artificially small error estimates and thus overly optimistic estimates of model performance (Mosteller and Tukey 1977, Olden et al. 2002, Arlot and Celisse 2010). A similar situation occurs when there are dependence structures in the data. When data held out for validation are drawn from nearby in the dependence structure (e.g. close in space or time, from the same herd, etc.), the independence of the evaluation data is compromised (Dormann et al. 2007, Kuhn 2007, Hastie et al. 2009, Telford and Birks 2009, Bahn and McGill 2013), again producing overly optimistic estimates of prediction error (Mosteller and Tukey 1977, Picard and Cook 1984) and potentially leading to erroneous scientific conclusions (Kuhn 2007, Hastie et al. 2009, Telford and Birks 2009).

In other words, non‐independence of hold‐out data from the training data erroneously makes models appear more reliable than they are, enticing us to have more faith in their predictions than is actually warranted. Comparative studies of model validation for ecological applications have consistently demonstrated this (e.g. Olden et al. 2002, Reineking and Schröder 2003, Araújo et al. 2005, Veloz 2009, Lieske and Bender 2011, Roberts and Hamann 2012a, Wenger and Olden 2012, Bahn and McGill 2013). Problematically, modellers often partook (and frequently still do) in what Stone (1974, p. 111) labels "controlled division of data", wherein "the cautious statistician … sets aside a randomly [our emphasis] selected part of his sample without looking at it and then plays without inhibition with what's left, confident in the knowledge that the set‐aside data will deliver an unbiased judgment on the efficacy of his analysis". Of course, such random data splitting does not provide independent validation when a dependence structure is present and, thus, the "unbiased judgment" is compromised.

In response, statisticians have introduced a smorgasbord of cross‐validation approaches in an effort to achieve unbiased error and parameter estimates (Stone 1974, Picard and Cook 1984, Shao 1993, Kohavi 1995), many of which have been incorporated into ecological studies (Mankin et al. 1977, Verbyla and Litvaitis 1989, Power 1993, Rykiel 1996). Early solutions were leave‐n‐out cross‐validation approaches (Stone 1974, Picard and Cook 1984) that run iteratively, each time withholding a small, randomly selected subset of the data for testing. Because these approaches have also been shown to produce biased error estimates (Shao 1993, Kohavi 1995, Telford et al. 2004, Amesbury et al. 2013), further corrections have been proposed, for example incorporating distance‐based buffers around hold‐out points (Bahn, Telford and Birks 2009, Le Rest et al. 2014).
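
As an illustration of the buffered approach, here is a minimal sketch of distance‐buffered leave‐one‐out cross‐validation in the spirit of Telford and Birks (2009) and Le Rest et al. (2014). The coordinate columns `x` and `y`, the covariates, the response `y_obs`, and the buffer radius `r` are hypothetical.

    # Leave-one-out cross-validation with a distance buffer: for each
    # held-out point, all training points within radius 'r' are also dropped.
    r   <- 10                                   # buffer radius (data units)
    D   <- as.matrix(dist(dat[, c("x", "y")]))  # pairwise distances between points
    err <- numeric(nrow(dat))
    for (i in 1:nrow(dat)) {
      train  <- dat[D[i, ] > r, ]               # exclude the buffer around point i
      fit    <- lm(y_obs ~ env1 + env2, data = train)
      err[i] <- dat$y_obs[i] - predict(fit, newdata = dat[i, , drop = FALSE])
    }
    sqrt(mean(err^2))                           # buffered LOO estimate of RMSE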

A general strategy to increase independence in cross‐validation is to split data into "blocks" at some central point(s) of the dependence structure, such as in time or space. There are some examples of block cross‐validation in the ecological literature, implemented with a wide variety of stated objectives: most often for identifying non‐transferability or a general inability to extrapolate, but also for increasing independence, avoiding overfitting, providing more reliable error estimates, or selecting better predictive models (Supplementary material Appendix 1 Table A1.2–A1.3). When systematically compared with random data splits, blocked splits consistently produce larger prediction error estimates (Burman et al. 1994, Arlot and Celisse 2010, Lieske and Bender 2011, Roberts and Hamann 2012a, Wenger and Olden 2012, Bahn and McGill 2013, Radosavljevic and Anderson 2014). It should be noted, however, that few studies have explicitly demonstrated that the estimates resulting from blocked cross‐validation are indeed closer to the 'true' error that would be expected for a truly independent dataset (but see Trachsel and Telford 2016).

However, there is also reason to be cautious about reported block cross‐validation errors. While block cross‐validation addresses correlations, it can create a new validation problem: if blocking structures follow environmental gradients, blocking may hold out entire portions of the predictor space (i.e. ranges and/or combinations of predictor variables), introducing extrapolation between cross‐validation folds (Kennard and Stone 1969, Snee 1977). Consequently, when predicting to the hold‐out data, the model has to predict outside the ranges, or into new combinations, of the predictor values included in the training folds. This could occur, for example, with spatial data splits, as climatic environments tend to be geographically structured (e.g. latitudinal gradients of temperature), or with temporal splits, as some periods will not have experienced certain predictor conditions (Zurell et al. 2012). In some cases, one may make a virtue of necessity and use this to test a model's extrapolation error. In general, however, the concern remains that unwanted blocking of the environmental space could lead to an overestimation of interpolation errors.

Our objective for this article is to examine the utility and application of block cross‐validation. We review existing approaches, clarify the reasons for their use and their potential implementations, and discuss their shortcomings and challenges. We believe a better understanding of blocking and its relevance is important: blocking is currently not widely used in biogeographical studies and, when it is, the motivations for doing so are often unclear. This is a concern when so many studies now involve prediction to new times and/or places. However, we also demonstrate compelling reasons for using block cross‐validation even for model predictions to the same time and the same region. Moreover, although the majority of applications in our review come from the species distribution modelling literature, block cross‐validation has broad applicability to virtually any ecological analysis performed on structured data. We illustrate this point through simulations and case studies across a range of ecological questions. Specifically, we look at four block cross‐validation scenarios: spatial blocking, blocking by hierarchical groups, phylogenetic blocking, and blocking in predictor space. Via these analyses, we demonstrate that:

1) random cross‐validation, even with models that should correct for dependence structures, yields error estimates that are too low;

2) block cross‐validation does not merely increase error estimates but actually provides estimates that are closer to the true values;

3) blocking in structured data often restricts the predictor space, and controlling this tendency may be necessary, depending on whether interpolation or extrapolation in predictor space is the goal.

Cross‐validation with structured data

Ecological variables (observations of biota) commonly contain four types of internal structure: autocorrelation in time, autocorrelation in space, group dependence structures, and phylogenetic structure (i.e. relatedness). These can lead to two issues in statistical models: 1) non‐independence of residuals, and 2) overfitting to the dependence structure of the data. The first issue arises, for example, when a model misses a structured variable, or if it does not describe its effect on the response perfectly. Non‐independence of residuals violates a central assumption of regression models and other statistical methods, typically leading to over‐optimistic confidence intervals and incorrect p‐values (Ives and Zhu 2006). The second issue, overfitting to the dependence structure of the data, describes the phenomenon that the model may absorb structured residual variation with another predictor (e.g. time or space themselves or another covariate that correlates with them). This can mask both the first problem and the underlying model misspecification, creating an overfitted model that does not predict well to new data.

To clarify these two issues, we provide four ecological examples where both issues could emerge:

1) Temporal structure ‐ Imagine that we have annual time series data of antelope population size, which fluctuates over time but always tends to be similar to the previous year's size. Also imagine that we have annual rainfall in each relevant year as a covariate. Population size may be partly driven by rainfall but is also affected by other covariates that we have not measured and that may also be structured in time, leading to model residuals that are temporally autocorrelated (non‐independence of residuals). However, because the missing covariates are also likely to correlate with rainfall, given that they all follow a similar temporal structure, the model may attribute part of the effect of these other covariates to rainfall itself, which would result in biased parameter estimates and reduced autocorrelation of residuals (overfitting). Rainfall and demographically induced temporal autocorrelation are confounded.

2) Spatial structure ‐ Imagine the distribution of an Anolis lizard across an island archipelago. The lizards likely dispersed gradually across the individual islands from a single source, so their populations are spatially structured (i.e. data from nearby islands are likely to be more similar). If we model lizard distribution with climate, we will certainly end up with spatial autocorrelation in model errors due to the historic dispersal pattern of populations. However, this residual autocorrelation will be reduced because climate will certainly (and unintentionally) alias part of the geographic space, as environments themselves are spatially structured. Thus, even if climate were immaterial to the species, it would be used by the model, like a trend‐surface regression, to reduce residual spatial autocorrelation (overfitting). Geographic space and climate space are confounded.

3) Hierarchical or group structure ‐ Imagine we have recorded observations of hyena movements paired with tree cover classes. While hyena movement may be partly driven by tree cover, it is also driven by the movements of other hyenas in the same cackle. A habitat selection model based on tree cover will then produce residuals autocorrelated by individual animals or even by groups (non‐independence of residuals). Further, each cackle likely moves within different tree cover, particularly if the cackles tend to avoid one another (i.e. tree cover correlates with individuals or groups). Therefore, other variables that correlate with individuals or groups, and thus also with tree cover, may be partly accounted for in the model by tree cover itself (overfitting), further reducing residual autocorrelation. We are unable to separate the contribution of tree cover to the model from that of the underlying random structure that tree cover represents: tree cover is confounded with the individual or group structure.

4) Phylogenetic or genetic structure ‐ Imagine that we have drought tolerance data for a tree species, which tends to vary across several genotypes. Also imagine that we use distance from the coastline as a covariate, knowing that interior populations can survive drier conditions. Drought tolerance may be partly driven by the distance from the coast, but is also affected by other, unmeasured covariates, leading to model residuals that are autocorrelated by phylogenetic relatedness (non‐independence of residuals). It is possible that these missing covariates correlate both with coastal distance (say, if they are environmentally driven) and with genetic relatedness (especially if the species migrated post‐glacially from a single ice‐age refugium, structuring genetic relatedness in space). In this case, the model may attribute part of the effect of the unmeasured covariates to coastal distance, which would result in biased parameter estimates and reduced autocorrelation of residuals (overfitting). Coastal distance is confounded with phylogenetic dissimilarity. The same type of situation could apply when considering multiple species that are phylogenetically related.

Non‐independence of residuals may be addressed by explicitly modelling correlation structures, such as with autoregressive models (for space and/or time), hierarchical models (for describing nested structures), or phylogenetic contrasts (Ives and Zhu 2006; Table 1). For example, in parametric models, such problems are addressed by moving from simple regression models with independent random error assumptions to more complex model structures such as conditional spatial autoregressive models (CAR; Cressie 1993), integrated nested Laplace approximations (INLA; Rue et al. 2009), or geographically weighted regressions (GWR; Fotheringham et al. 2002), time series methods such as autoregressive integrated moving averages (ARIMA; Brockwell and Davis), methods that include random effects such as generalised linear mixed models (GLMM; Breslow and Clayton 1993), or phylogenetic comparative methods such as phylogenetic generalised least squares (PGLS; Grafen 1989). These model structures can account for correlations among data points, yielding unbiased estimates, at least in theory.
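
For orientation, the following R sketches show how such structures are typically declared in widely used packages (nlme, lme4, ape). All variable names (`abund`, `rain`, `year`, `lon`, `lat`, `used`, `tree`, `id`, `mass`, `latitude`, `species`, `phylo`) are hypothetical, and each call is only one of several equivalent ways to specify the structure.

    library(nlme)   # GLS with residual correlation structures
    library(lme4)   # (generalised) linear mixed models
    library(ape)    # phylogenies and phylogenetic correlation classes

    # Spatial structure: exponential correlation decaying with distance
    m_spat <- gls(abund ~ rain, data = dat,
                  correlation = corExp(form = ~ lon + lat))

    # Temporal structure: first-order autoregressive (AR1) errors
    m_temp <- gls(abund ~ rain, data = dat,
                  correlation = corAR1(form = ~ year))

    # Hierarchical structure: random intercept per individual or group
    m_mix  <- glmer(used ~ tree + (1 | id), data = dat, family = binomial)

    # Phylogenetic structure (PGLS): Brownian-motion correlation from a tree
    m_pgls <- gls(mass ~ latitude, data = spp,
                  correlation = corBrownian(phy = phylo, form = ~ species))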

Overfitting is a more insidious problem because it can easily escape detection unless cross‐validations are carefully implemented. Here, structure in the observations (e.g. space, time, groups) is explained by the model through some other, non‐causal covariate. This is particularly common in ecology because covariates themselves can be structured in the same way as the residuals (i.e. in space, time, phylogeny, etc.). Thus, covariates need not be orthogonal to the model structure, as is implicitly assumed by the methods in the previous paragraph. The resulting model predictions may perform fine as long as the correlation structure between the non‐causal and the 'true' predictors (i.e. the underlying structures) remains unchanged (Bahn and McGill 2007), but they can fail completely when predicting to novel situations. Methods that directly target residuals (e.g. spatial variograms, or regression models with structured errors; Table 1) may fail to detect this problem because overfitting may hide the structure of the residuals. This can occur even when using models that account for dependence structures, such as those discussed above (and listed in Table 1).

As a consequence of both problems, cross‐validation on random data splits (all of which will be largely consistent in underlying structure), as well as the various parametric modelling options (Table 1), will tend to underestimate predictive error (Araújo et al. 2005, Veloz 2009, Bahn and McGill 2013), leading to false confidence in model predictions. We show in our later examples that this problem persists, although to a lesser extent, even if appropriate parametric models (e.g. spatial models, mixed models, PGLS) are used. To address these issues associated with dependence structures, whether or not we know they are present, we can introduce blocking across the given correlation structure into our cross‐validations (Table 1). Different modelling objectives and different underlying data structures will necessitate different blocking approaches. When models are intended only to predict within the same spatial and temporal ranges, or for the same individuals or groups on which they were trained, and there is no desire to make causal inferences, random cross‐validation may yield fair error estimates because the dependence structure does not change. However, if the goal is to infer causal predictors, or to predict into new dependence structures (i.e. new locations, new time periods, new individuals or groups, or for new species within a phylogeny), blocking is required. Moreover, blocking can also be used to estimate errors under extrapolation in predictor space, which will be discussed in the next section.

Blocking to account for spatial and temporal autocorrelation

When validation data are randomly selected for cross‐validation from the entire spatial domain, training and validation data from nearby locations will be dependent (spatial autocorrelation). Consequently, if the objective is to project outside the spatial structure of the training data, error estimates from random cross‐validation will be overly optimistic. To address this, blocks can be designed across the spatial structure itself (i.e. in contiguous geographic space). This effectively forces testing on more spatially distant records, thus decreasing spatial dependence and reducing optimism in error estimates (Trachsel and Telford 2016). We demonstrate this via a simulation in Box 1. Temporal autocorrelation, which is functionally similar to spatial autocorrelation but in a single dimension, presents the same dependence challenges. When models are intended to predict in time, blocks may be drawn in the same manner (i.e. as blocks of contiguous time) to better ensure independence between cross‐validation folds (Burman et al. 1994, Racine 2000, Bergmeir and Benitez 2012).
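
For the temporal case, a minimal sketch of contiguous blocking looks as follows; the data frame `dat` with a `year` column, the covariates, and the linear model are hypothetical, and the essential point is only that folds are consecutive chunks of time rather than random draws.

    # Contiguous temporal blocks: consecutive years form each fold.
    k     <- 5
    folds <- cut(dat$year, breaks = k, labels = FALSE)  # k contiguous time chunks
    rmse  <- numeric(k)
    for (i in 1:k) {
      train   <- dat[folds != i, ]
      test    <- dat[folds == i, ]
      fit     <- lm(y_obs ~ env1 + env2, data = train)
      rmse[i] <- sqrt(mean((test$y_obs - predict(fit, newdata = test))^2))
    }
    mean(rmse)  # temporally blocked error estimate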

Box 1. Spatial blocking

Spatial structure in data can lead to underestimation of model prediction error when covariate predictors allow models to fit these structural patterns. In this simulation we investigate the ability of spatial blocking strategies to minimise this problem. We simulated species abundance data on a 50 × 50 grid that depended in complex ways (interactions, non‐linear combinations, limiting effects, and exclusion by disease) on four spatially autocorrelated 'environmental' variables. We modelled the data using Random Forest (Breiman 2001) and compared root mean squared errors (RMSE) among evaluation strategies. The results are based on 100 replicated landscapes (see Supplementary material Appendix 2 for details).

Evaluation strategies tested included 1) using the same data for training and evaluation (resubstitution), 2) randomly splitting data into training and test sets (random), 3) splitting the data into training and test sets blocked in space, with block sizes of 10 × 10 cells, 20 × 20 cells, and half of the grid (25 × 50 cells), and 4) leave‐one‐out cross‐validation (LOO) with spatial buffering, in which the cell held out for evaluation is buffered by a circle of cells (radius 5, 8 or 10) that are also omitted from the training set. We either used all test sites resulting from the evaluation strategies, even if they were environmentally non‐analogous to the training data (no‐analogues included), or restricted test sites to analogous ones (minimal environmental extrapolation). Cross‐validations were compared to an 'ideal' RMSE, which was estimated by producing a model for each of the 100 landscapes and predicting to the remaining 99, then averaging the errors to obtain a single RMSE per landscape.
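
The full simulation is provided in Supplementary material Appendix 2; the following is only a minimal sketch of the spatial‐blocking step (strategy 3, 10 × 10 blocks), assuming a data frame `dat` with cell coordinates `x` and `y` (1–50), four environmental columns `e1`–`e4`, and simulated abundance `abund` (all hypothetical names).

    # Assign 50 x 50 grid cells to 10 x 10 spatial blocks, then use each
    # block as a hold-out fold in turn.
    library(randomForest)
    bs    <- 10                                                 # block edge length
    block <- (ceiling(dat$x / bs) - 1) * (50 / bs) + ceiling(dat$y / bs)
    rmse  <- numeric(length(unique(block)))
    for (b in sort(unique(block))) {
      train   <- dat[block != b, ]
      test    <- dat[block == b, ]
      rf      <- randomForest(abund ~ e1 + e2 + e3 + e4, data = train)
      rmse[b] <- sqrt(mean((test$abund - predict(rf, test))^2))
    }
    mean(rmse)  # spatially blocked error estimate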

Our results show that ignoring dependence between training and test sites (resubstitution and randomly drawn folds) leads to artificially low error estimates, while block cross‐validation and the buffered LOO produce error estimates much closer to the true error as determined by predicting to new, independent data, particularly when test sites are forced to be environmentally analogous to training sites. We also find that the size of the blocks needs to be substantially larger than the range of the spatial autocorrelation in the model residuals (∼10 units) to provide a good error estimate, while for the buffered LOO a buffer size equivalent to the distance at which residual autocorrelation drops to zero suffices (Supplementary material Appendix 2 Fig. A1.1).

Complete R scripts for this simulation are provided in Supplementary material Appendix 6.

Figure B1.1. Root mean squared error (RMSE, n = 100 simulations) for the various spatially blocked cross‐validation approaches. Semi‐transparent shading and black vertical lines respectively show the 95% range and mean values of the true RMSE ('Ideal'), which is determined by predicting onto independent realisations of the simulation. Solid lines show RMSE distributions with all test locations included, while dashed lines show RMSE distributions with test locations non‐analogous to training locations removed.

Blocking to account for random effect structures

A somewhat different structure is presented by hierarchical data, such as blocked or nested experimental designs, data replicated across individuals in groups, or repeated measurements such as animal telemetry data. In these cases, data are structured by units such that observations within the same block, individual, or group will tend to be more similar and their responses more dependent. Just as with spatial or temporal autocorrelation, models parameterised on such data may fit the grouping structure itself via covariate predictors. Consequently, optimism in error estimates would be expected from random cross‐validation (which predicts only onto the same grouping structure) relative to the error expected when predicting to new individuals or groups.

We present an example in Box 2, in which we estimate resource selection functions (RSFs; Manly et al. 2002) with repeated movement observations of individual ungulates. In this case study, blocking the cross‐validation by individual animals circumvents the problem of the underlying random structure and delivers a more realistic error estimate for predictions onto new individuals. This case study also illustrates that, while including random effects in a regression approach (i.e. a mixed model) might yield unbiased model parameter estimates, it cannot offer reliable uncertainty for those estimates and, further, does not address the underestimation of predictive error in cross‐validation with random data splits.

Box 2. Blocking by individual or group

We estimated resource selection functions (RSFs) for 43 female elk Cervus elaphus monitored using satellite telemetry in Alberta, Canada. We fitted generalised linear mixed models (GLMM) with a Bernoulli response (1 = use by elk; 0 = available, i.e. random points drawn from elk home ranges), environmental covariates as predictors, and elk individual as a random intercept. Exponentiated non‐intercept parameter estimates of the GLMMs were interpreted as the relative selection strength in favour of any given predictor (Lele et al. 2013).

RSFs were evaluated using five‐fold cross‐validation as proposed by Boyce et al. (2002), based on the nonparametric correlation between RSF bins and area‐adjusted frequencies for each withheld sub‐sample of the data in turn. A model with good predictive performance has a strong positive correlation (Boyce et al. 2002). We evaluated the performance of a full resubstitution model (training data = test data) and three five‐fold cross‐validations, each with a different blocking design: 1) random, in which each elk contributed 20% of its position fixes to each fold (no blocking); 2) randomly selected individuals, in which each elk contributed 100% of its fixes to one fold and home ranges of elk belonging to different folds may overlap (blocked by individual); and 3) spatially blocked individuals, in which each elk contributed 100% of its fixes to one fold, with individuals selected in such a way that home ranges of elk belonging to different folds do not overlap (blocked by individuals that behave independently). Extended methods and complete results are provided in Supplementary material Appendix 3.
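
A minimal sketch of the fold‐assignment step for design 2 (blocked by individual) is given below; the data frame `dat` with one row per GPS fix and an animal identifier column `elk_id` is a hypothetical stand‐in for the telemetry data.

    # Blocked by individual: each elk contributes all of its fixes to one fold.
    set.seed(1)
    ids        <- unique(dat$elk_id)                        # the 43 individuals
    fold_of_id <- sample(rep(1:5, length.out = length(ids)))
    names(fold_of_id) <- ids
    dat$fold   <- fold_of_id[as.character(dat$elk_id)]      # fold label per fix
    # For contrast, the random (unblocked) design splits each individual's
    # fixes across all folds:
    dat$fold_random <- sample(rep(1:5, length.out = nrow(dat)))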

Evaluation by both resubstitution and random cross‐validation erroneously suggests outstanding model performance (Fig. B2.1a–b). In contrast, both blocked cross‐validations (by individual and by spatially blocked individuals) showed notably lower performance on average and much higher variability in Spearman rank correlations across folds (Fig. B2.1a–b). Cross‐validation blocked by random individuals resulted in a notable decrease in model evaluation performance relative to random splits across all position fixes. Blocking by spatially independent individuals resulted in no further decrease in model performance, suggesting that independence between folds was achieved at the level of individual animals (or, ecologically speaking, individuals with overlapping home ranges did not behave more similarly than any two random individuals). Parameter estimates, while consistent on average across methods, covered a wider range of values in the blocked cross‐validations, providing a more realistic measure of the uncertainty in their true values (Supplementary material Appendix 3).

Both overconfidence in the precision of parameter estimates and optimism in model validation due to non‐independent folds are prevalent in RSF studies (Supplementary material Appendix 1 Table A1.2–A1.3; but see Wiens et al. 2008, Koper and Manseau 2009, Coe et al. 2011). Block cross‐validation can help avoid such overconfidence in model performance and foster greater care in the search for sound model structures.

Complete R scripts and data for this case study are provided in Supplementary material Appendix 6.

Figure B2.1. Summaries of resource selection function (RSF) implementations incorporating several validation approaches (resubstitution, cross‐validation with random fixes, cross‐validation with randomly blocked individuals, and cross‐validation with spatially blocked individuals), showing (a) the area‐adjusted frequency of RSF score bins and (b) Spearman's rank correlations (rho) between RSF bin ranks and area‐adjusted frequencies for each cross‐validation fold.

Blocking to account for phylogenetic correlations

Species properties are often phylogenetically conserved, meaning that closely related species tend to be more similar to each other than to distant relatives. Consequently, analysing data across species can lead to phylogenetically correlated residuals, and thus to individual observations that are not independent (Felsenstein 1985). Just as in time, space, or individuals and groups, phylogenetic structure can be overfit by the model when covariates correlate with the phylogenetic structure. It has therefore become common in ecological analyses to fit regression models that include phylogenetic structure in their residuals (PGLS; Table 1; Revell 2010). To ensure independence in cross‐validation, it may also be necessary to block observations by phylogenetic distance. To our knowledge, such an approach to cross‐validation has not been undertaken in the phylogenetic literature. We demonstrate in Box 3 that it greatly improves inference for phylogenetically structured data.

Box 3. Blocking to address phylogenetic correlation

To show the use of phylogenetic blocking, we simulate a simple trait–environment relationship (body mass versus latitude) with residual variation structured by phylogenetic distance, then cross‐validate regression predictions using three approaches: random k‐fold, k‐fold blocked by phylogenetic distance, and leave‐one‐out with buffering by phylogenetic distance. We also use these cross‐validations for model selection, alongside model selection based on the AIC of a standard regression (LM) and of a generalised least squares regression (GLS).

Trait‐environment data were random environmental observations (latitudes between 0 and 90) for 50 hypothetical species with a phylogeny created by a standard birth‐death process. Body mass response was calculated from a quadratic (3 parameter) function, to which we added phylogenetically structured error by sampling from a multivariate normal distribution with phylogenetic distance as covariance (Fig. B3.1a–b). Semivariograms indicated that residual autocorrelation did not extend beyond ∼0.5 units of phylogenetic distance.

Model selection was between eight regressions of increasing complexity (i.e. increasing polynomial order), based on minimum AIC for the LM and GLS resubstitution approaches, or on minimum root mean squared error (RMSE) for the cross‐validations. Three cross‐validation approaches were considered. First, we ran 5‐ and 10‐fold cross‐validations with data assigned to folds randomly. Second, we ran blocked 5‐ and 10‐fold cross‐validations with folds defined by hierarchical clustering of the phylogenetic distances (Fig. B3.1b). Last, we implemented a phylogenetically independent leave‐one‐out cross‐validation, in which each data point was withheld in turn for model testing and the remaining data were used for model training, with the exception of data within a predetermined buffer of phylogenetic distance (0.00, 0.25, 0.50, 0.75 or 1.00 phylogenetic distance units) around the withheld point. The entire data‐building and cross‐validation process was repeated 100 times. Extended methods and complete results are provided in Supplementary material Appendix 4.
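
A minimal sketch of the blocked fold definition is given below, with a random coalescent tree standing in for the simulated birth–death phylogeny; the clustering of cophenetic (tree) distances into five folds mirrors the approach described above.

    # Define cross-validation folds by hierarchical clustering of
    # phylogenetic distances, so each fold is a set of related clades.
    library(ape)
    set.seed(1)
    tree  <- rcoal(50)                           # stand-in for the simulated phylogeny
    D     <- cophenetic(tree)                    # pairwise phylogenetic distances
    folds <- cutree(hclust(as.dist(D)), k = 5)   # 5 clade-based folds, named by tip
    # Buffered LOO variant: when a tip is held out, also drop all species
    # within phylogenetic distance 'b' of it from the training set, e.g.:
    b <- 0.50
    train_tips <- names(which(D["t1", ] > b))    # training set when tip "t1" is held out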

For model selection (Fig. B3.1c), the GLS was the best model selection tool of any approach (correct structure in 60% of the simulations), while the blocked cross‐validations and buffered leave‐one‐out approaches also performed well. The LM, the random cross‐validations, and the unbuffered leave‐one‐out were the worst (correct structure in 12–18% of simulations), more often choosing overly complex models. For error estimation, the blocked and buffered leave‐one‐out cross‐validations produced both median RMSE and ranges of RMSE that better approximated the true errors of the data‐generating model, while the LM and GLS resubstitution, the random cross‐validations, and the leave‐one‐out cross‐validations with smaller buffers gave optimistic error estimates (Fig. B3.1d). Only the five‐fold blocked cross‐validation and the leave‐one‐out with the largest buffer size (1.0) resulted in RMSE values higher than those of the true model (i.e. overly pessimistic validations).

Generally speaking, GLS reduced overfitting in model selection compared to the non‐independent approaches (i.e. LM, random cross‐validations, and leave‐one‐out cross‐validations with smaller buffer sizes). However, error estimates for GLS, while an improvement on the non‐independent approaches, were still optimistic. The block cross‐validations and leave‐one‐out cross‐validations with sufficiently large buffers provided the best combination of model selection and reliable error estimation.

Complete R scripts for this simulation are provided in Supplementary material Appendix 6.

Figure B3.1. Blocking in phylogenetic space. (a) Sample regression (grey line) showing the simulated relationship between a species trait (body mass) and the environment (latitude) with phylogenetically structured correlation in the residuals, conforming to the assumptions of a PGLS model. Shapes represent random cross‐validation folds and colours represent blocked cross‐validation folds. (b) Sample phylogenetic tree in which 25 tips (species) are assigned to cross‐validation folds randomly or assigned to folds by hypothetical clades based on phylogenetic distance. Autocorrelation in the phylogenetic structure can be visualised as simulated trait residuals structured by genetic distance in the tree. (c) Violin plot showing the distribution of model complexity (number of parameters in the selected model) for each cross‐validation approach across the simulations. The number of parameters in the true data‐generating model (3 parameters) is shaded in grey. (d) Results of the phylogenetic blocking simulation showing distributions of RMSE in trait predictions from the resubstitution, from the random and blocked cross‐validations (number of folds, k, in brackets), and from the buffered leave‐one‐out (LOO) cross‐validation (buffer size, b, in brackets). The shaded grey area represents the 5th to 95th percentile range of the true model RMSE. Horizontal bars below the plot show the 5th to 95th percentile range of the RMSE for each approach.

Disentangling blocking and extrapolation

So far, we have discussed block cross‐validation as a means of model selection (Box 3) and of calculating a corrected interpolation error in the presence of correlation structures within the data or residuals (Box 1, 2). Blocking, however, can also be used to estimate extrapolation error. Extrapolation into new predictor space is different from a change in the underlying structure of the data: the latter only changes the correlations between predictors, while the former requires the model to predict a response in an area of predictor space about which it is uninformed. Models typically show larger errors when extrapolating into such no‐analogue conditions (Pearson 2006, Elith and Graham 2009). Consequently, if our modelling goal is extrapolation, we are likely to underestimate prediction errors with standard cross‐validation approaches (Heikkinen et al. 2012). On the other hand, blocking may inadvertently restrict the predictor space available for model training, especially as data structures are often collinear with clines in predictor variables (e.g. spatial temperature clines), creating overly pessimistic error estimates when model extrapolation is of no interest. Thus, when making decisions about cross‐validation approaches, model objectives must be carefully considered (Table 1).

Avoiding extrapolation

Environments tend to be structured in space and time: climates tend to be similar in nearby locations, just as they tend to be similar in consecutive time periods. Therefore, because blocking to achieve structural independence in cross‐validation requires grouping structurally similar observations together (e.g. contiguous space or time), such blocking may also group similar predictor space together, potentially removing all such space from the remaining data. This effect is likely to be more pronounced when simple sets of predictor variables with global effects, such as climate, are used, in contrast to variables that explain distributions at finer grains and more local extents (Mackey and Lindenmayer 2001). When models are intended to interpolate (i.e. predict only to similar predictor space), blocking may induce unnecessary extrapolation in cross‐validation. While it is unlikely that this can be avoided entirely, these effects can be minimised by 1) using blocks no larger than necessary considering the grain and extent of the analysis and the spatial scale of environmental patterning, 2) using as much data as possible for model training, and 3) representing predictors equally across blocks or folds.

Extrapolation will generally increase as more data are withheld for testing (Box 4). Consequently, predictor coverage in the training data can be maximised by making blocks (or leave‐one‐out buffers) only as large as required. The minimum blocking distance should be the extent of autocorrelation in the model residuals ('autocorrelation requirements' in Fig. 2a); a sketch of estimating this extent from a variogram follows below. If correlation structures are multidimensional, anisotropic analyses of residual autocorrelation (in individual dimensions) may also allow blocks to be narrower in one direction than the other while still achieving independence. For example, a model describing spatial data that is missing a key temperature variable may produce residual autocorrelation that extends much farther in the north–south direction (the general direction of the temperature gradient) than in the east–west direction. Dividing such data into square blocks, or defining a circular buffer radius for a leave‐one‐out approach based on an isotropic autocorrelation distance, would in such a case result in unnecessarily large east–west block distances and potentially introduce unnecessary extrapolation into the cross‐validation. Portrait‐oriented rectangular blocks might better limit extrapolation.
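
One way to operationalise these 'autocorrelation requirements' is sketched below, using the gstat and sp packages: fit a variogram model to the residuals of an already fitted model `fit` and take its practical range as the minimum block size, checking directional variograms for anisotropy. The data frame `dat` with coordinates `x` and `y` is a hypothetical placeholder.

    library(sp); library(gstat)
    dat$res <- residuals(fit)                     # residuals of the fitted model
    coordinates(dat) <- ~ x + y                   # promote to a spatial object
    v  <- variogram(res ~ 1, dat)                 # empirical (omnidirectional) variogram
    vf <- fit.variogram(v, vgm(1, "Exp", 10, 1))  # exponential model with nugget
    3 * vf$range[2]                               # practical range ~ minimum block size
    # Anisotropy check: compare N-S (0 deg) and E-W (90 deg) directions
    v_dir <- variogram(res ~ 1, dat, alpha = c(0, 90))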

Box 4. Blocking for extrapolation

We examined the effect of blocking in environment on cross‐validation in a typical species distribution modelling approach for Douglas‐fir Pseudotsuga menziesii habitat in western North America. Species presence–absence records were paired with climate data from the 1961–1990 period and divided into groups for k‐fold cross‐validation using several data splitting approaches: random splits, splits in geographic space, and splits in predictor space, as well as data resubstitution (no splitting). Geographic splits were implemented with a two‐fold checkerboard pattern across spatial grids of varying size. We also implemented a buffered leave‐one‐out cross‐validation with buffer radii of 100, 500, 1000 and 1500 km (as in Box 1).

Correlograms indicated that residual autocorrelation was virtually non‐existent beyond distances of ∼220 km. Modelling was done with Random Forest (Breiman 2001) and models were evaluated on predictions to all folds using AUC. Extrapolation was quantified by 1) calculating the Euclidean distance across all principal component predictors from each point in the withheld fold back to each point in the model training data, 2) for each withheld point, calculating the first percentile of those distances, and 3) averaging the first percentile distances across all points in the withheld data. See Supplementary material Appendix 5 for detailed methods and results.
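
A minimal sketch of this extrapolation metric is given below; `env` (a matrix of the climate predictors for all records) and the logical index `hold` marking the withheld fold are hypothetical names.

    # Mean first-percentile Euclidean distance, in principal-component
    # space, from each withheld point back to the training points.
    pcs   <- prcomp(env, scale. = TRUE)$x             # PC scores for all records
    D     <- as.matrix(dist(pcs))                     # all pairwise distances
    cross <- D[hold, !hold, drop = FALSE]             # withheld vs training
    p01   <- apply(cross, 1, quantile, probs = 0.01)  # 1st percentile per point
    mean(p01)                                         # extrapolation metric for the fold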

Both spatial and environmental blocking induce extrapolation between folds (Fig. B4.1a). The largest spatial blocks (e.g. 1 × 2 blocking) and the coarsest environmental blocks (e.g. 2‐group cluster analysis or splitting only in PC1) result in both the largest environmental distances and the lowest estimates of predictive accuracy (AUC), with the effect being much stronger for spatial blocks than for purely environmental blocks (Fig. B4.1a). There is a small but visible decrease in AUC for environmental blocks relative to spatial blocks at the same geographic distance, suggesting a cumulative effect on predictive accuracy when spatial and environmental extrapolation are combined (Fig. B4.1b). The buffered leave‐one‐out approach both minimises extrapolation and increases predictive accuracy relative to other methods at similar geographic distances.

While the effect of spatial autocorrelation may account for decreasing accuracy at distances up to ∼220 km, it cannot explain the continued decrease in AUC at much larger distances. Also, given their only moderate effect on accuracy, environmental extrapolation requirements are unlikely to be the cause of this decrease. More likely, across larger spatial blocks, the underlying spatial structure changes (e.g. competition regimes, disease presence, changes in local adaptations of genetic populations, etc.). While some of this structure may be overfitted with spatially autocorrelated predictors, this overfitting is likely to break down across space, thus reducing the model's predictive accuracy in new regions (i.e. in other blocks). The overfitting also hides these effects from our measurements of spatial autocorrelation, as the correlograms and semivariograms were built using residuals from a full‐data model that could be overfitted to the complete spatial structure.

In summary, while purely environmental splits force extrapolation, they are unlikely to account for spatial autocorrelation or structural overfitting because blocks may be spatially intertwined. Further, smaller spatial blocks, even those that seemingly account for residual autocorrelation, may be insufficiently large to account for structural overfitting.

Complete R scripts and data for this case study are provided in Supplementary material Appendix 6.

Figure B4.1. Model prediction accuracy (AUC) as a function of (a) the minimum environmental distance and (b) the minimum geographic distance between training and test data for various k‐fold blocking approaches. While the relationships are drawn as linear, the theoretical minimum AUC is 0.5 (for a random model).

Figure 1. Examples of dependence structures, parametric solutions to parameter estimation, and the associated blocking approaches for cross‐validation to increase the reliability of prediction error estimates.

Figure 2. Approaches for choosing appropriate block sizes to minimise extrapolation. (a) The tradeoffs in block size selection between addressing residual autocorrelation requirements and working within the data and computational limits. (b) Sample fold assignments of hypothetical spatial blocks for cross‐validation, which result in different levels of representation of predictor space based on combinations of two environmental variables (Variable 1 and Variable 2). When feasible, all blocks may be assigned to their own fold for cross‐validation (Unique). When fewer folds than blocks are used, grouping contiguous blocks together will result in high dissimilarity between folds (Contiguous), except in very homogeneous environments. Both systematic assignment of folds (Systematic) and repeated random assignment with the final assignment based on minimum dissimilarity (Optimised random) can ensure lower dissimilarity between folds. While this figure shows a spatial example, similar approaches could be used for other correlation structures.

We state the extents of residual autocorrelation as a blocking minimum because, as explained above, overfitting via structural covariates may reduce residual autocorrelation while not offering increased independence. Therefore, larger blocks than suggested by variograms or other measures of autocorrelation may be required to avoid optimistic error estimates, though the extent of this effect is unlikely to be known by the modeller. While the potential for introducing extrapolation is higher when blocks are made conservatively large, this can be mitigated through different approaches to block fold assignments (Fig. 2b).

While the preferred number of folds in k‐fold approaches has been suggested to be between 5 and 10 (Kohavi 1995, Hastie et al. 2009), such recommendations are perhaps more appropriate for random splitting, where an ad hoc number of folds must be chosen. To include as much data as possible for model training in each cross‐validation iteration, each block should be its own fold. This, of course, maximises the required iterations of the cross‐validation, resulting in a potentially computationally expensive procedure, particularly when models are slow to fit or when data sets are very large ('computational limits' in Fig. 2a). However, while a cross‐validation with a large number of folds might be computationally intensive, there is no conceptual barrier to it so long as the validation data meet the assumption of independence. That said, with small values of k, resulting in limited model training data ('data limits' in Fig. 2a), bias in the cross‐validation folds may become problematic, so the value of k may depend strongly on the overall data quantity (Hastie et al. 2009). The recycling of training data is, of course, maximised in leave‐one‐out approaches, which are also the most computationally intensive, requiring a new model to be fitted for each point in the data.

When leave‐one‐out approaches or k‐fold approaches with numerous folds are not feasible or not desired, assignment of numerous blocks to fewer folds can be implemented in ways that ensure a greater variability of predictors is represented in each fold (Fig. 2b). While random assignment of blocks to folds might result in good representation of predictor space in all folds, other approaches can better ensure this. For example, blocks can be systematically assigned to folds in checkerboard or repeating patterns to distribute them evenly across the data (Fig. 2b, 'systematic'). A more directed approach could be to divide predictors manually between folds (e.g. manually distribute blocks with similar values of key environmental variables between folds). For more complex predictor space (i.e. more variables), random fold assignments can be repeated many times, measuring the predictor dissimilarity between folds at each iteration, then choosing the assignment with the lowest dissimilarity between folds (Fig. 2b, 'optimised random'); a sketch of this procedure follows below. This approach, of course, only ensures that the predictor space in each fold will be equally different, not necessarily different in the same way. We note that, whilst this is called 'block' cross‐validation, the implication is not that the divisions need to be rectangular. For instance, blocks might be based on the sampling units themselves (Buston and Elith 2011; Box 2), or on distinct geographic features such as river catchments (Chee and Elith 2012) or mountain ranges (Bulluck et al. 2006).
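
A minimal sketch of the 'optimised random' assignment is given below; the matrix `block_means` (one row per block, holding the mean value of each predictor within that block) is a hypothetical summary of the blocks' predictor space.

    # Repeat random block-to-fold assignments and keep the one with the
    # lowest dissimilarity between fold centroids in predictor space.
    set.seed(1)
    k <- 5
    best <- NULL; best_score <- Inf
    for (it in 1:1000) {
      f         <- sample(rep(1:k, length.out = nrow(block_means)))
      centroids <- apply(block_means, 2, function(v) tapply(v, f, mean))
      score     <- mean(dist(centroids))   # between-fold predictor dissimilarity
      if (score < best_score) { best <- f; best_score <- score }
    }
    best  # fold label for each block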

Deliberately inducing extrapolation

In practice, modellers are often not only interested in the accuracy of their model predictions within the domain for which data exists (interpolation), but also beyond this domain (extrapolation). The need for such estimates is apparent in applications such as species habitat projections under future climate change, for which the prediction data is likely to contain no‐analogue predictor space, i.e. conditions not observed within the training data (Williams and Jackson 2007). Such extrapolation requirements are relatively straightforward to identify and measure via comparisons of training and prediction data, such as by examining individual variable ranges or by measurements of multivariate distances such as the MESS maps in Maxent and related procedures (Elith et al. 2010, Zurell et al. 2012, Mesgaran et al. 2014).

The key question for model extrapolation is then not whether a model is still 'valid' when applied to new data (it almost certainly is not), but rather to what degree the violation of assumptions undermines predictive accuracy. Extrapolation errors are difficult to estimate because no data exist in the domain to which the model is predicting. In such cases, we may consider cross‐validation strategies that try to simulate model extrapolation: splitting training and testing data so that the domains of predictor combinations in the two sets do not overlap. To sensibly interpret the results, we require some measure of dissimilarity in predictor space, a metric that is not completely straightforward to quantify: models may include numerous predictors, some of which are correlated with one another and not all of which are equally important to every species. The simplest metrics of dissimilarity are comparisons of individual variable ranges (Capinha et al. 2012, Anderson 2013), which, while identifying extreme values in single variable dimensions, do not identify new arrangements of variable combinations. More comprehensive approaches involve measuring multivariate distances across standardised variables (Williams et al. 2001, Elith et al. 2006, Roberts and Hamann 2012b, Mesgaran et al. 2014) or principal components (Broennimann et al. 2012, Eiserhardt et al. 2013; Box 4), or measuring distances to multivariate convex hulls around the data clouds (Cornwell et al. 2006).

There are limited examples of cross‐validations implemented with data splits directly in predictor space (Supplementary material Appendix 1 Table A1.2), and most are a byproduct of spatial data splitting (Fløjgaard et al. 2009, Roberts and Hamann 2012a, Wenger and Olden 2012). While Newbold et al. (2015) and Stephens et al. (2016) used biome delineations to block, these too are extrapolations inferred from predefined spatial groups. Many assessments of model extrapolation fall under tests of the 'transferability' or 'generalisability' of a specific habitat model (Thomas and Bovee 1993, Leftwich et al. 1997, Schröder and Richter 2000, Chee and Elith 2012, Schibalski et al. 2014). While these studies evaluate model performance, they seldom quantify extrapolation requirements or analyse the link between declining predictive performance and the dissimilarity between training and prediction data. To date, while some ecological studies have linked decreases in predictive accuracy to measures of data dissimilarity (Capinha et al. 2012), only a few have attempted to systematically quantify such patterns (Thuiller et al. 2004, Heikkinen et al. 2012, Roberts and Hamann 2012a, Bahn and McGill 2013); as expected, all found decreased prediction accuracy with increased dissimilarity between training and testing data. In these comparative studies, extrapolation was always a byproduct of spatial blocking. Of course, such validations assume that assessments of transferability in space to different predictor space can mimic assessments of transferability in time to different predictor space (Blois et al. 2013; but see Schröder and Richter 2000).

How should blocks in predictor space be constructed?

A key challenge to blocking in predictor space for cross‐validation is to decide how folds should be defined to inform the predictive objectives of the model. The intuitive approach may be to measure the dissimilarity between training and prediction data, then define blocks in such a way that the extrapolation requirements within the cross‐validation are as similar as possible in magnitude and direction to those for the predictions. In k‐fold approaches, where every fold is used exactly once for testing, this becomes a zero‐sum game: the more closely one fold resembles the target extrapolation, the less the others do. In these cases, tests of predictions to particular folds may be more informative in the context of extrapolation than the overall error estimate across all folds.

Alternatively, cross‐validations could be run several times with spatial or other structured blocks defined in a variety of sizes and/or orientations. This approach produces a range of validation statistics from the cross‐validations rather than just a single value (Fig. 3a). From this range, it may be possible to define a limit, either in spatial block size or variable dissimilarity, at which the model no longer produces useful predictions. Such extrapolation limits, or ‘forecast horizons’, are common tools in economics (Ohlson and Zhang 1999) and meteorology (Foley et al. 2012), and have recently been considered in ecology (Petchey et al. 2015; Fig. 3b). While this approach does not require specifying the target extrapolation in advance, it is more computationally intensive, in that dissimilarity measures and cross‐validations must be computed for many blocking structures to trace the range of model performance.
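A rough sketch of this procedure, using simulated data and a simple polynomial regression as a stand‐in for a real model, repeats a five‐fold block cross‐validation across several block sizes and plots the resulting error curve:

    # error vs. block size; with very large blocks fewer folds are possible,
    # so errors are averaged over the folds that actually occur
    set.seed(1)
    n   <- 500
    dat <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))
    dat$z <- sin(dat$x / 15) + rnorm(n, sd = 0.3)   # toy spatial response
    cv_rmse <- function(size, k = 5) {
      block <- interaction(floor(dat$x / size), floor(dat$y / size), drop = TRUE)
      fold  <- sample(rep_len(1:k, nlevels(block)))[as.integer(block)]
      errs <- sapply(unique(fold), function(i) {
        fit <- lm(z ~ poly(x, 3), data = dat[fold != i, ])
        sqrt(mean((dat$z[fold == i] - predict(fit, dat[fold == i, ]))^2))
      })
      mean(errs)
    }
    sizes <- c(5, 10, 20, 25, 50)
    plot(sizes, sapply(sizes, cv_rmse), type = "b",
         xlab = "block size", ylab = "cross-validated RMSE")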

Figure 3. (a) Conceptual figure demonstrating the expected relationship between extrapolation requirements (either from dissimilarity between model training and prediction data or from blocking in cross‐validation) and model accuracy, where accuracy generally decreases while dissimilarity increases. (b) Conceptual figure of a ‘forecast horizon’ (blue dashed line), based on repeated cross‐validation with differently sized blocks, resulting from drawing a model performance threshold (dashed grey line). Because extrapolation requirements may vary at a given prediction performance threshold (e.g. across folds), the forecast horizon may include a range of values (blue shading). Extrapolations beyond the horizon would be considered too unreliable to be useful (adapted from Petchey et al. 2015).

To our knowledge, there are no examples in the literature of cross‐validations using blocks defined purely in predictor space. In Box 4, we offer a species distribution modelling case study for North American Douglas‐fir, in which we compare cross‐validations based on random splitting, spatial blocking, and environmental blocking. Our results demonstrate that blocking purely in environmental space decreases perceived model accuracy in cross‐validation, but that estimates may remain optimistic if underlying correlation structures are not addressed.
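One simple way to construct blocks purely in predictor space, though not necessarily the approach used in Box 4, is to cluster the standardised predictors and treat the clusters as folds:

    # environmental blocks via k-means on synthetic predictors
    set.seed(1)
    env  <- data.frame(temp = rnorm(300), precip = rnorm(300))
    fold <- kmeans(scale(env), centers = 5, nstart = 10)$cluster
    table(fold)  # five blocks defined purely in predictor space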

Guidance: how to block

In this section, we suggest a workflow for cross‐validation to clarify when and how to implement different data‐splitting strategies. The focus of this workflow is not on providing a fixed recipe for blocking, but rather on highlighting the questions a researcher should ask in this context. The exact answers to these questions necessarily depend on the modelling objectives, the data structure, computational capabilities, and how conservative the assessment of model forecast errors needs to be. We have discussed these implications and tradeoffs above.

Step 1. Assess dependence structures in the data

Determine the dependence structure in the raw data (temporal/spatial/phylogenetic autocorrelation using autocorrelation plots, variograms, or correlograms; quantify variance contributions in nested data using intercept‐only mixed‐effect models). This provides rough guidance on the scale of blocking (blocks at least as large as the range of autocorrelation; blocking at least at the most variable hierarchical level). It should be emphasised that, while modellers are most often concerned with autocorrelation in model residuals, dependence structures in this step are assessed on the raw data, as this is where overfitting of predictor variables may occur.
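As a minimal illustration in R with simulated data, an intercept‐only mixed model from the lme4 package quantifies the variance contribution of a grouping level, and an autocorrelation plot indicates the range of temporal dependence:

    library(lme4)
    set.seed(1)
    # hierarchical example: 20 sites, 10 observations each
    d <- data.frame(site = factor(rep(1:20, each = 10)),
                    y = rep(rnorm(20), each = 10) + rnorm(200, sd = 0.5))
    m <- lmer(y ~ 1 + (1 | site), data = d)
    print(VarCorr(m))  # site-level vs residual variance guides blocking level
    # temporal example: autocorrelation range of a raw series
    series <- arima.sim(list(ar = 0.7), n = 200)
    acf(series)        # lag where correlation dies out suggests a block length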

Step 2. Determine prediction objectives

Will the model predict into new dependence structures (spaces, times, groups, etc.), or into new predictor space, or both? While extrapolation in predictor space, time, and geographic space may be straightforward to quantify (Box 1, 4), changes in hierarchical structure may necessitate more deliberation. For example, while some determinations of what constitutes a new ‘group’ of individuals may be obvious, others may be more nuanced (e.g. herds with non‐overlapping ranges vs. individuals in the same herd that move largely independently, Box 2).

Step 3. Block according to objectives and structure

When predictions will be made into new dependence structures, blocks should be drawn so that similar structural conditions are grouped together (e.g. spatial blocks when predicting to new sites; time‐slice blocks of similar duration to the period being predicted; herds as blocks when predicting to a new herd; clades defined at the same phylogenetic branching depth as the clade being predicted to; Box 1, Box 2, Box 3 for examples). Blocks can also be designed or arranged (or fold assignments or cross‐validation methods can be chosen) to either minimise extrapolation or to emulate the extrapolation required between the training and prediction data (Box 4).
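For the herd example, a sketch of leave‐one‐group‐out blocking might look as follows (simulated data; the predictors ‘slope’ and ‘cover’ are hypothetical):

    set.seed(1)
    dat <- data.frame(herd  = rep(letters[1:6], each = 40),
                      slope = rnorm(240), cover = rnorm(240))
    dat$used <- rbinom(240, 1, plogis(0.8 * dat$slope))
    # each herd forms one block; leave one herd out per fold
    for (h in unique(dat$herd)) {
      fit <- glm(used ~ slope + cover, family = binomial,
                 data = subset(dat, herd != h))       # train on other herds
      p   <- predict(fit, subset(dat, herd == h), type = "response")
      obs <- dat$used[dat$herd == h]
      cat("herd", h, "Brier score:", round(mean((obs - p)^2), 3), "\n")
    }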

Step 4. Perform the cross‐validation

Cross‐validations may be performed for model comparison (and thus selection), for error estimation, or both. There can be many blocks within each fold of a cross‐validation. For example, if the block size of a spatial dataset is 100 × 100 km, then the entire study region of Canada can be checkerboarded and a random set of blocks assigned to each fold. Or, if subgroups of ungulates with similar movement exist within a herd, these may serve as blocks and be assigned to folds across herds.
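A minimal sketch of such a checkerboard assignment, with simulated coordinates in km:

    # carve the region into 100 x 100 km blocks and allocate blocks at
    # random among five folds
    set.seed(1)
    dat <- data.frame(x_km = runif(1000, 0, 1000),
                      y_km = runif(1000, 0, 1000))
    block <- interaction(floor(dat$x_km / 100), floor(dat$y_km / 100),
                         drop = TRUE)
    fold  <- sample(rep_len(1:5, nlevels(block)))[as.integer(block)]
    table(fold)  # each fold contains many spatially scattered blocks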

Step 5. Make ‘final’ predictions

The analyst essentially has two choices of how to determine a ‘final model’ or make ‘final predictions’. Both have distinct advantages and disadvantages:

All the available training data can be used to fit a new model with which to make a single set of final predictions (Kuhn and Johnson 2013). Because naive error estimates from such a model are invalid, the error estimates derived from the blocked cross‐validation should be used instead. This approach favours final prediction quality over perfect accuracy of error estimates. It has the advantage of using all the data and thus likely yields the best predictor, particularly for smaller datasets. It has the disadvantage that the error estimates from the cross‐validation no longer apply exactly to the predictions, as they were derived from slightly different models. However, it is safe to assume that these error estimates are conservative (i.e. the final model should perform better), so this may not be a major disadvantage.

All the individual models from the cross‐validation can be preserved and their predictions combined (Hastie et al. 2009). For example, in a k‐fold approach, k different models are fitted, each to a slightly different subset of the training data. Predictions for new data can be made with each of the k models and then averaged. This approach has the advantage of preserving the direct relationship between the models and the error estimates (i.e. the ‘final models’ are exactly those evaluated), as well as offering a variance for each prediction. On the downside, predictions are always made by models fitted with incomplete training data, compromising the sufficiency principle (i.e. that all possible information has been gleaned from the data) in much the same way bagging does.
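A sketch of this second option with a toy linear model: retain all k fold models and average their predictions on new data.

    set.seed(1)
    dat   <- data.frame(x = runif(200))
    dat$y <- 2 * dat$x + rnorm(200, sd = 0.2)
    fold  <- sample(rep_len(1:5, nrow(dat)))
    newdat <- data.frame(x = seq(0, 1, by = 0.1))
    # fit one model per fold and predict new data with each
    preds <- sapply(1:5, function(k)
      predict(lm(y ~ x, data = dat[fold != k, ]), newdata = newdat))
    rowMeans(preds)      # averaged 'final' predictions
    apply(preds, 1, sd)  # between-model spread for each prediction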

Challenges and limitations

While block cross‐validation may be helpful in situations with non‐independent data, there are several instances in which spatial, temporal, phylogenetic, or predictor‐space blocking may not be fruitful. This section aims to raise awareness of these problems and of the general limitations that prediction may face.

When data are scarce, cross‐validation approaches that require models to be trained on further‐subset data may not be feasible. Similarly, even when data are numerous but cover only small spatial or temporal ranges, achieving independence between training and test data by blocking may not be possible. For example, if spatial autocorrelation persists at distances larger than half the spatial extent of the data, independent folds cannot be achieved no matter how spatial blocks are structured. This may also be the case for animal telemetry data when individuals move as a unified group: no amount of data records from within the same group will support effective cross‐validation for predictions to new, independent individuals. Such situations are more likely in opportunistically collected data than in data from systematic surveys.

Irregular sampling may lead to data clusters in space, time, or along other correlation structures, which can make it difficult to define effective regular blocks (Fig. 4a). In such cases, the models fitted to training folds may encounter highly variable sample sizes and prevalence rates, resulting in artificially large error estimates. One solution may be to use irregularly arranged but similarly sized blocks (Fithian et al. 2015; Fig. 4b) or irregularly shaped blocks (Lieske and Bender 2011; Fig. 4c).

Figure 4. Conceptual illustrations of challenges for block cross‐validation in space. Data may be (a) highly unbalanced in distribution of samples, which can be addressed through (b) irregular spacing of blocks of consistent size and shape, or (c) irregularly shaped blocks. Data may also be (d) highly unbalanced in prevalence (number of presences versus absences), which can be addressed through (e–f) non‐gridded blocks and/or irregularly shaped blocks.

Similarly, even when sampling coverage is unbiased and regular, the prevalence of occurrences in presence–absence data may be highly unbalanced (Fig. 4d), leading to blocks entirely lacking either presences or absences (e.g. withholding the centre square in Fig. 4d for validation). Unbalanced mean values of the response can likewise make cross‐validation problematic if, for example, one tries to validate predictions using a block containing only absence locations (Fig. 4d). While this may primarily be a presence–absence design problem, similar arrangements may appear in continuous response data. For example, in analyses with linear link functions, unequal means affect only estimates of the intercept, but with non‐linear link functions (such as in count or presence–absence data) they affect all estimates.

Possible solutions for selecting evaluation blocks when data sampling is irregular or unbalanced include: 1) non‐gridded but consistently shaped blocks (e.g. pie slices, Fig. 4e); 2) stratified blocks with similar means or prevalences (Elith et al. 2008, who suggest a blocking strategy with an equal number of occupied sites per block, Bahn and McGill 2013); 3) grouped sets of blocks (e.g. checkerboards; Box 1, 4) to ensure coverage of both presences and absences; and 4) buffered leave‐one‐out approaches (Bahn and McGill 2013, Telford and Birks 2009, Le Rest et al. 2014; Box 1, 4). It should be noted, however, that non‐regular spatial block shapes and arrangements may address autocorrelation inconsistently.
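A sketch of a buffered leave‐one‐out cross‐validation (option 4) on simulated spatial data, in which all training points within a fixed radius of each test point are excluded:

    set.seed(1)
    dat <- data.frame(x = runif(200, 0, 100), y = runif(200, 0, 100))
    dat$z <- sin(dat$x / 10) + rnorm(200, sd = 0.3)
    D <- as.matrix(dist(dat[, c("x", "y")]))  # pairwise geographic distances
    buffer <- 20                              # exclusion radius
    err <- sapply(seq_len(nrow(dat)), function(i) {
      # train only on points farther than the buffer from test point i
      fit <- lm(z ~ poly(x, 3), data = dat[D[i, ] > buffer, ])
      dat$z[i] - predict(fit, dat[i, ])
    })
    sqrt(mean(err^2))  # buffered leave-one-out RMSE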

Last, in new predictor space we might also encounter changes in relationships among covariates (changing correlation structures) or in species interactions themselves (Fielding and Bell 1997). This can be particularly problematic when predicting over long time scales, where evolutionary changes may violate assumptions of niche conservatism (Maguire et al. 2015). Both situations, potentially undetectable by the modeller, can result in a loss of predictive power (Austin 2002) and are not, unfortunately, addressed by blocking in cross‐validation.

Final thoughts

In this review and synthesis, we have discussed the role of block cross‐validation in better estimating prediction errors. It addresses prediction optimism arising from non‐independent hold‐out data or from overfitting data dependencies with non‐causal covariates. We did not, however, attempt to address the effect of such overfitting on parameter estimation or on model selection; these topics would benefit from further exploration.

Statistical models in ecology are used not just to describe the present state of natural systems, but also to predict their change or development over time. Such models are fairly simple to create and have thus become ubiquitous in all areas of ecological research. To determine whether these statistical simplifications of ecological systems are useful, we need effective model validation procedures that produce reliable error estimates. Unfortunately, many popular evaluation and cross‐validation approaches may result in erroneous and misleading assessments of model performance, either due to known and detectable issues of non‐independence (e.g. residual autocorrelation) or due to more clandestine issues (e.g. structural parameterisation via covariates).

While developing a single‐best approach to model validation applicable to all situations is impossible, informed choices may be guided by simple tests of dependence structures and extrapolation demands. Parametric measures of model fit or cross‐validations with random data splits only provide reliable error estimates for model predictions in very specific cases where critical assumptions of independence and non‐extrapolation are met. Such situations are rare in ecology and, as our simulations and case studies illustrate, can be very difficult to identify ad hoc.

In cases where the assumption of independence is compromised or where model extrapolation is likely, cross‐validations with non‐random blocks, carefully chosen in light of modelling objectives, can offer more reliable error estimates. The price of slightly conservative model validation (e.g. using a blocked approach when not necessary) is small compared to the unwarranted confidence in model predictions one might have with random cross‐validation (or with no cross‐validation at all). By overestimating predictive confidence, ecological modellers fail to adequately incorporate uncertainty into conservation and management decision‐making and, more critically, sacrifice scientific credibility when a high proportion of such prognoses end up being wrong.

Data deposition

Data available from the Dryad Digital Repository: <http://dx.doi.org/10.5061/dryad.737gk> (Roberts et al. 2017).

Acknowledgements – We thank the German Science Foundation (DFG) for funding the workshop ‘Model averaging in Ecology’, held in Freiburg 2–6 March 2015 (DO 786/9‐1), where the ideas included in this manuscript were developed. We thank Lara Budic for consultation and R code for phylogenetic trees and are grateful to all those who made their data available.

Funding – DRR is supported by the Alexander von Humboldt Foundation through the German Federal Ministry of Education and Research. BS is supported by the German Science Foundation (grant no. SCHR1000/6‐2). CFD acknowledges additional funding by the DFG (DO786/10‐1). DIW and JE are supported by Australian Research Council Future Fellowships (grant no. FT120100501 and FT0991640). GGA is the recipient of a Discovery Early Career Research Award from the Australian Research Council (project DE160100904). The work of JJLM was supported by the Australian Research Council Discovery Project DP160101003. Collection of data used to build elk resource selection models (Box 2) was funded by the Alberta Conservation Association (ACA – Grant Eligible Conservation Fund; grants to SC and MSB), the Natural Sciences and Engineering Research Council of Canada (NSERC CRD; grants to MSB and postdoctoral fellowship to SC), and Shell Canada limited. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions – All authors conceived the idea for this study. DRR, FH, CFD, SC and VB designed the study, and DRR, SC and VB carried out the simulations and analyses. The initial draft was written by DRR. All authors contributed comments and improvements to the manuscript.

Supplementary material (Appendix ecog‐02881 at <www.ecography.org/appendix/ecog‐02881>). Appendix 1: supplementary tables. Appendix 2: spatial blocking (Box 1), extended methods and detailed results. Appendix 3: blocking by individual or group (Box 2), extended methods and detailed results. Appendix 4: blocking to address phylogenetic correlation (Box 3), extended methods and detailed results. Appendix 5: blocking for extrapolation (Box 4), extended methods and detailed results. Appendix 6: R scripts and applicable data for the simulations and case studies.
