Exploring the spatially explicit predictions of the Maximum Entropy Theory of Ecology

Aim:
The Maximum Entropy Theory of Ecology (METE) is a unified theory of biodiversity that attempts to simultaneously predict patterns of species abundance, size, and spatial structure. The spatial predictions of this theory have repeatedly performed well at predicting diversity patterns across scales. However, the theoretical development and evaluation of METE has focused on predicting patterns that ignore inter-site spatial correlations. As a result, the theory has not been evaluated using one of the core components of spatial structure. We develop and test a semi-recursive version of METE's spatially explicit predictions for the distance decay relationship of community similarity and compare METE's performance to the classic random placement model of completely random species distributions. This provides a better understanding and stronger test of METE's spatial community predictions.

Location:
New World tropical and temperate plant communities.

Methods:
We analytically derived and simulated METE's spatially explicit expectations for the Sørensen index of community similarity. We then compared the distance decay of community similarity of 16 mapped plant communities to METE and the random placement model.

Results:
The version of METE we examined was successful at capturing the general functional form of empirical distance decay relationships, a negative power function relationship between community similarity and distance. However, the semi-recursive approach consistently over-predicted the degree and rate of species turnover and yielded worse predictions than the random placement model.

Main conclusions:
Our results suggest that while METE's current spatial models accurately predict the spatial scaling of species occupancy, and therefore core ecological patterns like the species-area relationship, its semi-recursive form does not accurately characterize spatially-explicit patterns of correlation. More generally, this suggests that tests of spatial theories based only on the species-area relationship may appear to support the underlying theory despite significant deviations in important aspects of spatial structure.

Introduction

Community structure can be characterized using a variety of macroecological relationships such as the species-abundance, body size, and species spatial distributions. Increasingly, ecologists have recognized that many of these macroecological patterns are inter-related, and progress has been made toward unifying the predictions of multiple patterns using theoretical models (Storch et al., 2008; McGill, 2010). One approach to predicting suites of macroecological patterns is to use process-based models such as niche and neutral dispersal models, which have the potential to provide biological insight into the processes structuring ecological systems (Adler et al., 2007). Alternatively, a new class of constraint-based models suggests that similar patterns may be produced by different sets of processes because the form of the predicted pattern is due to the existence of statistical constraints rather than directly reflecting detailed biological processes (Frank, 2009, 2014 [...]

[...] in which individuals are randomly placed on the landscape (Coleman, 1981). Our approach provides a stronger evaluation of the performance of this model and of whether it can explain patterns of spatial structure in the absence of detailed biological processes.

Methods

METE has thus far been used to derive the probability that a random cell on a landscape will be occupied by a given number of individuals (i.e., the intra-specific spatial abundance distribution). Predictions for this distribution have been based either on recursively subdividing an area in half or on predicting species abundances directly at smaller scales (Harte, 2011; McGlinn et al., 2013). In addition to the spatial abundance distribution, the distance decay relationship (DDR) requires a prediction for the correlations in abundance among neighboring cells, which has proven difficult to derive for METE (Harte, 2011).
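As a point of reference for the similarity metric that underlies the DDR, the following minimal Python sketch computes the Sørensen index between two samples in a presence-absence form and in a common abundance-based form (twice the shared abundance divided by the summed abundance of both samples). The function names and the example abundance vectors are illustrative assumptions, and the abundance-based formula shown is the widely used quantitative Sørensen (Bray-Curtis) form, which may differ in detail from the version used in the analysis.

```python
def sorensen_incidence(a, b):
    """Presence-absence Sorensen similarity: 2 * shared species / (S_a + S_b)."""
    shared = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    total = sum(1 for x in a if x > 0) + sum(1 for y in b if y > 0)
    # Returning 0.0 for two empty samples is a convention chosen for this sketch.
    return 2 * shared / total if total else 0.0


def sorensen_abundance(a, b):
    """Abundance-based Sorensen similarity: 2 * sum(min(a_i, b_i)) / (N_a + N_b)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 0.0


# Hypothetical abundances of the same four species in two samples.
sample_1 = [5, 0, 3, 1]
sample_2 = [2, 4, 0, 1]

incidence_similarity = sorensen_incidence(sample_1, sample_2)
abundance_similarity = sorensen_abundance(sample_1, sample_2)
```

Averaging such pairwise values over all sample pairs at a given spatial lag gives one point of a DDR.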

Developing METE's Spatially Explicit Predictions
METE's spatial predictions depend on two conditional probability distributions, which are computed using independent applications of MaxEnt:

1) the species abundance distribution (SAD), Φ(n | S0, N0), the probability that a species has abundance n in a community with S0 species and N0 individuals, and
2) the intra-specific spatial abundance distribution, Π(n | A, n0, A0), the probability that n individuals of a species with n0 total individuals are located in a random quadrat of area A drawn from a total area A0.

The METE prediction for Φ is calculated using entropy maximization with constraints on the average number of individuals per species (N0/S0) and the maximum number of individuals N0 for a given species, which yields a truncated log-series abundance distribution (Harte et al., 2008; Harte, 2011). The spatially implicit Π distribution is solved for using entropy maximization with constraints on the average number of individuals per unit area (n0/A0) and the maximum number of individuals n0 of a given species. Although METE requires information on total metabolic rate to derive its predictions, the exact value that this constraint takes has no influence on Φ and Π (Harte et al., 2009; Harte, 2011).

[...] allocations within the parent cell, 2) between-cell distance is defined in reference to an artificial bisection scheme which does not have a one-to-one correspondence with physical distance, and 3) the correlation between cells does not decrease smoothly with physical distance. Alternative approaches have been proposed for deriving the DDR for METE based on computing the single-cell Π distribution at two or more scales and then using the scaling of this marginal distribution to infer the probabilities of a given spatial configuration of abundance (Harte, 2011). However, these approaches have yet to yield predictions for the DDR.

[...] to compute due to the multiple levels of recursion, ignore patterns of abundance (i.e., are formulated only in terms of presence-absence), and are not exact. An alternative approach to deriving semi-recursive METE predictions for the DDR is to use a spatially-explicit simulation.
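To make the SAD constraint described in this section concrete, the sketch below builds a log-series distribution truncated at N0, with Φ(n) proportional to x^n / n, and solves numerically for the x that makes the mean abundance equal N0/S0. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the bisection solver, its tolerance, and the example values of S0 and N0 are all hypothetical, and the sketch only searches 0 < x < 1.

```python
def truncated_logseries(S0, N0, tol=1e-12):
    """Return [Phi(1), ..., Phi(N0)] with Phi(n) proportional to x**n / n and mean N0 / S0."""
    target = N0 / S0

    def mean_abundance(x):
        # Mean of the truncated log-series: sum(x**n) / sum(x**n / n).
        num = sum(x ** n for n in range(1, N0 + 1))
        den = sum(x ** n / n for n in range(1, N0 + 1))
        return num / den

    lo, hi = 1e-12, 1.0
    if not (mean_abundance(lo) < target < mean_abundance(hi)):
        raise ValueError("N0 / S0 cannot be matched with 0 < x < 1 in this sketch")

    # The mean increases with x, so a simple bisection search is sufficient.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_abundance(mid) < target:
            lo = mid
        else:
            hi = mid

    x = 0.5 * (lo + hi)
    weights = [x ** n / n for n in range(1, N0 + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical community: 10 species and 100 individuals.
phi = truncated_logseries(S0=10, N0=100)
mean_n = sum(n * p for n, p in enumerate(phi, start=1))
```

Here `mean_n` should recover N0/S0 to within the solver tolerance, and the probabilities decline monotonically with abundance, as expected for a log-series.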

Spatially Explicit METE Simulation
To simulate semi-recursive METE's spatial predictions, the equal probability rule (Eq. B1) that METE assumes when total area is halved is recursively applied starting at the anchor scale A0 and progressively bisecting the area until the finest spatial grain of interest is achieved [...]

The DDR is sensitive to the choice of the spatial grain of comparison (Nekola & White, 1999), so we examined the DDR at several spatial grains for each dataset. We examined spatial grains resulting from 3-13 bisections of A0. To ensure that the samples at a given grain were square, we only considered odd numbers of bisections when A0 was rectangular and even numbers of bisections when A0 was square. To ensure the best possible comparison between the observed data and METE, and to avoid detecting unusual spatial artefacts in the METE predicted patterns, we employed the "user rules" of Ostling et al. (2004) such that samples at a specific grain (i.e., level of bisection) were only compared if they were separated by a specific line of bisection (i.e., a given separation order; Fig. 1 and Appendix A, Fig. A3). This approach was taken rather than the standard method of constructing the DDR from all possible pairwise sample comparisons without reference to an imposed bisection scheme. We [...]

We checked that our results were consistent with the results provided in previous studies (Harte, 2007, Fig. 6.7 and 6.8; Harte, 2011, Fig. 4.1), and that the DDR generated by the community simulator closely agreed with the analytical solution Eq. B5 (Appendix B, Fig. B1). The code to recreate the analysis is provided as Appendix D and at the following publicly available repository: http://dx.doi.org/10.6084/m9.figshare.978918.
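The recursive bisection described in this section can be sketched for a single species as follows. The sketch assumes that the equal probability rule means every split of n individuals between the two halves of a cell, from (0, n) to (n, 0), is equally likely; Eq. B1 is not reproduced in this text, so that reading is an assumption. The function names, the flat list used to hold cell abundances (with no mapping to two-dimensional coordinates), and the example values are hypothetical. A random placement allocation is included only for contrast.

```python
import random


def bisect_abundance(n0, n_bisections, rng):
    """Allocate n0 individuals among 2**n_bisections cells by repeated halving."""
    cells = [n0]
    for _ in range(n_bisections):
        next_cells = []
        for n in cells:
            # Each split 0..n has probability 1 / (n + 1) under the assumed rule.
            left = rng.randint(0, n)
            next_cells.extend([left, n - left])
        cells = next_cells
    return cells


def random_placement(n0, n_cells, rng):
    """Place each of n0 individuals independently in a uniformly chosen cell."""
    cells = [0] * n_cells
    for _ in range(n0):
        cells[rng.randrange(n_cells)] += 1
    return cells


# Hypothetical species with 100 individuals on a landscape bisected 5 times.
rng = random.Random(0)
mete_cells = bisect_abundance(100, 5, rng)
rpm_cells = random_placement(100, 2 ** 5, rng)
```

Repeating the bisection for every species in a community, and then assigning the resulting cells to their positions in the bisection scheme, would be needed before computing similarity between samples; those steps are omitted here.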

Results
In general, the semi-recursive METE distance decay relationship (DDR) provided a poor fit to the empirical DDR (Figs. 2 and 3). The average and median community similarity results were highly correlated (r = 0.98) and generated qualitatively similar results (Appendix A, Figs. A5 and A8); therefore, we focus on the results based on averaging similarity. While the METE DDRs exhibited the general functional form of the empirical DDRs, an approximately power-law decrease in similarity with distance, they typically had lower intercepts and steeper slopes than the empirical DDRs (Fig. 2, Appendix A, Fig. A4 and A6). Both the empirical and METE predicted DDR were better approximated by power rather than exponential models (Appendix A, Fig. A6). METE converged towards reasonable predictions at fine spatial grains; however, this is to be expected because at these scales similarity in both the observed and predicted patterns must converge to zero due to low individual density (grey points in Fig. 3A,B). This is because when individual density is low the probability of samples sharing species decreases rapidly simply due to chance. The RPM is known to be a poor model for distance decay because it does not exhibit a decrease in similarity with distance. However, it fit the empirical DDR slightly better than METE (Figs. 2 and 3).
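One simple way to compare power and exponential forms of a DDR is to regress log similarity on log distance (power model) and on untransformed distance (exponential model) and compare the fits. The sketch below does this with ordinary least squares on made-up data generated from an exact power law; the function names, the data, and the use of R² on the shared log-similarity response are illustrative assumptions rather than the procedure used in the analysis.

```python
import math


def ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def compare_ddr_models(distance, similarity):
    """Fit power (log-log) and exponential (semi-log) DDR models; similarity must be > 0."""
    log_sim = [math.log(s) for s in similarity]
    power = ols([math.log(d) for d in distance], log_sim)
    exponential = ols(list(distance), log_sim)
    return {"power": power, "exponential": exponential}


# Made-up DDR following similarity = 0.8 * distance ** -0.3.
distance = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
similarity = [0.8 * d ** -0.3 for d in distance]
fits = compare_ddr_models(distance, similarity)
```

In the power fit the slope estimates the exponent of the distance decay and the exponentiated intercept estimates similarity at unit distance, which correspond to the slope and intercept differences discussed above.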
The METE DDR was not strongly influenced by the choice of using the observed or the METE SAD (Figs. 2 and 3A,B). The METE SAD typically yielded a DDR with a slightly lower intercept, with the exception of the four tropical sites where it produced DDRs with slightly higher intercepts. In general, we did not observe strong consistent differences between the habitat types (Fig. 2, Appendix A, Fig. A7).

Our formulation of a semi-recursive METE produced species-area relationships (SARs) that generally agreed (i.e., within the 95% CI) with the recursive and non-recursive formulations of METE (Harte et al. [...] lower richness at fine spatial scales which is consistent with predicting stronger patterns of spatial aggregation compared to the other formulations of METE (Appendix A, Fig. A1 and A2).

[...]

Our results suggest that semi-recursive METE differs from spatial patterns observed in nature. This deviation could indicate that the emergent statistical approach to modeling spatial structure is incorrect, with specific biological processes such as dispersal limitation or environmental filtering directly controlling spatial correlation (Condit et al., 2002; Gilbert & [...] specific formulation is wrong. For example, it could be that the approaches outlined by Harte (2011) that are more sophisticated in how they handle spatial correlations will be more appropriate, or that a generalized version of this kind of recursive approach, like that developed by Conlisk et al. (2007) in which the degree of aggregation is a tunable parameter, will capture the reality of biological systems more precisely. However, process-based and constraint-based models should not necessarily be treated as mutually exclusive. For example, other process-based theories make power-law like predictions for the form of the DDR.
In fact, it has recently been suggested that at fine spatial scales most theories will make predictions that are approximately power-law in nature (Nekola & McGill, 2014). This means that simply noting power-law like DDRs does not provide a strong method for differentiating among theories. In fact, had we simply looked for power-law like behavior we would have concluded that the semi-recursive METE was consistent with empirical data. However, one of the properties that makes METE such a strong theory is that it makes specific predictions for precise parameters as well as general forms of empirical relationships. This allows it to be more rigorously compared to data and to other theories that predict different parameter values for a similar general form of the DDR (e.g., neutral theory).

[...] It is inherently difficult for theories to predict large numbers of patterns simultaneously, which is why evaluating theory in this way provides stronger tests than evaluating single patterns (McGill, [...] therefore both easier to evaluate and also more broadly useful since they allow a large number of patterns to be predicted from a relatively small amount of information. Because there are many patterns to evaluate, it is also more likely that deviations from theory will be identified (White et al., 2012). In some cases these deviations may indicate that the theory is fundamentally unsound, but in others it may suggest modifications to the theory to address the observed deviations [...] suggests that this form of the theory over-predicts the strength of spatial correlation. These results coupled with studies of the species-area relationship suggest that semi-recursive METE accurately predicts the scaling of species occupancy but not spatial correlation.
More generally, our results demonstrate that tests of spatial theories that focus solely on the species-area relationship and related patterns are only evaluating part of the spatial pattern, the distribution of occupancy among cells. Evaluating these theories using the DDR in addition to the SAR will help identify cases where the theories are correctly identifying some aspects of spatial structure, but not others, and thus yield stronger tests of the underlying theory. In some cases this will require extending the theory to make additional predictions, but this effort will provide both more testable and more usable theories.

[...] light grey line) for each site at a single spatial grain. Community similarity represents the average of the abundance-based Sørensen index for each spatial lag. The spatial grain displayed was taken at either 8 or 9 bisections of the total area depending on whether the total extent was a square or a rectangle respectively. Geographic distance was calculated as the average physical distance between the samples compared at given separation order (see Methods and Fig. 1 for additional information).

Fig. 3. The log-log transformed one-to-one plots of the predicted and observed abundance-based Sørensen similarity values for the three models across all distances and spatial grains. The solid line is the one-to-one line. The grey points represent values from spatial grains in which the average individual density was low (i.e., less than 10 individuals) and thus both the observed and predicted similarities must be close to zero simply because of a sampling effect.