Generative spatial generalized dissimilarity mixed modelling (spGDMM): An enhanced approach to modelling beta diversity

Turnover, or change in the composition of species over space and time, is one of the primary ways to define beta diversity. Inferring what factors impact beta diversity is not only important for understanding biodiversity processes but also for conservation planning. At present, a popular approach to understanding the drivers of compositional turnover is through generalized dissimilarity modelling (GDM). We argue that the current GDM approach suffers several limitations and provide an alternative modelling approach that remedies these issues. We propose using generative spatial random effects models implemented in a Bayesian framework. We offer hierarchical specifications to yield full regression and spatial predictive inference, both with associated full uncertainties. The approach is illustrated by examining dissimilarity in three datasets: tree survey data from Panama's Barro Colorado Island (BCI), plant occurrence data from southwest Australia and plant abundance surveys from the Greater Cape Floristic Region (GCFR) of South Africa. We select a best model using out‐of‐sample predictive performance. We find that the form of the best model differs across the three datasets, but our models provide performance ranging from comparable to significant improvement over GDMs. Within the GCFR, the spatial random effects play a more important role in the modelling than all the environmental variables. We have proposed a model that provides several improvements to the current GDM framework. This includes advantages such as a flexible spatially varying mean function, spatial random effects that capture dependence unaccounted for by explanatory variables, and spatially heterogeneous variance structure. All these features are offered in a model that can adequately handle a large incidence of total dissimilarity through ‘one‐inflation’, as would be expected from highly biodiverse areas with steep turnover gradients.


| INTRODUC TI ON
Change in the composition of biotic assemblages in space and time is important for revealing the ecological processes that structure and maintain biodiversity spatially across landscapes, detecting biodiversity loss or change in the composition of assemblages, or tracking global biodiversity change (Ferrier et al., 2020;Hoskins et al., 2020).
Such turnover is referred to as beta diversity and has been applied to taxonomic, functional, phylogenetic and spectral measures of diversity (see, e.g.Graham & Fine, 2008;Schweiger & Laliberté, 2022;Whittaker, 1960).Unfortunately, modelling beta diversity presents many unique challenges and, while several methods have been proposed, there are many issues that remain to be overcome.Here we reformulate the generalized dissimilarity modelling (GDM) method of Ferrier (2002) and Ferrier et al. (2007) and propose several significant advances that should aid modelling of beta diversity and the many applications it supports.
While beta diversity can be expressed as a single metric representing variation across multiple samples in a region (Whittaker, 1960), it is more commonly expressed as differences in composition between pairs of samples (Anderson et al., 2011).Modelling beta diversity based on pairwise analyses requires a distance-based statistical framework.Several approaches have been proposed (Anderson et al., 2011), though each has its limitations or flawed assumptions.
Early efforts include the Mantel test of correlation between distance matrices (Legendre, 1993), ordination of assemblages based on their pairwise compositional differences, for example, non-metric multidimensional scaling (NMDS; Prentice, 1977), and linear matrix regression (Manly, 1986), where the level of difference between pairs of assemblages is related linearly to differences in environment or space.
Unfortunately, these properties are often not linear because: (i) Most ecological dissimilarity measures are constrained to range from 0, when assemblages are identical, to saturating at a maximum value of 1, once pairs of assemblages are completely different.In biodiverse regions, datasets may contain a high proportion of 1s, which are not adequately explained by models based on standard distributional assumptions.This is akin to the issue of 0-inflation commonly faced in population and community modelling (Blasco-Moreno et al., 2019).(ii) The rate of change in assemblage composition in relation to environmental variables can vary nonlinearly along a gradient (Fitzpatrick et al., 2013;Oksanen & Tonteri, 1995), which necessitates modelling explicitly in a spatial context (Ferrier et al., 2007).
Responding to issues of nonlinearity in relating beta diversity to environmental gradients, Ferrier and colleagues developed GDM (Ferrier, 2002;Ferrier et al., 2004Ferrier et al., , 2007)).This approach was argued as an extension of generalized linear modelling of pairwise distances and has become increasingly popular for analysing compositional turnover (Mokany et al., 2022).GDM underpins the calculation of several Global Biodiversity Change Indicators proposed by the Group on Earth Observations Biological Observation Network (GEO BON), which are used to support global initiatives like the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) and the Convention on Biological Diversity's (CBD) Post-2020 Global Biodiversity Framework (Ferrier et al., 2020;Hoskins et al., 2020).Unfortunately, GDM itself retains some flaws and limitations, which could have consequential implications for global biodiversity monitoring and management.Here, we highlight issues with the GDM method and then propose and implement a richer, generative modelling approach that is fully spatial.For us, generative means coherent, that is, a process specification which could have generated the observed data.

| A review and critique of generalized dissimilarity modelling
Generalized dissimilarity modelling treats between-site dissimilarities (beta diversity measures) as the response of interest.In the paper, we also use GDM to represent generalized dissimilarity model.For two sites, between-site dissimilarities are calculated as a function of community composition, considering differences in the presence/absence of taxa (at some taxonomic level) and, for some metrics, differences in their relative abundance.The GDM imagines that dissimilarities are explained by differences between monotonically warped environmental variables; the warping enables differences between environmental variables to relate nonlinearly to dissimilarity.Following Ferrier et al. (2007), GDM describes dissimilarities using a linear combination of differences in monotone transformations of a vector of site-level environmental covariates, the latter denoted by X(s) where s ∈ , the study domain of interest.
In addition, they may employ a monotone transformation of spatial distance to account for additional dissimilarity that is not effectively explained by environmental variables.
Adopting a dissimilarity measure for two sites, Z s, s ′ , the GDM does the following.It proposes to model a transformed/predicted ecological mean distance, on R + , and then links it to the interval of total dissimilarity through 'one-inflation', as would be expected from highly biodiverse areas with steep turnover gradients.where h( ⋅ ) and f p ( ⋅ ) are unique monotone increasing functions for distance and all the environmental covariates respectively.Ferrier et al. (2007) specify h( ⋅ ) and f p ( ⋅ ) using I-spline basis functions with positive coefficients, to ensure that the transformation of the environmental covariate is monotone (Ramsay, 1988).The number and choice of I-spline basis functions are supplied for each of the regressors individually, before the IRLS fitting.
There are several features of this modelling/fitting approach that raise cause for concern.
• First, there is no formal likelihood for the data.Fitting, using the proposed link and variance function, applies an ad hoc optimization criterion.It corresponds to employing a weighted version of a normal likelihood for data that is clearly not normally distributed.
More concerning is that it ignores the evident dependence between the sites, or Z's, for example, between Z s, s ′ and Z s, s ′′ .
• Second, the choice of a binomial variance function seems to be motivated by thinking about Z s i , s j arising as some sort of proportion associated with a sum of independent and identically distributed (i.i.d.) Bernoulli trials.Even if the ecological dissimilarity measure is based on presence/absence data, there are no such trials to imagine such a variance function.When applied to abundance data, for example, per cent plant cover in a sample site, such a variance function is clearly inappropriate.Furthermore, GDM assumes that variance in dissimilarity behaves proportionally to (1 − ), equivalently in terms of the transformed ecological distance, .We explore whether this assumption is appropriate below.We note that Ferrari andCribari-Neto (2004) andFerrier et al. (2007) suggest that other variance functions associated with distributions on 0, 1 could be investigated.This would not remedy the concern regarding the absence of a likelihood.
• Third, bootstrapping uncertainty is tenuous, even if the proposed optimization criterion were suitable, because it assumes independent observations and clearly Z s i , s j and Z s i , s k are not independent.However, such dependence across sites can be envisaged through structured spatial dependence.
• Fourth, the GDM ignores the need for a point mass at 1, that is, the one-inflation problem.In our species-level dataset below, essentially half of the dissimilarities are equal to 1.These pairs cannot be ignored in the model fitting and modelling them to be less than 1 with probability 1 denies the reality of the dissimilarity process realization over  × .One might suggest deleting the perfect dissimilarities.This can certainly introduce sampling bias.Furthermore, apart from losing a potentially large amount of informative data (≈ 50 % in our GCFR family-level data), it is unclear how to do this.If one deletes a Z ij because it is exactly 1, in principle should one delete site i , that is, all Z ij ′ , j ′ ≠ j? Similarly, should one delete site j all other Z i ′ j , i ′ ≠ i?With a high incidence of 1s, if one implemented this over all of the 1s, likely little data would remain.
• Finally, the modelling is not generative and could not have produced the dissimilarities we observe.That is, the implicit likelihood being optimized is inappropriate and is not a likelihood for the data.Given the strong interest in illuminating the complex processes of biodiversity, there is a need for models that extend beyond a patchwork of statistical pieces.We would also note that adoption of such regression modelling necessitates investigation of model adequacy and model comparison, not currently considered with the GDM.

| A proposed remedy: Generative spatial generalized dissimilarity mixed modelling
Given the criticisms listed in the previous section, we propose an alternative hierarchical Bayesian modelling framework to remedy these criticisms.We refer to our proposed model as a generative spatial generalized dissimilarity mixed modelling (spGDMM).In this paper, we also use spGDMM to represent spatial generalized dissimilarity mixed model.The main aspects of our contribution are: (i) for study region, , we offer fully spatial modelling for the dissimilarity Z s, s ′ between s and s ′ with s, s � ∈  × .That is, conceptually, the latter is the space over which pairwise dissimilarity exists.This requires introduction of novel spatial random effects modelling, with attractive interpretation, added to the mean regression term.Furthermore, these random effects enable us to capture the evident probabilistic dependence between, for example, Z s, s ′ and Z s, s ′′ .Additionally, we demonstrate the need for a flexible heterogeneity of variance specification for Z s, s ′ ; (ii) our approach enables spatial interpolation of dissimilarity.In principle, this can be done with the GDM approach.However, prediction incorporating spatial dependence performs better, as we demonstrate through kriging, holding out pairs of sites.In this regard, we offer novel out-of-sample model validation and comparison; (iii) we offer a formal stochastic treatment for the incidence of 1s, employing right truncation to provide a point mass at 1.The point mass is induced through transformation of a latent dissimilarity; and (iv) we do all the above through specification of an explicit hierarchical model.Such modelling, fitted within a Bayesian framework, enables us to obtain full inference and uncertainty.Furthermore, such modelling is generative, and thus equivalently, coherent, that is, it prescribes a probabilistic specification which can generate the dissimilarity data we observe.
Importantly, it can generate perfect dissimilarities.

| Data and dissimilarity
We fit the spGDMM to three datasets.To anchor our analysis, two of these datasets are prominent and accessible examples that are well described in the GDM literature.The first and primary dataset we analyse is from van der Merwe et al. (2008aMerwe et al. ( , 2008b) ) consisting of per cent cover data of all plant species at 413 sites in the Hantam-Tanqua-Rogeveld subregion of South Africa's Greater Cape Floristic Region (GCFR; Figure 1).Given that these data have not been described in the context of GDMs, we describe the ecological aspects of the data in Supporting Information A. Since the level of species turnover is particularly high, we also model the rates of turnover at the family level.The sites contain 288 species, 144 genera and 52 families.For each site s, we observe a vector of per cent cover for each taxonomic group (species or family) Y(s).
For 413 sites, we have = 85,078 pairwise dissimilarities Z s, s ′ , observed in .Specifically, we use Bray-Curtis (BC) dissimilarity on the available per cent cover data.Because per cent cover is a continuous variable, the BC dissimilarity is For Z s, s � = 0, Y j (s) = Y j s � for all j.For Z s, s � = 1, locations s and s ′ must have no species (or families) in common.We note that the Panama and southwest Australia data have a low proportion of locations with complete dissimilarity, while this is quite prevalent in the South African dataset, particularly at the species level.
The computed dissimilarity depends on the taxonomic scale considered.If computing the dissimilarity at species scale, then Z s, s ′ arises as a sum over 288 terms.At family scale, we sum over 52 terms.In this manuscript, we focus our discussion on dissimilarities calculated at family level; however, we also include some results for a species-level analysis and present more detailed results in Supporting Information E.
We plot the family-and species-level dissimilarities against the spatial distance between sites in Figure 2. In addition, we plot mean dissimilarity and the proportion of dissimilarities equal to 1 as a function of distance.At family scale, 4.6% of the observed dissimilarities are exactly 1.This increases to 49% at the species scale, frequently occurring across all spatial distances.In fact, mean dissimilarity is very large even at short distances.An appropriate generative model for data like this must include a point mass at 1.We selected a set of seven explanatory variables for the GCFR data that captured a range of climatic, topographic and edaphic features that could likely drive turnover in taxa.These variables are described in further detail in Supporting Information A. For the Panama and southwest Australia data, we use explanatory variables that were provided with datasets described by Ferrier et al. (2007) and Fitzpatrick et al. (2013) respectively.
The second dataset consists of 39 survey sites of 225 rainforest tree species from Panama's Barro Colorado Island (BCI), initially described by Condit et al. (2002) and presented in the GDM context by Ferrier et al. (2007).We use only 39 of the 50 sites because 11 are not geocoded.The third dataset consists of occurrence data for 856 plant species across 94 locations in southwest Australia.This is the example dataset provided in the 'gdm' R Package (Fitzpatrick et al., 2022a) which is a subset of the data described by Fitzpatrick et al. (2013).

| Model specifications
We model Z s i , s j , the dissimilarity between sites s i and s j , explicitly through Z s i , s j = min 1, e V(s i ,s j ) , where So, we immediately obtain a point mass at 1 for the distribution of Z s i , s j whenever V s i , s j ≥ 0, and Z s i , s j ∈ (0, 1) when V s i , s j < 0 .That is, we adopt a version of the usual Tobit regression setting, introducing V s i , s j as a 'latent dissimilarity' on R 1 .
Whenever the latent dissimilarity becomes large enough (≥ 0) the observed dissimilarity becomes 1.Unlike many inflation approaches, like hurdle model or 0-and one-inflated beta models (Ospina & Ferrari, 2012), the point mass is driven solely by the dissimilarity model rather than a logistic regression model for each So, our modelling effort is to consider specifications for s i , s j , s i , s j and 2 s i , s j .Our primary contributions come through our specifications of s i , s j and 2 s i , s j .In particular, for s i , s j we consider the form analogous to expression (1).The parameter 0 represents the baseline log-dissimilarity.That is, it determines the mean of V s i , s j when s i = s j (see below and Supporting Information B1 for further discussion regarding 0 ).Here, h( ⋅ ) will be an increasing function, arguing that, with  1 > 0, transformed dissimilarity increases with distance between sites.Here, the X K 's are covariates with f k ( ⋅ ) increasing, providing the 'warping' function for covariate X k .
As in Ferrier et al. (2007), we adopt cubic I-spline basis functions to specify h( ⋅ ) and f k ( ⋅ ) (Ramsay, 1988), each with two interior knots at the 33% and 67% of the predictor.However, we use a reparametrization of the I-spline coefficients so that the warping functions h and f k map to 0, 1 for all k.This standardizes the predictors, making all variables comparable through k and 1 .When interpreting our model output, 1 and the k parameters become variable importance parameters.Specifically, we express h , where I j (x) are I-spline bases, Together, these constraints make h( ⋅ ) and f k ( ⋅ ) monotone increasing and bounded between 0, 1 .Although our specification of s i , s j is analogous to that which is often proposed for GDM's (see, e.g.Ferrier et al., 2007), the variable importance parameters add interpretability to the GDM framework.In sufficiently rich datasets, h and/or f k could be spatially varying and estimated by combining approaches in White et al. (2021White et al. ( , 2022)).
The role of s i , s j is twofold.First, as in usual spatial modelling, we are adding a spatial random effect to provide adjustment to the mean, to potentially capture the difference effects of unmeasured variables.That is, as we specify s, s ' , it directly takes the form of a non-negative difference in a latent covariate and enters the expression for V s i , s j as a covariate would.In a different view, it provides pairwise site adjustment to the global intercept 0 .It is clearly different from h s i , s j which is a fixed effect as a function of distance between sites.However, it is not a usual Gaussian process (GP) adjustment in the 'local' sense since it is a function on R 2 × R 2 .In fact, we consider three choices for (and employ model comparison below): , the squared difference between normals, suggesting a chi-square type of process , the absolute difference between normals, suggesting a folded (or half) normal type of process Continuing, the s, s ′ process with two arguments, is more challenging than customary spatial random effects with a single spatial argument.It provides spatial random effects over two spatial locations and is not a function of the distance between the two locations.The interpretation as a difference, squared or absolute, between a Gaussian process realization (s) and s ′ , in terms of adjusting the dissimilarity on the V s, s ′ scale is intuitively clear.As an adjustment, capturing potentially unmeasured or unobserved predictors, it must be positive; it can only increase expected dissimilarity as with the fixed effects in the mean.In different words, since a dissimilarity is a function of two spatial arguments, so must a spatial random effect in the mean (on a transformed scale).As an adjustment, presumably, the closer the spatial locations, the smaller the expected adjustment.However, for a pair of locations at a given distance apart, this adjustment need not be the same as we move around the study region.The Gaussian process and the 'difference' process that it induces enable this.
For the second and third specifications above, s, s ′ , as a function of a GP, is a realization of a stochastic process.This provides the second role for the 's.They create expected dependence between the V's and, hence, the Z's.In fact, using the second and third specifications, we can explicitly calculate the covariance between V s 1 , s 2 and V s 3 , s 4 and, further, between Z s 1 , s 2 and Z s 3 , s 4 (see Supporting No dependence is introduced in the absence of the 's. Furthermore, the V's, hence the Z's are conditionally independent given (4) the 's, enabling us to write the likelihood over the Z's as a product form which is convenient for model fitting.In Supporting Information C.2, we digress technically to briefly consider properties arising under the foregoing specification of the random effect, s, s ′ in the model for Z s, s ′ .We remark that, with s, s � = g (s) − s � , is not a stationary function.
We will see in Section 4 that the choice of s, s ′ depends on the taxonomic scale at which we calculate the BC dissimilarity.
We prefer the squared form at the species level but the absolute form at the family level.Furthermore, s i , s j introduces the difference between a Gaussian process realization at s i and at s j .So, (s, s) = 0 with probability 1.We discuss this point in Supporting Information C.2.
We also consider three choices for 2 s i , s j to capture variance patterns suggested by Supporting Information A.2: • 2 constant, the homogeneous variance case Information A. Thus, we let g( ⋅ ) be unconstrained.
• Finally, we consider log 2 s, s � = g s, s � where now g is a function of s, s ′ .Again, g( ⋅ ) is estimated, in the form of a cubic polynomial.In preliminary modelling, we found that this formulation is only estimated effectively with large datasets, so we only present it for the South Africa data.
Ultimately, we choose the final model specification through cross-validation (Section 3.2).
In summary, our models are more complex than the customary GDM.However, in order to remedy our concerns with the latter, our extension has only added the novel spatial random effects and more flexible variance structure.Our proposed spGDMM can be used for customary regression inference and for prediction, as in Ferrier et al. (2020) and Hoskins et al. (2020).With prediction as a primary target, our model selection criteria are motivated by performance in the data space, that is, in out-of-sample predictive performance.Criteria such as AIC, BIC or WAIC judge performance in parameter space.Ultimately, we choose across a set of model specifications through cross-validation (Section 3.2).We do not employ leave-one-out (loo) validation; it is a 'simple' version of the cross-validation that we do.We learn more about model performance if we hold out a larger set and replicate.
Under these specifications, we can calculate the expected dis-

| Model comparison
To assess the performance of spGDMM, model comparison is accomplished through 10-fold cross-validation on all three datasets.To do the cross-validation, we randomly distribute all sites into 10 groups of nearly equal size, and, in each stage of the cross-validation, we hold out the 10% of sites associated with each partition and all dissimilarities associated with those sites.After fitting the model to dissimilarities associated with the training subset of sites, we predict the hold-out dissimilarities.We repeat this procedure 10 times so that each site in our dataset is held out exactly once.Because each dissimilarity is associated with two sites, every dissimilarity is held out as test data twice.We average predictive performance across the 10 experiments using the criteria discussed below.
To compare models, we primarily use the (continuous) ranked probability score (see Brown, 1974;Matheson & Winkler, 1976, for early discussion), which we abbreviate as CRPS, because it is a strictly proper scoring rule (Gneiting & Raftery, 2007).Although we use the abbreviation CRPS, we emphasize that we are working with a mixture of continuous values and a point mass at 1 (see Tang et al., 2023, for similar discussion).Thus, strictly, this is not a continuous ranked probability score.In practice, however, we estimate the CRPS using the empirical CDF of the predictions, where each prediction is associated with each posterior sample from our MCMC model fitting (Krüger et al., 2021).For ease of comparison with the GCFR data, we divide all CRPS values by the smallest CRPS to give 'relative CRPS', which we abbreviate as rCRPS.In addition to CRPS, we offer root mean squared error (RMSE) and mean absolute error (MAE) as out-of-sample predictive criteria because they allow us to compare our models to those proposed by Ferrier et al. (2007).
For the GCFR data, we also include a summary of how well the model is capturing the exact 1s in the data.Specifically, we calculate the proportion of 1s predicted correctly (PPC) in the data using the posterior median of predictions.If we let Z s, s � be the posterior predictive median for an unobserved Z s, s ' and n 1 be the number of ones in the test dataset, then , for all sites i in the test set, while j sums over all sites.This PPC criterion enables (5) examination of how well the model predicts exact 1s.The standard GDM cannot predict exact 1s, so it always has a PPC of 0.
Using the model comparison approach in Section 3.2, we compare all combinations of s, s ′ and 2 s, s ′ discussed in Section 3 using CRPS, RMSE and MAE.In addition, using the same 10-fold cross-validation approach, we fit the Ferrier et al. ( 2007) GDM model implemented in R (Fitzpatrick et al., 2022a), employing the same number of spline basis functions as we use.However, because the Ferrier et al. (2007) fitting does not provide predictive distributions, we only compute RMSE and MAE.We also include a naive model that predicts all dissimilarities to be the sample mean of dissimilarities in the training set.For the family-and species-level analyses of GCFR, as well as the Panama and Australia data, we present the results of this comparison in Table 1.

| Model comparison results
In comparison with the naive model (scalar mean only), for the family-level analysis of the Greater Cape Floristic Region (GCFR), the Ferrier model improves by about 7% and 8% in terms of RMSE and MAE respectively (Table 1).Under our specifications, all but our model 3 performed better than the Ferrier model in terms of RMSE and MAE.For the family-level dissimilarity, two of our three models without spatial random effects are marginally better in prediction than the Ferrier et al. (2007) model.The models with spatial random effects s, s ′ are much better in prediction than those without spatial random effects.The models using the 2 specification of s, s ′ were between 7 and 10% better than those where s, s � = 0 , depending on the specification of 2 s, s ' .The models using the folded normal specification of s, s ′ were between 11% and 14% better than those where s, s � = 0, depending on the specification of 2 s, s ′ .
For the GCFR data, the Surprisingly, the Ferrier model outperforms all but one of the sp-GDMM models in terms of RMSE and MAE.We speculate that this can be attributed to the presence of very low levels of one-inflation and a relatively weak spatial residual structure in the dissimilarities.

| Variable importance and response curves
Because we constrain the environmental warping functions f k ( ⋅ ) and distance warping function h( ⋅ ) to be monotone increasing from 0 to 1, we can compare the relative importance of each variable through the variable importance parameters: 1 and k .We emphasize that these parameters enter the model through the shape given by f k .We plot the posterior means and 90% credible intervals for the variable importance parameters in Figure 3 for the GCFR, BCI and southwest Australia data.Inferential results of the GCFR data focus on our family-level analysis.
Four of the seven environmental variables in the GCFR data have posterior mean variable importances greater than 0.1: annual precipitation, heat load index, elevation and soil nitrogen (listed in order of importance).The BCI data had two of three variables with importances greater than 0.1 (distance and elevation), while the southwest Australian data had three of four variables with importances greater than 0.1 (winter precipitation, distance and soil phosphorus).
The model suggests that annual precipitation (gmap) is the most important driver of beta diversity in the South Africa data, holding all other variables constant.Annual precipitation appears to be almost twice as important as the second most important environmental variable: heat load index (Theobald et al., 2015).Precipitation was also found to be the most important variable for the southwest Australian data, though the distance and phosphorus variables were also quite close in importance.Holding all other variables constant, both elevation and soil nitrogen are also important drivers of beta diversity in the South Africa data.
To clarify how these environmental variables drive beta diversity in the spGDMM, we plot the estimated k f k ( ⋅ ) for all covariates in  we fix the scale of the y-axis so that the relative importance of these variables is clear.In addition, we add the estimated warping curve Above 0.1% in soil total nitrogen, there is effectively no difference in beta diversity.The Ferrier model had similar dynamics but showed a higher effect from soil nitrogen.We speculate that the differences in results between the spGDMM and the Ferrier mode arise in part from the inclusion of spatial random effects and the use of different link functions.
For the Panama data, the Ferrier model showed weak effects (< 0.1 %) for precipitation on beta diversity while the spGDMM found none.The spGDMM found weak effects of elevation on beta diversity that increased between 200 and 600 m and tapered after 600 m.The Ferrier model found similar results, though at stronger effect levels.Both models found that distance had strong effects for beta diversity with the spGDMM finding a steep effect increasing between 0 and 20 km while the Ferrier model showed a steadier increase across the distance range.
The spGDMM and Ferrier model results often overlapped in the Australia data.The most significant departure between the two models was for the distance variable with the Ferrier model finding a weak, but steadily increasing effect of distance on beta diversity.
The spGDMM found a strong effect that sharply increased between 0 and 150 km.

| Spatial random effects
We visualize the estimated spatial random effect surface in two ways in Figure 5. First, we plot the posterior mean of (s).Second, we plot s, s � = | (s) − s � |, by fixing s and plotting the posterior mean as a function of s ′ .Every point s generates a surface that adjusts the dissimilarities at s ′ .These fixed s sites are located where we would expect to find vegetation belonging to the Succulent Karoo and Fynbos biomes respectively (See Figure 1).The scale of the differences in s, s ′ is large relative to the scale of the scaled warping functions in Figure 4.This suggests that the spatial random effect contributes to the mean dissimilarity more than the environmental covariates, even when considering the additive effect of many environmental variables.As is discussed at length in the spatial confounding literature (see, e.g.Khan & Calder, 2022;Reich et al., 2006), it is likely that the spatial random effects diminish or alter the estimated effect of some

| DISCUSS ION
Understanding and predicting turnover across landscapes is critical to biodiversity studies and conservation planning.GDMs have, perhaps, become a standard tool for many ecologists (Mokany et al., 2022) because of their capacity to handle the nonlinear relationship between compositional dissimilarity and environmental distance as well as their accessibility in software implementation (Fitzpatrick et al., 2022b).While the original GDM framework has advanced our understanding of beta diversity patterns and what drives them, it does not offer a coherent solution that could generate the data observed.In response, we have proposed a generative spatial model that we refer to as the spGDMM.
We believe that the spGDMM approach is a useful step forward for explaining turnover in a rigorous manner.Using a suitable link  We found that our spGDMM yielded consequential predictive improvement over the GDM approach of Ferrier et al. (2007) particularly for the GCFR plant dataset.Our results confirm adequate predictive capacity for other ecosystems when our models were applied to the Barro Colorado Island (BCI) and southwest Australia data (Section 4.1).In terms of inference, the model conclusions were often similar though we note that the differences in results between our models and the Ferrier model arise in part from the inclusion of spatial random effects and the use of different link functions.We highlight that our model is able to predict dissimilarities with a value of 1 (total dissimilarity) while the Ferrier model cannot.We feel that this is a valuable contribution because many biodiversity hotspots, for example, the GCFR, are characterized by their large degree of species turnover (Latimer et al., 2005) which in turn would lead to a high proportion of 1s even within small spatial extents that are well sampled.This will also be the case in broad-scale studies that examine biodiversity across regional to global scales.Furthermore, given the time and resources needed to survey an area in terms of its species, one can imagine a scenario where limited samples within a diverse area can easily inflate the proportions of 1s, creating a need for models that handle their prevalence.
In terms of inferring what drives plant family turnover within the GCFR, we found that mean annual precipitation, heat load index, elevation and soil nitrogen were the most important of the eight variables that we examined.Broadly, these results align with previous efforts to examine turnover in the region.Factors such as climate, topography and edaphic features have been highlighted in driving turnover in taxonomic composition and subsequently turnover in biomes (Mucina & Rutherford, 2006;Power et al., 2017).Out of all the variables examined, we found that mean annual precipitation was the most important in explaining beta diversity.This result is also echoed in the southwest Australia data where the amount of winter rainfall was the most important factor in driving plant species turnover in the Mediterranean portion of the continent, matching the findings of Fitzpatrick et al. (2013).Mean annual precipitation and heat load index had relatively sharp thresholds with their relationship to beta diversity.We hypothesize that the sharp increases of these two variables are likely indications of turnover in biomes, particularly the transition of the arid Succulent Karoo to fire-prone Fynbos biomes.The mean annual precipitation for the Succulent Karoo is around 170 mm with the wettest areas reaching levels of 300 mm (Mucina et al., 2006).
Strikingly, the threshold of 300 mm is also where our results for annual precipitation start to strongly impact beta diversity.The threshold of the heat load index is also likely the point where moisture availability favours Succulent Karoo over Fynbos vegetation either through the absence of fire or greater seed viability for succulent karoo vegetation in drier areas (Rebelo et al., 2006).
The spatial random effects in our models play an important role, contributing more to explain mean dissimilarity than our environmental covariates.This makes a strong case for the need to include spatial random effects in future dissimilarity modelling.Within our data, we believe that the spatial random effects are capturing unmeasured local and regional factors that are likely related to biome turnover.As seen in Figure 5, the two fixed points are roughly located in an area of Succulent Karoo (left) and Fynbos (right).The magnitudes of the random effect are lower where we would expect the same biome of the fixed point and higher where we would expect a different biome.
There are several future applications that can be developed based on the generative spatial models we describe.The incorporation of spatial random effects can produce a number of hypotheses regarding spatial patterns and processes that can subsequently be explored and tested explicitly (Latimer et al., 2006(Latimer et al., , 2009)).Future work could include the investigation of the temporal dynamics of turnover leading to a model that considers Z Y t 1 (s), Y t 2 (s) , that is, the dissimilarity of a site between times t 1 and t 2 .An even richer effort would consider spatio-temporal turnover, Z Y t 1 (s), Y t 2 s � to better understand change in biodiversity in the past and to project future changes.This /generative models, hierarchical models, model comparison and validation, spatial random effects, warping functions 0, 1 through = 1 − e − .It further suggests a scaled variance function proportional to (1 − ).With this link and variance function, iteratively re-weighted least squares (IRLS) fitting is implemented to infer the parameters in the specification of .The proposed form for s, s ′ is � = 0 + h d s, s � + P ∑ p=1 | f p X p (s) − f p X p s � |2 | MATERIAL S AND ME THODS

1
Location of sites in the Greater Cape Floristic Region in South Africa overlaid a map of biome boundaries.Biome boundaries derived from (South African National Biodiversity Institute, 2006).point mass.As a result, our model provides coherence for the clear one-inflation present in observed BC dissimilarities.An alternative link function that would allow both 0-and one-inflation, if needed, is mentioned in Supporting Information C.1.
histogram on the Bray-Curtis dissimilarities plotted against distance between sites calculated at (Left) family scale and (Right) species scale for data within the Greater Cape Floristic Region.The cyan line marks the mean BC dissimilarity over 25 km bins, and the green line plots the proportion of BC dissimilarities equal to 1 over 25 km bins.
with g a random function, giving a heterogeneous variance as a function of distance.In this case, we let g( ⋅ ) be a cubic polynomial in distance.There is an intuitive argument that the variance of the difference should increase as a function of distance (in the spirit of a variogram).However, this intuition is contradicted by the patterns in Figure A.1 of Supporting similarity, E Z s, s ′ and var Z s, s ′ given s, s ′ .For the mean, we obtain We defer this calculation to Supporting Information C.1 where we also show the behaviour of E Z s, s ' | s, s ' .The calculation to obtain cov Z s 1 , s 2 , Z s 3 , s 4 requires the bivariate normal cdf and becomes too messy to be useful.We use weakly informative prior distributions for all model parameters.To fit the model we use Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution ( | Z) of all model parameters conditioned on all dissimilarities Z.Because the posterior conditional distribution of 2 is an inverse gamma distribution, we carry out a Gibbs update for this parameter.For further details regarding priors, model fitting and prediction estimation see Supporting Information B.1.Code and data are provided at https:// zenodo.org/ recor ds/ 10091442 (White, 2023).

Figure 4 .
Figure 4.The warping functions, f k ( ⋅ ), are plotted without the variable importance scaling in Supporting Information D. In these plots, using the Ferrier model.Our model for the warping is on the scale of the latent log-dissimilarity and the Ferrier model is on the scale of 1 − e − .To allow comparison between the inference provided by our model and the Ferrier GDM model, we multiply all estimated curves from the Ferrier model by a common scaling factor so that the highest point on any curve is equal to the highest point on the posterior mean curves.Holding other covariates constant in the South Africa data, annual average precipitation has almost no estimated impact on beta diversity until the average exceeds 300 mm according to estimates from the spGDMM.Beyond 300 mm, annual precipitation has a very strong impact on beta diversity.This suggests that precipitation averages below 300 mm are effectively similar in terms of beta diversity.The Ferrier model found a similar result, but showed an additional impact on beta diversity around 150 mm of rainfall.The warping of elevation is approximately linear, with some tapering for elevations above 1300 m.The Ferrier model results behaved similarly but had a strong impact for elevation above 1300 m instead of tapering.From the estimated warping for heat loads in Figure4, heat loads less than 170 are effectively the same in terms of beta diversity.When heat loads exceed 170, there are significant effects on beta diversity.The Ferrier model estimated that heat load had an effect at much smaller values, but tapered off around estimates of 190.Uniquely, the beta diversity driven by soil total nitrogen occurs at low values (< 0.1 %) for the spGDMM.

F
fa ll C o n c .E le v a ti o n H e a t L o a d M e a n T e m p S o il C o n d u c ti v it y S o il N it r o g p e ra tu re P re c ip it a ti o n D is ta n c e Variable importance of the environmental predictors.We note that much of the random effect signal we observed in the posterior means likely corresponds with biome transitions from Succulent Karoo to Fynbos biomes along the edge of the escarpment.

F
Product of variable importance parameter ( k or 1 ) and warping function.Posterior mean in black, 90% credible interval in grey, and scaled curves from Ferrier model in red.(Top/Top-Middle) all covariates for the family level analysis of the GCFR data; (Bottom-Middle) precipitation, elevation and distance curves for Panama data; (Bottom) winter precipitation, soil phosphorus, max temperature and distance for Australia data.Within a dataset, all curves are plotted on fixed y-scale.
are available, see Supporting Information C), we have specified a model with a flexible spatially varying mean function as well as spatially varying variances.Perhaps more importantly, we have incorporated novel spatial random effects to capture dependence between dissimilarities as well as to improve out-of-sample prediction of dissimilarities.We have also implemented one-inflation, acknowledging that, at the species level in the Greater Cape Floristic Region (GCFR), essentially half of the observed dissimilarities are 1.