Joint species distribution modeling with competition for space

Joint species distribution models (JSDM) are among the most important statistical tools in community ecology. However, existing JSDMs cannot model mutual exclusion between species. We tackle this deficiency in the context of modeling plant percentage cover data, where mutual exclusion arises from limited growing space and competition for light. We propose a hierarchical JSDM where latent Gaussian variable models describe species' niche preferences and Dirichlet‐Multinomial distribution models the observation process and competition between species. We also propose a decision theoretic model comparison and validation approach to assess the goodness of JSDMs in four different types of predictive tasks. We apply our models and methods to a case study on modeling vegetation cover in a boreal peatland. Our results show that ignoring the interspecific interactions and competition reduces models' predictive performance and leads to biased estimates for total percentage cover. Models' relative predictive performance also depends on the predictive task highlighting that model comparison and assessment should resemble the true predictive task. Our results also demonstrate that the proposed JSDM can be used to simultaneously infer interspecific correlations in niche preference as well as mutual competition for space and through that provide novel insight into ecological research.


Introduction
Community ecology has seen fast and significant development of statistical methods for joint species distribution modeling in recent years (Warton et al., 2015;Ovaskainen and Abrego, 2020;Nordberg et al., 2019;Vanhatalo et al., 2020). Species distribution models (SDMs) are statistical models that describe and predict the variation in the occurrence and abundance of species in space and time (Gelfand et al., 2006). Joint species distribution models (JSDMs) are their extensions to modeling multivariate species communities (Ovaskainen and Abrego, 2020;Vanhatalo et al., 2020). These models are used in a wide variety of applications, ranging from applied use, such as natural resources management and conservation planning (Kallasvuo et al., 2017;Guisan et al., 2013), to scientific inference, such as studying species' responses to environmental filtering (Clark et al., 2016;Kotta et al., 2019) and interspecific relationships (Tikhonov et al., 2017). (J)SDMs are routinely used also for prediction. The most common type of prediction is species distribution mapping where predictions from (J)SDMs are turned into thematic maps showing either the occurrence probability or abundance of species over a region of interest (Gelfand et al., 2006;Elith and Leathwick, 2009;Mäkinen and Vanhatalo, 2018). Other common predictive applications of (J)SDMs are biomass estimation over spatiotemporal domains (Kallasvuo et al., 2017) and predictions concerning biodiversity at unobserved locations (Clark et al., 2016).
The state-of-the-art JSDMs are built using hierarchical multivariate generalized linear models appended with Gaussian latent factors (Pollock et al., 2014;Warton et al., 2015;Nordberg et al., 2019;Ovaskainen and Abrego, 2020). The underlying assumption in these models is that species observations are linked to a latent Gaussian model, which includes a description for species' environmental niche through a linear model of environmental covariates, and a multivariate residual process modeled through latent factors. Modifications and extensions to this general structure include, for example, models where species are clustered together into archetypes for which the environmental responses are shared Hui et al., 2013;Johnson and Sinclair, 2017;Sollmann et al., 2021), models that cluster species' according to similar dependence pattern in their latent factors (Taylor-Rodríguez et al., 2017;Shirota et al., 2019), and models where the covariate effects are described by non-parametric models (Vanhatalo et al., 2020). Even though this general approach allows for flexible modeling of species dependencies through interspecific correlations in both covariate effects and random factors, interpreting their results is challenging. The environmental covariate effects and latent factors are commonly claimed to separate the environmental filtering from biotic interactions. However, as demonstrated by Poggiato et al. (2021), latent factor part of an JSDM is confounded with the environmental covariate effects so that current JSDMs cannot really achieve this. Similarly, another common claim that interspecific correlations in the latent factor part of the model can give hints on species interactions (see, e.g., Pollock et al., 2014;Ovaskainen et al., 2016;Wilkinson et al., 2019) has been heavily criticized (Clark et al., 2014;Blanchet et al., 2020). For example, positive interspecific correlations can arise from both competition and mutualism which arise from very different biological processes (Poggiato et al., 2021).
A specific example of species-to-species interaction that cannot be accounted for by current JSDMs, and which we tackle in this work, is exclusive competition for space. This is a process that commonly arises especially among plant species who compete for growing space and light. As a motivating example for our work, we consider inference and prediction for vegetation data that are measured through percentage cover. By definition, percentage cover is the proportion of area occupied by a species. Since a given space can be occupied only by one specimen at a time, increasing percentage cover of one species necessarily decreases the space available for other species that grow on the same vegetation layer and, hence, compete for the same common space. Percentage cover data, thus, reveal inherent negative dependence among species occupying a common vegetation layer whereas similar, direct, negative effects are not anticipated between species occupying different layers.
We build a hierarchical JSDM with an explicit description for interspecific exclusion. We use latent Gaussian variable models and multivariate Gaussian processes (Gelfand et al., 2004;Vanhatalo et al., 2020) to describe species specific niche preference and relative competitive performance over the study area. However, instead of modeling each species as conditionally independent given the Gaussian latent variable, as done in contemporary JS-DMs (see, e.g., Warton et al., 2015;Ovaskainen and Abrego, 2020), we model the observed percentage covers of mutually exclusive species with Dirichlet-Multinomial model. Dirichlet distributions model interspecific competition for space -the exclusion effect arising from the fact that only one specimen can be in one location at a time. Multinomial distributions model the uncertainty in the observation process related to measuring the percentage covers over inventory plots. Dirichlet distributions have been used in the context of JS-DMs earlier, e.g., by Taylor-Rodríguez et al. (2017) and Shirota et al. (2019) to cluster species according to their responses to latent factors and by Johnson and Sinclair (2017) and Sollmann et al. (2021) to cluster species according to their responses to environmental covariates. Our model differs from these earlier works since we do not use Dirichlet process to cluster species but we use Dirichlet distributions to describe a conditional exclusion process given the latent factors and environmental effects. A Dirichlet-Multinomial model similar to our model has earlier been used to model vegetation cover data in ordination settings by Damgaard et al. (2020). However, our model is the first one combining it with formal joint species distribution modeling framework.
We test and demonstrate the properties of our model with a simulation study after which we apply them to a case study on plant community data. In the case study, we make predictive percentage cover maps and estimate the total vegetation cover of moss and vascular plant species over a study site around an eddy covariance tower in a boreal peatland in southern Finland ( Figure 1). Eddy covariance tower is a measurement device to monitor vertical fluxes between biosphere and atmosphere (Korrensalo et al., 2019). They collect data on, e.g., GHG exchange rates between the biosphere and atmosphere over natural and artificial ecosystems. Our estimates on plant percentage cover will subsequently be used to calibrate the carbon gas flux estimates made by the eddy covariance tower. Out of the modeled plant species, mosses grow in one layer and therefore compete for space against others alike whereas vascular plants can grow in multiple layers and, thus, do not express exclusive interspecific competition for space.
We compare our models to the contemporariry JSDMs, and traditional stacked SDMs. The latter model each species independently and then combine their predictions into a community prediction. Since we are interested in model performance in several different types of predictive tasks, we propose to use rigorous decision theoretic approach to formulate model comparison and validation method for each of these tasks. We use the log score to measure model's predictive performance and uniform Q-Q plots of the probability integral transform (PIT) to assess their probabilistic calibration. We use then structured cross-validation (CV) to estimate models expected predictive performance in a given prediction scenario. That is, we divide the data into training and test data-blocks such that they structurally resemble the properties of the training data and predicted outcomes in the true predictive task.
The rest of the paper is organized as follows. In Section 2, we introduce the motivating case study and, in Section 3, we introduce the models used in this work. We then propose the model comparison and validation methods in Section 4 and the simulation experiments and the case study analyses in Section 5. We present the results in Section 6 and end by discussion in Section 7.

Motivating case study: plant community modeling
Our case study site is located at a boreal poor fen, which is part of Siikaneva peatland complex in Southern Finland ( Figure 1). The study site has an eddy covariance tower, which monitors the CO 2 and CH 4 fluxes within a 200-meter radius footprint around the tower that extends from the margin of the peatland towards the center. The carbon gas exchange between an ecosystem and atmosphere is primarily determined by the amount of photosynthetically active biomass and the area of green leaves -even though abiotic factors, such as the light and water availability and temperature, play a role as well (e.g. Peichl et al., 2018). Therefore, knowledge about the abundance and structure of the vegetation is an essential input for modeling the ecosystem climatic impact (e.g. Korrensalo et al., 2019) and for ecosystem models in general. In this context, peatland ecosystems are of special interest because they store approximately one third of global terrestrial carbon in their peat layer (Gorham, 1991) and are the largest natural source of atmospheric methane (e.g. Heilig, 1994). Different peatland species have distinct photosynthetic capacities and differ by the substrate quality they provide for decomposition processes. Further, the abundance of different vascular plant species is an important control for ecosystem-scale methane efflux, as certain species act as conduits of methane from the peat to the atmosphere (Bhullar et al., 2013), bypassing the microbial methane oxidation in the oxidation peat layers (Larmola et al., 2010). Plant species groups also differ by their emissions of biogenic organic compounds, that have a net cooling effect on climate (Tiiva et al., 2009;Faubert et al., 2011).
Typical to aapa mires, the margin of the study site is poorer in nutrients and slightly drier than the wetter center. The bottom layer of the site is formed by mosses (mainly Sphagnum L. species) and the field layer consists of vascular plants adapted to the conditions where water table prevails close to the surface. Both of these vegetation layers have a significant role in the greenhouse gas cycling of the site. For this work, we selected the six most common Sphagnum (S.) mosses: Sphagnum papillosum, S. balticum, S. fallax, S. magellanicum, S. majus and S. angustifolium and the following eight ecologically interesting vascular plants: Carex lasiocarpa, Carex limosa, Carex rostrata, Empetrum nigrum, Eriophorum vaginatum, Pinus sylvestris, Rubus chamaemorus and Scheuchzeria palustris. The Sphagnum species grow in the same vegetation layer and are mutually exclusive so that only one species can occupy a spatial location at any one time (Gong et al., 2020). The vascular plants, however, can grow in layers that overlap with each other and the Sphagnum layer so that they are not mutually exclusive to others. For species names we followed The Plant List (2013).
To quantify the spatial variation of the vegetation within the eddy covariance tower footprint, a vegetation inventory was done in the summer of 2017. The grid sampling design with 328 inventory plots was applied within 200 m distance from the eddy covariance tower (Figure 1). Locations that were in the drainage ditch and mineral soil in the border of the region were excluded from the grid and thus the formed grid is not symmetric.
For every inventory plot, the percentage cover of each species was estimated within a circular frame of 0.071 m 2 (radius 15 cm). The percentage cover was estimated as a total horizontal projection on the ground and reported in 0.25 percentage unit accuracy at cover below 1% and in 1 percentage unit accuracy at cover above it. As is typical, the percentage cover observations are expert assessments based on visual inspection of the inventory plots. To aid the visual assessment researchers divide the inventory plot (either mentally or using a mesh) into a homogeneous lattice. A reported percentage cover then corresponds to the proportion of lattice nodes occupied by a species. Hence, we denote by y i = (y i1 , . . . , y iJ ) the number of these lattice nodes occupied by all species at the ith inventory plot so that the estimated percentage covers are given as where N i,j corresponds to the number of lattice nodes used to estimate that species. The lattice size is, thus, related to the accuracy of the estimation. For example, a one percentage unit accuracy corresponds to N i,j = 100 whereas 0.25 percentage unit accuracy corresponds to N i,j = 400. This leads naturally to Binomial marginal distribution for y ij conditionally to the true percentage cover in an inventory plot. However, the estimation for percentage covers of sphagnum species were synchronized so that their total did not exceed 100% leading to Multinomial joint distribution for the numbers of occupied lattice nodes by all sphagnum species conditionally to their true percentage covers (see Section 3.1). To minimize human-related errors, such as misclassified or omitted species (Kennedy and Addison, 1987), each plot was inventoried by two persons.

Observation model
We model the species specific cover jointly using hierarchical Bayesian approach where each hierarchical layer represents a part of the modeled process ( Figure 2). We denote by D ⊂ R 2 the bounded study region of interest and by a set S = {s 1 , s 2 , ..., s n } the locations (centroids) of the inventory plots such that s i ∈ D ∀i ∈ {1, ..., n}, where n is the total number of plots (see Figure 1). The set of modeled species is denoted by E = {e 1 , ..., e J }, where e j is the identifier of the j'th species and J is the total number of species. We then allocate species into species groups such that each species group is formed by mutually exclusive species; that is, by species that cannot grow over each other. These species groups form nonempty disjoint sets E g which are defined such that E = ∪ p g=1 E g where p is the total number of species groups ( Figure 2). Triangles show the locations of the main plots having additional plots and squares present the locations of the main plots without additional plots.  Motivated by the data collection process (Section 2), we assume that, at each inventory plot, a researcher has counted the number of mesh nodes occupied by species and collected them into vector y i . We denote by a vector φ i = (φ i1 , φ i2 , ..., φ iJ ) the true species specific percentage covers in the inventory plot; that is, φ ij is the percentage cover of species j at plot i. We further group the true percentage covers and node counts into vectors formed by mutually exclusive species groups: φ i,g = (φ ij ) j∈Eg and y i,g = (y ij ) j∈Eg , g = 1, . . . , p. Now, conditional on φ i the species specific node counts at location s i can be modeled as where y i,g 0 = N i,g − j∈Eg y ij and φ i,g 0 = 1 − j∈Eg φ ij correspond to the number of mesh nodes and the proportion of the inventory plot where none of the species from species group E g is present, and N i = [N i1 , . . . , N ip ]. Hence, our observation model differs importantly from the contemporary (J)SDMs where the observations y i are conditionally independent given the underlying probability of presence parameter (see e.g., Gelfand et al., 2006;Taylor-Rodríguez et al., 2017;Ovaskainen and Abrego, 2020). For our data, this contemporary approach would mean that each species forms its own group with Binomial observation model so that π (y i |N i , φ i ) = J j=1 Bin(y i,j |φ ij , N i,j ). Our observation model (1) is similar to the observation model for pin-point vegetation cover data used by Damgaard et al. (2020) in their model-based ordination study. However, they modeled all species counts with single Multinomial model whereas, by species grouping, we explicitly account for the fact that some species can grow on top of others, which leads to conditional independence in species counts between different species groups given φ i . Parameters N ig , g = 1, . . . , p are formally sample sizes of a Multinomial distribution but they also corresponds to the percentage cover measurement accuracy. The smaller N ig is, the more uncertainty is assumed in the expert assessments. In the limit as N ig = 1, the resulting observation model is a categorical distribution. In the other limit, as N ig → ∞ we would have E[y ij /N ig ] → φ ij and Var[y ij /N ig ] → 0 corresponding to exact percentage cover observations.

Model for percentage covers
The assumption that species compete for space only among species in the same species group leads to (conditional) independence between percentage covers in different groups, so that we can decompose the percentage cover model as (2) We then model the group-wise percentage covers with a Dirichlet distribution where γ g is a scale parameter and α(f i,g ) = α i,g = (α ij ) j∈Eg , 1 − j∈Eg α ij is a vector of expected percentage covers defined through the soft-max function where f i,g = (f ij ) j∈Eg is a vector of species specific latent variables. In single species groups, the Dirichlet distribution reduces to Beta distribution and the softmax function reduces to the inverse logit-link function. The choice of Dirichlet distribution is justified since it is a natural prior for proportions. The stick breaking generative model of Dirichlet distribution has also an intuitive ecological interpretation as exclusive competition for space. The expected percentage cover, α ij , is a summary of a species' relative strength, among others in its group, to occupy a location. The bigger α ij , the more likely it is that a species "breaks" a large proportion of an inventory plot for itself. The scale parameter γ g governs then the level of randomness in this breaking process so that the randomness decreases with increasing γ g . Moreover, since the expected percentage cover depends on latent variables f i,g , we can use covariates to explain species relative competition strengths (see Section 3.3). Hence, a natural measure for exclusive competition for space between two species is the correlation between their percentage covers, which conditionally on latent factors is: This correlation is the strongest when α ij = α ij = 1/2 and decreases towards zero when either or both of the α terms decrease to zero. Moreover, as α ij varies in space the correlation varies in space as well and, hence, the model can capture spatial variation in the strength of space competition. Note though, that we use the term competition for space in a rather abstract manner since it does not contain explicit mechanistic description for possible biological processes underlying it.

Gaussian latent variable models
We model the species specific latent processes with latent Gaussian variable models where β j is a vector of covariate effects (including an intercept) for species j, x i is a vector of (environmental) covariates and ij is a Gaussian random effect for species j at a location s i . This is a typical Gaussian latent variable formulation for JSDMs where environmental covariates are used to explain the environmental niche of a species and random effects are used to describe the residual variation not explainable by the environmental covariates (Warton et al., 2015;Ovaskainen and Abrego, 2020;Vanhatalo et al., 2020). In single species SDMs, both the covariate effects, β j , and random effects, ij , are given mutually independent (Gaussian) priors across species reflecting an implicit assumption that the processes behind species' niches are mutually independent. However, this is an unrealistic assumption in practice since species typically show positive and negative associations in their niche preferences (Ovaskainen and Abrego, 2020). For this reason, JSDMs model these associations with hierarchical priors that contain interspecific correlations between the covariate effects and random effects, which can significantly improve the predictive accuracy of species distribution models (Nordberg et al., 2019). In this work, we follow Vanhatalo et al. (2020) in building JSDMs by giving a joint multivariate Gaussian priors for covariate effects so that the prior for species specific effects of d'th covariate is The priors for the random effects are introduced next.

Stationary spatial Gaussian processes
We model the spatial latent processes ij , either as mutually independent or jointly dependent among species. In the former approach, a species specific latent function is given a univariate Gaussian process prior where k (e) j (s i , s i ) is the species specific spatial covariance function. The superscript (e) stands for stationary exponential covariance function with a species specific length scale parameter, l j , governing how fast the spatial correlation decays, and a variance parameter σ 2 j governing the magnitude of the variation. For JSDMs, we extend the baseline model by allowing for interspecific correlations between the spatial latent processes. We model dependencies through linear model of coregionalization (LMC; Gelfand et al., 2004;Banerjee et al., 2015) such that spatial latent processes are expressed as a linear combination of J zero mean univariate Gaussian processes having the covariance functions k (e) j (s i , s i ). We collect all spatial random effects into vector (S) = ( 1 (S) , . . . , J (S) ) , where j (S) denotes the vector having latent variables for species j at spatial locations S. The LMC model induces a multivariate Gaussian prior (Gelfand et al., 2004;Vanhatalo et al., 2020) where ⊗ denotes Kronecker product of the covariance matrices that are constructed with j (s i , s i ; l j , σ 2 j = 1) and the matrix B j = L j L j . Matrix L j is the j th column of the Cholesky decomposition of the coregionalization covariance matrix Σ = J j=1 L j L j , which models the interspecific dependencies between species niche preferences. The variance parameter σ 2 j of k (e) j (s i , s i ) is set to 1 to ensure identifiability. The model (9) is the most flexible version of the LMC models where each species has its own process characteristics encoded by k j . However, we can reduce the flexibility by reducing the number of unique covariance functions in the model. For example, if all species share the same covariance function k . . J, we obtain LMC(1) model, which is also called intrinsic model for coregionalization (Gelfand et al., 2004). In this model the species specific latent processes are (correlated) random draws from the same underlying Gaussian process whereas in (9) the latent processes are (correlated) random draws from J different Gaussian processes. We tested also models in between these two extremes and denote by LMC(k) a model where we have k distinct covariance functions so that k k (s i , s i ), ∀j ≥ k when k is less than J. Note that the LMC(k) models always have J unique spatial latent processes and the coregionalization matrix Σ is positive definite in all these models.
Our approach for modeling the multivariate spatial random effect is reasonable with our current application that has only moderate number of species. If extended for larger number of species, it would, however, run into trouble through rapid increase of the coregionalization covariance matrix Σ . Hence, with more species it would be reasonable to replace the LMC model with a latent factor model where we would have only p < J spatial Gaussian processes so that the interspecific covariance matrix would be of low rank (see, e.g., Taylor-Rodríguez et al., 2017;Ovaskainen and Abrego, 2020).

Non-stationary spatial Gaussian processes
Stationary Gaussian processes work typically well if the region of interest is sampled relatively sparsely, so that data does not allow inference for non-stationarity in spatial correlation, or if environmental covariates are accurate enough to describe potential nonstationarities in the latent process f (s i ) (Schmidt and Rodríguez, 2011). However, we do not have environmental covariates for the case study area, the case study data is sampled with high resolution (see Section 5.2), and the properties affecting peatland vegetation can vary considerably within the study region. Hence, we consider also non-stationary spatial random effects. We extended the stationary spatial random effect model by setting non-stationary covariance function for Gaussian process. We include non-stationarity into the model by changing stationary covariance function in equation (8) to a non-stationary Matérn (ν = 3/2) covariance function with spatially varying length scale parameter (e.g. Paciorek and Schervish, 2006) is the Mahalanobis spatial distance between locations s i and s i for species j and Σ ij = l 2 ij 1 0 0 1 so that l ij varies spatially. The superscript m in the above equation stands for Matern covariance. We model the spatially varying length scale parameter by giving a Gaussian process prior for its logarithm where the mean function µ l j specifies the expected value of log(l ij ) and k (e) j (s i , s i ) is the stationary exponential covariance function (8). Gaussian process prior gives smoothly varying length scale and modeling logarithm of the length scale l ij ensures positivity. Since the matrix Σ ij is diagonal and all its diagonal elements are the same, the spatial correlation is isotropic at each location but the strength of the correlation decay varies within the area. The Non-stationary covariance function in equation (10) reduces to stationary Matérn (ν = 3/2) covariance function if the length scale parameter, l ij , is set to be the same at each location. Non-stationary multivariate spatial random effects have earlier been used in the context of species distribution modeling by Schmidt and Rodríguez (2011). They applied also LMC framework but their non-stationary covariance kernels were constructed differently. We applied the non-stationary GP to models with and without interspecific dependence between the spatial latent processes. In the former case, each latent process was given an independent non-stationary GP prior. In the latter case, the latent processes were modeled jointly with LMC (equation (9)

Hyperpriors
The model specification is completed by assigning prior distributions to the model hyperparameters. In the stationary GP models, we gave weakly informative half-inverse-Student-t prior for the length scale parameters, 1/l j ∼ Student − t + (µ, s 2 , ν), which gives a priori more weight for the larger length scales. The parameter values of the prior were chosen according to the size of the study area (see Section 5.2). The location and scale parameters of the half-inverse-Student-t prior for the length scale parameters of the stationary GP models were selected such that the length scale is less than 400 meters with probability 0.99. This gives 1/l j ∼ Student − t + (0, 0.19 2 , 5) which corresponds to preferring smooth over small scale variability in vegetation composition. In the non-stationary GP models, the prior for the mean of the log length-scale was µ l j ∼ N (4.5, √ 2 2 ), which favors relatively large l ij values (> 100 m). For the length scale and variance parameters of the GP prior for log l ij (11) we gave weakly informative priors, 1/l j ∼ Student − t + (0, 2 2 , 4) and 1, 4). The location and scale parameters in these priors were selected such that l j is less than 38 meters with probability 0.99.
We gave a Gamma prior, γ g ∼ Gamma(3/2, 2/3), for the process parameters governing the level of randomness. It is a priori likely that randomness in the vegetation composition is high for which reason we assigned scale and rate parameters of the Gamma distributions so that they give more weight for the smaller values of the process parameters. In order to model coregionalization matrix Σ efficiently we use a separation strategy (e.g. Barnard et al., 2000) where coregionalization matrix is decomposed into correlation matrix Ω and vector of standard deviations ω such that Σ = diag(ω)Ωdiag(ω). The correlation matrix Ω was given LKJ-prior with unit shape (Lewandowski et al., 2009) that defines a prior distribution which is marginally uniform over all correlation parameters (Vanhatalo et al., 2020). We gave half-Student-t prior for the standard deviations, ω ∼ Student − t + (0, 4 2 , 4). In the simulation study, we assigned environmental covariate effects β a multivariate normal prior β ∼ N (0, Σ β ) where the covariance matrix was decomposed into correlation matrix Ω β and vector of standard deviations ω β such that Σ β = diag(ω β )Ω β diag(ω β ). The correlation matrix Ω β was given LKJ-prior with unit shape and for the standard deviations we gave half-Student-t prior, ω β ∼ Student − t + (0, 4 2 , 4). For each intercept term β 0,j we assigned weakly informative Student-t priors β 0,j ∼ Student − t(0, 2.5 2 , 4) in the simulation and case studies. In principle, we could also give an informative prior for N ig to account for the fact that the estimate for experts' accuracy is not exact. However, we do not consider this option here.

Posterior inference
All the alternative species distribution models were implemented using Stan via Rstan (Stan Development Team, 2018) with which we conducted posterior sampling for all the model parameters and latent variables. We ran four parallel Markov chains of 2000 iterations such that first 1000 iterations of each chain were discarded as warmup. Posterior sampling was done using dynamic Hamiltonian Monte Carlo Sampler as coded in Stan version 2.18.1. The convergence and effective sample sizes were checked using trace-and autocorrelation plots and through potential scale reduction factor and Geyer's initial monotone sequence criterion. After conducting the posterior sampling, we drew posterior predictive samples of  (16) and (17). Total percentage cover maps An average of location-wise log joint over species predictive densities. Equation (13).
PIT histogram of location-wise total over species predictive distributions. Equations (16) and (18). Species specific total vegetation cover over the study area CV 3 : An average of speciesand CV-fold-wise log joint over locations (within a CV-fold) predictive densities. Equation (14). P IT 3 : Species specific PIT histograms of total over locations (within a CV-fold) predictive distributions. Equation (16) and CDF is estimated analogously to (18). Total vegetation cover over the study area CV 4 : An average of CVfold-wise log joint over species and locations (within a CV-fold) predictive densities. Equation (15). P IT 4 : PIT histogram of CVfold-wise total over species and locations (within a CVfold) predictive distributions. Equation (16) and CDF is estimated analogously to (18). the species specific percentage covers at prediction locations covering the study area (see, e.g., Vanhatalo et al., 2020).

Model comparison and validation
The central aims of our study are to provide posterior predictive maps for the vegetation cover within the study area, and to provide posterior predictive distributions for the total vegetation cover over the study area. Hence, we compare alternative models with the goodness of their out-of-sample posterior predictive distributions using cross-validation (CV) log posterior predictive density diagnostics (Vehtari and Ojanen, 2012). Model comparison indicates the best model among the alternatives but does not tell whether any of the models has actually good fit for the purpose. Hence, after choosing the best model we assess the goodness of its predictive distributions using probability integral transform (PIT) statistics (Gneiting et al., 2007). All the model comparison and model validation methods are summarized in Table 1 and we explain them in detail next.

Predictive model comparison with cross-validation
Species specific percentage cover maps (so called, species distribution maps) are produced by forming a lattice mesh over the study area and calculating point-wise posterior predictive distributions for all the mesh cells. Summary statistics (mean, variance, quantiles, etc.) of these predictive distributions can then be drawn as maps. On the other hand, when predicting the total vegetation cover, we need to calculate the joint predictive distribution over all the mesh cells. Moreover, we want to compare models' predictive performance both by species and for the total cover by all species. Hence, we constructed own CV splitting strategy for each of these tasks.
In order to compare models in terms of producing species-specific percentage cover maps, we divided the observed dataset randomly into K distinct subsets, indexed by sets of locations S 1 , ..., S K such that the full data set is given by ∪ K k=1 S k = S. The single species, point-wise, predictive performance was then where π(y ij |Y(S \k(i) ), S \k(i) , s i ) is the CV posterior predictive density for y ij . The set k(i) is the CV set that contains location s i , and Y(S \k(i) ) includes all other species observations except the observations in the CV set k(i). The comparison of models in the task of producing total vegetation cover maps was done analogously so that we calculated the average log (spatially) point-wise joint over species posterior predictive density. This leads to K-fold CV criterion where π(y i |Y(S \k(i) ), S \k(i) , s i ) is the joint posterior predictive probability mass function of all the species at location i. We compared models in predicting per species total percentage cover over an area by first calculating for each CV-fold and species the log joint posterior predictive density of observations in that CV-fold and then taking the average over the CV-folds and species. That is, where Y j (S k ) collects all the observations of the jth species at locations S k . Similarly, to evaluate models' performance in predicting the total percentage cover over all species and an area we calculated the average of CV-fold-wise log joint predictive densities with where Y(S k ) collects observations of all species at locations S k . Note that in all above CV criterion we have calculated the average over individual log predictive densities. Hence, the statistics are in comparable scale.
We used K = 10 in all CV metrics. Moreover, we estimated the posterior predictive densities using Monte Carlo over Markov chain samples from the posterior distribution. For example, in CV 1 the posterior predictive density is approximated with π(y ij |Y(S \k(i) ), S \k(i) , and γ (m) denote the m th posterior sample from the joint posterior (predictive) distribution of the latent variable and Dirichlet model parameters respectively; that is f In CV 2 the posterior predictive density is approximated with π(y i |Y(S \k(i) ), S \k(i) , (m) ) and analogously in CV 3 and CV 4 . After obtaining the posterior samples for the model parameters and latent variables at data locations, the posterior predictive samples can be constructed in Gibbs style using the full Gaussian conditional distribution of the latent variables (see Vanhatalo et al., 2020, Section 4.1). If the samples of posterior predictive densities, such as π(y ij |f (m) ij , γ (m) ), have large variance, the Monte Carlo approximations for log predictive densities may become unreliable. This is likely, especially in case of CV 2 -CV 4 where we estimate multivariate densities. We used bootstrapping with 1000 replicates to estimate the uncertainty in the CV criterion induced by the Monte Carlo approximation for the log predictive densities. Each bootstrap replicate of the CV criterion was based on a random sample of size M with replacement from the posterior sample of model parameters and latent variables.

Model validation with PIT
A key property of a predictive distribution is its calibration (Gneiting et al., 2007) which we evaluated using randomized probability integral transform (PIT Denuit and Lambert, 2005). Uniform Q-Q plots of the randomized PIT is a graphical method to evaluate whether data, y ij , can be considered as a random sample from the discrete predictive distribution that is given by the fitted model or not. To construct Q-Q plots of the randomized PIT for species-and location-wise predictive distributions we define where v ij is a draw from standard uniform distribution, F (y ij ) is the posterior predictive cumulative distribution function (CDF) for observation y ij , and r min is the minimum gap between any possible adjacent values of y. We set F (y ij − r min ) = 0 if y ij − r min < 0 since percentage cover cannot be negative. Calibration is evaluated graphically by plotting uniform Q-Q plot with point-wise 95% confidence interval of the values u ij since they follow standard uniform distribution if F (y ij ) corresponds to the true data generating process (Gneiting et al., 2007). Q-Q plots of the PIT were drawn for CV predictive distributions using the same splitting strategy as in the CV tests. Hence, P IT 1 corresponds to species-and location-wise predictions for y ij , P IT 2 to location-wise predictions for the sum over species,ȳ i = j y ij , P IT 3 to species-wise predictions for the sum over locations,ȳ j = i y ij , and P IT 4 to predictions for the sum over species and locations,ȳ = ji y ij (Table 1). We used Monte Carlo approximation for the posterior predictive CDFs. For P IT 1 we approximated the posterior predictive CDF directly as In P IT 2 -P IT 4 we first constructed Monte Carlo estimator for the posterior predictive densities for the sums over species and/or locations conditional on latent variables and parameters. These were calculated for each posterior sample so that π(ȳ i |f , γ (m) ). After this we calculated CDF as The CDFs for P IT 3 and P IT 4 were estimated analogously.

Simulation study
We conducted a simulation study to demonstrate and test the applicability of the proposed model. To mimic the design of our case study, the simulated data included 200 plots that were selected at random from the inventory plots of the case study. We constructed a regular mesh, D M , over the study area, D, which was used for posterior predictive checks. We included four species competing for space (i.e., mutually exclusive species) and four species that do not compete for space with other species. First, we simulated two environmental covariates, x and z, in the inventory plots and in the mesh nodes. We drew covariate x as a random realization from a non-stationary Gaussian process (10) where logarithm of a spatially varying length scale was given a Gaussian process prior (11) with mean function µ = 4.5 and exponential covariance function with length scale l = 80 and variance σ 2 = 1. The second covariate, z, was sampled from a stationary Gaussian process (7) having exponential covariance function with length scale l = 80 and variance σ 2 = 1. We formed a latent Gaussian variable for all species following f j (s i ) = β 0,j + β 1,j x i + β 2,j z i , where the species specific intercept β 0,j and covariate effects β 1,j and β 2,j were sampled from independent Gaussian distributions. We sampled species observations, y ij , for the competing species group from a Dirichlet-Multinomial distribution and for the non-competing species from a Beta-Binomial distributions. The scale parameter of the Dirichlet-multinomial was set to γ = 10 and the scale parameters of the Beta-binomial distributions were sampled from a log-normal distribution γ g ∼ log-N(µ = 2, σ 2 = 0.5 2 ). We fitted three models to the simulated data. In all of them, the conditional distribution for species observations given the latent Gaussian variables corresponded to the true data generating process with Dirichlet-multinomial and Beta-Binomial components whose scale parameters were given priors as in Section 3.4. In the first model, to be denoted Cov+LMC(1) S , the Gaussian latent variable model included covariate x and a spatial random effect with LMC(1) prior. The covariance function of the LMC(1) prior was the stationary exponential with hyperpriors as defined in Section 3.4. Hence, this model corresponds to a situation where the (non-stationary) environmental covariate is included in the model and the (multivariate) stationary spatial random effect captures the effects of the missing (stationary) covariate. In the two other models, the Gaussian latent variable model included only the intercept and spatial random effects, that is f (s i ) = β 0,j + i,j . In the second model, the spatial random effects were given LMC(1) prior with the same nonstationary covariance function that was used to generate the covariate z (model LMC(1) NS ). Table 2: Summary of the alternative models compared in the case study. The abbreviations in the model names are: C for constant latent function (f j (s i ) = β 0,j ); IGP S and IGP NS denote f j (s i ) = β 0,j + ij where spatial GPs are respectively independent stationary and non-stationary; LMC(k) S and LMC(k) NS denote respectively stationary and non-stationary linear model of coregionalization with k distinct covariance functions; BB and DM denote respectively Beta-Binomial and Dirichlet-Multinomial processes.

Model
Species niche preference Competition Constant Independent heterogeneous Dependent heterogeneous No Yes stationary non-stationary stationary non-stationary C+BB In the third model, the spatial random effects were given LMC(1) prior with the same stationary covariance function that was used to generate the covariate x (model LMC(1) S ). After fitting the three models to the data, we compared them using the CV criteria and Q-Q plots of the randomized PIT as well as with respect to the true underlying Gaussian latent variable model over the grid cells.

Case study analyses
In the plant community modeling case study (Section 2), we used one species group for all the Sphagnum moss species and each vascular plant formed its own species group. Since Sphagnum mosses had larger than 1% cover throughout the study region we set N i,a = 100 for them corresponding to the used 1 percentage unit measurement accuracy. Vascular plant percentage cover measurements ranged from below to above 1% so we assumed in average 0.5 percentage unit accuracy for them leading to N i,a = 200 for vascular plants.
To evaluate the effect of model structures to leaf area predictions, and to test if modeling competition for space improves model predictions, we compared 12 alternative models with different structures for the Gaussian latent variable model and the observation model. Since we did not have detailed covariate information from the study region, we did not incorporate covariate effects to the latent Gaussian variable model (6) but tested seven alternative spatial GPs for it: a constant, f j (s i ) = β 0,j , or a spatial GP, f j (s i ) = β 0,j + ij , where Gaussian random effects ij were either independent or dependent with either a stationary or non-stationary covariance function. Each of the latent variable models was combined with either an observation model containing Dirichlet-multinomial for the group of mosses or an observation model where all species had independent Beta-Binomial distributions. The compared models are summarized in table 2 and their DAGs are summarized in Figure 2 and in Figures B1-B3. We compared and assessed the alternative models with the CV criteria and Q-Q plots with point-wise 95% confidence intervals of the randomized PIT summarized in Table 1. For computational reasons, these were done using a random subset of 200 inventory plots. We then trained the best model with all data to predict the vegetation cover for each species separately, across all species and across both species groups (mosses and vascular plants). The predictions were done over a regular mesh, D M , covering the study area, D.
The resolution of the mesh was chosen so that further decreasing the size of the grid cells (A = 4 m 2 ) did not affect the posterior predictive variance of the total vegetation cover V ar(φ).

Simulation study
The results from the simulation study are collected in Appendix A and here we summarize them. As expected, the model closest to the data generating model (Cov+LMC(1) S ) had the best and the non-stationary random effect model (LMC(1) NS ) the second best overall performance when measured with the 10-fold CV log predictive density estimates (Table A1). In terms of Q-Q plots of the PIT ( Figures A5-A7, A14-A16 and A23-A25) and the total percentage cover predictions over the grid cells ( Figures A8, A17 and A26) the differences between the models were small. The main purpose of the simulation study was, however, to critically assess the identifiability and posterior sampling of model parameters. For all models, the MCMC sampling of model parameters and Gaussian latent variables converged and mixed reasonably well. For model LMC(1) NS the convergence was slower and the mixing of the sample chain not as good as for the other two models though. In Cov+LMC(1) S practically all parameters and latent variables were well identified. The 95% posterior credible intervals for model parameters included the true data generating parameter values for all cases except the length-scale of the LMC(1) covariance function, which was slightly underestimated ( Figure A1). The mismatch between the posterior distribution and the true value of the lengthscale most likely results from randomness in the realization of the hidden covariate, z, which the LMC(1) term adapts to. Since Cov+LMC(1) S did not include the hidden covariates its spatial random effect should adapt to the species specific β 2,j z terms and its coregionalization covariance matrix, Σ , estimates the true interspecific covariance matrix induced by β 2,j parameters; that is, β 2 β 2 where β 2 = [β 2,1 , . . . , β 2,J ]. However, since we have sampled the process only in 200 locations the coregionalization covariance matrix should estimate better the sample covariances between by β 2,j z terms. The 95% posterior credible intervals of the elements of the coregionalization matrix included the true interspecific covariance in most cases and the sample covariance in all cases ( Figure A2). The posterior mean of the interspecific exclusive competition measure (5) predicted over the study area matched also well the true simulated competition measure ( Figure A9). The models LMC(1) S and LMC(1) NS did not include any covariates so the only parameters having a counterpart in the data generating model are the covariance function length-scales, µ l in the non-stationary covariance function, the scale parameters of the Dirichlet and Beta distributions, and the coregionalization covariance matrix. All these parameters were well identified since the most of their 95% posterior credible intervals included the true data generating values ( Figures A10, A19). The only exception was the posterior distribution of the length-scale of the LMC(1) S model, for which the 95% posterior credible interval did not include the true data generating value. In general, the coregionalization matrices Σ of LMC(1) S and LMC(1) NS estimated well the interspecific sample covariances but underestimated the true interspecific covariances, induced by the two covariate effects in the simulated data, given by BB where B is a J × 2 matrix with j'th row B j· = [β 1,j , β 2,j ] ( Figures A11, A20). Also, the posterior means of the interspecific competition measures (5) predicted over the study area matched well the true simulated competition measure (Figures A18 and A27). These interspecific exclusive space competition measure predictions of LMC(1) S and LMC(1) NS were smoother and less accurate than the corresponding prediction with Cov+LMC(1) S which is reasonable since the former two lack the information from x 1 at the prediction locations. Table 3 summarizes the model comparison results for the alternative models in our case study. In all predictive tasks, the models with constant latent function (C+BB and C+DM) had the lowest CV predictive performance. This gives strong support for spatial heterogeneity in vegetation covers. However, when predicting the species-wise total over the region (CV 3 ), the difference to the best model was not significant if measured relative to standard error of the CV estimates. In general, models with Dirichlet-Multinomial observation model outperform the otherwise same models with Beta-Binomial in joint species predictions (CV 2 and CV 4 ) and perform practically equally well in single species predictions (CV 1 and CV 3 ). This indicates that accounting for species competition is more important for community predictions than for single species predictions. The best performing model in terms of CV 1 and CV 3 was LMC(1) S +BB with practically equal performance by LMC(1) S +DM and LMC(2) S +DM. However, the CV 1 and CV 3 estimates of all spatially heterogeneous models except IGP S +DM in CV 1 were within one standard error of the best CV estimate indicating that these differences were practically negligible. When looking at CV 1 and CV 3 for each species separately (Table B1), the relative differences between alternative models were small but the best model for all species always had either stationary or non-stationary LMC latent function prior. In terms of, CV 2 and CV 4 the best performing models were LMC(1) S +DM, LMC(2) S +DM and LMC(1) NS +DM. The former two models had practically equal CV 2 and CV 4 estimates and those of the latter were within one standard error estimate from the former two. The relatively larger difference between LMC(1) NS +DM and the best model might be, at least partly, explained by worse mixing of its Markov chain compared to the Markov chains of LMC(1) S +DM and LMC(2) S +DM which induces more weight to occasional low log predictive density values. The CV 2 and CV 4 estimates of all other models were significantly worse than those of the best three models. The Markov chains for IGP NS +DM and LMC(2) NS +DM converged slowly and their mixing was poor resulting into considerably smaller effective sample size for the log predictive densities compared to other models so we excluded them from the CV comparison. Time needed for posterior sampling of JSDMs with stationary spatial random effects ranged from 3 to 11 hours with a regular desktop computer. The sampling for single species SDMs (stationary or non-stationary) was considerably faster whereas the sampling for JSDMs with non-stationary spatial random effects was considerably slower (6.5 days for LMC(1)). Table 3: Model comparison results with 10-fold CV using log predictive density utilities (CV 1 , . . . , CV 4 ; see Table 1) together with the standard error estimates (se) and the Monte Carlo error estimate (me) for the CV estimates (see Section 4). The best CV estimate is indicated by bold font and the CV estimates that are within one standard error from the best model are indicated by italic font. Rows with -indicate models whose effective sample size for log predictive density was considerably smaller than for the rest of the models and, hence, were excluded from the comparison.

Model validation
Model comparison with CV did not indicate clear differences between the best models: LMC(1) S +DM, LMC(2) S +DM and LMC(1) NS +DM. Moreover, the former two were practically the same model since LMC(2) effectively reduced to LMC(1). Hence, we conducted model validation for the latter two. Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 2 and PIT 4 for the LMC(1) NS +DM and LMC(2) S +DM models (Figure 3 and B4) show moderate departure from uniformity and they indicate that the total percentage cover is to some extent underestimated in average over the CV folds. However, it should be noted that Q-Q plots of the PIT 2 are formed from only 200 and Q-Q plots of the PIT 4 from 32 posterior predictive distributions. Hence, the deviation from uniformity is likely also due to randomness. Similar deviations from uniformity were observed also in the simulation study with the Cov+LMC(1) S model, which was corresponded to the true data generating process in the simulation study ( Figures A5-A7). The species wise Q-Q plots of the PIT 1 for LMC(2) S +DM and LMC(1) NS +DM models ( Figures B5 and  B6) are close to uniform for all other species except S. papillosum for which Q-Q plots of the PIT 1 is slightly ∪-shaped indicating underestimation of predictive uncertainty. This deviation from uniformity was more distinct for LMC(2) S +DM than LMC(1) NS +DM. Out of all species, S. papillosum was the one showing the most clear non-stationarity in the latent function of LMC(1) NS +DM model indicating that non-stationary Gaussian process prior might have been benefitial in this case. For most species the species wise Q-Q plots of the PIT 3 for LMC(2) S +DM and LMC(1) NS +DM models ( Figures B7 and B8) do not show clear deviations from uniformity even though the Q-Q plots are noisier than the Q-Q plots of the PIT 1 .

Percentage cover predictions and species interactions
Based on the model comparison and validation we selected LMC(1) NS +DM for the final inference. The percentage covers of the species have clearly different spatial patterns (Figure 4). Spatial distribution of the vegetation is typical for minerotrophic fen where center is lower than the edges and therefore also water table is on average higher (wetter) at the center. As spatial composition of sphagna and vascular plants follows the variation in the water table (WT), plants adapted to grow in drier areas are concentrated to edges of the study area and flarks are inhabitated by species that can withstand waterlogging. Hummock species like S. angustifolium and E. nigrum favoring drier habitat grow mostly at the edges of the study area. Lawn species S. papillosum dominating the composition of sphagna grows fairly evenly throughout the area. Lawn community type can be further divided into three subgroups specified by vascular plants (from drier to wetter) E. vaginatum lawn, C. rostrata lawn and C. lasiocarpa lawn. C. rostrata and C. lasiocarpa grow mainly on the southern edges of the study area while percentage cover of E. vaginatum tends to be higher on the northern edges. Species (S. majus, C. limosa and S. palustris) associated to wetter growing conditions (hollows) are concentrated in smaller hotspots at the center of the study region. Combined sphagnum cover (sum of sphagnum species covers) is relatively stable (85-95%) over the study area although species-wise distributions have distinct spatial patterns. Contrarily, spatial distribution of combined vascular plants cover has high variation such that hotspots of higher combined covers are located on both wetter center areas and drier parts of the study area.
The estimated interspecific correlations in niche preferences are summarized in Figure 5 and the exclusive competition for space between sphagnum species is summarized in Figure 6 in terms of the (negative) correlation in their percentage cover (5). Species can be differentiated into co-occuring groups according to the interspecific dependencies between species niche preferences. Sphagnums S. majus and S. fallax with vascular plants Figure 4: The posterior predictive means of the percentage covers and probability distributions of the combined percentage covers over the study area as predicted by LMC(1) NS +DM model. (A) species-wise percentage covers of the sphagnums, (B) total percentage cover of all sphagnum species combined and the total combined cover over the study area, (C) species-wise percentage covers of vascular plants, (D) total percentage cover of vascular plants combined and the total combined cover over the study area and (E) combined total cover of sphagnums and vascular plants. For combined totals over the area probability distributions are estimated using kernel density estimation on MCMC samples. Solid vertical lines show the posterior means for the combined covers, dashed lines show the sample means computed from the training data. Figure 5: Posterior mean estimates of the interspecific correlations in the species niche preference in the LMC(1) NS +DM model. White cells indicate that the 80% posterior credible interval of the correlation overlapped zero (i.e., weak support for interspecific correlation) and stars indicate that the 95% posterior credible interval did not overlap zero (i.e., strong support for interspecific correlation).
S. palustris and C. limosa adapted to grow in wetter conditions form one cluster. Lawn species C. lasiocarpa and C. rostrata can be classified into the second co-occuring cluster. High lawn-low hummock species S. angustifolium, S. magellanicum and P. sylvestris form the third cluster. Dwarf shrubs E. nigrum and R. chamaemorus form the fourth cluster (both species were also positively correlated with P. sylvestris but 80% credibility intervals for the correlation between R. chamaemorus and P. sylvestris overlapped zero). Lawn species S. papillosum, S. balticum and E. vaginatum were also positively correlated but 80% credibility intervals for correlations overlapped zero. The interspecific correlations in niche preference followed qualitatively similar pattern in all LMC models. However, models with Beta-Binomial observation model found more negative interspecific correlations than models with Dirichlet-Multinomial observation model as illustrated by comparison of Figures 5 and B10.
Estimated total vegetation cover distributions for each model presented in Figure B9 show that non-stationary JSDM models predicted combined covers that are close to the observed sample means. All stationary models tend to predict combined cover to be smaller than their non-stationary counterparts. In each case, the 95% prediction intervals for nonstationary JSDM models overlap observed sample mean but for stationary models 95 % prediction intervals do not overlap observed sample mean. This suggests that there is non-stationarity in the observation process that should be taken into account or received vegetation cover estimates might be biased. Stacked species distribution models tend to overpredict combined vascular plants cover but underpredict combined sphagnum and total vegetation covers.

Discussion
Our proposed model performed well in the simulation experiments in terms of the identifiability of the model parameters and (out-of-sample) predictive performance. Importantly, the proposed model was able to reproduce the interspecific correlations in the residual latent Gaussian variables (the coregionalization matrices Σ ) ( Figures A2, A11, A20) and the parameters for exclusive competition for space (5) (Figures A9, A18, A27). The latter Figure 6: The exclusive competition for space between sphagnum species. The maps on the lower left corner show the spatial distribution of the (negative) interspecific correlation in percentage covers for all pairs of sphagnum species. The matrix on the upper right corner shows the average of these correlations over the whole study area.
were equally accurately inferred in the presence of (simulated) environmental covariates ( Figure A9) and in the absence of them ( Figure A18). The computational time for posterior inference ranged from hours to several days but, to our experience, it was comparable to other JSDMs employing spatial random effects.
Our case study results (table 3) support previous empirical findings that modeling interspecific correlations improves JSDM predictions through information sharing between species (Ovaskainen and Soininen, 2011;Vanhatalo et al., 2020;Nordberg et al., 2019). Our case study results show additionally that explicit modeling of exclusive competition for space improves models' predictive performance. This improvement is more important in predictions concerning community structure and total percentage cover (CV 2 and CV 4 in table 3) than in single species predictions (CV 1 and CV 3 in table 3). The models did not differ only in their predictive properties, though, but they led to different inference results as well. For example, models with Beta-Binomial observation model found more negative interspecific correlations in species niche preferences than models with Dirichlet-Multinomial observation model (figures 5 and B10). Even though the differences between the best models were small, the proposed non-stationary JSDM model worked slightly better than its stationary counterpart. The reason for this was that some species, most clearly S. papillosum, had non-stationary latent Gaussian field (see Figure 4). However, we want to emphasize that the more complex non-stationary spatial structure is an overkill in JSDMs in general. As mentioned already by Schmidt and Rodríguez (2011), non-stationary spatial random effects are more likely needed when environmental covariates are not available than when they are available. To our experience, the environmental covariates typically capture non-stationary aspects in the data. Moreover, often spatial data are collected with so low resolution that inferring non-stationarity from them is not possible. Non-stationary LMC was justified for our case study, though, since our data were sampled with high spatial resolution and we did not have environmental covariates from the region.
Our results demonstrate also that the relative performances of alternative models can be different in different model comparison metrics. This highlights the importance of tailoring the model comparison method to the specific task at hand. The best models (LMC(1) S +DM, LMC(2) S +DM and LMC(1) NS +DM) were among the best in all predictive tasks whereas other models performed equally well to them only in some of the predictive tasks. Namely, all spatially heterogeneous models were practically equally good in producing species distribution maps and per species total cover estimates (CV 1 and CV 3 see table 3). However, in joint species predictions the best models stood out as clearly the best. Notably, the spatially homogeneous models C+BB and C+DM did not differ significantly from the other models in per-species total over area predictions (CV 3 in table 3 and Figure B9), which is reasonable since, in theory, sample mean over uniform random locations is an unbiased estimate for species-wise total covers. However, Foster et al. (2021) give an example on how sample mean over uniform random locations may fail in practice if the underlying field is very heterogeneous. We expect that estimating areal vegetation composition more precisely can be beneficial in areal greenhouse gas emission models since vegetation is important explanatory factor in carbon dioxide and methane flux models. The case study results demonstrate that presented model offers valid method to estimate areal vegetation cover, and uncertainty associated to it, for those purposes. The lack of covariates is a shortcoming in our case study though, but information on relevant covariates was not available for the study area. For example, height of water table and covariates related to soil properties could improve the model fit further as they are a commonly known drivers of vegetation patterns in peatlands (Andersen et al., 2011).
Drawing conclusions about species associations and competition from observational data is challenging since we cannot control for all the covariates and processes shaping the community. As demonstrated and discussed by, for example, Clark et al. (2014), Blanchet et al. (2020) and Poggiato et al. (2021) latent factor models offer limited insight into realized dependence behavior between species at sites and cannot, in reality, separate environmental effects from biotic interactions since these two components are functionally confounded. In this respect, our inference is an improvement compared to other JSDMs where interspecific interactions are modelled only with latent factors. Our models with Dirichlet-Multinomial observation model describe exclusive competition for space functionally differently from the effects of environmental covariates, and these two descriptions are located in different hierarchical levels of our probabilistic model. Hence, the negative interspecific dependencies arising from the exclusive space competition are filtered from the other interspecific dependencies captured by the latent Gaussian model. A clear example of this is provided by the different inference results for the coregionalization matrices of the Beta-Binomial model compared to that of the Dirichlet-Multinomial observation model (Figures 5 and  B10). Our case study results are also highly congruent with the earlier ecological knowledge of the species in the study area. However, the Dirichlet-Multinomial model does not contain mechanistic description on biological processes (such as abiotic filtering and biotic competition for nutrients) underlying this exclusion process. It is a probabilistic description for the end result of those processes.
Even though Dirichlet process has previously been used in JSDM literature our approach differs from these earlier models significantly. Taylor-Rodríguez et al. (2017) and Shirota et al. (2019) use Dirichlet processes to cluster species in their responses to latent factors and Johnson and Sinclair (2017) and Sollmann et al. (2021) use them to cluster species in their responses to environmental covariates. We use Dirichlet distribution in conceptually different manner and in technically different part of the model compared to sample species specific percentage covers. Moreover, Dirichlet-multinomial model has been used to model vegetation cover data, for example, in ordination settings (Damgaard et al., 2020) but our model is the first one combining it with formal joint species distribution modeling framework.
We believe that the proposed modeling framework can be broadly used for vegetation distribution studies. The only a priori information on interspecific dependence between species required in our model is whether species inhabit same vegetation layer or not. In our case study, this boils down to grouping together the moss species. However, such information is often available also in other plant studies. In principle, the model could be extended also to cases where the space dependencies between species are not clear so that we make the group identifiers of species random and infer them alongside other model parameters. Our model could also be applied to other cases where species can be assigned to exclusive species groups according to a limited resource, and potentially also to cases where competition arises from priority effects where a species arriving first excludes a species arriving later. For example, coral species occupying same depth layer behave similarly to the mosses in our study and colonial breeding bird communities show competition for nesting space that could potentially be formulated through Dirichlet process. Another application domain for our model are compositional data where species are not necessarily ecologically exclusive but due to sampling process they are so in the observations. For example, sequencing based microbial community data are compositional by nature due to the limited volume of sample even if the microbial species were not competing for space. Similarly, Juntunen et al. (2012) modeled (fixed volume) trawl catches from a fish community with multinomial distribution without assumptions for exclusive competition for space. Hence, we believe that combining the Dirichlet-Multinomial layer with the hierarchical latent Gaussian layer of contemporary JSDMs can improve their predictive and inference performance considerably in applications where interspecific competition for space is present or where sampling process induces Multinomial structure among species observations. Shirota, S., A. E. Gelfand, and S. Banerjee (2019

A Simulation study analyses
In this supplementary material we present the results for the simulation study analyses.
The model comparison results are summarized in Table A1 and the model specific inference results are given in the subsequent sections. Table A1: Model comparison results with 10-fold cross-validation using log predictive density utilities (CV 1 , . . . , CV 4 ) together with the standard error estimates (se) and the Monte Carlo error estimate (me) for the CV estimates. The best CV estimate is indicated by bold font and the CV estimates that are within one standard error from the best model are indicated by italic font.

A.1 Inference using Cov+LMC(1) model
Latent process for species j at location s i was modelled as where β 0,j is an intercept for species j, x i is (environmental) covariate value at location s i , β 1,j is a covariate effect for species j and ij is a stationary Gaussian random effect for species j at location s i . Dependencies are modeled through linear model of coregionalization with one distinct covariance function (k=1).     Figure A5: Species specific Q-Q plots with point-wise 95% confidence intervals of the PIT of location-wise cross validation predictive distributions (PIT 1 ) for Cov+LMC(1) model. Figure A6: Q-Q plots with point-wise 95% confidence intervals of the PIT of locationwise total over species (PIT 2 ) and CV-fold-wise total over species and locations predictive distributions (PIT 4 ) for Cov+LMC(1) model. Figure A7: Species specific Q-Q plots with point-wise 95% confidence intervals of the PIT of total over locations (within a CV-fold) predictive distributions (PIT 3 ) for Cov+LMC(1) model.

A.2 Inference using LMC(1) NS
Latent process for species j at location s i was modelled as where β 0,j is an intercept for species j and ij is a non-stationary Gaussian random effect for species j at location s i . Dependencies are modelled through linear model of coregionalization with one distinct covariance function (k=1).     Figure A14: Species specific Q-Q plots with point-wise 95% confidence intervals of the PIT of location-wise cross validation predictive distributions (PIT 1 ) for LMC(1) NS model. Figure A15: Q-Q plots with point-wise 95% confidence intervals of the PIT of locationwise total over species predictive distributions P IT 2 and Q-Q plots with point-wise 95% confidence intervals of the PIT of CV-foldwise total over species and locations (within a CV-fold) predictive distributions (P IT 4 ) for LMC(1) NS model. Figure A16: Species specific Q-Q plots with point-wise 95% confidence intervals of the PIT of total over locations (within a CV-fold) predictive distributions P IT 3 for LMC(1) NS model.  A.3 Inference using LM C(1) S Latent process for species j at location s i was modelled as where β 0,j is an intercept for species j and ij is a stationary Gaussian random effect for species j at location s i . Dependencies are modelled through linear model of coregionalization with one distinct covariance function (k=1).     Figure A23: Species specific Q-Q plots with point-wise 95% confidence intervals of the PIT of location-wise cross validation predictive distributions P IT 1 for LMC(1) S model. Figure A24: Q-Q plots with point-wise 95% confidence intervals of the PIT of locationwise total over species predictive distributions P IT 2 and Q-Q plots with point-wise 95% confidence intervals of the PIT of CV-foldwise total over species and locations (within a CV-fold) predictive distributions (P IT 4 ) for LMC(1) S model. Figure A25: Species specific Q-Q plots with point-wise 95% confidence intervals of the PIT of total over locations (within a CV-fold) predictive distributions P IT 3 for LMC(1) S model.

B Case study analyses
Appendix B contains supplementary results for the case study analyses. It includes the following tables and figures: 1. Figures B1-B3: DAG representations of the alternative models.
2. Figures B4-B8: Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 2 and PIT 4 for the LMC(2) S +DM model and species specific Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 1 and PIT 3 for LMC(2) S +DM and LMC(1) N S +DM models.
3. Figure B9: Estimated posterior distributions for total vegetation covers for each model. Figure B10: Posterior mean estimates of the interspecific correlations for LMC(1) N S +BB model.

Tables B1 and B2
: Species specific model comparison results with 10-fold crossvalidation using log predictive density utilities CV 1 and CV 3 . Figure B4: Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 2 and PIT 4 for the LMC(2) S +DM model separately for sphagnums, vascular plants and combined vegetation. Figure B5: Species specific Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 1 for the LMC(1) NS +DM model. Figure B6: Species specific Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 1 for the LMC(2) S +DM model. Figure B7: Species specific Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 3 for the LMC(1) NS +DM model. Figure B8: Species specific Q-Q plots with point-wise 95% confidence intervals of the randomized PIT 3 for the LMC(2) S +DM model. Figure B9: Posterior predictive distributions conditional on the full training data for the estimated combined sphagnum, vascular plants and total cover over the area for every model. Black dots show the posterior predictive means and blue squares represent the empircal average total percentage cover over the training data. Figure B10: Posterior mean estimates of the interspecific correlations in the species niche preference in the LMC(1) NS +BB model. White cells indicate that the 80% posterior credible interval of the correlation overlapped zero (i.e., weak support for interspecific correlation) and stars indicate that the 95% posterio credible interval did not overlap zero (i.e. strong support for interspecific correlation). Table B1: Species specific model comparison results with 10-fold cross-validation using log predictive density utility CV 1 together with the standard error estimates (se) and the Monte Carlo error estimate (me) for the CV estimates.
1) C+BB 2) C+DM 3a) IGP S +BB 3b) IGP NS +BB Species CV 1 (se/me) CV 1 (se/me) CV 1 (se/me) CV 1 (se/me) Table B2: Species specific model comparison results with 10-fold cross-validation using log predictive density utility CV 3 together with the standard error estimates (se) and the Monte Carlo error estimate (me) for the CV estimates.