Decomposing dark diversity affinities of species and sites using Bayesian method: What accounts for absences of species at suitable sites?

Locally observed biodiversity always consists of only a fraction of its site‐specific species pool. Why some suitable species are absent, shaping dark diversity of that site, is a basic yet increasingly crucial question in the face of global biodiversity degradation. The ultimate processes underlying dark diversity associate with either dispersal or persistence limitations, or both. These two limitations in turn link to several characteristics of individual species and sites, making it challenging to detect the exact factors contributing to dark diversity in a particular metacommunity. Here, we propose a metric, dark diversity affinity (DDA), which measures the tendencies of individual species to be absent from suitable sites and of individual sites to miss suitable species. We developed a Bayesian model interrelating four types of datasets: metacommunity matrix of species presences in sites, species‐sites suitability matrix, species functional traits and site characteristics. In the model, DDA operates as an adjustment bridging the disparity between site‐specific suitability and observed presence/absence of each species at each site. Furthermore, DDA can be related to individual properties of species and sites through logistic regression sub‐models. We demonstrated our framework using nine empirical datasets of vertebrate, invertebrate and vascular plant metacommunities. We show the decomposed roles of species traits and site characteristics in defining DDA and, therefore, dark diversity in metacommunities. In the empirical datasets, various functional traits, which related to morphology, reproduction, dispersal ability, population attributes, resource specificity and life history, significantly affected species‐level DDA, while site characteristics regarding habitat types and attributes, resource availability, pollution, and edaphic and water conditions influenced DDA at the site level. Our framework provides a concept and methodological toolbox that allows identification of the processes underlying dark diversity and advances both the theory of community ecology and biodiversity conservation. Conservation actions can be more successful by knowing whether species loss in a particular metacommunity is associated to some species traits or site characteristics and what their relative contributions are.


| INTRODUC TI ON
While biodiversity is increasingly in the focus of science and conservation, the mechanisms shaping it have largely remained elusive (Chesson, 2000;Ricklefs, 2004). In the past decades, the theory of ecological communities has proposed a number of processes underlying biodiversity patterns that operate at local to regional scales via dispersals and survivals of individual species (Vellend, 2016). As a baseline, the match between species' ecological requirements and the set of abiotic and biotic conditions of sites determines which species are suitable at which sites, setting the foundation for potential biodiversity patterns. However, only a fraction of suitable species are present in general, while the rest of them are absent at a given time, belonging to the dark diversity of that site (Pärtel et al., 2011; Box 1). Although the disparity between suitability and species occurrence can be explained partly by limitations of data resolution (Lassueur et al., 2006;Pradervand et al., 2014), species fitting well to a site are still often truly absent due to various reasons (Boussarie et al., 2018). Two of Rabinowitz's seven forms of rarity can be archetypes of such true absence, that is, taxa characterized by wide habitat specificity and small population size (Rabinowitz, 1981). Thus, studying unexpected absence may offer insights of other underlying processes that complicate a simple match between species and sites. Dark diversity connects biodiversity at a local site to the metacommunity (regional level), forming an operational link across spatial scales. Even if it cannot be measured by a field observation, there are methods to estimate the dark diversity composition (Box 1), which enhances ecologists' understanding of metacommunity processes. First, by inspecting how many suitable species are present and absent, it enables one to take account of variations in site-specific species pools and compare multiple facets of biodiversity across regions, habitat types, and taxonomic groups (Pärtel et al., 2011). Second, dark diversity can distinguish whether compositional changes are due to species 'moving' between observed and dark diversities or to changes in the regional species pool (Gaüzère & Devictor, 2021;Trindade et al., 2020), and thus, it describes biodiversity for longer time periods. In addition, dark diversity provides insights into how local communities vary in their functional properties (Morel et al., 2022) and into multiple facets of the species rarity (Riva & Mammola, 2021). All these aspects make dark diversity a crucial tool for effective biodiversity conservation (Lewis et al., 2017).
Limitations in dispersal and persistence are the two ultimate drivers of dark diversity patterns. Species might not yet have reached a suitable site, or some temporary conditions (e.g. disturbance) might have caused a local extinction even if the site is suitable for the species at a given time. Some particular functional traits of species and characteristics of sites can be related to such dispersal and persistence processes. For example, compared to species in observed diversity, plant species in dark diversity tend to have lower dispersal ability and stress tolerance, higher dependence on mycorrhizal symbiosis and are more likely to be non-clonal (Belinchón et al., 2020;Moeslund et al., 2017;Riibak et al., 2020). At the site level, Riibak et al. (2020) found that plant dark diversity was generally larger in sites with lower regional-scale habitat availability, indicating seed source limitations. Similarly, in a global-scale study, dark diversity was relatively high in grassland sites with high biomass, indicating competitive conditions during productive years (Fraser et al., 2015). In addition, edaphic properties in grassland plant communities can influence both species pool size and the chance that species from the pool stay in dark diversity (Bennett et al., 2016), and various aspects of forest structure are related to the relative size of observed and dark diversities in a range of taxonomic groups including fungi, plants, and animals (Penone et al., 2019). Finally, wilderness (low human influence) is associated with relatively low dark diversity globally in arbuscular mycorrhizal fungal communities . Thus, we conclude that dark diversity in each local site is maintained through an intimate interplay of species-and site-driven mechanisms.
To comprehensively understand why certain species are absent from suitable sites and belong to dark diversity, it is crucial to decompose effectively species-and site-driven mechanisms. Two mutually non-exclusive scenarios can account for an absence of a species from one of its suitable sites: (1) the species traits have prevented its presence from the site, regardless of the characteristics of the site, or (2) the site characteristics have prevented the species presence, regardless of species' ecological performance. For the former and the latter scenarios, we postulate an inherent 'affinity' of the species and the site, respectively, with dark diversity. Moreover, even stronger consequences can be expected if both species and site characteristics cooperate (e.g. bad dispersers in geographically isolated sites). Therefore, unless careful control is taken, there is always a chance of confounding factors shaping dark diversity at species-and site-levels. Without can be more successful by knowing whether species loss in a particular metacommunity is associated to some species traits or site characteristics and what their relative contributions are.

K E Y W O R D S
biodiversity conservation, biogeography, community ecology, dark diversity, functional trait, macroecology, metacommunity, species pool controlling this potential, we might wrongly identify certain traits or site characteristics as a key driver, even though the counterpart feature was the actual driver. This concern applies not only to theoretical understanding of ecological communities but also to biodiversity conservation practice, highlighting the need for decomposing contributions by species traits and site characteristics.

BOX 1 Concepts, metrics and their relations regarding dark diversity affinity
Metacommunity. One observation site is regarded as one community, and the aggregation of all observation sites is called the metacommunity. The metacommunity matrix consists of observation sites (rows) and species (columns), where the species are observed at least once in the metacommunity. The metacommunity matrix can be filled with either presence/absence (1/0), abundance, or cover (e.g. percentage).
Site-specific suitability (suit) represents the ecological suitability of a site to a species. Each combination of species and site has unique suit value (suit i, j for species i at site j) regardless of species presence/absence. The value of suit is essential for estimations of dark diversity and site-specific species pool ( Figure B1). In this study, suit (0 ≤ suit ≤1) was estimated based on the species cooccurrence patterns observed over each metacommunity dataset using r package 'DarkDiv' (Carmona & Pärtel, 2021).
Dark diversity is the facet of biodiversity representing the amount of species suitable yet absent from a focal location. It is possible to estimate the size of dark diversity by summing suit values of absent species ( Figure B1). Species absent from the focal region and metacommunity is not included in dark diversity. In previous studies, dark diversity was sometimes calculated counting absent species with suit higher than a certain threshold (as a discrete count, not as the sum of continuous probabilities). The site-specific species pool ( Figure B1) consists of the observed diversity and the dark diversity.
Dark Diversity Affinity (DDA) is a conceptual metric (0 ≤ DDA ≤1) proposed by this paper. DDA estimates the degree to which a species at a site is likely to be absent (DDA > 0.5) or present (DDA < 0.5) in observations for a given level of suitability (suit). Numerically Species-site unified model. A Bayesian statistical model developed by this paper to simultaneously estimate dda sp and dda site . Outputs of the modelling, in particular, the regression coefficients of species (b sp ) and site (b site ) sub-models, offer unique insights into which species traits and site characteristics are responsible for their dda, and further, what mechanisms are shaping dark diversity patterns across the metacommunity (Figure 1).

FIGURE B1
Calculation of observed and dark diversities and species pool in a hypothetical community. An example plant community (therefore of one site) with two observed (Sp. 1, 2) and three absent species  having different values of suit. Absent species are those that are not observed at a focal site but in the metacommunity. Dark diversity affinity (DDA) represents the tendency of species being in absence and presence for a given suitability.
Here, we develop an analytical framework to quantify and decompose species and site affinities with dark diversity and link them to various species traits and site characteristics using Bayesian methods.
Previously dark diversity studies have only considered correlations between species traits and their frequencies being in dark diversity, or between characteristics of individual sites and their dark diversities, to interpret the underlying processes shaping biodiversity patterns. We further step forward to postulate the conceptual affinity of species and sites with dark diversity, by which we infer mechanisms responsible for dark diversity patterns (Box 1). Our framework can apply to metacommunity datasets (presence/absence in sites × species matrix) with functional traits of species and characteristics of sites. We demonstrate the framework's applicability using nine empirical datasets across a range of organisms compiled in the CESTES database (Jeliazkov et al., 2020; see the Section 2.4). We examine the decomposed roles of species functional traits (e.g. morphology, reproduction, and life history) and site environmental characteristics (e.g. habitat age and size, resource availability, pollution and edaphic conditions) in determining dark diversity. In addition, we verify the significance of the species-site decomposition by comparing the results of the unified model with two separate models of species and site. We further provide R and JAGS scripts of the unified model corresponding to one of the CESTES datasets (Data S1).

| Dark diversity affinity
In order to express the affinities at the species and site levels in the same scale, we propose a metric, dark diversity affinity, to measure the inherent tendency of species and sites increasing dark diversity.
Given the interplay of the species-and site-driven mechanisms, we assume that DDA is decomposed into components accounting for individual species and sites (dda sp and dda site , respectively). If a species is absent from its unsuitable site, there is no potential for DDA to be involved. But if a species is frequently absent from suitable sites, then the species is supposed to have a high dda sp . In contrast, if a species tends to be present even at sites with limited suitabilities, the species is assigned a low dda sp . At the site level, dda site measures the tendency of a site losing suitable species to dark diversity.
Values of dda sp and dda site depend on their individual species and site characteristics. The metrics, dda sp and dda site , are unified to serve as site-specific DDA for presence/absence of a given species (we call it unified DDA; hereafter DDA in italics). To summarize, dark diversity is a measure of the amount of suitable but absent species at a given location, while DDA is the tendency for the species' presence/absence in a site for a given suitability (see Box 1).

F I G U R E 1
Structure of the species-site unified model for dark diversity affinity (DDA) with a hypothetical dataset. Dark diversity affinity (tendency of species and sites increasing dark diversity) is estimated by a Bayesian model consisting of three operational sections (a-c). Observed and external variables (prab, suit, δ, and species and site characteristics) are represented by rectangle boxes, while parameters estimated in the unified model (p, DDA, dda sp and dda site ) are in round boxes. The red-blue colour gradient in matrices indicates the gradient of presence-absence expectation. The logistic regression sub-models are shown up to the third parameter for both species and site as an example (a), where the first and the second parameters (b 1 and b 2 ) are regression coefficients for numerical variables (x 1 and x 2 ), and the third is the class-specific intercept for categorical variables (x 3 ). Explanatory variables (x 1 , x 2 , and x 3 ) are species traits for dda sp and environmental characteristics for dda site . The structure of the sub-models can be modified flexibly to fit users' assumptions and datasets. Matrix multiplication in (b) between [1 -DDA] and suit operates cell by cell. See Text S1 for more detail of the model and codes.

| Species-site unified model
To model DDA, we employed a Bayesian modelling approach to cope with the following two challenges in one model: (1) fitting and unifying two logistic regression sub-models at species and site levels and (2)  where it operates specifically for species i at site j. The logit function restricts DDA, dda sp and dda site in the range of 0 to 1.
In the second section of the model (Figure 1b), the estimated DDA adjusts the site-specific suitability (suit) by downscaling it, which in turn predicts the presence likelihood p as follows: where p i, j and DDA i, j are individual estimates for species i at site j, and suit i, j is provided as an external estimate (in the current study we used r package 'DarkDiv' by Carmona and Pärtel (2021); see the Section 2.4).
The constant parameter δ is unique to a focal metacommunity, which lifts p up/down to the level of the observed presence rate. Therefore, although suit is scaled down by DDA, a positive δ allows p > suit (e.g. p and suit for species 2 at site n in Figure 1; see also Figure S1). Purposefully, δ is placed out of the logit function to prevent p breaching the 0-1 range.
To obtain δ for each focal metacommunity, the observed presence/ absence (prab) is averaged (presence rate, prab ), and we substitute prab and the average suit (suit) for p and suit in Equation (2), while keeping DDA = 0.5: Consequently, DDA = 0.5 serves as the neutral point (i.e. no pressure nor prevention to be absent) at a given suit. Thus, DDA > 0.5 and DDA < 0.5 (for both DDA and dda) expect species to be more absent and present, respectively, from sites with a given suitability (see Figure S1 for the numerical relations among parameters in Equation 2).
Finally, the third section links presence/absence observations (prab) to the predicted p assuming a Bernoulli distribution (Figure 1c), which is a special case of the binomial distribution with size = 1.
This section absorbs the residual between prab and p, which in turn guides the sampling of estimates. Although presence/absence serves as a response variable, the main goal of the model is to estimate DDAs, which mediate the disparity between prab and suit, and ultimately to link species and sites characteristics to unexpected prab at a given suit.

| Model fitting by Bayesian method
We applied Bayesian method to estimate the parameters of the species-site unified model: a, b, dda and DDA. As a and b are coefficients of the logistic regressions (Figure 1a), uninformative prior distribution would cause a 'separation problem' (Gelman et al., 2008;Lemoine, 2019), which segregates the distribution into the two extreme sides in the logit scale (0 or 1). Instead, we assigned weakly informative prior distributions to make the prior distributions nearly flat in the logit scale. Namely, we applied the distribution of t family, in particular, Cauchy, a special case of t distribution with one degree of freedom, so that its flat-tailed distribution gives the inference a certain potential to select values far from the centre. For the scale parameter, which determines the variance of Cauchy distribution, we set 0.5 and 2.5 as a default to a and b parameters, respectively. As δ adjusts the average dda to 0.5 (i.e. 0 in the linear scale), we limited the prior distribution of the intercept a with a relatively small variance, and this allows clear segregation of a and b roles in the submodels. For b parameter, we followed Gelman et al. (2008), which leads to a relatively conservative inference by restricting the prior distribution towards the centre in the logit scale. The mean of all the prior distributions was set to zero. Before fitting the model, we standardized numerical predictor variables to mean = 0 and sd = 0.5 in accordance with the scale parameter setting (Gelman et al., 2008). We fitted the model using Gibbs sampler JAGS 4.3.0 (Plummer, 2003) via rjags package (Plummer et al., 2016) in R version 4.1 (R Development Core Team, 2021). Estimation convergence was verified based on the Gelman-Rubin statistic R-hat ≤ 1. Rubin, 1992). Although a combination of a small metacommunity matrix and large datasets of species and site characteristics might cause a convergence problem, the unified model structure technically allows any sizes of input and output data without overdetermination (i.e. input < output). We fixed the number of posterior samples as 333 from each of three MCMC chains (999 samples in total). In Text S1, we further explain the detail of sampling configurations and remedies for bad convergence, and Data S1 provides examples of R and JAGS scripts.

| CESTES metacommunity datasets
To demonstrate the performance of our framework, we applied it to empirical metacommunity datasets compiled in 'metaCommunity Ecology: Species, Traits, Environment and Space (CESTES)' by Jeliazkov et al. (2020). For three taxonomic groups (vertebrates, invertebrates, and vascular plants), we selected datasets from either natural or semi-natural systems. We further excluded datasets with less than four variables of species traits or site characteristics.
Finally, we selected the largest three datasets in terms of the size of metacommunity matrix for each of the three taxonomic groups. We obtained nine CESTES datasets in total from terrestrial to aquatic and from tropical to boreal systems (see Text S2 for the details of selected datasets). We show results of one representative dataset with the largest metacommunity matrix from each taxonomic group, and the rest of the results for other datasets are in Text S3.

| Estimation of site-specific suitability
As long as the site-specific suitability takes the range of 0-1 or is standardized to this range, any suitability estimations can be applied to our modelling framework, such as species distribution modelling (Guisan et al., 2017) and unconstrained ordination analyses (Brown et al., 2019), and the definition of DDA will follow the estimation method. In this demonstration, we applied a co-occurrence based method developed by Carmona and Pärtel (2021) (r package 'DarkDiv'), which gives probabilistic suitability estimates. Based on the assumption that frequently co-occurring species share their ecological requirements, this method analyses pairwise patterns of species co-occurrence by calculating a standardized deviation from a null expectation (hypergeometric distribution), avoiding the estimates affected by species regional frequency.
This pairwise association is used as an indication matrix, where present species can indicate the site-specific suitability for all other species, and suitability indications are averaged across all present species.

| Selection of explanatory variables
We selected and prepared explanatory variables for species and site sub-models (Figure 1a) from each CESTES dataset to make the unified model both ecologically sensible (e.g. excluding irrelevant variables) and analytically feasible (e.g. avoiding multicollinearity).
For datasets with many variables, we also used principal component analysis (PCA) to reduce the dimension of the datasets (Figures S2-S4). Further details regarding the explanatory variables and data preparations are explained in Text S2.

| Testing the importance of DDA decomposition between species and site
To determine if there is a difference in parameter estimation when accounting for the interplay of species and sites compared to models that only focus on one or the other, we developed two single-domain models, one for species and one for sites. In each model, we used DDA = dda instead of Equation (1)

| RE SULTS
By fitting the species-site unified model to the nine contrasting metacommunity datasets (Text S2; Table S2), we explored different mechanisms which assist or hinder species with specific traits to be in dark diversity of sites with specific conditions. Here, we only present the results from the largest datasets of each taxonomic group (CESTES ID 51, vertebrates; ID 5, invertebrates; and ID 12, vascular plants), and the rest of the results are provided in Text S3. distributions between presence and absence (I vs. III and II vs. IV in Figure 2) show that all the present subsets obtained lower DDA values than the corresponding absent subsets on average, and thus p (presence likelihood) attained higher values for the present subsets than for the absent subsets. These patterns were consistent for all the nine datasets ( Figure 2; Figure S5). Decomposed dda sp and dda site depicted further details of what accounted for the DDA variability among different subsets. Across the nine datasets, the response of DDA to presence/absence was primarily regulated by dda sp , while dda site was less sensitive at the subset level (Figure 2; Figure S5). The difference between suit and the actual presence rate was balanced by the constant parameter, δ. For five datasets out of the nine, presence rates were higher than 0.5 × suit on average (Table S2), and δ was positive, while in the other four datasets, δ was negative (Figure 2; Figure S5).

| Species and site characteristics determining individual DDA
Although dda site was less responsive than dda sp to presence/absence at the subset level (Figure 2; Figure S5), seven out of the nine datasets had at least one regression coefficient (b -.site ) of dda site with significant effects (Figure 3; Figure S6). At the species level, all the nine datasets had more than one b -.sp with significant association with dda sp . This result, that both species and site characteristics regulate DDA via their own dda, corroborates the assumption that mechanisms shaping dark diversity operate at both species and site levels.
While significant drivers of dda varied among study systems, there were certain agreements within and across taxonomic groups. F I G U R E 2 Estimated parameter distributions of the species-site unified model in four subsets of site-specific suitabilities × presence/ absence of species for three CESTES metacommunity datasets (ID 51,5,and 12). The sites × species metacommunity matrix was divided into two classes by the site-specific suitability (suit) below/ above 0.5 (left and right panels) and then by presence/absence (upper and lower panels) to obtain four subsets of the metacommunity matrix elements. For each subset, the percentiles of model parameters were drawn in whiskers Vertebrates (birds in New Zealand forests, Figure S6A; bats in Central Amazonian rainforests, Figure S6B; and fishes of tropical estuaries in Mexico, Figure 3a) tended to have particular foraging behaviours (canopy and understorey gleaner for birds, Figure S6A) or trophic level (phytophagous bats, Figure S6B) that were prone to being in dark diversity. Morphological properties including body mass also significantly associated with dda sp , although body mass had opposite effects. That is, the larger mass, the more in dark diversity for birds ( Figure S6A), but the less in dark diversity for bats ( Figure S6B). Another consistent pattern among the vertebrates was that site characteristics showed limited associations with dda site , having only one significant b -.site for each dataset.
Among invertebrates, butterflies showed a partial agreement between two different datasets from Czech nature reserves ( Figure 3b; Figure S6D). At the species level, trophic specialization had an agreement, where species with narrower trophic range belonged to more dark diversity. At the site level, the following variables showed agreements between the two datasets: habitat area (the larger habitat area, the higher dda site ) and edge characteristics (the larger perimeter/area ratio and the lower edge permeability, the higher dda site ), and the number of habitat types (the more habitat types, the higher dda site ). Another invertebrate dataset, freshwater macro-invertebrates (ID 6 from NE Spain, Figure S6C), also showed significant associations of different food types and feeding habits with dda sp . Although the two datasets of butterflies showed many significant site-level variables, this was not the case for macro-invertebrates.
Among vascular plant datasets (Polish montane forests, Figure 3c; Sphagnum peatlands across Europe, Figure S6E; and inland salt-/fresh-water marsh, Figure S6F), consistently shorter species were more likely to belong to dark diversity (negative b -.sp ). Smaller specific leaf area (SLA) also consistently showed negative b -sp for Polish forests and European peatlands (the marsh system did not have SLA measurements). The effect of life span on dda sp differed between stable forest understories and marshes with frequent flood disturbances, where relatively stable forests and unstable marshes tended to fall short and long lifespan species in dark diversity, respectively (annual/biennial and herb perennial in Figure 3c and perennial in Figure S6F). In the vascular plant datasets, there was no common site-level parameter, except altitude where higher latitude sites tended to fall more species in dark diversity in European peatland ( Figure S6E) but no such pattern in Polish montane forest ( Figure 3c). Several parameters showed significant effects on dda site of Polish forests (negative b -.site for bryophyte coverage, decomposition level of deadwood substrate and habitat area, and positive b 5.site for shade level in Figure 3c) and for European peatlands (positive b -.site for altitude, mean annual temperature and atmospheric SOx deposition in Figure S6E). In contrast, the marsh system did not have any site-level parameter being significant. All the estimated individual dda sp and dda site are listed in Figure S7.

| Difference between single-and dualdomain models
In addition to the species-site unified model (dual-domain), we examined whether single-domain models of species and sites could yield similar estimates of regression coefficients as the unified model. Based on a 95% credible interval, we found 28 out of 248 regression coefficients (11.3%) across all the nine datasets showing inconsistent significances ( Figure S8). This suggests that by accounting or not accounting the interplay of species and site, the two ap-

| DISCUSS ION
The disparity between suitability and actual presence/absence had been long depreciated as unexplained deviance before the concept of dark diversity was proposed as another potential facet of local biodiversity (Pärtel et al., 2011). Here, we advance the field by providing a framework with a new conceptual metric, dark diversity affinity (DDA), which can delineate two novel ecological insights: the drivers shaping dark diversity, which explicitly account for independent influences by species and sites ( Figure 3; Figure S6) and inherent tendencies of individual species and sites to increase dark diversity ( Figure S7). Our demonstration illustrates that DDA is not just an arithmetic apparatus filling the suitability-occurrence disparity, but it can link to ecological features governed by both species traits and site characteristics.
From the nine empirical datasets, we have already witnessed the F I G U R E 3 Posterior distributions of logistic regression coefficients for dda sp and dda site in the species-site unified model for three CESTES metacommunity datasets (ID 51, 5 and 12). The left and right columns show estimated parameters for species and site explanatory variables, respectively. Parameters with more than one density distribution (e.g. b 11.sp of panel B) are of categorical variables where multiple class-specific intercepts were estimated. Density distribution is coloured based on a 95% credible interval (0.025-0.975): black if significantly positive (higher tendency for dark diversity), white if significantly negative (lower tendency for dark diversity) and grey for those encompassing zero. Before the model fitting, all numerical explanatory variables were standardized to mean = 0 and sd = 0.5. first examples of how species and sites engaged in unexpected presence/absence. The result, that both species and sites could independently shape dark diversity patterns, emphasizes the importance of decomposition between species' and sites' influences.
Such analytical decomposition can reduce the potential of confounding factors between them (e.g. plant species height and vegetation height at habitat level). DIC comparison also supported the additional complexity of the unified model compared to most of species-and site-only models ( Table S3). macro-invertebrates of ID 6). Dark diversity patterns of those systems with limited significant parameters might be primarily shaped by stochastic processes or the constructed sub-models might be failing to provide relevant variables for such processes.

| Uniqueness and extension potentials of the framework
Previous approaches similar to ours, that explored relationships between species and sites with their properties, mainly aimed to better understand and predict species distributions from both community and environmental perspectives, such as those by the fourth corner analyses (Brown et al., 2014;Legendre et al., 1997), joint species distribution modelling (Clark et al., 2014;Ovaskainen et al., 2016) and their descendants. These approaches inspect how functional traits associate with species occurrence patterns in the context of environmental conditions (Jamil et al., 2013) and biotic interactions shaping biodiversity patterns (Warton et al., 2015). In contrast, the primary focus of our approach is on the dark diversity mechanisms and patterns.
The species-site unified model can be flexibly modified to fit a research question and dataset, as we built single-domain models.
Although the model comparison between the single-domain and the unified (dual-domain) model mostly supported our decomposition approach (Table S3; Figure S8), such simpler models might also be justified under specific conditions (e.g. very similar taxa or sites, or just a lack of trait or environmental data). In contrast, it is also possible to add more domains that independently associate with the unified DDA from other domains (e.g. characteristics of communities of mutualists and parasites, for example, see Krasnov et al., 2022).
The DDA concept also allows us to test the 'false absence' effect, which has challenged dark diversity studies by asking whether a species is truly 'absent' or just 'undetected' (Boussarie et al., 2018).
In particular, by adding explanatory variables related to species detection rates (e.g. crypticness, body size, or mobility; Garrard et al., 2013;Roth et al., 2018) and difficulties of site conditions for a survey (e.g. vegetation density, survey efforts, or observer performance; Chen et al., 2009;Lardner et al., 2019) in the regression sub-models (Figure 1a), our approach can indicate when dark diversity estimates might be affected by sampling bias. In such a way, the technique can improve biodiversity surveys and monitoring schemes.

| Contribution to ecological research
Analyses using a more relevant and comprehensive set of variables and sampling designs for dark diversity (e.g. DarkDivNet dataset by Pärtel et al., 2019) will discover more underlying mechanisms of dark diversity. Overall, our framework is expected to open new windows not only to the theory of community ecology but also to biodiversity conservation. By identifying accountable species traits and environmental conditions, conservation and restoration planning can be formulated more effectively, which informs if a conservation focus should be on population management (e.g. assisted dispersal, relaxing of competition), site management (e.g. prevention of forest fires, landscape restoration for better connectivity) or both. Furthermore, dda sp can be a proxy for local extinction risks of species, and dda site can be used as a metric for spatial conservation prioritization to identify conservation hotspots. Specifically, the model output regarding

ACK N O WLE D G E M ENTS
We thank Carlos P. Carmona for valuable comments. The work has been supported by the European Regional Development Fund (Centre of Excellence EcolChange) and the Estonian Research Council (MOBJD598, PRG609, PRG293).

CO N FLI C T O F I NTE R E S T S TATE M E NT
We declare that there is no conflict of interest to disclose.

PEER R E V I E W
The peer review history for this article is available at https: