#### Conceptual description of CATS

The process of community assembly that is conceptualized by the CATS model (Shipley et al. 2006, 2012; Shipley 2009, 2010a) is an extension of Keddy's (1992) notion of trait-based community assembly. The model considers vegetation existing at two spatial scales (local and landscape). A ‘local’ plant community consists of those plants found in an area that is sufficiently small in spatial scale such that there are no pronounced environmental gradients occurring within it. For herbaceous vegetation, this might be less than a few square metres while for trees this might be less than a hectare. The meta-community consists of the ensemble of local communities in the landscape that can potentially exchange propagules. Vegetation at this larger spatial scale will experience different environmental conditions. The list of the S species that occur in the meta-community, i.e. that can potentially disperse into the local community and which can survive the abiotic conditions (but not necessarily the biotic conditions) of the local community, is the species ‘pool’ for this local community. Thus a local community is nested within the meta-community. A species can potentially be rare (or absent) from a local community while common in the meta-community. Similarly, a species can be rare (but not absent) from the meta-community but common in the local community.

To separate the trait-based processes within the local community, which confer different adaptive advantages in the local environment, from processes in the larger landscape that affect the differential influx of propagules of different species, we first assume a model in which the *per capita* probabilities associated with all demographic rates (dispersal, germination, survival, reproduction) in a given local community are equal across species. Under this model, any differences in functional traits between species in this local community are independent of relative abundances. The expected proportion of propagules of each species immigrating from the meta-community to the local community is then equal to the relative abundance of that species in the meta-community (‘dispersal mass effect’). Since all subsequent demographic rates in the local community are equal across species, the expected relative abundance of each species in the local community is also equal to its meta-community relative abundance, and any deviations from this expectation in a local community are due solely to demographic stochasticity. This assumption is similar to those of neutral models (Bell 2000; Hubbell 2001), except that neutrality is not required in the larger meta-community.
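The locally neutral expectation can be illustrated with a minimal Python sketch. It assumes that each immigrant is an independent draw from the meta-community relative abundances; the function name `neutral_local_sample` and the abundance values are hypothetical and serve only as an illustration.

```python
import random

def neutral_local_sample(meta_abundance, n_individuals, seed=None):
    """Draw a local community under local neutrality (illustrative assumption):
    every individual is an independent draw from the meta-community
    relative abundances, with no trait-based selection."""
    rng = random.Random(seed)
    species = range(len(meta_abundance))
    draws = rng.choices(species, weights=meta_abundance, k=n_individuals)
    counts = [0] * len(meta_abundance)
    for s in draws:
        counts[s] += 1
    # Convert counts to local relative abundances
    return [c / n_individuals for c in counts]

meta = [0.5, 0.3, 0.15, 0.05]  # hypothetical meta-community abundances
small = neutral_local_sample(meta, 50, seed=1)      # noticeable stochastic deviation
large = neutral_local_sample(meta, 50_000, seed=1)  # close to the meta-community values
```

With few individuals the local relative abundances scatter around the meta-community values; with many individuals they converge on them, which is the sense in which deviations are due only to demographic stochasticity.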

Now assume a model in which some functional traits do affect dispersal ability and the subsequent probabilities of germination, survival, growth and reproduction in the local community. Under this assumption, species having traits that increase or decrease their dispersal ability will increase or decrease in abundance in a local community relative to the neutral expectation, because more or fewer propagules of such species will arrive in the local community than expected given their abundance in the meta-community. Once propagules reach the local community, individuals having traits better adapted to the local environment will have higher probabilities of survival, growth and reproduction. Species possessing such better-adapted individuals therefore increase in abundance relative to the neutral expectation, while species possessing poorly adapted individuals decrease. The compositional structure of this local community is therefore determined both by the influx of immigrants from the meta-community and by trait-based selection from the local environment.

If all individuals have the same probabilities of survival, growth and reproduction in the local community (i.e. if there is no trait-based local selection), and if all individuals in the meta-community have the same probabilities of immigration to the local community, then the structure of the local community will be the same as the structure of the meta-community plus random variation due to demographic stochasticity. Apart from demographic stochasticity, the only determinant of the local relative abundance of any given species is then its abundance in the meta-community (i.e. dispersal mass effects). The vector of relative abundances for each of the S species in the meta-community is therefore called the ‘meta-community prior’ distribution; in previous publications this was called the ‘neutral’ prior (Shipley 2010a; Sonnier et al. 2010; Shipley et al. 2011, 2012). At the other extreme, if the probabilities of immigration, survival, growth and reproduction in the local community are entirely determined by trait-based selection, then the structure of the meta-community will be irrelevant to the local structure. This is because each species has, by definition, a non-zero probability of immigrating to the local community and of subsequently surviving the local abiotic conditions (otherwise it would not be part of the species pool). Once a species is present in the local community, its subsequent survival, growth and reproduction are entirely determined by the traits of its individuals. Finally, local communities that are affected both by dispersal mass effects via the meta-community and by local trait-based selection will be located along a continuum between these two boundaries.

The processes generating patterns in a local community and in the meta-community are partially overlapping but they are not the same. Explicitly modelling such processes at both scales and linking the two via immigration is very difficult without making unrealistic assumptions about the relative importance of these processes *a priori*, which is self-defeating when the purpose is to empirically measure them. Instead, the meta-community pattern is measured, not modelled, and is treated as prior information. I then model the local community by assuming (as endpoints along a continuum) that this prior information is either irrelevant (because local trait-based selection is dominant) or is the only relevant information available (because local trait-based selection is absent).

#### Mathematical description of CATS

This conceptual model is translated into the quantitative CATS model based on the maximum entropy formalism of Jaynes (2003), as described in detail in Shipley (2010a). Here, I briefly outline its main points. The CATS model has three inputs: (i) a trait matrix, **T** = {t_{ij}}, of the j = 1, …, n chosen functional traits of each of the i = 1, …, S species known to occur in the species pool of the meta-community (these trait values can either be species averages or measured directly in the local community), (ii) a vector of *n* community-weighted trait values, $\bar{t}_{j} = \sum_{i=1}^{S} o_{i} t_{ij}$, estimating the trait values of an average individual in the local community, and (iii) a prior probability distribution, **q** = {q_{1},…,q_{S}}, specifying the hypothesized contribution of the meta-community in determining the structure of the local community. Here, o_{i} is the observed relative abundance of species i in the local community.
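The second input, the vector of community-weighted trait values, is the abundance-weighted mean of each trait column. A minimal Python sketch follows; the function name and the small trait matrix are hypothetical and chosen only for illustration.

```python
def community_weighted_means(traits, abundances):
    """Community-weighted mean of each trait.

    traits     : S x n nested list, traits[i][j] = value of trait j for species i
    abundances : length-S list of observed relative abundances (summing to 1)
    """
    n_traits = len(traits[0])
    return [
        sum(o_i * row[j] for o_i, row in zip(abundances, traits))
        for j in range(n_traits)
    ]

# Hypothetical example: 3 species, 2 traits
T = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]
o = [0.2, 0.3, 0.5]
cwm = community_weighted_means(T, o)
```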

The classical model *R*^{2}, which is applicable to a multiple linear regression with a normal error structure, measures the proportional reduction in the model sum of squares (the model deviance given a normal error structure) due to the chosen regressors relative to an ‘intercept-only’ baseline model. This intercept-only baseline model is simply the mean value of the dependent variable: $\hat{y}_{i} = \bar{y}$. In the context of Eq. (3), the proportion of the total deviance encoded in the observed relative abundances (o_{i}) that is accounted for by the model is measured by the Kullback–Leibler index (R^{2}_{KL}, Eq. (4)). This index is a generalization of the classic R^{2} index for maximum likelihood estimation of a non-linear regression with a multinomial error structure (Cameron & Windmeijer 1997), which is formally equivalent to the maximum entropy solution in our case (Shipley et al. 2012, supplement). In the context of Eq. (3), the equivalent baseline model is **p** = **q**_{0}, where **q**_{0} is the maximally uninformative uniform prior. This baseline model is the expectation when there is no contribution from either the meta-community or from trait effects. The Kullback–Leibler index involves the ratio of two Kullback–Leibler divergences. The Kullback–Leibler divergence, D_{KL}(**o**||**m**) = Σ_{i} o_{i} ln(o_{i}/m_{i}), measures the amount of information lost when approximating the observed distribution of relative abundances, **o**, by another distribution **m** that has been obtained from some model; the larger the value of D_{KL}(**o**||**m**), the more poorly **m** approximates **o**. Since **q**_{0} (the uniform distribution) is the prior distribution that encodes the least possible information, and which allocates abundances to one of S mutually exclusive and unordered states (i.e. species in the species pool), any model producing **p** that predicts **o** better than does **q**_{0} will yield R^{2}_{KL} > 0; in this case the model is based on some correct information. A model producing **p** that perfectly predicts **o** will yield R^{2}_{KL} = 1; in this case the model is based not only on correct, but also complete information. If R^{2}_{KL} < 0, then the model producing **p** predicts **o** even worse than one using the minimum amount of true information (i.e. **q**_{0}); in this case the model is actually based on false information. The inclusion of regressors (i.e. the species traits in our case) can only improve the model fit relative to the baseline model (non-significantly so if the predictors are actually independent of the response variable), but never decrease it. However, the inclusion of priors other than **q**_{0} can decrease model fit relative to the baseline model if the information encoded in such priors is incorrect.

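$$R^{2}_{KL} = 1 - \frac{D_{KL}(\mathbf{o}\,\|\,\mathbf{p})}{D_{KL}(\mathbf{o}\,\|\,\mathbf{q}_{0})} \qquad (4)$$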
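The behaviour of the Kullback–Leibler index described above can be checked with a short Python sketch. It assumes that the index is one minus the ratio of the divergence of the model prediction from the observations to the divergence of the uniform baseline from the observations, which is what the verbal description implies. The function names and the example vectors are hypothetical.

```python
import math

def kl_divergence(o, m):
    """D_KL(o || m) = sum_i o_i * ln(o_i / m_i); terms with o_i = 0 contribute 0."""
    return sum(o_i * math.log(o_i / m_i) for o_i, m_i in zip(o, m) if o_i > 0)

def r2_kl(o, p):
    """Kullback-Leibler index of prediction p relative to the uniform baseline q0.
    Undefined (division by zero) if o is itself exactly uniform."""
    q0 = [1.0 / len(o)] * len(o)
    return 1.0 - kl_divergence(o, p) / kl_divergence(o, q0)

o = [0.6, 0.3, 0.1]                 # hypothetical observed relative abundances
perfect = r2_kl(o, o)               # complete information: index of 1
baseline = r2_kl(o, [1 / 3] * 3)    # uniform prediction: index of 0
misleading = r2_kl(o, [0.1, 0.3, 0.6])  # reversed ranking: negative index
```

The three cases correspond to the three situations in the text: a perfect prediction, a prediction no better than the uniform baseline, and a prediction based on false information.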

The decomposition of causes requires the fitting of data to the CATS model four times given different assumptions (Fig. 1); this can be done via the maxent and maxent.test functions of the FD library of R (R Foundation for Statistical Computing, Vienna, AT). The first model involves specifying a maximally uninformative (i.e. uniform) prior distribution, i.e. q_{i} = 1/S for Eq. (3), and random permutation of the trait vectors among species. This forces the traits to be independent of the observed relative abundances due to the random permutations, while also ignoring any contribution from the meta-community. The distribution of values of the resulting statistic is obtained from many independent runs of the permuted trait vectors and is an estimate of the average value of fit under this null hypothesis . This null distribution is used in the inferential test of significance of traits as described in Shipley (2010b). Since this estimate contains the minimum possible information from the prior, and no information from traits, it measures the fit due solely to model bias in the same way that the expected value of the classic model R^{2} under the null hypothesis is used to correct for model bias in a regression context. In the context of a classical multiple linear regression, the expected value of this model bias is known and is a function of the model degrees of freedom (df), and thus, the number of predictor variables, relative to the residual *df*, thus the total number of observations (Fisher 1925a). In the more general context of this paper, the analytic formula is not known and so it is estimated by permutation methods.
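The permutation estimate of model bias can be sketched in Python for the single-trait case. This is not the `maxent.test` implementation of the FD library. It assumes that Eq. (3), which is not reproduced in this section, has the exponential form p_i ∝ q_i exp(λ t_i), and that each permutation recomputes the community-weighted mean from the shuffled trait values. It also replaces the Improved Iterative Scaling algorithm with simple bisection. All function names and the example abundances are hypothetical.

```python
import math
import random

def kl_divergence(o, m):
    return sum(o_i * math.log(o_i / m_i) for o_i, m_i in zip(o, m) if o_i > 0)

def r2_kl(o, p):
    q0 = [1.0 / len(o)] * len(o)
    return 1.0 - kl_divergence(o, p) / kl_divergence(o, q0)

def fit_single_trait(t, q, target, lo=-50.0, hi=50.0, iters=200):
    """Find p_i proportional to q_i * exp(lam * t_i) whose mean trait equals target.
    The mean trait increases monotonically with lam, so bisection suffices."""
    def predict(lam):
        z = [lam * t_i for t_i in t]
        z_max = max(z)  # subtract the maximum exponent for numerical stability
        w = [q_i * math.exp(z_i - z_max) for q_i, z_i in zip(q, z)]
        total = sum(w)
        return [w_i / total for w_i in w]
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mean = sum(p_i * t_i for p_i, t_i in zip(predict(mid), t))
        if mean < target:
            lo = mid
        else:
            hi = mid
    return predict((lo + hi) / 2.0)

def null_r2(o, t, n_perm=200, seed=0):
    """Average fit with a uniform prior and trait values permuted among species."""
    rng = random.Random(seed)
    uniform = [1.0 / len(o)] * len(o)
    values = []
    for _ in range(n_perm):
        t_perm = t[:]
        rng.shuffle(t_perm)  # break any trait-abundance association
        target = sum(o_i * t_i for o_i, t_i in zip(o, t_perm))
        values.append(r2_kl(o, fit_single_trait(t_perm, uniform, target)))
    return sum(values) / len(values)

# Hypothetical uneven local community of ten species with trait t_i = i
o = [0.30, 0.22, 0.15, 0.10, 0.08, 0.06, 0.04, 0.03, 0.01, 0.01]
t = [float(i) for i in range(1, 11)]
bias = null_r2(o, t)
```

The returned average is greater than zero even though the permuted traits carry no information, which is the model bias that the text compares to the expected value of the classic R^{2} under the null hypothesis.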

The third model involves specifying the meta-community prior but again randomly permuting the trait vectors between species, as in the first model, in order to measure the degree to which the meta-community abundance structure resembles the local abundance structure. This involves fitting a model to Eq. (3) in which q is the measured vector of meta-community relative abundances, but the traits are forced to be independent of the observed relative abundances due to the random permutations. When fitted using many independent runs of the permuted trait vectors and averaging, one obtains . However this step, as originally described in Shipley et al. (2012), can also result in negative values of if not properly modified. Negative values of occur when the traits are, in fact, associated with the relative abundances but the direction of the association is in opposite directions in the local and meta-communities. By permuting the traits relative to the observed local relative abundances one is breaking any association between traits and local relative abundances. However, such permutations do not break any association between traits and meta-community relative abundances. Negative values of therefore occur because species with certain trait values cause them to have higher than average relative abundances in the meta-community, but these same trait values result in lower than average relative abundances in the local community. To avoid this nonsensical result we therefore modify our definition to .

The final model involves both using the meta-community prior and using the observed trait vectors between species. This includes the contribution both from the meta-community prior and from the traits. Fitting this model yields . Again, contributions from the meta-community are either irrelevant, given the traits, or else they improve the fit. Because of this, I define .

Given these four steps, we have two measures each of the contributions of the traits and dispersal mass effects via the meta-community. The increase in the explained deviance due to traits can be measured either by . The first relation measures the increase in the explained deviance due to traits beyond that due solely to model bias, while the second relation measures the increase in explained deviance due to traits beyond that due to contributions (if any) made by the meta-community. This model bias is exactly equivalent to the model bias in the classic model R^{2} statistic of linear regression, in which the expected value of the classic model R^{2} is greater than zero even given independence between the dependent and predictor variables in any finite sample (Fisher 1925a,b). The increase in explained deviance due to dispersal mass effects via the meta-community can be measured by either . The first relation measures the increase in the explained deviance (if any) due to the meta-community beyond that due to model bias, while the second measures the increase in the explained deviance due to the meta-community, given the traits, relative to the explained deviance due only to the traits.
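The four verbal relations above can be written as simple differences between the fits of the four models. The Python sketch below assumes that the four fits form a two-by-two design (uniform or meta-community prior, crossed with permuted or observed traits). The text implies this design but this section does not spell it out for every model. The argument names, dictionary keys and input numbers are hypothetical, and the sketch omits the modified definitions that prevent negative contributions.

```python
def trait_and_meta_increments(r2_uniform_null, r2_uniform_traits,
                              r2_meta_null, r2_meta_traits):
    """Differences in explained deviance between pairs of model fits.

    r2_uniform_null   : uniform prior, permuted traits (model bias only)
    r2_uniform_traits : uniform prior, observed traits
    r2_meta_null      : meta-community prior, permuted traits
    r2_meta_traits    : meta-community prior, observed traits
    """
    return {
        "traits_beyond_bias": r2_uniform_traits - r2_uniform_null,
        "traits_beyond_meta": r2_meta_traits - r2_meta_null,
        "meta_beyond_bias": r2_meta_null - r2_uniform_null,
        "meta_beyond_traits": r2_meta_traits - r2_uniform_traits,
    }

# Made-up fit values, for illustration only
increments = trait_and_meta_increments(0.05, 0.60, 0.30, 0.75)
```

By construction, `traits_beyond_bias + meta_beyond_traits` equals `meta_beyond_bias + traits_beyond_meta`, since both sums equal the fit of the most complete model minus the model bias.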

A decomposition of the total deviance of a model consists of expressing this total deviance as the sum of a series of deviances due to mutually exclusive sources, as is the case in an ANOVA. There are different ways of performing such a decomposition. Here I consider a decomposition in which neither trait-based selection nor meta-community processes are assumed primary *a priori*. Note that the unexplained deviance of the most complete model is . Since Shipley et al. (2012) did not provide proofs for the equations that follow, these are given in electronic Appendix S1.

#### Simulations

The simulations generate distributions of relative abundance both in the meta-community and in the local community, and also specify the link between the two scales. Since all real relative abundance distributions are strongly uneven, with a few dominant species and many subdominants, this pattern is respected in the simulations. Furthermore, I keep these simulations as simple as possible in order to facilitate clarity in the underlying patterns.

The relative abundances for each of ten species (the species pool) in the meta-community are generated using Eq. (9). In this equation the variable x_{i} represents the value of some property (x) determining the relative abundance (q_{i}) of species i in the larger meta-community, and w (set to 0.3 in these simulations) is a weight representing by how much a unit change in x would change relative abundance. Note that if w is zero then the meta-community prior reverts to a uniform distribution. Positive values of w mean that species having more of property x will have higher relative abundance, while negative values of w mean that species having more of property x will have lower relative abundance. This property (x) represents some factor acting in the larger landscape and potentially (if w ≠ 0) causing different species to have different abundances in this larger landscape. For instance, it could be the same trait as modelled in the local community, it could be an unmeasured trait, it could represent past human preference for each species, resulting in different relative abundances of these species in the landscape, or it could be any other cause generating differences in relative abundance in the meta-community, including purely neutral processes such as random speciation events (Hubbell 2001). Because x could potentially be a trait, the meta-community assembly is not necessarily neutral, which is why I qualify the neutral assumption as ‘local’ neutrality. Whatever the nature of this property, it could be independent of the causal factors determining relative abundance in the local community or it could be correlated with these local causal factors. In empirical studies the meta-community prior would come from the estimated relative abundances of each species measured at the meta-community level. For instance, if many local communities have been sampled, then the meta-community relative abundances would come from the pooled abundances of each species over all local communities.

$$q_{i} = \frac{e^{w x_{i}}}{\sum_{j=1}^{S} e^{w x_{j}}} \qquad (9)$$
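The generation of the meta-community prior can be sketched in Python. The sketch assumes that Eq. (9) has the normalized exponential form q_i ∝ exp(w x_i). This form matches the stated properties (a uniform prior when w = 0, and higher abundance for larger x when w > 0), but it is an assumption rather than the author's R script. The function name and the values of x are hypothetical.

```python
import math

def meta_community_abundances(x, w=0.3):
    """Relative abundances q_i proportional to exp(w * x_i) (assumed form of Eq. 9)."""
    weights = [math.exp(w * x_i) for x_i in x]
    total = sum(weights)
    return [w_i / total for w_i in weights]

x = [float(i) for i in range(1, 11)]            # hypothetical landscape property, ten species
q = meta_community_abundances(x)                # uneven prior, w = 0.3
q_flat = meta_community_abundances(x, w=0.0)    # reverts to the uniform distribution
```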

The relative abundances (o_{i}) of each of the ten species in the local community are generated using Eq. (10). For simplicity, I use only a single trait. Equation (10) is the same as Eq. (3) plus a random value (u_{i}) representing demographic stochasticity, which is drawn from a uniform distribution between 1 and 10; *a* is the weight (set to 0.2 in these simulations) associated with u, so larger values of *a* result in more random variation in the local relative abundances around the value predicted by the CATS model. As previously stated, t_{i} is the trait value of species i (here, t_{i} = i) and λ is a weight measuring by how much a unit increase in the trait will change the proportional relative abundance of species in the local community. If λ is zero, then there is no trait-based selection in the local community and observed relative abundances are determined only by the meta-community prior and the random component.

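$$o_{i} = \frac{q_{i}\, e^{\lambda t_{i} + a u_{i}}}{\sum_{j=1}^{S} q_{j}\, e^{\lambda t_{j} + a u_{j}}} \qquad (10)$$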
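The generation of local relative abundances can be sketched in the same way. The Python sketch assumes that Eq. (10) has the form o_i ∝ q_i exp(λ t_i + a u_i), i.e. the exponential CATS form with an added uniform random term, and that the prior follows q_i ∝ exp(w x_i). Both forms are assumptions consistent with the description, not the author's R script. The function names and the values of λ are hypothetical.

```python
import math
import random

def meta_community_abundances(x, w=0.3):
    """Assumed form of the meta-community prior: q_i proportional to exp(w * x_i)."""
    weights = [math.exp(w * x_i) for x_i in x]
    total = sum(weights)
    return [w_i / total for w_i in weights]

def local_abundances(q, t, lam, a=0.2, seed=None):
    """Assumed form of Eq. (10): o_i proportional to q_i * exp(lam * t_i + a * u_i),
    where u_i is uniform on [1, 10] and represents demographic stochasticity."""
    rng = random.Random(seed)
    u = [rng.uniform(1, 10) for _ in q]
    weights = [q_i * math.exp(lam * t_i + a * u_i)
               for q_i, t_i, u_i in zip(q, t, u)]
    total = sum(weights)
    return [w_i / total for w_i in weights]

t = [float(i) for i in range(1, 11)]   # trait values, t_i = i
q = meta_community_abundances(t)       # here the landscape property equals the trait
no_selection = local_abundances(q, t, lam=0.0, seed=1)  # prior plus noise only
strong = local_abundances(q, t, lam=2.0, seed=1)        # trait-based selection dominates
```

With λ = 2 a one-unit trait difference always outweighs the largest possible difference in the random term (0.2 × 9 = 1.8), so abundance increases strictly with the trait value in that run.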

The link between local and meta-community relative abundances is made in two ways. First, the local relative abundances are partly determined by the meta-community relative abundances (**q**), as specified by Eq. (10). Second, one can introduce an indirect link between the meta-community and local relative abundances by allowing a correlation between the landscape property (x) and the trait values (t). To produce a strong level of correlation between x and t, I let x = t + N(0,1), where N(0,1) is a random value drawn from a normal distribution with a mean of zero and a SD of 1. The simulations were done using the R language. The λ parameters of Eq. (3) were estimated by entropy maximization using the Improved Iterative Scaling algorithm (Della Pietra et al. 1997) as implemented in the maxent function of the FD library in R. The permutation-based fit statistics in each simulation run were estimated from 500 independent permutations using the maxent.test function of the FD library. For each case, I ran 50 independent simulations. Appendix S2 provides a worked example of a simulation run and Appendix S3 provides the R script for the simulation. Appendices S4 and S5 provide the R script for the maxent and maxent.test functions.
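The indirect link, x = t + N(0,1), can be sketched in Python as follows. The function names are hypothetical, and the Pearson correlation is computed by hand so that the sketch needs only the standard library. It is an illustration, not the R script of Appendix S3.

```python
import math
import random

def correlated_property(t, sd=1.0, seed=None):
    """Landscape property x = t + N(0, sd), correlated with the trait values t."""
    rng = random.Random(seed)
    return [t_i + rng.gauss(0.0, sd) for t_i in t]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

t = [float(i) for i in range(1, 11)]   # trait values, t_i = i
x = correlated_property(t, seed=42)    # one realization of the landscape property
r = pearson(t, x)                      # correlation between x and t in this realization
```

Because the trait values span 1 to 10 while the noise has a SD of 1, the correlation is typically strong but varies from one realization to the next.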