Should ecologists prefer model‐ over distance‐based multivariate methods?

Abstract Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are well suited to analyzing such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites‐by‐species matrix, which stands in stark contrast to univariate statistics, where data models, assuming specific distributions, are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation‐based performance evaluations of these methods are important as guides for practitioners but still lacking. Here, we compare two model‐based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance‐based methods, distance‐based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the performance of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables. The former had the lowest false‐positive rate (0.008), while the latter had the lowest false‐negative rate (0.027). CQO and CCA had the highest false‐negative rate (0.291) and false‐positive rate (0.256), respectively, and these error rates were typically high for data sets with linear responses. Our study shows that both model‐ and distance‐based methods have their place in the ecologist's statistical toolbox. MvGLM and dbRDA are reliable for analyzing species–environment relations, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients.


| INTRODUCTION
Which environmental gradients determine species abundances and community composition is one of the most essential questions in ecology (Clements, 1907) and the current alteration of ecosystems at an unprecedented rate endows it with a new urgency (Pacifici et al., 2015). Given the complexity of simulating ecological systems under artificial conditions (e.g., in microcosms), monitoring the abundance or occurrence of taxa across sites with variable environmental conditions has been one approach to tackle this question. Related studies deliver a sites-by-species matrix Y containing multivariate species abundances, which is then statistically related to a sites-by-predictors matrix X, containing information on the environmental predictors.
From a statistical perspective, Y has many undesirable properties such as intercorrelations between variables, for example, through biotic interactions (Morales-Castilla, Matias, Gravel, & Araújo, 2015), probability distributions other than the normal, more species than sites (high dimensionality, especially in DNA barcoding studies, Cristescu, 2014), and many zeros, because most species are commonly absent from most sites (sparsity, McGill et al., 2007).
While univariate data (i.e., one response but possibly multiple explanatory variables) are routinely analyzed by model-based methods such as ANOVA, generalized linear models, and linear mixed models, multivariate data are most often analyzed with distance-based methods. The latter analyze a pairwise matrix of distances or dissimilarities instead of the sites-by-species matrix. They include a multitude of approaches, such as correspondence analysis (CA), nonmetric multidimensional scaling (NMDS), and principal coordinates analysis (PCoA). Their common ground lies in not assuming a specific parametric underlying model for how the data were generated. Different authors group slightly different methods under this label. Warton, Wright, and Wang (2012), for example, exclude CA, while Roberts (2019) explicitly includes it. We follow the wider definition of Roberts (2019) and consider canonical correspondence analysis (CCA) as an example of a distance-based method. An alternative designation for this group is algorithmic or algorithm-based (Warton, Foster, De'ath, Stoklosa, & Dunstan, 2015).
In distance-based methods, the researcher takes the data's statistical properties into account when selecting a distance metric. For instance, Minkowski distances (e.g., Manhattan and Euclidean) assume a constant variance across all mean values (ter Braak & Prentice, 1988), whereas species abundances often show a quadratic mean-variance relationship (Routledge & Swartz, 1991; Yamamura, 1999). Whether a distance metric is appropriate depends on the properties of the data and the aim of the study, as each metric extracts different information from the raw data. The choice is complicated by the vast number of available metrics (see Legendre & Legendre, 2012). An alternative to distance-based analyses that accounts for mean-variance relationships and incorporates ecological assumptions is the model-based approach.
The model-based approach consists of explicitly specifying a statistical model of the process that generated the observed data (Warton, Foster, et al., 2015). This includes properties such as marginal distributions and corresponding parameters, overdispersion, zero inflation, mean-variance relationship, and correlation structure, all of which can be flexibly tailored to the data and the research question. While this approach is ubiquitous in univariate analyses (Bolker, 2008; Zuur, Ieno, & Elphick, 2010), it has long been uncommon in multivariate ecological analyses, largely due to the absence of suitable models (Anderson, 2001). However, advances in statistical theory and computational power have led to a surge of models for multivariate abundance data.
In MvGLM, a separate univariate GLM is fit to each taxon, with each model using the same predictors. Univariate GLMs are a flexible method and are strongly advocated for the analysis of count or occurrence data, as they can handle different residual distributions and mean-variance relationships (O'Hara & Kotze, 2010; Warton & Hui, 2011). Extending them to multispecies abundance data was thus a natural starting point for multivariate model-based analyses. The univariate models are combined by summing their test statistics, which allows for inference on the whole community. The use of MvGLM, facilitated by an easy-to-use implementation in R (in the mvabund R package, Wang, Naumann, Eddelbuettel, Wilshire, & Warton, 2019), has steadily increased within the ecological community. However, direct comparisons of MvGLM to other methods remain rare, with a few exceptions. Warton et al. (2012) showed that MvGLMs, in contrast to distance-based methods, can differentiate between location (difference in mean) and dispersion (difference in mean-variance relationship) effects. Another comparison found that the statistical power of MvGLMs was higher than or at least equal to that of principal response curves (a form of redundancy analysis) when used for the analysis of ecotoxicological semifield studies. However, systematic studies of data sets with known properties are lacking, and this paucity hampers our capacity to make informed decisions on the selection of methods for multivariate data analysis.
We compared the performance of MvGLMs to differentiate between causal and noise variables to three further methods of data analysis: constrained quadratic ordination (CQO), which is also model-based, and canonical correspondence analysis (CCA) and distance-based redundancy analysis (dbRDA), which are distance-based. We applied the methods to 190 combinations of abundance data sets and explanatory variables. The abundance data differed in distributions and sample sizes. Based on the assessment of a variable's statistical significance, false-positive (FPR) and false-negative (FNR) rates were calculated.

| Data generation
Species abundances were simulated as counts, a common abundance measure in ecology (Warton, 2008b). Abundances were stored in Y, an N × S matrix of responses, in this case, the abundances of S species, s = 1 … S, at N sites, n = 1 … N. The species in Y responded to environmental variable x m with one of three different response types: unimodal (U), linear (L), or bimodal (B), as shown in Figure 1. Unimodal responses are most common in nature (Jansen & Oksanen, 2013; Lawesson & Oksanen, 2002) and bimodal shapes are expected to occur when competition restricts realized niches to gradient extremes (Hardin, 1960; Mueller-Dombois & Ellenberg, 1978). Linear responses may be the result of a stressor gradient shaping communities or may arise if the sampled gradient range is short relative to the species' tolerance. The environmental variables are stored in X, an N × M matrix with M environmental variables, m = 1 … M. We simulated three different types of communities. The main focus of this study is the type I communities, which are described below. Type II and type III communities represent communities with more heterogeneous responses to environmental variables and may be considered more realistic. They were used to evaluate the robustness of the results and conclusions based on type I communities. They are described in a separate section. The simulation process for type I communities is visualized in Figure 2.

FIGURE 1 Simulated abundance responses along two causal variables (env1 and env2). Response combinations are as follows: unimodal-unimodal (left), unimodal-linear (middle), and unimodal-bimodal (right). The vertical axis indicates abundance. The different colors represent different species. All examples show the unsampled abundance matrix Y Large of type I communities.

FIGURE 2 Flowchart of the type I community simulations. The environmental space comprises two variables (env1 and env2) and 10,000 unique sites. At each site, the abundances of nine species are simulated. Abundance responses to environmental gradients display three shapes: unimodal (U), linear (L), and bimodal (B). All nine species of one community show the same response shape (with varying parameters) to one gradient, but response shapes can differ between gradients. All six possible combinations of response shapes are sampled with six different sample sizes spanning from 25 to 900. Before these data are analyzed, two noise variables are added to X. For each response shape-sample size combination, five different pairs of noise variables are appended to X.
We simulated abundances along two environmental gradients env1 and env2, which henceforth will be referred to as causal variables to differentiate them from the noise variables. Both causal variables consist of the natural numbers from 1 to 100. Each possible combination of the two is a site, that is, the total number of sites N = 10,000. Y Large holds the simulated abundances for all 10,000 sites. This data set is larger than most ecological field data sets, and fitting models to it would have required considerable computation time. Therefore, we sampled from Y Large with six different sample sizes (25, 100, 225, 400, 625, and 900). This setup allows for six communities, each with a different combination of response types (Figure 1), including those with identical response types to both variables (Figure 2). The communities are labeled with their abbreviated response types, for example, UB for a community in which species' abundances respond unimodally to the first and bimodally to the second causal variable (Figure 1c).
Unimodal responses were simulated using the Gaussian response model (Gauch & Whittaker, 1972)

a(x) = c exp(-(x - u)^2 / (2 t^2)) (1)

where c is the species' maximal abundance, u its optimum, and t its tolerance along gradient x. The abundance values along the two gradients were multiplied; by multiplying instead of adding them, we ensured that a species is absent from sites where its abundance is zero for one of the gradients, that is, where it is outside of its niche. The products were rounded down, as abundances can only take integer values.
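To make the simulation concrete, the Gaussian response model can be evaluated over the full 100 × 100 environmental grid in a few lines. This is a minimal numpy sketch; the parameter values (c, u, t) are illustrative and not those used in the study:

```python
import numpy as np

# Environmental grid: every combination of env1 and env2 in 1..100 is a site.
env1, env2 = np.meshgrid(np.arange(1, 101), np.arange(1, 101))
env1, env2 = env1.ravel(), env2.ravel()

def gaussian_response(x, c, u, t):
    """Expected abundance under the Gaussian response model."""
    return c * np.exp(-((x - u) ** 2) / (2 * t ** 2))

# One species: maximal abundance c along env1, unit-height response along env2,
# so the product of the two responses peaks near c.
a1 = gaussian_response(env1, c=50, u=40, t=15)
a2 = gaussian_response(env2, c=1.0, u=70, t=20)

# Multiplying (not adding) the responses makes the species absent wherever
# either gradient puts it outside its niche; rounding down yields counts.
abundance = np.floor(a1 * a2).astype(int)
```

Repeating this per species, with parameters drawn per gradient, yields the full abundance matrix Y Large.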
After the abundances were simulated, noise variables were appended to the matrix of environmental variables X. They were simulated from a standard normal distribution, scaled to the same magnitude as the causal variables, and restricted to be orthogonal to them and to each other. We obtained five different versions of these noise variables by altering the random number generation seed, giving us five different versions of X per sampled community Y Sampled. In total, we sampled each of the six communities at six sample sizes and paired each sample with five environmental matrices, resulting in 180 data sets per method of data analysis.
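The text does not give the exact orthogonalization procedure; one way to generate such noise variables is to project out the causal variables (and previously drawn noise columns) and then rescale, as in this hypothetical numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(7)

def make_noise(X_causal, n_noise=2):
    """Draw standard-normal columns, make them orthogonal to the causal
    variables and to each other, and rescale to a comparable magnitude."""
    n = X_causal.shape[0]
    cols = [np.ones(n)] + list(X_causal.T)    # intercept column => zero mean
    noise = []
    for _ in range(n_noise):
        Q, _ = np.linalg.qr(np.column_stack(cols))
        z = rng.normal(size=n)
        z -= Q @ (Q.T @ z)                    # remove causal / prior-noise parts
        z *= X_causal.std() / z.std()         # scale to the causal variables' spread
        noise.append(z)
        cols.append(z)
    return np.column_stack(noise)

X_causal = np.column_stack([rng.uniform(1, 100, 200), rng.uniform(1, 100, 200)])
noise = make_noise(X_causal)   # orthogonal to env1, env2, and each other
```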
The simulated communities are a simplification of ecological field data. They consist of only nine species and are neither high dimensional nor do they exhibit intercorrelations. However, they are non-normally distributed and sparse, thereby featuring two of the common issues mentioned above. This relative simplicity eases interpretation of the results. More details on the parameterization of the models are provided in Table A1.

| Type II and III communities
We simulated two further types of communities to explore the methods' performance when used for more heterogeneous communities. Type II communities consist of only three species. Each species shows the same response shape toward both gradients, but the response shapes differ between species. They contrast with type I communities, where all species exhibit a uniform response shape, and allow us to test the influence of deviations from this uniformity on the results. We simulated five type II communities with 625 sites and different random number generation seeds. The parameters were chosen so that the total abundance over all sites was equal for all species (Table A2). Type III communities represent more realistic assemblages. They harbor 30 species sampled at 625 sites, and their species abundance distributions (SADs) were simulated with a Gambin model (Ugland et al., 2007) with 10 octaves and a shape parameter of 5 using the gambin R package (Matthews et al., 2014). Octaves followed the common log2 series already used by Preston (1948). The actual maximal abundance was randomly sampled from the interval of the respective octave, and the tolerance was set to the same number as the maximal abundance. All species respond unimodally to both gradients, and the locations of their optima were randomly sampled from all coordinates of the grid. Again, five different communities were simulated. All SADs were drawn from the same Gambin model with different random number generation seeds.

| Overview of methods
In the following, the methods of data analysis will be introduced briefly. Each section is concluded with details on how we applied the method in this study.

| Multivariate generalized linear models
A MvGLM consists of S separately fitted univariate GLMs. The likelihood ratio test statistics of all univariate models (i.e., species) are added for each environmental variable to obtain the sum-of-likelihood-ratio statistic. For this statistic, a p-value relating to the null hypothesis that a given environmental variable has no effect on the mean community abundance can be calculated. We fit MvGLMs with Poisson, negative binomial (both with log link), and Gaussian residual distributions (with identity link) to each community and compared their Dunn-Smyth residual plots (Dunn & Smyth, 1996) and Akaike's information criterion (AIC, Akaike, 1974). We did not test models with quadratic or higher order polynomial terms. The likelihood ratio test statistic was calculated for the best-fitting model (fewest patterns in the residuals and lowest AIC). To estimate p-values, we used a residual permutation bootstrap with 1,000 repetitions (Davison & Hinkley, 1997).
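The sum-of-likelihood-ratio statistic and a permutation p-value can be illustrated with plain numpy. This sketch fits each species' Poisson GLM by IRLS and, unlike mvabund's residual-resampling schemes, simply permutes the tested predictor; the data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_llf(X, y, iters=40):
    """Maximized Poisson log-likelihood (up to a constant) via IRLS, log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = np.clip(X @ beta, -30, 30)
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    mu = np.exp(np.clip(X @ beta, -30, 30))
    return float(np.sum(y * np.log(mu) - mu))

def sum_lr(Y, x_test, x_other):
    """Sum over species of the likelihood-ratio statistics for x_test."""
    n = len(x_test)
    full = np.column_stack([np.ones(n), x_other, x_test])
    red = np.column_stack([np.ones(n), x_other])
    return sum(2 * (poisson_llf(full, Y[:, s]) - poisson_llf(red, Y[:, s]))
               for s in range(Y.shape[1]))

# Toy community: three species respond to env1; env2 is pure noise.
n = 200
env1, env2 = rng.normal(size=n), rng.normal(size=n)
Y = rng.poisson(np.exp(0.5 + np.outer(env1, [1.0, -0.8, 0.6])))

obs = sum_lr(Y, env1, env2)
perm = [sum_lr(Y, rng.permutation(env1), env2) for _ in range(99)]
p = (1 + sum(t >= obs for t in perm)) / (99 + 1)   # permutation p-value
```

With a strong causal effect, the observed sum-of-LR statistic dwarfs the permuted ones and the p-value hits its minimum of 1/(number of permutations + 1), the floor mentioned in the Results.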

| Constrained quadratic ordination
Like the MvGLM, the CQO is related to the GLM. It is based on vector generalized linear models (VGLMs), which are a further generalization of GLMs: All GLMs are special instances of VGLMs, just like linear regression is a special instance of a GLM. VGLMs are not restricted to the exponential family, include multivariate response models, and can explicitly model response parameters other than the mean (e.g., the variance or higher order moments). CQO builds on reduced-rank VGLMs, in which the M original predictors are reduced to R latent variables.
This entails the reduction of the matrix H, which holds the regression coefficients β, to a rank-R matrix H R. So unlike a MvGLM, CQO reduces the data's dimensionality, and in contrast to most ordination techniques (including dbRDA and CCA), the researcher specifies the rank R a priori. In the CQO model, the linear predictor of each species is a quadratic function of the latent variables, with β 1 as the intercept term; it assumes symmetric and unimodal responses to the latent variables. CQOs were run with a Poisson residual distribution and the canonical log-link function. The four explanatory variables were scaled and centered before fitting the models. The effective nonlinear degrees of freedom were set to 1.5 as suggested by Yee (2015). Each model was run fifty times, and the deviances of the runs were compared.
If the lowest deviances are too far apart, the solution might be a local optimum and the model should be refitted. Here, we refit the model until the difference between the lowest and the fifth lowest deviance no longer exceeded 3. In its current implementation in the VGAM R package (Yee, 2019), CQO does not provide p-values (but see Yee, 2010). To compare its results with those of the other methods, we calculated pseudo-p-values for the CQO (details of the procedure can be found in Appendix 1). Briefly, to determine the pseudo-p-value of environmental variable m, we permuted the variable 100 times and fit a CQO to each permuted data set. For every model, the absolute values of the constraint coefficients of environmental variable m were summed across both latent variables to obtain the test statistic. The proportion of test statistics of permuted data sets that were larger than that of the unpermuted data set is the pseudo-p-value. All models were fit with ranks 1 and 2. The optimal rank, determined by the AIC as proposed by Yee and Hastie (2003), was 2 for all models.

| Canonical correspondence analysis
Canonical correspondence analysis is the heuristic solution to restricted Gaussian regression (Zuur, Ieno, & Smith, 2007). In the latter, one tries to estimate the parameters u, t, and c of a Gaussian response model (see Equation 1), but instead of the measured environmental variables, their linear combinations are used as x. Though it is possible to estimate the parameters with iteratively reweighted least squares in a GLM, this was too computationally intensive at the time the method was proposed by Gauch and Whittaker (1972). Instead, ter Braak (1986) proposed to approximate the results by CCA, which is valid as long as all species have equal tolerances t and maximal abundances c, their responses are unimodal and symmetrically bell-shaped, and their optima u are spread uniformly in the ordination space. These assumptions are collectively known as the species packing model. Palmer (1993), Johnson and Altman (1999), and Zuur (1999) showed that CCA is robust against violations of several of these assumptions. CCA proceeds iteratively: given current site scores Z, the species scores u are calculated as their weighted average (Equation 4)

u = D_c^-1 Y^t Z (4)

where u = (u_1 … u_S)^t, D_c is a diagonal matrix with the total abundance of species s across all sites as its s,s-th element, and Y^t denotes the transpose of Y. The species scores are in turn used to calculate the site scores as their weighted average Z_wa (Equation 5)

Z_wa = D_r^-1 Y u (5)

where D_r is a diagonal matrix with the total abundance of all species at site n as its n,n-th element, and D_r^-1 denotes the inverse of D_r. Z_wa is regressed against X to obtain the weighted regression coefficients α.
Lastly, Z is calculated as the product of X and α. This procedure is repeated until convergence.
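The reciprocal-averaging-with-regression loop described above can be sketched directly. This is a simplified, hypothetical numpy implementation of one constrained axis, not the algorithm of an existing CCA implementation:

```python
import numpy as np

def cca_axis(Y, X, iters=200, seed=0):
    """One CCA axis via two-way weighted averaging with a regression step."""
    rng = np.random.default_rng(seed)
    r = Y.sum(axis=1).astype(float)       # site totals (diagonal of D_r)
    c = Y.sum(axis=0).astype(float)       # species totals (diagonal of D_c)
    Xc = np.column_stack([np.ones(len(r)), X])
    z = rng.normal(size=len(r))           # arbitrary initial site scores
    for _ in range(iters):
        u = (Y.T @ z) / c                 # species scores (Equation 4)
        z_wa = (Y @ u) / r                # site scores as weighted average (Equation 5)
        XtW = Xc.T * r                    # weighted regression of z_wa on X ...
        alpha = np.linalg.solve(XtW @ Xc, XtW @ z_wa)
        z = Xc @ alpha                    # ... and Z as the product of X and alpha
        m = np.average(z, weights=r)      # standardize so the scores do not shrink
        z = (z - m) / np.sqrt(np.average((z - m) ** 2, weights=r))
    return z
```

Iterating the two weighted averages and the regression is a power iteration; it converges to the dominant constrained axis.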
The distance between sites (scaling 1) or species (scaling 2) in a two-dimensional CCA approximates their chi-square distance, that is, the Euclidean distance between the observed abundances and those expected under the null hypothesis that abundances do not change along the environmental variables. Explanatory variables were scaled and centered. Hypothesis tests for environmental variables can be conducted using a pseudo-F-statistic with permuted residuals (Legendre, Oksanen, & ter Braak, 2011) and the null hypothesis that the effect of the variable on the response is zero after accounting for the effects of all other variables. Hypothesis tests were conducted with 999 permutations.
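For reference, the chi-square distance between sites can be computed from the abundance profiles; a short sketch following the weighted-profile formula in Legendre and Legendre (2012), for illustration only:

```python
import numpy as np

def chisq_dist(Y):
    """Pairwise chi-square distances between the rows (sites) of Y."""
    Y = np.asarray(Y, dtype=float)
    w = Y.sum(axis=0) / Y.sum()                # species weights y_+s / y_++
    P = Y / Y.sum(axis=1, keepdims=True)       # site profiles y_ns / y_n+
    n = len(Y)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.sqrt((((P[i] - P[j]) ** 2) / w).sum())
    return D

D = chisq_dist([[10, 0, 5], [2, 3, 1], [4, 4, 4]])
```

Because profiles are divided by the species weights, rare species contribute disproportionately to this distance, a well-known property of chi-square-based ordinations.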
For type I and II communities, we did not transform the abundances as all species had similar or equal maximal abundances.
For type III communities, the CCA was run with untransformed, square root-transformed, base 2 log-transformed, and Hellinger-transformed abundance data.

| Distance-based redundancy analysis
Distance-based redundancy analysis (dbRDA) is a variation of the commonly used redundancy analysis, proposed by Legendre and Anderson (1999). It is not based on one specific distance measure but instead can adopt any chosen measure. It is the constrained form of principal coordinate analysis (PCoA, Legendre & Anderson, 1999), which we briefly describe here. In PCoA, Y is transformed into a centered distance matrix Δ. The columns of the matrix PC are the eigenvectors of Δ, scaled to a length equal to the square root of their eigenvalues (Gower, 1966). Each row of PC gives the eponymous principal coordinates of one observation. In a dbRDA, this matrix PC is linearly related to the explanatory variables by an RDA. The dbRDA preserves the distance metric of Δ, which can be metric, semimetric, or nonmetric. dbRDA has been highlighted because the possibility to use asymmetrical distance metrics makes it appealing for sparse data sets. We used the Bray-Curtis distance, the complement of the Steinhaus similarity coefficient (Motyka, 1947), to calculate Δ. As in CQO and CCA, environmental variables were scaled and centered.
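The Bray-Curtis dissimilarities, the PCoA step, and the final linear fit can be sketched as follows. This is a numpy-only illustration; `dbrda_r2` (a hypothetical helper) reports the fraction of principal-coordinate variance explained by the predictors, not the pseudo-F-statistic used for testing:

```python
import numpy as np

def bray_curtis(Y):
    """Pairwise Bray-Curtis dissimilarities between the rows (sites) of Y."""
    A, B = Y[:, None, :], Y[None, :, :]
    return np.abs(A - B).sum(-1) / (A + B).sum(-1)

def pcoa(D):
    """Principal coordinates of a dissimilarity matrix via Gower centering."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (np.asarray(D) ** 2) @ J      # double-centered Gower matrix
    w, V = np.linalg.eigh(G)
    keep = w > 1e-9        # Bray-Curtis is semimetric: drop negative/null axes
    return V[:, keep] * np.sqrt(w[keep])         # scale axes by sqrt(eigenvalue)

def dbrda_r2(Y, X):
    """Share of principal-coordinate variance captured by a linear fit on X."""
    PC = pcoa(bray_curtis(np.asarray(Y, dtype=float)))
    Xc = np.column_stack([np.ones(len(X)), X])
    H = Xc @ np.linalg.pinv(Xc)                  # projection onto the predictors
    return float(np.sum((H @ PC) ** 2) / np.sum(PC ** 2))
```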
The significance tests for explanatory variables are calculated using a pseudo-F-statistic in the same manner as for the CCA. dbRDA of type III communities was run with untransformed, square root-transformed, base 2 log-transformed, and Hellinger-transformed abundance data.

| Comparison of methods
The benefit of using simulated rather than field data is twofold: (a) there is a clear dichotomy between causal and noise variables, and (b) we know whether a given explanatory variable is causal or noise. This enables us to compare the methods in terms of their classification error rates. To this end, we calculated false-positive (FPR) and false-negative rates (FNR) for each method.
FPR = FP / (FP + TN)

FNR = FN / (FN + TP)

where FP is the number of false positives, TN of true negatives, FN of false negatives, and TP of true positives. A false positive occurs when a noise variable is classified as causal, whereas a false negative occurs when a causal variable is classified as noncausal. True positives and negatives are instances where the variable is labeled correctly. An FPR of 0.5, for example, would indicate that half of all noise variables were classified as causal. Variables with a p-value lower than the significance level (α) were classified as causal, whereas all variables with p ≥ α were classified as noise. To alleviate the problematic dichotomy of statistical significance (Greenland et al., 2016), we used five different significance levels α (0.01, 0.03, 0.05, 0.07, and 0.1). This allows us to evaluate trends in classification strength over different thresholds.
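The two rates can be computed directly from the p-values once the ground truth is known; a small sketch with illustrative variable names:

```python
import numpy as np

def error_rates(p_values, is_causal, alpha=0.05):
    """FPR and FNR from per-variable p-values and the known ground truth."""
    p = np.asarray(p_values)
    causal = np.asarray(is_causal, dtype=bool)
    called = p < alpha                              # classified as causal
    fpr = (called & ~causal).sum() / (~causal).sum()
    fnr = (~called & causal).sum() / causal.sum()
    return float(fpr), float(fnr)

# Toy example: two causal and two noise variables.
fpr, fnr = error_rates([0.001, 0.2, 0.03, 0.6], [True, True, False, False])
# One of two noise variables is significant (FPR = 0.5) and one of two
# causal variables is missed (FNR = 0.5).
```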

| RESULTS
We report the means and standard deviations of p-values of MvGLM, CQO, CCA, and dbRDA for all explanatory variables on type I communities (Table 1); p-values for all combinations of response shapes and sample sizes as well as type II and III communities are given in Tables A3-A8. In most MvGLMs, negative binomial residual distribution achieved the lowest AIC and the best fit to model assumptions.
The plot of Dunn-Smyth residuals against the linear predictor of LL ( Figure A1) showed arched patterns, which could indicate that the residuals were not independent of the explanatory variables.
Nevertheless, we used a negative binomial residual distribution because the visual inspection of the QQ plots suggested that it resulted in a better fit than Poisson or Gaussian distributions.
MvGLMs' p-values for both causal variables and all response type combinations were low (Table 1 and Figure 4). The p-values of the linear variable in LB and UL and of the bimodal variable in UB were higher at the smallest sample size than at larger ones (Table A3). Otherwise, the sample size had no effect on the p-values of the causal variables, which were often minimal (1 divided by the number of permutations + 1). The p-values for noise variables were higher and varied strongly (Table A3).
The FPR was the lowest of all methods (0.008 at α = 0.05) and always well below the respective significance level. Overall, FPRs and FNRs of MvGLMs were very low ( Figure 3). Interestingly, the p-values of noise variables did not show a monotonic positive relationship with sample size, as we expected. Rather, the response seemed unimodal in UU, LL, and BB, slightly negative in LB and UL, and positive for UB (Table A3).
MvGLM had a FNR and FPR of zero with both type II and type III communities at all significance levels (Figure 3). The p-values of CQO for causal variables depended on the response-type combination (Table A4). The mean p-value of unimodal variables excluding those from UL is 0.006 ± 0.022, compared to 0.036 ± 0.084 for the unimodal variable in UL. Similarly, the mean p-value of bimodal variables except for those from LB is 0.004 ± 0.015, and for the bimodal variable in LB, it is 0.042 ± 0.083. This mixed performance leads to relatively high mean p-values for the causal variables (Table 1) and accordingly high FNR and FPR (Figure 3).
Constrained quadratic ordination is the only method that has nonzero FNR in type II and III communities. In type II communities, the FNR is 0.2 for α = 0.1 and zero for all other significance levels.
In type III communities, the FNR is 0.43 for an α between 0.01 and 0.03, and then decreases to 0.17 for all higher significance levels ( Figure 3).
Canonical correspondence analysis has the highest mean p-values for causal variables and the lowest for noise ones. Accordingly, the FPR was the highest of all methods (Figure 3). Irrespective of significance level, it is more than one order of magnitude higher than for all other methods. These problems are due to two factors: (a) high p-values for causal linear variables and (b) low p-values for noise variables. The mean p-value for causal linear variables is 0.963 ± 0.094.
Additionally, CCAs of LL with sample sizes 400-900 produced constrained inertias (explained variance) of 0 and were therefore excluded from significance testing. Noise variable p-values were especially low in UU and UB (Figure 4), which is interesting since these data sets matched closest with the assumed species packing model.
In BB, they were markedly higher (Table A5). The impact of different sample sizes was negligible in all response combinations (Table A5).
The FNR was 0 in all type II and type III communities, whereas the FPR depended on the type of transformation. With Hellinger-, square root-, and base 2 log-transformed data, the FPR was zero for type II and III communities at all significance levels, except for square root-transformed data in type III communities, where it increased to 0.1 at the significance level 0.1. The FPR on untransformed data was higher. It was overall highest in type II communities, with 0.5 at the significance level 0.1. In type III communities, CCA maintained an FPR of 0.1 between the significance levels of 0.01 and 0.03 and increased to 0.2 for higher significance levels.

TABLE 1 Mean p-values ± standard deviations of the causal (env1 and env2) and noise variables from multivariate generalized linear models (MvGLMs), constrained quadratic ordination (CQO), canonical correspondence analysis (CCA), and distance-based redundancy analysis (dbRDA) on type I communities

FIGURE 3 False-positive rate and false-negative rate of the four statistical methods canonical correspondence analysis (CCA), constrained quadratic ordination (CQO), distance-based redundancy analysis (dbRDA), and multivariate generalized linear model (MvGLM) with type I, II, and III communities. CCA_log and dbRDA_log show the results of CCA and dbRDA on base 2 log-transformed data, which yielded lower or equal FPR than the other two transformations (see Figure A6). Points are jittered slightly along the x-axis.

dbRDA occasionally assigned elevated p-values to causal variables (Table A6). However, they were below 0.1, so that the dbRDA had an FNR of 0 at α = 0.1. Indeed, the FNR was the lowest of all methods (Figure 3). The FPR was relatively high, with 0.039 at α = 0.05. p-values were relatively similar for all sample sizes (Table A6).
At most significance levels, the FPR of dbRDA in type II and III communities was higher than in type I communities. In type II communities, it was 0.1 at α = 0.05 and increased to 0.3 at α = 0.1. For type III communities, the FPR increased steadily with significance level, reaching 0.3 with Hellinger transformation and 0.33 on untransformed data at α = 0.1. Both square root and base 2 log transformation had lower FPRs, reaching 0.2 at α = 0.1. Log-transformed data maintained an FPR of 0 for all significance levels below 0.07. Both distance-based methods were considerably faster than the model-based ones (Figure A3).

| DISCUSSION

The MvGLMs used here treat the species as independent; explicitly modeling correlations between species is computationally expensive. Another option is shrinking the correlation matrix toward identity using ridge regularization (Warton, 2008a, 2011). Both alternatives use generalized estimating equations (GEE) with the sandwich-type estimator of Warton (2011). As GEEs do not provide likelihoods, test statistics other than the likelihood ratio have to be used. Current options are the score and the Wald statistic. However, these methods also require resampling, as asymptotic marginal distributions of regression parameters for GEEs are not specified for data sets with more species than sites. Testing these methods on data sets with known correlation structures could highlight stronger performance differences, as the other methods lack adjustments to these properties.
MvGLMs are the only method considered here that does not provide an easy-to-use and easy-to-interpret way of visualizing the data. dbRDA was least influenced by different response types and sample sizes in type I communities.
In type II and III communities, however, the FPR was higher than that of both model-based methods. Square root and log transformation lowered the FPR compared to dbRDA on untransformed data, but remarkably, Hellinger transformation did not lead to a lower FPR. For α = 0.07, the FPR of dbRDA on Hellinger-transformed data was even higher than that of dbRDA on untransformed data.
Small p-values were scarce for noise variables but occurred at all sample sizes and response types. dbRDA's good performance on type I communities is in concert with other simulation studies (Roberts, 2009). Of the methods tested, CCA performed worst on the type I communities and assigned high p-values to all linear variables. As CCA assumes unimodal gradients, which are more frequent than linear ones in nature (Oksanen & Minchin, 2002), this was expected.
This study confirmed that CCA should be avoided if exploratory analyses indicate linear relationships, which can occur if the sampled range of a gradient is short relative to the species' tolerance. Noise p-values were lower than in other methods. Most of the low p-values for noise variables occurred in communities with uni- or bimodal responses. This is surprising, given that UU fits the expectations of the species packing model perfectly and bimodal models deviate only slightly.
In type III communities, CCA on base 2 log-transformed data performed as well as the model-based methods, while the FPR of untransformed data was slightly lower than in type I communities.
The latter is likely due to chance; FPR and FNR for both type II and III communities were only based on five repetitions instead of 180 for type I communities, which were the main focus of this study.
Overall, the result corroborates earlier findings that CCA is robust against two specific violations of the species packing model: unequal maximal abundances and nonregular distribution of optima in the ordination space.
Newer approaches to CCA can correct for zero inflation (Zhang & Thas, 2012) or nonlinear relationships between predictor and response variables (Makarenkov & Legendre, 2002). CQO, in contrast, is still rarely applied; Yee (2015) suggests that this is due to limitations on the number of species that can be included, a steep learning curve, and numerical instability.
This study confirmed that, in its current state, the method struggles with linear response types but can handle deviations from the symmetrical unimodal bell shape.
Constrained Quadratic Ordination encompasses many options that we did not test. They include different models for the tolerance matrix, further marginal distributions, and additive models.
Exploring all plausible combinations of these options exceeded the scope of this study, but some of them could improve performance. We refer the interested reader to the comprehensive treatment in Yee (2015).
Our findings suggest that MvGLMs can be applied in a wide variety of settings. None of the data sets or their respective properties resulted in high FPR or FNR. CQO had low FPRs in all tests but had the highest FNR. However, as stated before, many options of CQO remained unexplored in our study, which might remedy these problems. In type I communities, CCA had a high FNR with linear responses and a high FPR with unimodal responses. We thus caution against the use of CCA if exploratory analysis indicates linear relationships. Lastly, dbRDA performed well with type I communities but worse with type II and III communities. Data sets with a high number of species or stronger abundance differences might pose problems to dbRDA that can only partially be alleviated by transforming the data.
Our study is the first to directly compare these methods. Comparative studies of multivariate methods in general are common. In particular, ordination techniques such as CCA and RDA were subject to extensive testing in the 1970s and 1980s (Gauch & Whittaker, 1972; Gauch, Whittaker, & Wentworth, 1977; Kenkel & Orloci, 1986). Roberts (2008) and Roberts (2009) compared dbRDA, CCA, and multidimensional fuzzy set ordinations; Roberts (2008) used simulated data sets, whereas Roberts (2009) used four different field data sets. Both studies concluded that dbRDA outperforms CCA, which we also find for type I community data but not for type II or type III. CQO is occasionally tested in comparisons of individual- and community-level species distribution models (Baselga & Araújo, 2009; Maguire et al., 2016), where it is an instance of the latter. Generally, it exhibited performance similar to classical models (e.g., GLMs or regression trees).
Future studies could improve the realism of the simulated communities by using more complex response patterns such as beta-functions (Austin, Nicholls, Doherty, & Meyers, 1994), which add asymmetries to bell-shaped curves. However, in a study by Oksanen and Minchin (2002), only about 20% of the responses were strongly skewed, whereas symmetric, bell-shaped responses were most common.
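The beta-function response mentioned above can be sketched in a few lines; the version below follows the general form k(x − a)^α(b − x)^γ on a bounded gradient interval, with all parameter values chosen purely for illustration.

```python
import numpy as np

def beta_response(x, k=1.0, a=0.0, b=10.0, alpha=2.0, gamma=2.0):
    """Beta-function response curve: zero outside (a, b); inside, proportional
    to (x - a)**alpha * (b - x)**gamma. Equal exponents give a symmetric
    bell shape; unequal exponents skew the curve along the gradient."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    inside = (x > a) & (x < b)
    y[inside] = k * (x[inside] - a) ** alpha * (b - x[inside]) ** gamma
    return y

x = np.linspace(0, 10, 101)
symmetric = beta_response(x, alpha=2, gamma=2)   # peaks at the midpoint (x = 5)
skewed = beta_response(x, alpha=1, gamma=4)      # peak shifted toward x = 2
print(x[symmetric.argmax()], x[skewed.argmax()])
```

The mode sits at x = (αb + γa)/(α + γ), so shifting the exponent ratio moves the optimum and skews the curve, which is exactly the deviation from the symmetric bell shape that such simulations would probe.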
Alternatively, asymmetry could be introduced through random terms added to abundances, environmental variables, or both (McCune, 1997). When correlated random terms are added to both, this would engender endogeneity (a nonzero covariance between the residuals and one or more explanatory variables). Simulations with induced endogeneity would be interesting as this phenomenon is underappreciated by ecologists (Armsworth, Gaston, Hanley, & Ruffell, 2009;Fox, Negrete-Yankelevich, & Sosa, 2015). Observation and measurement are sources of errors in field data sets, and both can be represented in a model via binomial functions as in N-mixture models (Royle, 2004).
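The endogeneity mechanism described above is easy to demonstrate: a minimal numpy sketch, with purely illustrative coefficients, adds one shared random term to both the observed environmental variable and the abundance, which biases a naive regression slope away from its true value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

env_true = rng.normal(size=n)   # the real gradient
shared = rng.normal(size=n)     # correlated random term entering both sides

env_obs = env_true + shared                                   # observed predictor
abundance = 2.0 * env_true + 3.0 * shared + rng.normal(size=n)  # true slope = 2

# The shared term makes the structural error covary with the predictor,
# so OLS is inconsistent: the estimated slope is pulled toward
# cov(env_obs, abundance) / var(env_obs) = (2 + 3) / 2 = 2.5.
slope = np.polyfit(env_obs, abundance, 1)[0]
print(slope)
```

The bias does not shrink with sample size, which distinguishes endogeneity from ordinary sampling noise and is what makes it worth simulating explicitly.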
This would make it possible to examine the effects of regression dilution (Frost & Thompson, 2000; McInerny & Purves, 2011).
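Regression dilution itself can be illustrated with a minimal sketch (illustrative values throughout): adding measurement error to the predictor attenuates the estimated slope toward zero by the reliability ratio var(x) / (var(x) + var(error)).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

x_true = rng.normal(size=n)
y = 3.0 * x_true + rng.normal(size=n)   # true slope = 3

# Measurement error on the predictor (here with the same variance as x)
# halves the reliability ratio, so the expected slope drops to 3 * 1/2 = 1.5.
x_noisy = x_true + rng.normal(scale=1.0, size=n)

slope_clean = np.polyfit(x_true, y, 1)[0]
slope_diluted = np.polyfit(x_noisy, y, 1)[0]
print(slope_clean, slope_diluted)  # roughly 3.0 vs. 1.5
```

Note the contrast with endogeneity: here the slope is always biased toward zero, never away from it, and the attenuation factor is known once the error variance is.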
It would also be of great interest to compare the methods' performance with presence-absence data instead of abundance data as novel options for analysis have recently emerged for this less informative but more available type of data (Podani, Pavoine, & Ricotta, 2018;Sander, Wootton, & Allesina, 2017;Tovo et al., 2019).
Our study shows that model-based multivariate inference can outperform more frequently used distance-based methods. The answer to our eponymous question is thus: not categorically; decisions should be made on a case-by-case basis. As model-based methods are still at an early stage, new developments and increases in computation speed can be expected. An especially active area of development is models using joint probability distributions (Clark, Gelfand, Woodall, & Zhu, 2014; Pollock et al., 2014) that estimate the joint distribution of all species conditional on the environmental variables instead of only using the marginal distribution of every species' abundance. A common interest of many joint models is to infer biotic interactions from the residuals of the species–environment relationship, as these two sets of predictors (biotic and abiotic) were shown to have little redundancy (Meier et al., 2010). Some of the models also anticipate the growing challenges of Big Data for ecology (Hampton et al., 2013).
Generalized linear latent variable models, for example, include latent variables instead of random effects to capture residual correlation, which considerably reduces the size of the variance–covariance matrix (Niku, Warton, Hui, & Taskinen, 2017; Warton, Blanchet, et al., 2015). In hierarchical modeling of species communities (Ovaskainen et al., 2017), this approach is coupled with a fourth-corner model (including species traits; Legendre, Galzin, & Harmelin-Vivien, 1997) and phylogenetic relationships to create a flexible and comprehensive framework for community data analysis. In a similar vein, generalized joint attribute models allow different kinds of data (e.g., continuous, discrete counts, ordinal counts, and occurrence) to be included in the same response variable and have outperformed a Poisson GLM on discrete count data and a Bernoulli GLM on binary host status data in a recent simulation study (Clark et al., 2017). It is now essential that ways to infer ecological processes from the modeled patterns develop at a similar pace as these models.

P-values in this study were obtained by permutation tests (Legendre & Legendre, 2012). Their general approach is as follows: A test statistic T is computed for the data set of interest D, with X, Y ∈ D. Some property of D (e.g., the rows of X or Y) is permuted n times, and the same test statistic T* is calculated for each of the permuted data sets D*. The pseudo-p-value can then be calculated as p = (1 + number of T* ≥ T) / (n + 1). We permuted the predictors. Each predictor was tested separately, so that in any one model only one predictor was permuted while the others remained in their original order.

FIGURE A1 Dunn–Smyth residuals of the LL community sampled with 400 samples, plotted against the linear predictor. A pronounced arched pattern can be observed for every single species (different colors). Tables A7 and A8 show the same information for type II and III communities.

FIGURE A2 False-positive and false-negative rates of canonical correspondence analysis (CCA) and distance-based redundancy analysis (dbRDA) on square root- and Hellinger-transformed abundance data. Points are jittered slightly along the x-axis.