Using commonality analysis in multiple regressions: a tool to decompose regression effects in the face of multicollinearity

Authors


Summary

1. In the face of natural complexities and multicollinearity, model selection and prediction using multiple regression may be ambiguous and risky. Confounding effects of predictors often cloud researchers’ assessment and interpretation of the single best ‘magic model’. The shortcomings of stepwise regression have been extensively described in the statistical literature, yet it is still widely used in the ecological literature. Similarly, hierarchical regression, although thought to be an improvement on the stepwise procedure, fails to address multicollinearity.

2. We propose that regression commonality analysis (CA), a technique more commonly used in psychology and education research, will be helpful in interpreting the typical multiple regression analyses conducted on ecological data.

3. CA decomposes the R2 of a regression into the unique and common (or shared) variance (or effects) of the predictors, and hence it can significantly improve exploratory capabilities in studies where multiple regressions are widely used, particularly when predictors are correlated. CA can explicitly identify the magnitude and location of multicollinearity and suppression in a regression model.

4. In this paper, using a simulated dataset (generated from a correlation matrix) and an empirical dataset (human habitat selection, represented by the migration of Canadians across cities), we demonstrate how CA can be used with correlated predictors in multiple regression to improve our understanding and interpretation of data. We strongly encourage the use of CA in ecological research as a follow-on analysis from multiple regressions.

Introduction

Multiple linear regression (MR) is widely used to identify models that capture the essence of ecological systems (Whittingham et al. 2006). MR extends simple linear regression to model the relationship between a dependent variable (Y) and more than one independent (also known as ‘predictor’ and so-called henceforth) variables (Sokal & Rohlf 1995). Owing to the complexity of ecological systems, multicollinearity among predictor variables frequently poses serious problems for researchers using MR (Graham 2003).

Quite often, ecologists pose the question: to what extent is variation in each predictor variable associated (linearly) with variation in the dependent variable? To answer this in MR, there are three main effects that need to be assessed: (i) total effects – the total contribution of each predictor variable to the regression when the variance of the other predictors is accounted for; (ii) direct effects – the contribution of a predictor, independent of all other predictors; and (iii) partial effects – the contribution of a predictor when accounting for the variance of a specific subset or subsets of the remaining predictors (LeBreton, Ployhart & Ladd 2004). Conventionally, however, one reports the coefficient of determination (R2) and regression coefficients, where the P-values of the regression coefficients are the information one mostly relies upon to answer the fairy-tale question, ‘Mirror, mirror on the wall, what is the best predictor of them all?’ (Nathans, Oswald & Nimon 2012; p. 1).

The R2 quantifies the extent to which variance in the dependent variable is explained by variance in the predictor set, and regression coefficients help rank the predictor variables according to their contributions to the regression equation (Nimon et al. 2008). However, relying only on the unstandardized slopes of the regression (also known as partial regression coefficients, Sokal & Rohlf 1995) or the standardized (partial) regression coefficients (also known as beta coefficients and so-called henceforth, Sokal & Rohlf 1995) may generate erroneous interpretations, particularly when multicollinearity is involved (Courville & Thompson 2001; Kraha et al. 2012). It is not uncommon for researchers to erroneously refer to beta coefficients as measures of the relationship between predictors and the dependent variable (Courville & Thompson 2001). Hence, it is often advisable to use multiple methods when interpreting MR results, especially when the intentions are not strictly predictive (Nimon & Reio 2011). In this paper, we use simulated and empirical data to demonstrate how a complementary approach, regression commonality analysis (CA), can be used along with beta coefficients and structure coefficients to better interpret MR results in the face of multicollinearity.

Why are stepwise or hierarchical regressions not enough?

Complexities in most systems warrant that ecologists collect data on variables of importance and of interest in concert with data on factors that might affect a response (Whittingham et al. 2006). When a researcher is interested in determining predictors’ contribution, MR is used to either choose predictor variables based on statistics (stepwise regression) or identify the predictor variables that support a theory (hierarchical regression; Lewis 2007).

Stepwise regressions, which are widely used in ecological studies, provide a methodology to determine a predictor variable's meaningfulness as it is introduced into a regression model (Pedhazur 1997; Graham 2003). Though widely used in fields such as animal behaviour and species distribution modelling (Araújo & Guisan 2006; Smith et al. 2009), stepwise regression is often discouraged owing to its ‘data dredging’ approach and lack of theory (Burnham & Anderson 2002; Whittingham et al. 2006). The procedure causes biases and inconsistencies in parameter estimation, model selection algorithms, the selection order of different predictors and the selection of the single best ‘magic’ model (see Whittingham et al. 2006; Zientek & Thompson 2006; Nathans, Oswald & Nimon 2012; Supporting information). Stepwise regression relies largely on the first predictor entering the model, which determines the variance attributed to the other predictors in the model, and it poses a serious risk of Type I errors associated with inflated F-values (Thompson 1995; Zientek & Thompson 2006; Nimon et al. 2008; Nathans, Oswald & Nimon 2012). Hence, the use of stepwise regression has been discouraged for assessing the contributions of predictor variables in MR (Nathans, Oswald & Nimon 2012).

Hierarchical regression is more replicable and reliable for evaluating the contribution of predictors and is thus an improvement over stepwise regression (Thompson 1995; Pedhazur 1997; Lewis 2007). Rather than using the familiar stepwise procedure, the choice and order of variables in hierarchical regression are based on a priori knowledge of theory (Thompson 1995; Pedhazur 1997; Lewis 2007; Nathans, Oswald & Nimon 2012), which helps researchers choose the best predictor set more effectively (Henderson & Velleman 1981; Lewis 2007), for example, by entering distal or lateral variables (generally variables we are less interested in and would want to control for) in early steps and prime or proximal variables (variables we are truly interested in) later. However, hierarchical regression ignores the relative importance of certain predictor variables and fails to address multicollinearity (Petrocelli 2003). Furthermore, the misuse of hierarchical regression in one step leads to additional errors in subsequent steps, which compound and produce errors in interpretation (Cohen & Cohen 1983).

Can beta and structure coefficients address multicollinearity?

In MR, the beta coefficient (b′Y) of a predictor indicates the expected increase (or decrease), in standard deviation units of the dependent variable, for a one standard deviation increase in the predictor, holding all other predictors constant (Nimon & Reio 2011; Nathans, Oswald & Nimon 2012). Hence, beta coefficients account for a predictor's total contribution to the regression equation and have a simple relationship with the more conventional partial regression coefficient (bY) as follows:

\[ b'_{Y} = b_{Y}\left(\frac{S_{X}}{S_{Y}}\right) \]

where SX and SY are the standard deviations of the predictor and the dependent variable, respectively (Sokal & Rohlf 1995). When research intentions are strictly predictive, the use of partial regression coefficients and beta coefficients is appropriate. However, relying only on beta coefficients may lead to misinterpretations when informing theory or explaining the predictive power of a variable of interest (Nimon & Reio 2011; Kraha et al. 2012). This is because the accuracy of these regression coefficients depends on a fully and perfectly specified model (Kraha et al. 2012), as adding or removing variables may change their values.
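To make the algebra concrete, the following is a minimal R sketch (hypothetical data and variable names, not taken from the examples below) showing that beta coefficients can be obtained either by rescaling the unstandardized coefficients with SX/SY or by refitting the model on z-scored variables:

```r
# Minimal sketch: relationship between unstandardized (b) and beta coefficients.
# 'dat' is a hypothetical data frame with dependent variable y and predictors x1, x2.
set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 0.5 * dat$x1 + 0.3 * dat$x2 + rnorm(100)

fit  <- lm(y ~ x1 + x2, data = dat)
b    <- coef(fit)[-1]                                      # unstandardized partial regression coefficients
beta <- b * sapply(dat[c("x1", "x2")], sd) / sd(dat$y)     # beta = b * (Sx / Sy)

# The same values are obtained by refitting the model on z-scored variables
fit_z <- lm(y ~ x1 + x2, data = as.data.frame(scale(dat)))
cbind(beta, coef(fit_z)[-1])
```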

When predictor variables are correlated, beta coefficients may mislead the interpretation of how different predictors influence the dependent variable, because they are based on a predictor's relation with Y as well as with all other predictors in the model (Kraha et al. 2012). In multicollinear data, the risk of misinterpretation is heightened because, even though a predictor's contribution to the regression equation may be negligible, it might still be highly correlated with the dependent variable Y (Courville & Thompson 2001; Nimon 2010; Nimon & Reio 2011). In still other cases, a predictor's contribution to the regression equation may be negative even when its relationship with the dependent variable is positive, which some researchers may inappropriately interpret as a negative relationship with the dependent variable (Courville & Thompson 2001). As with regression coefficients, confidence intervals (Nimon & Oswald 2013) and standard errors can be generated for beta coefficients.

Interpretation of MR results can also be improved with structure coefficients. Structure coefficients (rs) are the bivariate (Pearson's) correlations between a predictor and the predicted dependent variable scores (ŷ) resulting from the regression model (Nathans, Oswald & Nimon 2012). Additionally, the squared structure coefficients (rs2) identify how much variance is common between a predictor and ŷ:

\[ r_{s} = r_{x.\hat{y}} = \frac{r_{x.y}}{R} \]
\[ r_{s}^{2} = r_{x.\hat{y}}^{2} = \frac{r_{x.y}^{2}}{R^{2}} \]

where r2x.y is the squared bivariate correlation between a given predictor x and y, r2x.ŷ is the squared bivariate correlation between a given predictor x and the ŷ values, and R is the multiple correlation of the model (Nimon & Reio 2011). Structure coefficients are thus independent of collinearity among the predictors and have the additional property of ranking independent variables according to their contribution to the regression effect (Kraha et al. 2012). Hence, when predictors are correlated, interpretation of both beta coefficients and structure coefficients should be considered when attempting to understand the essence of the relationships among the variables (Nimon et al. 2008; Kraha et al. 2012). Structure coefficients should not, however, be confused with partial correlation coefficients, which are a measure of the linear dependence of two random variables when the influence of the other variables is eliminated, and can be obtained by correlating two sets of residuals (Kerlinger & Pedhazur 1973). Neither of these measures, however, can inform us about both the ‘unique’ variance (unique or direct effects) of a predictor and the ‘common’ variance (common or partial effects) that is shared by two or more predictors.
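Continuing the hypothetical fit from the previous sketch (objects `fit` and `dat` defined above), structure coefficients can be computed directly by correlating each predictor with the model's predicted scores:

```r
# Structure coefficients: correlate each predictor with the predicted scores (y-hat),
# then square them to get the share of the regression effect attributable to each predictor.
y_hat <- fitted(fit)                            # predicted y from the regression above
rs    <- sapply(dat[c("x1", "x2")], function(x) cor(x, y_hat))
rs2   <- rs^2

# Equivalently, rs = r(x, y) / R, where R is the multiple correlation of the model
R      <- sqrt(summary(fit)$r.squared)
rs_alt <- sapply(dat[c("x1", "x2")], function(x) cor(x, dat$y)) / R
cbind(rs, rs_alt, rs2)
```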

What does commonality analysis do?

Commonality analysis (also called element analysis and/or components analysis) was developed in the 1960s (Newton & Spurrell 1967; Mood 1969) and is frequently used in social sciences, psychology, behavioural sciences and education research (Siebold & McPhee 1979; Nimon et al. 2008; Nimon 2010; Kraha et al. 2012; Nathans, Oswald & Nimon 2012), yet CA has rarely been used in ecological research (but see Raffel et al. 2010; Sorice & Conner 2010). Regression CA improves the ability to understand complex models because it decomposes regression R2 into its unique and common effects. Unique effects indicate how much variance is uniquely accounted for by a single predictor. Common effects indicate how much variance is common to a predictor set (Pedhazur 1997; Thompson 2006; Nimon 2010; Nathans, Oswald & Nimon 2012).

To understand the value of CA, consider a hypothetical example adapted from Nimon et al. (2008). Imagine that a dependent variable ‘y’ is explained by two predictors ‘m’ and ‘n’. The total variance explained by both of these variables is R2y.m.n. The unique contribution (U) of a variable is the proportion of variance assigned to it when it is entered last in the MR equation. In this case, the unique effects of m and n will be

\[ U_{m} = R^{2}_{y.m.n} - R^{2}_{y.n} \]
\[ U_{n} = R^{2}_{y.m.n} - R^{2}_{y.m} \]

and the common contribution (C) will be

\[ C_{m.n} = R^{2}_{y.m.n} - U_{m} - U_{n} \]

Substituting the solutions for Um and Un

\[ C_{m.n} = R^{2}_{y.m} + R^{2}_{y.n} - R^{2}_{y.m.n} \]

Thus, U gives the unique (minimum) explanatory power of a predictor variable, while U + C gives its total (maximum) explanatory power. Understanding these effects becomes more interesting when C (the sum of all commonalities associated with a predictor) is substantially larger than U, indicating greater collinearity among variables and making it harder to interpret how the predictor contributes to the regression effect (Nathans, Oswald & Nimon 2012). In a regression model with k predictor variables, CA decomposes the explained variance into 2^k − 1 independent effects (Siebold & McPhee 1979). With two predictors, there are three readily interpretable commonalities. However, interpretation becomes more difficult in higher-order models because the number of commonalities expands exponentially with the number of predictors (for 6, 7 and 8 predictors, the number increases to 63, 127 and 255, respectively).
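A minimal sketch of this two-predictor decomposition in R, reusing the hypothetical data from the earlier sketches (here m and n simply relabel x1 and x2):

```r
# Unique (U) and common (C) effects for a two-predictor model y ~ m + n,
# obtained from the R-squared values of the full and reduced models.
r2 <- function(f, data) summary(lm(f, data = data))$r.squared

d <- data.frame(y = dat$y, m = dat$x1, n = dat$x2)

R2_mn <- r2(y ~ m + n, d)
U_m   <- R2_mn - r2(y ~ n, d)        # variance unique to m (m entered last)
U_n   <- R2_mn - r2(y ~ m, d)        # variance unique to n (n entered last)
C_mn  <- R2_mn - U_m - U_n           # variance common to m and n
# equivalently, C_mn equals r2(y ~ m, d) + r2(y ~ n, d) - R2_mn
c(U_m = U_m, U_n = U_n, C_mn = C_mn)
```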

Commonality effects should not be confused with interaction effects (e.g. in regression or ANOVA). Common effects identify how much variance a set of variables shares with a dependent variable, while an interaction effect models the contrasts that exist between different levels or values of at least two independent variables (Supporting information). Any interaction effect should therefore be treated as an additional predictor in the regression model. As indicated by Siebold & McPhee (1979), ‘the function of commonality analysis is to ferret out these common effects so that they may be interpreted’ (p. 365).

Commonalities can be either positive or negative. Negative commonalities can occur in the presence of suppression or when some of the correlations among predictor variables have opposite signs (Pedhazur 1997). A particularly interesting case, likely to emerge in ecological research, is given by variables that suppress, or remove, irrelevant variance in another predictor and thus, when included in a regression equation, increase the predictive ability of that predictor (or set of predictors) and R2 (Cohen & Cohen 1983; MacKinnon, Krull & Lockwood 2000; Capraro & Capraro 2001; Zientek & Thompson 2006). Irrelevant variance is the variance shared with another predictor and not with the dependent variable; hence, it does not directly affect R2 (Pedhazur 1997). A suppressor (say X1) has zero or almost zero (classic suppression) or small positive (negative suppression) correlation with the dependent variable (Y) but is correlated with one or more predictor variables (say X2), generating negative regression weights in the equation (Pedhazur 1997; Thompson 2006; Beckstead 2012). When a predictor and a suppressor are positively correlated with the dependent variable but negatively correlated with each other (reciprocal suppression), the regression weights of both predictors remain positive (Conger 1974; Beckstead 2012). The correlation of X1 with X2 indirectly improves the predictive power of the regression equation by inflating X2's contribution to R2 (Pedhazur 1997; Thompson 2006). This is because X1 removes (suppresses), or purifies, the irrelevant variance in X2's relationship with Y, while the remaining part of X2's variance becomes more strongly linked to Y (Cohen & Cohen 1983; Pedhazur 1997; Lewis 2007). Hence, the R2 obtained in the presence of suppression should be compared with the R2 obtained without the suppressor (Thomas, Hughes & Zumbo 1998; MacKinnon, Krull & Lockwood 2000; Thompson 2006; Capraro & Capraro 2001; Zientek & Thompson 2006; Beckstead 2012).

There are several ways to identify a suppressor, but here we discuss the two most likely to be relevant in this context. First, a suppressor variable is revealed when it has a large beta coefficient in association with a disproportionately small structure coefficient that is close to zero (Thompson 2006; Kraha et al. 2012). A mismatch in the signs of a beta coefficient and its structure coefficient may similarly indicate suppression (Nimon et al. 2008). Second, CA can identify the loci and magnitude of suppression through negative commonality coefficients (Nimon et al. 2008; Nimon & Reio 2011). A negative commonality coefficient may indicate the incremental predictive power associated with the suppressor variable (Capraro & Capraro 2001). Negative commonality coefficients must, however, be interpreted cautiously because they can also emerge when some correlations among predictors are positive while others are negative (Pedhazur 1997).
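As a rough illustration of both diagnostics, consider a hypothetical predictor x_supp constructed so that it is correlated with another predictor but essentially uncorrelated with the response (this sketch reuses `dat` from the earlier examples; the variable names are assumptions for illustration only):

```r
# Two quick diagnostics for a suspected suppressor:
# (1) a beta coefficient whose sign or size disagrees with its (near-zero) structure coefficient,
# (2) the change in R-squared when the suspected suppressor is dropped from the model.
set.seed(2)
x_supp <- rnorm(100)
y2     <- dat$x1 + rnorm(100, sd = 0.5)
x_main <- dat$x1 + 0.8 * x_supp                 # x_supp shares variance with x_main, not with y2
d2     <- data.frame(y2, x_main, x_supp)

full    <- lm(y2 ~ x_main + x_supp, data = d2)
reduced <- lm(y2 ~ x_main, data = d2)

beta_full <- coef(lm(y2 ~ x_main + x_supp, data = as.data.frame(scale(d2))))[-1]
rs_full   <- sapply(d2[c("x_main", "x_supp")], function(x) cor(x, fitted(full)))

cbind(beta = beta_full, structure = rs_full)    # sizeable beta with near-zero rs flags x_supp
summary(full)$r.squared - summary(reduced)$r.squared   # R2 gained by including the suppressor
```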

Computation of CA can be a laborious procedure; however, programs have been written in SPSS (Nimon 2010) and R (Nimon et al. 2008) that automatically compute commonality coefficients for any number of predictors. The ‘yhat’ package (Nimon, Oswald & Roberts 2013) in R (R Development Core Team 2013) incorporates the commonality logic of Nimon et al. (2008), and it also calculates beta coefficients and structure coefficients as well as other regression-related metrics, such as a wide variety of adjusted R2 effect sizes.
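A minimal usage sketch follows. The ‘regr’ function is the one referenced in the table footnotes of this paper; the exact argument and output names should be checked against the yhat documentation, and the data are the hypothetical variables from the earlier sketches:

```r
# Sketch: commonality analysis with the 'yhat' package on a fitted multiple regression.
# install.packages("yhat")
library(yhat)

fit <- lm(y ~ x1 + x2, data = dat)   # any fitted multiple regression model
out <- regr(fit)                     # beta weights, structure coefficients and
out                                  # commonality (unique/common) partitions for each predictor
```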

Two worked examples

Example 1: Heuristic Example

Here, we use a heuristic data set and statistical simulation to demonstrate the various commonalities associated with three correlated variables (Table 1a; Appendix S1). The dependent variable Y is correlated with X1 (r = 0·50) and X3 (r = 0·25) but not with X2 (r = 0·00). Furthermore, X1 is positively, but weakly, associated with X2 (r = 0·15) and X3 (r = 0·10), which in turn are highly correlated with each other (r = 0·60, Table 1a). This correlation matrix reflects a classic case of suppression; following Kraha et al. (2012), we used it to generate a simulated dataset for a regression CA of Y on X1, X2 and X3 (Appendix S1, Supporting information).
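One way to generate such a dataset in R is sketched below, using MASS::mvrnorm with empirical = TRUE so that the sample correlations reproduce Table 1a exactly; the sample size and seed are arbitrary choices for illustration, not values from the original analysis:

```r
# Simulate scores whose sample correlation matrix exactly matches Table 1a.
library(MASS)

vars    <- c("Y", "X1", "X2", "X3")
cor_mat <- matrix(c(1.00, 0.50, 0.00, 0.25,
                    0.50, 1.00, 0.15, 0.10,
                    0.00, 0.15, 1.00, 0.60,
                    0.25, 0.10, 0.60, 1.00),
                  nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

set.seed(123)                                          # arbitrary seed
sim <- as.data.frame(mvrnorm(n = 200, mu = rep(0, 4), Sigma = cor_mat, empirical = TRUE))

round(cor(sim), 2)                                     # recovers Table 1a
summary(lm(Y ~ X1 + X2 + X3, data = sim))$r.squared    # ~0.349, as in Table 1b
```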

Table 1. Applying commonality analysis to a simulated dataset to demonstrate the various commonalities associated with three correlated variables. (a) Pearson's correlation coefficients of the predictor variables with Y. (b) Typical regression results, including the multiple R2, adjusted R2 (R2adj), beta coefficients (β), standard errors of the beta coefficients (SE), 95·0% lower (CIL) and upper (CIU) confidence intervals for β, structure coefficients (rs), squared structure coefficients (rs2), and each predictor's total unique (U), total common (C) and total (r2) variance in the regression equation

(a)
        Y       X1      X2      X3
Y       1·00
X1      0·50    1·00
X2      0·00    0·15    1·00
X3      0·25    0·10    0·60    1·00

(b) Model: multiple R2 = 0·3495***, R2adj = 0·3395

Predictor    β            SE       CIL       CIU       rs        rs2       U         C          r2
X1           0·5076***    0·058    0·393     0·623     0·8458    0·7153    0·2518    −0·0018    0·2500
X2           −0·3058***   0·072    −0·449    −0·163    0·0000    0·0000    0·0591    −0·0591    0·0000
X3           0·3827***    0·072    0·241     0·525     0·4229    0·1788    0·0937    −0·0312    0·0625

Note: the ‘regr’ function in yhat also computes beta coefficients and structure coefficients. Significance codes: *** P < 0·001, ** P < 0·01, * P < 0·05, . P < 0·1.

Table 1b provides an overview of the model, each predictor's contribution, beta coefficients, structure coefficients and squared structure coefficients necessary to interpret the analysis, as well as the total variance in the dependent variable explained by each predictor (r2), partitioned into its total ‘unique’ (U) and total ‘common’ (C) effects.

We begin by interpreting the beta coefficients from the predictive model: Ŷ = (0·508*X1) + (−0·306*X2) + (0·383*X3). Beta coefficients give the contribution of each of the three predictor variables, in standardized form (ZX1, ZX2 and ZX3), to the prediction of Y when all other predictor variables are held constant. The squared correlation of Y with the predicted Y (or Ŷ) equals the overall R2 (0·349, R2adj. = 0·339, P < 0·0001; Courville & Thompson 2001; Kraha et al. 2012), indicating that 34·9% of the variation in the dependent variable is accounted for by the three predictor variables. Note that, although the predictor variable X2 had zero correlation with Y (Table 1a), its beta coefficient is relatively high in the regression equation. Such a result alerts one to possible suppression by X2 of one or more of the remaining variables and reinforces that beta coefficients are not direct measures of the relationships in a regression unless the predictors are perfectly uncorrelated (Courville & Thompson 2001; Kraha et al. 2012).

Recall that the structure coefficients are the bivariate correlations between Ŷ and each of the predictors X1, X2 and X3; thus, the squared structure coefficients (rs2) represent the proportion of variance in the regression effect explained by each predictor alone, irrespective of collinearity with other predictors (Kraha et al. 2012). For instance, the rs2 of X1 (0·715) shows that X1 was able to account for 71·5% of the regression effect given by the R2 (= 0·349, Table 1b), which, within rounding error, yields the corresponding r2 for X1 (0·715 × 0·349 = 0·250). Similarly, examining X2 (rs2 = 0·00) shows that X2 shared no variance with Ŷ, while X3 (rs2 = 0·179) was able to explain 17·9% of the variation in Ŷ.

Examining the unique effects of the predictors (Table 1b) shows that X1 uniquely explained 25·2% of the variation in the dependent variable, followed by 9·4% for X3, while X2 uniquely explained 5·9% of the variation in Y (Table 1b, Fig. 1). This seems contradictory: although X2's contribution to Y is zero (r2 = 0·00), it receives credit in the regression equation. This is because X2's unique effect (U = 0·059) is offset by its common effect (C = −0·059; Table 1b), identifying X2 as a suppressor variable. X2 thus helps the other predictors to better predict Y, even though it shares no variance with the dependent variable itself.
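Using the simulated data from the earlier sketch (the `sim` data frame), the offset between X2's unique and common effects can be verified directly; a minimal sketch:

```r
# X2's unique effect is offset by its (negative) total common effect,
# so its total contribution r2 with Y remains essentially zero (Table 1b).
R2_full <- summary(lm(Y ~ X1 + X2 + X3, data = sim))$r.squared
U_X2    <- R2_full - summary(lm(Y ~ X1 + X3, data = sim))$r.squared   # unique effect, ~0.059
r2_X2   <- cor(sim$X2, sim$Y)^2                                       # squared zero-order correlation, ~0.000
C_X2    <- r2_X2 - U_X2                                               # total common effect, ~ -0.059
c(U = U_X2, C = C_X2, r2 = r2_X2)
```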

Figure 1. Unique (red bars) and common regression effects of three predictor variables (X1, X2 and X3) on the dependent variable (Y). Note the presence of both positive (black bars) and negative (empty bars) commonality coefficient values for each predictor variable.

Table 2 decomposes the regression effect into first-order effects (variance unique to each X), second-order effects (variance common to pairs of variables) and third-order effects (variance common to X1, X2 and X3), and it also includes the percentage contribution (% Total) of each of these coefficients to the regression effect.

Table 2. Commonality analysis output representing commonality coefficients, both unique (also presented in Table 1b) and common, along with the % total contribution of each predictor variable or set of predictor variables to the regression effect

                             X1        X2        X3        % Total
Unique to X1                 0·252                           72·06
Unique to X2                           0·059                 16·90
Unique to X3                                     0·094       26·82
Common to X1, X2            −0·024    −0·024                 −6·84
Common to X1, X3             0·004               0·004        1·12
Common to X2, X3                      −0·053    −0·053      −15·26
Common to X1, X2 and X3      0·018     0·018     0·018        5·20
Total                                                       100·00

If we sum the negative contributions to the total regression effect in Table 2, we find that 22·1% of the regression effect is caused by suppression. Table 2 also reveals that each of the negative commonality coefficients involves X2 (Fig. 1). Although the suppression effect of X2 on X1 is smaller than its effect on X3, collectively they account for a moderate amount of variance. In this simulated dataset, the remaining second-order common effects account for only 1·12% of the regression effect, whereas 5·2% of the regression effect was common to all three predictors (X1, X2 and X3).

Example 2: An Ecological Example

Morris and Mukherjee (2006) used MR as part of their assessment of the role that habitat selection plays in metapopulation dynamics. In particular, they used MR to evaluate whether the proportion of immigrants (PropIm or Y) moving into a city depended on the median household income (Income) of the city, the number of people employed in the city (Employ), the proportion of people emigrating (PropEmm) out of the city, the weighted distance of immigration (DistIm), and the weighted distance of emigration (DistEm) (Appendix S1, Supporting information). Although a stepwise regression may have seemed appropriate to address Morris & Mukherjee's (2006) intent to identify whether humans choose habitats in a way consistent with theory, a more careful assessment of each variable's contribution to the model is necessary if the intent is to understand how the different variables interact to influence human dispersal. Such an understanding could be crucially important in developing economic and social policies for humans. Similar understanding of dispersal by other animals may be important and necessary to develop viable conservation strategies. For more details on this study and its conceptual theory, see Morris and Mukherjee (2006).

Table 3a represents the pairwise Pearson's correlation coefficients of the predictor variables and the dependent variable. Among these, it is notable that the dependent variable, PropIm, has the highest correlation with Income (r = 0·58), while its correlation with DistEm is close to zero (r = −0·01).

Table 3. Human habitat selection, represented by the immigration of Canadians across Canadian cities, was tested using commonality analysis. (a) Pearson's correlation coefficients of the predictor variables with Y, which in this case is the proportion of immigrants (PropIm). (b) Typical regression results, including the multiple R2, adjusted R2 (R2adj), beta coefficients (β), standard errors of the beta coefficients (SE), 95·0% lower (CIL) and upper (CIU) confidence intervals for β, structure coefficients (rs), squared structure coefficients (rs2), and each predictor's total unique (U), total common (C) and total (r2) variance in the regression equation

(a)
           Income    Employ    PropEmm    DistIm    DistEm
Y           0·58     −0·20      0·20      −0·29     −0·01
Income                0·31     −0·32       0·05      0·07
Employ                         −0·62       0·05      0·19
PropEmm                                   −0·08     −0·30
DistIm                                                0·42

(b) Model: multiple R2 = 0·675***, R2adj = 0·594

Predictor    β           SE       CIL       CIU       rs        rs2      U        C         r2
Income       0·766***    0·136    0·482     1·050     0·704     0·495    0·513    −0·179    0·334
Employ       −0·272      0·165    −0·616    0·073     −0·245    0·060    0·044    −0·004    0·040
PropEmm      0·318       0·170    −0·038    0·673     0·240     0·058    0·056    −0·017    0·039
DistIm       −0·397*     0·141    −0·691    −0·104    −0·356    0·127    0·130    −0·044    0·086
DistEm       0·252       0·147    −0·055    0·558     −0·007    0·000    0·048    −0·048    0·000

Note: the ‘regr’ function in yhat also computes beta coefficients and structure coefficients. Significance codes: *** P < 0·001, ** P < 0·01, * P < 0·05, . P < 0·1.

Table 3b provides an overview of the human migration model, which, in addition to the multiple R2, includes the beta coefficients, structure coefficients and squared structure coefficients needed to interpret the regression effect, along with the total variance explained by each predictor (r2), partitioned into its total ‘unique’ (U) and total ‘common’ (C) effects.

The predictor variables in the regression explained 67·5% (R2 = 0·675, R2adj. = 0·594, P < 0·001) of the variance in PropIm (Table 3b). The standardized regression equation from the beta coefficients (Table 3b) yields the prediction equation:

ŶPropIm = (0·766*ZIncome) + (−0·272*ZEmploy) + (0·318*ZPropEmm) + (−0·397*ZDistIm) + (0·252*ZDistEm)

We note that, although DistEm had a negligible correlation with PropIm (Table 3a), its beta coefficient was relatively high in the regression equation. Moreover, the zero-order correlation between DistEm and PropIm was negative (Table 3a), while the beta coefficient of DistEm was high and positive (Table 3b). This combined pattern indicates the presence of multicollinearity among the variables as well as suppression caused by DistEm.

Examining the unique variance, we find that, similar to Morris and Mukherjee (2006), Income was indeed the best unique predictor of PropIm (Table 3b, Fig. 2), uniquely explaining 51·3% of the variation in Y. DistIm (i.e. the distance from which people are immigrating) was the second best unique predictor, uniquely explaining 13·0% of the variance in Y (Table 3b, Fig. 2). Although PropEmm uniquely explained 5·6% of the variance in Y, its contribution to the regression equation was weak (Table 3b, Fig. 2).

Figure 2. Unique (red bars) and common regression effects of Income, Employ, PropEmm, DistIm and DistEm on the proportion of immigrants. Note the presence of both positive (black bars) and negative (empty bars) commonality coefficient values for each predictor variable.

If we now decompose each predictor's variance with the dependent variable into its unique effect and the sum of its common effects, we find that each predictor's total common effect was negative (Table 3b, Fig. 2). Table 4 displays the decomposition of the regression effect for this data set. Among the 26 second-, third-, fourth- and fifth-order commonalities, 15 were negative and together accounted for nearly half of the regression effect. A substantial amount of the regression effect (25·0%) involved negative commonalities associated with Income, Employ and/or PropEmm (Table 4, Fig. 2). Furthermore, the majority of the negative commonality coefficients stemmed from the strong negative correlation between Employ and PropEmm (r = −0·62; Table 3a), which suggests that in cities with high employment, emigration is reduced. This correlation occurred in conjunction with the following conditions: first, Income increased with Employ (r = 0·31), while it decreased with PropEmm (r = −0·32, Table 3a), which yielded a substantial negative commonality coefficient among the three predictors (−0·0925, Table 4); secondly, Employ increased with DistEm (r = 0·19), while PropEmm decreased with DistEm (r = −0·30; Table 3a); and thirdly, Employ increased with DistIm (r = 0·05), while PropEmm decreased with DistIm (r = −0·08; Table 3a). A substantial amount (16·5%) of the regression effect involving DistEm was negative (Table 4). DistEm thus satisfies the classic definition of suppression, having virtually no shared variance with the dependent variable (Table 3b, Fig. 2), yet making a noteworthy contribution to the regression equation. DistEm's unique variance was offset by its common variance (Table 3b, Fig. 2), suggesting that DistEm's major contribution to the regression equation was to suppress variance in the remaining predictors (most notably DistIm) that was irrelevant to predicting variance in PropIm.

Table 4. Contribution of predictors and predictor sets to the proportion of Canadians immigrating into cities, used to address human habitat selection. The table shows commonality coefficients (unique and common) and the % total contribution of each predictor or predictor set to the regression effect

                                                         Commonality coefficient    % Total
Unique to Income                                          0·5131                     75·97
Unique to Employ                                          0·0439                      6·50
Unique to PropEmm                                         0·0564                      8·36
Unique to DistIm                                          0·1295                     19·17
Unique to DistEm                                          0·0476                      7·04
Common to Income and Employ                              −0·0334                     −4·94
Common to Income and PropEmm                             −0·0433                     −6·41
Common to Employ and PropEmm                              0·1286                     19·05
Common to Income and DistIm                              −0·0193                     −2·86
Common to Employ and DistIm                              −0·0021                     −0·32
Common to PropEmm and DistIm                             −0·0070                     −1·03
Common to Income and DistEm                              −0·0116                     −1·72
Common to Employ and DistEm                              −0·0014                     −0·21
Common to PropEmm and DistEm                             −0·0193                     −2·86
Common to DistIm and DistEm                              −0·0420                     −6·22
Common to Income, Employ and PropEmm                     −0·0925                    −13·69
Common to Income, Employ and DistIm                       0·0016                      0·23
Common to Income, PropEmm and DistIm                      0·0044                      0·64
Common to Employ, PropEmm and DistIm                     −0·0108                     −1·59
Common to Income, Employ and DistEm                       0·0010                      0·15
Common to Income, PropEmm and DistEm                      0·0116                      1·72
Common to Employ, PropEmm and DistEm                     −0·0178                     −2·64
Common to Income, DistIm and DistEm                       0·0092                      1·37
Common to Employ, DistIm and DistEm                       0·0011                      0·16
Common to PropEmm, DistIm and DistEm                      0·0143                      2·12
Common to Income, Employ, PropEmm and DistIm              0·0061                      0·91
Common to Income, Employ, PropEmm and DistEm              0·0067                      1·00
Common to Income, Employ, DistIm and DistEm              −0·0008                     −0·12
Common to Income, PropEmm, DistIm and DistEm             −0·0086                     −1·28
Common to Employ, PropEmm, DistIm and DistEm              0·0200                      2·95
Common to Income, Employ, PropEmm, DistIm and DistEm     −0·0099                     −1·46
Total                                                     0·6753                    100·00

Our analysis confirms Morris and Mukherjee's (2006) conclusion that immigration (PropIm) is mostly driven by Income. But we also learned that immigration is driven by the distance from which Canadians are immigrating (DistIm) and, to a lesser degree, by DistEm, Employ and PropEmm. The substantial negative commonalities associated with, and common to, Income, Employ and PropEmm reveal that Income's predictive power increased after controlling for Employ and PropEmm in the model. It thus appears that decisions by Canadians to permanently move from one city to another are not a simple weighting of expected income and distance from ‘current home’ (as implied in the MR results of Morris & Mukherjee 2006), but that individuals may also be assessing emigration patterns correlated with employment opportunities. Although Morris and Mukherjee (2006) correctly interpreted the data and their model as consistent with habitat-selection theory, application of CA sharpens our insights into the causes of human dispersal and deepens our understanding of the underlying behaviours influencing habitat (city) selection. Recognizing that DistEm suppresses the irrelevant variance of other predictors provides deeper insight into how humans make the decision to move from one city to another. From here, we can say that humans tend to discount income and other factors influencing dispersal as the distance between cities increases. This insight may explain why cities (and countries) hoping to increase immigration often provide incentives in addition to employment opportunities. Such incentives may be unnecessary if immigrants are targeted from shorter distances. To better understand the relationships among these predictors, and to further understand the increase in predictive power associated with suppression and inconsistent signs in predictor correlations, a path analysis and/or structural equation modelling approach may be warranted.

How does CA compare with other analyses?

To some degree, CA competes with other analyses in the information it provides to researchers. However, there are subtle and useful differences in the information that CA generates compared with other analyses. CA differs from other variance partitioning algorithms (Pratt's measure, relative weights, general dominance weights) in that it partitions the regression effect into 2^k − 1 effects, as opposed to the k partitions of the other measures. Although k partitioning is easier to interpret, such analyses do not necessarily indicate how much multicollinearity exists or where it is located.

Commonality analysis decomposes the regression effect using all-possible-subsets regression and a set of formulae that depends on the number of predictors. CA is independent of variable order and hence does not suffer from the problems associated with stepwise regression (wrong degrees of freedom, Type I errors, inflated F-values), thus yielding results that are replicable (Pedhazur 1997; Zientek & Thompson 2006; Nathans, Oswald & Nimon 2012). CA may also subsume some of the information gained through a hierarchical linear regression. For example, in the case of a regression model with three predictors (X1, X2 and X3) where X1 and X2 were entered in the first block of a hierarchical regression and X3 was added in the second block, the unique effect for X3 will be the same as the ΔR2 between the regression model with only X1 and X2 and the full regression model with X1, X2 and X3 (see the sketch below). It should be kept in mind here that hierarchical regressions are different from hierarchical linear models.
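A brief sketch of this equivalence with hypothetical data (block 1 enters X1 and X2, block 2 adds X3; the R2 increment for block 2 equals X3's unique effect, i.e. the R2 lost when X3 is dropped from the full model):

```r
# Hierarchical-regression increment for block 2 (adding X3) equals X3's unique effect in CA.
set.seed(3)
d3 <- data.frame(X1 = rnorm(100), X2 = rnorm(100), X3 = rnorm(100))
d3$Y <- 0.4 * d3$X1 + 0.3 * d3$X2 + 0.2 * d3$X3 + rnorm(100)

R2_block1 <- summary(lm(Y ~ X1 + X2, data = d3))$r.squared
R2_full   <- summary(lm(Y ~ X1 + X2 + X3, data = d3))$r.squared

delta_R2 <- R2_full - R2_block1     # increment attributed to block 2 in a hierarchical regression
U_X3     <- R2_full - R2_block1     # unique effect of X3 (variance added when X3 enters last)
all.equal(delta_R2, U_X3)           # identical by definition
```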

Hierarchical linear models (also known as multilevel models, mixed models, nested models or random-effects models) are commonly used in biological or ecological data analysis (Nakagawa & Schielzeth 2013). CA can also be applied to these hierarchical linear (mixed) models using the methods described in this paper. However, because R2 analogues for such models (Luo & Azen 2013; Nakagawa & Schielzeth 2013) can lack monotonicity (i.e. they do not necessarily increase as predictors are added), fit indices resulting from an all-possible-subsets analysis are instead used as input to the commonality analysis algorithms described in this paper. For a description of this solution and the statistical software, see Nimon, Henson & Roberts (2013) and Nimon, Oswald & Roberts (2013).

Identifying the magnitude and location of multicollinearity using CA may be useful in ‘fully conceptualizing and representing the regression dynamics’ (Zientek & Thompson 2006, p. 306). Structural equation models are a class of sophisticated multivariate techniques that are largely confirmatory (based on prior knowledge of theory) and test the validity of a model as opposed to identifying the best model (Tomer 2003). Structural equation models explicitly address the underlying causality patterns among variables (Tomer 2003). Although CA is applied in the same way for a given set of dependent and independent variables regardless of one's causal model (Pedhazur 1997), its results may be useful in identifying what causal models should be explored. In the presence of suppression, it may be particularly useful to employ structural equation models to assess the regression model with and without the suppressor effects (Beckstead 2012). CA may also be used as a technique to identify potential general factors (Hale et al. 2001); however, such analyses have been criticized and require further research (Schneider 2008).

Limitations, conclusions and future applications

We caution readers to keep in mind that ‘methods are not a panacea’ and require prudent use and diligent interpretation (Kraha et al. 2012, p. 9). In regression analysis, the uniqueness of variables depends on the specific set of predictors under study, and the addition or deletion of predictors may change the uniqueness attributed to some or all of the variables (Kerlinger & Pedhazur 1973). As CA decomposes the regression effect resulting from a MR, failure to meet the statistical assumptions of MR may also affect the bias and precision of the resulting commonality coefficients; however, research has yet to be conducted to determine how such violations affect commonality coefficients. Understanding the complexities of ecological systems often demands the collection and analysis of elaborate data sets with confounding factors, and an increase in the number of predictor variables exponentially increases the complexity of computing and interpreting analyses based on all-possible-subsets regression (e.g. commonality analysis, dominance analysis). In such cases, it might be advisable to conduct variance partitioning using sets of predictors and to interpret the unique and common effects of a set of predictors (generally a highly correlated family of predictors), rather than of single predictor variables (Zientek & Thompson 2006).

Traditional MR analysis is inefficient when multicollinear variables arise through true synergistic associations, spurious correlations or improperly specified models (Graham 2003; Whittingham et al. 2006). CA models multicollinear data explicitly (Kraha et al. 2012); difficulties in understanding such models should therefore not be attributed to CA, but to our ignorance of the variables (or the model) in question. Hence, CA should be used to identify indicators that are failing badly with respect to the specificity of the model (Mood 1969; Kerlinger & Pedhazur 1973). Thus, in the exploratory stages of ecological research, CA may help researchers select variables in a predictive framework (Kerlinger & Pedhazur 1973; Pedhazur 1997) by revealing the confounding relationships of multicollinear predictor variables. Ecologists should nevertheless use and interpret CA with the same caution as any other analysis, because it does not differentiate theoretical relationships from other causal relations (Kerlinger & Pedhazur 1973).

Commonality analysis has rarely been used in ecological studies (but see Raffel et al. 2010; Sorice & Conner 2010), and we highly recommend the use of CA when one wishes to gain a better understanding of exploratory data. CA can often yield additional insights in ecological studies that have used (or plan to use) MR. Ecologists often attempt to reduce the problems of collinear variation with principal components regression, canonical correspondence analysis, cluster analysis and stepwise regression (Graham 2003; Araújo & Guisan 2006; Whittingham et al. 2006). However, interpretations using these techniques have not always been satisfactory (Graham 2003). Regression CA can provide an improved solution to this problem. We suggest that CA be used along with beta coefficients and structure coefficients (which can also be generated using the yhat package in R), especially when the aim of the research is exploratory. Other approaches, such as path analysis and/or structural equation modelling, are more appropriate when the researcher has a prior theoretical understanding of how different variables, and combinations of variables, are likely to influence a particular ecological process (Morris, Dupuch & Halliday 2012). The ability to conduct CA for a large number of variables invites discussion of whether CA might form the ‘missing link’ between exploratory and theory-driven research frameworks. CA thus deserves strong consideration as a tool in ecological research, where it can improve model selection and the assessment of predictor contributions in regression analysis.

Acknowledgements

We thank the Editor Prof. Bob O'Hara, the Associate Editor, Dr. Fred Oswald, and the anonymous reviewers for helping us improve the quality of this manuscript. JRM and SM are spouses; however, this has not influenced authorship in the manuscript. Otherwise there is no conflict of interest among the authors.
