## Introduction

In analysing multivariable datasets it is common, when examining the effect of a variable (*x*_{1}) on a dependent variable of interest (*y*), to need to control for the effects of a third, continuous variable (*x*_{2}), for instance because its effects may confound those of *x*_{1}. In such circumstances it has become common practice to regress *y* on *x*_{2} and to use the residuals from this regression when testing for the effects of *x*_{1}.
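The two-step procedure described above can be sketched as follows for a continuous *x*_{1}. For a categorical *x*_{1}, step 2 would instead be a *t*-test or anova on the residuals. This is a minimal pure-Python illustration of the practice under discussion, not a recommended analysis; the function names and the toy data are hypothetical.

```python
# Minimal sketch of the two-step "residuals as data" procedure.
# Pure Python; function names and toy data are hypothetical illustrations.

def mean(v):
    return sum(v) / len(v)

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residuals_on(x, y):
    """Residuals from the simple regression of y on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data: x2 is the variable to be "controlled for".
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [9.0, 8.0, 19.0, 18.0, 26.0]

# Step 1: regress y on x2 and keep the residuals.
res = residuals_on(x2, y)
# Step 2: treat the residuals as the response when estimating the effect of x1.
_, slope_x1 = fit_line(x1, res)
print(round(slope_x1, 6))  # prints 0.72
```

Because the residuals come from a least-squares fit with an intercept, they sum to zero, so step 2 is simply a regression through their mean.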

A recent article by García-Berthou (2001) pointed out that this analysis is inappropriate when *x*_{1} is a categorical variable and the residuals from the regression of *y* on *x*_{2} are subjected to a *t*-test or an anova to test for differences between the groups defined by *x*_{1}. As García-Berthou (2001) noted, the correct analysis is in fact an ancova, or another general linear model (GLM) in which the factorial and regression variables are included simultaneously. Although García-Berthou (2001) addressed only one analytical procedure in which residuals from regression are treated as data in subsequent analysis, the use of residuals as data is common in a range of analyses, particularly when the variable of interest, *x*_{1}, is continuous. It is, for example, widespread in controlling for the effects of body size in multivariable analyses.
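The contrast between the two analyses can be written in the notation of a standard one-way ancova. Here *i* indexes the groups defined by *x*_{1} and *j* the observations within a group; the symbols μ, α_{i}, β and ε_{ij} are introduced for illustration and are not taken from the source. In the residual analysis the slope is estimated with the groups ignored, and group effects are then tested on what remains; in the ancova the group effects and the slope on *x*_{2} are estimated jointly.

```latex
% Residual analysis: two separate fits
\hat{e}_{ij} = y_{ij} - \left(\hat{a} + \hat{b}\,x_{2,ij}\right), \qquad
\hat{e}_{ij} = \mu + \alpha_i + \varepsilon_{ij}

% ancova: one simultaneous fit
y_{ij} = \mu + \alpha_i + \beta\,x_{2,ij} + \varepsilon_{ij}
```

When the groups differ in their mean value of *x*_{2}, the pooled slope \(\hat{b}\) absorbs part of the between-group differences, so the two analyses need not agree.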

In this paper it is argued that the practice of treating residuals from regression as if they were data is unjustified except in specialized circumstances. This is because independent variables in ecological data are commonly correlated, and such correlations lead to biased parameter estimates and significance tests when residuals are used. The bias arises because, except in fully balanced designs, the marginal estimates of parameters (the effect on *y* of changing *x* while ignoring other covariates) and the conditional estimates (the effect on *y* of changing *x* given other covariates) are not the same. A similar point has been made by Darlington & Smulders (2001) in the context of analysing behavioural data. Their paper concentrates on the consequences of using residuals as data for hypothesis testing (i.e. rates of Type I and II errors). Here a different perspective is taken. It is argued that the estimation of effects in multiple regression is best viewed as consisting of two components: (i) generating unbiased estimates of the parameters (i.e. slopes and intercept) for the data; and (ii) measuring how much variance is explained by each variable, and how much of this is independent of the other variables. The second component may be approached in several ways, depending on the question at hand, but residual regression yields biased parameter estimates.
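The bias in parameter estimates can be made concrete. For ordinary least squares with intercepts, it follows algebraically from the normal equations that the slope obtained by regressing the residuals of *y* on *x*_{2} against *x*_{1} equals *b*_{1}(1 − *r*^{2}), where *b*_{1} is the coefficient of *x*_{1} when both variables are fitted simultaneously and *r* is the sample correlation between *x*_{1} and *x*_{2} (this notation is introduced here). The sketch below checks this on hypothetical, noise-free data generated as *y* = 1 + 2*x*_{1} + 3*x*_{2}, with *x*_{1} and *x*_{2} correlated (*r*^{2} = 0.64); it is pure Python and all names are illustrative.

```python
# Compare residual regression with multiple regression when x1 and x2 are
# correlated. Pure Python; the toy data below are hypothetical.

def mean(v):
    return sum(v) / len(v)

def centred(v):
    m = mean(v)
    return [vi - m for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def residual_slope(x1, x2, y):
    """Slope from regressing the residuals of y-on-x2 against x1."""
    d1, d2, dy = centred(x1), centred(x2), centred(y)
    b_y2 = dot(d2, dy) / dot(d2, d2)                # step 1: y on x2
    r = [yi - b_y2 * zi for yi, zi in zip(dy, d2)]  # residuals (mean zero)
    return dot(d1, r) / dot(d1, d1)                 # step 2: residuals on x1

def partial_slope(x1, x2, y):
    """Coefficient of x1 when x1 and x2 are fitted simultaneously (OLS)."""
    d1, d2, dy = centred(x1), centred(x2), centred(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

def r_squared(x1, x2):
    """Squared sample correlation between x1 and x2."""
    d1, d2 = centred(x1), centred(x2)
    return dot(d1, d2) ** 2 / (dot(d1, d1) * dot(d2, d2))

# Hypothetical noise-free data: the true slope on x1 is 2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

print(round(partial_slope(x1, x2, y), 6))   # prints 2.0
print(round(residual_slope(x1, x2, y), 6))  # prints 0.72
```

The simultaneous fit recovers the generating slope exactly, whereas the residual regression shrinks it by the factor 1 − *r*^{2} = 0.36; the shrinkage vanishes only when *x*_{1} and *x*_{2} are uncorrelated.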