Phylogenetic signal and linear regression on species data


1. A common procedure in the regression analysis of interspecies data is to first test the independent and dependent variables X and Y for phylogenetic signal, and then use the presence of signal in one or both traits to justify regression analysis using phylogenetic methods such as independent contrasts or phylogenetic generalized least squares.

2. This justification is incorrect, because phylogenetic regression assumes that the residual error in the regression model (not the original traits) follows a multivariate normal distribution with variances and covariances proportional to the historical relationships of the species in the sample.

3. Here, I examine the consequences of justifying and applying the phylogenetic regression incorrectly. I find that, when used improperly, the phylogenetic regression can have poor statistical performance, even under some circumstances in which the type I error rate of the method is not inflated above its nominal level.

4. I also find, however, that when phylogenetic signal is tested properly in phylogenetic regression, and in particular when the phylogenetic signal in the residual error is estimated simultaneously with the regression parameters, the phylogenetic regression outperforms equivalent non-phylogenetic procedures.
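
The idea of estimating phylogenetic signal in the residual error together with the regression parameters can be illustrated with a minimal Python sketch. It is not the procedure used in this study: it assumes Pagel's lambda as the measure of residual signal, a simple grid search in place of a numerical optimizer, a hypothetical balanced eight-species tree, and simulated toy data. All function names (lambda_transform, gls_fit, pgls_lambda) are invented for illustration. The model is y = Xb + e, with e multivariate normal and covariance proportional to the lambda-transformed phylogenetic covariance matrix C.

```python
import numpy as np

def lambda_transform(C, lam):
    """Scale the off-diagonal covariances of C by lam; keep the diagonal."""
    C_lam = lam * C
    np.fill_diagonal(C_lam, np.diag(C))
    return C_lam

def gls_fit(X, y, V):
    """GLS coefficients and maximized log-likelihood for e ~ N(0, s2 * V)."""
    n = len(y)
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    resid = y - X @ beta
    s2 = resid @ np.linalg.solve(V, resid) / n  # ML estimate of the variance scale
    _, logdet = np.linalg.slogdet(V)
    loglik = -0.5 * (n * np.log(2 * np.pi * s2) + logdet + n)
    return beta, loglik

def pgls_lambda(X, y, C, grid=np.linspace(0.0, 1.0, 101)):
    """Jointly choose lambda and the regression coefficients by likelihood."""
    fits = [(lam,) + gls_fit(X, y, lambda_transform(C, lam)) for lam in grid]
    return max(fits, key=lambda f: f[2])  # (lambda_hat, beta_hat, loglik)

# Hypothetical balanced 8-species tree of height 1: sister species share
# 0.7 of their history, species in the same quartet share 0.3.
pair = np.full((2, 2), 0.7)
quart = np.full((4, 4), 0.3)
quart[:2, :2] = pair
quart[2:, 2:] = pair
C = np.zeros((8, 8))
C[:4, :4] = quart
C[4:, 4:] = quart
np.fill_diagonal(C, 1.0)

# Simulated toy data: x carries no phylogenetic signal, the residual does.
rng = np.random.default_rng(1)
x = rng.normal(size=8)
e = np.linalg.cholesky(C) @ rng.normal(size=8)
y = 1.0 + 0.5 * x + e
X = np.column_stack([np.ones(8), x])

lam_hat, beta_hat, loglik = pgls_lambda(X, y, C)
print("lambda:", lam_hat, "coefficients:", beta_hat)
```

In this sketch, lambda = 0 reduces to ordinary least squares (because the toy tree is ultrametric) and lambda = 1 corresponds to a phylogenetic regression under Brownian motion, so the fitted lambda lets the data choose between these extremes. With only eight species the estimate is very imprecise; the example shows the mechanics only.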