The phylogenetic comparative method has become a widely used tool for addressing questions about long-term evolutionary processes by analyzing datasets collected across multiple species while incorporating information about the varying degrees of relatedness among them (Felsenstein 1985; Harvey and Pagel 1991; Freckleton et al. 2002). Such comparative analyses often include numerous variables, which may be directly or indirectly related to the trait of interest, yielding a complex, multivariate network of associations in which the variables may differ in effect size. Evolutionary biologists employing the comparative method have come to accept, with some resignation, that one inevitable consequence of using such methods is that they must banish the idea of causality altogether (although one particular method does allow contingency to be determined, see Pagel and Meade 2006). The results are generally interpreted as allowing researchers, at best, to identify a subset of variables that evolve in a correlated fashion or to establish that the trait of interest differs between two groups of species. Indeed, the fully randomized experiment is the ideal means by which to test hypotheses and explore causal relationships among variables (Fisher 1926). However, many evolutionary questions regarding causality are simply impossible to address using fully randomized experiments, and alternative methods have to be adopted (Felsenstein 1985; Harvey and Pagel 1991; Martins 2000; Freckleton et al. 2002).

One of these methods, confirmatory path analysis, has been specifically developed to test prespecified causal hypotheses represented as directed acyclic graphs (DAGs) and thus as a set of structural equations (Shipley 2000b). In essence, path analysis posits that correlational relationships between characters imply an unresolved causal structure, because the causal processes generating the observed data impose constraints on the patterns of correlation that such data display (Shipley 2000a). Standard path analysis methods, such as those implemented in structural equation models (SEM), therefore compare the observed covariance matrix with the covariance matrix predicted by the tested causal model. Alternatively, the d-sep test, developed by Shipley (2000b), tests the conditional probabilistic independencies implied by the DAG of the hypothesized causal model. As has been well discussed in the literature, however, data points in multispecies analyses cannot be considered statistically independent, because the differing degrees of shared ancestry among species will influence the expected similarity in trait values (Felsenstein 1985; Harvey and Pagel 1991; Garland et al. 1992; Freckleton et al. 2002). The consequences of not accounting for phylogenetic effects in statistical analyses of multispecies data include an artificially inflated number of degrees of freedom, incorrectly estimated variances, and increased Type I error rates of significance tests (Felsenstein 1985; Harvey and Pagel 1991; Martins and Garland 1991; Martins et al. 2002; Rohlf 2006). All these problems are compounded in path analysis because of the requirement of testing multiple structural equations (in the case of SEM) or all the conditional probabilistic independencies that must be true for the causal model to be correct (in the case of the d-sep test).
Path analysis models addressing evolutionary questions using multispecies data, but which ignore the underlying phylogenetic relationships among species, may therefore fail to detect the “true” causal structure between the variables. Attempts to use path analysis on multispecies datasets have been previously reported in the literature. However, most of these analyses failed to account explicitly for phylogeny (Sol et al. 2005, 2010) or did not specify the method used to account for phylogeny. Recently, Santos and Cannatella (2011) used phylogenetic independent contrasts (Felsenstein 1985; Garland et al. 1992) as the data entered into SEM. This approach allowed the authors to undertake the path analysis while accounting for the statistical nonindependence of the data arising from phylogenetic relatedness. However, independent contrasts assume that the data being analyzed evolve following a strict Brownian motion model of evolution, and performance can be compromised if this assumption is not met (Revell 2010); furthermore, independent contrasts also assume strictly linear relationships between trait values (Quader et al. 2004). Here, we propose an alternative approach combining path analysis with phylogenetic generalized least squares (PGLS) methods (Martins and Hansen 1997). The advantage of PGLS is that it can incorporate distinct models of trait evolution, can combine continuous and categorical variables in a single model without the need to code dummy variables, and provides the value of the *y*-intercept (Martins and Hansen 1997). Further, a key advantage of using PGLS is that it allows path analyses to be undertaken using taxon-specific trait values rather than contrasts, facilitating interpretation of the results. Finally, in PGLS an evolutionary parameter is estimated simultaneously with model fit.
The role of this parameter is to determine the amount of phylogenetic signal in the data (in the residuals of the model, to be precise) and hence the necessary correction for the expected covariance in trait values resulting from phylogenetic relatedness, given the evolutionary model (Freckleton et al. 2002; Revell 2010). This is an important advantage because in some instances data may present a phylogenetic structure that is intermediate between that predicted by the evolutionary model and the complete absence of phylogenetic correlation (Freckleton et al. 2002; Revell 2010). Under such circumstances, PGLS models have been shown to outperform independent contrasts (Revell 2010).
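To make the idea of estimating an evolutionary parameter together with the model fit more concrete, the following is a minimal sketch in Python with NumPy. It rests on assumptions that are not stated in the text: the parameter is taken to behave like a multiplier λ on the off-diagonal elements of the expected phylogenetic covariance matrix `V` (for example, shared branch lengths under Brownian motion), and λ is chosen by a simple grid search over the profile log-likelihood. The function names `lambda_transform`, `gls_fit`, and `pgls_lambda` are hypothetical and do not refer to any published software.

```python
import numpy as np

def lambda_transform(V, lam):
    """Scale off-diagonal covariances by lam, leaving the variances unchanged.

    lam = 1 keeps V as given; lam = 0 yields a diagonal matrix, i.e. no
    phylogenetic correlation among species.
    """
    V = np.asarray(V, dtype=float)
    V_lam = lam * V                      # new array; the input is not modified
    np.fill_diagonal(V_lam, np.diag(V))  # restore the original variances
    return V_lam

def gls_fit(X, y, V):
    """Generalized least squares coefficients and profile log-likelihood."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    beta = np.linalg.solve(XtVi @ X, XtVi @ y)
    resid = y - X @ beta
    sigma2 = (resid @ V_inv @ resid) / n  # maximum-likelihood variance scale
    _, logdet = np.linalg.slogdet(V)
    loglik = -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)
    return beta, loglik

def pgls_lambda(X, y, V, grid=None):
    """Grid search for the lam in [0, 1] that maximizes the log-likelihood.

    Returns (lam, beta, loglik) for the best grid value. A grid search is a
    simplification chosen for clarity, not an optimized estimator.
    """
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    best = None
    for lam in grid:
        beta, loglik = gls_fit(X, y, lambda_transform(V, lam))
        if best is None or loglik > best[2]:
            best = (float(lam), beta, loglik)
    return best
```

Here `X` is a design matrix that includes a column of ones for the intercept, and `y` holds the species-level trait values. An estimate near 0 would indicate residuals with little phylogenetic structure, whereas an estimate near 1 would indicate residuals that covary as expected under the assumed evolutionary model.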

Our proposed method for phylogenetic confirmatory path analysis (hereafter called PPA) integrates PGLS with the d-sep test developed by Shipley (2000b). This method exploits the concept of d-separation (Pearl 1988; Verma and Pearl 1988) to predict the minimal set of conditional probabilistic independencies that must all be true if the causal model is correct. The predicted independencies can then be tested using various statistical tests, according to the nature of the data at hand, and the probabilities of these tests can be combined using Fisher's *C* test (Shipley 2000a), which reflects the deviation of the data from the correlational structure predicted if the causal model is correct. The d-sep test is a very general test that can be used with small sample sizes (because the inferential tests are not asymptotic), nonnormally distributed data (although the phylogenetic comparative methods we will use here do assume a normal distribution of the phylogenetically transformed residuals), and nonlinear functional relationships. The only disadvantage of d-sep tests is that they cannot be used with causal models that include latent (i.e., unmeasured) variables (Shipley 2000a,b). Shipley (2009) showed how confirmatory path analysis by d-sep tests can be generalized to deal with data having an underlying hierarchical or multilevel structure. Here, we generalize the method further to deal with multispecies data, which are not independent because of phylogenetic relationships among species. We use simulations to explore the consequences of ignoring phylogeny when undertaking confirmatory path analysis by d-sep tests. Finally, we revisit a previously published analysis of the evolutionary correlates of aggressive sibling strife in birds (Gonzalez-Voyer et al. 2007) as an empirical example of the implementation of the method.
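The two steps of the d-sep test described above, deriving the independence claims from the DAG and combining their probabilities, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the DAG is encoded as a dictionary mapping each variable to the set of its direct causes, each pair of nonadjacent variables is conditioned on the union of their direct causes, and the combined statistic is computed as C = -2 Σ ln(p_i), compared with a chi-square distribution with 2k degrees of freedom for k claims. The function names `basis_set` and `fisher_c` and the three-variable example are hypothetical; in PPA the individual probabilities would come from PGLS models rather than being supplied by hand.

```python
import math
from itertools import combinations

def basis_set(parents):
    """List the independence claims implied by a DAG.

    parents maps each variable to the set of its direct causes. Each claim
    is (a, b, conditioning_variables) for a pair of nonadjacent variables.
    """
    claims = []
    for a, b in combinations(sorted(parents), 2):
        if a in parents[b] or b in parents[a]:
            continue  # adjacent variables imply no independence claim
        conditioning = tuple(sorted(parents[a] | parents[b]))
        claims.append((a, b, conditioning))
    return claims

def fisher_c(p_values):
    """Combine k probabilities into Fisher's C; return (C, df, p).

    The chi-square survival function has a closed form when the degrees of
    freedom are even (df = 2k), so no external library is needed.
    """
    if not p_values:
        raise ValueError("at least one probability is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("probabilities must lie in (0, 1]")
    k = len(p_values)
    c_stat = -2.0 * sum(math.log(p) for p in p_values)
    half = c_stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # accumulates (C/2)^i / i!
        total += term
    return c_stat, 2 * k, math.exp(-half) * total

# Hypothetical causal chain A -> B -> C.
chain = {"A": set(), "B": {"A"}, "C": {"B"}}
print(basis_set(chain))  # one claim: A independent of C given B
```

A small combined probability indicates that the data deviate from the correlational structure the causal model predicts, so the model is rejected; a large one means the data are consistent with it. With a single claim, as in the chain above, the combined probability equals the probability of that one test.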