PHYLOGENETIC META-ANALYSIS



Abstract

Meta-analysis is a powerful statistical technique that combines the results of independent studies to identify general trends. When the species under examination are not independent, however, it is also necessary to incorporate phylogenetic information into the analysis. Unfortunately, current meta-analytic approaches cannot account for the lack of independence that results from shared evolutionary history, and a general solution to this problem is lacking. In this article, I derive a model for phylogenetic meta-analysis, so that data across studies may be summarized with evolutionary history explicitly incorporated. The approach takes advantage of common aspects of the linear statistical models used by both meta-analysis and the phylogenetic comparative method, which allows them to be analytically combined. In this manner, the correlation structure generated by phylogenetic history can be incorporated directly into the meta-analytic procedure. I illustrate the approach by examining the prevalence of body size clines in mammals. The approach is general and can also be used to incorporate correlation structure among studies generated by other factors, such as spatial or temporal proximity, or environmental similarity. Therefore, this procedure provides a general statistical template for meta-analytic techniques that can account for attributes that generate nonindependence among studies. Implications of the phylogenetic meta-analysis are discussed.

Ecologists and evolutionary biologists have long used a comparative approach to evaluate the prevalence of general patterns in large datasets. These datasets compile findings across species, communities, or ecosystems, and contain either the results of a set of individual studies or the results of a single analysis. In recent years, two rather distinct analytical toolkits have been developed to identify broad-scale patterns in such data. One approach, meta-analysis, provides a statistical framework for the quantitative synthesis of results across independent studies. These methods evaluate the overall prevalence of a trend by incorporating both the magnitude of study effects and their uncertainty (Hedges and Olkin 1985). The second approach is the phylogenetic comparative method. Here, the statistical relationships of two or more variables are evaluated explicitly in a phylogenetic framework to account for the lack of independence due to shared evolutionary history (Felsenstein 1985). Because these approaches are considered statistically superior to their alternatives (vote-counting and nonphylogenetic analyses, respectively), their use in ecological and evolutionary studies is steadily increasing.

When data addressing a common hypothesis are available for a set of evolutionarily related species, meta-analysis is clearly the appropriate tool for identifying general patterns. However, the species under examination are not independent, so it is also necessary to incorporate phylogenetic information directly into the analysis. Unfortunately, a weakness of conventional meta-analysis is that lack of independence across studies cannot be taken into account. Thus, although the advantages of combining meta-analysis and the phylogenetic comparative method are obvious, a formal analytical framework has not yet been developed (see discussion in Jennions et al. 2001; Schino 2006). Some researchers have attempted to bridge this gap through phylogenetic simulation. Here, datasets are simulated along a phylogeny, and a meta-analysis is performed on each simulated dataset. Meta-analysis is also performed on the observed data, and these results are compared to those found from the simulated datasets (Ashton and Feldman 2003; Verdú and Traveset 2004, 2005). Although this approach provides some evolutionary context for assessing meta-analytic results, it does not incorporate the phylogenetic information directly in the calculations of the meta-analysis itself. Thus, the meta-analytic results themselves are still phylogenetically naïve, because they are based solely on the species' data without accounting for their phylogenetic relationships.

The purpose of this article is to derive a model for phylogenetic meta-analysis, so that data across studies may be summarized explicitly in light of evolutionary history. The analytical approach derived here is general, and is useful for incorporating correlation structures resulting from shared evolutionary history, as well as from other factors, such as spatial or temporal proximity, or environmental similarity. Thus, this procedure provides a general statistical template for performing meta-analytic techniques that can account for attributes that generate nonindependence among studies.

Developing a Phylogenetic Meta-Analysis

Because analytically combining meta-analysis and the phylogenetic comparative method requires an understanding of the underlying statistical models of both, I briefly describe them here. Meta-analysis provides a statistical framework for combining the results of independent studies (Hedges and Olkin 1985; Rosenberg et al. 2000). In conventional meta-analysis, summary statistical data are obtained for a set of studies, and from these, effect sizes and their variances are calculated. Effect sizes represent the strength of the signal in each study, and using a common effect size measure across studies places their results on a common scale (see Cooper and Hedges 1994; Cooper 1998). Effect sizes are then statistically combined to obtain an overall assessment of the strength of evidence supporting a particular hypothesis. Procedurally, this is accomplished through a weighted statistical analysis, in which effect sizes from individual studies are weighted inversely by their variance (i.e., effects with less variance are considered more reliable). The larger the cumulative effect size relative to its uncertainty, the greater the statistical support for the hypothesis (see Cooper and Hedges 1994).

In meta-analysis, effect sizes are statistically combined using a weighted general linear model

$$E = X\beta \qquad (1)$$

where E is a vector of effect sizes, X is a design matrix of independent variables containing grouping variables and continuous covariates, and β is a vector of partial regression coefficients (Hedges and Olkin 1985; Rosenberg et al. 2000). The weight for each study is defined as the inverse of its variance, $w_i = 1/v_i$, and a diagonal matrix, W, contains the study weights on the diagonal. The weighted coefficients (β) for the meta-analytic model are then estimated as

$$\hat{\beta} = \left(X^{\top} W X\right)^{-1} X^{\top} W E \qquad (2)$$

As with all general linear models, the design matrix (X) describes the type of model structure under investigation. If the design matrix (X) contains only a column of ones, β represents the weighted cumulative effect size, $\bar{E}$. This value describes the overall effect present across a set of studies. The cumulative effect size can be statistically evaluated using its standard error and confidence intervals (Hedges and Olkin 1985), or through bootstrap confidence intervals obtained by resampling the set of effect sizes (Adams et al. 1997; Rosenberg et al. 2000). Alternatively, because $\bar{E}$ is simply a weighted partial regression coefficient, its significance can be evaluated using a Student's t-test derived from its standard error. Finally, the design matrix (X) can contain additional columns representing categorical or continuous variables that describe different meta-analytic structures. Thus, meta-analysis may be used to compare effects across different groups, as in analysis of variance (ANOVA) (e.g., Gurevitch et al. 1992), or can be used as regression to identify the covariation between effect sizes and a continuous covariate.
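To make equations (1) and (2) concrete, here is a minimal sketch in Python with NumPy. The author's analyses were run in R, and Python is used here only for illustration. The sketch assumes a fixed-effect model in which the variances $v_i$ are treated as known, so each standard error is taken from the diagonal of $(X^{\top} W X)^{-1}$. The function name `weighted_glm` and all data values are hypothetical.

```python
import numpy as np

def weighted_glm(E, X, v):
    """Weighted least-squares fit of E = X beta with weights w_i = 1 / v_i."""
    E = np.asarray(E, dtype=float)
    X = np.asarray(X, dtype=float)
    W = np.diag(1.0 / np.asarray(v, dtype=float))   # diagonal weight matrix
    XtWX = X.T @ W @ X
    beta = np.linalg.solve(XtWX, X.T @ W @ E)       # (X'WX)^-1 X'WE, eq. (2)
    se = np.sqrt(np.diag(np.linalg.inv(XtWX)))      # fixed-effect standard errors (assumption)
    return beta, se

# Hypothetical effect sizes and sampling variances for five studies
E = np.array([0.30, 0.10, 0.45, 0.25, 0.05])
v = np.array([0.02, 0.05, 0.04, 0.01, 0.10])

# A column of ones makes beta[0] the weighted cumulative effect size
beta, se = weighted_glm(E, np.ones((len(E), 1)), v)
t = beta[0] / se[0]
print(round(beta[0], 4), round(se[0], 4), round(t, 2))
```

With an intercept-only design, the estimate reduces to the inverse-variance weighted mean, $\sum_i w_i E_i / \sum_i w_i$. Adding columns to X, such as group indicators or a covariate, turns the same function into a meta-analytic ANOVA or regression.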

In contrast to meta-analysis, the phylogenetic comparative method is used to describe the statistical relationships of two or more traits across species while taking into consideration their shared evolutionary history. Several analytical approaches have been proposed, including phylogenetically independent contrasts (Felsenstein 1985), phylogenetic regression (Grafen 1989), phylogenetic generalized least squares (PGLS: Martins and Hansen 1997), as well as other procedures (e.g., Cheverud and Dow 1985). Of these, PGLS is the most general statistically, as both independent contrasts and phylogenetic regression are special cases of this model (see Garland and Ives 2000; Rohlf 2001). With this approach, the nonindependence among taxa due to shared evolutionary history is incorporated directly into the statistical analysis as a correlated error structure.

Statistically, the phylogenetic generalized least squares model is described as

$$Y = X\beta + \varepsilon \qquad (3)$$

where Y is a matrix of dependent variables for the species, X is a design matrix of independent variables, β is a vector of partial regression coefficients, and ɛ is the error (Rohlf 2001). Unlike in standard regression, however, the errors are not independent. Instead, ɛ is modeled with a correlation structure that captures the lack of independence due to shared phylogenetic history. This history is described by the phylogenetic covariance matrix, Σ, which is derived directly from the estimate of the species' phylogeny. The diagonal elements of Σ represent the phylogenetic distances between each species and the root of the tree, and the off-diagonal elements give the lengths of the branches shared between pairs of species (for details see Martins and Hansen 1997; Garland and Ives 2000). Parameter estimates of the PGLS model are found as

$$\hat{\beta} = \left(X^{\top} \Sigma^{-1} X\right)^{-1} X^{\top} \Sigma^{-1} Y \qquad (4)$$

The significance of the model is evaluated using standard techniques, and estimates for all nodes of the tree can be obtained as in least squares regression (Rohlf 2001). Note that if X contains only a column of ones, the phylogenetically weighted overall mean is obtained.
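As a small worked case of equation (4), the sketch below hard-codes Σ for a hypothetical four-species ultrametric tree, written in Newick form as ((A:1,B:1):2,(C:2,D:2):1), and computes the phylogenetic mean. The tree, the trait values, and the function name `pgls` are illustrative assumptions. In practice, Σ would be derived from an estimated phylogeny.

```python
import numpy as np

# Phylogenetic covariance matrix for the hypothetical ultrametric tree
# ((A:1,B:1):2,(C:2,D:2):1). Diagonal: root-to-tip path length (3 for all tips).
# Off-diagonal: branch length shared from the root (A-B: 2, C-D: 1, otherwise 0).
Sigma = np.array([[3.0, 2.0, 0.0, 0.0],
                  [2.0, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0, 1.0],
                  [0.0, 0.0, 1.0, 3.0]])

def pgls(Y, X, Sigma):
    """GLS estimate beta = (X' Sigma^-1 X)^-1 X' Sigma^-1 Y, as in eq. (4)."""
    Sinv = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ Y)

Y = np.array([1.0, 1.2, 3.0, 2.6])   # hypothetical trait values for A-D
X = np.ones((4, 1))                  # intercept only: phylogenetic mean
beta = pgls(Y, X, Sigma)
print(beta[0], Y.mean())
```

Here the phylogenetic mean is $1.84/0.9 \approx 2.044$ rather than the ordinary mean of 1.95. The rows of $\Sigma^{-1}$ sum to 0.2 for A and B but to 0.25 for C and D. As a result, the two closely related species, which share most of their history, receive less weight.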

A useful alternative formulation of PGLS is found by transforming X and Y by the phylogenetic covariance matrix and then using standard linear regression with uncorrelated error terms (Garland and Ives 2000). This alternative formulation yields statistical results identical to those from equation (4) (Garland and Ives 2000; Rohlf 2001). With this method, X and Y are transformed as $Z = DY$ and $U = DX$, where D is a matrix satisfying $D \Sigma D^{\top} = I$, found from a singular-value decomposition of Σ (Garland and Ives 2000, p. 361). The relationship between U and Z is then found from a standard linear regression model

$$Z = U\beta + \varepsilon \qquad (5)$$
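The equivalence between the transformed regression and PGLS is easy to check numerically. The sketch below uses NumPy and hypothetical data. It takes D as the symmetric inverse square root of Σ built from its singular-value decomposition, which is one valid choice. Any D with $D \Sigma D^{\top} = I$, such as the inverse of a Cholesky factor of Σ, gives the same regression coefficients.

```python
import numpy as np

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1) expressed as a covariance matrix
Sigma = np.array([[3.0, 2.0, 0.0, 0.0],
                  [2.0, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0, 1.0],
                  [0.0, 0.0, 1.0, 3.0]])
Y = np.array([1.0, 1.2, 3.0, 2.6])                        # hypothetical responses
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])   # intercept + hypothetical covariate

# SVD of the symmetric, positive-definite Sigma: Sigma = Us diag(s) Us'.
# D = Us diag(s^-1/2) Us' satisfies D Sigma D' = I.
Us, s, _ = np.linalg.svd(Sigma)
D = Us @ np.diag(1.0 / np.sqrt(s)) @ Us.T

Z, U = D @ Y, D @ X                                   # transformed variables
beta_ols, *_ = np.linalg.lstsq(U, Z, rcond=None)      # ordinary regression, eq. (5)

Sinv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ Y)  # PGLS, eq. (4)
print(np.allclose(beta_ols, beta_gls))
```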

Both meta-analysis and the phylogenetic comparative method evaluate the relationship between dependent and independent variables using least-squares approaches; meta-analysis uses general linear models, whereas PGLS uses generalized least squares models. Importantly, PGLS can be evaluated using a general linear model if the dependent and independent variables are transformed by the phylogenetic covariance matrix (Garland and Ives 2000). Therefore, because both approaches can be represented by general linear models, the standard linear representation of meta-analysis (eq. 2) and the linear form of PGLS (eq. 5) can be analytically combined to derive a common model for phylogenetic meta-analysis.
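The step this argument relies on can be verified in a few lines of algebra. Assume that D satisfies $D^{\top} D = \Sigma^{-1}$, which for an invertible D is equivalent to $D \Sigma D^{\top} = I$. Substituting $Z = DY$ and $U = DX$ into the ordinary least-squares estimator then recovers equation (4):

```latex
\begin{aligned}
\hat{\beta}_{OLS} &= \left(U^{\top}U\right)^{-1}U^{\top}Z
  = \left(X^{\top}D^{\top}DX\right)^{-1}X^{\top}D^{\top}DY \\
  &= \left(X^{\top}\Sigma^{-1}X\right)^{-1}X^{\top}\Sigma^{-1}Y = \hat{\beta}_{PGLS}.
\end{aligned}
```

Inserting the diagonal weight matrix W into the same expression, with effect sizes E in place of Y, gives $\left(X^{\top}D^{\top}WDX\right)^{-1}X^{\top}D^{\top}WDE$. This reduces to equation (2) when $\Sigma = I$ and D is taken as the identity, and to equation (4) when all studies have equal variance.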

To accomplish this, the vector of effect sizes (E) and the meta-analytic design matrix (X) are transformed by D, as found from the phylogenetic covariance matrix Σ. This yields the transformed effect sizes $E_{new} = DE$ and the transformed independent variables $X_{new} = DX$. Parameters of the phylogenetic meta-analysis are then found by solving the weighted general linear model

$$\hat{\beta} = \left(X_{new}^{\top} W X_{new}\right)^{-1} X_{new}^{\top} W E_{new} \qquad (6)$$

where W is the diagonal weight matrix from conventional meta-analysis. As in conventional meta-analysis, X can contain independent variables that represent groups (ANOVA), continuous variables (regression), or some combination of the two. This matrix is transformed as $X_{new}$ to take shared evolutionary history into account. If the overall cumulative effect size is desired, X contains only a column of ones. The significance of the parameters of this model is assessed exactly as in conventional meta-analysis: using confidence intervals and t-tests from standard errors, or using resampling procedures (see above).
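Equation (6) can be sketched in NumPy as follows. This is a hypothetical implementation, not the author's R code from the Supplementary Material. It assumes that D is the symmetric inverse square root of Σ obtained from its SVD, and that standard errors are taken from $(X_{new}^{\top} W X_{new})^{-1}$ as in a fixed-effect meta-analysis. The function name `phylo_meta`, the tree, and all data are hypothetical.

```python
import numpy as np

def phylo_meta(E, v, X, Sigma):
    """Sketch of eq. (6): weighted regression on phylogenetically transformed data.

    E: effect sizes; v: sampling variances; X: design matrix;
    Sigma: phylogenetic covariance matrix with rows/columns in the same species order.
    """
    E, v, X = (np.asarray(a, dtype=float) for a in (E, v, X))
    Us, s, _ = np.linalg.svd(Sigma)
    D = Us @ np.diag(1.0 / np.sqrt(s)) @ Us.T     # D Sigma D' = I
    E_new, X_new = D @ E, D @ X                   # transformed effect sizes and design
    W = np.diag(1.0 / v)                          # conventional meta-analytic weights
    A = X_new.T @ W @ X_new
    beta = np.linalg.solve(A, X_new.T @ W @ E_new)
    se = np.sqrt(np.diag(np.linalg.inv(A)))       # assumption: fixed-effect standard errors
    return beta, se

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1) and hypothetical study results
Sigma = np.array([[3.0, 2.0, 0.0, 0.0],
                  [2.0, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0, 1.0],
                  [0.0, 0.0, 1.0, 3.0]])
E = np.array([0.40, 0.35, 0.05, 0.10])
v = np.array([0.05, 0.04, 0.08, 0.06])
beta, se = phylo_meta(E, v, np.ones((4, 1)), Sigma)
t = beta[0] / se[0]
```

Two limiting cases make useful checks. With Σ equal to the identity (a star phylogeny with unit-length tips), the result reduces to the conventional weighted mean. With equal variances, it reduces to the PGLS mean.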

An Example

As an illustration of phylogenetic meta-analysis, I examined latitudinal patterns of body size in mammals. Many organisms tend to be larger in cooler climates, a pattern described as Bergmann's rule (Bergmann 1847; Mayr 1956). For endothermic vertebrates, these patterns are typically explained as an evolutionary response to minimize heat loss in colder climates (Bergmann 1847; Walters and Hassall 2006; Olalla-Tárraga and Rodríguez 2007). Several studies have documented Bergmann's clines in a variety of mammals (Ashton et al. 2000; Meiri and Dayan 2003; Blackburn and Hawkins 2004; but see Meiri et al. 2004), and because the majority of species studied exhibit this pattern, it is generally assumed that Bergmann's clines are common in mammals. However, none of these studies have explicitly taken the evolutionary relationships of the species into account when determining the overall trend across species (i.e., the cumulative effect size).

I used the data of Ashton et al. (2000), which contained intraspecific correlations of body size and latitude across multiple geographic localities for various mammalian species. From these data, body size clines for 40 species were appropriate for meta-analysis, as they were derived from four or more geographic localities (Fig. 1). Following conventional meta-analytic procedures, correlation coefficients were transformed to Fisher's $Z_r$, with variance estimated as $v_Z = 1/(N - 3)$, where N is the number of localities (Hedges and Olkin 1985; Rosenthal 1991). To estimate the evolutionary relationships of these taxa, I used a recently published phylogenetic supertree for mammals (Bininda-Emonds et al. 2007). This phylogeny, based on genetic data, contained nearly all extant mammal species (4510 species) and included time-calibrated branch lengths. From this phylogeny, I extracted the topology and branch lengths for the 40 species selected for meta-analysis by pruning all taxa that were not present in the meta-analytic dataset (Fig. 2).
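The effect-size calculation described above is short enough to sketch directly. The function below, given the hypothetical name `fisher_z` and written with NumPy, applies $Z_r = \operatorname{arctanh}(r)$ and $v_Z = 1/(N - 3)$. The correlations and locality counts in the usage line are invented for illustration and are not the Ashton et al. (2000) values.

```python
import numpy as np

def fisher_z(r, n):
    """Fisher's Z_r transform of correlations r, with sampling variance 1 / (n - 3).

    r: correlation of body size with latitude for each species (hypothetical here);
    n: number of geographic localities behind each correlation.
    """
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    if np.any(n <= 3):
        raise ValueError("each correlation needs at least 4 localities")
    z = np.arctanh(r)          # equals 0.5 * log((1 + r) / (1 - r))
    return z, 1.0 / (n - 3.0)

z, v = fisher_z([0.5, -0.2, 0.8], [13, 4, 23])
```

The variance shrinks as localities are added. As a result, a cline estimated from 23 localities receives 20 times the weight of one estimated from 4.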

Figure 1.

The correlation of body size and latitude plotted against the number of geographic localities (sample size) for 40 species of mammal (data from Ashton et al. 2000). The cumulative effect size is shown as a dashed line.

Figure 2.

Phylogeny for the 40 mammalian species used in this study, obtained from the larger mammalian supertree of Bininda-Emonds et al. (2007).

Both conventional meta-analysis and phylogenetic meta-analysis were performed to assess the prevalence of Bergmann's clines in mammals. For conventional meta-analysis, effect sizes were combined to obtain a cumulative effect size (eq. 2), which was evaluated using confidence intervals and a Student's t-test based on the standard error. For the phylogenetic meta-analysis, the phylogenetic covariance among taxa was taken into account (eq. 6), and the phylogenetic cumulative effect size was determined. The significance of the phylogenetic cumulative effect size was evaluated using a Student's t-test based on the standard error, as well as through randomization (see Adams et al. 1997; Collyer and Adams 2007). Here, species' effect sizes and their associated weights were randomly assigned to tips of the phylogeny, and the phylogenetic cumulative effect size was recalculated. This procedure was repeated for 9999 iterations, and the distribution of randomly generated cumulative effect sizes was used to evaluate the significance of the observed cumulative effect size. All analyses were performed in R (R Development Core Team 2006). (Computer code for performing a phylogenetic meta-analysis in R, along with example data, is found in the online Supplementary Material.)
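The randomization procedure can be sketched as follows, again in NumPy rather than the R code of the Supplementary Material. Following the text, each species' effect size is shuffled together with its weight among the tips while the tree stays fixed, for 9999 iterations. Several details are assumptions, not taken from the text: the upper-tail P-value convention (with the observed value counted as part of the null distribution), the random seed, the function names, and all data.

```python
import numpy as np

def inv_sqrt(Sigma):
    """Symmetric D with D Sigma D' = I, from the SVD of Sigma."""
    Us, s, _ = np.linalg.svd(Sigma)
    return Us @ np.diag(1.0 / np.sqrt(s)) @ Us.T

def phylo_cumulative(E, v, D):
    """Intercept-only eq. (6): weighted regression through the origin of D E on D 1."""
    x_new, e_new, w = D @ np.ones(len(E)), D @ E, 1.0 / v
    return np.sum(w * x_new * e_new) / np.sum(w * x_new ** 2)

def randomization_test(E, v, Sigma, iterations=9999, seed=0):
    """Shuffle species (effect size together with its variance) among tips."""
    E, v = np.asarray(E, dtype=float), np.asarray(v, dtype=float)
    D = inv_sqrt(Sigma)                      # the tree is fixed, so D is computed once
    rng = np.random.default_rng(seed)
    observed = phylo_cumulative(E, v, D)
    null = np.empty(iterations)
    for i in range(iterations):
        idx = rng.permutation(len(E))
        null[i] = phylo_cumulative(E[idx], v[idx], D)
    # Assumed tail convention: upper tail, observed value counted in the null set
    p = (np.sum(null >= observed) + 1) / (iterations + 1)
    return observed, null, p

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1) and hypothetical study results
Sigma = np.array([[3.0, 2.0, 0.0, 0.0],
                  [2.0, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0, 1.0],
                  [0.0, 0.0, 1.0, 3.0]])
E = np.array([0.40, 0.35, 0.05, 0.10])
v = np.array([0.05, 0.04, 0.08, 0.06])
obs, null, p = randomization_test(E, v, Sigma)
```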

Using conventional meta-analysis, I found a significant cumulative effect size ($\bar{E}$; CI = 0.2272 – 0.3494, t = 9.55, P < 0.0001), implying general support for the prevalence of Bergmann's clines in mammals (Fig. 1). This corroborates previous results showing that a majority of species exhibit clines in the direction predicted by Bergmann's rule (i.e., vote-counting: Ashton et al. 2000). Using phylogenetic meta-analysis, however, I found that this relationship was no longer significant ($\bar{E}$; t = 1.415, P = 0.165, $P_{rand}$ = 0.1571). Thus, for these data, once shared evolutionary history was taken into account, there was no significant trend between body size and latitude. One interpretation of this finding is that the association between body size and latitude can be explained simply by evolutionary history. If this interpretation is correct, it is not clear whether latitudinal body size clines in mammals are the result of heat conservation mechanisms (as Bergmann's rule would imply) or of shared evolutionary history. Thus, as with other comparative analyses, meta-analysis with and without phylogenetic information can lead to different biological conclusions about the processes responsible for observed patterns (see Felsenstein 1985).

Discussion

It is common in evolutionary biology to examine the relationships among variables sampled across related species to explore general comparative trends. However, such statistical analyses must be performed in a phylogenetic framework to account for the lack of independence due to shared evolutionary history. Sometimes, the overall prevalence of a particular trend is examined by synthesizing findings across a set of independent studies. Here, meta-analysis is clearly the appropriate statistical tool, but when the species under examination are not independent, it is also necessary to incorporate phylogenetic information directly into the analysis. In this article, I derived a model for phylogenetic meta-analysis, so that data across studies may be summarized explicitly in light of evolutionary history. I showed how both meta-analysis and phylogenetic generalized least squares can be represented as general linear models, so that the two can be combined into a common framework for meta-analysis in a phylogenetic context. I illustrated phylogenetic meta-analysis by examining the prevalence of body size clines in mammals. When shared evolutionary history was taken into consideration, there was no statistical support for a general trend of larger body sizes at higher latitudes. For these data, therefore, the relationship between body size and latitude may be explained by shared evolutionary history rather than by some other causal mechanism.

Meta-analysis is an increasingly useful tool for identifying common trends across evolutionary studies, because it combines the results of disparate studies in an objective and quantitative manner. Because meta-analysis takes into consideration the magnitude of the effect present in each study, it provides a considerably more comprehensive synthesis of prior research than simply counting how many studies found results in favor of, or opposed to, a particular hypothesis (i.e., vote-counting: see discussion in Hedges and Olkin 1985). Meta-analysis therefore provides an important tool for assessing major questions in ecology and evolutionary biology. Despite its many advantages, conventional meta-analysis has a weakness: lack of independence across studies cannot be taken into account. This weakness is particularly acute for evolutionary meta-analyses, because the shared evolutionary history of species generates nonindependence (Jennions et al. 2001; Schino 2006). Thus, meta-analyses summarizing results across related species risk overestimating the significance of the cumulative effect size. The phylogenetic meta-analysis proposed here addresses this weakness directly by incorporating the phylogenetic covariance among species into the analysis.

Finally, the phylogenetic meta-analysis approach provides a statistical template for a general meta-analysis that can account for other attributes that generate nonindependence. The appeal of generalized least squares is that correlated error structures can be incorporated directly into the model. In phylogenetic meta-analysis, the correlated error structure is modeled by the phylogenetic covariance matrix. However, the approach is not limited to phylogenies. If some other factor resulted in nonindependence among studies, the correlation structure of that factor could similarly be estimated and taken into account using the same procedure. One obvious factor relevant to many ecological meta-analyses is geographic proximity. Biological variables are frequently spatially autocorrelated, so studies conducted in close geographic proximity are not statistically independent. By incorporating spatial nonindependence into the approach described here, one could conduct a “spatial meta-analysis.” Here, the geographic localities of the studies are used to generate a geographic covariance matrix, and this matrix is used to transform the effect sizes (E) and the matrix of independent variables (X). The weighted meta-analysis would then estimate the cumulative effect size in the set of studies while taking geographic proximity into consideration. Similarly, other correlated error structures that account for nonindependence could be envisioned for additional applications (e.g., temporal or environmental nonindependence). One could even account for multiple correlated error structures by first transforming E and X by the correlation structure of one factor (e.g., phylogeny), and then transforming the new variables by the correlation structure of a second factor (e.g., geography).
Thus, using the phylogenetic meta-analysis approach as a statistical template, one can account in meta-analysis not only for shared evolutionary history but for any covariance structure that generates nonindependence across studies.
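The spatial extension can be sketched along the same lines. The text does not specify how a geographic covariance matrix would be built. The sketch therefore assumes an exponential decay of covariance with Euclidean distance between projected study coordinates, using a hypothetical `scale` parameter. The two-factor case follows the sequential transformation described above: phylogeny first, then geography. All function names and data are hypothetical.

```python
import numpy as np

def inv_sqrt(C):
    """Symmetric D with D C D' = I, from the SVD of a covariance matrix C."""
    Us, s, _ = np.linalg.svd(C)
    return Us @ np.diag(1.0 / np.sqrt(s)) @ Us.T

def spatial_covariance(coords, scale):
    """Hypothetical exponential-decay covariance exp(-distance / scale).

    coords: (n, 2) array of projected x, y study locations (e.g., in km).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    return np.exp(-dist / scale)

def cumulative_effect(E, v, D):
    """Intercept-only weighted regression of D E on D 1 (cf. eq. 6)."""
    x_new, e_new, w = D @ np.ones(len(E)), D @ E, 1.0 / v
    return np.sum(w * x_new * e_new) / np.sum(w * x_new ** 2)

# Hypothetical studies: locations, effect sizes, variances, phylogenetic covariance
coords = np.array([[0.0, 0.0], [10.0, 5.0], [200.0, 50.0], [210.0, 60.0]])
E = np.array([0.40, 0.35, 0.05, 0.10])
v = np.array([0.05, 0.04, 0.08, 0.06])
Sigma_phy = np.array([[3.0, 2.0, 0.0, 0.0],
                      [2.0, 3.0, 0.0, 0.0],
                      [0.0, 0.0, 3.0, 1.0],
                      [0.0, 0.0, 1.0, 3.0]])

D_geo = inv_sqrt(spatial_covariance(coords, scale=50.0))
spatial_only = cumulative_effect(E, v, D_geo)
# Two factors, transformed in sequence: phylogeny first, then geography
both = cumulative_effect(E, v, D_geo @ inv_sqrt(Sigma_phy))
```

When studies are far apart relative to `scale`, the geographic covariance matrix approaches the identity, and the spatial estimate approaches the conventional weighted mean.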

Associate Editor: K. Crandall

ACKNOWLEDGMENTS

I thank J. Church, M. Collyer, and N. Valenzuela for comments on the manuscript. This work was sponsored in part by NSF grant DEB-0446758.
