#### Study selection criteria and phenotypic selection on phenology

We selected studies that reported the relationship between fitness and flowering onset, flowering synchrony or both. Flowering time was usually reported either as the calendar date or as the relative time (e.g. mean date of flowering or flowering rank) at which an individual starts producing flowers. Virtually all studies we reviewed reported flowering synchrony among individuals as Augspurger’s (1983) index; only two studies used different indices, and these were on the same scale as Augspurger’s index. In most studies, female fitness was reported as fruit or seed production; pollen deposition (two cases) and finite growth rate (one case) were rarely reported. Some studies (five cases) reported male fitness along with female fitness; in these cases, male and female fitness were averaged or, when available, a composite fitness estimate (i.e. a fitness measure that considers both male and female fitness) was preferred. Although lifetime fitness was evaluated in just eight species, fitness was evaluated over at least two reproductive episodes for nearly half (43%) of the species included in the analysis. Most studies included in our analyses were non-manipulative; for manipulative studies, data from control groups were used to compute effect sizes. We did not include any study that artificially manipulated flowering time phenotypes. We did not address methodological variation (e.g. study type, type of fitness measure or synchrony index used) because these features were largely homogeneous in our data set and because our study focuses only on ecologically relevant moderator variables. Traditionally, selection on a given trait has been described by regression (Lande & Arnold 1983) or correlation (e.g. Primack 1980; Ollerton & Lack 1998) coefficients of the relationship between a character and fitness. 
Therefore, we considered Pearson’s *r* the most straightforward effect size metric for assessing selection on flowering time and flowering synchrony. When selection gradients or selection differentials (*sensu* Lande & Arnold 1983) were reported, they were transformed into *r* provided that a measure of dispersion and the sample size, or the phenotypic variance-covariance matrix, were available. Values of Pearson’s *r* were also obtained from a variety of summary statistics (*F*, *Z*, *t*, χ^{2}) or from one-tailed *P*-values when sample size was known (Rosenthal 1991). In studies using nonparametric correlations (Kendall or Spearman), we calculated *r* from *P*-values. In two cases (Ollerton & Lack 1998; Salinas-Peba & Parra-Tabla 2007) we went back to the raw data and calculated *r* directly. When the necessary information was available only in published plots, we extracted it with DataThief II (http://www.datathief.org), software that reverse-engineers numerical data from plotted figures. When the information needed to obtain *r* was presented in more than two forms in a study, we used the form requiring the simplest algorithm to calculate *r*. Unlike selection gradients, selection differentials and simple regression and correlation coefficients do not explicitly distinguish direct from indirect selection; however, all reviews on the topic agree that, in most cases, total selection and direct selection match in direction (Endler 1986; Hoekstra *et al.* 2001; Kingsolver *et al.* 2001; Harder & Johnson 2009). Therefore, all these measures of selection should give qualitatively similar information. Studies that did not provide the full information needed to calculate *r* (in any form, numeric or graphical) were excluded from the analyses.
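The conversions to *r* described above can be sketched as follows. This is a minimal Python illustration of standard formulas (Rosenthal 1991 for test statistics; the definition of a correlation for selection differentials), not the authors’ code, and all function names are hypothetical. Conversions from *F*, χ^{2} and *P* recover only the magnitude of *r*, so its sign must be taken from the reported direction of the relationship; the gradient conversion assumes a single-trait (univariate) regression of fitness on the trait.

```python
import math
from statistics import NormalDist


def r_from_t(t: float, df: int) -> float:
    """r = sqrt(t^2 / (t^2 + df)), keeping the sign of t."""
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


def r_from_f(f: float, df_error: int) -> float:
    """|r| from an F with 1 numerator df (F = t^2): sqrt(F / (F + df_error))."""
    return math.sqrt(f / (f + df_error))


def r_from_chi2(chi2: float, n: int) -> float:
    """|r| (phi coefficient) from a 1-df chi-square: sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)


def r_from_z(z: float, n: int) -> float:
    """r = Z / sqrt(N) for a standard normal deviate Z."""
    return z / math.sqrt(n)


def r_from_one_tailed_p(p: float, n: int) -> float:
    """|r| from a one-tailed P: convert P to Z, then apply r = Z / sqrt(N)."""
    z = NormalDist().inv_cdf(1.0 - p)
    return r_from_z(z, n)


def r_from_selection_differential(s: float, sd_trait: float, sd_fitness: float) -> float:
    """The differential s is cov(fitness, trait), so r = s / (sd_trait * sd_fitness)."""
    return s / (sd_trait * sd_fitness)


def r_from_univariate_gradient(beta: float, sd_trait: float, sd_fitness: float) -> float:
    """Slope of fitness regressed on a single trait: r = beta * sd_trait / sd_fitness."""
    return beta * sd_trait / sd_fitness
```

For example, `r_from_t(2.0, 12)` returns 0.5. The same one-tailed *P* route applies to the nonparametric (Kendall or Spearman) cases.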

Pearson’s *r* was obtained or estimated for every data subset reported in the original study (e.g. each of several populations), and each value was transformed to *Zr* with Fisher’s *z* transformation (Hedges & Olkin 1985). Because we performed the meta-analysis at the species level, a single mean effect size was calculated for each species by averaging its *Zr* values (Rosenthal 1991). As in previous species-level meta-analyses, when a study provided data for more than one species, each species was treated as an independent effect size (e.g. Aguilar *et al.* 2006; Munguía-Rosas *et al.* 2009; Morales & Traveset 2010). When the same species had been studied in more than one research project, its *Zr* values were averaged. When studies reported directional as well as disruptive or stabilizing selection gradients or differentials, we selected the one with the better fit (narrower confidence interval or smaller standard error). Stabilizing or disruptive selection gradients fitted better than directional ones in only four cases. Because stabilizing and disruptive selection are estimated with quadratic regression models (Lande & Arnold 1983), in these cases we calculated *r* from the statistics of the quadratic term.
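The transformation and species-level averaging can be sketched as below, using only the Python standard library; the function names are hypothetical. Fisher’s transformation is *Zr* = 0.5 ln[(1 + *r*)/(1 − *r*)], which equals atanh(*r*), and `math.tanh` maps a pooled *Zr* back to *r* for reporting. Each input record is assumed to be a (species, *r*) pair for one population, study or research project.

```python
import math
from collections import defaultdict


def fisher_z(r: float) -> float:
    """Fisher's z transformation: Zr = 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)


def species_mean_zr(records):
    """Average Zr within each species.

    records: iterable of (species, r) pairs, one per population/study/project.
    Returns {species: mean Zr}.
    """
    per_species = defaultdict(list)
    for species, r in records:
        per_species[species].append(fisher_z(r))
    return {sp: sum(zs) / len(zs) for sp, zs in per_species.items()}
```

For instance, two populations of species A with *r* = 0.5 and 0.3 yield one *Zr* for A equal to the mean of atanh(0.5) and atanh(0.3).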

Once we had obtained and transformed the effect sizes per species, we calculated the overall effect size for two effects: the relationship between flowering time and any estimate of fitness (hereafter selection on flowering time) and the relationship between flowering synchrony and fitness (hereafter selection on flowering synchrony). The overall effect size was calculated with both ordinary random-effects meta-analysis (hereafter ordinary meta-analysis) and phylogenetic random-effects meta-analysis (hereafter phylogenetic meta-analysis). Random-effects models treat the observed effect sizes as a sample from a distribution of true effects and estimate its mean, whereas fixed-effect models assume a single true effect size. Hence, random-effects models are preferred when studies are significantly heterogeneous (as is usual in ecology); the classical measure of heterogeneity is Cochran’s *Q*, calculated as the weighted sum of squared differences between individual study effects and the pooled effect across studies (Gurevitch & Hedges 1999). Analytically, the models differ essentially in the estimated variance: fixed-effect models consider only the within-study variance, whereas random-effects models also include the between-study variance (τ^{2}). To approximate the distribution of *Zr* more accurately, we used the asymptotic within-species variance 1/(*n* − 3), where *n* is the sample size (Hedges & Olkin 1985). The inverse of the variance was used as the vector of weights in the analyses; because this variance decreases as sample size increases, studies with larger samples carry more weight than studies with smaller samples.
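A random-effects pooling of species-level *Zr* values can be sketched as follows. The text does not state which estimator of τ^{2} was used; this sketch swaps in the DerSimonian-Laird moment estimator for illustration (metafor’s `rma()` defaults to REML, so its estimates can differ slightly). The function name and the keys of the returned dictionary are hypothetical.

```python
import math


def random_effects(zr, n):
    """Random-effects pooled Zr with a DerSimonian-Laird estimate of tau^2.

    zr: species-level Fisher-transformed effect sizes.
    n:  sample sizes; within-species variance is v_i = 1 / (n_i - 3).
    """
    v = [1.0 / (ni - 3) for ni in n]
    w = [1.0 / vi for vi in v]                          # fixed-effect weights
    sw = sum(w)
    mean_fe = sum(wi * zi for wi, zi in zip(w, zr)) / sw
    # Cochran's Q: weighted sum of squared deviations from the pooled effect
    q = sum(wi * (zi - mean_fe) ** 2 for wi, zi in zip(w, zr))
    df = len(zr) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                       # between-species variance
    w_re = [1.0 / (vi + tau2) for vi in v]              # random-effects weights
    sw_re = sum(w_re)
    mean_re = sum(wi * zi for wi, zi in zip(w_re, zr)) / sw_re
    se_re = math.sqrt(1.0 / sw_re)
    return {"Q": q, "tau2": tau2, "mean_zr": mean_re, "se": se_re, "z": mean_re / se_re}
```

With two species (*Zr* = 0 and 1, *n* = 13 each), *Q* = 5 exceeds its 1 degree of freedom, giving τ^{2} = 0.4 and a pooled *Zr* of 0.5 with a standard error of 0.5.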

Ordinary meta-analysis and phylogenetic meta-analysis are special cases of generalized least squares (GLS) theory (Lajeunesse 2009), a framework that directly addresses violations of the assumptions of independence and homoscedasticity. These violations are explicitly modelled in a *k* × *k* covariance matrix (Σ):
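
$$\mathbf{E} = \begin{bmatrix} \delta_{1} \\ \delta_{2} \\ \vdots \\ \delta_{k} \end{bmatrix} \sim N(\mathbf{X}\boldsymbol{\beta}, \boldsymbol{\Sigma})$$
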
where *E* is a *k* × 1 vector of the *k* effect sizes (δ), which are assumed to be normally distributed (*N*), and *X* is the design matrix in which moderator variables are coded; when only the overall effect size is estimated, *X* reduces to a vector of 1’s. In ordinary meta-analysis, Σ contains the sampling variance of each effect size on its main diagonal, which yields a weighted least squares regression in which effect sizes with large variances are down-weighted when effects are pooled. In the comparative method, phylogenetic relatedness enters through the off-diagonal elements of Σ. Phylogenetic meta-analysis takes advantage of the theory shared by both approaches, meta-analysis and the comparative method (Adams 2008; Lajeunesse 2009). Adams (2008) implemented phylogenetic meta-analysis by converting the meta-analytical data (*E* and *X*) into phylogenetically independent data (*E*_{new} and *X*_{new} in Adams’ notation), multiplying them by D, a matrix obtained from the singular value decomposition of Σ. A meta-analysis is then performed with a weighted regression model. Adams’ approach was implemented in R (The R Development Core Team 2007) and updated on 29 April 2009 (http://www.public.iastate.edu/~dcadams/software.html). We slightly modified Adams’ R code to fit random-effects models by using as weights the inverse of τ^{2} plus the within-species variance. The significance of coefficients was assessed by comparing them against the standard normal (*Z*) distribution. Lajeunesse (2009) developed an improvement of this method; the updated Adams’ and Lajeunesse’s approaches lead to similar results, but Lajeunesse (2009) also designed algorithms to estimate among-species heterogeneity, fit random-effects models, test different evolutionary hypotheses and contrast models with the Akaike Information Criterion (AIC). 
Ordinary and phylogenetic meta-analyses can be treated as competing hypotheses (Lajeunesse 2009), and AIC favours the model that combines good fit with few parameters (Crawley 2007). Some of these algorithms are implemented in a pilot software project called phylometa (http://lajeunesse.nescent.org/software.html). In this study both currently available approaches were used.
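The transformation attributed above to Adams (2008) and the AIC contrast between models can be sketched in Python with NumPy. This is not Adams’ R code or phylometa, and all names are hypothetical. Here D is built as the inverse symmetric square root of the phylogenetic covariance matrix C, obtained from its singular value decomposition (one way to satisfy D C Dᵀ = I); the random-effects weights 1/(v + τ^{2}) are applied after the transformation, following the description in the text; and `gls_aic` scores a GLS model with Σ treated as known, so that an ordinary (diagonal) Σ and a phylogenetic Σ can be compared. How the phylogenetic Σ is assembled from C and the sampling variances is left to the caller as an assumption.

```python
import numpy as np


def phylo_transform(C):
    """D = U S^(-1/2) U^T from the SVD of a symmetric positive-definite C,
    so that D @ C @ D.T equals the identity matrix."""
    U, s, _ = np.linalg.svd(C)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T


def phylo_meta(E, X, v, C, tau2=0.0):
    """Phylogenetically transformed, weighted least squares meta-analysis.

    E: effect sizes (Zr), X: design matrix, v: within-species variances,
    C: phylogenetic covariance matrix, tau2: between-species variance.
    """
    D = phylo_transform(C)
    E_new, X_new = D @ E, D @ X                       # phylogenetically independent data
    w = 1.0 / (np.asarray(v, dtype=float) + tau2)    # random-effects weights
    XtW = X_new.T * w                                 # same as X_new.T @ np.diag(w)
    cov = np.linalg.inv(XtW @ X_new)
    beta = cov @ (XtW @ E_new)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se                        # coefficients, SEs, Z values


def gls_aic(E, X, Sigma):
    """AIC of E ~ N(X beta, Sigma) with Sigma treated as known (only beta counted)."""
    Si = np.linalg.inv(Sigma)
    beta = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ E)
    resid = E - X @ beta
    _, logdet = np.linalg.slogdet(Sigma)
    loglik = -0.5 * (len(E) * np.log(2 * np.pi) + logdet + resid @ Si @ resid)
    return 2 * X.shape[1] - 2 * loglik
```

Comparing `gls_aic(E, X, np.diag(v + tau2))` with `gls_aic(E, X, Sigma_phylo)` for a caller-built `Sigma_phylo` mirrors the ordinary-versus-phylogenetic contrast; the lower AIC indicates the preferred model.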

Publication bias was assessed both graphically, with funnel plots, and statistically, with regression methods (Egger *et al.* 1997) implemented in the metafor package for R (Viechtbauer 2010). Metafor was also used to estimate the overall effect size in the ordinary meta-analysis; like Adams’ (2008) approach, this package uses weighted least squares regression.
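Egger’s regression test can be sketched in its classic form (Egger *et al.* 1997): the standard normal deviate *Zr*/SE is regressed by ordinary least squares on precision 1/SE, and an intercept far from zero suggests funnel-plot asymmetry. This illustration uses a hypothetical function name and takes SE = (*n* − 3)^{-1/2}; metafor’s `regtest()` fits related meta-regression variants and need not give identical numbers. The returned *t* statistic has *k* − 2 degrees of freedom.

```python
import math


def egger_test(zr, n):
    """Classic Egger test: OLS of Zr/SE on 1/SE; returns (intercept, SE, t)."""
    se = [1.0 / math.sqrt(ni - 3) for ni in n]
    y = [z / s for z, s in zip(zr, se)]          # standard normal deviates
    x = [1.0 / s for s in se]                    # precision
    k = len(y)
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (k - 2)                       # residual variance
    se_int = math.sqrt(sigma2 * (1.0 / k + mx * mx / sxx))
    return intercept, se_int, intercept / se_int
```

An intercept close to zero relative to its standard error, as in a symmetric funnel, gives a small *t*.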

#### Factors affecting selection on phenology

We evaluated the influence of a number of moderator variables on the calculated effect sizes: dependence on pollinators, plant longevity, identity of the pollen vector, latitude and duration of the flowering season at the population level. Dependence on pollinators was scored on a scale from one to five, with one indicating the strongest and five the weakest dependence. Scores were assigned as follows: dioecious plants (1); self-incompatible + floral condition preventing self-pollination (e.g. distylous, protandrous etc.) (2); self-incompatible and no reported floral condition preventing self-pollination (3); self-compatible + floral condition preventing self-pollination (4); self-compatible and no reported floral condition preventing self-pollination (5). We classified longevity by the number of reproductive events: perennials with more than two reproductive events formed one group (perennials), and annuals, biennials and monocarpic perennials formed another (short-lived). Pollen vectors were categorized as ‘animal-pollinated’ or ‘wind-pollinated’. In most multi-population studies, the populations lay within the same degree of latitude; only for two species [*Arabidopsis lyrata* (L.) O’Kane & Al-Shehbaz and *Lythrum salicaria* L.] were the studied populations more than one degree of latitude apart, and in these two cases effect sizes and latitudes were averaged. Most studies reported these moderator variables in the published paper; when they did not, we consulted complementary literature or asked the authors for additional information. Herbarium specimens and photographs were also used as complementary sources of information. The compatibility system was assigned using an extensive database on plant compatibility systems belonging to M.M. Ferrer-Ortega (mferrer@uady.mx). All the moderator variables listed above were used to explain variation in selection on flowering time. 
However, for selection on flowering synchrony, only dependence on pollinators, latitude and flowering season duration were considered, because the data set contained no short-lived species and only one wind-pollinated species. Moderator effects were modelled by including the variables in the design matrix (*X*), and the significance of coefficients was assessed from their standard errors or from heterogeneity statistics (Hedges & Olkin 1985). Ordinary and phylogenetic meta-analyses were fitted with the metafor package and with the approach of Adams (2008), respectively. We did not assess moderator effects with the Lajeunesse (2009) approach because the phylometa software aborted before completing the analyses, presumably because of matrix convergence problems.
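The pollinator-dependence scale and the moderator analysis can be sketched together in Python with NumPy. The scoring function encodes the five categories listed above; treating the score as a numeric covariate (rather than as a factor) and all function names are assumptions for illustration. `meta_regression` is a simplified weighted least squares fit with weights 1/(v + τ^{2}) and *Z* values for the coefficients, with τ^{2} taken as given rather than estimated; it is not metafor’s implementation.

```python
import numpy as np


def pollinator_dependence(dioecious: bool, self_incompatible: bool,
                          prevents_selfing: bool) -> int:
    """Score from 1 (strongest) to 5 (weakest dependence on pollinators).

    prevents_selfing: a floral condition preventing self-pollination
    (e.g. distyly, protandry) has been reported.
    """
    if dioecious:
        return 1
    if self_incompatible:
        return 2 if prevents_selfing else 3
    return 4 if prevents_selfing else 5


def meta_regression(E, X, v, tau2=0.0):
    """Weighted least squares meta-regression with weights 1 / (v + tau2).

    Returns coefficients, their standard errors and Z values.
    """
    E = np.asarray(E, dtype=float)
    X = np.asarray(X, dtype=float)
    w = 1.0 / (np.asarray(v, dtype=float) + tau2)
    XtW = X.T * w                        # equivalent to X.T @ np.diag(w)
    cov = np.linalg.inv(XtW @ X)
    beta = cov @ (XtW @ E)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se
```

A design matrix would be assembled column by column, for example `np.column_stack([np.ones(k), scores, latitude, duration])`, with an intercept column followed by one column per moderator.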

#### Effects of pre-dispersal seed predation and plant size on flowering time

It has been suggested repeatedly in the literature that both interactions with antagonists (particularly seed predators) and plant size can affect flowering time (reviewed by Kudo 2006 and Elzinga *et al.* 2007). Seed predators could affect the evolution of flowering time if they preferentially target seeds produced at a particular time of the year (e.g. by early, peak or late flowering plants), whereas plant size can influence flowering time because, at least in some species, larger individuals tend to flower earlier (e.g. Ollerton & Lack 1998). Therefore, we independently assessed the effect of these variables on flowering time using ordinary and phylogenetic meta-analyses as outlined above. We searched the databases described above using ‘phenotypic selection’, ‘plant size’ and ‘pre-dispersal seed predation’ as keywords. All studies reporting the relationship between flowering time and either seed predation intensity or a surrogate of plant size, and giving enough information (numeric or graphical) to calculate *r*, were included in our survey. Seed predation was carried out by a wide variety of predators, typically insects, and studies assessing predation on developing seeds, mature seeds and fruits were all included in the analysis. Plant size was most frequently measured as plant height; far less frequently, size was reported as number of leaves (three cases), stem diameter (one case), plant volume (two cases) or a composite index (one case). Because the reviewed studies were homogeneous in methodological (e.g. plant size surrogate) and biological (e.g. life form) features, no moderator variable was considered in this analysis. Publication bias was again assessed with funnel plots and regression techniques (Egger *et al.* 1997).

#### Phylogenetic framework

Phylogenetic meta-analysis requires a phylogenetic hypothesis. An initial tree was constructed with the online tool Phylomatic (http://www.phylodiversity.net/phylomatic), based on the angiosperm supertree built by Davis *et al.* (2004); the tool returns a phylogenetic tree from a list of plant species and their families. We selected the ‘conservative seed plant tree’ option, which leaves nodes with less than 80% support as soft polytomies. The topology was then fully resolved with the help of several studies for Asteraceae (Jansen *et al.* 1991), Cactaceae (Taylor & Zappi 1989; Nyffeler 2002; Arias *et al.* 2003), Caryophyllaceae (Popp & Oxelman 2004; Fior *et al.* 2006), Ericaceae (Kron 1997; Kron *et al.* 2002), Fabaceae (Kajita *et al.* 2001), Orchidaceae (Cameron 2007), Poaceae (GPWG 2000; Schneider *et al.* 2009), Polemoniaceae (Johnson *et al.* 2008), and Ranunculaceae (Ro & McPherson 1997; Wang *et al.* 2009). Relationships among species in the genus *Solidago* were resolved with an exhaustive parsimony analysis of internal transcribed spacer (ITS) sequences (GenBank accessions: AF046982, DQ005981, EU125357, FJ859719, FJ980344) conducted in PAUP 4.0b10 (Swofford 2003). Because several species have not been included in any published phylogenetic study, their branch lengths were unknown; to handle this, all branch lengths were set to one, as in previous phylogenetic meta-analyses facing this situation (Verdú & Traveset 2004, 2005; Munguía-Rosas *et al.* 2009). We first built a primary phylogeny including all the species and all the effects (Fig. 1), and then obtained the species needed for each specific analysis by trimming unneeded branches from the primary phylogeny. From each tree we obtained a covariance matrix, based on the topology and on the summed branch lengths from the root to the tips, that was used to account for species relatedness.
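The covariance matrix described above can be sketched for a tree in which every branch has length one. Assuming the standard Brownian-motion construction, the off-diagonal element for two species is the number of branches they share from the root to their most recent common ancestor, and each diagonal element is the number of branches from the root to that tip. The tree encoding (a child-to-parent dictionary) and the function names are hypothetical; polytomies are handled naturally because a parent may have any number of children.

```python
def ancestors(node, parent):
    """Path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def phylo_covariance(tips, parent):
    """Covariance matrix for a tree whose branches all have length 1.

    C[i][j] = number of branches from the root to the MRCA of tips i and j;
    the diagonal holds root-to-tip distances.
    """
    paths = {t: ancestors(t, parent) for t in tips}
    C = []
    for a in tips:
        anc_a = set(paths[a])
        row = []
        for b in tips:
            # walking rootwards from b, the first node shared with a is the MRCA
            mrca = next(node for node in paths[b] if node in anc_a)
            row.append(len(ancestors(mrca, parent)) - 1)   # branches root -> MRCA
        C.append(row)
    return C
```

For the tree ((A, B), C), encoded as `{"A": "n1", "B": "n1", "n1": "root", "C": "root"}`, the function gives `[[2, 1, 0], [1, 2, 0], [0, 0, 1]]`.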