All meta-analyses were conducted in the R environment (version 2·6·1; R Development Core Team 2006) using weighted linear mixed effects models (Pinheiro & Bates 2000). In all meta-analyses, we accounted for the hierarchical structure of our data set (e.g. multiple effect sizes from the same population) by including host species and study population as nested random effects (Nakagawa *et al.* 2007). This allowed us to use all data from the papers we located and to include multiple effect sizes from a study or host population in the same analysis without violating the fundamental assumption of independence.

First, we conducted an all-inclusive meta-analysis containing all effect sizes (i.e. both parasitism and immunity results). Subsequently, we split this into two meta-analyses according to the response variable involved: a parasitism meta-analysis and an immunity meta-analysis. In all three meta-analyses, we used weighted linear mixed effects models and restricted maximum likelihood (REML) to determine the overall mean effect size and its 95% confidence interval. Where the 95% confidence interval for an effect size did not span zero, we considered the effect statistically significant at the 5% level. We calculated the proportion of variation in effect size that could be explained by the random factors of study population and host species. We also calculated two measures of heterogeneity for each meta-analysis: *Q*_{T} (total heterogeneity) and *Q*_{REML} (the residual heterogeneity in random-effects models), the latter being a more appropriate measure of heterogeneity when meta-analyses contain random effects (Nakagawa *et al.* 2007). One limitation of mixed models is the difficulty of estimating accurate degrees of freedom for test statistics when multiple random effects are used; for example, the estimate of statistical significance for *Q*_{REML} may be inaccurate because inappropriate degrees of freedom are used. Therefore, rather than selecting models on the basis of heterogeneity statistics, as has traditionally been done (Cooper & Hedges 1994), we ran meta-regression analyses with an information-theoretic approach based on AIC corrected for small sample size (AIC_{c}; Burnham & Anderson 2002; Anderson 2008). In this way, we were able to compare null models (i.e. meta-analyses without predictors) with other models (i.e. meta-regression models incorporating predictors) to evaluate the importance of each relevant predictor. Model selection for the parasitism and immunity meta-regression analyses was conducted separately.
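
The weighted mean effect size, its 95% confidence interval and the total heterogeneity statistic described above can be illustrated with a deliberately simplified sketch. It assumes inverse-variance weights, a normal approximation for the confidence interval and no random effects, so it is not the mixed-model analysis used here; the function name `weighted_meta_summary` and all input values are hypothetical.

```python
import math


def weighted_meta_summary(effects, variances):
    """Inverse-variance weighted mean, 95% CI and total heterogeneity Q_T.

    Simplified illustration only: no random effects for host species
    or study population are included.
    """
    weights = [1.0 / v for v in variances]          # assumed weights: 1 / sampling variance
    w_sum = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / w_sum
    se = math.sqrt(1.0 / w_sum)                     # standard error of the weighted mean
    z = 1.959963984540054                           # 97.5th percentile of the standard normal
    ci = (mean - z * se, mean + z * se)
    # Q_T: weighted sum of squared deviations from the weighted mean
    q_total = sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))
    return {
        "mean": mean,
        "se": se,
        "ci": ci,
        "q_total": q_total,
        # the effect is treated as significant at the 5% level if the CI excludes zero
        "significant": ci[0] > 0 or ci[1] < 0,
    }


# Hypothetical effect sizes and sampling variances, not data from this study
effects = [0.30, 0.10, 0.45, 0.20]
variances = [0.04, 0.02, 0.05, 0.01]
summary = weighted_meta_summary(effects, variances)
```

In the full analysis the mean and its interval come from the REML fit of the mixed model, so the numbers from a sketch like this would differ whenever the random effects account for appreciable variance.
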
For both the parasitism and the immunity meta-regression, we created a set of candidate models using maximum likelihood parameter estimation, which included, in addition to the random effects of host species and population, all possible combinations of our fixed factors (i.e. predictors) of interest. For parasitism, these fixed factors were: measure of parasitism (prevalence or parasitaemia), mean duration of reproductive effort manipulation (in days), host sex and parasite genus (*Plasmodium*, *Haemoproteus*, *Leucocytozoon* or *Trypanosoma*). For immunity, the fixed factors were: assay type (PHA, SRBC, diphtheria–tetanus vaccine), mean duration of reproductive effort manipulation (in days), manipulation stage (during incubation only vs. during brood rearing only) and host sex. Mean duration of manipulation was unrelated to either manipulation stage or assay type (manipulation stage: *F*_{1,21} = 0·108, *P* = 0·745; assay type: *F*_{2,20} = 0·452, *P* = 0·642). During model selection for immunity studies, three effect sizes were excluded because they involved an experimental procedure that differed from the majority and therefore could not be assigned a meaningful factor level for one or more of our fixed factors of interest. These were two effect sizes in which the assay involved use of an NDV vaccine (Nordling *et al.* 1998) and one where both incubation and brood rearing had been manipulated (Moreno *et al.* 1999). For each model, an Akaike weight was calculated, which indicates its level of support (since Akaike weights sum to 1, models with Akaike weights approaching 1 receive the most support relative to other models). We subsequently used model averaging to determine the relative importance of each fixed factor, expressed as the sum of Akaike weights from all models in which that factor was included.
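
The model-selection arithmetic described above can be sketched as follows. This is a minimal illustration assuming the standard definitions of AIC_{c} and Akaike weights (Burnham & Anderson 2002). The function names, the two-factor example and the AIC_{c} values are hypothetical and are not results from this study.

```python
import math
from itertools import combinations


def aicc(log_lik, k, n):
    """AIC corrected for small sample size.

    log_lik: maximised log-likelihood, k: number of estimated parameters,
    n: sample size (requires n > k + 1).
    """
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def candidate_models(factors):
    """All subsets of the fixed factors, including the empty (null) model."""
    return [frozenset(c)
            for r in range(len(factors) + 1)
            for c in combinations(factors, r)]


def akaike_weights(aicc_values):
    """Convert AICc values into Akaike weights that sum to 1."""
    best = min(aicc_values)
    rel_lik = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


def relative_importance(models, weights, factors):
    """Sum of Akaike weights over all models that contain each factor."""
    return {f: sum(w for m, w in zip(models, weights) if f in m)
            for f in factors}


# Hypothetical two-factor example; the AICc values are made up for illustration
factors = ["host_sex", "duration"]
models = candidate_models(factors)
hypothetical_aicc = {
    frozenset(): 52.0,                          # null model (random effects only)
    frozenset({"host_sex"}): 50.0,
    frozenset({"duration"}): 53.5,
    frozenset({"host_sex", "duration"}): 51.0,
}
weights = akaike_weights([hypothetical_aicc[m] for m in models])
importance = relative_importance(models, weights, factors)
```

With four fixed factors, as in each of the analyses here, `candidate_models` would return 16 models, one of which is the null model containing only the random effects.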