Offspring performance is well buffered against stress experienced by ancestors

Evolution should render individuals resistant to stress and particularly to stress experienced by ancestors. However, many studies report negative effects of stress experienced by one generation on the performance of subsequent generations. To assess the strength of such transgenerational effects we propose a strategy aimed at overcoming the problem of type I errors when testing multiple proxies of stress in multiple ancestors against multiple offspring performance traits, and we apply it to a large observational dataset on captive zebra finches (Taeniopygia guttata). We combine clear one‐tailed hypotheses with steps of validation, meta‐analytic summary of mean effect sizes, and independent confirmatory testing. We find that drastic differences in early growth conditions (nestling body mass 8 days after hatching varied sevenfold between 1.7 and 12.4 g) had only moderate direct effects on adult morphology (95% confidence interval [CI]: r = 0.19–0.27) and small direct effects on adult fitness traits (r = 0.02–0.12). In contrast, we found no indirect effects of parental or grandparental condition (r = −0.017 to 0.002; meta‐analytic summary of 138 effect sizes), and mixed evidence for small benefits of matching environments between parents and offspring, as the latter was not robust to confirmatory testing in independent datasets. This study shows that evolution has led to a remarkable robustness of zebra finches against undernourishment. Our study suggests that transgenerational effects are absent in this species, because CIs exclude all biologically relevant effect sizes.

From an evolutionary perspective, we would expect that natural selection acts to minimize the susceptibility of organisms to harmful direct and indirect (condition-transfer) effects. Fitness-related traits in particular are selected to be well buffered against detrimental influences from the environment (evolution of stress tolerance, robustness, and developmental canalization; e.g., Waddington 1942; Siegal and Bergman 2002). Moreover, selection will disfavor mothers that handicap their own offspring. In general, detrimental carry-over effects may be inevitable to some extent, but selection will work against them. In contrast, "transgenerational anticipatory effects" are thought to have evolved for an adaptive function. Such "transgenerational anticipatory programming" of offspring may have evolved when the environments in which parents and offspring grow up are generally similar (e.g., Krause and Naguib 2014; Raveh et al. 2016), and when proximate mechanisms of epigenetic inheritance enable it (e.g., Holliday 1987; Colborn et al. 1993; Jablonka and Raz 2009; but see also Proulx and Teotónio 2017 for alternative scenarios for the evolution of adaptive anticipatory effects). Studies of epigenetic inheritance have boomed since the early 1990s (Jablonka and Raz 2009; Jensen 2013), focusing mostly on organisms that are immobile or lack differentiation between soma and germ cells, such as fungi (Benkemoun and Saupe 2006), plants (Cubas et al. 1999; Molinier et al. 2006; Henderson and Jacobsen 2007; Feng 2010), and nematodes (Bagijn et al. 2012; Rechavi et al. 2014; Dey et al. 2016; Lev et al. 2019). Meanwhile, sexually reproducing animals, such as fruit flies (Magiafoglou and Hoffmann 2003), birds (Naguib and Gil 2005; Monaghan 2008; Khan et al. 2016), mice (Morgan et al. 1999; Carone et al. 2010), rats (Anway et al. 2005), and humans (Colborn et al. 1993), also became popular study subjects for epigenetic inheritance.
However, in the latter group, we still lack studies that show mechanistically how experiences made by the soma can be transferred to the germline. In sum, the widespread existence of both types of transgenerational effects seems somewhat unlikely, because condition transfer is selected against and anticipatory effects may lack a mechanism that could achieve such adaptation.
Although the mechanisms behind most of the observed epigenetic inheritance remain largely unclear (Jablonka and Raz 2009; Miska and Ferguson-Smith 2016), evolutionary biologists have studied transgenerational effects and have estimated the fitness consequence of stress experienced by one generation on individuals of subsequent generations in various animal systems (Ledón-Rettig et al. 2013; Guerrero-Bosagna et al. 2018), sometimes with individuals from the wild (Drummond and Ancona 2015), but mostly with captive-bred animals (e.g., Naguib and Gil 2005; Uller et al. 2005; Alonso-Alvarez et al. 2007; Krause and Naguib 2014; Wilson et al. 2019). Such effects have typically been investigated experimentally across two generations, that is, as effects of increased stress experienced by the parents on the offspring, using brood or litter-size manipulation (Naguib and Gil 2005; Alonso-Alvarez et al. 2007), restricted food supply during female pregnancy, or nestling or puppy stages (Bertram et al. 2008), restraint stress exposure during early life where individuals were intermittently deprived of social interactions (Goerlich et al. 2012), corticosterone intake during female pregnancy or early individual development (Khan et al. 2016), and cold or heat shock (mostly for insects, e.g., in Drosophila and Tribolium; Magiafoglou and Hoffmann 2003; Eggert et al. 2015). In general, the reported significant effects are often accompanied by numerous nonsignificant test results, and sometimes a significant effect with a sign opposite to expectations may still get interpreted as evidence for the existence of transgenerational effects. Moreover, transgenerational effects are sometimes reported in a sex-specific way (interaction effect between the sex of the parent and that of the offspring). For example, in humans, effects from (grand)mother to (grand)daughters and from (grand)father to (grand)sons have been reported (Pembrey et al. 2006; Kaati et al. 2007).
When studies examine multiple predictors and response variables in multiple ancestors, there is a risk of selective reporting of the strongest effects. Unbiased estimates can only be obtained when including all predictors and responses that appeared worth investigating at the start of a study (or when subsetting is not conditional on the results). Hence, for assessing the importance of transgenerational effects, we suggest that rigorous testing of one-tailed a priori hypotheses and meta-analytic summary of effect sizes is essential.
Here, we use observational data of more than 2000 captive zebra finches from a long-term, error-free pedigree to study the sex-specific effects of multiple stressors experienced during early development on later-life morphology and fitness-related traits. We consider both direct, intragenerational effects and effects of developmental stress experienced by parental and grandparental generations. For all individuals, we systematically recorded variables that have previously been used as indicators of early developmental conditions: brood size (Koskela 1998; Naguib and Gil 2005; Tschirren et al. 2009), hatching order (Saino et al. 2001; Ferrari et al. 2006; Wilson et al. 2019), laying order of eggs (Soma et al. 2007; Gilby et al. 2012), and clutches (Tomita et al. 2011), egg volume (Love and Williams 2011), and nestling body mass at 8 days old (Bolund et al. 2010). We also used five morphological traits as dependent variables: tarsus (Naguib and Gil 2005; Tschirren et al. 2009) and wing length (Naguib and Gil 2005; Krause and Naguib 2014; Wilson et al. 2019), body mass (Tschirren et al. 2009; Krause and Naguib 2014; Wilson et al. 2019), abdominal fat deposition (Bolund et al. 2010), and beak color (Tschirren et al. 2009; Bolund et al. 2007, 2010; Wilson et al. 2019). For a subset of birds, we also measured life span and aspects of reproductive performance (female clutch size in cages and in aviaries, female fecundity, male infertility in cages, male within-pair paternity, male siring success, female embryo mortality, nestling mortality for a given social mother and for a given social father, and female and male seasonal recruits; for details, see the "Methods" section), following, for example, Naguib et al. 2006; Tschirren et al. 2009; Krause and Naguib 2014; Khan et al. 2016; Wilson et al. 2019.
Regarding condition transfer, we focus our analyses on the a priori hypothesis that the stress that an individual's parents and grandparents experienced in early life has detrimental effects on the morphology and reproductive performance of that individual as an adult. We assume that the direction of effects is independent of the sex of the focal individual. We further hypothesize a priori that if such transgenerational effects were sex-specific (e.g., epigenetics of sex chromosome by environment interaction), the environment experienced by mothers and grandmothers would affect daughters and granddaughters whereas the environment experienced by fathers and grandfathers would affect sons and grandsons (Pembrey et al. 2006; Kaati et al. 2007). Such one-tailed expectations have the advantage that trends which are opposite to the expectation can be quantified as negative effect sizes. If the null hypothesis is true, that is, if there is no effect, we expect a meta-analytic mean effect size that does not differ from zero.
Regarding anticipatory effects, we focus on the a priori, one-tailed hypothesis that offspring perform better as adults when they experienced similar early-life growth conditions as their parents did (measured as nestling body mass at 8 days old). Note that this only concerns variation in individual growth conditions, while the captive environment (aviary or laboratory) provides relatively stable conditions (Kuijper et al. 2014; Kuijper and Johnstone 2016).
First, we validate the six proxies of early developmental stress by examining their direct effects on the individual itself. Second, we use meta-analysis to average transgenerational effect sizes across multiple traits reflecting either morphology or reproductive performance of adult male and female zebra finches. Lastly, we use an independent dataset (i.e., additional birds from populations with shorter pedigrees, but otherwise equal data quality), to assess whether the significant findings from the initial tests can be replicated.

STUDY SYSTEM AND GENERAL PROCEDURES
The zebra finch is an abundant, opportunistic breeder in the wild in Australia (Zann 1996) that also breeds easily in captivity. We used birds from a domesticated zebra finch population with a 13-generation error-free pedigree, maintained at the Max Planck Institute for Ornithology, Seewiesen, Germany (#18 in Forstmeier et al. 2007). Housing conditions have been described elsewhere (see Bolund et al. 2007; Ihle et al. 2015; Pei et al. 2019). The study was conducted under license (permit numbers 311.4-si and 311.5-gr, Landratsamt Starnberg, Germany).
We used all individuals (N = 2099) from this study population for which complete information was available on laying and hatching dates, egg volume, and nestling mass at 8 days of age, both for the focal individual itself and for its six nearest ancestors (parents and grandparents). Breeding experiments were conducted either in cages with single pairs, in which the partners were assigned to each other, or in semi-outdoor aviaries with groups of females and males, in which birds freely formed breeding pairs. During episodes of breeding, nests were checked daily on weekdays and occasionally during weekends to collect the required data (see below).

PROXIES OF EARLY DEVELOPMENTAL STRESS
We examined six parameters that have been used in previous studies as potential proxies of nutritional status or stress experienced during early development: (i) the laying order of eggs within a clutch (range: 1-18, mean = 3.1, SD = 1.7; note that only five birds hatched from eggs with laying order >10; a clutch was defined as eggs that were laid consecutively by a focal female, allowing for laying gaps of maximally 4 days between subsequent eggs; Soma et al. 2007; Gilby et al. 2012), (ii) the order of clutches laid within a breeding season (range: 1-8, mean = 2.1, SD = 1.2; Tomita et al. 2011), (iii) the order of hatching within a brood (range: 1-6, mean = 2.1, SD = 1.1; Saino et al. 2001; Ferrari et al. 2006), (iv) brood size (number of nestlings reaching 8 days of age; range: 1-6, mean = 3.3, SD = 1.2; Koskela 1998; Tschirren et al. 2009), (v) relative egg volume (i.e., centered to the mean egg volume laid by a given female; range: −0.26 to 0.30, mean = 0.01, SD = 0.07; Love and Williams 2011), and (vi) nestling body mass at 8 days of age (range: 1.7-12.4 g, mean = 7.2 g, SD = 1.6; Bolund et al. 2010). Egg volume was calculated as V = (1/6) × π × Width² × Length, whereby egg length and width were measured to the nearest 0.1 mm. Note that the first five parameters describe developmental conditions that are beyond the control of the developing organism, while the last one (nestling mass) is a trait that also depends on genetic variation in growth rate. In our population, a cross-fostering experiment revealed that nestling mass at day 8 has a heritability of 13% (Bolund et al. 2010). Thus, zebra finch nestling mass primarily reflects environmental conditions experienced by the individual during early growth. We decided to measure nestling mass only at 8 days of age, because at that age nestlings on average reach about half of their final mass, and hence we expected variation due to extrinsic growth conditions to be maximal.
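The egg volume formula above is the volume of an ellipsoid with two equal short axes. As a minimal sketch in Python (the analyses in this study were run in R; the function name and the example dimensions are hypothetical and chosen only for illustration):

```python
import math


def egg_volume(width_mm: float, length_mm: float) -> float:
    """Ellipsoid approximation V = (1/6) * pi * width^2 * length.

    With width and length in mm, the result is in cubic mm.
    """
    return math.pi * width_mm ** 2 * length_mm / 6.0


# Hypothetical egg measured to the nearest 0.1 mm (not a value from the dataset).
example_volume = egg_volume(width_mm=11.0, length_mm=15.0)
print(round(example_volume, 1))
```

Because width enters as a square, a measurement error in width affects the estimated volume roughly twice as strongly (in relative terms) as the same relative error in length.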
We hypothesized that individuals (or their parents and grandparents) developed under more stressful conditions if they came from eggs later in the laying, clutch or hatching sequence, were raised in a larger brood, hatched from an egg that was relatively small, and had a lower body mass at 8 days of age. For the measure of similarity of parent-offspring early developmental condition (predictor of "anticipatory effect"), we calculated the absolute difference in nestling mass at 8 days between parent (mother or father) and offspring (mother-offspring range: 0-8.4 g, mean = 1.8 g, SD = 1.4; father-offspring range: 0-7.9 g, mean = 1.7 g, SD = 1.3).
To aid interpretation, we scored all stressors in such a way that all estimated effects are expected to be positive (multiplication by −1 where necessary). Thus, positive effect sizes indicate detrimental effects of a stressor on a trait.
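The sign convention and the parent-offspring measure can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the variable names are hypothetical, and expressing similarity as the negated absolute difference is an assumption made here so that higher values mean more similar conditions.

```python
# Direction of each proxy as hypothesized in the text: True means that a
# higher raw value indicates MORE stress, so the predictor is multiplied by -1.
# Names are hypothetical labels, not column names from the published data.
HIGHER_MEANS_MORE_STRESS = {
    "laying_order": True,
    "clutch_order": True,
    "hatching_order": True,
    "brood_size": True,
    "relative_egg_volume": False,
    "nestling_mass_day8": False,
}


def orient_predictor(name: str, value: float) -> float:
    """Flip the sign where needed so that higher always means better conditions."""
    return -value if HIGHER_MEANS_MORE_STRESS[name] else value


def parent_offspring_similarity(parent_mass_g: float, offspring_mass_g: float) -> float:
    """Negated absolute difference in day-8 mass (assumed sign convention)."""
    return -abs(parent_mass_g - offspring_mass_g)


print(orient_predictor("brood_size", 4))        # flipped
print(orient_predictor("nestling_mass_day8", 7.2))  # unchanged
print(parent_offspring_similarity(8.0, 6.5))
```

With all predictors oriented this way, an estimate in the hypothesized direction is positive for every stressor, which is what makes the later averaging of effect sizes meaningful.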

ADULT PERFORMANCE TRAITS
We studied the following morphological traits, measured when the individual reached adulthood (median = 115 days of age, range 93-229 days, >95% of birds were 100-137 days old): (i) body mass (measured to the nearest 0.1 g using a digital scale, N = 947 females and 1012 males), (ii) length of the right tarsus (measured from the bent foot to the rear edge of the tarsometatarsus, including the joint, using a wing ruler to the nearest 0.1 mm, N = 944 females and 1008 males; see method 3 in Forstmeier et al. 2007), (iii) length of the flattened right wing (measured with a wing ruler to the nearest 0.5 mm, N = 939 females and 1004 males), (iv) visible clavicular and abdominal fat deposition, scored from 0 to 5 in 0.5 increments (N = 932 females and 989 males), and (v) redness of the beak (Bolund et al. 2007), scored by comparison to a color standard following the Munsell color scale from 0 to 5.5 in 0.1 increments (N = 947 females and 1012 males). Male and female traits were analyzed separately, leading to a total of 10 morphological traits.
We also studied the following 13 fitness-related traits (data taken from Pei 2020b): (i) female clutch size measured in cages (N = 166 females) or (ii) in aviaries (N = 274 females); (iii) female fecundity, that is, total number of eggs laid in aviaries without nestling rearing (N = 230 females); (iv) male infertility, measured in cages as the proportion of nondeveloping eggs (N = 132 males); (v) male within-pair paternity, measured in aviaries as the proportion of eggs fertilized by the social male (N = 237 males); (vi) male siring success, measured in aviaries as the total number of eggs sired (within and extra pair; N = 281 males); (vii) female embryo mortality, measured as the proportion of a genetic mother's embryos dying (N = 228 genetic mothers); (viii) nestling mortality, measured as the proportion of hatchlings in a brood that died before day 35, for a given social mother (N = 233); and (ix) for a given social father (N = 228); (x) female and (xi) male seasonal recruits as the total number of independent offspring produced (defined as offspring that survived until day 35; N = 126 males and N = 125 females); (xii) female (N = 409); and (xiii) male life span (N = 412). All measures of reproductive success in aviaries were based on genetically assigned parentage, including all dead embryos and all nestlings (see Ihle et al. 2015; Wang et al. 2017, 2020). For infertility, within-pair paternity, embryo and nestling mortality, we used raw data based on the fate of single eggs, while controlling for pseudo-replication by adding male and female identities as random effects in all models (see the "Statistics" section). Female clutch size was analyzed at the clutch level, controlling for female identity, because 94% of females produced multiple clutches.
For fecundity, siring success and seasonal recruits, we used the data from individuals within a given breeding season (96% of females and 78% of males had multiple measures for fecundity and siring success, while for seasonal recruits, females and males were only measured once). For easy interpretation of the results, we scored all fitness-related traits in such a way that high trait values refer to better reproductive performance (multiplication by −1 where necessary).
The morphological and fitness-related traits are in general positively correlated within female and male zebra finches (Fig. S1 and Table S1).

STATISTICS
We estimated the effect of each potential stressor experienced either by the individual itself (direct effects), or by one of its parents or grandparents (condition transfer) on each trait in a separate model (6 stressors × 7 sources × 23 traits = 966 models). To estimate anticipatory effects, we analyzed each of the 23 performance traits as a function of parent-offspring similarity in nestling mass (once for the mother, once for the father) while statistically controlling for the direct effect of the individual's nestling mass (see the "Results" section; 2 sources × 23 traits = 46 models). We used mixed-effect models and animal models to control for the nonindependence of data points due to shared random effects including genetic relatedness. For animal models, we used the package "pedigreeMM" V0.3-3 (Vazquez et al. 2010) and for mixed-effect models we used "lme4" V1.1-23 (Bates et al. 2015) in R V4.0.0 (R Core Team 2020). The 95% CIs of estimated effect sizes were calculated using the "glht" function in the "multcomp" V1.4-13 R package while controlling for multiple testing (Hothorn et al. 2008), unless stated otherwise.
Morphological traits typically show high heritability, so we included the between-individual relatedness matrix (using pedigree information) as a random effect to control for the genetic relatedness of individuals. In contrast, fitness-related traits typically have low heritability (Pei et al. 2019), so we analyzed fitness-related traits in mixed-effect models while only controlling for repeated measurements from the same focal individual, parent, or grandparent. To compare and summarize the effects of the variables indicating early-life conditions on different traits, we Z-scaled all dependent and all predictor variables (stressors), assuming a Gaussian distribution.
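Why Z-scaling yields effect sizes on the scale of correlation coefficients can be shown with a small worked example. The sketch below uses Python and hypothetical data; it covers only the simple case of one predictor and no covariates or random effects, where the regression slope between two Z-scaled variables equals Pearson's r exactly. In the mixed models described here, the slope is only approximately comparable to r.

```python
from statistics import mean, stdev


def z_scale(values):
    """Center to mean 0 and scale to unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient computed from raw values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical nestling masses (g) and adult tarsus lengths (mm).
mass = [5.1, 6.3, 7.2, 8.0, 9.4, 6.8]
tarsus = [14.2, 14.9, 15.1, 15.8, 16.0, 15.0]

slope_on_z_scale = ols_slope(z_scale(mass), z_scale(tarsus))
print(slope_on_z_scale, pearson_r(mass, tarsus))
```

The two printed numbers agree up to floating-point error, which is the property that lets slopes from different traits and stressors be averaged on a common scale.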
Details on model structures (see Tables S2-S4 for all fixed effects), all scripts and underlying data are provided in the Open Science Framework at https://osf.io/wjg3q/ (Pei 2020a). In brief, for all morphological traits, we fitted sex (male and female), fostering experience (three levels: no cross-fostering, cross-fostered within or between populations), and inbreeding level (pedigree-based inbreeding coefficient, F_ped, where outbred birds have F_ped = 0 and full-sib matings produce birds with F_ped = 0.25) as fixed effects. For models with beak color, wing, and tarsus length as the dependent variable, we also fitted the identity of the observer that measured the trait as a fixed effect to control for between-observer variation. We included the identity of the peer group in which the individual grew up as a random effect. We fitted individual identity twice in the random structure, once linked to the pedigree to control for relatedness between individuals and once to estimate the permanent environmental effect. Additionally, for models with body mass, beak color, wing length, tarsus length, and fat score as the dependent variable, we included the identity of the batch of birds that were measured together as a random effect (group ID) to control for batch effects between measurement sessions.
For models of fitness-related traits, we controlled for individual age, inbreeding level (F_ped), the number of days the individual was allowed to breed (in aviaries), the sex ratio (i.e., the proportion of males), and pairing status (force-paired in cages or free-paired in aviaries) by including them as fixed effects, whenever applicable. Additionally, for egg-based models (male fertility, within-pair paternity, embryo and nestling survival), we controlled for clutch order and laying or hatching order of the egg that was laid/potentially sired by the focal female or male. For models on embryo and nestling survival, we also controlled for the inbreeding level of the offspring. In all models, we included individual identity, breeding season identity, clutch identity, identity of the partner of the focal individual and the pair identity as random effects, as appropriate.
We metasummarized effect sizes using the "lm" function in the R package "stats," whereby we weighted each effect size by the inverse of the standard error of the estimate to account for the uncertainty of each estimate. Intercepts were removed to estimate the mean of each category unless stated otherwise. First, we metasummarized the direct effect of each of the six stressors on the individual's own morphological versus fitness-related traits ("trait type," two levels). In this model, we fitted the pairwise combination of the trait type and the potential stressor as a fixed effect with 12 levels (Table S5). Second, we summarized the direct or transgenerational effects (from the individual, its parents and grandparents, seven levels, "stress experienced by a certain individual") of the most powerful proxy of developmental stress (nestling body mass at 8 days old; see the "Results" section) on the morphological versus fitness-related traits (two levels) of males and females (two levels, "sex"). In this model, we fitted the pairwise combination of stress experienced by a certain individual, trait type, and sex as a fixed effect with 28 levels (Table S6). Third, we metasummarized the transgenerational anticipatory effect of the similarity between parent-offspring in their nestling mass (mother or father in combination with daughters or sons, four levels) on the offspring's morphological versus fitness-related traits (two levels). Here we included the pairwise combination of parent, offspring sex and trait type as a fixed effect with eight levels (Table S7).
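A weighted linear model without an intercept and with a single categorical predictor reduces to a weighted mean per category. The following Python sketch illustrates that reduction with hypothetical effect sizes and standard errors; it uses weights of 1/SE as stated in the text (not the inverse-variance weights 1/SE² common in other meta-analytic approaches) and does not reproduce the standard errors or CIs that "lm" would report.

```python
from collections import defaultdict


def weighted_category_means(effects, standard_errors, categories):
    """Weighted mean effect size per category, with weights 1 / SE.

    This equals the coefficient estimates of a weighted least-squares fit
    of effect ~ 0 + category with the same weights.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for effect, se, category in zip(effects, standard_errors, categories):
        weight = 1.0 / se
        weighted_sum[category] += weight * effect
        weight_total[category] += weight
    return {c: weighted_sum[c] / weight_total[c] for c in weighted_sum}


# Hypothetical effect sizes (r) and standard errors, for illustration only.
effects = [0.20, 0.10, 0.06, 0.00]
standard_errors = [0.05, 0.10, 0.02, 0.04]
categories = ["morphology", "morphology", "fitness", "fitness"]

category_means = weighted_category_means(effects, standard_errors, categories)
print(category_means)
```

In this toy example, the more precisely estimated effect in each category pulls the category mean toward itself, which is the intended consequence of the weighting.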
Then, we metasummarized the overall transgenerational effects of condition transfer and anticipatory effects in two mixed-effect models using the "lmer" function in the R package "lme4" (Bates et al. 2015), where we weighted each estimate by the multiplicative inverse of its standard error to account for their level of uncertainty. To account for the nonindependence between response variables (see Fig. S1), we fitted a random effect that reflects their dependencies. For this purpose, we grouped all 23 performance traits based on their pairwise correlation coefficients (Table S1) into 11 categories (see Table S8). The fitted random effect groups the performance traits into 11 categories separately for each ancestor (22 levels for the parents and 44 levels for grandparents). We metasummarized the overall transgenerational effects of condition transfer of mass at day 8 experienced by the ancestors (parents and grandparents) on the traits of individuals, by only including an intercept (Table S8). Last, we metasummarized the overall transgenerational anticipatory effect of similarity between parent-offspring in their mass at day 8 on the traits of offspring, by only including an intercept (Table S9).
For visualization, we calculated the expected Z-values with 95% CIs from a normal distribution, given the number of Z-values (N) in each group of effects (i.e., the effects of each stressor experienced by the focal individual, its mother, its father, or its grandparents), using the following formulas in the R package "stats": expected Z-values as "qnorm(ppoints(N))" (i.e., the integrated quantiles assuming a uniformly distributed probability of a given number of observations) and 95% CIs of the expected Z-values as "qnorm(qbeta(p = (1 ± CI)/2, shape1 = 1:N, shape2 = N:1))" (i.e., the integrated quantiles of quantiles of a uniformly distributed probability of a given number of observations from a beta distribution). We visually inspected the ZZ-plots of expected versus observed Z-values, depending on the direction of the effects. Z-values larger than 1.96 were considered to be significant.
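The two R expressions for the ZZ-plot envelope can be mirrored with the Python standard library. This is a sketch under stated assumptions, not the authors' script: "ppoints" follows R's documented default plotting positions, and the beta quantile for the i-th order statistic of N uniform values, Beta(i, N − i + 1), is obtained by bisection on its exact binomial-sum CDF instead of calling a "qbeta" routine. All function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ppoints(n):
    """Plotting positions as in R's ppoints(n): (i - a) / (n + 1 - 2a)."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def order_stat_cdf(x, i, n):
    """CDF of Beta(i, n - i + 1): P(at least i of n uniform draws are <= x)."""
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(i, n + 1))


def order_stat_quantile(p, i, n, tol=1e-12):
    """Invert order_stat_cdf by bisection (stand-in for R's qbeta)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if order_stat_cdf(mid, i, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_z_with_ci(n, ci=0.95):
    """Expected Z-values and pointwise CI bounds for n sorted Z-values."""
    expected = [STD_NORMAL.inv_cdf(p) for p in ppoints(n)]
    lower = [STD_NORMAL.inv_cdf(order_stat_quantile((1 - ci) / 2, i, n))
             for i in range(1, n + 1)]
    upper = [STD_NORMAL.inv_cdf(order_stat_quantile((1 + ci) / 2, i, n))
             for i in range(1, n + 1)]
    return expected, lower, upper


# Envelope for one group of 23 Z-values (one Z-value per performance trait).
expected, lower, upper = expected_z_with_ci(23)
```

Observed Z-values, sorted in ascending order, would then be plotted against `expected`; points outside the band between `lower` and `upper` deviate from what a set of 23 null effects would be expected to produce.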

CONFIRMATORY ANALYSIS
For the confirmatory analysis, we used additional birds, including the remaining individuals from the main study population (referred to as "Seewiesen") whose maternal nestling mass was known (but information from grandparents was missing), as well as birds from two other captive populations with short pedigrees: "Krakow" (interbreeding between populations "Krakow" #11 in ) and "Seewiesen") and "Bielefeld" (wild-derived in the late 1980s, #19 in Forstmeier et al. 2007). These datasets are of equal quality as the main dataset, but have shorter pedigrees. To replicate the tests that showed significant effects of maternal early condition and the similarity between mother-daughter early condition on daughter fecundity-related traits (see the "Results" section), we used the following samples: (i) female clutch size measured in cages (N = 156 "Seewiesen" and 30 "Krakow" females) or (ii) in aviaries (N = 84 "Seewiesen," 66 "Krakow," and 53 "Bielefeld" females); (iii) female fecundity, measured in aviaries (N = 31 "Seewiesen" females). We Z-scaled nestling body mass at 8 days of age within each population before further analysis because birds in the recently wild-derived population "Bielefeld" were smaller compared to those of the domesticated "Seewiesen" and "Krakow" populations. We used the "lmer" function from the R package "lme4" to estimate the maternal nestling mass effect on daughters' fecundity-related traits. The same model structure was used as in the initial tests, but we additionally controlled for between-population differences by including the population where the female came from as a fixed effect. In the model of female fecundity, we removed the variable "number of days the female stayed in the experiment" (because there was no variation) and the random effect "female identity" (because each individual contributed only one data point).
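Z-scaling within each population removes between-population differences in mean body size before the data are pooled. A minimal Python sketch of that step, using hypothetical masses and a hypothetical function name:

```python
from collections import defaultdict
from statistics import mean, stdev


def z_scale_within_groups(values, groups):
    """Z-scale each value using the mean and SD of its own group."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_stats = {g: (mean(vs), stdev(vs)) for g, vs in by_group.items()}
    return [(v - group_stats[g][0]) / group_stats[g][1]
            for v, g in zip(values, groups)]


# Hypothetical day-8 masses (g); the second population is smaller on average.
masses = [7.0, 8.0, 9.0, 5.0, 6.0, 7.0]
populations = ["Seewiesen"] * 3 + ["Bielefeld"] * 3

scaled = z_scale_within_groups(masses, populations)
print(scaled)
```

After this step, a nestling that is average for its own population scores 0 regardless of which population it belongs to, so a pooled model compares relative rather than absolute condition.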
We analyzed the animal model for each population separately, using the same model structure as in the initial test, using the "pedigreemm" function from the R package "pedigreeMM." In the confirmatory models for the Seewiesen population, we removed "author identity" because all birds were measured by the same person.
We metasummarized the effects of (1) maternal mass at 8 days old, (2) similarity of mother-daughter early condition on her daughters' fecundity-related traits, and (3) similarity of father-daughter early condition on his daughters' size (see the "Results" section) in a "lm" model, by fitting the pairwise combination of test (initial or confirmatory) and the three effects as a fixed effect with six levels and the multiplicative inverse of the standard error of each estimate as "weight" (Table S10).

Results
We examined the effects of six variables describing early developmental conditions (potential stressors) on 10 measures of morphology and on 13 aspects of reproductive performance, resulting in 138 predictor-outcome combinations (6 × 23 tests). We thus obtained 138 effect sizes for the direct effects (intragenerational; Table S2), 828 effect sizes for the intergenerational condition-transfer effects (i.e., effects of the early-life experiences of the six ancestors: two parents and four grandparents, 6 × 138; Table S2) and 46 effect sizes for the anticipatory effects (i.e., effects of similarity in nestling mass between mother and offspring and between father and offspring for 23 performance traits; Table S3).
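The counts of effect sizes follow directly from crossing stressors, sources of stress, and traits. The Python sketch below enumerates the combinations; the labels are hypothetical shorthand for the variables named in the text, not identifiers from the published scripts.

```python
from itertools import product

STRESSORS = ["laying_order", "clutch_order", "hatching_order",
             "brood_size", "relative_egg_volume", "nestling_mass_day8"]
ANCESTORS = ["mother", "father",
             "maternal_grandmother", "maternal_grandfather",
             "paternal_grandmother", "paternal_grandfather"]
SOURCES = ["self"] + ANCESTORS  # seven sources of stress in total

MORPHOLOGY = [f"{sex}_{trait}" for sex, trait in product(
    ["female", "male"],
    ["body_mass", "tarsus_length", "wing_length", "fat_score", "beak_color"])]
FITNESS = ["f_clutch_size_cage", "f_clutch_size_aviary", "f_fecundity",
           "m_infertility", "m_within_pair_paternity", "m_siring_success",
           "f_embryo_mortality", "nestling_mortality_social_mother",
           "nestling_mortality_social_father", "f_seasonal_recruits",
           "m_seasonal_recruits", "f_life_span", "m_life_span"]
TRAITS = MORPHOLOGY + FITNESS

direct = list(product(STRESSORS, ["self"], TRAITS))
condition_transfer = list(product(STRESSORS, ANCESTORS, TRAITS))
anticipatory = list(product(["mother", "father"], TRAITS))
mass_only_transfer = list(product(["nestling_mass_day8"], ANCESTORS, TRAITS))

print(len(TRAITS), len(direct), len(condition_transfer),
      len(direct) + len(condition_transfer), len(anticipatory),
      len(mass_only_transfer))
```

The last count is the set of condition-transfer effects that remains when only nestling mass at day 8 is used as the proxy: six ancestors crossed with 23 traits.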

VALIDATION OF STRESSORS USING DIRECT EFFECTS
Of the six putative indicators of early developmental conditions, only one measure had significant consequences for the adult individual (Fig. 1). Nestling body mass measured at 8 days of age affected both adult morphology (mean r = 0.229, 95% CI: 0.186-0.272, P < 0.0001) and reproductive performance (mean r = 0.070, 95% CI: 0.021-0.119, P < 0.0001; Fig. 1 and Table S5). The mass of nestlings when 8 days old varied by a factor of 10 (range: 1.2-12.4 g, N = 3525 nestlings), and light-weight nestlings had a clearly reduced chance of survival to adulthood (see Fig. S2). Among the survivors (N = 3326) and among those individuals included in the analyses of direct and transgenerational effects (N = 2099), mass at day 8 still varied by a factor of 7 (range: 1.7-12.4 g).
Other indicators of developmental conditions, despite being widely used as proxies in the published literature, had little direct effect on the individual later in life. Therefore, in the following analyses we only use nestling body mass at day 8 as the proxy of early-life condition of parents and grandparents to assess the strength of the two types of transgenerational effects. Note that nestling body mass is a measure of an outcome of stress rather than the cause of stress. In contrast, the other five variables represent causes rather than outcomes of stress. However, none of them shows direct effects (Fig. 1), so there seems little scope for detecting transgenerational effects. Nevertheless, Table S2 lists all transgenerational effects (mean effect size r = −0.003, N = 828 effects).

CONDITION TRANSFER
We did not find any evidence for a transgenerational effect of nestling mass either of the parents or of the grandparents on the adult offspring (mean estimate of 138 transgenerational effects after accounting for some level of nonindependence between the response variables r = −0.007, 95% CI: −0.017 to 0.002; Table S8; see also Fig. 2B and D, and Table S6). Among the many correlations examined, only one was significant: the nestling mass of the mother correlated positively with the reproductive performance (clutch size in cages and aviaries, fecundity in aviaries, embryo survival, nestling survival, seasonal recruits, and life span; Table S2) of her daughters (mean r = 0.071, 95% CI: 0.024-0.118, P = 0.003, without correction for multiple testing; Table S6; also see Fig. 2D). This finding was mostly driven by a large positive effect of maternal early growth on daughter fecundity (Fig. 3F), which was even larger than the direct effect of the daughters' own nestling mass at day 8 (Fig. 3E). For all other dependent traits that were influenced by nestling mass, the direct effects (Fig. 3A and C) exceeded the indirect maternal effects (Fig. 3B and D).

Four out of the six indicators of early-life conditions were multiplied by −1 (indicated by (−)) such that positive effect sizes reflect better performance under supposedly better conditions. Morphological and fitness-related traits as well as indicators of early-life conditions were Z-transformed to yield effect sizes in the form of Pearson correlation coefficients.
The direct effects of nestling mass on the individual's adult traits are clearly stronger than expected under a random distribution of effect sizes (Fig. 2E; see also Fig. 2A and C). In contrast, the positive effects of the early-life condition of the mother (Fig. 2F) are not much stronger than the presumably coincidental negative effects (opposite to expectations) of the early-life condition of the father and the grandparents (Fig. 2G and H; see also Fig. S3). The two significant maternal effects (upper right corner in Fig. 2F) are those on daughter fecundity (see Fig. 3F) and on daughter clutch size (r = 0.103, 95% CI: 0.025-0.181, P = 0.01 without correction for multiple testing; Table S2). These findings are not independent because clutch size and fecundity are strongly correlated (r = 0.71, N = 230 females; Fig. S1 and Table S1), partly because they are measured in the same breeding season (N = 183 females).
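The comparison of observed with expected Z-values can be illustrated with a minimal sketch. It assumes that each Z-value is an estimate divided by its standard error and that, under the null hypothesis, Z-values follow a standard normal distribution. The plotting positions (i − 0.5)/n are one common convention and may differ from the one used for the figures, and the input values are hypothetical, not data from this study.

```python
from statistics import NormalDist


def expected_z_values(n):
    """Approximate expected order statistics of n standard-normal Z-values.

    Uses plotting positions (i - 0.5) / n (an assumed convention).
    """
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def zz_pairs(observed_z):
    """Pair each sorted observed Z-value with its expectation under the null."""
    obs = sorted(observed_z)
    return list(zip(expected_z_values(len(obs)), obs))


# Hypothetical Z-values (estimate / standard error), for illustration only.
observed = [-1.2, -0.4, 0.1, 0.3, 0.9, 2.6, 3.1]
pairs = zz_pairs(observed)

for expected, obs in pairs:
    # Points far above the line of identity (obs > expected) in the upper
    # tail suggest effects stronger than expected by chance.
    print(f"expected {expected:+.2f}  observed {obs:+.2f}")
```

Points that follow the line of identity are compatible with chance alone. A systematic excess in the upper tail, as described for the direct effects, is not.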

ANTICIPATORY EFFECTS
Offspring performed significantly better when growing up under similar conditions as their parents (similarity in mass at day 8), but the effect size was small (mean estimate of 46 transgenerational effects after accounting for some level of nonindependence between the response variables r = 0.028, 95% CI: 0.016-0.040; Table S9; see also Fig. 4 and Table S7). This was mainly driven by the positive effects of (1) father-daughter similarity on the daughters' size (i.e., tarsus and wing length) and (2) mother-daughter similarity on the daughters' fitness-related traits (Fig. 4 and Table S7; see also Table S3).
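To make the idea of a meta-analytic mean effect size concrete, the following sketch computes an inverse-variance weighted mean with a normal-approximation confidence interval. This is a simplified fixed-effect summary that assumes independent estimates. It is not the model used in this study, which additionally fitted a random effect to account for some nonindependence. The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_summary(effects, std_errors, level=0.95):
    """Inverse-variance weighted mean effect with a two-sided CI.

    Assumes the estimates are independent; with correlated estimates
    the interval will be too narrow.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * r for w, r in zip(weights, effects)) / total
    se_mean = sqrt(1.0 / total)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * se_mean, mean + z * se_mean


# Hypothetical effect sizes (r) and standard errors, for illustration only.
effects = [0.05, -0.02, 0.03, 0.01]
std_errors = [0.04, 0.05, 0.04, 0.06]
mean, lo, hi = fixed_effect_summary(effects, std_errors)
print(f"mean r = {mean:.3f}, 95% CI: {lo:.3f} to {hi:.3f}")
```

Because the weights grow with the precision of each estimate, the summary interval narrows as more estimates are combined, which is why small mean effects can be statistically significant.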

CONFIRMATORY TESTS ON INDEPENDENT DATA
To independently verify the strongest and most plausible findings of (1) condition transfer from the mother affecting daughter fecundity, (2) anticipatory effects of similarity between mother and daughter in their nestling body mass on daughter fecundity, and (3) daughter body size, we examined an independent dataset (comprising data from several populations, see the "Methods" section for details). All effect sizes of the confirmatory analysis are listed in Table S10. For all three tests, the initial effect size was clearly larger than the independent verification effect size (exploratory vs. confirmatory, detailed in Fig. 5 and Table S10) and, apart from the effect of father-daughter similarity on daughter tarsus length in one of the three populations, none of the confirmatory tests was significant.

Figure 2. Transgenerational condition-transfer effects of early developmental conditions (measured as nestling body mass at 8 days of age). (A-D) Average magnitude of condition-transfer effects from six types of ancestors (B, D) in comparison to the direct effects of the experience of the individual itself (A, C) on morphological (mean of five traits; A, B) and fitness-related traits (mean of six or seven traits; C, D) for individual females (red) and males (blue; Table S6). Error bars show two types of 95% CIs: thick lines refer to the single estimate and thin lines are Bonferroni adjusted for conducting 28 tests (figure-wide significance among A-D). Indicated P-values refer to each average effect estimate without correction for multiple testing. For further explanations see legend of Figure 1. (E-H) ZZ-plots of expected versus observed Z-values of the effects of early developmental conditions (mass at 8 days) experienced by the focal individual itself (E), its mother (F), its father (G) and its four grandparents (H) on 10 morphological and 13 fitness-related traits. N indicates the number of tests. Red indicates that the sign of the estimate is in the expected direction, blue indicates that the sign is in the opposite direction. Lines of identity (where observation equals prediction) and their 95% CIs are shown.
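The difference between the two types of 95% CIs mentioned in the legend of the condition-transfer figure (single estimate versus Bonferroni adjusted for 28 tests) can be sketched as follows. The sketch assumes two-sided intervals and a normal approximation, which may not match the exact procedure used for the figure, and the standard error is a hypothetical value.

```python
from statistics import NormalDist


def ci_half_width(std_error, n_tests=1, alpha=0.05):
    """Half-width of a two-sided CI, Bonferroni adjusted for n_tests.

    Normal approximation; alpha is divided by the number of tests.
    """
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    return z * std_error


se = 0.03  # hypothetical standard error of one average effect estimate
single = ci_half_width(se)                 # unadjusted interval
adjusted = ci_half_width(se, n_tests=28)   # figure-wide adjustment
print(f"single: ±{single:.3f}, Bonferroni (28 tests): ±{adjusted:.3f}")
```

Under these assumptions the adjusted interval is roughly 1.6 times as wide as the unadjusted one, so an estimate can be significant on its own and still fail the figure-wide criterion.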

Discussion
Our study supports the general idea that individuals are resilient to stress and particularly to stress experienced by ancestors. Even though individuals differed sevenfold in body mass when 8 days old, nestling mass only had small effects on morphology and reproductive success later in life. Our results clearly reject the hypothesis of condition transfer between generations, in line with the idea that selection acts against transmitting a handicap to the next generation. We found some evidence for transgenerational anticipatory effects, but the mean effect was small (r = 0.028), and did not hold up in an independent confirmatory test (Fig. 5B and C). These mixed results indicate that, in our study, the effect size for transgenerational anticipatory effects must be exceedingly small (Uller et al. 2013; Horsthemke 2018).
In conclusion, transgenerational effects were absent or minuscule, and direct effects on fitness traits were relatively small given that some of the offspring were seriously undernourished. Thus, at least in this study system, the notion of organismal robustness seems more noteworthy than the claim of sensitivity to early-life conditions within and across generations. Nevertheless, the latter dominates both the literature focused on zebra finches (e.g., Naguib and Gil 2005; Monaghan 2008; Tschirren et al. 2009; Krause and Naguib 2014; Khan et al. 2016; Wilson et al. 2019) and the broader literature (e.g., Pembrey et al. 2006; Marshall and Uller 2007; Uller et al. 2013; Engqvist and Reinhold 2016; Zizzari et al. 2016). This raises the question of whether the underrepresentation of studies emphasizing "robustness" in the literature is the result of the predominant framework of hypothesis testing, where the rejection of the null hypothesis is almost a precondition of getting published (Greenwald 1975).
We found that direct intragenerational effects of early environment on morphology were of moderate magnitude while effects on fitness-related traits were small, which is largely in line with previous findings (Tschirren et al. 2009; Eyck et al. 2019). Regarding transgenerational effects of early stress, we examined the existing zebra finch literature that is mostly based on captive birds (Naguib and Gil 2005; Naguib et al. 2006; Alonso-Alvarez et al. 2007; Krause and Naguib 2014; Khan et al. 2016; Wilson et al. 2019) and found that studies typically report a large number of tests (median number of discussed combinations of stressors, traits and sex: 18, range: 7-150). Only 15% of all tests were statistically significant, which is not far from the random expectation, especially if some nonsignificant findings were not reported. Additionally, an experimental study on zebra finches found no transgenerational anticipatory effect (Krause and Naguib 2014) and a meta-analysis of studies on plants and animals found no effect of transgenerational condition transfer (Uller et al. 2013). Given the small (expected) effect sizes, we argue that transgenerational effects can sensibly only be studied within a framework that ensures a comprehensive reporting of all effect sizes and a meta-analytic summary of these effects. Focus on a subset of tests (e.g., those that are significant) leads to bias, but selective attention may be advisable in two situations. First, when there is an independent selection criterion. For example, we limited our analysis of transgenerational effects to those involving only the most powerful indicator of early developmental conditions. In this case, the selection criterion (magnitude of direct effects; Fig. 1) was established independently of the outcome variable (magnitude of transgenerational effects). Second, when there is an independent dataset. For example, we selected the largest transgenerational effects from a first dataset, and assessed them independently using the second dataset (Fig. 5). Consistent with the phenomenon of the winner's curse, we found that selective attention to large effects yields inflated effect size estimates compared to the independent replication.
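The winner's curse can be illustrated with a small simulation. The sketch below assumes that many estimates share the same small true effect and differ only by normally distributed sampling noise. All parameter values are hypothetical and were chosen for illustration, not taken from this study.

```python
import random
from statistics import mean


def winners_curse(true_effect=0.02, se=0.05, n_tests=20, n_reps=2000, seed=1):
    """Compare the largest exploratory estimate with an independent replicate.

    Returns the average selected ("winning") estimate and the average
    independent re-estimate across n_reps simulated studies.
    """
    rng = random.Random(seed)
    selected, replicated = [], []
    for _ in range(n_reps):
        # Exploratory phase: n_tests noisy estimates of the same true effect.
        estimates = [rng.gauss(true_effect, se) for _ in range(n_tests)]
        selected.append(max(estimates))  # attention goes to the largest one
        # Confirmatory phase: one fresh, independent estimate of that effect.
        replicated.append(rng.gauss(true_effect, se))
    return mean(selected), mean(replicated)


exploratory, confirmatory = winners_curse()
print(f"selected: {exploratory:.3f}, replication: {confirmatory:.3f}")
```

The selected estimates are inflated on average although every underlying effect is identical, whereas the independent re-estimates center on the true value. This is the pattern expected when exploratory findings are compared with confirmatory ones.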
Selective attention to large effects makes the published effect size estimates unreliable. Thus, we propose to base conclusions on meta-analytic averages of all effect sizes that have been judged worthy of investigation before any results were obtained. With this approach we shift our attention from identifying the supposedly best predictor and best response towards quantifying the magnitude of the effect of an average predictor on an average response. Clearly, the latter is more reliable than the former, just as the average of many numbers is more robust than the maximum. Accordingly, the meta-analytic summary yields narrow confidence intervals (CIs) around the estimated mean effect size. Note, however, that the estimated 95% CI might be somewhat anticonservative (i.e., too narrow), because the summarized effect sizes are not fully independent of each other (multiple response variables are correlated; see Fig. S1 and Table S1). In the cases where we summarize a large number of effect sizes (138 estimates in Table S8 and 44 estimates in Table S9) we fitted a random effect that controls for some of this nonindependence, and this led to CIs that are about 20% wider (compared to dropping the random effect). This approach of modeling and quantifying the degree of nonindependence cannot be applied when summarizing only few effect size estimates (between 2 and 13 estimates in Figs. 1, 2, 4, and 5), meaning that the indicated CIs will be somewhat too narrow.

Figure 5. Daughter fecundity-related and size-related traits, mother's mass at 8 days old, and similarity between mother-daughter and father-daughter in nestling mass were Z-transformed to yield effect sizes in the form of Pearson correlation coefficients. Tarsus and wing length were analyzed by population due to the between-population difference in body size (C), where "S," "K," and "B" refer to populations "Seewiesen," "Krakow," and "Bielefeld." For additional details, see Table S10.

In our study, five of six putative indicators of early developmental stress had little or no direct effect on an individual's morphology and fitness later in life (Fig. 1). This suggests that it is not worthwhile to examine these traits for transgenerational effects (Jablonka and Raz 2009; see also Fig. S3), unless one can plausibly assume that some indicators of early stress cause direct effects, while others cause transgenerational effects. This differs from previous studies that showed various direct effects, but did not metasummarize all examined effects, for example, of brood size (Naguib et al. 2004, 2008; Tschirren et al. 2009), laying order (Gorman and Nager 2004; Soma et al. 2007) and hatching order (Wilson et al. 2019). In contrast to the other five variables, nestling mass (8 days old) was clearly associated with both nestling survival (Fig. S2) and adult performance. However, its strongest effect was on morphology (highest r = 0.41; Fig. 3A and Table S2), which is somewhat trivial. Food shortage during the developmental period reduces growth and this in turn affects body size later in life (Bolund et al. 2010). Because body size per se has little direct causal effect on fitness in zebra finches (Bolund et al. 2011), more complete developmental canalization for size-related traits may not have evolved. Indeed, despite large variation in mass at day 8 (1.7-12 g), the effect of nestling mass on reproductive performance and life span was weak, suggesting that fitness is remarkably resilient to variation in early-life conditions (Waddington 1942; Drummond and Ancona 2015).
Note that our study is nonexperimental and on captive individuals. The latter implies that individuals were kept in a safe environment with ad libitum access to food (but with intense social interactions including competition for mates and nest sites). Direct and transgenerational effects on reproductive performance traits may be different in free-living populations, where individuals live and reproduce under potentially more stressful environmental conditions. Additionally, our dataset was not ideal to test "anticipatory parental effects." This hypothesis predicts that offspring have higher fitness when the offspring environment matches the parental environment (e.g., Uller et al. 2013; Engqvist and Reinhold 2016). In an ideal experiment, one would manipulate the parents' and the offspring's breeding environments in a fully factorial design and examine the effects of matching versus mismatching on offspring performance (Monaghan 2008; Uller et al. 2013; Engqvist and Reinhold 2016, 2018). Our study only uses observational data and only regarding the similarity of the early growth environments (but not breeding environments). However, a meta-analysis of experimental studies on plants and animals only found a weak trend for small beneficial anticipatory parental effects (effect size d = 0.186, highest posterior density: −0.030, 0.393) (Uller et al. 2013). Experimental studies are better suited to test causality, but when analyses of observational data suggest no effect, experiments may not provide much insight (note that the 95% CI for the mean effect excluded all biologically relevant effect sizes, for example, the estimated condition-transfer effects ranged from −0.017 to 0.002, and anticipatory effects ranged from 0.016 to 0.040).
Our approach had the advantage that we could make use of the entire range of observed growth conditions (sevenfold difference in mass at day 8), while experimental studies often only induce a 10-15% difference in nestling mass between treatment groups (because ethical concerns prohibit strong treatments; Naguib et al. 2004; Bolund et al. 2010). This then requires much larger sample sizes to detect similar phenotypic effects. Our additional confirmatory datasets had smaller sample sizes than the initial dataset (Fig. 5), and the data were more heterogeneous because they included individuals from different populations that differ in genetic background, body size, and domestication history.
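How quickly the required sample size grows as the expected effect shrinks can be sketched with the standard Fisher z approximation for detecting a correlation. The sketch assumes a simple correlation test, a one-tailed significance level of 0.05, and 80% power. It is an illustration of the general scaling, not a power analysis performed in this study, and the function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation r (one-tailed).

    Uses the Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))^2 + 3.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)
    z_power = nd.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


# Effect sizes chosen for illustration; smaller effects need far larger samples.
for r in (0.2, 0.1, 0.05):
    print(f"r = {r}: n of about {required_n(r)}")
```

Because the required sample size scales roughly with the inverse square of the effect size, halving the expected effect approximately quadruples the number of individuals needed.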
Zebra finches in the wild might breed multiple times across a broad range of conditions (Zann 1996). One might thus question the suitability of zebra finches as a good model for testing "anticipatory parental effects." Hence, the biological conclusions of our study should be taken with caution because our findings on zebra finches might not be representative for a broad range of organisms. In contrast, the meta-analytical method we propose here can be broadly applied (as an alternative to or in combination with preregistration) to ensure that effect sizes are not inflated. Biased reporting presumably occurs in most disciplines, and such biases could explain the discrepancy between our findings and the conclusions of the existing zebra finch literature on early-life and transgenerational effects.
In summary, for future studies on transgenerational effects, we suggest an approach that renders multiple testing a strength rather than a burden and that consists of four simple steps: (i) start with clear, one-tailed hypotheses (Ruxton and Neuhäuser 2010); (ii) validate the proxies by assessing the direct effects (Fig. 1); (iii) summarize all effects in a meta-analysis (Fig. 2); and, if feasible, (iv) verify the effects with an independent confirmatory dataset (Fig. 5). Using this approach, our study shows convincing evidence for small direct effects and, at best, weak evidence for small transgenerational effects on morphology and fitness. Hence, our study supports the null hypothesis that selection buffers individual fitness against detrimental epigenetic effects, such that the detrimental effects due to stress experienced early in life by the ancestors are not carried on across generations (Waddington 1942; Hallgrímsson et al. 2002).

AUTHOR CONTRIBUTIONS
WF and BK designed the study. WF collected the morphological data. YP and WF analyzed the data and interpreted the results with input from BK. YP and WF wrote the manuscript with help from BK.