Animal vocalizations play an important role in individual recognition, kin recognition, species recognition, and sexual selection. Despite much work on these topics in birds, virtually nothing is known about the heritability of avian vocal traits. Here, we study the quantitative genetics of call and song characteristics in a captive population of more than 800 zebra finches (Taeniopygia guttata). We find very high heritabilities in nonlearned female call traits and considerably lower heritabilities in male call and song traits, which are learned from a tutor and hence show much greater environmental variance than innate vocalizations. In both sexes, we find significant heritabilities in several traits, such as mean frequency and measures of timbre, that reflect morphological characteristics of the vocal tract. These traits also show significant genetic correlations with body size, as well as positive genetic correlations between the sexes, supporting a scenario of honest signaling of body size through genetic pleiotropy (“index signal”). In contrast to such morphology-related voice characteristics, classical song features such as repertoire size or song length show very low heritabilities. Hence, these traits, which are often suspected to be sexually selected, would hardly respond to current directional selection.

Animal vocalizations can function as cues for the recognition of particular individuals, kin versus nonkin, or conspecifics versus heterospecifics. Moreover, vocalizations often play a major role in male–male competition and female choice. To fully understand the function and evolutionary implications of vocalizations in each of these contexts, it is important to know the sources of variation in vocal traits. However, the quantitative genetics of vocal traits have been studied in only a few taxa, predominantly in insects (Butlin and Hewitt 1986; Webb and Roff 1992; Aspi and Hoikkala 1993; Ritchie and Kyriacou 1994; DeWinter 1995; Ritchie and Kyriacou 1996; Mousseau and Howard 1998; Collins et al. 1999), but also, for instance, in mice (Thornton et al. 2005). Notably, rather little is known about the genetics of bird vocalizations, even though birds have been studied very intensively in terms of vocal communication. Studies of hybridization and interspecific cross-fostering have made clear that many vocal traits have a hereditary basis (Baptista 1996; Derégnaucourt et al. 2001). However, very few studies have estimated the heritability of vocal traits within a population (Zann 1985; Baker and Bailey 1987; Medvin et al. 1992). Songbirds are particularly interesting in this respect, because their vocal repertoire comprises innate calls as well as learned calls and songs. Vocal production learning (Janik and Slater 2000) occurs in a fairly limited set of taxonomic groups (humans, cetaceans, seals, bats, songbirds, hummingbirds, and parrots; Brainard and Doupe 2002), and accordingly, genetic aspects of learned vocalizations have hardly been studied at all (songbirds: Mundinger 1995; Wright et al. 2004; Haesler et al. 2007; humans: Sataloff 1995; Debruyne et al. 2002).

Acoustic parameters that are commonly used to describe vocalizations may have a hereditary basis because they partly reflect heritable morphological, physiological, and neurological aspects of the sound production device (e.g., Davies and Halliday 1978; Kyriacou and Hall 1980; Fitch and Hauser 1995; Rendall et al. 2005; Reby et al. 2005; Cynx et al. 2005; Pfefferle and Fischer 2006). This is particularly evident in fairly stereotypic, nonlearned vocalizations. The vibration frequency of the sound source is a function of its morphology and elasticity, sound amplitude is a function of the pressure generated, timbre (the distribution of amplitudes over frequencies) depends on the resonance properties of the vocal tract, and the rhythmic timing of vocalizations depends on the speed of the underlying neurological processes as well as on morphological properties of the respiratory system (Fitch and Hauser 2002; Suthers and Zollinger 2004; Goller and Cooper 2004). In principle, each of these properties could facilitate individual recognition, depending on the relative amount of within-individual versus between-individual variation (individual repeatability). Individual recognition by acoustic traits becomes more challenging when vocal production learning adds variability that is specific to what we would call the “text” in humans, or roughly the “syllables” in birds. Humans can recognize “voices” independently of the text that is spoken, and the same ability has been demonstrated for at least one bird species, the great tit (Dhont and Lambrechts 1992; Weary and Krebs 1992; Lambrechts and Dhont 1995; Blumenrath et al. 2007; but see Beecher et al. 1994). Humans recognize voices primarily by their timbre, whereas for birds we know only that their ability to detect variation in timbre is similar to that of humans (Cynx et al. 1990; Lohr and Dooling 1998).

Besides the possibility of studying the genetics of learned vocalizations from this perspective of individually distinct voices, there is also the largely unexplored possibility of heritable learning biases. An individual that is given a wide choice of syllables to learn may, owing to a genetic predisposition, preferentially learn syllables with certain characteristics but not others. Whether such learning biases exist within a single population has never been studied, but there is striking evidence from strains of canaries that have been selected for vocal traits. When given a choice, so-called Roller canaries selectively learn Roller song, whereas Border canaries selectively learn Border song (Mundinger 1995). Moreover, Belgian Waterslager canaries preferentially learn songs with more energy at low frequencies than wild-type canaries do (Wright et al. 2004). This preference results from a loss-of-function mutation located on the Z chromosome that causes a hearing deficit at high frequencies (Gleich et al. 1997; Wright et al. 2004). The hearing deficit in turn results in the preferential learning of low-frequency songs, leading to the coevolution of genes with culturally inherited traits (Lachlan and Feldman 2003). These examples from artificially selected bird strains strengthen the notion that genetic differences in song-learning preferences between species probably contribute substantially to species differences in learned songs (Marler and Sherman 1985; Baptista 1996), and hence to the reproductive isolation of species. More knowledge about such learning biases would greatly enhance our understanding of the processes involved in speciation (see, e.g., Edelaar 2008).

In the present study, we focus on the quantitative genetics of learned versus nonlearned vocalizations in the zebra finch (Taeniopygia guttata). By estimating heritabilities and genetic correlations, we aim to contribute to a better understanding of acoustic traits that may be important in species recognition, kin recognition, individual recognition, and sexual selection. The zebra finch is an interesting study species in this respect because it has both learned and nonlearned vocalizations, which are known to be important in individual recognition (see Miller 1979; Zann 1984; Vignal et al. 2004) and have also been suggested to play a role in kin recognition (Zann 1985, 1997) and in sexual selection (Collins 1999; Neubauer 1999; Spencer et al. 2005; Zann and Cash 2008). Male and female zebra finches produce more or less individually distinct “distance calls,” which seem to function primarily in maintaining acoustic contact with other individuals, especially over longer distances (Zann 1996). This call is innate in both sexes, but males “overwrite” their inherited call during puberty with a call learned from a tutor (the father or another male). In the absence of a tutor, males retain their inherited female-like call (Price 1979; Zann 1985), and normally tutored males can be reverted to their inherited call by specific brain lesions or by transection of nerves important in the production of learned vocalizations (Simpson and Vicario 1990). Presumably because of the added flexibility enabled by learning, male distance calls are much more individually distinct than the innate female distance calls (Zann 1984). By analogy with text in humans, all females would utter the same or almost the same piece of text (e.g., “female”), but with their individually distinct voices, whereas males would use different texts (e.g., “John,” “Peter,” etc.). A young male would adopt the text from his tutor but render it in his own distinct voice.
Besides learning their distance call, male zebra finches also learn their song from a tutor (female zebra finches do not sing). Songs are also individually specific, but are more complex than distance calls because they consist of several syllables that are rendered in a stereotypic order. Young males often learn their entire song from one tutor, but deviations from that are also very common (Zann 1990; Slater and Mann 1991). Human observers can easily learn to recognize males individually by their call or song, probably with the exception of well-matching tutor–pupil pairs (personal observations).

Zann (1985) studied the within-family resemblance of distance calls of cross-fostered male and female zebra finches. He found a significant resemblance in the duration of the distance call between eight daughters and their seven mothers and, fairly surprisingly, between 11 sons and their eight genetic fathers. The latter is surprising because the cross-fostered males had not learned their calls from their genetic fathers. Hence, the finding might indicate either a genetic predisposition to render a learned syllable faster or slower, or a predisposition to preferentially learn from a tutor whose call duration matches the bird's own genetic predisposition (like the preferential learning in canary strains). However, given the small sample size, Zann's (1985) finding is not entirely convincing, and the phenomenon apparently was not investigated further. Instead, articles on kin recognition in zebra finches (Burley et al. 1990; Zann 1997) adopted the view that the significant heritability of call characteristics in females would allow kin recognition even independently of learning, whereas in males acoustic kin recognition would work only if young males learned their vocalizations from their genetic father.

Our primary aim is to study the heritability and genetic architecture of zebra finch vocalizations. This has major implications for at least five fields of research:

  1. Evolution of vocal traits. Whether different vocal characteristics such as pitch, duration, or amplitude can evolve independently of each other depends on the extent to which these traits are affected by the same sets of genes (pleiotropy), that is, on the structure of the genetic variance–covariance matrix. These so-called G-matrices have been studied, for instance, in insects (e.g., Blows et al. 2004), but it is unknown to what extent the evolution of vocal learning alters G-matrices. The zebra finch offers a good opportunity to study how conserved G-matrices are between the sexes and between different vocalizations (innate female calls, learned male calls, and learned male songs).
  2. Sexual selection on song traits. In many passerine species, song complexity (measured as repertoire size) seems to be under directional selection by female choice (Searcy and Yasukawa 1996), and the same has been argued for the zebra finch (Neubauer 1999; Spencer et al. 2005). Airey and DeVoogd (2000) claimed that repertoire size in zebra finches depends on HVC size (higher vocal center in the brain), which was found to be heritable (Airey et al. 2000). However, the heritability of repertoire size itself has never been estimated directly in any species. Such an estimate is needed to predict whether repertoire size would respond to directional selection.
  3. Honest signaling of body size. Larger individuals tend to vocalize at lower sound frequencies (Morton 1977; Mager et al. 2007; Hardouin et al. 2007). It is debated to what extent the honesty of this size indicator is maintained through morphological constraints (“index signal”) or through the condition dependence of costly signals (“handicap signal”; Fitch and Hauser 2002). We address this question for the first time by examining the genetic versus environmental correlations between frequency characteristics and body size.
  4. Kin recognition. Call traits that are genetically correlated between the sexes could potentially function in kin recognition, independently of learning (Burger 2006; Schielzeth et al. 2008).
  5. Evolutionary genetics of vocal learning. The zebra finch is the model species for studying the genetic and neurobiological mechanisms of vocal learning. These studies generally assume that vocal learning takes place in male but not in female zebra finches. Although vocal learning in males is beyond doubt, the absence of vocal learning in females has never been convincingly demonstrated (Zann 1985). We examine this issue by comparing female distance calls with those of their foster mothers, as well as among unrelated females reared together.
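
Point 2 rests on the standard link between heritability and the response to selection. As a reminder (this is the textbook univariate breeder's equation, not a model fitted in this study), the expected response per generation R equals the narrow-sense heritability times the selection differential S, where heritability is additive genetic variance V_A over phenotypic variance V_P:

```latex
R = h^{2} S, \qquad h^{2} = \frac{V_A}{V_P}
```

For illustration only, with hypothetical numbers: a trait with h² = 0.1 under a selection differential of one phenotypic standard deviation would be expected to shift by only about 0.1 standard deviations per generation.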

To measure vocal characteristics, we chose three different approaches, partly so that the approaches could be compared with one another.

  1. Bird calls have traditionally been studied by describing patterns visible in sonograms, that is, call durations, sound frequencies, and changes in frequency over time. We take the same approach using the software package Sound Analysis Pro (Tchernichovski et al. 2004).
  2. Recently, bird researchers (Trawicki et al. 2005; Fox et al. 2006) have begun to study the individuality of bird voices by adopting techniques originally designed for text-independent speaker recognition in humans. These techniques are based on the idea that the frequency characteristics of a sound signal depend (a) on the vibration frequency of the sound source (here the syringeal labia) and (b) on the filtering or resonance properties of the vocal tract (Fant 1960; Titze 1994). These filtering properties lead to a characteristic distribution of sound amplitudes over the frequency spectrum (i.e., timbre). This distribution of energy is quantified by so-called cepstral coefficients, computed after mapping the spectrum onto the mel scale to account for the nonlinear human perception of frequencies. These mel-frequency cepstral coefficients (MFCCs) reflect the resonance properties of the vocal tract, and hence its length, shape, and tissue structure, independently of the sound source. Birds can change these resonance properties by movements of the vocal tract (including the beak; Goller and Cooper 2004), and accordingly timbre has been found to be largely learned from the tutor (Williams et al. 1989). However, active modification of timbre should only be possible within a certain range set by the individually distinct morphology of the vocal tract. A few studies on nonhuman mammals (e.g., Reby et al. 2006) and birds (e.g., Fox et al. 2006) indicate that MFCCs are useful for individual recognition.
  3. Bird song, which is typically more complex than calls, has traditionally been described in terms of its gross structure. We do this by looking at classical features such as song duration, the number of song syllables produced per second, and the number of distinct syllable types in a male's repertoire. These are the traits most frequently studied in relation to sexual selection (Gil and Gahr 2002).

With the first set of features, we hope to mirror primarily the underlying neurophysiology (timing), syrinx morphology (frequencies), and air sac pressure (amplitude). The MFCCs, in contrast, should primarily reflect timbre and hence the vocal tract morphology. Finally, the overall structure of the song might be most indicative of any learning biases (preferential learning of more or less complex songs).



The present study was conducted on a large captive population of zebra finches maintained at the Max Planck Institute for Ornithology in Seewiesen, Germany. This population consists of three successive generations: the parental generation (initially 231 individuals), which came from a population at the University of Sheffield, UK (breeding described in Forstmeier et al. 2004); the F1 generation (initially 309 individuals), produced by 50 different pairs of the parental generation (breeding described in Forstmeier 2005); and the F2 generation, which comprised 415 offspring from F1–F1 pairings and 111 offspring from parental–F1 pairings (from a total of 96 mothers and 91 fathers). Of this pool of 1066 birds, 810 individuals were still alive when vocalizations were recorded, and 808 of them (429 males and 379 females) could be recorded and are included in this study.

The rearing conditions for these 808 birds were as follows: all but 25 individuals (3%) were cross-fostered at the early egg stage and hence were reared to independence by randomly selected foster parents. Mean brood size was 3.4 offspring surviving to fledging. The majority of the 808 birds (88%) were reared by pairs housed in individual cages in rooms containing about 30–70 breeding pairs. The other 12% were reared by pairs in aviaries holding six breeding pairs within rooms containing about 40 breeding pairs. Aviary versus cage rearing did not seem to affect the vocal traits (described further below) in a strong and consistent manner (multivariate analyses of variance [MANOVAs] controlling for mother identity to account for most of the genetic nonindependence: 16 female call traits: Wilks' Lambda = 0.94, F(16, 229) = 0.93, P = 0.53; 16 male call traits: Wilks' Lambda = 0.95, F(16, 258) = 0.81, P = 0.67; 25 male song traits: Wilks' Lambda = 0.91, F(25, 227) = 0.87, P = 0.65). At independence (35 days of age), birds were transferred to peer groups of varying sizes and sex composition, housed in either large cages or aviaries. Altogether there were 76 peer groups: 27 unisex male (mean group size: 14.4, range: 3–75), 25 unisex female (mean group size: 15.0, range: 6–47), and 24 mixed-sex with equal sex ratio (mean group size: 12.2, range: 4–36). Vocal traits did not seem to be strongly affected either by variation in peer group size or by uni- versus mixed-sex rearing (multivariate analyses of covariance [MANCOVAs] on peer group means: group size: female calls: Wilks' Lambda = 0.64, F(16, 31) = 1.08, P = 0.41; male calls: Wilks' Lambda = 0.83, F(16, 33) = 0.44, P = 0.96; male songs: Wilks' Lambda = 0.49, F(25, 24) = 1.01, P = 0.49; uni- versus mixed-sex rearing: female calls: Wilks' Lambda = 0.58, F(16, 31) = 1.37, P = 0.22; male calls: Wilks' Lambda = 0.56, F(16, 33) = 1.60, P = 0.13; male songs: Wilks' Lambda = 0.28, F(25, 24) = 2.46, P = 0.015).
The significant effect of mixed-sex rearing on male song was primarily due to higher frequency modulation and lower VB5 (explained below) in mixed-sex peer groups. However, according to mixed-effect models controlling for mother identity (150 levels; to account for genetic nonindependence) and peer group identity (51 levels), fitted with the lme4 package of R 2.7 (Bates et al. 2008), the effect of mixed-sex rearing did not appear to be very strong (frequency modulation: t(411) = 1.91, P = 0.057, effect size d = 0.30; VB5: t(411) = −2.69, P = 0.007, d = 0.36). Hence, these factors were not considered further but were treated as part of the normal variation among peer groups. After reaching maturity (90–160 days), when vocal production learning is thought to be complete (Brainard and Doupe 2002), birds were also kept in various social situations (breeding pairs or unisex and mixed-sex groups), but again the social context before recording did not seem to have any consistent effects on vocal traits (details not shown).
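
The MANOVA and MANCOVA results above are summarized by Wilks' Lambda, which compares the error (within-group) and hypothesis (between-group) matrices of sums of squares and cross-products (SSCP): Lambda = det(E) / det(E + H), with values near 1 indicating a weak multivariate effect. The sketch below shows only this definition for a plain one-way design; it does not reproduce the control for mother identity or the covariate adjustments used in the reported tests, and the function names are illustrative.

```python
import numpy as np


def one_way_sscp(groups):
    """Return (E, H): within-group and between-group SSCP matrices.

    groups: list of 2-D arrays, one per group; rows are individuals,
    columns are traits.
    """
    arrays = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.vstack(arrays).mean(axis=0)
    p = grand_mean.size
    E = np.zeros((p, p))
    H = np.zeros((p, p))
    for g in arrays:
        group_mean = g.mean(axis=0)
        centred = g - group_mean
        E += centred.T @ centred                 # within-group scatter
        d = (group_mean - grand_mean)[:, None]
        H += g.shape[0] * (d @ d.T)              # between-group scatter
    return E, H


def wilks_lambda(E, H):
    """Wilks' Lambda = det(E) / det(E + H); near 1 means a weak effect."""
    return float(np.linalg.det(E) / np.linalg.det(E + H))
```

Two groups with identical trait distributions give a Lambda of 1, and shifting one group's means pushes Lambda below 1.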

We measured body mass of adult birds on several occasions (on average 2.5 times per individual) as well as tarsus length once or twice per individual. We use average body mass and tarsus length for individuals in further analyses.


For recording, birds were placed in one of five identically built and equipped sound-attenuated chambers with internal dimensions of 70 cm × 50 cm × 50 cm (height). Inside, each bird was kept in a metal wire cage containing food and fresh water. Cages were equipped with three plastic perches, all placed approximately 35 cm from the microphone (Behringer condenser microphone C-2, Behringer International GmbH, Willich, Germany), which was mounted at a 45° angle between the ceiling and the right side wall. The microphone was connected to a PR8E amplifier (SM Pro Audio, Melbourne, Australia), from which we recorded directly through an M-Audio Delta 44 (AVID Technology GmbH, Hallbergmoos, Germany) sound card onto the hard drive of a computer at a sampling rate of 44 kHz and 16-bit amplitude resolution using Sound Analysis Pro version 2.063 (Tchernichovski et al. 2004). For recording we used all default settings of the program, except that we set the minimum duration of vocalizations to 77 ms (the shortest distance call we found was 90 ms).

To record distance calls, birds were placed in the chamber singly for a few hours. Most individuals started producing loud distance calls within the first hour, but some required repeated sessions and longer periods of acclimation. Of 430 males and 380 females, all but three males and one female eventually started calling (after 2–3 days of acclimation at the latest). To record the directed song of males (which is sung during courtship of females; Zann 1996), we added a female to their cage inside the recording chamber. Most males started singing to the female within the first minute, but again some required repeated attempts or longer periods. In this way we recorded the songs of 413 males (96% of all males), including two of the three males that did not produce any distance calls.

Note that, with this setup, the measurement of amplitude is problematic, because it will depend on the position of the bird relative to the microphone and on the distance between the male and the female (because males adjust song amplitude to the distance from the female; Brumm and Slater 2006). Hence, heritability estimates for amplitude may partly reflect heritable differences in the choice of where to sit.

From the large number of calls recorded for every individual, we manually selected about 10–20 loud (mean amplitude > 33 dB at about 35 cm), high-quality recordings (without noise caused by bird movements). For 97% of the individuals we obtained a minimum of eight calls; the remainder included eight individuals with only one to four calls. From the song recordings we selected only two representative, high-quality motifs per individual; the motif is the more or less stereotypically repeated part of a male's song (see Fig. 1C). Whenever males regularly (in more than about 20% of motifs) varied the syllable composition of their motif (27.5% of males), we included one recording of each variant. If there were more than two variants, we picked the two most frequently used ones. For five of 413 males (1.2%) we obtained only a single motif of good quality.

Figure 1.

Sound spectrograms (frequency over time) of (A) distance calls of four females, (B) distance calls of five males, and (C) the repeated unit (motif) of a male's song. The motif shown contains five syllables (delineated by black bars) according to automatic segmentation by SAP with an amplitude threshold of 25 dB and entropy < −2.2.


We used three different approaches to extract call and song characteristics. First, we used the program Sound Analysis Pro (SAP; Tchernichovski et al. 2004; freely available online), which was written specifically for the analysis of zebra finch vocalizations. Second, we analyzed all sound recordings using the program Voicebox (Speech Processing Toolbox for MATLAB, written by M. Brookes, Imperial College, UK) to extract MFCCs. Third, for all song recordings we extracted the most frequently studied zebra finch song parameters by visual inspection of sound spectrograms, partly aided by SAP.

The start and end points of all calls and song syllables were automatically delineated by SAP using a 25 dB threshold for amplitude and a −2.2 threshold for entropy. The results of this automatic segmentation of song syllables can be seen in Figure 1C. Note that some authors have used manual segmentation (e.g., subdividing the third syllable in Fig. 1C into three syllables), which may lead to different estimates of mean syllable duration and repertoire size (see Table S1). Automatic segmentation yielded some very short fragments (5.7%; mean duration: 11 ms, range: 6–25 ms, N = 233), which we excluded from the analyses because their occurrence in a male's song seemed very irregular, as opposed to the remaining longer syllables (mean duration: 142 ms, range: 29–791 ms, N = 3852). Moreover, such short notes could not be analyzed for MFCCs (see below). The delineation of female distance calls was always unambiguous, whereas approximately 40% of males sometimes or regularly produced double calls (two identical calls with a short pause in between). We treated these as two independent calls. However, in a few cases (4%; N = 326) the pause was missing, such that the two calls were joined into one. Such joined calls were treated as one call, which increased the within-individual variation in call duration. In addition, male distance calls were sometimes preceded or followed by so-called “short calls” or “tet calls” (Zann 1996). These were always omitted from the analyses.
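
The double-threshold segmentation described above (amplitude above 25 dB and Wiener entropy below −2.2) can be sketched as a scan over per-frame feature values. This is not SAP's implementation: the per-frame inputs, the frame step `frame_ms`, and the optional `min_duration_ms` cutoff are hypothetical parameters. In the study, short fragments were excluded because of their irregular occurrence rather than by a fixed duration cutoff, so that parameter is only a convenience.

```python
def segment(amplitude_db, entropy, frame_ms, amp_threshold=25.0,
            entropy_threshold=-2.2, min_duration_ms=0.0):
    """Return (onset_ms, offset_ms) for runs of frames that are louder than
    amp_threshold AND have Wiener entropy below entropy_threshold."""
    runs = []
    start = None
    for i, (amp, ent) in enumerate(zip(amplitude_db, entropy)):
        inside = amp > amp_threshold and ent < entropy_threshold
        if inside and start is None:
            start = i                      # a sound segment begins
        elif not inside and start is not None:
            runs.append((start, i))        # segment ends (end frame exclusive)
            start = None
    if start is not None:                  # segment still open at the end
        runs.append((start, len(amplitude_db)))
    result = []
    for s, e in runs:
        onset, offset = s * frame_ms, e * frame_ms
        if offset - onset >= min_duration_ms:
            result.append((onset, offset))
    return result
```

A frame that is loud but noisy (entropy above the threshold) splits a segment, which is one reason why automatic and manual segmentation can disagree.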

After segmentation, we automatically extracted eight parameters for every call and song syllable using SAP (listed in Table 1). These are the eight parameters extracted by SAP version 1.02 that describe mean trait values for a given syllable (rather than minima, maxima, or variances over the course of a syllable). Later versions of SAP, such as version 2.063, which we actually used for extraction, extend this list to a total of 33 parameters (including minima, maxima, and variances). We decided to stick to the eight “traditional” parameters (1) for practical reasons, such as the concise presentation of results, and (2) because preliminary tests showed that the additional parameters either had low individual repeatabilities or were redundant with (i.e., highly correlated with) the traditional ones.
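
Of the eight SAP parameters, Wiener entropy is the least intuitive. A minimal sketch of its usual definition, the log of the ratio between the geometric and the arithmetic mean of a power spectrum, follows. It uses the natural log and takes a precomputed power spectrum; SAP derives its spectra with its own estimator, and the log base it uses is an assumption here, so absolute values (such as the −2.2 threshold) need not match this function's output.

```python
import math


def wiener_entropy(power_spectrum):
    """log(geometric mean / arithmetic mean) of a power spectrum.

    0 for a perfectly flat spectrum (white noise); increasingly negative
    as power concentrates in few frequencies (tending to -inf for a pure tone).
    """
    if not power_spectrum or any(p <= 0 for p in power_spectrum):
        raise ValueError("power values must be positive")
    n = len(power_spectrum)
    log_geometric_mean = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return log_geometric_mean - math.log(arithmetic_mean)
```

By the inequality of arithmetic and geometric means the result is never positive, which matches the scale described in Table 1.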

Table 1. Call and song characteristics measured using Sound Analysis Pro (SAP) and Voicebox (VB).

Parameter | Definition
--- | ---
Duration (ms) | Duration of calls or song syllables as delineated by SAP (thresholds: amplitude > 25 dB, entropy < −2.2)
Mean pitch (Hz) | For harmonic sounds: fundamental frequency; otherwise: mean frequency
Frequency modulation | An estimate of the absolute slope of frequency traces
Entropy | Wiener entropy measures the width and uniformity of a power spectrum on a logarithmic scale: white noise corresponds to 0, a pure tone to minus infinity
Pitch goodness | Measures the “pureness” of a harmonic stack (similar to the harmonic-to-noise ratio used in other studies)
Mean frequency (Hz) | A smooth estimate of the center of derivative power
Amplitude modulation | Changes in the amplitude envelope per unit of time
Amplitude (dB) | Mean amplitude over the duration of a syllable
Voicebox1 | The first mel-frequency cepstral coefficient (abbreviation: VB1)
Voicebox2 | The second mel-frequency cepstral coefficient (abbreviation: VB2)
Voicebox3 | The third mel-frequency cepstral coefficient (abbreviation: VB3)
Voicebox4 | The fourth mel-frequency cepstral coefficient (abbreviation: VB4)
Voicebox5 | The fifth mel-frequency cepstral coefficient (abbreviation: VB5)
Voicebox6 | The sixth mel-frequency cepstral coefficient (abbreviation: VB6)
Voicebox7 | The seventh mel-frequency cepstral coefficient (abbreviation: VB7)
Voicebox8 | The eighth mel-frequency cepstral coefficient (abbreviation: VB8)
Repertoire size | The number of different syllable types in a male's song (excluding the introductory note, which is not part of the motif)
No. of syllables | Total number of syllables in a male's motif (average of two motif recordings)
Motif duration (ms) | Duration from the start of the first to the end of the last syllable in the motif
Syllable rate (s⁻¹) | No. of syllables / (motif duration + one pause between subsequent motifs)
Sound density | Sum of all syllable durations within the motif / (motif duration + one pause between subsequent motifs)
Percent motif duration | Motif duration / (motif duration + one pause between subsequent motifs)
Stereotypy | Proportion of syllables that are the same between the two selected motif recordings of a male
Intrasimilarity | The average similarity coefficient of all possible pairwise comparisons between the syllables in a male's song repertoire
Percent call syllables | The percentage of syllables in a male's song repertoire that are classified by discriminant analysis as a distance call rather than as a song syllable
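
The timing-based song parameters in Table 1 are simple ratios. The sketch below computes them for one motif from syllable onsets and offsets; how the single inter-motif pause is measured is not specified in the text, so `pause_ms` is an assumed input, and the function name is illustrative.

```python
def motif_parameters(syllables, pause_ms):
    """Song-structure parameters for one motif, following Table 1.

    syllables: (onset_ms, offset_ms) pairs in temporal order.
    pause_ms: one inter-motif pause (assumed to be supplied by the user).
    """
    if not syllables:
        raise ValueError("motif contains no syllables")
    motif_ms = syllables[-1][1] - syllables[0][0]   # first onset to last offset
    cycle_ms = motif_ms + pause_ms                    # motif + one pause
    sound_ms = sum(off - on for on, off in syllables)
    return {
        "n_syllables": len(syllables),
        "motif_duration_ms": motif_ms,
        "syllable_rate_per_s": len(syllables) / (cycle_ms / 1000.0),
        "sound_density": sound_ms / cycle_ms,
        # Table 1 defines this as a ratio; multiply by 100 for a percentage
        "percent_motif_duration": motif_ms / cycle_ms,
    }
```

For example, three 100 ms syllables starting at 0, 150, and 300 ms, followed by a 100 ms pause, give a 400 ms motif, a syllable rate of 6 per second, a sound density of 0.6, and a motif-duration ratio of 0.8.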

After automatic delineation by SAP, calls and song syllables were automatically cut using a MATLAB routine to remove any periods of silence. We then extracted the first eight MFCCs for every syllable using Voicebox, analyzing windows of 30 ms (1320 samples at the 44 kHz sampling rate) with a 15 ms overlap. The number of MFCCs was chosen to match the number of SAP parameters (Table 1) to allow a more direct comparison of the explanatory power of the two approaches.
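
The MFCC pipeline can be illustrated with a minimal NumPy sketch using the window settings given above (30 ms windows overlapping by 15 ms at 44 kHz, eight coefficients). It is not Voicebox's routine: the mel formula, the 26-filter triangular filterbank, the Hamming window, skipping coefficient 0, and averaging over frames to obtain one value per coefficient per syllable are all assumptions made for illustration, so absolute values will differ from Voicebox output.

```python
import numpy as np


def hz_to_mel(f):
    # common mel formula (an assumption, not taken from the paper)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):                 # rising edge
            fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):                # falling edge
            fb[i - 1, k] = (right - k) / (right - centre)
    return fb


def mfcc(signal, sr=44000, win_ms=30.0, hop_ms=15.0, n_filters=26, n_coeffs=8):
    """Frame-wise MFCCs: rows are frames, columns are coefficients 1..n_coeffs."""
    signal = np.asarray(signal, dtype=float)
    n_win = int(round(sr * win_ms / 1000.0))   # 1320 samples at 44 kHz
    n_hop = int(round(sr * hop_ms / 1000.0))   # 660 samples, i.e. 15 ms overlap
    if signal.size < n_win:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (signal.size - n_win) // n_hop
    window = np.hamming(n_win)
    fb = mel_filterbank(n_filters, n_win, sr)
    # DCT-II basis; coefficient 0 (overall log energy) is skipped
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_coeffs + 1), k + 0.5) / n_filters)
    out = np.empty((n_frames, n_coeffs))
    for t in range(n_frames):
        frame = signal[t * n_hop:t * n_hop + n_win] * window
        power = np.abs(np.fft.rfft(frame)) ** 2       # power spectrum
        log_energy = np.log(fb @ power + 1e-10)        # log mel-band energies
        out[t] = dct @ log_energy                      # cepstral coefficients
    return out


def syllable_mfcc(signal, sr=44000):
    """One value per coefficient per syllable: the mean over frames (assumed)."""
    return mfcc(signal, sr=sr).mean(axis=0)
```

With a 30 ms window, any syllable shorter than 1320 samples yields no complete frame, which is consistent with the note above that very short notes could not be analyzed for MFCCs.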

Finally, we extracted an additional nine parameters describing the overall structure of male song. Most of these parameters, or very similar measures, have been widely used in the study of zebra finch song (e.g., Holveck and Riebel 2007; Zann and Cash 2008); their definitions are given in Table 1. To decide whether two syllables in a male's song were “the same” or “different,” we used the symmetric pairwise similarity comparison function of SAP (with standard settings only). An overall similarity score larger than 70 was regarded as indicating identity. This similarity function of SAP was also used to calculate the parameter “intrasimilarity,” which reflects the overall uniformity versus diversity of syllables in a male's repertoire. Zebra finch songs typically contain some syllables that resemble typical male or female distance calls (e.g., the last syllable in Fig. 1C). To assess the relative abundance of these call-like syllables in a male's song objectively, we performed a discriminant analysis between the 806 different calls (one per individual) and the 1787 different song syllables (4.3 per male) based on the eight SAP traits and the eight MFCCs. Based on their discriminant scores, 297 of the 1787 song syllables (16.6%) were classified as call-like.
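
Given a matrix of pairwise similarity scores among a male's syllables, repertoire size and intrasimilarity follow from the rules above. In this sketch, two syllables count as the same type when their score exceeds 70, and types are formed as connected groups (so if A matches B and B matches C, all three form one type); that transitive grouping is an assumption, since the text does not say how chains of matches were resolved. The similarity scores themselves would come from SAP and are not computed here.

```python
def repertoire_size(similarity, threshold=70.0):
    """Number of syllable types; syllables i and j are the same type when
    similarity[i][j] > threshold (grouping is transitive: union-find)."""
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                parent[find(i)] = find(j)   # merge the two types
    return len({find(i) for i in range(n)})


def intrasimilarity(similarity):
    """Mean similarity over all distinct pairs of syllables."""
    n = len(similarity)
    pairs = [similarity[i][j] for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        raise ValueError("need at least two syllables")
    return sum(pairs) / len(pairs)
```

For three syllables with pairwise scores of 80, 10, and 20, the first two merge into one type (repertoire size 2) and intrasimilarity is 110/3.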

Upon visual inspection of histograms, 22 of the 25 characteristics showed an approximately normal distribution, the exceptions being pitch, stereotypy, and proportion of call syllables. For pitch, this was resolved by log-transformation, but the other two traits could not be transformed appropriately. We nevertheless analyzed all traits assuming normality; hence, the results for stereotypy and proportion of call syllables should be interpreted with great caution.


To analyze the heritability of vocal traits, as well as the proportion of phenotypic variance explained by several environmental factors, we used pedigree-based animal models (Lynch and Walsh 1998) fitted with REML-VCE 6.0.2 (Groeneveld et al. 2008). This technique of variance component estimation is based on restricted maximum likelihood (REML), and we used it to decompose the total observed phenotypic variance in a trait into the following components: additive genetic variance (as explained by the pedigree information), general maternal environment (random effect of mother identity), general foster environment (random effect of foster pair identity), general peer environment (random effect of peer group identity), and residual variance. Hence, our heritability estimates (additive genetic variance divided by total phenotypic variance) largely reflect narrow-sense heritability, except that some part of the dominance variance and epistatic effects will be included. Dominance and epistasis increase the covariance particularly between full siblings, and because this is not modeled explicitly, these effects will partly inflate the estimates of additive genetic and maternal effects and partly be included in the residual variance component.
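
Once the five variance components have been estimated, heritability and the other proportions are each component divided by their sum, the total phenotypic variance. The component values in the example are invented for illustration and are not results of this study; the estimation step itself (REML in VCE) is not reproduced.

```python
def variance_proportions(components):
    """Divide each variance component by the total phenotypic variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: value / total for name, value in components.items()}


# hypothetical component estimates for one trait (illustration only)
example = {"additive": 4.0, "maternal": 1.0, "foster": 1.0,
           "peer": 2.0, "residual": 2.0}
h2 = variance_proportions(example)["additive"]   # narrow-sense heritability
```

Here h2 is 0.4: additive genetic variance of 4 out of a total phenotypic variance of 10.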

The pedigree we used spans four generations (the first of which is without phenotypic data) and comprises 1221 individuals. Because it was reconstructed using a large number of molecular markers, this pedigree is essentially free of errors. The largest models, such as those for male and female call traits combined, included phenotypic data on 806 individuals, distributed among 182 mothers, 275 foster pairs, and 76 peer groups. The smallest models, such as those for call traits of females alone, comprised 379 individuals with phenotypic data, distributed among 134 mothers, 197 foster pairs, and 49 peer groups.

To explore how much of the phenotypic variation is explained by each factor (genetics, the three environmental factors, and the residual), we first ran a separate model for each vocal trait. These were 16 female call traits, 16 male call traits, and 25 male song traits, hence 57 models in total.

After establishing that there is substantial additive genetic variance for most traits, we were also interested in three kinds of genetic correlation: between the sexes, between male call traits and male song traits, and between vocal traits and body mass. Genetic correlations estimate the extent to which the same genes have correlated effects on two traits (e.g., call duration in females vs. call duration in males). To this end, we initially ran four-trait models. Each model covered four traits: a female call trait, the corresponding male call trait, the corresponding male song trait, and body mass. For each trait, the model estimated all five variance components mentioned above, as well as the 5 × 6 = 30 covariances between the traits. However, many of these models did not reach VCE's convergence criteria (finishing on status 2 or 3). This was probably because of the many parameters (50) to be estimated from a limited dataset.

Hence, we excluded the three random effects representing the environmental components from these models. We estimated only the genetic and residual variance components and the genetic and residual correlations (20 parameters). We expected the heritabilities estimated by these models to be slightly inflated by maternal effects, because full siblings share on average 50% of their genes and 100% of the general maternal effect. We did not expect inflation by foster or peer effects, owing to the well-randomized cross-fostering design. The initial single-trait models (with five variance components) showed maternal effects to be of minor importance. We therefore consider their omission from the four-trait models justified.
In fact, heritability estimates from these four-trait models may be less error-prone because strong genetic correlations facilitate the estimation of variance components, as is reflected by their lower standard errors and fewer negative variance component estimates (which are forced to zero by the program).
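The parameter counts given above follow from the number of unstructured (co)variance matrices in each model. With n traits and c random effects (including the residual), each effect contributes n variances and n(n − 1)/2 covariances. A small check of that arithmetic follows (the function name is ours):

```python
def n_parameters(n_traits, n_components):
    """Number of (co)variances in a multitrait model in which every random
    effect, and the residual, has a full unstructured covariance matrix."""
    variances = n_components * n_traits
    covariances = n_components * n_traits * (n_traits - 1) // 2
    return variances + covariances

full = n_parameters(4, 5)     # genetic, maternal, foster, peer, residual
reduced = n_parameters(4, 2)  # genetic and residual only
print(full, reduced)          # prints: 50 20
```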

To study the genetic relationships between the various vocal traits, we ran separate multitrait models for female calls, male calls, and male songs, always extracting the additive genetic and environmental (i.e., residual) variances and covariances. To reduce problems with scaling, all vocal traits of individuals were z-transformed within their category (female calls, male calls, and songs). Eight-trait models were run for the SAP-traits and for the VB-traits, and a nine-trait model for the parameters describing song structure. To compare G-matrices between the sexes and between calls and songs, we extracted (using MATLAB) the first principal component (eigenvector) describing the main axis of genetic variation in the multidimensional space described by a G-matrix (see Blows 2007), and we then calculated the angle between these first eigenvectors. We used the same approach to compare matrices of environmental variances and covariances.
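The matrix comparison can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' MATLAB code. It takes the eigenvector belonging to the largest eigenvalue of each symmetric matrix and returns the angle between the two vectors. Because the sign of an eigenvector is arbitrary, the angle is folded into the 0–90° range. The example matrices are hypothetical.

```python
import numpy as np

def leading_eigenvector(matrix):
    """First principal component: eigenvector of the largest eigenvalue
    of a symmetric variance-covariance matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)  # ascending eigenvalues
    return eigenvectors[:, -1]

def angle_between_leading_axes(m1, m2):
    """Angle in degrees (0-90) between the main axes of two matrices.
    The absolute dot product is used because eigenvector signs are arbitrary."""
    v1 = leading_eigenvector(np.asarray(m1, dtype=float))
    v2 = leading_eigenvector(np.asarray(m2, dtype=float))
    cos_theta = abs(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_theta, 0.0, 1.0))))

# Hypothetical G-matrices of three z-transformed traits
g_females = [[1.0, 0.6, 0.2], [0.6, 1.0, 0.3], [0.2, 0.3, 1.0]]
g_males = [[1.0, 0.1, 0.5], [0.1, 1.0, 0.0], [0.5, 0.0, 1.0]]
print(f"{angle_between_leading_axes(g_females, g_males):.1f} degrees")
```

Only the first eigenvector is compared here, as in the text. A fuller comparison could also consider further eigenvectors or the whole matrix.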


Repeatability was calculated according to Lessells and Boag (1987). Repeatabilities can range from +1, when all variation is between individuals and none within individuals, to values below zero, when all variation is within individuals and none between individuals. The precise lower limit depends on the number of measurements taken per individual. When the between- and within-individual mean sums of squares are equal, repeatability is zero, and the analysis of variance (ANOVA) yields F= 1 and P ≈ 0.50. We interpret repeatabilities as significantly smaller than zero if the within-individual mean sum of squares exceeds the between-individual mean sum of squares to such an extent that P > 0.975 (corresponding to a two-tailed test). All statistics not otherwise specified were computed in SPSS 15.0.
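The repeatability calculation can be written out from the one-way ANOVA mean squares. The sketch below follows the Lessells and Boag (1987) formulas, including their n0 correction for unequal numbers of measurements per individual. It returns R and the F ratio but not the P value, which would require the F distribution (e.g., from a statistics package). The function name and the toy data are illustrative.

```python
def repeatability(groups):
    """Repeatability (Lessells and Boag 1987) from a one-way ANOVA.

    groups: one list of measurements per individual.
    Returns (R, F), where F = MS_among / MS_within.
    """
    a = len(groups)                                   # number of individuals
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_among = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (n_total - a)
    # n0: effective number of measurements per individual (unequal group sizes)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)
    s2_among = (ms_among - ms_within) / n0            # among-individual variance
    r = s2_among / (ms_within + s2_among)
    f = ms_among / ms_within if ms_within > 0 else float("inf")
    return r, f

print(repeatability([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))  # (1.0, inf)
print(repeatability([[0, 2], [0, 2]]))                   # (-1.0, 0.0)
```

The second example reaches the lower limit of −1/(n0 − 1), which is −1 for two measurements per individual.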

We decided to highlight variance components and genetic correlations as significant if their 95% confidence interval (estimate ± 1.96 SE) does not include zero. Note that this is not equivalent to a hypothesis test, for which one should use a likelihood-ratio test against a model in which the respective component or correlation is constrained to zero. We here prefer the former approach, because the aim of our study is to estimate effect sizes rather than to test specific hypotheses.
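The highlighting rule amounts to checking whether the interval estimate ± 1.96 SE excludes zero. A minimal sketch follows, applied to a few estimates from Table 2 (function names are ours):

```python
def ci95(estimate, se):
    """Approximate 95% confidence interval: estimate +/- 1.96 SE."""
    return estimate - 1.96 * se, estimate + 1.96 * se

def excludes_zero(estimate, se):
    """True if the approximate 95% interval does not include zero."""
    low, high = ci95(estimate, se)
    return low > 0.0 or high < 0.0

# (estimate, SE) pairs taken from Table 2
examples = {
    "pitch goodness, genetic correlation male-female": (0.586, 0.219),
    "mean pitch, genetic correlation male-female": (0.035, 0.152),
    "amplitude mod., male song h2": (0.102, 0.049),
    "mean pitch, male song h2": (0.076, 0.077),
}
for label, (est, se) in examples.items():
    print(f"{label}: {excludes_zero(est, se)}")
```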


Table S1 shows means and standard deviations of vocal traits on the basis of single syllable recordings. This variance is first decomposed into within-individual variance and between-individual variance (see repeatability analyses below). The subsequent analyses of heritability are based on individual means of vocal traits, the means and standard deviations of which are shown in Table S2.


The individual distinctness of distance calls is reflected in the high individual repeatabilities of vocal traits (Table S3). On average, repeatabilities were higher for males (R= 0.67) than for females (R= 0.56). They were also higher for SAP-traits (R= 0.69) than for the timbre-related VB-traits obtained from the Voicebox software (i.e., the MFCCs, R= 0.54). The directed songs of the 413 recorded males were made up of an average of 4.3 different syllable types per male (yielding 1787 syllables in total). The repeatability of the acoustic traits of these 1787 syllables (i.e., the same syllable sung twice by the same male) was even higher than the individual repeatability of call traits (SAP: R= 0.92; VB: R= 0.81). The increasing repeatabilities from female calls to male calls to male song syllables can be partly explained by the increasing overall variability (Fig. 1; Tables S1 and S3).

To address the question of syllable-independent individual recognition one can compare (1) a male's song syllables with each other and (2) a male's song with his distance call. The first comparison is shown on the x-axis of Figure 2, the second on the y-axis.

Figure 2.

Repeatability of 16 vocal traits within individual males. The x-axis shows the individual repeatability of vocal traits comparing a male's song syllables with each other (only including different syllable types, i.e., where the similarity score < 70). The y-axis shows the strength of the correlation (Pearson's r) between males’ average song traits and the traits of their distance calls. The hatched lines delineate the range where neither the repeatability within song nor the correlation between song and call is significantly different from zero. SAP traits are labeled with a short name (for details see Table 1) and the numbered Voicebox traits refer to the eight MFCCs.

Of the eight SAP-traits, mean frequency was the only trait with an individual repeatability (among a male's song syllables) significantly larger than zero (R= 0.047; F(412,1374)= 1.21; P= 0.007). Remarkably, three traits (duration, pitch, and entropy) showed significantly greater variance within individuals than between individuals (referring to mean sums of squares), as reflected by their negative repeatability estimates. This indicates that, with regard to these traits, the syllables within a male's song are more diverse than randomly picked syllables. In strong contrast, all VB-traits (except VB4) were significantly repeatable (average R= 0.144), as expected for a measure of voice.
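Because R is a function of the F ratio and n0 alone, the reported value can be roughly cross-checked. The sketch below rearranges the Lessells and Boag (1987) formula as R = (F − 1)/(F − 1 + n0). Approximating n0 by the mean number of syllables per male (1787/413) is our assumption; the exact n0 for unequal sample sizes is slightly smaller. The sketch also shows why F < 1 gives a negative repeatability.

```python
def repeatability_from_f(f, n0):
    """Lessells and Boag (1987) repeatability written in terms of the ANOVA
    F ratio: R = (F - 1) / (F - 1 + n0)."""
    return (f - 1.0) / (f - 1.0 + n0)

n0_approx = 1787 / 413  # mean syllables per male; an assumption for the true n0
r_mean_freq = repeatability_from_f(1.21, n0_approx)
print(f"R for mean frequency: {r_mean_freq:.3f}")  # close to the reported 0.047
print(f"R when F = 0.8: {repeatability_from_f(0.8, n0_approx):.3f}")  # negative
```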

When comparing the average song traits of males (averaged among the 1–10 song syllables per male) to the traits of their distance calls, significant correlations were found for all SAP-traits but pitch (average r= 0.13), and for all VB-traits (r= 0.19; for details see Table S3). This shows that syllable-independent individual recognition would be possible in principle, but given the weak correlations and the limited number of different syllables per individual (one call and 4.3 song syllables) the scope for such recognition would be practically very constrained in the zebra finch.


The syllable-independent individual repeatability of VB-traits (MFCCs) is thought to be caused by the resonance properties of the vocal tract. We have no measurements of individual vocal tract morphology, but body mass should at least correlate with the length of the vocal tract. Accordingly, we found that all VB-traits except VB4 were significantly correlated with female body mass and, to a lesser extent, with male body mass (Fig. 3A; Table S4). Among the SAP-traits, only mean frequency showed substantial correlations with body mass.

Figure 3.

The strength of (A) phenotypic, (B) genetic, and (C) environmental correlations of call characteristics with body mass in males and females. The hatched lines in (A) delineate the range where neither the correlation of female call traits with female body mass nor the correlation of male call traits with male body mass is significantly different from zero. Standard errors for genetic and environmental correlations are shown. SAP traits are labeled with a short name (for details see Table 1) and the numbered Voicebox traits refer to the eight MFCCs.

Decomposing these phenotypic correlations with body mass into their additive genetic component (genetic correlations, Fig. 3B) and environmental component (residual correlations, Fig. 3C) showed that the underlying reasons were primarily genetic in nature (Table S5). In other words, a large proportion of the genes that affect body mass have correlated effects on vocal traits. These effects are relatively large (compared to the smaller phenotypic and environmental correlations) and they are similar between the sexes. The much weaker and sometimes opposing environmental correlations indicate that environmental factors tended to blur rather than enhance the body-size indicator function of vocal traits, and especially so in males due to the relatively greater weight of environmental effects (Fig. S1).
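A strong genetic correlation can coexist with a weaker phenotypic correlation. This follows from a standard quantitative-genetic identity (e.g., Falconer and Mackay 1996). For two traits with heritabilities h²x and h²y, the phenotypic correlation is r_P = h_x h_y r_G + e_x e_y r_E, where h = √h² and e = √(1 − h²). The sketch assumes, as in the two-component four-trait models, that phenotypic variance consists only of additive genetic plus residual variance. The input values in the example are hypothetical.

```python
import math

def phenotypic_correlation(h2_x, h2_y, r_g, r_e):
    """Phenotypic correlation implied by genetic (r_g) and residual (r_e)
    correlations, assuming V_P = V_A + V_residual for both traits."""
    h_x, h_y = math.sqrt(h2_x), math.sqrt(h2_y)
    e_x, e_y = math.sqrt(1.0 - h2_x), math.sqrt(1.0 - h2_y)
    return h_x * h_y * r_g + e_x * e_y * r_e

# Hypothetical values: a call trait of low heritability, a heritable body mass,
# a strong negative genetic correlation and a weak opposing residual correlation
r_p = phenotypic_correlation(h2_x=0.2, h2_y=0.6, r_g=-0.6, r_e=0.1)
print(f"implied phenotypic correlation: {r_p:.2f}")
```

The example shows how a weak residual correlation of opposite sign can dilute a strong genetic correlation at the phenotypic level.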


Heritabilities of SAP-traits were generally higher than those of VB-traits. As expected, heritabilities were highest for female calls, intermediate for male calls, and lowest for male song (Fig. 4; for details see Tables S6–S8). To understand why heritabilities were lower in males than in females, we compared the amounts of additive genetic variance and of environmental variance between the sexes (computed from h² in Table 2 and SD² in Table S2). Male call traits showed only slightly, and nonsignificantly, lower additive genetic variance than the same traits in female calls (median ratio across 16 traits: 84%; paired t-test of log-transformed variances t(15)= 1.2, P= 0.26). Their environmental variance, however, was significantly greater (median: 156%, t(15)=−3.3, P= 0.005). Male songs showed clearly lower genetic variance than female calls (median: 31%, t(15)= 4.3, P= 0.0007) and greater environmental variance (median: 167%, t(15)=−2.4, P= 0.029).
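The comparison of variance components between the sexes can be sketched as follows. Additive genetic variance is taken as h²·SD² and the remaining (environmental) variance as (1 − h²)·SD². The sexes are then compared with a paired t statistic on log-transformed variances, with n − 1 degrees of freedom. The per-trait numbers in the example are hypothetical. The P value, which would come from a t distribution (e.g., a statistics package), is not computed here.

```python
import math
import statistics

def split_variance(h2, sd):
    """Additive genetic (h2 * SD^2) and environmental ((1 - h2) * SD^2) variance."""
    vp = sd ** 2
    return h2 * vp, (1.0 - h2) * vp

def paired_t_log(a, b):
    """Paired t statistic for log(a_i) - log(b_i); degrees of freedom = len(a) - 1."""
    d = [math.log(x) - math.log(y) for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Hypothetical (h2, SD) pairs for three traits in each sex, on a common scale
female = [(0.60, 1.0), (0.55, 1.1), (0.40, 0.9)]
male = [(0.20, 1.2), (0.18, 1.3), (0.30, 1.1)]
va_f, ve_f = zip(*(split_variance(h2, sd) for h2, sd in female))
va_m, ve_m = zip(*(split_variance(h2, sd) for h2, sd in male))

median_va_ratio = statistics.median(m / f for m, f in zip(va_m, va_f))
median_ve_ratio = statistics.median(m / f for m, f in zip(ve_m, ve_f))
print(f"male/female V_A: {median_va_ratio:.0%}, t = {paired_t_log(va_f, va_m):.2f}")
print(f"male/female V_E: {median_ve_ratio:.0%}, t = {paired_t_log(ve_f, ve_m):.2f}")
```

The argument order (female, male) gives positive t when female variances are larger, matching the signs reported above.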

Figure 4.

Variance component estimates for vocal traits measured with Sound Analysis Pro (SAP) and Voicebox (VB) as well as for structural characteristics of song (Other). The y-axis shows the proportion of phenotypic variance explained by additive genetic effects (Genetic), maternal effects (Maternal), foster environment effects (Foster), and peer group effects (Peer). Medians of the estimates for eight (SAP and VB) or nine (Other) characteristics are shown. Asterisks indicate the approximate significance of these median values (for details see Tables S6–S8). *P < 0.05; **P < 0.01; ***P < 0.001.

Table 2. Heritability estimates (h²) ± SE for traits of female calls, male calls, and male songs, together with genetic correlations ± SE between traits of female and male calls and between traits of male calls and male songs. Bold print highlights estimates that are more than 1.96 SE from zero. These estimates are derived from four-trait animal models that include body mass (see Table S5) besides the female call trait, male call trait, and male song trait.

| Trait | Female call h² | Genetic correlation male–female | Male call h² | Genetic correlation call–song | Male song h² |
|---|---|---|---|---|---|
| Mean pitch | 0.637 ± 0.080 | 0.035 ± 0.152 | 0.167 ± 0.051 | 0.939 ± 0.450 | 0.076 ± 0.077 |
| Frequency mod. | 0.547 ± 0.057 | 0.009 ± 0.151 | 0.172 ± 0.076 | 0.541 ± 0.255 | 0.275 ± 0.088 |
| Pitch goodness | 0.676 ± 0.088 | 0.586 ± 0.219 | 0.113 ± 0.083 | 0.598 ± 0.304 | 0.126 ± 0.088 |
| Mean frequency | 0.391 ± 0.048 | 0.456 ± 0.180 | 0.299 ± 0.066 | 0.572 ± 0.231 | 0.101 ± 0.061 |
| Amplitude mod. | 0.655 ± 0.080 | 0.214 ± 0.160 | 0.180 ± 0.065 | 0.901 ± 0.282 | 0.102 ± 0.049 |
| SAP median | 0.618 | 0.255 | 0.187 | 0.722 | 0.114 |
| VB median | 0.297 | 0.560 | 0.228 | 0.192 | 0.100 |

Significant genetic correlations between the sexes were found for three of eight SAP-traits (mean of three traits: r= 0.59) as well as for six of the eight VB-traits (mean of six traits: r= 0.67; Table 2), indicating a shared genetic basis for these traits in females and males. Genetic correlations between male call traits and male song traits were significant for seven of eight SAP-traits as well as for two of the eight VB-traits.

Genetic (and environmental) correlations among different traits (such as duration, pitch, or amplitude) showed some similarities but also pronounced differences between female and male calls (Tables S10 and S11). Consequently, the first principal components of the female call and male call G-matrices (PCg) were oriented at an angle of 40° (SAP-traits) and 46° (VB-traits), where 0° stands for identical orientation and 90° for orthogonality. The angles between the respective first principal components of the environmental variance–covariance matrices (PCe) were 21° (SAP) and 60° (VB). Dissimilarities in G- and E-matrices were even slightly more pronounced between male calls and male songs (PCg: SAP 35°, VB 85°; PCe: SAP 36°, VB 58°).

It seems noteworthy that classical song traits that have been widely studied in passerines, such as repertoire size, showed very low heritability estimates. Heritability estimates derived from multiple-trait models (Table S12) tended to be slightly higher (median h²= 0.112) and more precise (median SE = 0.050) than those from single-trait models (Table S8, median h²= 0.069, median SE = 0.081), as expected from the increased power of multiple-trait models. Among the structural song traits, percentage motif duration showed the highest heritability (h²= 0.24). However, this trait might be more a measure of the motivation to sing (with short pauses) than an acoustic trait.


Maternal effects on all call and song traits were very small and only rarely significant (Fig. 4; Tables S6–S8). If anything, some VB-traits of males may have been affected.

The variance component estimates for the foster environment were also relatively small, but, as expected, tended to be larger for males than for females (Fig. 4; Tables S6–S8). A more powerful test for foster effects is the direct comparison of vocal traits between offspring and foster parents (Table S9). These tests confirm that daughters did not resemble and hence did not learn from their foster mother (mean r=−0.01). In contrast, sons tended to slightly resemble their foster fathers in call traits (mean r= 0.13), but less so in song traits (mean r= 0.07).

Finally, female call traits did not depend on the peer group in which females grew up (Fig. 4, Table S6). Some male call traits (SAP-traits) were significantly affected by the peer group (Table S7), and many male song traits depended significantly on the peer group (Table S8). The peer group variance component estimates nevertheless remained relatively small (all < 0.185). Had the members of a peer group (mean group size 10.4 males, range: 2–75) fully converged on a group-specific call or song, all variance components other than that of the peer group would approach zero. The small estimates therefore indicate that convergence occurred instead in smaller subgroups within the peer group (W. Forstmeier, unpublished data).


To illustrate the extent to which heritable call traits could serve for kin discrimination, we selected, in each sex, the six largest full-sibling families. Inclusion criteria were a minimum of six daughters or seven sons, respectively, per family, and no shared grandparents between families. It should be noted, however, that all families originated from the same (fairly panmictic; see Forstmeier et al. 2007) population. For each sex, the 16 call traits were entered into a discriminant analysis in a stepwise forward manner (including only significant predictors) to separate the six families. Individual scores on the first two discriminant axes (out of five axes in females and four in males) are shown in Figure 5.

The ability to assign individuals correctly to their family was tested by leave-one-out cross-validation. Each individual in turn was removed from the dataset used for calculating the discriminant functions and was then classified based on these functions. In females, 32 of 53 individuals (60.4%) were assigned to the correct family. This differs strongly from random assignment (i.e., 16.7%; effect size w= 1.17). The predictors in this analysis were VB8 (significance of F to remove: P= 0.0001), amplitude (P= 0.0009), pitch (P= 0.005), VB6 (P= 0.021), and VB1 (P= 0.023). In males, 20 of 48 individuals (41.7%) were correctly assigned (w= 0.67). Here the predictors were VB8 (P < 0.0001), VB4 (P= 0.0005), pitch (P= 0.0006), and VB7 (P= 0.036).
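The text does not state how the effect size w was computed. One plausible reconstruction treats it as Cohen's w for a two-cell goodness-of-fit comparison (correct vs. incorrect assignments) against the chance expectation of 1/6. This reconstruction is ours, but it reproduces both reported values.

```python
import math

def cohens_w_classification(n_correct, n_total, n_groups):
    """Cohen's w for correct assignments versus the chance expectation
    1/n_groups (chi-square goodness of fit with two cells)."""
    p0 = 1.0 / n_groups
    observed = [n_correct, n_total - n_correct]
    expected = [n_total * p0, n_total * (1.0 - p0)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.sqrt(chi2 / n_total)

print(round(cohens_w_classification(32, 53, 6), 2))  # females: 1.17
print(round(cohens_w_classification(20, 48, 6), 2))  # males: 0.67
```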

Figure 5.

(A) Discriminant analysis separating 53 females belonging to six different families (represented by different symbol types) according to five vocal traits (pitch, amplitude, VB1, VB6, and VB8). Cross-validation assigns 32 females (60%) to the correct family. (B) Discriminant analysis separating 48 males belonging to six different families according to four vocal traits (pitch, VB4, VB7, and VB8). Cross-validation assigns 20 males (42%) to the correct family.


Our study confirms that the characteristics of female distance calls are highly heritable (Zann 1985). We show that females did not learn any of the traits investigated from their foster mothers and that call traits did not converge among female nest sibs or peer group members. Males, in contrast, learned their vocalizations partly from the foster father and partly from peer group members. As a consequence of learning from unrelated individuals, heritabilities were lower in males. Learning adds variability that is specific to the type of syllable that is learned, which increases environmental variance. However, independent of learning, heritable voice characteristics still lead to a resemblance among relatives. To a certain degree, this would allow kin discrimination even in the absence of learning from the genetic father (see Burley et al. 1990; Zann 1997).

We found several strong genetic correlations in vocal traits between the sexes as well as between male calls and songs, indicating a shared genetic basis. However, the genetic covariances among different vocal traits (G-matrices) differed considerably between the sexes as well as between calls and songs. We suggest that vocal production learning introduces new levels of acoustic complexity that substantially alter the genetic relationships between vocal traits. Hence, extrapolations from zebra finch song G-matrices to other passerine species are not warranted.

Also, we show that frequency and timbre characteristics might function as indicators of body size. Genetic correlations with body size reinforce the honesty of this signal. Vocal production learning in males, by contrast, adds noise to this relationship and thereby weakens its indicator function (as compared to females). In contrast to voice characteristics, structural traits of male song, such as repertoire size, showed extremely low heritabilities. These classical song features are often suspected to be under directional selection by female choice (Searcy and Yasukawa 1996; Neubauer 1999; Gil and Gahr 2002; Spencer et al. 2005). Given their low heritabilities, however, they would hardly respond to selection at all. It should also be noted that clear evidence for directional selection on zebra finch repertoire size is still missing (see below).


Our study shows that syllable-independent individual recognition by voice characteristics (analogous to text-independent speaker recognition in humans) would be possible in principle. However, this may not be of much practical importance. A possible exception is well-matched tutor–pupil pairs, where the syllable types are the same but the voices differ.

The high individual distinctness of male call and song syllable characteristics (Table S3) confirms previous studies showing that these vocalizations are highly suitable for individual recognition (Zann 1984; Vignal et al. 2004). A male's vocal repertoire is so small, so stereotyped, and so individually specific that recognition by syllable type (i.e., “text”) is much more likely to be the norm than recognition by voice. In that respect, the zebra finch probably resembles the song sparrow, where individual recognition seems to be based on song types rather than voice characteristics (Beecher et al. 1994). In contrast, in great tits, where song type sharing between males is much more common, individual recognition by voice seems to work fairly reliably (Weary and Krebs 1992; Blumenrath et al. 2007).

Future studies could investigate whether zebra finches are able to distinguish the vocalizations of well-matched tutor–pupil pairs by their individually distinct voice characteristics. In such matched cases, voice characteristics may actually be much easier to pick up than the low correlation coefficients in Figure 2 suggest. It has to be kept in mind that, in our analyses, variation due to different syllable types adds a lot of noise to the data. This leads to only moderate individual repeatabilities across syllable types.
Also, some syllable types or parts of syllables might not be “voiced” vocalizations, that is, produced by the syrinx in the way human vowels are produced by the larynx. Instead, they might come from different sources (like most human consonants), which would reduce the repeatability of MFCCs between syllable types.

Incidentally, Figure 2 also disproves a common misconception (see Falconer and Mackay 1996). It is widely believed that measurements have to be significantly repeatable to yield a meaningful average value. In contrast, our data show that averages of several measurements with individual repeatability smaller than zero (x-axis) can still show significant correlations with another trait (y-axis) and even be significantly heritable (Table S8). This counterintuitive finding can be understood if we postulate two opposing mechanisms: (1) genetic differences between individuals in their trait means and (2) a tendency to maximize within-individual diversity in syllables. If the second mechanism is sufficiently strong, there will be more within- than between-individual variation (referring to mean sums of squares, hence repeatability < 0), but after averaging, individual means will still reflect the underlying genetic differences (see also Dohm 2002). The finding of significantly lower amounts of additive genetic variance in SAP and VB-traits of male songs as compared to female calls may be a byproduct of such a learning strategy that maximizes within-song diversity of syllables. It may also be that additive genetic differences between individuals in trait means have been reduced by, for example, stabilizing selection through female choice.
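The two opposing mechanisms can be illustrated with a toy simulation. The following sketch is purely hypothetical and not fitted to the zebra finch data. Each male has a genetic mean for a trait, and his four syllables are deliberately spread around that mean. This spread is a crude stand-in for a strategy that maximizes within-song diversity. The among-syllable repeatability then comes out negative. Yet the male means still track the genetic values, and they correlate with a second trait (here a call trait) that shares them.

```python
import random

def mean_squares(groups):
    """Among- and within-individual mean squares of a one-way ANOVA
    (balanced design: the same number of measurements per individual)."""
    k, a = len(groups[0]), len(groups)
    means = [sum(g) / k for g in groups]
    grand = sum(means) / a
    ms_among = k * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (a * (k - 1))
    return ms_among, ms_within, means

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(42)
n_males, k = 400, 4
spread = 3.0                      # strength of within-male diversification (hypothetical)
offsets = [-1.5, -0.5, 0.5, 1.5]  # syllables spread evenly around each male's own mean
genetic = [random.gauss(0.0, 1.0) for _ in range(n_males)]
songs = [[g + spread * o + random.gauss(0.0, 0.1) for o in offsets] for g in genetic]
calls = [g + random.gauss(0.0, 1.0) for g in genetic]  # second trait sharing the genetic mean

ms_a, ms_w, male_means = mean_squares(songs)
r = ((ms_a - ms_w) / k) / (ms_w + (ms_a - ms_w) / k)  # Lessells & Boag with n0 = k
print(f"repeatability among syllables: {r:.2f}")
print(f"r(song mean, genetic value): {pearson(male_means, genetic):.2f}")
print(f"r(song mean, call trait): {pearson(male_means, calls):.2f}")
```

The offsets sum to zero within each male, so averaging removes the diversification and leaves the genetic signal. This is the situation described above, in which the within-individual mean square exceeds the among-individual one.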

The remarkably low repeatability of pitch (Fig. 2) is probably because pitch estimation in SAP switches between the fundamental frequency in harmonic stacks when goodness of pitch is high and the much higher mean frequency when goodness of pitch is low. Hence this result might be an artifact of SAP's variable way of pitch estimation.


Strictly speaking, our estimates of heritability are valid only for the environmental conditions experienced by our laboratory birds. In the wild, birds would typically grow up with their genetic parents (except for cases of extra-pair paternity and egg dumping) rather than with foster parents, and would mostly be able to stay with their parents for longer than in our captive setting. Environmental conditions during song learning (mostly around days 40–90) may be harsher in the wild with regard to nutrition, but maybe less intense with regard to social interactions. However, we have little evidence that the vocal traits we studied were sensitive to early nutrition or peer group size and composition (W. Forstmeier and E. Bolund, unpublished data; see also the Methods section).

The most commonly described characteristics of nonlearned bird vocalizations are measures of duration and frequency. In our domesticated population of zebra finches, these traits showed a heritability of around 60%. This agrees well with the findings of Zann (1985), who studied zebra finches caught directly from the wild in Australia. Two other studies have published data on nonlearned bird vocalizations from which broad-sense heritabilities can be calculated. Duration and frequency characteristics of separation calls of northern bobwhites showed an average heritability of 56% (calculated from Table 3 in Baker and Bailey 1987). Those of nestling begging calls showed heritabilities of around 60% in barn swallows and 93% in cliff swallows (calculated from Table 1 in Medvin et al. 1992). Hence, in nonlearned vocalizations, heritabilities of around 60% seem to be the default. The higher value in cliff swallows might be interpreted as resulting from selection increasing between-brood differences to facilitate offspring recognition in these colonial breeders (Medvin et al. 1992).

Two small-scale twin studies on voice characteristics in humans indicate that speaking fundamental frequency may show a heritability of around 40–80% (Przybyla et al. 1992; Debruyne et al. 2002). Hence, when the spoken text is controlled for, heritability is in the range found for nonlearned bird vocalizations. In contrast, the characteristics of learned vocalizations in male zebra finches showed lower heritabilities owing to the added variability that is specific to the type of syllable that is uttered. Here, the highest heritabilities were found in traits that are most closely linked to morphological or physiological constraints (e.g., mean frequency and the cepstral coefficients VB1–8 reflecting timbre). Accordingly, several of these traits showed significant genetic correlations with body size, as well as significant genetic correlations between the sexes.

In contrast to morphology-related voice characteristics, classical song traits such as repertoire size, motif length, or syllable rate showed very low heritabilities. This is at odds with the claim that zebra finch repertoire size depends on the size of the higher vocal centre HVC in the brain (Airey and DeVoogd 2000) and that HVC volume is heritable (Airey et al. 2000). Heritable variation in brain morphology is very likely to exist, but the correlation between repertoire size and HVC volume in zebra finches seems questionable. The analysis by Airey and DeVoogd (2000) suffers from a multicollinearity problem (their Table 2), and reanalysis of their Table 1 yields a correlation of only r= 0.18, P= 0.44. This agrees with another study by Ward et al. (1998), which also found r= 0.18, P= 0.59. Hence, HVC volume may be heritable but seems unrelated to repertoire size at the intraspecific level.

It would be tempting to argue that past selection by female choice has depleted the additive genetic variance for repertoire size, explaining its low heritability. However, we have limited confidence in the notion of directional selection on repertoire size in this species. Most claims of female preferences are indirect (Neubauer 1999; Collins 1999; Spencer et al. 2005; Zann and Cash 2008). We found no such preferences in either choice tests or aviary breeding experiments, and also no relationship between repertoire size and male fitness in aviaries (W. Forstmeier and E. Bolund, unpublished data). Compared to other passerine species, zebra finches have a strikingly simple song that is highly stereotyped within males but also highly variable between males (Table S3). This suggests a function for signaling identity rather than for signaling learning ability via exaggerated complexity.

Still, the low heritability of repertoire size (h²= 0.081 ± 0.051) and motif length (h²= 0.178 ± 0.053) seems remarkable. These song features appear to differ genetically (rather than culturally) between the two subspecies of the zebra finch (Clayton 1990; see also Kroodsma and Canady 1985). Apparently, within our population (of the Australian subspecies) hardly any heritable learning biases exist such that some genotypes would preferentially learn from tutors with small repertoires and others from tutors with large repertoires. Patterns of song convergence within peer groups (Table S8; W. Forstmeier, unpublished data) suggest that our young males housed in larger peer groups predominantly learned from each other. This agrees with the study of Volman and Khanna (1995) and implies that these males indeed faced a wide choice of potential tutors. Hence, our design should have been well suited to detecting heritable learning biases (resembling those described in canaries; Mundinger 1995; Wright et al. 2004).

Although motif length showed little heritable variation, there was remarkable additive genetic variation for the duration of single syllables within songs, and this was genetically correlated with call duration. Additive genetic variation for call duration is apparently also present in the nonlearned calls of pigeons and chickens, as evidenced by successful selective breeding on call duration in some pigeon and chicken breeds (Baptista 1996).

Wright et al. (2004) found that the gene responsible for the learning bias in Waterslager canaries is located on the Z chromosome, and they highlight the fact that many genes controlling sexually selected traits are Z-linked (Iyengar et al. 2002; Price 2002). We tested for the possibility of Z-linked inheritance of call traits. If major genes affecting call traits were Z-linked, we would expect a reduced heritability estimate from mother–daughter regression (as compared to the animal model estimates) because daughters inherit the Z chromosome only from the father. However, we found a weak and nonsignificant trend in the opposite direction (W. Forstmeier, unpublished data), making considerable Z-linkage unlikely. W-linkage, on the other hand, should inflate the maternal effect estimate from animal models, which also turned out to be very small (Table S6). Hence we conclude that a predominantly autosomal inheritance of vocal traits seems most likely.

Zann and Cash (2008) found unexpected differences in song characteristics of zebra finches originating from different aviaries and suggested that maternal effects (via egg components) could have caused these differences. Our finding of very small maternal effects (Table S8) suggests that this explanation is unlikely. It seems more promising to search for unaccounted random effects acting during the main phase of song learning (analogous to our peer environment in Table S8).


The fact that sound frequency decreases with body size is well established for birds at the interspecific level (Ryan and Brenowitz 1985) and also at the intraspecific level, at least in some nonpasserine bird species (Barbraud et al. 2000; Miyazaki and Waas 2003; Madsen et al. 2004; Hardouin et al. 2007; Mager et al. 2007). It has been debated (Fitch and Hauser 2002; Mager et al. 2007) whether the honesty of this size-indicator function comes through morphological constraints (“index signal”; Vehrencamp 2000) or through condition-dependence of costly signals (“handicap signal”). Our quantitative genetic analyses (Table S5) show that, in the zebra finch, the relationship with size is primarily due to genetic pleiotropy (genetic correlations with body size) and not due to correlated environmental effects (residual correlations), clearly supporting the index signal scenario. In our population, we find breeding values of body size (i.e., genetic size) to be selectively neutral, whereas environmental deviations from the breeding value (i.e., residual size) are positively related to fitness in breeding aviaries (E. Bolund and W. Forstmeier, unpubl. data). Hence genetic size is not an indicator of genetic quality, and the genetic correlation between frequency and size does not reflect the ability to invest in low-pitch sounds, but rather directly reflects how morphological properties of the vocal tract affect the frequency.

Note that vocal production learning tends to add noise to these relationships (Tables S4 and S5), because frequency and timbre are strongly affected by learning (Williams et al. 1989; see also Goller and Cooper 2004). Accordingly, genetic and phenotypic correlations with size were weaker in males than in females. This fits the observation that strong phenotypic correlations between frequency characteristics and body size have been reported from species without vocal production learning (amphibians: Davies and Halliday 1978; Bee et al. 2000; mammals: Fitch 1997; Reby and McComb 2003; Pfefferle and Fischer 2006; nonpasserine birds: see references above), whereas such relationships seem much weaker in humans (Rendall et al. 2005) and passerine birds (as suggested by the lack of published examples).

Holveck and Riebel (2007) reported significant positive relationships of syllable rate, sound density, and percent motif duration with body size in their male zebra finches (around r = 0.5; N = 17). Hence they concluded that song conveys redundant information reflecting overall male quality. However, we found no support for such relationships in our population (around r = −0.03; N = 413; Table S4).
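The difference in sample size between the two studies matters for how precisely each correlation is estimated. As an illustration only (not an analysis from either study), the sketch below computes approximate 95% confidence intervals with Fisher's z-transformation, plugging in the rounded values quoted above; the helper name is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transform."""
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)     # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


small_sample = fisher_ci(0.5, 17)    # rounded values for Holveck and Riebel (2007)
large_sample = fisher_ci(-0.03, 413) # rounded values for this population
print(small_sample, large_sample)
```

With these inputs the first interval spans roughly 0.03 to 0.79 and the second roughly −0.13 to 0.07, so the small-sample estimate is compatible with a much weaker true relationship.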


The zebra finch and its vocalizations have become an important model in neurobiology, behavioral ecology, and evolutionary research. With the present study we hope to contribute some fundamental information about the quantitative genetics of vocal traits, which may have implications for a variety of questions ranging from the neurogenetic control of vocal production to the honesty of bird song as a quality indicator.

Associate Editor: J. Wolf


We thank A. Zeller, Q. Herzog, and M. Schneider for assistance with recording and sound analysis. We also thank E. Bodendorfer, A. Grötsch, P. Neubauer, F. Preininger, and M. Ruhdorfer for animal care. M. Gahr kindly provided the soundproof recording chambers; B. Kempenaers provided other logistic support. H. Milewski calculated the eigenvectors of G- and E-matrices. E. Groeneveld helped with advice on animal models. H. Brumm as well as two anonymous referees provided helpful comments on earlier versions of the manuscript. During this project WF was supported by an Emmy-Noether Fellowship (Deutsche Forschungsgemeinschaft: FO 340/1-3).