Multilocus analysis of phylogeography and population history is a powerful tool for understanding the origin, dispersal, and geographic structure of species over time and space. Using 36 genetic markers (29 newly developed anonymous nuclear loci, six introns, and one mitochondrial DNA locus, amounting to over 15 kb per individual), we studied population structure and demographic history of the red-backed fairy wren Malurus melanocephalus, a small passerine distributed in the northern and eastern part of Australia across the Carpentarian barrier. Analysis of anonymous loci markers revealed large amounts of genetic diversity (π= 0.016 ± 0.01; average number of SNPs per locus = 48; total number of SNPs = 1395), and neither nuclear nor mitochondrial gene trees showed evidence of reciprocal monophyly among Cape York (CY), Eastern Forest (EF), and Top End (TE) populations. Despite traditional taxonomy linking TE and CY populations to the exclusion of EF, we found that the CY population is genetically closer to the EF population, consistent with predicted area cladograms in this region. Multilocus coalescent analysis suggests that the CY population was separated from the other two regions approximately 0.27 million years ago, and that significant gene flow between the EF and the CY populations (∼2 migrants per generation) reflects geographic continuity in eastern Australia. By contrast, gene flow between the CY and the TE populations has been dampened by divergence across the Carpentarian barrier.
Understanding the geographic distribution of genealogical lineages and the underlying processes generating these patterns has been of great interest for evolutionary biologists. Phylogeography focuses on these distributions within and among closely related species (Avise et al. 1987; Arbogast and Kenagy 2001), combining both spatial and phylogenetic components, and thereby contributes to the study of micro- and macroevolution along with fields such as systematics and paleontology (Avise et al. 1987; Avise 1998, 2000; Hewitt 2001). Since the advent of the polymerase chain reaction (PCR), the use of DNA sequence variation to reconstruct phylogeographic histories has accelerated the development of the field, relying heavily in its first two decades on mitochondrial DNA (mtDNA) because of its high mutation rate, small effective population size (compared with autosomal nuclear DNA), haploidy, and its putative lack of recombination, although reports of mitochondrial recombination have become more common recently (Hagelberg et al. 2000; Pakendorf and Stoneking 2005; Tsaousis et al. 2005). However, because mtDNA is usually inherited maternally, it may not exhibit genealogical patterns that are representative of the entire population history, especially when there is a sex bias in fitness or dispersal behavior (Hare 2001). More importantly, such single-locus approaches are liable to errors due to stochastic lineage sorting (Kuo and Avise 2005) and we cannot estimate the coalescent variance for parameters such as gene flow, divergence times and population sizes with a single locus (Edwards and Beerli 2000). In addition, natural selection has been increasingly shown to influence the trajectory of mtDNA lineages, in some cases placing a ceiling on diversity that appears to be eroded repeatedly by selective sweeps (Bazin et al. 2006). 
Although mtDNA will surely continue to be a mainstay of phylogeography, the relative power of nuclear versus mitochondrial variation for unraveling phylogeographic histories, as well as the consistency of scenarios presented by the two genomic compartments in natural populations, is still debated (Edwards et al. 2005; Brito and Edwards 2008; Zink and Barrowclough 2008).
Recently, single nucleotide polymorphisms (SNPs) and “resequencing” of nuclear DNA have arisen as important methods that may help refine comparative phylogeography and historical demography of populations (Nickerson et al. 1997; Di Rienzo et al. 1998; Wakeley et al. 2001; Brumfield et al. 2003). One marker type that has received little attention in studies of nonhuman animals is anonymous nuclear loci (Karl and Avise 1993; Brito and Edwards 2008). Anonymous nuclear markers are noncoding regions of the genome, randomly collected and presumably dispersed across the chromosomes. Because methods for developing anonymous loci can yield a nearly unlimited number of random segments of nuclear DNA, they provide a better representation of variation across the whole genome than does any one marker. In addition, via resequencing approaches they can minimize ascertainment bias, and help evaluate the extent of coalescent variation in datasets (Karl and Avise 1993; Carstens et al. 2005; Jennings and Edwards 2005). Even though nuclear DNA is known to have a lower mutation rate than mitochondrial DNA, in fact, few studies have compared levels of noncoding mitochondrial and nuclear DNA within species to check the almost universal assumption that mtDNA exhibits higher variation than nuclear DNA, and some studies that have done so have found lower levels of mtDNA variation, presumably a result of selection (Bazin et al. 2006; Bensch et al. 2006). The multiplicity of gene histories available in the nuclear genome can provide a strong signal for inferring demographic histories, even if the gene histories themselves appear heterogeneous (Carstens and Knowles 2007). Furthermore, the development of increasingly powerful coalescent-based methods makes it straightforward to test hypotheses and infer demographic parameters from multilocus data (Nielsen and Wakeley 2001; Hey and Machado 2003). 
Such methods, combined with multilocus data and frequently with simulation approaches designed to delimit the parameter space of a priori phylogeographic hypotheses, mark a new phase of multilocus phylogeographic research known as “statistical phylogeography” (Knowles and Maddison 2002; Knowles 2004).
BIOGEOGRAPHY OF AUSTRALIAN BIRDS
Due to its long history of isolation as a continent (Cracraft 2001), Australia has provided abundant opportunities for studying biogeography. By comparing general geographic distributions of Australian birds, early studies identified multiple codistributed songbird ranges and areas of endemism around the continent (Keast 1961; Ford 1974, 1987a; Cracraft 1986). Some geographic patterns across the continent have been explained by selective responses to environmental gradients without geographic isolation (Wooller et al. 1985; Hughes et al. 2001, 2002; Toon et al. 2003). In addition, various combinations of differentiation in isolation and population expansion have been proposed for populations now found in the center of the continent as well as for populations around the periphery (Keast 1961; Ford 1974, 1987a; Schodde 1982; Joseph and Wilke 2007). Many species show distributions congruent with major or minor barriers that have generally been assumed to have formed in the Pleistocene (Cracraft 1986; Ford 1987b; Jennings and Edwards 2005), but we still lack a solid time scale for many of these events. Determining further time scales for the Australian biota is important for understanding global patterns of bird biogeography, particularly in light of the controversy surrounding the role of the Pleistocene in the evolution of the North American bird fauna (Avise and Walker 1998; Klicka and Zink 1999; Johnson and Cicero 2004).
One biogeographic barrier that has played an important role in Australian biotic history is the Carpentarian Barrier, a prominent barrier in northern Australia extending from the southernmost shore of the Gulf of Carpentaria. This barrier is generally assumed to have arisen as a result of the fluctuations of sea level over time as well as a steep climate and habitat gradient across this area during the last glacial maxima (Nix and Kalma 1972; Smart 1977; Cracraft 1986). The Carpentarian barrier is roughly 150 km wide, depending on the taxon being considered, and is a semi-arid region extremely poor in vegetation. Using cladistic analysis of morphology of seven avian lineages, Cracraft (1986) showed that in many lineages the northern and eastern biota were separated first by this barrier and then by secondary barriers along the northern, eastern, and western coasts (see Fig. 1). Several studies of various animal taxa across this barrier are consistent with the above explanation (Keast 1961; MacDonald 1969; Ford 1977; Edwards 1993; Ford and Blair 2005; Jennings and Edwards 2005), although the timing of divergence across this barrier has not been widely addressed. Using anonymous loci, Jennings and Edwards (2005) estimated divergence of phenotypically well-diverged grassfinches (Poephila) across the Carpentarian barrier to be about 600–700 kya, a value well within the Pleistocene and one that contrasts with the older divergences claimed for North America (Klicka and Zink 1999; Johnson and Cicero 2004).
The red-backed fairy wren (Malurus melanocephalus) is a common, sedentary, and territorial passerine in the genus Malurus, which is well known for its high degree of sexual dimorphism, promiscuity, and prevalence of cooperative breeding (Karubian 2002; Webster et al. 2008). As the smallest member of the genus (weight ranging from 6 to 8 g), the red-backed fairy wren is distributed across the northern part (north of latitude 20°S) as well as the east coast of Australia, primarily inhabiting open forest and savannah (Fig. 1). Given its large distribution and its poor capability of long-distance flight, the red-backed fairy wren is a good organism to study the effects of geographic barriers on population structure. Two subspecies, M. m. cruentatus and M. m. melanocephalus, were identified as sharing their boundaries on the southern part of the Atherton Plateau (Schodde and Mason 1999). This taxonomy is based mainly on male plumage color on the back: scarlet and crimson for M. m. melanocephalus and M. m. cruentatus, respectively. Such a geographic pattern suggests a role for a minor Pleistocene-originated geographic barrier, but contradicts the primary function of the Carpentarian Barrier as a major cause of shaping species distributions in northern Australia as predicted by area cladograms (see Fig. 1). Here we employ multilocus genetic tools to test this morphological taxonomy, and use our multilocus dataset to infer the population structure, timing, and demographic history of this charismatic species.
Materials and Methods
Our sampling and data-collecting efforts emphasized loci over individuals, and we focused on broad regions within Australia rather than detailed geographic sampling. We obtained 30 red-backed fairy wren samples from two institutions: 10 specimens from the Burke Museum, Seattle, WA (collected over several expeditions led by SVE), and 20 specimens from the ANWC (Australian National Wildlife Collection, Commonwealth Scientific and Industrial Research Organization in Canberra), Australia (Supporting Table S1). Each sample was assigned to one of three major biogeographic regions according to its specific locality. All samples from the western side of the Carpentarian Barrier were assigned to Top End (TE: 14 individuals), and the rest of the samples were assigned either to Cape York (CY: 8 individuals) or to Eastern Forest (EF: 8 individuals). Because no hybrids have been reported from south of the Burdekin Barrier in northeast Australia (barrier D in Fig. 1), we used this barrier as the subspecies boundary that separates the CY and EF populations. Among eight samples assigned to the CY population, two individuals (sample #12 and #20) were collected from the putative hybrid zone (Supporting Table S1). Eight white-winged fairy wrens (M. leucopterus) were included in the analysis so as to understand how frequently genes are shared between fairy wren sister species (Christidis and Schodde 1997). We also used three splendid fairy wrens M. splendens, three variegated fairy wrens M. lamberti, three striated grasswrens Amytornis striatus, and two rufous-crowned emu-wrens Stipiturus ruficeps as outgroups to root reconstructed gene trees, focusing on whichever taxon provided reliable sequence of a given locus (Supporting Table S1).
DEVELOPMENT AND EVALUATION OF ANONYMOUS NUCLEAR LOCI
We applied the same technique used by Jennings and Edwards (2005) to develop nuclear anonymous markers. We first used a female red-backed fairy wren specimen (number 60736, Burke Museum) as a source of genomic DNA. Genomic DNA was extracted from frozen pectoral muscles using a standard genomic DNA extraction kit (Qiagen, Valencia, CA) and sheared into small (1 ∼ 2kb) fragments with a Hydroshear (Genomic Solution, Ann Arbor, MI). Sheared DNA was electrophoresed through a 1% agarose gel, and fragments within target range (1 ∼ 2kb) were excised to obtain fragments that would ultimately minimize the number of sequencing reactions required to sequence across each locus, yet maximize sequence available for reconstructing gene trees. After blunting the ends of the sheared DNA and dephosphorylating them using a TOPO-shotgun cloning kit (Invitrogen, Carlsbad, CA), these fragments were ligated into pUC19 plasmid vectors and then transformed into chemically competent Escherichia coli cells by the heat-shock method. Clones were plated on agar plates containing ampicillin and kept for 18 h in an incubator at 37°C. Clones were picked randomly with toothpicks, and then sequenced using M13 vector primers. Because we aimed for markers that would reflect the variability in the noncoding subgenome, all sequences were screened bioinformatically to minimize capture of coding sequences. First, we searched for any significant matches in BLAST (Basic Local Alignment Search Tool) for each sequence. We also checked to determine if they contain any open reading frames (ORFs) using the NCBI ORF finder (http://www.ncbi.nlm.nih.gov/projects/gorf/).
Because we developed anonymous loci markers from random segments of nuclear DNA, it is possible that some of the loci might represent paralogs, in which case we would expect to see much higher genetic diversity. To detect possible paralogs, we blasted sequences from all anonymous loci to the trace archive database of zebra finch Taeniopygia guttata (http://www.ncbi.nlm.nih.gov/Traces), which is the closest species whose genome is currently available. Because the zebra finch genome was sequenced sixfold, we would expect to see a similar number of matches in our search. Most of our anonymous loci had on average eight matches, but two loci (Mame-AL17 and Mame-AL30) matched with 100 and 71 clones in the zebra finch trace archive. As candidate paralogous loci, these loci were cloned using one individual, and 30 clones for each locus were picked randomly. Whereas only two alleles were found at the Mame-AL17 locus, as expected, we found six different allele types in Mame-AL30. This suggests that this segment of DNA may exist in more than one place in the genome or may result from high recombination within the locus. Thus, we decided to exclude this locus from our further analyses. In addition, the fact that we did not find any double bands in PCR products further reduces the probability of paralogs. After satisfying ourselves that our loci did not encode known paralogs or protein-coding regions, we designed PCR primers flanking each sequence (Supporting Table S2).
MITOCHONDRIAL AND INTRON MARKERS AND DETERMINATION OF HAPLOTYPES
In addition to anonymous nuclear markers, six intron markers (AB4, aldolase B intron 4; α-globin2, α-globin intron 2; GAPDH, glyceraldehyde-3-phosphate dehydrogenase intron 11; GTP, phosphoenolpyruvate carboxykinase intron 9; RI2, rhodopsin intron 2; TGFβ2, Transforming Growth Factor-β2 intron 5) were developed using previously published primers (Waltari and Edwards 2002; Sorenson et al. 2004; McCracken and Sorenson 2005). Part of the mitochondrial DNA (ND2, NADH dehydrogenase subunit 2) was also analyzed using available primers (Sorenson et al. 1999) to compare with characteristics of the nuclear loci markers (Supporting Table S2). After amplifying with these primers, the set of amplicons for 30 red-backed fairy wrens and eight white-winged fairy wrens was sequenced directly in both directions. We found a substantial number of indels in our anonymous loci, often in heterozygous condition (see Results). Sequences containing indels were separated manually by visually subtracting chromatogram peaks upstream of the indel in the reverse primer sequence from the double peaks downstream of the indel in the forward primer sequences. Both sequences were then reverse-complemented, and the subtraction was repeated in the alternative direction as described (Dolman and Moritz 2006). However, because this method was not applicable to sequences that contain more than one indel due to the complexity of their chromatograms, we excluded nine sequences that showed any evidence of multiple indels (e.g., overlapped double peaks in chromatograms). Analysis of indels using this approach has the additional advantage of allowing determination of allele phase directly from the chromatograms, rather than being inferred statistically. To estimate haplotypes of loci, the software PHASE 2.1.1 (Stephens et al. 2001) was used to resolve haplotypes of genotypes that were heterozygous at multiple sites.
Sequences from outgroups were aligned with these estimated haplotypes using MacClade 4.0 (Maddison and Maddison 2000).
ESTIMATION OF GENE TREES
Because patterns of substitution may differ between different regions of the genome, we used the program MrModeltest 2.2 (Nylander 2004) to infer adequate DNA substitution models for each locus. Based on these estimated DNA substitution models, gene trees were reconstructed using MrBayes version 3.0 (Huelsenbeck and Ronquist 2001), with which we obtained Bayesian posterior probabilities from five million MCMC cycles with a sample frequency of 100 and a burn-in period of 100,000 generations.
Given that the number of distinct alleles (inferred haplotypes) per locus was high (33 ± 11) relative to the number of allele copies sequenced per locus (56 ± 4), FST was measured in Arlequin 2.0 (Excoffier and Schneider 2005) not using allele frequencies alone but with a genetic distance that was estimated for each locus, based on DNA substitution models we obtained from MrModeltest. We also estimated another statistic (SNN) that was proposed as a more powerful method for handling loci with high haplotype diversity (Hudson 2000). SNN, referred to as the nearest-neighbor statistic, is a measure of how often the “nearest neighbors” of sequences are from the same locality or geographic area. If populations are strongly structured, SNN is expected to be close to one; when three subpopulations are unstructured (as in our case), SNN is expected to be approximately one-third. The significance of these two statistics was evaluated by 1000 simulations in which the locality of samples was assigned randomly, but in the same proportions as our original dataset. In addition, we used the reconstructed gene trees to evaluate the degree of genetic differentiation. The geographic locality of each sample was assigned to the tips of the trees, and the minimum number of character state changes (known as Slatkin's s) was then counted by parsimony, treating locality as a multistate unordered character (Slatkin and Maddison 1989). To check the significance of our observed value of s, we calculated the statistic for 1000 randomly generated gene trees with the same proportions among sampled localities and compared the resulting distribution with our data. This test was performed to detect genetic differentiation within red-backed fairy wren as well as to assess how frequently genes are being shared between the red-backed and white-winged fairy wrens.
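The nearest-neighbor statistic and its randomization test described above can be illustrated with a minimal Python sketch. This is not the software used in the study: the function names (snn, snn_permutation_test), the use of a simple count of mismatched sites as the distance, and the toy sequences in the usage note are all assumptions made for illustration.

```python
import random


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def snn(seqs, labels):
    """Nearest-neighbor statistic in the spirit of Hudson (2000).

    For each sequence, find all other sequences at the minimum distance
    (ties are all counted) and record the fraction that share its locality.
    The statistic is the mean of these fractions over all sequences.
    """
    n = len(seqs)
    total = 0.0
    for j in range(n):
        dists = [(hamming(seqs[j], seqs[k]), k) for k in range(n) if k != j]
        dmin = min(d for d, _ in dists)
        nearest = [k for d, k in dists if d == dmin]
        same = sum(labels[k] == labels[j] for k in nearest)
        total += same / len(nearest)
    return total / n


def snn_permutation_test(seqs, labels, n_perm=1000, seed=0):
    """Shuffle locality labels (keeping their proportions) n_perm times.

    Returns the observed statistic and the fraction of shuffled datasets
    whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = snn(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if snn(seqs, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

For example, calling snn_permutation_test(["AAAA", "AAAA", "CCCC", "CCCC", "GGGG", "GGGG"], ["TE", "TE", "CY", "CY", "EF", "EF"]) returns an observed value of 1.0, because every sequence's nearest neighbor comes from its own locality. Shuffling the labels preserves the sample proportions, as in the randomization described in the text.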
To further assess genetic structure within the red-backed fairy wren, we employed the genetic clustering program STRUCTURE (Pritchard et al. 2000), which can detect genetic groups without a priori geographic information by assigning individuals to populations within which there is approximate Hardy–Weinberg equilibrium.
To infer historical demography of the red-backed fairy wren, we used the computer program IM, which assumes an “isolation with migration” model (Hey and Nielsen 2004). We employed this program for two different divergence events (TE vs. CY and CY vs. EF), given that the TE population is not geographically adjacent to the EF population. For each divergence event, six population parameters scaled by mutation rate (μ, geometric mean of mutation rate per year per locus) were estimated: the neutral mutation parameter for the ancestral population (θA, θ= 4Nμ) and for the two daughter populations (θ1, θ2); the divergence time between the two descendant populations (t); and forward and backward migration rates between daughter populations (m12, m21). The major assumption underlying this program is that genetic markers must be independent of selection and recombination, and that mating within ancestral and descendant populations is random. Because recombination and high mutation rate (especially multiple point mutations at one site) can cause similar patterns in DNA sequence, we decided to apply the infinite-sites model to our nuclear markers so that we could exclude the effect of recombination, rather than the HKY model, in which sequences that result from recombination would be interpreted as the result of a high mutation rate. To see if our markers satisfy the first of these assumptions, we calculated Tajima's D for neutrality and performed the four-gamete test to detect evidence of recombination, using DnaSP 4.0 (Rozas and Sanchez 2003). Each locus that did not pass the four-gamete test was divided into sequence segments within which no evidence of recombination was detected. Then, we selected the longest sequence segment from each of these sets so as to maximize the information from each locus. The average length of each locus was 170 bp (151 bp) for anonymous nuclear markers (introns), which contained 13 (8) segregating sites on average.
After a few short preliminary runs (five million steps) were made to optimize prior boundaries for the six parameters, the final simulation for each divergence was carried out with a different random seed, four different chains, 20 million steps, and burn-in of two million steps.
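The four-gamete screening step described above can be sketched in Python as follows. This illustrates the general idea and is not the DnaSP procedure itself: the function names are hypothetical, haplotypes are assumed to be equal-length strings without gaps or missing data, and picking the single longest contiguous stretch free of incompatible site pairs is one simple way to obtain a recombination-free block.

```python
from itertools import combinations


def four_gamete_violations(haplotypes):
    """Return pairs of biallelic sites (i, j) that show all four gametes.

    Under the infinite-sites model without recombination, two biallelic
    sites can produce at most three of the four possible two-site
    haplotypes, so a fourth one is taken as evidence of recombination.
    """
    length = len(haplotypes[0])
    biallelic = [i for i in range(length)
                 if len({h[i] for h in haplotypes}) == 2]
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


def longest_compatible_segment(haplotypes):
    """Longest half-open interval [start, end) with no incompatible pair inside."""
    length = len(haplotypes[0])
    # For each right-hand site j, remember the largest left-hand site i
    # that is incompatible with it; a valid segment ending at j must
    # start to the right of that site.
    worst_left = [-1] * length
    for i, j in four_gamete_violations(haplotypes):
        worst_left[j] = max(worst_left[j], i)
    best = (0, 0)
    start = 0
    for end in range(length):
        start = max(start, worst_left[end] + 1)
        if end + 1 - start > best[1] - best[0]:
            best = (start, end + 1)
    return best
```

With the toy haplotypes ["AAAAA", "AAAAT", "CAAAA", "CAAAT"], the first and last sites show all four gametes, so the longest compatible segment drops one of them and keeps four sites.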
To understand how the number of loci affects the uncertainty of demographic parameters in the IM program, we plotted peak posterior distribution estimates as well as the lower and upper bounds of estimated 95% highest posterior densities (HPD) for each demographic parameter versus the number of loci. For a given number of loci, we ran the IM program with five randomly selected subsets of loci with five million steps and a burn-in of 500,000. Given that the six demographic parameters were not mutually independent, we discarded a particular run whenever any of the six parameters failed to converge to stationary distribution. Because none of the runs with fewer than four loci satisfied this condition, we included only subsets with more than five loci. All valid estimates were then averaged over the number of valid runs. To avoid biases due to sampling artifacts, we repeated this process five times and calculated the mean (over 5) for each number of loci.
CHARACTERISTICS OF ANONYMOUS LOCI
Among 60 sequenced clones from our genomic library, seven sequences had significant matches (e value < 10 × 10⁻¹⁰) with deposited sequences in GenBank (M. splendens 12S rRNAs (2), Malurus cyaneus cytochrome b (1), Cyanopica cyanus control region of mtDNA (1), Gallus gallus Stearoyl-CoA Desaturase (SCD) mRNA (1), G. gallus Serum- & Glucocorticoid-induced kinase mRNA (1), and Avian Androgen Receptor genomic region (1)). However, no sequences were predicted to contain ORFs. After discarding four more loci that did not amplify well, we randomly selected 30 loci, but we had to exclude an additional locus that appeared to be paralogous (see details in Methods). Thus, we worked with 29 anonymous markers, amounting to 12,831 bp per individual. When we include the six introns (total 2173 bp) and a part of the ND2 region (467 bp), in total we sequenced 15,471 bp from each individual (Table 1). The anonymous loci contained 48 SNPs on average (standard deviation = 21) and exhibited substantial diversity, yielding an average nucleotide diversity (π) of 0.016 ± 0.010 (Table 1), which is 10 times the typical level found, for example, in noncoding regions of the human genome (Zhao et al. 2000). This nucleotide diversity, as well as its variance, was significantly higher than that of introns (π= 0.009 ± 0.002; t33=−3.384, P= 0.0018, Fig. 2, Table 1). Except for two anonymous loci (Mame-AL10 and Mame-AL14), indels were found in all loci (mean frequency per locus across all anonymous loci = 4 ± 2; range: 0–11; Fig. 3A) with an average size of 6 bp (standard deviation = 10 bp, Fig. 3B). In addition, we found a positive correlation between the number of indels per locus and the number of SNPs (Pearson's r= 0.66, N= 29, P < 0.01; Fig. 3C).
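As a reminder of what the summary statistics above measure, the following Python sketch computes nucleotide diversity (π, the average number of pairwise differences per site) and the number of segregating sites for a set of aligned haplotypes. The function names and the toy alignment are assumptions for illustration; gaps and missing data are not handled, whereas the published values come from real alignments in which gaps were treated by the authors' software.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site across all pairs of sequences."""
    n = len(seqs)
    length = len(seqs[0])
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diff / n_pairs / length


def segregating_sites(seqs):
    """Number of alignment columns with more than one observed state."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))
```

For the toy alignment ["AAAA", "AAAT", "AATT"], the three pairwise comparisons differ at 1, 2, and 1 sites, so π is (4/3)/4, or about 0.33 per site, with two segregating sites.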
Table 1. Descriptive statistics for the 36 loci used in our study. Sequence length (L) includes alignment gaps, and the number of segregating sites (S) is shown with its percentage given the length of the locus. Statistical significance of Tajima's D test (D) is indicated with an asterisk at the *P<0.05 level. The lengths of indels are listed as they occurred in each locus.
1, 2, 1, 1
48, 4, 1, 4
1, 1, 1, 1, 4
3, 1, 7, 1
28, 1, 3, 5, 1, 5, 2, 1
1, 5, 3, 1, 7
7, 1, 4, 12, 1
2, 4, 1, 1
1, 19, 1, 7
35, 2, 2
5, 4, 5, 1, 10, 7, 3, 1
10, 12, 3, 22, 1, 1
1, 2, 5
1, 16, 9, 1
3, 1, 4, 1
12, 1, 4, 21
3, 30, 2, 35, 3
1, 2, 1, 3, 1, 1
3, 72, 1, 4
2, 21, 1, 1
1, 1, 1, 1, 2, 2, 16, 3, 6, 2, 10
1, 2, 1, 1
9, 43, 1, 12, 9, 5, 1
RECONSTRUCTION OF GENE GENEALOGIES AND POPULATION STRUCTURE
Once we obtained two successfully converging runs from MrBayes, gene trees were visualized in PAUP4.0 (Swofford 2003). Figure 4 shows some examples of gene trees we reconstructed using anonymous loci markers and the mitochondrial ND2 marker. None of the gene trees, including that for mtDNA, showed reciprocal monophyly of the three red-backed fairy wren populations, although the ND2 gene tree did possess the lowest value of Slatkin's s (s = 6; Fig. 4). For all but three loci, FST among the three populations was substantial, ranging from 0.028 to 0.499 for nuclear loci (mean = 0.119), and 0.553 for mtDNA (Fig. 5A). SNN (mean across loci = 0.571 ± 0.118, Fig. 5B) also supported this result because a value above 0.333 is predicted only if there is detectable genetic structure among three populations. Figure 5C presents the distributions of Slatkin's s obtained from 36 reconstructed gene trees and from 1000 simulated random gene trees. The distribution of observed values was significantly smaller than that of the simulations (t127=−16.0149, P < 0.001), suggesting that our observed gene trees are more structured than random gene trees. When we conducted this test between red-backed and white-winged fairy wrens, we found much greater differentiation than within species (t126=−22.0581, P < 0.0001, Fig. 5D). Strong clustering by geographic region within red-backed fairy wrens was detected by STRUCTURE, which suggested that two genetically distinguishable populations separated by the Carpentarian Barrier was the best explanation of the data (Fig. 5E).
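Slatkin's s, used above to compare observed and random gene trees, is the parsimony count of locality changes on a tree. A minimal Python sketch using Fitch parsimony is given below. It assumes rooted, strictly bifurcating trees written as nested 2-tuples whose leaves are locality labels, and it builds a null distribution by shuffling tip labels on a fixed topology. That shuffle is a simplification, not necessarily the random-tree procedure used in the study, and all names are hypothetical.

```python
import random


def fitch(tree):
    """Return (possible_states, changes) for a nested 2-tuple tree."""
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one extra change is required at this node.
    return left_states | right_states, left_changes + right_changes + 1


def slatkin_s(tree):
    """Minimum number of locality changes on the tree."""
    return fitch(tree)[1]


def leaves(tree):
    """Tip labels in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [label for child in tree for label in leaves(child)]


def relabel(tree, labels):
    """Rebuild the same topology, taking new tip labels from an iterator."""
    if isinstance(tree, str):
        return next(labels)
    return tuple(relabel(child, labels) for child in tree)


def shuffled_s_values(tree, n_rep=1000, seed=0):
    """s for n_rep random reassignments of the observed tip labels."""
    rng = random.Random(seed)
    tips = leaves(tree)
    values = []
    for _ in range(n_rep):
        rng.shuffle(tips)
        values.append(slatkin_s(relabel(tree, iter(tips))))
    return values
```

A tree in which each of the three localities forms its own clade, such as ((("TE", "TE"), ("CY", "CY")), ("EF", "EF")), gives s = 2, the minimum possible for three localities; interleaving TE and CY tips raises the count.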
Figure 6 and Table 2 present the results for two IM runs (TE vs. CY and CY vs. EF). To convert model parameter estimates from the IM program into demographic quantities (population size, migration rate, and divergence time), we started with a previously suggested substitution rate of 1.5 × 10⁻⁹ substitutions/site/year for avian introns (Ellegren 2007). However, because the genetic diversity of anonymous loci in our study was significantly higher than that for introns, we applied an adjusted substitution rate of 2.56 × 10⁻⁹ substitutions/site/year to anonymous loci, given the amount of difference in genetic diversity (π) between anonymous loci and introns (1.5 × (0.015903/0.009292) ≈ 2.56). From both runs, the effective population size (Ne) of the CY population was estimated to be 260,000, assuming that the generation time of red-backed fairy wrens is 2 years (Rowley and Russell 1997). The Ne of the EF population (≈275,000) was similar to that of the CY population, whereas the Ne of the TE population (≈420,000) was estimated to be much larger than the other two populations. The two estimates of divergence time were similar to each other (Table 2), occurring between 271,000 and 281,000 years ago. Even though IM estimated nonzero migration from CY to the other two populations (0.32 and 0.16 migrants per generation to EF and to TE, respectively), the estimates from the TE and EF populations into the CY population were higher (0.81 and 2.06 migrants per generation from TE and EF populations, respectively).
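The conversions described above follow from the scaling of the IM parameters by the per-locus mutation rate. The Python sketch below shows the arithmetic under the usual definitions (θ = 4Nu with u per generation, t scaled by the mutation rate, and 2Nm = θm/2). The function names are hypothetical, and no θ, t, or m estimates from the study are reproduced here. Note that the unrounded product 1.5 × (0.015903/0.009292) is about 2.567, which the text reports as 2.56.

```python
import math

INTRON_RATE = 1.5e-9      # substitutions/site/year for avian introns, as cited
PI_ANONYMOUS = 0.015903   # mean nucleotide diversity of anonymous loci
PI_INTRON = 0.009292      # mean nucleotide diversity of introns
GENERATION_TIME = 2.0     # years per generation, as assumed in the text


def adjusted_rate(base_rate, pi_target, pi_reference):
    """Scale a per-site rate by the ratio of nucleotide diversities."""
    return base_rate * pi_target / pi_reference


def geometric_mean(values):
    """Geometric mean, used for the per-locus mutation rates."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def theta_to_ne(theta, mu_locus_year, generation_time):
    """Effective size from theta = 4 * N * u, with u per locus per generation."""
    return theta / (4.0 * mu_locus_year * generation_time)


def t_to_years(t_scaled, mu_locus_year):
    """Divergence time in years from the mutation-scaled time."""
    return t_scaled / mu_locus_year


def migrants_per_generation(theta, m_scaled):
    """Population migration rate 2Nm from theta and the scaled rate m."""
    return theta * m_scaled / 2.0


if __name__ == "__main__":
    anon_rate = adjusted_rate(INTRON_RATE, PI_ANONYMOUS, PI_INTRON)
    print(f"adjusted per-site rate: {anon_rate:.3e}")
```

To apply these functions, each locus's per-site rate is multiplied by its sequence length to give a per-locus rate, and the geometric mean of those per-locus rates is passed as mu_locus_year.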
Table 2. Results of IM analysis of two two-population subsets of the red-backed fairy wren data (TE vs. CY and CY vs. EF). All estimates shown here are scaled by mutation rate (μ): θ1 and θ2 represent the population size for the first and the second population (e.g., θ1 and θ2 in the first row are the population sizes of the TE and the CY population, respectively); θA is the ancestral population size of the two populations included in the run; t indicates the divergence time between the two populations; m1 and m2 stand for the migration rates at which genes come into the first population and the second population, respectively. HPD90Hi and HPD90Lo represent the upper and lower bounds of the estimated 90% highest posterior density interval, respectively, and HiPts are our parameter estimates.
TE vs. CY
CY vs. EF
SHIFTS IN RESOLUTION WITH NUMBERS OF LOCI
We wanted to determine how the uncertainty of our estimates of parameters in the population history changed with the number of loci sampled. To minimize any bias that could be caused by violating one of the IM program assumptions, namely that there should not be any other population exchanging genes with the sampled populations, we merged the CY population and EF populations, and then ran the IM program with the TE population. The estimates of two divergence events were similar, and so we proceeded to run IM repeatedly with different numbers of loci randomly sampled from our dataset. The results indicate that the variance for all six parameters decreases rapidly from five to 15 loci, and the rate of decrease is reduced thereafter (Fig. 7). Intriguingly, most of the parameter estimates remained stable regardless of the number of loci sampled, and the variance usually stabilized after 10–15 loci. This was the case for the variance of all parameters except those for ancestral population size and divergence time, which changed little regardless of the number of loci analyzed (Fig. 7C, D).
We used a multilocus approach to study phylogeography of the red-backed fairy wren, a denizen of the widespread savannah forests of northern and eastern Australia. This study not only supports the idea that the Carpentarian Barrier has played a primary role in differentiating the avian fauna of northern Australia, but also provides an example and a time frame for the Pleistocene “population expansion hypothesis,” which suggests that populations distributed across the Carpentarian region were formerly sequestered in refugia around the periphery of northern and eastern Australia. Our study, conducted within a single taxonomic species, found a relatively recent divergence across the Carpentarian barrier (270 kyr), less than half the time estimated for Australian grassfinches (Poephila), which are phenotypically clearly distinct species (Jennings and Edwards 2005). Furthermore, the present study demonstrates the applicability of nuclear anonymous loci markers in phylogeography. Although there have been many studies on speciation patterns in vertebrates of northern Australia (Edwards 1993; Worthington-Wilmer et al. 1999; Cardinal and Christidis 2000; Ford and Blair 2005; Toon et al. 2007), this is the first (aside from Jennings and Edwards 2005) to examine population structure by applying multilocus sequence-based markers, allowing us to estimate demographic parameters with confidence (Brito and Edwards 2008). In addition, the large number of novel SNPs we discovered (n= 1395) will be useful for understanding fairy wren behavior and mating systems, particularly using high-throughput SNP genotyping approaches.
CHARACTERISTICS OF ANONYMOUS LOCI
One of the reasons that mitochondrial DNA has thus far been widely used for phylogeographic studies is that it is expected to show higher genetic divergence due to its small effective population size, which is only a quarter that of nuclear DNA when the sex ratio is even. Therefore, phylogeographers expect to get more information per length of sequence from mtDNA than from nuclear DNA. However, considering that genetic diversity is usually measured as the product of mutation rate (μ) and effective population size (Ne), substantial genetic diversity should be detectable in nuclear DNA as well if the mutation rate is high enough. This was the case in our study. Even though we cannot directly compare the diversity at anonymous loci to that in the mitochondrial ND2 gene, a coding region, the number of SNPs we detected in anonymous loci (mean = 48 ± 21 per locus, 10.6%) compared favorably to that found in mtDNA (10 SNPs in 467 bp, 8%). Our rate of SNP encounter in anonymous loci was lower than that found in a study of control region sequences (96 of 400 sites, or 24%) in another passerine bird distributed almost identically to the red-backed wren, the gray-crowned babbler (Pomatostomus temporalis; Edwards 1993). However, as judged by mtDNA, the babblers are somewhat more differentiated, both within and between subspecies, than the red-backed wrens appear to be, so this comparison is also not straightforward. Regardless, our results suggest that anonymous loci in birds will exhibit abundant variation for phylogeographic analysis. The level of genetic diversity also appears to be almost six times greater than that found in a trio of grassfinches (t29 = −7.3071, P < 0.001; Jennings and Edwards 2005) and in the collared flycatcher (Backström et al. 2008), although each of those studies either had limited sampling of individuals (grassfinch) or included markers representing known genes (flycatcher).
This high genetic diversity was also reflected in the pattern of indel frequency in anonymous loci (Johnson 2004). When the number of SNPs (as a proxy of genetic diversity) was plotted against the number of indels, we found a positive correlation (Pearson's r = 0.66, P < 0.01; Fig. 3C), suggesting that SNPs and indels may have a common underlying mutational basis, perhaps varying among loci or regionally in the fairy wren genome.
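For readers who wish to repeat this kind of comparison on their own loci, the correlation can be computed as in the following minimal sketch. The per-locus counts shown are hypothetical placeholders, not the values behind Fig. 3C, and the function name is ours.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical per-locus counts (placeholders only).
snps_per_locus = [22, 35, 48, 51, 60, 75]
indels_per_locus = [1, 3, 2, 5, 4, 7]
print(pearson_r(snps_per_locus, indels_per_locus))
```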
In addition, the genetic diversity detected at our anonymous loci was much higher than that of introns (Fig. 2). This is interesting given that the effective population size is the same across these markers. Because we had ruled out any possible paralogous loci (see Methods), we believe that the high genetic diversity in our study is the result of a high substitution rate in anonymous loci. Introns in vertebrates are known to possess abundant regulatory sequences that presumably are under at least mild stabilizing selection (Collins 1988; Castillo-Davis et al. 2002; Hong et al. 2006). By contrast, anonymous regions of the genome may be genuinely free from stabilizing selection, although of course some noncoding regions of the genome show extensive conservation over evolutionary time (Katzman et al. 2007). This disparity may arise from the difference in substitution rate between the two classes of markers, suggesting that substitution rates in anonymous loci are higher in this species.
When all genetic markers were analyzed across species, we found that red-backed and white-winged fairy wrens still share their nuclear genes, but not their mtDNA (ND2 gene in our study; Fig. 4). Given that the white-winged fairy wren shows population structure between east and west (Driskell et al. 2002), this incomplete lineage sorting for the nuclear genes may be due to our sampling being biased toward the eastern population of the white-winged fairy wren, which is genetically more closely related to the red-backed fairy wren. However, the two species are phenotypically very divergent and their distributions do not overlap (Rowley and Russell 1997), so this lack of reciprocal monophyly in nuclear genes is noteworthy. This pattern may be common in birds if most species have arisen rapidly or had ancestors with large effective population sizes (Edwards et al. 2005; Jennings and Edwards 2005). Given that mtDNA may in some cases be driven by natural selection (Bazin et al. 2006) and represents only a portion of a species' history due to its maternal inheritance, our multiple nuclear gene study emphasizes the necessity of incorporating nuclear genes in understanding the evolutionary history of species.
Given that haplotype diversity was very high, we estimated two commonly used statistics (FST and SNN) with genetic distance as well as allele frequency. Both statistics suggested that there is significant differentiation among populations, a pattern that is also observed in the analysis using Slatkin's s. Because Slatkin's s depends on gene trees, which may not be fully resolved, it can cause problems when used to estimate gene flow. However, we have used this statistic primarily as a broad descriptor of the presence of structure, rather than as a means to estimate rates of gene flow. These statistics suggest that there is significant population structure, but they do not specify which populations are closer to one another, which is one of the main questions in our study. By grouping the CY population with the EF population, the clustering algorithm implemented in STRUCTURE shows that there is much greater differentiation across the Carpentarian Barrier than across the Burdekin Barrier (barriers D, Fig. 1). Based on morphological data, the subspecies boundary had been set near the Burdekin Barrier (Schodde and Mason 1999). However, our study shows that genetic inference favors the “Carpentarian Barrier first” hypothesis over the morphological taxonomy. Considering that there is a substantial plumage color difference between the two subspecies (Schodde and Mason 1999), the plumage-based taxonomy might not represent the true demographic history of this species. This discrepancy may result from subjective assessment of plumage color, and may represent another case in birds of conflict between molecular markers and traditional taxonomy (Zink 2007). It might be resolved with more quantitative and detailed assessment (e.g., spectrophotometry), or by taking other morphological characters into account.
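To make the distance-based version of FST concrete, the sketch below uses one common sequence-based estimator, FST = 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean number between populations. The choice of this particular estimator is an assumption made for illustration; the text above does not specify which formulation was used, and the toy haplotypes are invented.

```python
from itertools import combinations, product


def pairwise_differences(a, b):
    """Number of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise differences among sequences sampled from one population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        raise ValueError("each population needs at least two sequences")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise differences between sequences from two populations."""
    pairs = list(product(pop1, pop2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def fst_from_distances(pop1, pop2):
    """Sequence-based FST computed as 1 - Hw / Hb (illustrative estimator)."""
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    h_b = mean_between(pop1, pop2)
    if h_b == 0:
        raise ValueError("FST is undefined when there is no between-population variation")
    return 1 - h_w / h_b


# Invented toy haplotypes, for illustration only.
pop_a = ["AAAA", "AAAT"]
pop_b = ["TTTT", "TTTA"]
print(fst_from_distances(pop_a, pop_b))
```

With small samples this estimator can return negative values, which are conventionally interpreted as no detectable differentiation.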
DIVERGENCE TIMES IN THE RED-BACKED FAIRY WREN COMPLEX
The results of the IM program suggest that the CY and TE populations diverged approximately 270,000 years ago. Our estimate is more recent than the divergence time (0.61–0.71 mya) estimated for two grassfinches (Jennings and Edwards 2005), also using anonymous markers. This discrepancy could be due to the method we used to fit our sequence data to the IM model. Because the IM program assumes no recombination within loci, we took from each locus the longest block of sequence identified as nonrecombining by the four-gamete test. Even though this method avoids violating the assumption, we expect such loci to have shorter genealogies, on average, in terms of recovered mutations (Hey and Nielsen 2004; Becquet and Przeworski 2007). Consequently, our divergence time of 0.27 mya may be an underestimate. Another possibility is that on average the anonymous loci we have chosen for this study evolve more slowly than those in the grassfinch study, or that we have misestimated divergence times relative to the grassfinch study by using an incorrect generation time calibration. One deficiency of the anonymous locus approach is that frequently no locus will overlap with those used in studies of other taxa, as is the case in our comparison with the Jennings and Edwards (2005) study. Thus, one could argue that such studies cannot be compared because they do not use homologous loci. We have argued elsewhere that such a conclusion is probably too conservative, given that such studies at least focus on a homologous subgenome (Brumfield et al. 2003). In addition, the intron sequences that we used should be amplifiable in other lineages of birds such as the grassfinches, and could be used to estimate relative rates of genome-wide evolution between the two lineages (Brumfield et al. 2003). In general we are comfortable comparing our study directly to the grassfinch study, with the caveats that we have outlined above.
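The logic of the four-gamete test can be sketched briefly. Under an infinite-sites model, two biallelic sites that together show all four haplotype combinations imply at least one recombination event between them. The sketch below is an illustration under stated assumptions rather than the procedure actually used: it takes haplotypes coded as strings of 0/1 over biallelic segregating sites, and it returns the longest run of consecutive sites in which no pair fails the test. Block length is counted in segregating sites here, whereas in practice one would map the block back to alignment coordinates.

```python
def four_gametes(haplotypes, i, j):
    """True if biallelic sites i and j together show all four two-site haplotypes."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def longest_compatible_block(haplotypes):
    """Longest run of consecutive sites with no pair failing the four-gamete test.

    Returns a half-open interval (start, end) over site indices.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    best = (0, 0)
    for start in range(n_sites):
        end = start + 1
        # Extend the block while the next site is compatible with every site in it.
        while end < n_sites and not any(
            four_gametes(haplotypes, k, end) for k in range(start, end)
        ):
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


# Invented toy haplotypes: sites 0 and 1 show all four gametes.
haps = ["0000", "0100", "1100", "1011"]
print(longest_compatible_block(haps))
```

Trimming each locus to such a block satisfies the no-recombination assumption, at the cost of discarding variable sites outside the block.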
Using a standard set of introns across avian phylogeographic studies, such as those provided by Backström et al. (2008), may also be a useful approach for comparing different studies. It is reasonable that our estimate of divergence time across the Carpentarian Barrier is less than that for the grassfinches, primarily because we are studying taxa that are morphologically conspecific and presumably much more recently diverged than the grassfinches, which are conventionally recognized as clearly different species with highly distinct plumage color, song, and other traits (Cracraft 1986; Boles 1988; Schodde and Mason 1999).
We can gain additional perspective on the divergence times in the present study by examining mtDNA divergence. Using the simple pairwise method for estimating net nucleotide diversity (Nei and Li 1979; Edwards and Beerli 2000), we estimated the divergence time for the ND2 gene between the TE and CY + EF regions. Net nucleotide diversity (d) was calculated by the equation d = d_xy − 0.5 (d_x + d_y), where d_xy is the average divergence between the TE and CY + EF regions (0.018), and d_x and d_y are the average pairwise divergence within the TE and CY + EF regions, 0.006 and 0.011, respectively. The net nucleotide diversity between the TE and the CY + EF regions in the ND2 region was 0.9%, corresponding to a divergence time of approximately 450,000 years across the Carpentarian Barrier given a standard mtDNA rate calibration of 2% divergence per million years (Lovette 2004; Weir and Schluter 2008). This divergence time is nearly twice as old as that from anonymous loci (about 260,000 years when we ran the IM program using only anonymous loci). This discrepancy could be due to error in the rate calibration for mtDNA or anonymous loci, or both. Also, given that the mtDNA rate calibration is often used for groups that show reciprocal monophyly, our estimate of divergence time may be an overestimate.
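The arithmetic behind this estimate can be reproduced with a short sketch. The function names are ours; the input values and the 2% per million years calibration are those given in the text. With the values as reported, d comes out near 0.0095, which the text rounds to 0.9%; the rounded value corresponds to 450,000 years.

```python
def net_divergence(d_xy, d_x, d_y):
    """Net nucleotide divergence: d = d_xy - 0.5 * (d_x + d_y)."""
    return d_xy - 0.5 * (d_x + d_y)


def divergence_time_years(d_net, divergence_rate_per_myr=0.02):
    """Divergence time in years, given pairwise divergence per million years."""
    return d_net / divergence_rate_per_myr * 1e6


# Values reported in the text for ND2 (TE versus CY + EF).
d = net_divergence(d_xy=0.018, d_x=0.006, d_y=0.011)
print(f"net divergence: {d:.4f}")
print(f"time from unrounded d: {divergence_time_years(d):,.0f} years")
print(f"time from d = 0.009:   {divergence_time_years(0.009):,.0f} years")
```

The gap between the rounded and unrounded results is small relative to the uncertainty in the rate calibration itself.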
Even though our study supports the idea of the Carpentarian Barrier as the primary geographical barrier in the northern part of Australia, the cause of the morphological divergence recognized by taxonomists across the Atherton Plateau is still unclear. The uplift of the plateau is presumed to have occurred prior to the opening of the Coral Sea in the Late Cretaceous (Ollier 1982), but this event is clearly unrelated to divergences in our study. The Atherton Plateau may represent a barrier that has experienced multiple vicariant events in the same place at many different times (Leaché et al. 2007). One explanation can be found in the long-term vegetation history of this area (Hopkins et al. 1990; Kershaw 1994). Based on charcoal layers found in volcanic craters located on the Atherton Plateau, these studies claimed that there was a series of extensive fires across the plateau in the late Pleistocene (150,000–120,000 years ago). Given the evidence that red-backed fairy wrens have a strong tendency to avoid burnt areas (Rowley and Russell 1997), the environmental change caused by these fires could have played a role in structuring populations. An alternative explanation is that the subspecies separation actually occurred at the Burdekin Barrier, not on the Atherton Plateau. Many species exhibit high divergence across the Burdekin Barrier (Joseph et al. 1993; James and Moritz 2001; Dolman and Moritz 2006), suggesting that it is a major barrier for rainforest species in the northeastern part of Australia. Even though the origin of the Burdekin Barrier is still unclear, it could have dampened vagility for this species and caused subspecies divergence (James and Moritz 2001). Given that red-backed fairy wrens mainly inhabit open forest/savannah rather than rainforest (the predominant habitat on the Atherton Tableland), this idea also provides a plausible scenario for the population history of this species.
Our estimated rates of gene flow suggest that there have been significant numbers of migrants among the three regions, especially into the CY region, and mainly from the EF region. Taking all this information into account, we suggest that the fluctuation of Pleistocene sea levels in the Gulf of Carpentaria and environmental change on the Atherton Plateau or at the Burdekin Barrier were the primary factors initiating the differentiation of red-backed fairy wrens. Later on, as the plateau was revegetated, birds in the EF region could have spread back north to Cape York, whereas the Carpentarian Barrier, whatever its source, remained a potent barrier to gene flow, maintaining differentiation between the CY and the TE populations.
HOW MANY LOCI DO WE NEED?
One way to reduce the variance in coalescent estimates of population parameters is to increase the number of loci (Pluzhnikov and Donnelly 1996). The question then is, how many loci do we need to obtain estimates with acceptably low variance? In our analysis of subsamples of our loci, we failed to obtain stable parameter estimates when we used fewer than five loci, presumably due to lack of information in this small subset for our particular dataset (Fig. 7). Despite the large amount of variation we found, particularly in anonymous loci, presumably the recent divergence times and presence of gene flow in our system require a large number of variable sites that are not quite provided by five loci. However, the variances for most parameters decrease rapidly from five to 15 loci, and stabilize slowly thereafter. A similar pattern was also observed in the grassfinch study, which used different multilocus statistical methods (e.g., MCMCcoal; Rannala and Yang 2003) to estimate parameters among species (Jennings and Edwards 2005). Given that some parameters (such as ancestral population size and divergence time) may require more data to estimate than current population sizes do (Yang 1997), it is hard to predict how many loci will be needed for a specific study. The number of loci required will likely depend not only on population history, but also on how frequently recombination occurs within loci. It has been suggested that recombination is a difficult variable to include in a genealogy-based likelihood framework for historical model-fitting (Kuhner et al. 2000; Nielsen 2000). However, some recent progress has been made in this area (Becquet and Przeworski 2007). New pyrosequencing approaches will likely make it possible to sample much larger numbers of loci easily from nonmodel species, and will dramatically increase the potential of statistical phylogeography (Knowles 2004).
Associate Editor: M. Webster
We would like to thank L. Joseph and R. Palmer of the Australian National Wildlife Collection, and S. Birks of the Burke Museum, University of Washington, for providing tissue samples. M. Lagator enthusiastically helped with the sequencing of the ND2 region, B. Jennings provided help with developing anonymous markers, and C. Chapus, C. Balakrishnan, C. Organ, D. Janes, L. Liu, and N. Rotzel provided assistance with laboratory work and data analysis. We also thank M. Webster, C. Moritz, and two anonymous reviewers for constructive comments. JYL was supported by funds from Harvard University and the laboratory research was funded by NSF grant DEB – 0108249 (0500862) to SVE and Peter Beerli.