Keywords:

  • Biodiversity hotspots;
  • Cape Floristic Region;
  • diversification;
  • gene flow;
  • next generation sequencing;
  • phylogeography;
  • RAD;
  • radiation;
  • speciation;
  • species cohesion

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Drivers of species persistence: The Cape of Southern Africa as an example
  5. Phylogeographical studies in The Cape: few and far between
  6. Stretching the limits of phylogeography: the power and promise of next-generation sequencing
  7. Putting NGS to work in biogeographical studies of species-rich environments
  8. De Novo approaches: NGS without the safety net of a completely sequenced genome
  9. Removing the net: towards De Novo SNP calling in multispecies data sets
  10. Conclusions and outlook
  11. Acknowledgements
  12. References
  13. Biosketch
  14. Supporting Information

The drivers of species diversification and persistence are of great interest to current biogeography, especially in those global biodiversity ‘hotspots’ harbouring most of Earth's animal and plant life. Classical multispecies biogeographical work has yielded fascinating insights into broad-scale patterns of diversification, and DNA-based intraspecific phylogeographical studies have started to complement this picture at much finer temporal and spatial scales. The advent of novel next-generation sequencing (NGS) technologies provides the opportunity to greatly scale up the numbers of individuals, populations and species sampled, potentially merging intraspecific and interspecific approaches to biogeographical inference. Here, we outline these prospects and issues by using the example of an undisputed hotspot, the Cape of southern Africa. We outline the current state of knowledge on the biogeography of species diversification within the Cape, review the literature for phylogeographical evidence of its likely drivers and mechanisms, and suggest possible ways forward based on NGS approaches. We demonstrate the potential of these methods and current bioinformatic issues with the help of restriction-site-associated DNA (RAD) sequencing data for three highly divergent species of the Restionaceae, an important plant radiation in the Cape. A thorough understanding of the mechanisms that facilitate species diversification and persistence in spatially structured, species-rich environments will require the adoption of novel genomic and bioinformatic tools in biogeographical studies.


Introduction

Understanding the drivers and mechanisms of species diversification and persistence is of central interest to biogeography, evolutionary biology and conservation genetics (Frankham et al., 2004; Futuyma, 2009; Höglund, 2009; Ladle & Whittaker, 2011). The need to understand these issues is particularly pressing in the world's biodiversity hotspots, i.e. those regions of the planet where both the challenges and payoffs of conservation measures are expected to be greatest (Myers et al., 2000; Mittermeier et al., 2005). The origin and maintenance of biological diversity in species-rich environments have been addressed using the concepts of historical biogeography, phylogenetics and comparative biology (Wiens & Donoghue, 2004; Emerson & Gillespie, 2008; Jablonski, 2008; Linder, 2008; Antonelli et al., 2009; Cavender-Bares et al., 2009; Salamin et al., 2010). These approaches have greatly advanced our understanding of key issues such as: what was the speed of a particular radiation, or of entire suites of radiations making up a biodiversity hotspot? Has diversification involved explosive bursts of speciation (= adaptive radiation), or has it proceeded gradually? Can environmental aspects such as area, climate or topography explain spatial and temporal patterns of diversity?

While large-scale comparative approaches have been remarkably successful in addressing these questions on the origin and accumulation of biodiversity (e.g. Wiens & Donoghue, 2004; Linder, 2008; Cavender-Bares et al., 2009; Verboom et al., 2009; Salamin et al., 2010), within-species phylogeographical and genetic data are required to address the mechanisms responsible for species diversification and persistence in strongly structured, species-rich environments, and to predict the likely future responses to environmental (e.g. climate) change. Mechanisms mediating species persistence may comprise intrinsic barriers to gene exchange maintaining the integrity of populations, or ecological mechanisms affecting entire genomes via population demography and specific genome regions via natural selection (Conner & Hartl, 2004; Stinchcombe & Hoekstra, 2008). Unfortunately, classical multispecies biogeographical studies tend to treat species as homogeneous units and employ just one (or at best a few) individual(s) per species (Salamin et al., 2010), which precludes an understanding of genetic variation at exactly the level that matters most to the evolutionary process: the pool of standing variation available to individual-level natural selection and drift (Futuyma, 2009). Within-species phylogeographical studies, on the other hand, often employ hundreds or even thousands of individuals but have thus far been limited to only one or a few related species per study. This lack of overlap and synthesis between multispecies biogeographical studies and intraspecific phylogeographical work makes it virtually impossible to judge the commonalities and mismatches between processes operating at macroevolutionary versus microevolutionary time-scales (Jablonski, 2008).

Molecular and analytical tools exist to track phylogeographical lineages within species or groups of closely related taxa (Arbogast & Kenagy, 2001; Excoffier, 2004; Avise, 2009; and see e.g. Dawson, 2001; Cannon & Manos, 2003; Costa, 2003; Swart et al., 2009; Pinheiro et al., 2011, for recent examples published in this journal). Nevertheless, these studies are thinly and unevenly spread across the globe. Southern Hemisphere biomes, for example, find little representation in phylogeographical studies (Beheregaray, 2008). Also, most phylogeographical studies to date are based on ‘neutral’ genetic markers, which do not permit tracking of the adaptive genetic variation that allows populations to respond directly to environmental change (for discussion of this issue see e.g. Davis & Shaw, 2001; Stinchcombe & Hoekstra, 2008; de Carvalho et al., 2010; Hohenlohe et al., 2010). The striking lack of intraspecific genetic data for geographical regions of great scientific and conservation interest is most readily exemplified by an undisputed biodiversity hotspot, the Cape of southern Africa.

Drivers of species persistence: The Cape of Southern Africa as an example

The Cape Floristic Region (CFR) of southern Africa is a global biodiversity hotspot (Myers et al., 2000; Linder, 2005; Mittermeier et al., 2005) covering < 90,000 km² and harbouring more than 9000 vascular plant species, roughly 70% of which are endemic to this region (Goldblatt & Manning, 2002; note the broader delimitation as Greater CFR in recent studies: Linder et al., 2010). The hypotheses regarding the origin and maintenance of this diversity revolve around climatic stability and change. Climatic stability over millions of years is now thought to be the chief factor responsible for the accumulation of species richness over geological times. This becomes apparent from molecular phylogenetic studies indicating an early onset of important plant radiations in the Cape (reaching back into the Oligocene, > 25 Ma; Verboom et al., 2009; Schnitzler et al., 2011) and relatively constant and moderate (by global standards) rates of net diversification ever since (Linder, 2008; Valente et al., 2010; Schnitzler et al., 2011).

In contrast, the key mechanisms currently generating and maintaining species richness in the Cape appear to be tightly coupled to climatic oscillations. Range modelling of Cape biomes (mediterranean fynbos shrubland and succulent karoo semi-desert) and individual species based on fossil pollen indicates rapid range dynamics within the last few hundreds of thousands of years (i.e. during the Pleistocene; Midgley et al., 2003, 2005). Given the mountainous topography and east–west climatic differentiation of the area, this suggests the presence of ‘species pumps’, with topographic diversity continuously shuffling populations along elevational and longitudinal (moisture) gradients and across a varied mosaic of soil types and microclimates (Midgley et al., 2003, 2005; Schnitzler et al., 2011). Thus, the key to understanding species richness in the Cape biodiversity hotspot appears to lie in understanding how so many species have evolved and persisted as cohesive units in so little space (Valente et al., 2010), despite the frequent opportunity for contact and genetic interactions that tend to accompany climatic oscillations (Hewitt, 2000). This calls for DNA-based phylogeographical studies that explicitly address the population-level mechanisms of divergence and persistence, i.e. the actual ‘engine’ of the evolutionary process. Strikingly few phylogeographical studies are currently available, considering the intensity of the debate surrounding the determinants of species richness in the Cape. We shall use the Cape as an example to sketch the current state-of-the-art, existing limitations, and potential future solutions for DNA-based biogeographical work in spatially structured, species-rich environments.

Phylogeographical studies in The Cape: few and far between

An exhaustive literature search for intraspecific phylogeographical studies in the South African Cape using Web of Science (search terms: various combinations of ‘Cape’ and ‘Africa’ with ‘phylogeography’, ‘DNA’, ‘population’, and ‘marker’), double-checked with Google Scholar, reveals 17 studies, including 12 on animals and only 5 on plants (Table 1). We mined these 17 studies for answers to two basic questions: (1) Is there phylogeographical structure in the studied taxa? (2) Are there indications of recent range dynamics (e.g. demographic/range expansions) in the phylogeographical data? The analytical tools used by these studies allow separate tests for structure and range dynamics; these two features are therefore reported separately (Table 1). We then cross-tabulated the results with four variables that may help to interpret the answers to these questions: taxonomic group, mating system, generation time and genome sampled (Table 1). These explanatory variables were chosen because of their known impact on genetic structure in animals and plants (Hamrick & Godt, 1996; Morjan & Rieseberg, 2004).

Table 1. Literature survey of within-species phylogeographical and population genetic work carried out for plants and animals in an exemplary world biodiversity hotspot, the Cape of southern Africa, including reference, target species, taxonomic group, mating system (outcrossing/mixed/unknown), generation time [plants, annual (ann) vs. perennial (per); animals, iteroparous (itero) vs. semelparous (semel)], genomic compartment sampled (cp, chloroplast; mt, mitochondrial; nuc, nuclear), and indications of phylogeographical structure and recent range dynamics (demographic/range expansions) as indicated by the molecular genetic data
| References | Species | Group | Mating system | Generation time | Genome | Structure [a] | Dynamics [b] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Tolley et al. (2010) | Strongylopus grayii | Amphibia | Outcrossing | per/itero | mt | y | n |
| Bergh et al. (2007) | Elytropappus rhinocerotis | Angiosperma | Outcrossing | per/itero | nuc | n | (y) [c] |
| Prunier & Holsinger (2010) | Protea spp. | Angiosperma | Mixed | per/itero | nuc | y | y |
| Rymer et al. (2010) | Gladiolus spp. | Angiosperma | Mixed | per/itero | cp/nuc | n | n |
| Segarra-Moragues & Ojeda (2010) | Erica coccinea | Angiosperma | Unknown | per/itero | nuc | y | n |
| Ramdhani et al. (2010) | Schotia spp. | Angiosperma | Unknown | per/itero | cp/nuc | (n) [c] | y |
| McDonald & Daniels (2012) | Peripatopsis capensis | Euonychophora | Outcrossing | per/itero | mt/nuc | y | (y) [c] |
| Price et al. (2007) | Platypleura stridula | Insecta | Outcrossing | ann/semel | mt/nuc | y | (y) [c] |
| Downie & Williams (2009) | Porthetes hispidus | Insecta | Outcrossing | unknown | mt/nuc | n | y |
| Smit et al. (2007) | Elephantulus edwardii | Mammalia | Outcrossing | unknown | mt | y | y |
| Van Hooft et al. (2002) [d] | Syncerus caffer | Mammalia | Outcrossing | per/itero | mt/nuc | y | y |
| Daniels et al. (2007) | Chersina angulata | Sauropsida | Outcrossing | per/itero | mt | y | y |
| Swart et al. (2009) | Agama atra | Sauropsida | Outcrossing | per/itero | mt/nuc | y | y |
| Tolley et al. (2006) | Bradypodion spp. | Sauropsida | Outcrossing | unknown | mt | y | y |
| Portik et al. (2011) | Trachylepis sulcata | Sauropsida | Outcrossing | per/itero | mt/nuc | y | y |
| Willows-Munro & Matthee (2011) | Myosorex varius | Soricomorpha | Outcrossing | per/itero | mt/nuc | y | y |
| Heideman et al. (2011) | Scelotes spp. | Squamata | Outcrossing | unknown | mt/nuc | y | not tested |

[a] The presence (y, yes) or absence (n, no) of phylogeographical structure in the reviewed studies was inferred from phylogenetic trees or networks including commonly used support statistics (Posada & Crandall, 2001) or from analysis of molecular variance (AMOVA; Excoffier & Lischer, 2010) or related spatially explicit types of F-statistics.

[b] The presence (y, yes) or absence (n, no) of demographic/range expansions in the reviewed studies was inferred from mismatch analysis (Rogers & Harpending, 1992), Bayesian skyline plots of effective population size (Ne) through time (Heled & Drummond, 2008), properties of allele frequency spectra, or spatial trends in genetic diversity.

[c] Parentheses indicate results regarded as tentative.

[d] The geographical range covered by this study on Cape buffalo greatly exceeds the Greater Cape Floristic Region.

Despite the limited number of available studies (arguably a surprising result in its own right), two patterns become readily apparent from the data. First, there are clear indications for both phylogeographical structure and recent range dynamics, and these results hold regardless of whether genealogies were sampled from nuclear or from cytoplasmic genomes (Table 1). Second, there is more variation in the results for plants than for animals, despite the small number of studies of the former. The limited amount of phylogeographical information currently available points to open questions and emerging issues requiring the attention of biogeographers and phylogeographers with interests in hotspots, and we shall outline these below.
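
The two patterns can be checked directly against Table 1. The following is a minimal sketch that re-tallies the 'Structure' and 'Dynamics' columns by broad group; the grouping into 'plant' and 'animal' and the decision to pool tentative results (those in parentheses) with firm ones are our assumptions for illustration, not part of the original survey protocol.

```python
from collections import Counter

# (group, structure, dynamics) transcribed from Table 1.
# Parentheses mark results that the table flags as tentative.
TABLE1 = [
    ("animal", "y", "n"),           # Strongylopus grayii
    ("plant", "n", "(y)"),          # Elytropappus rhinocerotis
    ("plant", "y", "y"),            # Protea spp.
    ("plant", "n", "n"),            # Gladiolus spp.
    ("plant", "y", "n"),            # Erica coccinea
    ("plant", "(n)", "y"),          # Schotia spp.
    ("animal", "y", "(y)"),         # Peripatopsis capensis
    ("animal", "y", "(y)"),         # Platypleura stridula
    ("animal", "n", "y"),           # Porthetes hispidus
    ("animal", "y", "y"),           # Elephantulus edwardii
    ("animal", "y", "y"),           # Syncerus caffer
    ("animal", "y", "y"),           # Chersina angulata
    ("animal", "y", "y"),           # Agama atra
    ("animal", "y", "y"),           # Bradypodion spp.
    ("animal", "y", "y"),           # Trachylepis sulcata
    ("animal", "y", "y"),           # Myosorex varius
    ("animal", "y", "not tested"),  # Scelotes spp.
]


def tally(rows, column):
    """Count outcomes per group; tentative results are pooled with firm ones."""
    counts = Counter()
    for row in rows:
        outcome = row[column].strip("()")  # "(y)" is counted as "y"
        counts[(row[0], outcome)] += 1
    return counts


structure = tally(TABLE1, 1)
dynamics = tally(TABLE1, 2)
for label, counts in (("structure", structure), ("dynamics", dynamics)):
    for key in sorted(counts):
        print(label, key, counts[key])
```

Under these pooling assumptions, 11 of the 12 animal studies but only 2 of the 5 plant studies report phylogeographical structure, which is the contrast referred to in the text.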

At first sight, widespread phylogeographical structure is consistent with the idea of species divergence and persistence due to the pronounced topographic and ecological heterogeneity frequently encountered in the Cape (Goldblatt & Manning, 2002; Linder, 2003; Schnitzler et al., 2011). Thus, phylogeographical patterns within species are reminiscent of patterns of diversity seen at the community level: high species turnover between habitats (beta diversity) is known to be a crucial determinant of species richness in the Cape (Simmons & Cowling, 1996; Linder, 2003). More data on species diversity and phylogeographical structure in different organismal groups are needed to test whether both levels of diversity are affected by the same drivers. Note that patchy and mosaic-like habitats are not exclusive to plants, but are also encountered by species of animals living in the Cape (e.g. reptiles; Tolley et al., 2006; Swart et al., 2009; Portik et al., 2011). The frequently found signature of demographic/range expansions, on the other hand (Table 1), is also consistent with the impact of Pleistocene climatic changes on the range dynamics of extant species, as hypothesized by Midgley et al. (2003, 2005). Given the peculiar history and nature of the Cape, what then are the actual drivers of current species divergence and persistence, especially for the extraordinary diversity of plants inhabiting this hotspot?

From the viewpoint of population genetics, the drivers of divergence and persistence may be either neutral or associated with natural selection. In a neutral scenario, mutation and drift during periods of geographical isolation facilitate species divergence and reproductive isolation, whereas neutral gene flow keeps populations of the same species together as cohesive units (Conner & Hartl, 2004; Morjan & Rieseberg, 2004). Alternatively, these processes may be driven by natural selection. Differential adaptation can trigger divergence and contribute to isolating mechanisms that maintain newly arisen, distinct forms in sympatry or parapatry (Wu & Ting, 2004; Gavrilets & Vose, 2005; Smadja & Butlin, 2011), while selective sweeps across habitat mosaics can maintain species as cohesive units despite restrictions in dispersal (Morjan & Rieseberg, 2004; Kane & Rieseberg, 2007). Distinguishing between these different mechanisms (neutral versus selective) matters greatly, because their outcomes differ strongly in terms of the speed of divergence and chance of persistence of divergent forms (Morjan & Rieseberg, 2004; Smadja & Butlin, 2011).

It helps to keep in mind that the mechanisms mediating cohesion (and thus persistence) of species are closely related to those maintaining reproductive isolation between them (Rieseberg & Burke, 2001). This is the case because the opportunity for interspecific gene exchange depends crucially on levels of gene flow within species (Petit & Excoffier, 2009), which provides yet another motivation for merging within- and between-species approaches to biogeographical inference. Importantly, the footprint of selection between conspecific populations will often be limited to genomic regions surrounding very specific sets of functionally important genes or their regulatory elements (Schlötterer, 2003; Kane & Rieseberg, 2007; Stinchcombe & Hoekstra, 2008), whereas differentiated genome regions appear to grow quite quickly in speciating populations (Feder & Nosil, 2010). Thus, phylogeographers interested in the mechanisms of diversification and persistence will need to greatly expand their genomic coverage to discover and delimit the genome regions important in the process, in addition to extending taxonomic coverage. Fortunately, the genomic tools to enable this are now becoming available.

Stretching the limits of phylogeography: the power and promise of next-generation sequencing

So-called next-generation sequencing (NGS) approaches to the ultra-high-throughput sequencing of DNA are currently transforming the ways in which phylogeographers are able to track the dynamics of genetic diversity in space and time. The power of these approaches lies in their ability to yield hundreds of millions of short (at present typically 50–200 DNA bases each) sequence reads per run (Metzker, 2010), in contrast to conventional Sanger sequencing, which typically yields only a few hundred reads. The unprecedented throughput of NGS approaches allows phylogeographers to discover tens to hundreds of thousands of molecular genetic (DNA) markers in the genomes of non-model species and simultaneously type them in wild populations by direct re-sequencing. Such population-level re-sequencing may be done at the level of entire genomes (Rubin et al., 2010; Turner et al., 2010) or, more affordably for larger sample sizes, in partial genomic scans revealing ‘only’ a few tens of thousands of genetic markers in each individual animal or plant. Many variations on this theme exist (Parchman et al., 2012), but arguably the most relevant of these to phylogeographers at the present time is restriction-site-associated DNA sequencing (RAD-seq; Baird et al., 2008; Emerson et al., 2010; Hohenlohe et al., 2010).

The nature and work flow of RAD-seq have been described in detail elsewhere (e.g. Baird et al., 2008; Hohenlohe et al., 2010; Amores et al., 2011; Davey & Blaxter, 2010; Etter et al., 2011) and need not be reiterated here. The main issues in applying NGS technology to phylogeographical research are: (1) How can the massive amounts of data from population-level NGS, such as RAD-seq, be used to address the evolutionary drivers of species diversification? (2) How can these approaches be used across multiple species, effectively merging between- and within-species approaches to biogeographical study, particularly in non-model taxa without completely sequenced genomes? We shall use the Cape hotspot example and NGS data for Cape taxa to illustrate these issues.

Putting NGS to work in biogeographical studies of species-rich environments

We suspect that perhaps the greatest impact of NGS in biogeography and phylogeography will lie in its ability to help identify and pinpoint the evolutionary drivers of species divergence, expansion, and persistence, arguably the ‘holy grail’ of current biogeography and evolutionary genetics (Morjan & Rieseberg, 2004; Gavrilets & Vose, 2005; Klopfstein et al., 2006; Excoffier et al., 2009; Smadja & Butlin, 2011). NGS approaches hold immense potential for this purpose, because the ‘genomic footprints’ of neutral population divergence and demographic changes (e.g. population expansions and contractions) differ from those caused by natural selection affecting very particular sets of functionally important genes and their surrounding DNA (Schlötterer, 2003; Klopfstein et al., 2006; Stinchcombe & Hoekstra, 2008; Excoffier et al., 2009; see above for why the distinction between neutral and selective forces matters). In the case of the Cape hotspot example, particularly worthwhile goals would be: (1) to test whether and how often divergence of populations and species within major radiations is driven by slow neutral processes in geographical isolation versus rapid ecological speciation due to strong divergent selection, and (2) to test whether secondary contact and interspecific gene flow in sympatry or parapatry triggered by range dynamics (Table 1) constrains or facilitates adaptive evolution and speciation (for a rationale see Hewitt, 2000; de Carvalho et al., 2010).

The use of NGS to address these important issues is likely to bring novel analytical opportunities and challenges. Phylogeographical reconstruction has traditionally relied on neutral genetic markers (Avise, 2009), but NGS approaches offer considerable power to uncover regions of the genome that respond to natural (e.g. divergent ecological) selection (Hohenlohe et al., 2010; Rubin et al., 2010; Turner et al., 2010). Of course, non-neutral ‘outlier loci’ may also be removed from the data set to improve inferences about population history (Stinchcombe & Hoekstra, 2008), or the very large number of markers generated may be assumed to approximate a neutral distribution (Emerson et al., 2010). Nevertheless, population geneticists have long suspected that marker loci affected by selection may in fact be efficient tools for estimating gene flow between locally adapted populations (Lenormand et al., 1998; Guichoux et al., 2013), analogous to the estimation of migration and selection along clines (Barton & Hewitt, 1985). Likewise, markers undergoing range-wide selective sweeps are predicted to be highly useful for delimiting related species, because their rapid spread within species strengthens their resistance to introgression from related taxa (Petit & Excoffier, 2009). Thus, we anticipate that NGS approaches will transform the way in which biogeographers make use of DNA genealogies to address questions on the origin and maintenance of biological diversity.
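
As a hedged illustration of what an 'outlier locus' means in practice, the sketch below computes a simple per-locus F_ST for biallelic markers in two equally weighted populations and flags loci above a cut-off. The allele frequencies, locus names and the cut-off of 0.25 are hypothetical; published outlier scans compare observed values against a neutral expectation rather than a fixed threshold.

```python
def fst_two_pops(p1, p2):
    """Per-locus F_ST for a biallelic marker in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    F_ST = (H_T - H_S) / H_T, with H = 2p(1 - p) as expected heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic across both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies for three loci in two populations.
loci = {
    "locus_a": (0.50, 0.55),
    "locus_b": (0.10, 0.90),
    "locus_c": (0.30, 0.35),
}
CUTOFF = 0.25  # arbitrary illustrative threshold, not a recommended value

fst = {name: fst_two_pops(*freqs) for name, freqs in loci.items()}
outliers = [name for name, value in fst.items() if value > CUTOFF]
print(outliers)
```

Depending on the question, the flagged loci would either be removed to improve inferences about population history or retained as candidate targets of divergent selection.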

It would be premature to forecast the myriad creative ways biogeographers will find to put NGS approaches to work for tackling these important issues, once provided with the necessary bioinformatic infrastructure and ‘know-how’. One particular challenge faced by biogeographers and phylogeographers at the present time is the need to analyse sets of multiple species, usually in the absence of fully assembled reference genomes. This issue is particularly pronounced in biodiversity hotspots such as the Cape, which is dominated by unusual organismal families (Linder et al., 2010), because these groups of taxa have not been ‘on the radar’ of genomics research in the past.

De Novo approaches: NGS without the safety net of a completely sequenced genome

Perhaps two of the greatest challenges in current NGS-based biogeography and phylogeography are the application of NGS approaches (e.g. RAD-seq) to taxa without existing reference genomes and their use in comparative studies involving multiple species. A recurrent theme across all current NGS approaches, including RAD-seq, is that these two issues are tightly connected because of the way polymorphisms are called (recognized and identified) in populations (for details of the bioinformatic work flow see Hohenlohe et al., 2010; Amores et al., 2011; Catchen et al., 2011; Etter et al., 2011).

Typically, single nucleotide polymorphisms (SNPs) in genotyping-by-sequencing approaches are identified either after reference mapping of short DNA sequence reads against a reference genome (e.g. Hohenlohe et al., 2010) or, when no reference genome is available, against de novo assembled clusters or ‘stacks’ (Emerson et al., 2010; Catchen et al., 2011) of short DNA sequence reads from many individuals. Either approach requires the definition of a maximum sequence mismatch threshold (e.g. a minimum of 95% sequence identity) during sequence alignment, to make sure that similar DNA fragments recovered from populations are indeed allelic (= homologous variants of the same genetic locus). De novo assembly of sequence clusters is the method of choice in non-model species with no available reference genome, and the robustness of the approach has been thoroughly evaluated by computer simulations and genetic mapping (Amores et al., 2011; Catchen et al., 2011). The possibility of building dense genetic linkage maps of RAD-seq markers in non-model species (e.g. Amores et al., 2011) is an important aspect for biogeographical studies, because linkage mapping yields recombination distances that are highly useful for interpreting selection signatures and for parameterizing historical demographic models. However, one practical issue of the de novo approach requires special attention here.
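
The role of the mismatch threshold can be made concrete with a toy example. The sketch below is a deliberately simplified greedy clustering of equal-length reads by Hamming distance; it is not the algorithm implemented in Stacks, and the reads, function names and the default of three mismatches are assumptions chosen for illustration only.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads, max_mismatches=3):
    """Assign each read to the first cluster whose seed is within the threshold.

    Toy stand-in for de novo 'stack' building: the first read of a cluster
    acts as its seed, and later reads join if they differ from the seed at
    no more than max_mismatches positions.
    """
    clusters = []  # list of (seed, members) pairs
    for read in reads:
        for seed, members in clusters:
            if len(seed) == len(read) and hamming(seed, read) <= max_mismatches:
                members.append(read)
                break
        else:
            clusters.append((read, [read]))  # no match: start a new cluster
    return clusters


# Hypothetical 20-base reads.
reads = [
    "ACGTACGTACGTACGTACGT",  # locus 1, allele A
    "ACGTACGTACGTACGAACGT",  # locus 1, allele B (1 mismatch to allele A)
    "TTGCATGCAAGCTTGGCCAA",  # unrelated locus 2
    "ACGAACGAACGAACGAACGT",  # divergent copy of locus 1 (4 mismatches)
]

clusters = greedy_cluster(reads, max_mismatches=3)
for seed, members in clusters:
    print(seed, len(members))
```

With a threshold of three mismatches the fourth read is placed in its own cluster; relaxing the threshold to four merges it with locus 1. Whether such a merge joins true alleles or collapses paralogues cannot be decided from the sequences alone, which is why the choice of threshold matters.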

The process of cluster building easily results in the ‘loss’ of many polymorphisms, because a stringent maximum mismatch threshold is required during clustering of sequence reads to avoid the inclusion of paralogous (non-allelic) variants, especially when working with taxa with complex, recently duplicated genomes (Hohenlohe et al., 2011). This issue is greatly aggravated when multiple divergent species are present in the data set. Sequence reads derived from different species will often be too divergent to be ‘recognized’ as members of the same cluster (= genetic locus) in spite of shared ancestry, ultimately resulting in the loss of detected polymorphic sites or in the reconstruction of separate loci (‘allele splitting’), depending on the filtering criteria used. Genetic divergence between species may also manifest itself in the loss of endonuclease restriction sites (which play a crucial role in genotyping-by-resequencing approaches such as RAD-seq; Baird et al., 2008) or in the loss of entire stretches of DNA. A bioinformatic analysis of pilot data from RAD-seq of an important plant radiation at the Cape may serve to illustrate these issues.
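
A back-of-the-envelope calculation shows how quickly divergence erodes cluster membership. The sketch below assumes, purely for illustration, that differences between two homologous reads are independent across sites with a constant per-site divergence, so that the number of mismatches is binomially distributed; the read length of 100 bases and the divergence values are hypothetical.

```python
from math import comb


def prob_within_threshold(read_length, divergence, max_mismatches):
    """P(two homologous reads differ at no more than max_mismatches sites).

    Assumes independent sites with a constant per-site divergence, i.e. a
    binomial model; real sequence divergence is more heterogeneous.
    """
    return sum(
        comb(read_length, k)
        * divergence ** k
        * (1 - divergence) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )


for d in (0.01, 0.05, 0.10):
    p = prob_within_threshold(read_length=100, divergence=d, max_mismatches=3)
    print(f"divergence {d:.2f}: P(same cluster) = {p:.3f}")
```

Under this simple model the probability that homologous reads from two species still fall within a fixed mismatch threshold declines steeply as divergence increases, in line with the cluster loss and allele splitting described above.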

Removing the net: towards De Novo SNP calling in multispecies data sets

To illustrate our point, we generated and analysed RAD-seq data for three southern African species of the Restionaceae family (restiads): Restio capensis (L.) H. P. Linder & C. R. Hardy, Restio triticeus Rottb., and Hypodiscus aristatus (Thunb.) C. Krauss (see Appendix S1 in Supporting Information). These species originate from highly divergent clades within the Restionaceae family (Hardy et al., 2008; Linder & Hardy, 2010), thus providing a conservative setting for testing the applicability of RAD-seq in multispecies studies. We followed essentially the same laboratory procedures recently employed by our group for a pair of highly divergent, hybridizing Northern Hemisphere tree species (Stölting et al., 2012). RAD sequence reads were assembled by de novo clustering because no reference genome is currently available for Restionaceae. The widely used Stacks software (Catchen et al., 2011) was used to build sequence clusters in each of the three restiad species with standard criteria (e.g. a maximum of three mismatches; see legend of Fig. 1). For the purpose of the present contribution, we used clusters shared by two individuals sampled from each species and examined sharing of these clusters between taxa, using custom Perl scripts and open source software (Chen & Boutros, 2012) (Fig. 1). The individuals were picked randomly from 300 RAD-sequenced plants currently used for population genomic studies in our laboratory.

Figure 1. Venn diagram illustrating the potentials and limits of studying clusters of restriction-site-associated DNA (RAD) sequence reads across three highly divergent species of the South African Restionaceae: Restio capensis (RC), Restio triticeus (RT) and Hypodiscus aristatus (HA). The black numbers are the numbers of unique RAD clusters shared by two individuals examined for each species. The white numbers are the numbers of unique RAD clusters shared by different combinations of species. Clustering (‘stack building’) was carried out with the Stacks software (Catchen et al., 2011), allowing for a maximum of three mismatches between reads of the same cluster and a minimum coverage of two copies per cluster (see text for details).

The Venn diagram indicates the great loss of sequence clusters when combining information from these three divergent taxa (Fig. 1). Nevertheless, more than 5600 unique sequence clusters were shared across all three species, thus providing a rich source of DNA sequences that can be mined for SNP diversity (quantifying that diversity requires the analysis of larger samples and is beyond the scope of the present contribution). The diagram is also suggestive of a relationship between levels of cluster sharing and evolutionary divergence, with the congeneric taxa Restio capensis and R. triticeus sharing a much greater number of unique clusters than Hypodiscus aristatus with either of these. Note that the split between the genus Restio and the group containing Hypodiscus is the most basal one in a comprehensive molecular phylogeny of the Restionaceae family (Hardy et al., 2008), the estimated divergence time of the entire radiation being > 35 Ma (Linder, 2005). Our results illustrate the potentials and limits of NGS approaches for multispecies biogeographical studies of large radiations such as that of Restionaceae in the Cape, in particular, the amount of putatively homologous sequence information (cluster sharing) uncovered across different evolutionary time-scales (Fig. 1). There is great potential for improvement of all these issues, as existing cluster building software advances (e.g. Catchen et al., 2011) and the read length of NGS approaches increases.

Another currently open issue of multispecies NGS approaches in biogeography is that divergence will vary not only between different pairs of species within radiations of animals and plants, but also along the genomes of any given set of species. This is the case because the genomes of plants and animals remain ‘porous’ for up to millions of years following the onset of speciation (Wu & Ting, 2004), and the combined action of selection and drift creates considerable variation in the ‘genomic landscape of divergence’ until whole-genome isolation is complete (Feder & Nosil, 2010). This implies that the efficiency of de novo clustering will vary greatly along genomes, with far worse performance of de novo SNP calling methods in hyperdivergent regions of the genome. Here, many sequence clusters will simply drop out from the analysis because they exceed the a priori defined mismatch threshold (i.e. homology criterion) used during cluster building, potentially leading to highly uneven coverage of genomes in multispecies studies. In a related contribution (Lexer & Stölting, 2012), we touch on these open issues based on exemplary RAD-seq data for two highly divergent (several million years) hybridizing plant species with a completely sequenced reference genome. We suspect that tackling these issues in multispecies biogeographical studies will require analytical approaches that account for variation in the depth of genealogies across the genome (which may be estimated from multispecies data) and experimental manipulation of sequence alignment (i.e. mismatch) and sequencing depth (i.e. coverage) thresholds in each single case, accompanied by extensive simulation studies of the relevant parameter space. Recent work on divergent adaptation, gene flow and hybrid speciation in Lake Victoria cichlid fishes (Keller et al., 2012) demonstrates how much can be learned by applying RAD sequencing (or similar NGS approaches) to multispecies studies of non-model taxa without sequenced reference genomes, and similar examples from plants are forthcoming.
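The dropout effect of a fixed mismatch threshold can be illustrated with a toy example. The sketch below is not the clustering algorithm implemented in Stacks: it simply compares two equal-length sequences per locus and retains the locus if the number of mismatches does not exceed the threshold. The function names, locus names and sequences are hypothetical and chosen only for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def retained_loci(pairs, max_mismatches):
    """Names of loci whose two sequences differ at no more than
    max_mismatches positions (a simplified homology criterion)."""
    return [
        name
        for name, (seq1, seq2) in pairs.items()
        if hamming(seq1, seq2) <= max_mismatches
    ]


# Hypothetical sequence pairs, one per locus, from two divergent species.
toy_pairs = {
    "locus_a": ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    "locus_b": ("ACGTACGTAC", "ACGAACGTTC"),  # 2 mismatches
    "locus_c": ("ACGTACGTAC", "TCGAACCTAG"),  # 4 mismatches
}

# Vary the threshold to see which loci survive at each setting.
for threshold in (1, 3, 4):
    print(threshold, retained_loci(toy_pairs, threshold))
```

With a threshold of three mismatches, the most divergent toy locus is lost, whereas relaxing the threshold to four recovers it. In real data a more permissive threshold also raises the risk of merging paralogous sequences, which is one reason why the thresholds need to be explored case by case.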

Conclusions and outlook

Understanding the mechanisms and drivers of species divergence, expansion, and persistence in spatially structured, species-rich environments of high conservation value (i.e. biodiversity hotspots) will require concerted efforts to resolve the phylogeography and genetics of diverging populations in a much greater number of taxa and with much greater genomic coverage than previously achieved. We have demonstrated this need using the Cape of southern Africa as an example, but our own experience shows that the same applies to other Southern Hemisphere biodiversity hotspots (Palma-Silva et al., 2009, 2011). The NGS technologies needed to scale up this process are now available and increasingly affordable. Moreover, suitable bioinformatic approaches are in place to handle the enormous amounts of sequence data gathered for any single species. A challenge ahead will be the development of bioinformatic tools able to efficiently handle NGS data involving multiple genetically divergent species. This will be particularly important for research on plants, because the notorious complexity and evolutionary ‘fluidity’ of plant genomes challenge the reliable assignment of orthologous (= allelic) genetic variants, which is necessary to trace genealogies within and among diverging populations of radiating species.

References

Biosketch

Christian Lexer leads a research programme on the evolutionary genomics of adaptation, speciation, and of traits involved in range shifts in plants. The author team forms part of a consortium research project on the Spatially Explicit Evolution of Diversity (SPEED), coordinated by Peter B. Pearman.

Author contributions: C.L. and N.S. conceived the ideas; C.L. wrote the paper; S.M. compiled and analysed the phylogeographical literature; E.B. and K.N.S. analysed the NGS data, and all authors contributed to data interpretation and writing.

Supporting Information

Filename: jbi12076-sup-0001-AppendixS1.doc
Format: Word document
Size: 35K
Description: Appendix S1 RAD sequencing results for South African Restionaceae (restiad) species discussed in the main text.