Match between question and approach
Even though NGS is a relatively new technology that has only recently been applied in ecology and conservation biology, several studies demonstrate its large potential. In most studies, NGS technology is used to identify genes of importance for conservation. For instance, the California condor has a relatively high frequency of an inheritable dwarfism called chondrodystrophy (Ralls et al. 2000; Ralls and Ballou 2004). NGS technologies are currently being applied to identify carriers of the disease, which offers the opportunity to eliminate the disease (Romanov et al. 2006; Frankham 2010).
Several studies have used NGS technology to characterize the transcriptome of species of conservation interest. The Glanville fritillary butterfly (Melitaea cinxia) was one of the first nonmodel species for which a large part of the transcriptome was characterized, using a Roche/454 platform (Vera et al. 2008). The authors characterized around 9000 unique genes, with an average coverage of 6.5-fold for the 4800 longest contigs. This coverage was sufficient for the identification of a large number of SNPs, including 149 first and second codon position polymorphisms, which are likely to change the corresponding amino acid sequence. The genomic resources described in Vera et al. (2008) enable the study of ecological features of M. cinxia (e.g., dispersal ability). In a follow-up study, Wheat et al. (2011) combined the genomic resources developed by Vera and colleagues with a long-term ecological study to obtain a more mechanistic understanding of life history variation affecting ecological and evolutionary dynamics of M. cinxia. The authors first identified groups of populations that differed in their demographic history. Gene expression differences and allelic polymorphisms were subsequently linked with life history traits and population dynamics to identify new candidate genes that affect eco-evolutionary dynamics. Their results are important for the conservation of M. cinxia, as the life history traits they studied are known to affect metapopulation persistence in fragmented habitats (Hanski and Ovaskainen 2000).
There are many applications of NGS, and the number of applications is increasing continuously. Starting a conservation genomics research program therefore involves both formulating very precise questions and finding a match between question and NGS application. Here, we discuss this match in three categories of questions that can most profitably be addressed with a conservation genomic approach (Fig. 3).
Figure 3. A scheme of how various next-generation sequencing approaches relate to the three main categories of questions in conservation genomics and how to feed their results into each other.
One important question in conservation genetics is whether patterns of markers are accurate estimations of processes like drift, inbreeding, and gene flow. Applying NGS allows for the investigation into whether patterns of genome-wide variation, as measured with thousands of SNPs, lead to the same conclusions about these population genetic processes as patterns of the variation in 10–20 microsatellites. For instance, the genome-wide estimation of heterozygosity across all SNPs is negatively correlated with the level of individual inbreeding (Keller and Waller 2002), and SNP variation might therefore provide a more accurate estimate of inbreeding. Moreover, using large numbers of SNPs, distributed across the genome, is expected to lead to more adequate estimates of population genetic parameters (Novembre and Stephens 2008), to easier detection of signals of selection (Slate et al. 2009), to more power in assigning individuals to parents or other kin (Santure et al. 2010), and to estimates of historical demography (Ekblom and Galindo 2010).
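As a minimal sketch of the genome-wide heterozygosity measure mentioned above, the following Python function computes, for one individual, the proportion of typed SNPs that are heterozygous. The genotype coding (0 and 2 for the two homozygotes, 1 for a heterozygote, `None` for a missing call), the function name, and the toy data are assumptions made for illustration only and are not taken from any of the cited studies.

```python
from typing import Optional, Sequence


def multilocus_heterozygosity(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of typed SNPs at which one individual is heterozygous.

    Assumed coding: 0 or 2 = homozygote, 1 = heterozygote, None = missing call.
    """
    # Ignore SNPs without a genotype call so that missing data
    # do not count as homozygous.
    typed = [g for g in genotypes if g is not None]
    if not typed:
        raise ValueError("no typed SNPs for this individual")
    return sum(1 for g in typed if g == 1) / len(typed)


# Hypothetical toy data: two individuals scored at eight SNPs.
outbred = [1, 0, 1, 2, 1, 1, 0, 1]
inbred = [0, 0, 2, 2, 1, None, 0, 2]

het_outbred = multilocus_heterozygosity(outbred)
het_inbred = multilocus_heterozygosity(inbred)
```

With thousands of SNPs instead of eight, the same per-individual value could be compared with pedigree-based or microsatellite-based estimates of inbreeding.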
The choice between various approaches to detect and screen SNPs depends on the final goal and the resources available. If the goal is to perform a single experiment to screen a population of a nonmodel species for SNP variation, performing an RAD-tag experiment seems to provide the best balance between the level of detail in the data and the costs and effort invested. If the goal is to develop SNPs for many follow-up experiments, it is better to perform WGS or transcriptome sequencing, so as to create a reference genome that can be annotated. An additional advantage of these approaches is that other types of markers, most notably microsatellites, can be identified in the same run. The developed SNPs can then be screened either with a SNP-chip that is designed based on these results or with an RAD-tag sequencing procedure. In the latter case, because an (annotated) reference genome is now available, the SNP variation discovered by RAD-tag sequencing can be mapped on the genome and be functionally evaluated.
All described approaches have been applied to nonmodel species and in ecological or conservation contexts. Sanchez et al. (2009), in an effort to develop genomic tools for the rainbow trout (Oncorhynchus mykiss), performed WGS on a pool of genomic DNA from 96 unrelated rainbow trout. Three independent analyses were performed on the data, resulting in the identification of 22 022–47 128 putative SNPs.
Novaes et al. (2008) used 454 pyrosequencing to characterize the transcriptome of Eucalyptus grandis, the most widely planted hardwood tree species. They used RNA of vegetative tissues sampled from 21 different genotypes and detected 23 742 SNPs, 83% of which were then validated after resequencing. This information was then used to detect evolutionary signatures of genes by studying nonsynonymous and synonymous substitutions. In this way, several genes were discovered that are under purifying selection.
Angeloni et al. (2011) sequenced the transcriptome of 48 individuals of the locally threatened plant species Scabiosa columbaria, using a combination of 454 and Illumina sequencing. They found a total of 75 054 putative SNPs. They also identified 4320 microsatellites, of which 856 had suitable flanking regions for primer design.
To study the parallel evolution of marine and freshwater populations of the three-spined stickleback (Gasterosteus aculeatus), Hohenlohe et al. (2010) applied RAD-tag sequencing on an Illumina platform to simultaneously detect and genotype SNPs. They identified over 45 000 SNPs in two oceanic and three freshwater natural populations of threespine stickleback, for a total of 100 individuals. Further analyses showed that these SNPs were evenly distributed across the entire genome. Several chromosomal regions in stickleback were found that were highly differentiated between the two ecotypes. These regions contained both previously identified loci of large phenotypic effect and novel candidate genes involved in stickleback phenotypic evolution.
The same technology was applied by Hohenlohe et al. (2011) to identify almost 3000 candidate SNP loci with fixed allelic differences between introduced rainbow trout (Oncorhynchus mykiss) and native west-slope cutthroat trout (Oncorhynchus clarkii lewisi), using a total of 24 individuals.
Rowe et al. (2011) present a review on the application of RAD-tag sequencing, with NGS, in different fields.
Another important goal of conservation genomics is to study the interaction between genes and their environment, in a conservation context. NGS allows for the study of the balance between genetic effects of habitat fragmentation (inbreeding, loss of genetic variation) and effects of habitat degradation, at the genomic level. NGS also allows for the identification of the genes involved in adaptation. Methods following NGS, like GWSS and GWAS, allow for distinguishing neutral from non-neutral markers and thus for screening of the effect of habitat fragmentation on patterns of non-neutral (as compared to neutral) marker variation.
Stapley et al. (2010) provided an excellent overview of NGS approaches in the study of adaptation. In short, the first step is to create a dense map of SNP markers across the genome. This can be done with WGS, with transcriptome sequencing, or with NGS of targeted (candidate) regions. RAD-tag sequencing could also be used, although this may deliver less dense maps. Based on screening many individuals for thousands of SNPs, one or more of the following approaches can be used to identify the loci involved in adaptation. A GWSS procedure analyzes only SNP data and identifies outlier loci as candidate regions involved in adaptation. A reference genome is not an absolute requirement, as the method only searches for markers with a deviating level of variation. Therefore, GWSS can be performed in nonmodel species that lack a reference genome, using RAD-tag sequencing. However, only when a reference genome is available can the identified markers be associated with regions of the genome, which is the starting point for further functional analyses. If variation in phenotype is assessed in addition to variation in SNPs, associations between markers and traits can be found in a GWAS procedure. This does, however, require a reference genome, in which the position of markers relative to each other is known. In some cases, species of conservation interest can be studied using reference genomes of closely or more distantly related model species, as linkage groups are likely to be conserved across related species. Alternatively, transcriptome analysis using an RNA-seq procedure can be used to identify genes that are associated with differences between populations, whether these differences are genetic, environmental, or both.
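The idea of searching for markers with a deviating level of variation can be illustrated with a deliberately simple sketch. The Python code below computes a per-locus FST between two populations from allele frequencies and flags the most differentiated loci. It assumes biallelic SNPs, equal sample sizes, and an arbitrary empirical cutoff (`top_fraction`); the function names and toy frequencies are hypothetical. Published GWSS methods typically compare observed differentiation with a modeled neutral expectation, which this sketch does not attempt.

```python
from typing import List, Sequence


def fst_two_pops(p1: float, p2: float) -> float:
    """FST = (HT - HS) / HT for one biallelic locus in two populations.

    Assumes equal sample sizes, so the pooled frequency is the plain mean.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, no differentiation
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def outlier_loci(freqs_pop1: Sequence[float],
                 freqs_pop2: Sequence[float],
                 top_fraction: float = 0.05) -> List[int]:
    """Indices of the loci in the top `top_fraction` of the FST distribution."""
    fst = [fst_two_pops(a, b) for a, b in zip(freqs_pop1, freqs_pop2)]
    n_flag = max(1, round(top_fraction * len(fst)))
    ranked = sorted(range(len(fst)), key=lambda i: fst[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hypothetical allele frequencies at ten SNPs in two populations.
pop1 = [0.50, 0.48, 0.52, 0.10, 0.30, 0.55, 0.45, 0.60, 0.40, 0.50]
pop2 = [0.52, 0.50, 0.50, 0.90, 0.33, 0.50, 0.47, 0.58, 0.42, 0.49]

candidates = outlier_loci(pop1, pop2, top_fraction=0.1)
```

Note that no reference genome is needed for this calculation; a reference only becomes necessary when the flagged loci are to be placed in their genomic context.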
Next-generation sequencing provides many advantages in this type of research (Stapley et al. 2010). For instance, it provides much more power, using more loci and more individuals, thereby facilitating the discovery of selection signals or of loci of small effect. Perhaps, the biggest advantage is that the dynamics of genes involved in adaptation can now be evaluated within the context of the dynamics of other parts of the genome. This opens the way to separating effects of genetic drift from effects of selection, and effects of selection from effects of demography. Eventually, this will allow us to investigate what the balance is between genetic drift and local adaptation, in small populations or in systems of isolated populations.
A genome-wide selection scan was performed by Galindo et al. (2010), who applied 454 pyrosequencing to characterize the transcriptome of two different ecotypes of the marine gastropod Littorina saxatilis. This gastropod is a good species to study ecological speciation. Galindo and colleagues collected 15 females per ecotype in each of the two sampling sites. Females were pooled into two samples, each with 30 individuals (one sample per ecotype). A total of 2454 SNPs were found, 7% of which were identified as outliers that may represent direct targets of selection or regions tightly linked to selected loci.
Atwell et al. (2010) applied GWAS to study the genetics of 107 phenotypes of Arabidopsis thaliana. Several adaptive traits, including flowering time and pathogen resistance, were shown to be controlled by loci of major effect. The study also showed that it may be difficult to distinguish between true association and false positives because of the confounding effect of population structure (see also Bierne et al. 2011). Nevertheless, the authors demonstrated that GWAS can be successfully performed on Arabidopsis and can also be applicable in other, nonmodel organisms.
Turner et al. (2010) performed a GEA study, where they investigated whether Arabidopsis lyrata is locally adapted to serpentine soil, by mapping the polymorphisms responsible for such adaptation. They pooled approximately 200 DNA samples extracted from individuals from serpentine and nonserpentine soil and sequenced each pool with Illumina. The polymorphic SNPs that were most strongly associated with soil type were involved with heavy metal detoxification and calcium and magnesium transport. These SNPs provide several candidate polymorphisms for adaptation in serpentine soil. The authors then confirmed the results by sequencing three candidate loci in the European subspecies of A. lyrata, finding parallel differentiation of the same polymorphism at one locus.
The study of mechanisms
NGS will be instrumental in the study of the mechanisms underlying the relationship between genetic effects of habitat fragmentation and the final consequences for fitness and population viability. Inbreeding depression, the reduced fitness of offspring from a mating between related individuals, plays a central role in conservation biology. The average level of inbreeding in small and isolated populations is expected to increase over time, making individuals more homozygous, which leads to increased expression of recessive deleterious alleles and reduced fitness. NGS technologies make it possible to study the genetic architecture of inbreeding depression (Kristensen et al. 2010). One way to proceed here is to screen SNP variation in a large number of individuals that differ in inbreeding level. If fitness traits are also measured for each individual, associations between inbreeding level, SNP markers and fitness traits can be assessed (Kristensen et al. 2010). In another approach, differential expression of genes between inbred and outbred individuals can be investigated in an RNA-seq procedure. This would pinpoint genes that are associated with inbreeding depression, either as cause or as consequence. In controlled environment studies with inbred and outbred individuals, the nature of the interaction between inbreeding depression and environmental stress (Armbruster and Reed 2005) can be elucidated.
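The simplest form of the association between inbreeding level and a fitness trait is a linear regression of the trait on the individual inbreeding coefficient. The Python sketch below fits such a line by ordinary least squares. It is an illustration under stated assumptions, not the analysis used in the cited studies: the function name and the toy values are hypothetical, and a real analysis would also include SNP genotypes and appropriate error models.

```python
from typing import Sequence, Tuple


def ols_slope_intercept(x: Sequence[float],
                        y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: inbreeding coefficients and a relative fitness score
# for four individuals.
inbreeding = [0.0, 0.125, 0.25, 0.5]
fitness = [1.0, 0.9, 0.8, 0.6]

slope, intercept = ols_slope_intercept(inbreeding, fitness)
```

A negative slope indicates that the fitness trait declines with increasing inbreeding, which is the pattern expected under inbreeding depression.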
Although this type of work is in its infancy, the first results underline the need for a conservation genomic approach (Ayroles et al. 2009). Lippman and Zamir (2007) reviewed results that show that inbreeding depression is generally based on the action of several loci but is not associated with genome-wide heterozygosity in regions outside these loci. In a series of microarray experiments with Drosophila melanogaster (Kristensen and Sørensen 2005; Kristensen et al. 2006; Pedersen et al. 2008), it was shown that different populations may have different genetic causes of inbreeding depression. In a series of RNA-seq experiments with the plant species Scabiosa columbaria, it was shown that inbreeding depression in different genotypes may be caused by different genes (Angeloni et al., unpublished research). Despite these differences, both study systems also showed a general response. In D. melanogaster, genes involved in stress responses generally respond to inbreeding (Kristensen et al. 2006). In S. columbaria, the first results indicate that genes involved in energy metabolism respond to inbreeding (F. Angeloni, N. Wagemaker, J. Ouborg, unpublished data). Studies on genetic architecture and mechanisms of important conservation genetic processes like inbreeding depression, using NGS approaches, are just starting to emerge, and many exciting and new results are expected in the near future.
Other examples of the application of RNA-seq include studies on birds and fishes. Ekblom et al. (2010) investigated tissue-specific gene expression patterns in the zebra finch (Taeniopygia guttata). In particular, they examined genes of the major histocompatibility complex (MHC). MHC genes are among the most thoroughly studied examples of adaptive molecular evolution. The authors sequenced and assembled RNA from six different tissues, for a total of 11 793 ESTs. They found evidence for tissue-specific differential expression of 10 different genes related to MHC, primarily in spleen and brain.
Künstner et al. (2010) used RNA-seq for a comparative genomic study of the avian genome. The authors sequenced the brain transcriptome of 10 different nonmodel bird species and identified nearly 6500 genes. Among other results, they found evidence for a higher mutation rate of the Z chromosome when compared to autosomes. Overall, their study demonstrates the usefulness of NGS technologies for comparative genomic analysis for nonmodel species.
Elmer et al. (2010) performed RNA-seq to examine transcriptome differences between ecologically divergent, endemic and sympatric species of cichlid fishes (Amphilophus astorquii and Amphilophus zaliosus). The authors identified six genes showing signals of strong diversifying selection. These genes were involved in biosynthesis, metabolic processes, and development. NGS technologies enabled the authors to infer that natural selection is acting to diversify the genomes of young species, such as cichlids, to a much larger extent than was previously thought.