Correspondence: Elke Nevoigt, School of Engineering and Science, Jacobs University gGmbH, Campus Ring 1, D-28759 Bremen, Germany. Tel.: +49 421 2003541; fax: +49 421 2003249; e-mail: firstname.lastname@example.org
Saccharomyces cerevisiae has become a favorite production organism in industrial biotechnology, presenting new challenges to yeast engineers in terms of introducing advantageous traits such as stress tolerances. Exploring subspecies diversity of S. cerevisiae has identified strains that bear industrially relevant phenotypic traits. Provided that the genetic basis of such phenotypic traits can be identified, inverse engineering allows the targeted modification of production strains. Most phenotypic traits of interest in S. cerevisiae strains are quantitative, meaning that they are controlled by multiple genetic loci referred to as quantitative trait loci (QTL). A straightforward approach to identify the genetic basis of quantitative traits is QTL mapping, which aims at the allocation of the genetic determinants to regions in the genome. The application of high-density oligonucleotide arrays and whole-genome re-sequencing to detect genetic variations between strains has facilitated the detection of large numbers of molecular markers, thus allowing high-resolution QTL mapping over the entire genome. This review focuses on the basic principle and state of the art of QTL mapping in S. cerevisiae. Furthermore, we discuss several approaches developed during the last decade that allow down-scaling of the regions identified by QTL mapping to the gene level. We also emphasize the particular challenges of QTL mapping in nonlaboratory strains of S. cerevisiae.
Saccharomyces cerevisiae has been an important eukaryotic model organism in fundamental research and at the same time a robust and versatile production organism in industrial biotechnology. Pathway-engineered S. cerevisiae strains for the industrial production of ethanol from lignocellulosic biomass, lactic and succinic acid, butanols, isoprenoids, and polyketides have been generated or are under development (Nevoigt, 2008; Krivoruchko et al., 2011). Although industrial production strains have been selected for best performance, there is always room for further improvement, particularly when it comes to tolerance toward the multiple stresses that occur during industrial processes. This has been recently exemplified in the fermentation of lignocellulosic biomass (Almeida et al., 2011) and very-high gravity brewing (Puligundla et al., 2011). Stress tolerance is a multifactorial trait and has therefore been difficult to engineer in a rational way. Therefore, alternative engineering strategies that do not require prior knowledge about the phenotype to genotype relationship such as adaptive evolution/evolutionary engineering (Cakar et al., 2005; Aguilera et al., 2010), global transcription machinery engineering (Alper et al., 2006), and breeding (Benjaphokee et al., 2011) have been applied. These strategies may prove problematic, however, as they may result in the accumulation of disadvantageous mutations because of the highly focused selection pressure applied. Additionally, such strategies do not allow linking of the phenotypic improvement to the underlying molecular and/or genetic basis and therefore do not provide any understanding or information for further strain improvement.
In contrast, inverse engineering is a strategy that seeks to identify the genetic determinants of a phenotypic trait of interest followed by the targeted genetic improvement of an industrial production strain (Bailey et al., 2002). By doing so, this strategy exploits the biodiversity of yeast for strain optimization, taking advantage of the well-documented natural phenotypic variation within the species S. cerevisiae as well as between S. cerevisiae and closely related species (Fay & Benavides, 2005; Carreto et al., 2008; Kvitek et al., 2008; Liti et al., 2009; Schacherer et al., 2009; Csoma et al., 2010). It is obvious that this diversity presents a treasure chest with regard to genetic determinants of industrially relevant phenotypic traits that can be used in inverse engineering of yeast. However, the major challenge in inverse engineering is the identification of the genetic determinants of the phenotypic trait of interest. A mere global molecular analysis of the strains by next-generation sequencing or ‘omics’ technologies can certainly lift a corner of the veil (Blieck et al., 2007; Duong et al., 2011); however, they cannot distinguish between trait-relevant and trait-irrelevant molecular differences. In this regard, the integration of traditional genetic linkage analysis (called genetic mapping) with genomic technologies is a promising avenue to identify causative genetic determinants among the usually high number of genetic variations between different S. cerevisiae strains. Saccharomyces cerevisiae provides an ideal platform for genetic mapping because of the ease by which experimental crosses can be performed and the high recombination rate during meiosis (on average about 90 crossovers per meiosis; Mancera et al., 2008).
This review gives an overview of the state of the art of genetic mapping of quantitative phenotypic traits in S. cerevisiae. We will introduce the basic concept of mapping quantitative trait loci (QTL) and demonstrate how recent advances in global, high-throughput genotyping technologies have strongly facilitated genetic mapping of quantitative phenotypic traits in this organism. Challenges posed to genetic mapping in industrial S. cerevisiae strains will also be discussed.
Confounding factors in the genetic dissection of quantitative traits
The phenotypic variation between strains can be categorized as qualitative or quantitative (Falconer & Mackay, 1996). Qualitative traits are often referred to as Mendelian traits because they are controlled by a single locus that has a discrete effect on the phenotype. Identifying genes underlying qualitative traits has been fairly straightforward through the application of positional cloning (Botstein & Risch, 2003). Most traits, however, are quantitative, meaning that they comprise a continuous distribution of a measurable character. Well-known examples of quantitative traits in S. cerevisiae include varying types of stress tolerance, such as the industrially relevant characteristics of heat tolerance (Parts et al., 2011) and ethanol tolerance (Hu et al., 2007). Quantitative traits are typically controlled by multiple loci referred to as QTL (Abiola et al., 2003). A QTL thus refers to an individual locus that explains a specific part of the phenotypic expression of a quantitative trait. A QTL can contain a single gene or a cluster of closely linked genes that contribute to the quantitative trait (Mackay, 2001). We emphasize here that the distinction between qualitative and quantitative traits is somewhat artificial, as qualitative traits can also be considered as extreme cases of quantitative traits. This implies that the strategies for the genetic dissection of quantitative traits discussed later in this review can also be applied to qualitative traits.
In general, the identification of genes that contribute to quantitative traits has proven difficult (Flint & Mott, 2001). This is because of the complex genetic architecture of quantitative traits, which is dictated by factors such as variable QTL contribution, epistasis, genetic heterogeneity, and gene–environment interaction. With regard to the quantitative contribution of QTL, it has long been thought that quantitative traits are determined by numerous loci each having a small effect on the trait (called minor QTL) (Barton & Turelli, 1989). However, the successful dissection of quantitative traits in various species over the last decade suggests that quantitative traits are instead determined by loci with varying contributions to the trait, of which some may have a large effect (called major QTL) (Risch, 2000). For example, it has been demonstrated in S. cerevisiae that more than 90% of the difference in sporulation efficiency between the high-efficiency strain SK1 and the low-efficiency strain S288c was determined by just three major QTL (Deutschbauer & Davis, 2005). The same number of QTL have been found to explain 94% of the difference in frequency of petite formation between the low-frequency strain RM11-1a and the high-frequency strain BY4716 (Dimitrov et al., 2009). While the identification of major QTL has been straightforward in most published studies, the identification of minor QTL has required more sophisticated strategies and therefore has been a much greater challenge (Brem et al., 2005; Demogines et al., 2008; Sinha et al., 2008; Ehrenreich et al., 2010; Parts et al., 2011).
Apart from the low contribution of some QTL, the genetic dissection of quantitative traits can also be impaired by epistatic effects (Carlborg & Haley, 2004). Epistatic effects are interactions between different QTL, such that the phenotypic expression of the quantitative trait cannot be predicted simply by summing up the effects of the individual QTL. A case of synergistic epistasis has been encountered in the above-mentioned study of Dimitrov et al. (2009). This study identified three alleles from BY4716 (SAL1, CAT5, and MIP1) that contributed to its high petite frequency. When the contribution of each individual BY4716 allele to petite frequency was determined in the BY4716 background (i.e. the two other BY4716 alleles were replaced by their RM11-1a counterparts), the obtained petite frequencies were 24% (SAL1), 10% (CAT5), and 14% (MIP1). These frequencies added up to 48%, which is less than two-thirds of the petite frequency of the BY4716 wild type (79%). An additional factor presenting an obstacle in the study of quantitative traits is genetic heterogeneity, in which different combinations of genetic determinants cause the same, indistinguishable phenotype (Risch, 2000). Finally, gene–environment interactions also influence the genetic study of quantitative traits. In fact, the effect of a genetic determinant on a quantitative trait can differ between environments. One example of gene–environment interaction in S. cerevisiae has been given by Smith & Kruglyak (2008). These authors measured transcript abundances of thousands of genes as quantitative traits in segregants from the cross between RM11-1a and BY4716 grown with either glucose or ethanol as the carbon source, and identified QTL that exhibited different effects on gene expression in the two conditions.
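The arithmetic behind the petite-frequency example can be retraced in a few lines. The following is only a sketch: the percentages are those quoted above from Dimitrov et al. (2009), whereas the variable names and the comparison against a purely additive expectation are illustrative assumptions, and any background petite frequency is ignored.

```python
# Petite frequencies (%) quoted in the text for each BY4716 allele tested
# on its own (values as cited from Dimitrov et al., 2009).
single_allele_petite = {"SAL1": 24, "CAT5": 10, "MIP1": 14}

# Petite frequency (%) of the BY4716 wild type, which carries all three alleles.
observed_all_three = 79

# Illustrative assumption: under a purely additive model the combined effect
# would equal the sum of the individual contributions.
additive_expectation = sum(single_allele_petite.values())

# Share of the observed phenotype accounted for by the additive expectation.
fraction_explained = additive_expectation / observed_all_three

# Gap between observation and additive expectation (percentage points).
interaction_excess = observed_all_three - additive_expectation

print(f"additive expectation: {additive_expectation}%")
print(f"fraction of observed value: {fraction_explained:.2f}")
print(f"gap not explained additively: {interaction_excess} percentage points")
```

The gap between the additive expectation and the observed value is what identifies the interaction as synergistic in this simplified reading.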
The basic principle of QTL mapping
The aforementioned interdependent and complex interactions between QTL make it virtually impossible to identify all relevant QTL when studying each QTL separately. Hence, the identification of QTL demands a method in which they can all be collectively identified. A straightforward method is QTL mapping, which aims at the simultaneous genomic localization of all loci determining a quantitative trait. QTL mapping in S. cerevisiae is typically performed by crossing two strains that differ in the trait of interest (Fig. 1a). In particular, a haploid parental strain possessing the trait (referred to as the trait+ parent) is mated with another haploid parental strain lacking the trait (referred to as the trait− parent). After mating, the diploid hybrid strain is sporulated to yield segregants that are genetically different. This genetic diversity is the result of meiotic recombination at both the chromosomal and intra-chromosomal level (Mancera et al., 2008), and it is associated with variation in the phenotypic expression of the quantitative trait of interest. Segregants with a phenotypic expression comparable to the trait+ parent will be enriched in the genetic determinants crucial for the quantitative trait, while segregants with a lower phenotypic expression will lack some or all determinants. The selection of the former segregants is usually the starting point for QTL mapping as depicted in Fig. 1b and discussed later.
During QTL mapping, the allocation of the genetic determinants to regions in the genome relies on their co-segregation with genetic loci of known positions, which are called genetic markers. The most widely used genetic markers are DNA polymorphisms such as single nucleotide polymorphisms (SNPs), which are usually plentiful in number and thus enable complete genome coverage. These molecular markers are generally presented on a map that shows the order and relative distances between the markers along each chromosome. Before the availability of molecular markers, genetic mapping was solely based on morphological markers (alleles encoding morphological traits) and biochemical markers (alleles encoding variants of enzymes). Even though both types of markers have enabled the mapping of several qualitative traits, they were fairly limited in number, cumbersome to detect, and therefore very inefficient for the mapping of quantitative traits (Tanksley, 1993).
Provided that a minimal number of segregants with a comparable phenotypic expression as the trait+ parent have been selected (Fig. 1b), the unknown positions of the QTL can be inferred from the common presence of genetic markers in these segregants. This is based on the principle of meiotic recombination that is responsible for the relationship between the relative distance between two loci on a chromosome and their tendency to co-segregate in a cross. On the one hand, when the loci are located far away from each other on a single chromosome, there is a large probability that one or more crossovers will occur between them and separate them in the cross. In this case, the recombination frequency between the loci will approach 50%, which is the same value as obtained for two loci located on different chromosomes. On the other hand, when the loci are located close together on a single chromosome, it is very unlikely that a crossover will occur between them and hence the loci will tend to segregate together in the cross. As a consequence, the recombination frequency will approach 0% for loci located very close to one another. This principle of meiotic recombination implies that any enrichment in genetic determinants crucial for the phenotypic trait under study in the selected segregants can be inferred from the enrichment of genetic markers that co-segregate with them (Fig. 1b). The significance of this enrichment must be evaluated by means of statistical analysis.
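The relationship between map distance and recombination frequency described above can be made concrete with a mapping function. The sketch below uses Haldane's mapping function, which is not mentioned in this review and is chosen here only as one standard model; it assumes that crossovers occur independently (no interference). The function name is hypothetical.

```python
import math

def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Expected recombination fraction under Haldane's mapping function.

    Assumes independent crossovers (no interference). The distance is the
    genetic map distance in Morgans (1 Morgan = 100 cM).
    """
    if distance_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))

# Tightly linked loci recombine rarely; distant loci approach 50%.
for d_cm in (0, 1, 10, 50, 100, 300):
    r = haldane_recombination_fraction(d_cm / 100.0)
    print(f"{d_cm:>4} cM -> recombination frequency {100 * r:5.1f}%")
```

Under this model the recombination frequency rises monotonically from 0% and saturates just below 50%, matching the two limiting cases described in the text.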
The QTL mapping approach described in Fig. 1 is known as selective genotyping because it involves the selection of segregants that represent the quantitative extreme of the phenotypic trait under study (i.e. maximal expression of the trait). The strength of this approach is its high mapping power, which means that the chance of detecting a QTL is generally high (Lander & Botstein, 1989). However, there may be circumstances where it is advantageous to use the genotypic and phenotypic information from all segregants independent of their phenotypic expression of the trait, for example when the ratio of extreme to total segregants is relatively low (implying a large number of QTL contributing to the phenotypic trait under study) or when phenotyping is cumbersome to perform. This second QTL mapping approach implies that the genotyped population is partitioned into two groups based on the presence of either the trait+ parent's genotype or the trait− parent's genotype for each genetic marker. A statistical test determines for each marker whether there is a significant difference between the phenotypic distributions of the two genotypic groups (Broman, 2001).
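The second approach, in which all segregants are partitioned by genotype at each marker, can be sketched as follows. This is a minimal illustration on invented toy data; the function names, the data layout, and the choice of Welch's t statistic (rather than any particular test used in the cited studies) are assumptions.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances.

    Each group needs at least two values with non-zero spread overall.
    """
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def marker_scan(genotypes, phenotypes):
    """Return {marker: t statistic} comparing the two genotypic groups.

    genotypes: {marker: list of 1 (trait+ parent allele) or 0 (trait- parent allele)}
    phenotypes: trait values, one per segregant, in the same order.
    """
    scores = {}
    for marker, alleles in genotypes.items():
        plus = [p for a, p in zip(alleles, phenotypes) if a == 1]
        minus = [p for a, p in zip(alleles, phenotypes) if a == 0]
        scores[marker] = welch_t(plus, minus)
    return scores

# Toy data (invented): eight segregants scored at two markers.
phenotypes = [9.1, 8.7, 9.4, 8.9, 5.2, 4.8, 5.5, 5.0]
genotypes = {
    "marker_linked":   [1, 1, 1, 1, 0, 0, 0, 0],
    "marker_unlinked": [1, 0, 1, 0, 1, 0, 1, 0],
}
scores = marker_scan(genotypes, phenotypes)
for marker, t in scores.items():
    print(f"{marker}: t = {t:.2f}")
```

A marker whose genotype groups differ strongly in phenotype yields a large statistic, whereas an unlinked marker yields a value near zero; in practice the statistic would be converted to a significance level and corrected for the number of markers tested.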
In contrast to what is generally assumed, phenotypic variation between the parent strains is not a necessity for QTL mapping. The phenotypic variation must only reside in the population of segregants. In fact, each parent strain may contain several loci of opposite effects on the trait that reduce the phenotypic difference between the parent strains in comparison with the population of segregants in which the loci may have been separated. This phenomenon is called transgressive segregation (Rieseberg et al., 1999), and an extreme case has been encountered in a study mapping the variation in gene expression between a laboratory strain and a wild isolate of S. cerevisiae during exponential growth (Brem et al., 2002). In this study, the transcription level of genes showing differential expression between the two parent strains and/or the segregants from a cross between the parent strains were considered quantitative traits. Linkage analysis between the markers and the transcription levels revealed that 570 transcripts showed linkage with at least one locus, of which almost half were not different in the parent strains. Transgressive segregation has also been encountered in the mapping of variation in morphological traits (Nogami et al., 2007) as well as sensitivity to small-molecule drugs (Perlstein et al., 2007) and DNA-damaging agents (Demogines et al., 2008).
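How loci of opposite effect can hide a trait difference between the parents while still generating variation among segregants can be shown with a small enumeration. The sketch assumes four unlinked loci with equal, purely additive effects; all numbers and names are invented for illustration.

```python
from itertools import product

# Hypothetical additive effects (arbitrary units) of the trait-increasing
# allele at four unlinked loci; all values are invented for illustration.
EFFECTS = (2.0, 2.0, 2.0, 2.0)

# 1 = trait-increasing allele, 0 = trait-decreasing allele.
# Each parent carries increasing alleles at two loci and decreasing
# alleles at the other two, so the opposing effects mask each other.
PARENT_A = (1, 1, 0, 0)
PARENT_B = (0, 0, 1, 1)

def trait_value(genotype):
    """Sum the effects of all trait-increasing alleles (purely additive model)."""
    return sum(effect for effect, allele in zip(EFFECTS, genotype) if allele)

# The parents differ at every locus, so with free recombination a haploid
# segregant can carry either allele at each locus: 2**4 = 16 genotypes.
segregant_values = [trait_value(g) for g in product((0, 1), repeat=len(EFFECTS))]

print("parent A:", trait_value(PARENT_A))
print("parent B:", trait_value(PARENT_B))
print("segregant range:", min(segregant_values), "to", max(segregant_values))
```

In this toy model both parents have the same trait value, yet the segregants span a range that exceeds either parent in both directions.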
An alternative approach to identify genetic determinants of quantitative traits is association mapping, in which the mapping is performed at the population level, across many strains of a species, rather than between two parent strains. Association mapping relies on the assumption that phenotypically relevant sequence variants are likely to be more prevalent in yeast strains that display the trait of interest than in strains that lack the trait. Association mapping has been extensively used in higher organisms in which experimental crosses are difficult or even impossible to perform (Risch, 2000). To our knowledge, there are no published studies describing the genetic dissection of quantitative traits in S. cerevisiae using association mapping. However, Liti et al. (2009) sequenced the complete genomes of 36 S. cerevisiae strains from a large variety of sources and locations and compared the genotypic data with phenotypic data for roughly 200 traits. The results revealed a substantial correlation between genotypic and phenotypic properties within the species S. cerevisiae. Although the authors did not focus on the mapping of the determinants underlying the traits, the results showed that association mapping might be a promising route in S. cerevisiae. The primary advantage of QTL mapping at the population level in comparison with the level of two single parent strains is the easier identification of the genetic determinants. This results from the larger diversity in polymorphisms between distinct yeast isolates as compared to segregants from a single cross (Schacherer et al., 2009). However, this method does not consider epistasis and the fact that different genetic elements and/or different combinations thereof may control the same trait in different strains. 
In addition, the complexity of the QTL defining a specific trait at the population level can be so high that reliable identification of the QTL (particularly minor QTL) usually becomes exceedingly difficult. These three obstacles prove to be the most significant disadvantages of this method.
High-throughput genotyping technologies for high-resolution QTL mapping
High-resolution QTL mapping requires the simultaneous genotyping of thousands of genetic markers. For a long time, it was therefore out of reach because few genetic markers were available and low-throughput genotyping methods were laborious. Progress in whole-genome genetic analysis has since facilitated QTL mapping in terms of genome coverage and mapping resolution (Segrè et al., 2006). Indeed, QTL mapping in S. cerevisiae has received renewed attention since the introduction of rapid and cost-effective methods to analyze genetic variation between strains. In particular, the introduction of high-density oligonucleotide arrays and whole-genome re-sequencing in QTL mapping has allowed the successful genetic dissection of several quantitative phenotypic traits in S. cerevisiae.
High-density oligonucleotide arrays
Winzeler et al. (1998) were the first to describe a successful approach of QTL mapping in S. cerevisiae that made use of a high-throughput genotyping technology. They hypothesized that the genetic variations between two S. cerevisiae strains could be detected by hybridizing their genomic DNA to high-density oligonucleotide arrays. The authors reasoned that the extent of hybridization of a sequence to an oligonucleotide probe present on the array depends on the number and position of mismatches between the two sequences. Thus, the genetic variations between two strains are revealed as hybridization differences on an array (measured as fluorescence intensities), while the actual sequence changes remain unknown. To test the feasibility of their hypothesis, total genomic DNA from two different S. cerevisiae strains (YJM789 and S96) was hybridized on separate Affymetrix gene expression microarrays. Comparison of the hybridization patterns identified 3714 probes that showed significantly different signal strengths between the two strains. The hybridization differences unambiguously distinguished both strains and showed a bimodal distribution across four segregants from one tetrad of the hybrid YJM789/S96. Thus, the hybridization differences could serve as markers in a genetic mapping experiment, and this was experimentally supported by the simultaneous mapping of five Mendelian traits to their known genomic positions with a resolution of 11–64 kb.
In the following years, many quantitative traits in S. cerevisiae were mapped using high-density oligonucleotide arrays, leading to the identification of causative QTL, genes, and even SNPs. Examples include traits such as high-temperature growth (Steinmetz et al., 2002; Sinha et al., 2008), gene expression levels (Brem et al., 2002; Yvert et al., 2003), sporulation efficiency (Deutschbauer & Davis, 2005; Ben-Ari et al., 2006), sensitivity to small-molecule drugs (Perlstein et al., 2007) and DNA-damaging agents (Demogines et al., 2008), acetic acid production (Marullo et al., 2007), cell morphology (Nogami et al., 2007), and frequency of petite formation (Dimitrov et al., 2009). All of these studies utilized commercially available gene expression arrays, which typically contain a few 25-mer oligonucleotide probes for each annotated open reading frame. This implies that these arrays are only able to reveal genetic strain-to-strain variation with regard to protein encoding genes. The marker density generally obtained with such gene expression arrays has been one marker per 3–4 kb (Brem et al., 2002; Steinmetz et al., 2002; Deutschbauer & Davis, 2005). To overcome the limitation of low marker density when using gene expression arrays, Gresham et al. (2006) have introduced yeast tiling microarrays in QTL mapping. Yeast tiling microarrays consist of overlapping 25-mer oligonucleotide probes spaced on average 5 bp apart to provide complete and approximately fivefold redundant coverage of the entire genome. This array design provides several measurements of a given nucleotide's effect on the extent of hybridization, which can subsequently be modeled to derive a statistical measure of the likelihood of a polymorphism at a particular site.
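The fivefold redundancy of the tiling design follows directly from the probe length and spacing given above. The short sketch below counts, for a toy genome, how many 25-mer probes spaced 5 bp apart overlap each position; the function name, the toy genome length, and the regular spacing (the text states 5 bp only as an average) are illustrative assumptions.

```python
def probes_covering(position, probe_length=25, spacing=5, genome_length=1000):
    """Count probes overlapping a 0-based position on a toy linear genome.

    Probes start at 0, spacing, 2*spacing, ... and each spans probe_length
    bases; only probes lying fully within the genome are considered.
    """
    starts = range(0, genome_length - probe_length + 1, spacing)
    return sum(1 for s in starts if s <= position < s + probe_length)

# Interior positions are each interrogated by probe_length / spacing probes.
coverage = [probes_covering(p) for p in range(100, 200)]
print("distinct coverage values in the interior:", sorted(set(coverage)))

# Positions at the very edge of the toy genome are covered by fewer probes.
print("coverage at position 0:", probes_covering(0))
```

Each interior nucleotide is therefore measured at five different offsets within a probe, which is what allows its effect on hybridization to be modeled statistically.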
In all of the above-mentioned studies, commercially available arrays have been used that contain probes designed from the S288c genome sequence (Cherry et al., 1997). S288c is a natural isolate maintained under selective laboratory conditions, which has led to certain evolutionary adaptations (Gu et al., 2005). Thus, it has to be kept in mind that hybridization against probes of this strain might reveal QTL which are simply based on general genetic differences present in the strains under investigation compared with S288c. One example is the identification of MKT1 as a common quantitative trait gene in several QTL mapping experiments performed against the S288c background (Steinmetz et al., 2002; Deutschbauer & Davis, 2005; Demogines et al., 2008; Dimitrov et al., 2009; Kim & Fay, 2009; Ehrenreich et al., 2010). In fact, the same polymorphism in the Mkt1 protein (D30G) was found to be responsible for at least some part of the phenotypic difference between the strains studied by Deutschbauer & Davis (2005) (sporulation efficiency), Sinha et al. (2006) (high-temperature growth), and Dimitrov et al. (2009) (frequency of petite formation). It was proposed that the S288c mutation (Mkt1-D30) is a loss-of-function mutation (Sinha et al., 2006). Moreover, Mkt1-D30 seems to be a rare variant, as Mkt1-30G has been conserved in all non-S288c S. cerevisiae strains investigated (Deutschbauer & Davis, 2005; Sinha et al., 2006; Dimitrov et al., 2009). The pleiotropic effect of Mkt1 on cellular function can most likely be attributed to its recently established regulatory role in global gene expression as proposed by Zhu et al. (2008). These authors combined all available molecular data from segregants of a cross between the laboratory strain BY, an auxotrophic derivative of S288c (Brachmann et al., 1998), and the wild isolate RM11-1a and generated networks of genes that are co-expressed. 
Several of these networks were enriched in genes controlled by common genetic loci [called expression QTL (eQTL) hot spots], and the genetic dissection of one such eQTL hot spot revealed Mkt1 as a major regulator of global gene expression.
The QTL mapping power strongly depends on the number of genotyped segregants. In this context, the application of high-density oligonucleotide arrays for genotyping each individual segregant is laborious and expensive. An elegant solution has been to pool many segregants expressing the trait of interest and genotype the pool as a whole. This approach is known as bulk-segregant analysis and was first developed for genotyping genetic markers in pools of plants (Michelmore et al., 1991). Bulk-segregant analysis typically relies on the construction of two pools of segregants that are referred to as the selected pool and the control pool. The selected pool contains a large number of segregants that express the trait of interest, while the control pool contains a similar number of segregants that were not selected for the trait. Once the pools are made, the genomic DNA from each pool is extracted and genotyped for each marker. Genomic regions in which markers originating from one of the parent strains are overrepresented in the selected pool relative to the control pool are predicted to contribute to the trait of interest. Segrè et al. (2006) introduced the concept of bulk-segregant analysis in S. cerevisiae QTL mapping using high-density oligonucleotide arrays. As a proof-of-principle experiment, they successfully mapped three antibiotic resistance genes (KANR, HYGR, and NATR) in a cross between W303 and SK1 to their known chromosomal positions. The selected pool contained approximately 10⁷ segregants from W303/SK1 that were selected in liquid medium containing all three antibiotics (geneticin, hygromycin, and nourseothricin). The control pool containing the same number of segregants was isolated in rich medium without antibiotics. 
We highlight that in addition to the reduction in time and working costs, an important advantage of bulk-segregant analysis is the increased mapping resolution that can be obtained by examining the numerous recombinations present in a large pool of segregants.
Another method for high-throughput genotyping that has become available for high-resolution QTL mapping is whole-genome re-sequencing. In recent years, there has been a shift away from automated Sanger sequencing toward the development of next-generation sequencing that parallelizes the sequencing process, thereby yielding an enormous volume of inexpensive and accurate genome sequence data (Metzker, 2010). Next-generation sequencing constitutes a number of sequencing technologies that rely on the physical separation of relatively short single DNA molecules on a solid surface or support, allowing thousands to millions of sequencing reactions to be performed simultaneously. This process generates a high number of partially overlapping short sequences referred to as reads. The reads are usually aligned to a known reference sequence to generate the new sequence, a process that is known as re-sequencing. The reads can also be assembled de novo; however, this has proven to be a substantial challenge and usually requires larger reads and read coverage, which is accompanied by higher costs and more intensive bioinformatic input (Metzker, 2010).
The development of next-generation sequencing provides a new avenue to score large numbers of SNPs as markers for QTL mapping. The selection of markers starts by establishing the whole-genome sequence of the two parent strains, which usually provides a large number of SNPs. Only SNPs that can be unambiguously scored can be reliably used as markers for QTL mapping. In practice, such high-quality SNPs are selected by defining a minimal required SNP coverage (i.e. the number of reads covering the nucleotide) and SNP frequency (i.e. the ratio between the number of reads with the specific nucleotide and the total number of reads). These conditions have to be well considered: with more stringent conditions, fewer reliable markers will be available, and thus, the mapping resolution that can be obtained is decreased. Under less stringent conditions, more markers can be selected, which may, however, be less informative.
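The trade-off between stringency and marker number can be illustrated with a simple filter on SNP coverage and SNP frequency as defined above. The sketch uses invented read counts; the class name, the function name, and both threshold values are hypothetical and do not correspond to settings from any cited study.

```python
from dataclasses import dataclass

@dataclass
class SnpCall:
    chrom: str
    pos: int
    alt_reads: int     # reads showing the variant nucleotide
    total_reads: int   # all reads covering the position (SNP coverage)

def select_markers(calls, min_coverage=20, min_frequency=0.8):
    """Keep SNPs that meet both thresholds (threshold values are illustrative)."""
    selected = []
    for call in calls:
        if call.total_reads < min_coverage:
            continue  # too few reads to score the SNP reliably
        if call.alt_reads / call.total_reads < min_frequency:
            continue  # variant nucleotide not supported clearly enough
        selected.append(call)
    return selected

# Invented example calls from a parental genome sequence.
calls = [
    SnpCall("chrI", 1200, 38, 40),    # well covered, clear variant
    SnpCall("chrI", 5400, 9, 10),     # clear variant, but low coverage
    SnpCall("chrII", 880, 22, 40),    # well covered, but ambiguous
    SnpCall("chrII", 9100, 25, 25),   # well covered, clear variant
]

strict = select_markers(calls, min_coverage=20, min_frequency=0.8)
lenient = select_markers(calls, min_coverage=5, min_frequency=0.5)
print(f"strict thresholds keep {len(strict)} markers")
print(f"lenient thresholds keep {len(lenient)} markers")
```

With the stricter settings fewer markers survive, which lowers the achievable resolution; the lenient settings retain more markers at the price of including calls that are less certain.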
Applying whole-genome re-sequencing to a large number of individual segregants remains very expensive, similar to the situation with high-density oligonucleotide arrays used for scoring SNPs in individual segregants. Therefore, QTL mapping using whole-genome re-sequencing has thus far been applied exclusively in combination with bulk-segregant analysis. In practice, the selected pool and the control pool are sequenced and the reads aligned against one of the parental sequences. The frequency of each selected SNP is then plotted against its chromosomal position, and the difference between the selected and control pool is statistically analyzed. Two studies have recently applied this approach to map quantitative traits. Ehrenreich et al. (2010) identified loci that contribute to resistance to 17 diverse chemical agents in a cross between RM11-1a and BY4716, one of them being the DNA-damaging agent 4-nitroquinoline 1-oxide (4-NQO). The authors isolated large pools of resistant segregants by exposing approximately 10⁷ segregants to the different agents. The control pool consisted of segregants that were grown in the absence of any of the agents. The nucleotide frequency of SNPs detected by Illumina whole-genome sequence analysis in the DNA from these pools was subsequently used to map the loci. Parts et al. (2011) used a similar approach to identify loci contributing to high-temperature growth. They crossed an oak tree bark strain that grew well at high temperature (YPS128) with a palm wine strain that grew poorly under identical conditions (DBVPG6044). A population of segregants from the hybrid strain was intercrossed for many generations to accumulate a high number of recombinations. In such a cross, only markers close to the genetic determinants remained linked, which allowed QTL mapping to a higher resolution. This strategy is known as the advanced intercross line approach (Darvasi & Soller, 1995). 
Half of the intercrossed population containing 10⁷–10⁸ segregants was then grown at 40 °C to generate the selected pool, while the other half was grown at a permissive temperature (23 °C) to generate the control pool. Whole-genome re-sequencing of genomic DNA from both pools revealed 21 regions in which the allele frequency in the selected pool was significantly different from that in the control pool.
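The comparison of allele frequencies between a selected and a control pool can be sketched as follows. The read counts are invented, the names are hypothetical, and the two-proportion z statistic is used here only as a simple stand-in; it accounts for read-sampling noise alone and is not the statistical procedure of the studies cited above.

```python
import math

def two_proportion_z(a_sel, n_sel, a_ctl, n_ctl):
    """z statistic for the difference between two pooled allele frequencies.

    a_*: reads carrying the allele of one parent; n_*: total reads at the SNP.
    Undefined (division by zero) if the allele is fixed or absent in both pools.
    """
    p_sel = a_sel / n_sel
    p_ctl = a_ctl / n_ctl
    p_pooled = (a_sel + a_ctl) / (n_sel + n_ctl)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_sel + 1 / n_ctl))
    return (p_sel - p_ctl) / se

# Invented read counts along one chromosome:
# (position, allele reads selected, total selected, allele reads control, total control)
snps = [
    (10_000, 52, 100, 49, 100),
    (20_000, 70, 100, 51, 100),
    (30_000, 93, 100, 50, 100),
    (40_000, 68, 100, 48, 100),
    (50_000, 50, 100, 52, 100),
]

z_by_pos = {pos: two_proportion_z(a, n, b, m) for pos, a, n, b, m in snps}
peak = max(z_by_pos, key=z_by_pos.get)

for pos, z in z_by_pos.items():
    print(f"{pos:>6} bp: z = {z:5.2f}")
print("strongest enrichment at", peak, "bp")
```

In this toy example the allele frequency in the control pool stays near 50% along the chromosome, whereas the selected pool shows a peak that decays with distance from the presumed causal site, which is the pattern expected from linkage.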
Statistical methods in QTL mapping
Any QTL mapping requires statistical inference to evaluate the significance of the obtained data. Such mathematical methods range from simple statistical methods to identify an association between a marker and a phenotypic trait to very complex and computationally intensive algorithms to identify a potential QTL. For a detailed overview, the reader is referred to comprehensive reviews regarding statistical methods in QTL mapping (Broman, 2001; Wu et al., 2007). Here, we will focus only on selected aspects that we find most relevant.
In general, the mapping power is determined by the number of segregants and the number of markers scored. In principle, statistics can help to determine the minimum number of segregants and markers that must be analyzed to obtain the desired mapping power. However, general methods for doing so are, to our knowledge, not currently available in the literature. Nonetheless, it is logical that the mapping power generally increases with the number of analyzed segregants, as exemplified by the following comparison. Demogines et al. (2008) performed linkage analysis on 123 segregants and identified a single locus that contributed to 4-NQO sensitivity. In contrast, Ehrenreich et al. (2010) selected a much larger number of segregants by exposing a pool of approximately 10^7 segregants to the drug. Linkage analysis on the selected pool identified thirteen additional loci with significant linkage to 4-NQO sensitivity.
Statistical methods have been commonly used to establish whether the putative linkage between specific markers and the phenotypic trait under study is significant. In this regard, two general approaches have been developed. Single-marker tests evaluate whether there is a significant difference between the occurrence of an individual marker in the selected segregants and in randomly selected segregants. Examples of single-marker tests are analysis of variance, t-tests, and their nonparametric equivalents (Doerge, 2002). In contrast to single-marker tests, interval mapping allows more sophisticated methods of QTL analysis, which rely on genetic maps (Lander & Botstein, 1989). Interval mapping is more powerful than single-marker approaches to detect a QTL because of the structural and additional genotypic data supplied by the genetic map.
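A single-marker test can be sketched as follows. The example assumes haploid segregants that have each been scored for one marker (allele "A" or "B") and for a numerical phenotype, and it uses a permutation test on the difference in mean phenotype between the two genotype classes, that is, a nonparametric counterpart of the t-test. The function name, the genotype coding, and the default number of permutations are assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def single_marker_permutation_test(
    genotypes: Sequence[str],
    phenotypes: Sequence[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed mean difference A - B, permutation p-value).

    Both alleles must occur at least once among the segregants.
    """

    def mean_diff(labels: Sequence[str]) -> float:
        a = [p for g, p in zip(labels, phenotypes) if g == "A"]
        b = [p for g, p in zip(labels, phenotypes) if g == "B"]
        return mean(a) - mean(b)

    observed = mean_diff(genotypes)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any marker-phenotype association
        # while keeping the number of A and B segregants unchanged.
        rng.shuffle(shuffled)
        if abs(mean_diff(shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up data: segregants with allele A grow better than those with B.
    geno = ["A"] * 10 + ["B"] * 10
    pheno = [1.0 + 0.01 * i for i in range(10)] + [0.2 + 0.01 * i for i in range(10)]
    print(single_marker_permutation_test(geno, pheno))
```

Repeating this test for every marker yields one p-value per marker. Unlike interval mapping, the sketch uses no information about marker order or genetic distance.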
There are also statistical approaches that allow multiple QTL to be located at the same time; however, they are based upon the identification of all single QTL and their potential interactions (Kao et al., 1999) and can include other markers as cofactors (Jansen, 1993; Zeng, 1993). Such multiple QTL models are computationally very demanding, and no single fitted QTL model is superior to the others.
The primary complication with single-marker tests and interval mapping is that they fail to account for multiple testing, consequently increasing the occurrence of false positives. A technique to control the false discovery rate has been proposed (Storey & Tibshirani, 2003). However, this technique must be carefully applied to avoid excluding actual QTL, especially when the power of the QTL mapping experiment is low.
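How a false discovery rate threshold acts on a set of marker-wise p-values can be illustrated with the Benjamini-Hochberg step-up procedure. Note that this is a simpler, related procedure substituted here for illustration; it is not the q-value method of Storey & Tibshirani (2003). The function name and the default threshold are assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag the tests declared significant at the given FDR level.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * fdr
    and declare the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant


if __name__ == "__main__":
    # Made-up p-values for ten markers.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, fdr=0.05))
```

In this made-up example, five markers have p < 0.05, but only the two smallest p-values pass the correction. This illustrates the trade-off noted above: a low-powered experiment may lose genuine QTL once the false discovery rate is controlled.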
Thus far, very few methods are available that have been specifically developed to analyze data obtained from QTL mapping using bulk-segregant analysis based on whole-genome re-sequencing. Such methods take into account the fact that in such analyses, the selected pool does not exclusively contain segregants possessing the trait+ phenotype (as is the case when analyzing a number of individually selected segregants) but is only enriched in those segregants. Moreover, they consider variation introduced during the sequencing of pools. Ehrenreich et al. (2010) have described a method that combines standard paired t-tests (a single-marker test) with a regression-based peak-finding approach to detect potential QTL. Other methods have been suggested by Nikolaev et al. (2009), Li et al. (2009), and Magwene et al. (2011).
Dissection of a QTL to the gene level
QTL mapping is a genuine genetic mapping method, meaning that it can only allocate the genetic determinants to intervals in the genome instead of identifying the determinants themselves. Therefore, once the QTL have been identified, they still must be down-scaled to the gene and/or nucleotide level by a combination of traditional methods such as sequence analysis, candidate gene prediction, and functional complementation (Abiola et al., 2003).
Depending on the number of markers and segregants analyzed, a single QTL may range from a few to several hundred kilobases in size. This means that the number of genes localized within a QTL is often too high to allow the functional evaluation of each individual gene. Therefore, a QTL may first be reduced in size by an approach known as fine-mapping, which relies on the analysis of additional markers and/or recombination events within the respective QTL (Steinmetz et al., 2002; Deutschbauer & Davis, 2005; Marullo et al., 2007). As discussed before, not all genetic variations between two strains can be detected or unambiguously scored by high-throughput genotyping technologies and thus selected as markers. In the case of gene-expression microarrays, only genetic variations in protein-coding genes that result in significant hybridization differences can be selected as markers. This significance in hybridization difference is also the prime consideration when applying yeast tiling microarrays. In the case of whole-genome re-sequencing, only genetic variations with a minimum coverage and ratio in the short reads are considered reliable and can be selected as markers. In most QTL, it is therefore possible to identify additional markers by performing Sanger sequencing of the obtained QTL interval in both parent strains. Establishing linkage between the additional markers and the phenotype under study may reveal markers with a stronger linkage and thus reduce the QTL size. Fine-mapping of a QTL may also rely on additional recombination events within the QTL interval. In fact, a recombination event in specific segregants may separate the unknown genetic determinant from closely linked markers, thus allowing further reduction in the QTL size. The latter approach usually requires scoring of a large number of segregants, as genetic markers in one QTL are by nature in very close proximity and thus usually co-segregate.
Only the segregants with a recombination event within the QTL under study are informative for fine-mapping. Therefore, to avoid exhaustive phenotyping, it is advisable to first genotype a large number of segregants for markers flanking the QTL and subsequently phenotype only those that show a recombination event between these markers (Ronin et al., 2003).
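The genotype-first strategy described above can be sketched in a few lines. The example assumes haploid segregants scored at one marker on each side of the QTL, with each genotype recorded as the parent of origin ("P1" or "P2"); the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def recombinant_segregants(flank_genotypes: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the segregants whose two flanking markers differ in parental origin.

    A difference implies an odd number of crossovers between the flanking
    markers. Segregants with a double crossover inside the interval have
    identical flanking genotypes and are therefore missed by this filter.
    """
    return [
        name
        for name, (left, right) in flank_genotypes.items()
        if left != right
    ]


if __name__ == "__main__":
    # Made-up genotypes at the (left, right) flanking markers.
    genotypes = {
        "seg001": ("P1", "P1"),
        "seg002": ("P1", "P2"),  # recombinant
        "seg003": ("P2", "P2"),
        "seg004": ("P2", "P1"),  # recombinant
    }
    print(recombinant_segregants(genotypes))
```

Only the segregants returned by such a filter would then be phenotyped and genotyped at additional markers inside the interval.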
Although fine-mapping reduces the size of a QTL, it usually does not result in a QTL interval small enough to contain a single gene. Thus, additional steps are required to identify the causative gene variant(s) (Glazier et al., 2002; Abiola et al., 2003). Most studies continue with prioritizing the genes present in the identified QTL (Tabor et al., 2002). The logical first step is to look for open reading frames that show polymorphisms likely to have a consequence on the amino acid sequence of the gene product, such as missense, nonsense, and frameshift mutations. Further prioritization can be based on published information about genes involved in the quantitative trait, for example genes located in biochemical or regulatory pathways known to be involved in establishing the trait, or on studies that have identified proteins that interact with components of these pathways. Although potentially straightforward, such a prioritization does not consider the fact that polymorphisms in promoters might also have strong phenotypic consequences. In this context, useful information for prioritization might result from gene expression analysis or from phenotyping the corresponding deletion mutant. Rational evaluation of such available information might help to identify the crucial genetic determinants within a QTL; however, previous studies have shown it to be often insufficient (Deutschbauer & Davis, 2005; Sinha et al., 2006).
Unbiased approaches form an alternative to rational prioritization. Such approaches are advantageous in that they do not rely on prior assumptions. One unbiased approach that has been used in the genetic dissection of QTL is marker-trait association, which resembles the above-mentioned association mapping. The main difference between the two approaches is that marker-trait association is performed at the QTL level, while association mapping is performed at the whole-genome level. Marker-trait association has the potential to directly identify the relevant sequence variant(s); however, in practice, it has had little success because of the already mentioned drawbacks of association mapping (Steinmetz et al., 2002; Deutschbauer & Davis, 2005; Sinha et al., 2006).
Another unbiased approach is reciprocal hemizygosity analysis (RHA) (Steinmetz et al., 2002). This functional analysis approach evaluates all genes in a QTL for relevance in establishing the trait of interest. It is based on the construction of two strains in the hybrid diploid background derived from both parent strains; the two strains are isogenic except that each retains only one parental allele of a specific candidate gene (Fig. 2). Hence, one strain carries the allele from the trait+ parent and is deleted for the allele from the trait− parent, while the other strain carries the allele from the trait− parent and is deleted for the allele from the trait+ parent. Comparing the phenotypes of the two strains reveals whether the allele from one genetic background is advantageous over that from the other. As RHA analyzes the contribution of each allele in the hybrid diploid background, it takes into account the possible requirement for interactions with other mutant alleles from the parental backgrounds to confer the phenotype under study. Nevertheless, the diploid hybrid background used in RHA is different from the haploid backgrounds used in the QTL mapping experiment, which means that the possible influence of the ploidy of the strains on the phenotypic expression of the trait cannot be disregarded.
The most conclusive evidence for a gene's contribution to the trait under study comes from allelic replacement in both parent strains. One or more parental alleles are replaced with the allele(s) from the other parent strain, and the impact on the phenotype is tested. As the phenotypic expression of an allele from one parent may depend on its interaction with other alleles from the same background, the transfer of a single causative allele into the other parent's genetic background does not necessarily result in the desired outcome. Hence, the best solution is to express all the causative alleles from one parent in the other parent's background.
QTL mapping in industrial and natural S. cerevisiae strains
Quantitative traits of S. cerevisiae strains have great importance for industrial use. Those of high economic value include fermentation capacity, stress tolerance, flocculation, substrate range, and yield of products or lucrative compounds. Targeted genetic modification by means of inverse engineering offers the best potential for reliable and predictable improvement of such traits. However, this requires knowledge about the genetic basis of these traits. It has now been convincingly shown that QTL mapping provides a straightforward approach to identify genetic determinants of quantitative traits. However, virtually all published QTL mapping studies in S. cerevisiae have been carried out in laboratory strains, with the exception of one study mapping acetic acid production in a cross between two commercial wine strains (Marullo et al., 2007) and another study mapping several sake brewing characteristics in a cross between a commercial sake brewing strain and a laboratory strain (Katou et al., 2009).
Industrial strains or natural isolates pose a much greater challenge than laboratory strains for QTL mapping. Industrial strains have been reported to show a more complex genome, often being diploid, polyploid, or even aneuploid (Benitez et al., 1996). In this regard, the genetic complexity of industrial strains might strongly hamper QTL mapping as long as it is difficult to obtain a true and stable haploid descendant that still shows the extreme expression of the quantitative trait of interest of its parent and can serve as the trait+ strain in QTL mapping. In fact, if chromosome copy number contributes strongly to the quantitative expression of the trait, QTL mapping becomes significantly more difficult. In addition, aneuploidy results in a deviation from 2:2 segregation for polymorphisms located on multiplied chromosomes, making them unsuitable as genetic markers in QTL mapping. The multiplied chromosomes have therefore been excluded from interrogation in the QTL mapping studies in industrial strains (Marullo et al., 2007; Katou et al., 2009).
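Whether a candidate marker deviates from the expected segregation can be checked with a simple goodness-of-fit test. The sketch below assumes a set of unselected haploid segregants in which 2:2 segregation per tetrad corresponds to an expected 1:1 ratio of the two parental alleles, and it applies a chi-square test with one degree of freedom. The function name is an assumption, and this check is an illustration rather than the procedure used in the cited studies.

```python
import math
from typing import Tuple


def segregation_chi_square(n_allele_a: int, n_allele_b: int) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for deviation from a 1:1 ratio."""
    n = n_allele_a + n_allele_b
    expected = n / 2.0
    chi2 = ((n_allele_a - expected) ** 2 + (n_allele_b - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts of segregants carrying each parental allele.
    print(segregation_chi_square(52, 48))  # close to the expected 1:1 ratio
    print(segregation_chi_square(75, 25))  # strongly distorted ratio
```

Markers with strongly distorted ratios in unselected segregants, for example those located on a chromosome present in extra copies, could be flagged in this way before linkage analysis.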
Another complication for QTL mapping in industrial strains and natural isolates of S. cerevisiae is the likely presence of sequences that are not present in the reference sequence (most often S288c) used for designing the probes on arrays or aligning the short reads obtained by next-generation sequencing. For example, molecular karyotype analysis of an industrial bio-ethanol production strain has revealed additional chromosomal regions within subtelomeric regions that were not present in S288c (Argueso et al., 2009). These regions even differ between pairs of homologous chromosomes. Generally, these regions do not contain genes that are essential for viability but may contribute to fitness in highly specific environments, such as those used in industry. It is obvious that the identification of genetic determinants localized in genetic regions unique to an industrial strain (e.g. specific chromosomal insertions) and (partially) responsible for the phenotype of interest requires additional efforts. One solution would be to perform de novo assembly of the reads obtained by whole-genome re-sequencing that do not map to the reference sequence. The selected segregants must then be evaluated for the presence of the obtained contigs, and linkage analysis must be performed to confirm that the contigs are significantly linked with the phenotypic trait of interest (Wenger et al., 2010).
Another common feature of industrial strains and natural isolates is their strong variation in sporulation efficiency and spore viability, or their tendency to be homothallic, that is, to switch mating type spontaneously, which prevents the generation of stable haploid strains (Tamai et al., 2001). The latter limitation can be overcome by converting homothallic strains into heterothallic strains by deleting all HO gene copies present (Marullo et al., 2007).
Another issue that poses a challenge to QTL mapping in nonlaboratory S. cerevisiae strains is that most industrially relevant traits, such as fermentation capacity or the heterologous production of a valuable compound, are not selectable and require a thorough phenotypic evaluation of each segregant individually. Usually, such phenotypic screenings are laborious and time consuming. Thus, the selection of a large number of segregants with the phenotype of interest is much more laborious for most industrially relevant traits. This also implies that bulk-segregant analyses as performed in the recent studies of Ehrenreich et al. (2010) and Parts et al. (2011) are not directly applicable for most industrially relevant traits.
Conclusion and outlook
Facilitated by the rapid progress in high-throughput genotyping, QTL mapping in S. cerevisiae has become well established in the past 10 years. There have been a number of studies demonstrating efficient QTL mapping in this yeast, as well as the further down-scaling of QTL to the relevant genes and nucleotides. So far, the strategies have been developed and evaluated mostly in laboratory strains and for easily selectable or scorable phenotypic traits. Nevertheless, there has also been one published study that used QTL mapping to resolve an industrially relevant phenotypic trait (i.e. the production of acetic acid, which is relevant in wine making) down to the gene level (Marullo et al., 2007). Another QTL mapping study of an industrial S. cerevisiae strain, focusing on a high ethanol tolerance phenotype, will be provided by the authors of this review (Swinnen & Thevelein) in the near future. We suggest that QTL mapping could become a valuable and powerful tool for S. cerevisiae industrial strain improvement programs in the context of inverse engineering. In fact, this approach is able to link interesting phenotypic traits to genetic determinants and thereby allows the exploitation of the high phenotypic diversity of the species S. cerevisiae for the optimization of industrial strains. This requires that the challenges of QTL mapping in industrial strains, which often possess a much more complex genetic constitution and whose industrially relevant phenotypic traits are usually nonselectable, be overcome.
We thank Jürgen Claesen for helpful discussions about statistical methods in QTL mapping and Thomas R. Beatman and Joseph N. McInnes for English proofreading of the manuscript. Original research was supported by an SBO grant (IWT 90043) from the Agency for Innovation by Science and Technology (IWT-Flanders), the EC 7th Framework program (NEMO project), IOF-Knowledge platform (IKP/10/002 ZKC 1836), and BOF-Program financing (project NATAR) to J.M.T.