Genome-wide mapping of cellular traits using yeast

Authors

  • Leopold Parts

    Corresponding author
    1. Department of Molecular Genetics, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
    • Correspondence to: L. Parts, Department of Molecular Genetics, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto M5S3E1, Canada.

      E-mail: leopold.parts@utoronto.ca


Abstract

Yeast has long enjoyed superiority as a genetic model because of its short generation time and ease of generating alleles for genetic analysis. However, recent developments of guided nucleases for genome editing in higher eukaryotes, and funding pressures for translational findings, force all model organism communities to reaffirm and rearticulate the advantages of their chosen creature. Here I examine the utility of budding yeast for understanding the genetic basis of cellular traits, using natural variation as well as classical genetic perturbations, and its future prospects compared to undertaking the work in human cell lines. Will yeast remain central, or will it join the likes of phage as an early model that is no longer widely used to answer the pressing questions? Copyright © 2014 John Wiley & Sons, Ltd.

Introduction

Much of what we know about the biology of the cell was first described in the yeast model. Many insights into the utility of genome-wide mapping methods, the genetic architecture of traits, and the extent of gene-environment interactions have come from the same system. Here, I will discuss approaches for mapping cellular traits in yeast. First, I consider the different mapping methods that can be used, the choice of which is largely determined by the types of genetic variation in the mapping population. Then, I look at the types of molecular and cellular traits that can be comprehensively assayed across the genome or a class of molecules. Most favourite phenotypes can be measured in individual strains and mapped using one of the methods, so I will not attempt to cover all such designs.

Mapping approaches

To map a trait to a locus in the genome, the locus must have more than one allele segregating in the mapping population. The three available types of genetic variation lead to different designs of mapping experiments.

Standing variation in the species

The most natural targets for mapping are the alleles that have arisen over millions of years of evolution, and underlie the wide range of phenotypes in diverse wild yeasts. Such variants can be found by observing linkage of a parental allele to the parental phenotype in the progeny of a cross, as linkage studies of Mendelian traits have done in humans. Baker's yeast and its relatives are excellent for crossing, since their small genome size, short generation time and high recombination rate allow very large and diverse populations to be created, genotyped cheaply and mapped to narrow regions. As of now, tens of wild strains and families of hundreds of strains have been sequenced, nearing a total of 100 full genomes of clinical, geographical and human-associated isolates (Bergstrom et al., 2014; Liti et al., 2009a; Skelly et al., 2013), as well as 2000 segregant genomes from four crosses (Bloom et al., 2013; Cubillos et al., 2013; Illingworth et al., 2013; Wilkening et al., 2013b). The short generation time allows advanced crossing designs, where the parental haplotype blocks are further fragmented (Liti and Louis, 2012) to improve resolution. Few commonly used models can reshuffle alleles from two or more parents to this extent, and assess the effect of a variant in up to millions of randomized backgrounds. In a two-parent cross, all alleles are expected to be at 50% frequency in the offspring, so effects of alleles that are rare in the species can be ascertained.
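
To make the linkage idea concrete, the following is a minimal sketch rather than any published pipeline. It simulates haploid segregants of a two-parent cross, with every marker at 50% frequency, and scores each marker by a LOD value computed from the correlation between genotype and phenotype. The function names, population size, effect size and the treatment of markers as unlinked are all assumptions made for illustration.

```python
import math
import random

def simulate_cross(n_segregants, n_markers, causal, effect, seed=0):
    """Simulate haploid segregants. Each marker comes from parent 0 or 1 with
    probability 0.5; markers are treated as unlinked (a simplification)."""
    rng = random.Random(seed)
    genotypes = [[rng.randint(0, 1) for _ in range(n_markers)]
                 for _ in range(n_segregants)]
    # Phenotype = effect of the causal allele + unit-variance noise (assumed model).
    phenotypes = [effect * g[causal] + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, phenotypes

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lod_scores(genotypes, phenotypes):
    """Single-marker regression LOD: -(n/2) * log10(1 - r^2) for each marker."""
    n = len(phenotypes)
    scores = []
    for m in range(len(genotypes[0])):
        r = pearson_r([g[m] for g in genotypes], phenotypes)
        scores.append(-(n / 2) * math.log10(1 - r * r))
    return scores

genotypes, phenotypes = simulate_cross(n_segregants=500, n_markers=50,
                                       causal=10, effect=1.0)
lods = lod_scores(genotypes, phenotypes)
best = max(range(len(lods)), key=lods.__getitem__)
print("top marker:", best, "LOD:", round(lods[best], 1))
```

In a real cross, neighbouring markers are linked, so the LOD profile forms a peak whose width shrinks as additional rounds of crossing fragment the parental haplotype blocks.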

For association mapping, it is assumed that locus genotypes are observed in random genetic backgrounds of unrelated individuals. The effect of an allele is then estimated as the difference in the average trait value between individuals with and without it, and thousands of human polymorphisms have been implicated in disease by performing such association tests for all common variants in the genome. Population structure due to ancient and recent crossing events results in violation of the assumption of random genetic backgrounds, as individuals that have shared history also share genotype at many physically unlinked loci. Unfortunately, genomes of many of the sequenced wild yeast strains have complex mosaic structures, indicating that the association model cannot be directly applied (Connelly and Akey, 2012; Liti et al., 2009a). Statistical methods can limit false-positive findings due to such confounding (Connelly and Akey, 2012; Diao and Chen, 2012; Listgarten et al., 2012) but cannot make the underlying problem disappear. Due to population history, only large-effect alleles present in exactly one of the sequenced wild strains (Bergstrom et al., 2014; Liti et al., 2009a; Schacherer et al., 2009) can be identified, such as copy number changes of transporters relevant for stress response (Warringer et al., 2011) or cis-regulatory effects on expression levels (Skelly et al., 2013). However, the association model assumptions may be satisfied in a randomly mating subpopulation (e.g. wine strains), or other yeast species with different population histories, such as Saccharomyces paradoxus (Liti et al., 2009a) or Schizosaccharomyces pombe (Daniel Jeffares, Jurg Bahler, personal communication).
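
The confounding described above can be illustrated with a toy calculation. This is a sketch under stated assumptions: the data are invented, and the stratified estimate is a deliberately simple stand-in for the statistical corrections cited, not a description of those methods. The trait depends only on subpopulation membership, yet the naive difference in means assigns an effect to an allele that is merely more common in one subpopulation.

```python
from statistics import mean

def allele_effect(genotypes, traits):
    """Difference in mean trait between carriers (1) and non-carriers (0)."""
    carriers = [t for g, t in zip(genotypes, traits) if g == 1]
    others = [t for g, t in zip(genotypes, traits) if g == 0]
    return mean(carriers) - mean(others)

def stratified_effect(genotypes, traits, groups):
    """Size-weighted average of within-group effects.
    Groups lacking one of the two alleles are skipped; at least one group
    must carry both alleles."""
    total, weighted = 0, 0.0
    for grp in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == grp]
        g = [genotypes[i] for i in idx]
        t = [traits[i] for i in idx]
        if 0 < sum(g) < len(g):
            weighted += len(idx) * allele_effect(g, t)
            total += len(idx)
    return weighted / total

# Hypothetical data: trait is 1.0 in subpopulation A and 3.0 in B,
# regardless of the allele; the allele is rare in A and common in B.
groups = ["A"] * 4 + ["B"] * 4
genotypes = [1, 0, 0, 0, 1, 1, 1, 0]
traits = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]

naive = allele_effect(genotypes, traits)
adjusted = stratified_effect(genotypes, traits, groups)
print("naive effect:", naive, "within-group effect:", adjusted)
```

The naive estimate is non-zero even though the allele does nothing, whereas the within-group estimate is zero. With mosaic genomes the relevant groups are not known in advance, which is why the correction is harder in practice than in this example.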

An alternative to obtaining a genotyped and phenotyped set of individuals for the correlation-based methods mentioned above is to select for a specific trait from a genetically diverse population, and find alleles that are enriched in the trait carriers (Brauer et al., 2006; Segre et al., 2006). The allele frequencies can be measured in bulk from the entire selected population, saving time and reagent costs compared to individual genotyping, and then compared to an appropriate null expectation. If the selection for phenotype can be applied over many generations, alleles with small effect will rise in frequency, improving the sensitivity to detect them. When bulk phenotyping and genotyping are available, it is a fast and powerful way of mapping the genetic basis of a trait, especially when combined with many generations of selection to increase power (Parts et al., 2011), and to quantify the relative fitness of individual alleles (Illingworth et al., 2012). This approach is easiest to apply in a two-parent cross, where all alleles are at around 50% initial frequency, and is otherwise conceptually similar to pooled screens of mutant strains, where frequencies of strains are quantified instead of alleles.
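
The gain from prolonged selection can be sketched numerically. The example below assumes a deterministic haploid selection model, with no drift or linkage, and binomial read sampling; the selection coefficient, read depth and function names are illustrative assumptions, not values from the cited studies.

```python
import math

def frequency_after_selection(p0, s, generations):
    """Haploid selection: the odds p/(1-p) grow by a factor (1+s) per generation."""
    odds = p0 / (1 - p0) * (1 + s) ** generations
    return odds / (1 + odds)

def enrichment_z(alt_reads, total_reads, p_null=0.5):
    """Normal-approximation z-score of an observed allele read count
    against a null allele frequency (0.5 in a two-parent cross)."""
    expected = total_reads * p_null
    sd = math.sqrt(total_reads * p_null * (1 - p_null))
    return (alt_reads - expected) / sd

# An allele with a 1% fitness advantage, sequenced at 1000 reads per site.
for gens in (1, 10, 50):
    p = frequency_after_selection(0.5, 0.01, gens)
    z = enrichment_z(round(p * 1000), 1000)
    print(gens, "generations: frequency", round(p, 3), "z", round(z, 2))
```

Under these assumptions, a 1% advantage is indistinguishable from sampling noise after one generation but produces a clear shift after tens of generations, which is the intuition behind combining bulk genotyping with long selection.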

Each of these approaches is, in principle, available in all eukaryotes. The resolution is highest with the intercross QTL (iQTL) design of multiple rounds of crossing followed by bulk selection and genotyping (Liti and Louis, 2012), resulting in mapped regions of < 10 kb that include only a few genes (Parts et al., 2011). This design also has high power (Ehrenreich et al., 2010) and can be applied to populations of millions of yeast segregants. The ability to detect non-zero effects using linkage and association is very roughly proportional to the square root of the number of individuals. For most yeast growth traits examined so far, theory and experiments have shown that 800–2000 F1 segregants are enough to explain > 80% of narrow sense heritability in an average trait (Bloom et al., 2013), and I expect these genotyped strains will be much more deeply phenotyped to characterize the genetic architecture of many additional phenotypes. In humans, linkage mapping is limited by family sizes, and association mapping using thousands of individuals is the most commonly used design. While pooled selection over many generations is not feasible in humans, the very large collections of genotyped individuals, such as hundreds of thousands amassed by public consortia and private companies, will have power to detect common alleles of very small effect sizes. However, cellular phenotyping of multiple cell types is usually not feasible retrospectively for these large cohorts; the true environmental influence is unknowable, and alleles for focused follow-up studies are still substantially easier to create in yeast.
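
The square-root scaling mentioned above follows from the standard error of a difference in means. The sketch below shows this under simple assumptions: a single biallelic locus, equal trait variance in both genotype classes, and no other loci or confounders. The function and parameter names are hypothetical.

```python
import math

def expected_z(effect, sd, n, freq=0.5):
    """Expected z-score for the mean trait difference between n*freq carriers
    and n*(1-freq) non-carriers, each group having trait standard deviation sd."""
    se = sd * math.sqrt(1 / (n * freq) + 1 / (n * (1 - freq)))
    return effect / se

def n_required(effect, sd, z_target, freq=0.5):
    """Smallest number of individuals whose expected z-score reaches z_target."""
    return math.ceil((z_target * sd / effect) ** 2 / (freq * (1 - freq)))

# Quadrupling the population doubles the expected z-score.
print(expected_z(0.25, 1.0, 256), expected_z(0.25, 1.0, 1024))

# The same effect at 5% allele frequency needs far more individuals than at 50%.
print(n_required(0.25, 1.0, 4.0, freq=0.5), n_required(0.25, 1.0, 4.0, freq=0.05))
```

The `freq` argument also shows why a two-parent cross is efficient: the product freq × (1 − freq) is largest at 50%, so an allele that is rare in the species is much easier to detect once it segregates at intermediate frequency.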

Genome-scale collections of designed alleles

Instead of relying on nature to provide, we can generate a set of alleles that span the genome and associate them with changes in phenotype. The most widely used genome-wide collection comprises replacements of each of the ~5000 non-essential yeast ORFs by a barcoded selection marker (Giaever et al., 2002). Similar reagents exist in fission yeast (Kim et al., 2010) and the Sigma genetic background of baker's yeast (Dowell et al., 2010). To measure trait values for the alleles, the strains can be grown on agar plates to quantify colony properties from images (Wagih and Parts, 2014; Wagih et al., 2013), or together in a pool under selection to quantify fitness from barcode frequencies by microarrays or sequencing. The difference in effects of two wild-type alleles can be measured by contrasting their reciprocal deletions in a diploid hybrid background – reciprocal hemizygosity scanning (Steinmetz et al., 2002; Wilkening et al., 2013a). The Synthetic Genetic Array (SGA) method (Tong and Boone, 2006) can be used to cross multiple alleles into one background, which enables introducing and phenotyping reporters in thousands of strains. Collections of temperature-sensitive versions of essential genes (Ben-Aroya et al., 2008; Li et al., 2011) or ORF overexpression from a plasmid (Sopko et al., 2006) can further be used to establish effects of other types of alleles.
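
Quantifying fitness from barcode frequencies in a pool can be sketched as a log ratio of normalized counts before and after selection. This is a simplified illustration, not the analysis used in the cited studies; the pseudocount, the barcode names and the read counts are assumptions.

```python
import math

def barcode_log2_fold_change(before, after, pseudocount=0.5):
    """Per-strain log2 change in barcode frequency between two pooled samples.
    `before` and `after` map barcode -> read count; the pseudocount avoids
    taking the logarithm of zero for strains that drop out."""
    barcodes = sorted(set(before) | set(after))
    total_before = sum(before.get(b, 0) + pseudocount for b in barcodes)
    total_after = sum(after.get(b, 0) + pseudocount for b in barcodes)
    result = {}
    for b in barcodes:
        f0 = (before.get(b, 0) + pseudocount) / total_before
        f1 = (after.get(b, 0) + pseudocount) / total_after
        result[b] = math.log2(f1 / f0)
    return result

# Hypothetical read counts for three deletion strains.
before = {"del_A": 1000, "del_B": 1000, "del_C": 1000}
after = {"del_A": 1500, "del_B": 1400, "del_C": 100}
for barcode, lfc in barcode_log2_fold_change(before, after).items():
    print(barcode, round(lfc, 2))
```

Because frequencies are relative, a strongly depleted strain raises the apparent frequency of all others, so the values are best read as fitness relative to the pool rather than as absolute growth rates.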

The appeal of these collections is the clean separation of each allele from the genetic background, which makes the observed results easy to interpret. In contrast, approaches using natural variation require statistical models to distinguish the effect of an individual allele from that of the rest of the genome. However, the precise design is also a limitation, as the effects are specific to the generated allele and the genetic background it is introduced into. In general, it is difficult to use this design to find joint effects of multiple alleles in an unbiased manner; measuring the phenotype of all yeast double deletions is already a multi-year endeavour (Costanzo et al., 2010). It is also not straightforward to query the effect of all knockouts in a new genetic background, as it requires generating a new genome-wide resource.

For systematic mapping of fitness in an environment to gene knockouts or overexpression in eukaryotes, there is currently no parallel to yeast. However, this advantage may be rapidly eroding, as the CRISPR/Cas9 system is used to generate null alleles genome-wide in multiple human backgrounds (Shalem et al., 2013; Wang et al., 2013b) and applied in model organisms (Kondo and Ueda, 2013; Shan et al., 2013; Tzur et al., 2013; Wang et al., 2013a). Nevertheless, as the same machinery is also available in yeast (DiCarlo et al., 2013), it will serve as a rapid and cheap model for testing these new approaches. Moreover, it is not certain that essential gene function can be tested with the same strategy as yeast temperature-sensitive alleles, although knockdown approaches and conditional knockouts are available.

Randomly generated alleles

Instead of using historical mutations or products of manual labour, alleles can be generated by random mutagenesis. The per-generation mutation rate in yeast is low (Lang and Murray, 2008) but can be increased by radiation, chemicals or using mutator lines. Alternatively, transposon mutagenesis can induce random disruptions in the genome that leave a characteristic sequence signature (Gabriel et al., 2006). The insertions in single cells can be easily localized in pools by isolating total DNA, enriching for transposon sequence and sequencing the breakpoint in the genome (Langridge et al., 2009). There are potential biases due to the non-random insertion profile of the standard Ty enzyme (Kumar et al., 2004), and the possibility of multiple jumps of the transposon, but, provided these problems are controlled, large populations of random knockout alleles can rapidly be generated in new genetic backgrounds.

If mutagenesis is saturating and all genes have several null alleles in the population, genes essential for fitness can be identified by the lack of mutations in them, similarly to pooled barcode screens of deletion alleles. However, this approach requires either resequencing mutated lines from standard mutagenesis, or genotyping only transposon insertion sites, as the frequency of any single allele in the population is low. Alleles that increase fitness can be found by monitoring their spread under selection (Burke et al., 2010; Dunn et al., 2013; Lang et al., 2013; Segre et al., 2006). As fitness can be engineered to be the readout of many cellular features, evolve-and-sequence studies in yeast should generate many previously unobserved yeast phenotypes, as well as novel genetic contributors to basic biological processes.
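
Whether a gene without insertions is informative depends on how many insertions it would receive by chance. A rough sketch follows, assuming insertions land uniformly at random across the genome. Real insertion profiles are biased, so this is an idealization, and the gene length, library size and genome size of roughly 12 Mb are illustrative assumptions.

```python
import math

def prob_no_insertion(gene_length_bp, n_insertions, genome_size_bp):
    """Probability that a gene receives no insertion by chance, assuming
    insertions land uniformly at random (Poisson approximation)."""
    expected = n_insertions * gene_length_bp / genome_size_bp
    return math.exp(-expected)

genome = 12_000_000   # approximate genome size in bp (assumption)
gene = 1_200          # hypothetical gene length in bp

for library_size in (10_000, 100_000):
    p = prob_no_insertion(gene, library_size, genome)
    print(library_size, "insertions: P(no hits by chance) =", f"{p:.2e}")
```

With a small library, an empty gene is unremarkable, while with a saturating library the absence of insertions becomes strong evidence that disruptions are not tolerated. Short genes and insertion cold spots weaken this inference.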

Waiting for new beneficial alleles to map causal genes is more difficult in higher eukaryotes, where generation times are substantially longer and genomes larger. Mutagenesis and transposon insertion approaches can be useful, but require precise control of the mutation rate and mutagenesis process to make sure that global allele characterizations are feasible and reliable. The main advantage of yeast is the option of doing the screens in a haploid background, such that otherwise recessive mutations can have an effect. Until very recently, only dominant screens were available in human lines, although recessive ones can be performed in aneuploid cancer cell lines (Carette et al., 2009), haploid embryonic stem cells (Pettitt et al., 2013) or using genome-wide gene-trapping methods (von Melchner and Ruley, 1989). For now, new random alleles that give rise to adaptive phenotypes are best assessed in a haploid background.

Assays of cellular traits

The scientific curiosity to understand how things work has produced diverse means of quantifying aspects of the cell. Below, I look at the cellular traits ordered by their distance from the genetic information, starting with DNA-based ones, such as transcription factor binding, and ending with fitness, the ultimate phenotypic readout for yeast. The relevant criterion is whether they can be measured in at least medium throughput to enable linkage and association, or selected for bulk phenotyping and artificial evolution. Another important factor is how much of the knowledge gained can be generalized to other organisms. As most basic biological processes can be engineered to have a reporter that enables screening for genes required for wild-type behaviour, I will not attempt to cover all such designs. The discussion below has a genome-wide focus; mapping yeast-specific traits has been thoroughly covered in a recent review by Warringer and Blomberg (2014).

DNA traits

The chemical state of the DNA molecule, and its occupancy by transcription factors, nucleosomes and other DNA-binding proteins, are perhaps the most fundamental traits for tracing the effects of genotype, and can be assayed using sequencing assays. For creating a biophysical model of DNA binding, there are in vitro and controlled in vivo alternatives available (Jolma et al., 2013; Zeigler and Cohen, 2014), and for mapping eukaryotic DNA occupancy one can go directly to the organism of interest. Yeast is a cheap starting model for assessing the genetic basis of three-dimensional (3D) DNA conformation (Duan et al., 2010), but lack of widespread heterochromatin, enhancers and higher-order structure limits the generality of conclusions. Other DNA features, such as replication landscape (Liachko et al., 2013), recombination rate (Illingworth et al., 2013; Mancera et al., 2008) and telomere length (Liti et al., 2009b), have been assayed in different backgrounds. Some further traits that are studied in humans but do not have a natural parallel in yeast include DNA methylation, a few types of histone modification and DNase sensitivity.

RNA traits

RNA level measurements are relatively straightforward to perform in all species. The first genetics of gene expression studies were conducted in an F1 cross of baker's yeast (Brem et al., 2002). mRNA abundance can be established using microarrays or sequencing but other RNA species, such as long non-coding RNAs, miRNAs and piRNAs, are largely absent in yeast, save for some types of short non-coding RNAs (Neil et al., 2009; Xu et al., 2009). RNA localization can be measured using fluorescence in situ hybridization for individual molecules, but there is no direct benefit for using yeast for this purpose beyond simplicity of growth and handling. Finally, ribosome occupancy can be measured to approximate mRNA translation rates (Ingolia et al., 2009).

The main advantage of yeast for RNA quantification and ribosome profiling is the small genome size and low complexity of the transcriptome, with very few alternatively spliced transcripts. This enables linkage mapping of all haplotype blocks against all open reading frame (ORF) mRNA levels, with good power due to substantially fewer tests to detect trans effects (Yvert et al., 2003) that are otherwise difficult to map using association in humans. As thousands of segregants have already been sequenced and this population size gives good power for trait mapping in yeast, it is expected that the most thorough genetic analysis of gene expression will soon be undertaken in this model.

Protein traits

Protein levels are a desirable trait to measure and map. The quantification methodology ranges from 2D gels for abundant proteins to mass spectrometry of tryptic peptides in a global (de Godoy et al., 2008; Lu et al., 2007; Marguerat et al., 2012), intermediate (Venable et al., 2004) or focused (Picotti et al., 2013) fashion. Alternatively, the collection of ORF–GFP fusion alleles (Huh et al., 2003) can be monitored using high-throughput measurements to quantify the level and localization of the protein, and tandem fluorescent protein fusions can be used to quantify turnover (Khmelinskii et al., 2012). For global mapping using gels and mass spectrometry, the benefits are similar to those for RNA traits. The assays are currently limiting for good quantification, but the technology is rapidly advancing, and is being used for proteome-wide quantification and mapping (Khan et al., 2013; Marguerat et al., 2012; Picotti et al., 2013; Skelly et al., 2013; Wu et al., 2013). Mass spectrometry-based methods also enable quantifying and mapping post-translational modifications (Marx et al., 2013), but the space of their combinations is too large to generate accurate reference spectra for all possibilities.

The collection of yeast GFP alleles is a unique, rich resource that does not yet have an equivalent in other organisms. The fluorescent alleles can be used in linkage and selection designs when combined with bulk phenotyping using cell sorting (Albert et al., 2014; Ehrenreich et al., 2010; Parts et al., 2014). Protein localization can be measured and mapped, and selection can be performed after phenotyping individual clones. Cell-to-cell variability in protein abundance has also been quantified and mapped (Ansel et al., 2008) using this resource, but requires the GFP alleles in many genotyped backgrounds to perform the experiment.

Metabolites and small molecules

An industrial view of cells is that of metabolic machines. Good reference spectra exist for well-characterized metabolic intermediates that can be quantified using standard mass spectrometry or nuclear magnetic resonance spectroscopy techniques, enabling individual phenotyping. Screening or evolving mutants can help to find enzymes that are responsible for particular metabolic steps by looking for accumulation of intermediate products (Clasquin et al., 2011). Indicators of small molecule levels can be used to quantify concentrations of metals (Domaille et al., 2008), protons (Prosser et al., 2010), ATP and redox state (Vevea et al., 2013).

Metabolite levels in microorganisms have been studied for decades, and loci that contribute have been mapped in many species, ranging from yeast (Breunig et al., 2014; Steyer et al., 2012) to human (Kettunen et al., 2012). Much of the mapping work has been undertaken for the noble cause of making better wine (Brion et al., 2013; Linderholm et al., 2010; Richter et al., 2013; Salinas et al., 2012). Unfortunately, metabolite traits cannot be used in selection designs unless reporters are engineered and integrated. Much genetics remains to be done from both basic biology and industrial viewpoints, as effects of allelic variation are largely not characterized beyond establishing the metabolic pathways using knockouts. The panels of wild yeast strains surely hold additional metabolic surprises waiting to be mapped.

Cell and subcellular morphology

Gross features of cells can be quantified from microscopy images. In yeast, the cell size and shape are predictable compared to mammalian cells, and are therefore easier to model quantitatively. Tens of features, ranging from cell size to cell cycle position (e.g. budding index and nuclear morphology) can be determined using a standard set of markers from such models (Handfield et al., 2013; Nogami et al., 2007; Ohya et al., 2005). The number and shape of each subcellular compartment can be visualized by introducing a compartment-specific fluorescent protein that highlights it (Vizeacoumar et al., 2010), and large screens are under way to identify the genes that perturb the wild-type characteristics. These microscopy-based subcellular traits cannot be selected in a straightforward way, but can be phenotyped for linkage and association mapping with automated solutions, similarly to GFP localization. Some whole-cell morphology traits, such as cell size and budding index, can, however, be distinguished in a flow sorter, and there are attempts at combining cytometry with imaging that would enable sorting based on compartment reporters as well.

Growth rate

Proficiency to grow in different environments is perhaps the best-studied yeast phenotype. Growth lag, rate and efficiency give information about different aspects of fitness (Warringer et al., 2003) and can be quantified from colony sizes on agar plates, liquid culture cell density or microscopy time courses of individual cells (New et al., 2014). Yeast cells share the advantages of cancer cell lines – clonal growth, no limits to passaging, and quick generations to accurately measure doubling time. The growth rate can be selected for in pooled growth in solid and liquid media, enabling the bulk selection-based mapping approaches. The genetic contributors to fitness have been studied to a great extent (Bloom et al., 2013; Cubillos et al., 2013; Ehrenreich et al., 2012; Gagneur et al., 2013; Lee et al., 2013; Warringer et al., 2011) and a reasonable understanding of the genetic architecture in different conditions has been developed. If any other phenotype can be encoded as a fitness trait, it will be easy to measure using standard growth assays and map using selection.
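
As a small illustration of quantifying growth rate from liquid culture density, the sketch below estimates doubling time by a least-squares fit of log density against time. It assumes that all supplied measurements come from the exponential phase; the time points and densities are invented.

```python
import math

def doubling_time(times_h, densities):
    """Doubling time (hours) from a least-squares fit of ln(density) vs time.
    Only meaningful for measurements taken during exponential growth."""
    logs = [math.log(d) for d in densities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs))
             / sum((t - mean_t) ** 2 for t in times_h))
    return math.log(2) / slope

# Hypothetical optical density readings taken every 1.5 hours.
times = [0.0, 1.5, 3.0, 4.5]
ods = [0.1, 0.2, 0.4, 0.8]
print("doubling time (h):", round(doubling_time(times, ods), 2))
```

Lag and efficiency, the other two growth parameters mentioned above, need the full curve, including the pre-exponential and saturation phases, and so cannot be recovered from this fit alone.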

Many more features define the state of the cell besides the ones touched on here. There are levels and activation states of signalling molecules, precise cell cycle stage, metabolic flux, numbers of subcellular machines, membrane permeability, age of the cell, etc. An important standing goal is to develop additional assays that quantify the most informative cellular characteristics, in order to understand, and ultimately predict, the state of the cell.

Future role for yeast in genome-wide mapping studies

Mapping traits is simpler in yeast than in any other eukaryote, even with the population structure confounding association analyses in wild strains, as in most species (Figure 1). It is straightforward to generate millions of genetically diverse individuals for linkage mapping, and to genotype them cheaply due to small genome size. Clonal exponential growth enables powerful mapping approaches that utilize fitness under selection as a phenotypic readout. The haploid genome allows mapping recessive alleles using mutagenesis or artificial evolution approaches. Genome-scale reagent collections exist for clean assessment of allele effects in diverse contexts. Generating lines and reporters for precise readouts and perturbations for both mapping and follow-up experiments is, for now, substantially easier compared to human cell lines. Most assays of cellular phenotypes can, in principle, be performed in any organism, but not all mechanisms are conserved all the way to yeast, so not all traits are of equal interest for mapping purposes.

Figure 1.

Comparison of mapping approaches and trait measurements performed in human and yeast

The focus in human trait mapping is currently on DNA, RNA and disease traits, as the assays are the most mature and can be performed in large cohorts, given sufficient funding. Besides the power to map trans variants that underlie gene expression in many environments, and screens for human disease gene analogues, yeast has little to offer that can be directly translated to humans in this specific context. However, the emphasis on the nucleic acid traits is largely driven by the recent availability of sequencing-based assays for phenotypic readout. Once the low-hanging fruit is gathered, and expensive large-scale experiments start giving incremental returns, the rewards will be in understanding downstream biological processes, many of which are shared between human and yeast.

Perhaps most informative of the cellular state is the proteome. While mass spectrometry-based methods to quantify protein levels are feasible for all organisms, and GFP collections are compiled in other models (Sarov et al., 2012), it still is easiest to carry out proteome-wide studies in yeast. Markers of organelles and processes can be introduced into different genetic backgrounds using SGA, and a very rich readout of protein level and localization can be obtained using high-throughput microscopy. As images are made of individual cells, potentially during a time course, both cell-to-cell variation and temporal dynamics can be quantified. Even bulk-phenotyping approaches are feasible, either by flow sorting at the marker level or by high-throughput phenotyping of individual clones. Linking protein traits to pathway activation, subcellular morphology and, ultimately, fitness, can establish a complete chain of causal links between genotype and phenotype.

Characterizing the genetic make-up of a yeast phenotype is perhaps not interesting in isolation – trait architectures depend heavily on the chosen strains and crossing design and, while informative, it is not obvious that the results will extrapolate to the genetics of other populations, organisms and traits. The value of mapping cellular traits in yeast is the same as that of doing any analysis in yeast: to understand more of the basic biology that is shared between all eukaryotes, using the most effective tools. The genetic analysis is important, as it gives a way to distinguish between causes and effects by rooting the causal chain of events to the occurrence of an allele, and identifies components required for a parental phenotype. As long as basic questions about eukaryotic cell biology remain, mapping and screening cellular traits in yeast will remain an efficient and powerful approach for attacking them.

Acknowledgements

I would like to thank Joshua Bloom, Amy Caudy, Helena Friesen, Anton Khmelinskii and Gianni Liti for feedback on different sections of this text.