Comparative genome analysis of a Saccharomyces cerevisiae wine strain

Editor: Teun Boekhout

Correspondence: Isak S. Pretorius, The Australian Wine Research Institute, PO Box 197, Glen Osmond, Adelaide, SA 5064, Australia. Tel.: +61 8 83036610; fax: +61 8 83036601; e-mail: sakkie.pretorius@awri.com.au

Abstract

Many industrial strains of Saccharomyces cerevisiae have been selected primarily for their ability to convert sugars into ethanol efficiently despite exposure to a variety of stresses. To begin investigation of the genetic basis of phenotypic variation in industrial strains of S. cerevisiae, we have sequenced the genome of a wine yeast, AWRI1631, and have compared this sequence with both the laboratory strain S288c and the human pathogenic isolate YJM789. AWRI1631 was found to be substantially different from S288c and YJM789, especially at the level of single-nucleotide polymorphisms, which were present, on average, every 150 bp between all three strains. In addition, there were major differences in the arrangement and number of Ty elements between the strains, as well as several regions of DNA that were specific to AWRI1631 and that were predicted to encode proteins that are unique to this industrial strain.

Introduction

Saccharomyces cerevisiae can be regarded as the first domesticated microorganism. For thousands of years, humans have used this yeast for baking, brewing and winemaking due to its ability to ferment sugars into alcohol and carbon dioxide efficiently (Verstrepen et al., 2006). Over time, passaging and selection strategies have improved the performance of these industrial strains, producing genetically distinct yeasts that are highly suited for specific industrial applications.

Variation between strains of S. cerevisiae represents an attractive model for research on the role of selection in shaping a eukaryotic genome. Furthermore, while industrial strains represent a diverse group of organisms, many other strains of S. cerevisiae have been isolated for use in nonindustrial settings, such as fundamental biological research. These laboratory yeasts represent important ‘out-groups’ for comparison of industrial strains. While industrial strains were isolated for their fermentative ability under stressful environmental conditions (low pH, low nutrient availability, high ethanol concentrations and fluctuating temperatures), laboratory strains, such as S288c, were selected for fast and consistent growth in nutrient-rich laboratory media (Mortimer & Johnston, 1986). Comparison of genome structures between industrial and laboratory strains should, therefore, highlight how differences in selection regimes have differentially shaped the genomic structure of these strains and assist in the identification of genomic loci that play important roles in regulating key industrial phenotypes.

Indeed, comparative genomic studies of yeast strains have already shown that there can be substantial nucleotide variation within the S. cerevisiae species. A recent genome sequencing study revealed that S. cerevisiae strain YJM789, which was isolated from the lung of an AIDS patient, displayed around 70 000 discrete nucleotide variations when compared with S288c, in addition to several ORFs that were unique to YJM789 (Wei et al., 2007). It has also been estimated that nucleotide variation among laboratory strains can be as high as one variant nucleotide in every 300 bp of sequence (Schacherer et al., 2007). This nucleotide variation, when translated into phenotypic differences, most likely underlies the ability of yeast to adapt to varied environmental conditions and to be efficiently selected and bred for the many different roles for which yeast is currently used. Thus, the identification and characterization of the genomic diversity that exists within S. cerevisiae is vital to understanding the selective potential of this species.

To begin the investigation into how industrial yeasts differ from other strains of S. cerevisiae, we have sequenced the genome of a haploid wine yeast, AWRI1631, and compared this sequence with both the laboratory strain S288c and the pathogenic isolate YJM789 (Goffeau et al., 1996; Wei et al., 2007). AWRI1631 was shown to be substantially different from both strains, especially at the level of single-nucleotide polymorphisms (SNPs), which were present, on average, every 150 bp. There were also major differences in the arrangement and number of Ty elements between the strains, in addition to several regions of DNA that were specific to AWRI1631.

Materials and methods

Isolation of the haploid wine yeast AWRI1631

The haploid S. cerevisiae strain AWRI1631 (MATa; ΔHO) is descended from the diploid industrial wine strain N96 (similar to strains known in the trade as EC1118 and Prise de Mousse). Briefly, N96 (Anchor Yeast, Cape Town, South Africa) was sporulated and haploid progeny were isolated using microdissection. As N96 is homothallic, these haploid progeny formed homozygous diploid strains via mating-type switching and several of these homozygous diploid lines were isolated for further manipulation. Homologous recombination was then used to delete the HO locus from each of these strains (Walker et al., 2005), which were then resporulated, and stable haploid (ΔHO) progeny were isolated. Several progeny from each homozygous diploid line were then tested for their winemaking properties, with AWRI1631 selected as being one of several strains with winemaking characteristics matching those of the original diploid parent (data not shown).

Genome sequencing, assembly and analysis

Shotgun genome sequencing was performed using the GS-FLX system (Roche, Mannheim, Germany) by the Australian Genome Research Facility. Sequence contigs were assembled de novo and scaffolded using the existing S288c sequence (Goffeau et al., 1996). Sequences were then manually edited using seqman pro (DNAstar, Madison) with data from Sanger sequencing reactions used to clarify several problematic regions. Final alignments between the S288c (Goffeau et al., 1996), YJM789 (Wei et al., 2007) and AWRI1631 genomes were performed using mavid (Bray & Pachter, 2004) with nucleotide variation (SNPs, insertions and deletions) catalogued using seqman pro. Final sequence contigs have been deposited at DDBJ/EMBL/GenBank under the Whole Genome Shotgun project accession number ABSV00000000. The version described in this study is the first version, ABSV01000000.

For gene annotation, ORFs were predicted using glimmer 3.02 (Delcher et al., 1999), annotated by their reciprocal best match according to blastp (Altschul et al., 1990) and aligned using clustalw (Larkin et al., 2007). Genome comparisons were visualized using artemis and act (Rutherford et al., 2000; Carver et al., 2005).
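The reciprocal-best-match annotation step can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the dictionary-of-scores input and the toy identifiers and scores below are all assumptions made for the example, and in practice the scores would be parsed from blastp output.

```python
def best_hit(scores):
    """Return the subject with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pair each query in A with its best hit in B only if B's best hit is that query.

    a_vs_b, b_vs_a: dict mapping query ID -> {subject ID: score}.
    """
    pairs = {}
    for query, scores in a_vs_b.items():
        subject = best_hit(scores)
        if subject is not None and best_hit(b_vs_a.get(subject, {})) == query:
            pairs[query] = subject
    return pairs


# Hypothetical scores: orf_A and orf_B both hit YXX001, but only orf_A is reciprocal.
awri_vs_s288c = {
    "orf_A": {"YXX001": 950.0, "YXX002": 120.0},
    "orf_B": {"YXX001": 400.0},
    "orf_C": {},
}
s288c_vs_awri = {
    "YXX001": {"orf_A": 950.0, "orf_B": 400.0},
    "YXX002": {"orf_A": 120.0},
}
rbh = reciprocal_best_hits(awri_vs_s288c, s288c_vs_awri)
```

In this toy case only orf_A is annotated as YXX001; orf_B has a significant but nonreciprocal match, and orf_C has no match at all.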

Results and discussion

Genome sequencing provides the most complete understanding of the genome structure of an organism and allows for the most in-depth comparisons to be made between related species. A haploid wine yeast, AWRI1631, was recently developed for use as a model strain for the analysis of wine yeast systems biology (A.R. Borneman et al., unpublished data). AWRI1631 was produced from the common commercial wine strain N96, which, like most industrial yeasts, is a homothallic diploid. This haploid strain has retained the robust fermentation kinetics of its parent while producing wine with a composition and flavour profile that is also equivalent to N96. However, due to its stable haploid genome, it is far easier to manipulate genetically.

In order to provide a sound basis for future investigations, in addition to obtaining a thorough understanding of the genomic differences between an industrial wine yeast and a laboratory counterpart, we sequenced the genome of this model wine yeast strain. AWRI1631 was subjected to one round of GS-FLX sequencing, producing approximately sevenfold genomic coverage (supporting Table S1). Following de novo assembly, 2971 contigs were aligned to the existing S288c genome sequence and manually edited. This reduced the total number of contigs to 2489, which cover 11 088 986 bp of the S288c genome (92%), in addition to 113 115 bp of sequence, which could not be assigned to any of the S288c chromosomes.

Initial nucleotide alignments of AWRI1631 and S288c showed that there were 68 290 instances of nucleotide variation between the strains. The majority of these events (57 463) were SNPs, with the remainder comprising insertions and deletions (indels). While there was a nucleotide variant every 162 bp, on average, between the strains, neither SNPs nor indels were evenly distributed across the genome. There were several regions that displayed substantial increases in polymorphism density that were interspersed throughout regions of relatively high interstrain conservation (Fig. 1a–c).
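The average variant spacing quoted above can be checked with a few lines of arithmetic. The sketch assumes that the denominator is the 11 088 986 bp of S288c sequence covered by AWRI1631 contigs; the text does not state which denominator was used, although this one reproduces the reported figure. The variable names are illustrative only.

```python
total_variants = 68_290      # all variant sites, AWRI1631 vs S288c (from the text)
snps = 57_463                # subset of variants that are SNPs (from the text)
aligned_bp = 11_088_986      # S288c bases covered by AWRI1631 contigs (assumed denominator)

bp_per_variant = aligned_bp / total_variants
percent_divergence = 100 * total_variants / aligned_bp
snp_fraction = snps / total_variants

print(f"one variant every {bp_per_variant:.0f} bp")
print(f"divergence: {percent_divergence:.1f}%")
print(f"SNP share of variants: {snp_fraction:.0%}")
```

Under this assumption the calculation gives one variant roughly every 162 bp, or about 0.6% divergence, with SNPs making up about 84% of all variants.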

Figure 1.

 Nucleotide variation in the wine yeast AWRI1631. (a) Variation across chromosome I in AWRI1631 and S288c, showing SNPs (green bars), deletions (red bars) and insertions (blue bars). Variation is calculated as the number of variant bases in a 101 bp sliding window, centred at base 51 and with a 1 bp resolution. (b) Variation across chromosome XV. (c) Expanded view of the region of chromosome XV highlighted in Fig. 1b by the thick black bar, showing one of the peaks in nucleotide variation. (d) Total SNP variation observed between AWRI1631, S288c and YJM789.
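The sliding-window calculation described in the legend can be sketched as follows. This is an illustration rather than the code used to draw the figure: the function name is hypothetical, and truncating windows at the sequence ends is an assumption (the legend implies that the first full window is centred at base 51).

```python
def window_counts(variant_positions, seq_length, window=101):
    """Count variant bases in a window centred on each base (1 bp resolution).

    variant_positions: iterable of 1-based variant coordinates.
    Returns a list whose element i is the count for the window centred on
    base i + 1; windows are truncated at the sequence ends.
    """
    half = window // 2
    is_variant = [0] * (seq_length + 1)          # index 0 unused (1-based coords)
    for pos in variant_positions:
        is_variant[pos] = 1

    # Prefix sums make each window count a single subtraction.
    prefix = [0] * (seq_length + 1)
    for i in range(1, seq_length + 1):
        prefix[i] = prefix[i - 1] + is_variant[i]

    counts = []
    for centre in range(1, seq_length + 1):
        lo = max(1, centre - half)
        hi = min(seq_length, centre + half)
        counts.append(prefix[hi] - prefix[lo - 1])
    return counts


# Toy example: three variants on a 300 bp sequence.
counts = window_counts([51, 60, 250], seq_length=300)
```

With a 101 bp window, the window centred on base 51 spans bases 1 to 101 and therefore counts the variants at positions 51 and 60.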

Remarkably, the level of nucleotide polymorphism observed between AWRI1631 and S288c (0.6%) is very similar to that separating S288c and YJM789, a strain of S. cerevisiae isolated from the lung of an AIDS patient (Wei et al., 2007). To determine whether the similar nucleotide divergence observed between the laboratory and nonlaboratory isolates was due to AWRI1631 and YJM789 sharing a common set of SNP alleles relative to S288c, we performed a three-way genomic comparison of the strains. The three yeast genomes were aligned using mavid (Bray & Pachter, 2004) and each chromosome was then cropped to exclude the telomeric regions which, due to their repetitive nature, were generally absent or misaligned in either AWRI1631 or YJM789. Analysis of the three-way alignments showed that SNP density between the three strains was approximately equal, although AWRI1631 displayed a slightly higher level of nucleotide divergence when compared with both S288c and YJM789 (Fig. 1d, Table S2). The three strains, therefore, represent equally distant lineages, with the two nonlaboratory strains being no closer at the nucleotide level than their laboratory counterpart.

Intraspecific differences in large genomic rearrangements

In addition to SNP variation between strains, there were 8155 differences between AWRI1631 and S288c due to insertion and deletion events. The location of 3299 of these indels overlapped with indels present in YJM789 when compared with S288c, while 2978 matched YJM789 in both their exact location and size. The AWRI1631 indels ranged from single base pair mutations (5637) to large indels, which covered up to several kilobases each (Fig. 2a). These large indels were generally associated with the differential presence of Ty transposons in the genomes of each strain (Fig. 2b–d), although there was at least one large insertion shared between AWRI1631 and YJM789 that is not associated with a transposable element. Of the 52 Ty elements in the S288c genome, there were clear data to indicate that 42 were absent from the AWRI1631 genome, with the majority of these also being absent in YJM789. In addition to those Ty elements found in S288c, there were seven additional Ty elements present in the genome of YJM789, of which only one was present in AWRI1631.
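The distinction drawn above between indels that merely overlap in location and indels that match in both location and size can be made concrete with a short sketch. The `Indel` record, the coordinate convention and the toy data are assumptions for illustration; representing an insertion as an interval in S288c coordinates is a simplification, and the pairwise comparison is written for clarity rather than speed.

```python
from collections import namedtuple

# 1-based, inclusive interval in S288c coordinates (assumed convention).
Indel = namedtuple("Indel", "chrom start end")


def overlaps(a, b):
    """True if two indels on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def compare_indels(query, reference):
    """Count query indels that overlap, and that exactly match, a reference indel."""
    exact_set = set(reference)
    n_overlap = sum(any(overlaps(q, r) for r in reference) for q in query)
    n_exact = sum(q in exact_set for q in query)
    return n_overlap, n_exact


# Hypothetical indels relative to S288c.
awri = [Indel("chrI", 100, 100), Indel("chrI", 500, 520), Indel("chrII", 40, 45)]
yjm = [Indel("chrI", 100, 100), Indel("chrI", 510, 530), Indel("chrII", 400, 405)]
n_overlap, n_exact = compare_indels(awri, yjm)
```

Here two of the three hypothetical AWRI1631 indels overlap a YJM789 indel, but only one matches in both position and size, which mirrors the relationship between the 3299 and 2978 counts reported above.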

Figure 2.

 Differences in transposon composition between Saccharomyces cerevisiae strains. (a) A summary of the Ty composition of AWRI1631, S288c and YJM789. The chromosomal location and S288c systematic name (when available) are displayed, with the presence or absence of each Ty element in each strain indicated by black and white boxes, respectively. Grey boxes indicate that the presence of the Ty element cannot be established due to a sequencing contig break at that location. (b) Genomic alignments of chromosome II from S288c, AWRI1631 and YJM789 (grey bars) highlighting the positions of Ty elements (green boxes) and putative Ty elements (white boxes) in each strain. Regions of homology between each strain are represented by regions shaded red, while homologous, but inverted regions are shaded blue. (c) Genomic alignment of a section of chromosome IV showing the absence of YDRCTy1-2 in YJM789. Alternating sections of grey in AWRI1631 represent adjacent sequencing contigs. (d) Genomic alignment of a section of chromosome XIII showing the absence of YMRCTy1-3 in AWRI1631 and YJM789, and the absence of YMRCTy1-4 in AWRI1631.

While very similar in their Ty content and distribution, YJM789 and AWRI1631 differ in the presence of a large inversion on chromosome XIV and an interchromosomal translocation between chromosomes VI and X, both of which are present in YJM789 but absent from both S288c and AWRI1631 (Wei et al., 2007).

Gene content of ‘natural’ vs. laboratory isolates

In order to obtain ORF sequences from AWRI1631 for comparison with those of other strains, the gene prediction program glimmer (Delcher et al., 1999) was used to identify a total of 5687 ORFs from those AWRI1631 sequences that could be aligned to the S288c genome (Table S3). As the AWRI1631 genome sequence comprises a large number of separate sequencing contigs, there were many instances where genomic regions from AWRI1631 that were homologous to a single S288c ORF spanned a sequencing gap. In these situations, there were often multiple, nonoverlapping ORFs in the AWRI1631 sequence that each matched different parts of a single S288c ORF, such that 5204 (90%) of the AWRI1631 ORFs could be matched to 4634 S288c proteins by amino acid homology. This represents 81% coverage of the predicted S288c proteome if dubious ORFs are disregarded. For the remaining 483 AWRI1631 proteins, blast searches performed against the nonredundant GenBank protein database showed that these either lacked homology to any other protein sequence or represented matches to dubious S288c ORFs and are most likely nonfunctional. The only exception to this was orf_09_0030, which is predicted to encode a protein homologous to the combined products of the S288c genes YIL167W and YIL168W. These two genes are presumed to be nonfunctional in S288c, but are fused in some S. cerevisiae strains and other members of the Saccharomyces sensu stricto group where the hybrid gene encodes an L-serine dehydratase (Seufert, 1990; Kellis et al., 2003; Godard et al., 2007). Like these strains, AWRI1631 also appears to contain a functional YIL167W–YIL168W serine dehydratase enzyme that is absent from S288c.

Conserved genomic loci

Before examining the levels of protein conservation for loci conserved between AWRI1631 and S288c, we first examined whether there was any evidence for the selection of specific protein variation at the nucleotide level through the analysis of nonsynonymous mutation (dN) rates (Yang & Bielawski, 2000). Protein alignments (clustalw) of each pair of AWRI1631 and S288c orthologues were converted to codon-gapped nucleotide alignments (Suyama et al., 2006) and the number of nonsynonymous mutations was determined (Yang, 1997). While there was a large amount of variation in nonsynonymous mutation rates, when these individual values were grouped by GO categories, only a few groups displayed a concerted difference in their rates of nonsynonymous mutation, which could indicate the presence of selective pressure on their protein-coding sequences (Fig. 3). Of these groups, those involved in translation (GO terms ribosome, translation regulator activity and translation) showed the largest effect, with these three groups consistently displaying a significant decrease in the frequency of nonsynonymous mutations. This is consistent with the well-known constraints that exist on the sequence of these highly conserved proteins. Of the remaining GO categories, proteins located in the cell wall (GO component) and those involved in signal transduction (GO function) displayed slightly higher nonsynonymous mutation rates compared with the other categories. Unlike the situation observed for the ribosomal proteins, increased levels of nonsynonymous mutations may indicate diversifying selection that favours protein sequences other than those present in S288c.
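The conversion of a protein alignment into a codon-gapped nucleotide alignment, followed by a tally of amino acid-changing codon differences, can be sketched as below. This is a deliberately crude illustration: it counts differing codon pairs and does not reproduce the rate estimation of the cited methods. The function names and the toy sequences are assumptions, and the standard genetic code is assumed throughout.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_gapped(aligned_protein, cds):
    """Thread an ungapped coding sequence onto a gapped protein alignment row."""
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")          # a protein gap becomes a three-base gap
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return codons


def count_codon_differences(codons_a, codons_b):
    """Return (synonymous, nonsynonymous) counts of differing ungapped codon pairs."""
    syn = nonsyn = 0
    for a, b in zip(codons_a, codons_b):
        if "-" in a or "-" in b or a == b:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Toy pair: proteins MKL and MRSL, aligned as "MK-L" and "MRSL".
row_a = codon_gapped("MK-L", "ATGAAACTT")
row_b = codon_gapped("MRSL", "ATGAGATCTCTC")
syn, nonsyn = count_codon_differences(row_a, row_b)
```

In the toy pair, AAA versus AGA changes lysine to arginine (nonsynonymous), whereas CTT versus CTC both encode leucine (synonymous); the gapped column is ignored.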

Figure 3.

 Genomewide analysis of nonsynonymous substitution rates between AWRI1631 and S288c. The rate of nonsynonymous substitution was calculated for each pair of AWRI1631 and S288c orthologues and then grouped by GO-slim terms. Box and whisker plots are displayed for the cellular component (green), molecular function (red) and biological process (blue) groupings, showing the median nonsynonymous mutation rate (thick black line), the interquartile range (coloured boxes) and the minimum and maximum values, excluding outliers (dotted lines). Those GO terms showing reductions or increases in nonsynonymous mutation rates are also highlighted (*).

When the amino acid, rather than nucleotide sequences of each pair of AWRI1631 and S288c orthologues were compared, it was shown that the two predicted proteomes were very similar, displaying an average amino acid identity of 99.3% and a similarity of 99.6%. However, there were large variations in the level of conservation observed on a protein-by-protein basis (Fig. 4). Accompanying the differences due to amino acid substitutions, there were also 1118 and 54 instances where the predicted protein products of AWRI1631 were truncated or extended at either the amino or carboxyl termini by more than 10% compared with S288c, respectively. It must be noted, however, that many of the truncations observed in AWRI1631 might be due to gaps in sequencing contigs as 636 lack either an intact start or stop codon, indicating that they were located at the extremities of their respective contig sequences.
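The per-protein comparisons described above can be sketched with two small helpers. Both are illustrative assumptions rather than the authors' definitions: identity is computed here over positions that are ungapped in both aligned sequences, and the length comparison uses total protein length instead of examining each terminus separately.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of positions ungapped in both rows that carry identical residues."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def length_change(query_len, reference_len, threshold=0.10):
    """Classify a protein as 'truncated', 'extended' or 'similar' relative to a reference."""
    change = (query_len - reference_len) / reference_len
    if change < -threshold:
        return "truncated"
    if change > threshold:
        return "extended"
    return "similar"


# Toy alignment: one substitution (T/S) and one gap over eight columns.
identity = percent_identity("MKTLLV-A", "MKSLLVQA")
status = length_change(query_len=80, reference_len=100)
```

For the toy alignment, six of the seven ungapped columns are identical, and a protein at 80% of the reference length is classed as truncated under the 10% threshold.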

Figure 4.

 Amino acid variation in AWRI1631. (a) A histogram of amino acid variation per protein in AWRI1631 when compared with S288c. The number of conservative amino acid substitutions (red) and nonconservative amino acid substitutions were calculated per protein with each protein binned by its average level of variation. The numbers above each column give the exact number of proteins within each bin. (b) Alignments of highly variable proteins from AWRI1631. AWRI1631 proteins were aligned to their S288c counterparts using clustalw (Larkin et al., 2007). Each pair of aligned proteins is shaded to indicate areas of identity (yellow), conservative amino acid substitutions (red), nonconservative amino acid substitutions (blue) and indels (green). The total level of amino acid identity and similarity present across each protein pair is also listed to the right of each alignment.

Nonconserved genomic loci

From the 10 large insertions in AWRI1631 that were shared with YJM789, only one region, located on chromosome IX, was not primarily composed of transposon-related sequences. This region is predicted to encode the KHR1 heat-sensitive killer toxin, a gene that has been identified previously as being present in YJM789 but not in S288c (Goto et al., 1990; Wei et al., 2007). In addition to insertions that were shared with YJM789, there were another 10 AWRI1631-specific insertions that were not Ty associated. However, none of these regions were predicted to contain functional ORFs.

As mentioned previously, in addition to AWRI1631-specific genomic regions that could be mapped to a chromosomal location by conserved flanking regions, there were 29 contigs that could not be assigned to a particular chromosome. This 113 kb of AWRI1631-specific sequence was predicted to encode an additional 37 proteins, of which 27 had a significant (E < 1E−10) match to at least one other protein in the GenBank nonredundant protein database (Table 1). Despite the fact that these 27 predicted ORFs all lie within the AWRI1631 sequence that could not be clearly aligned to the S288c genome, 20 of the 27 showed significant similarity to S288c protein sequences. Of these 20 homologous ORFs, nine represent a highly conserved, reciprocal best match of an S288c protein. As such, the presence of the conserved ORF sequence within divergent telomeric sequences is probably interfering with the ability to accurately locate the surrounding contig with respect to the S288c genome (Table 1; S288c orthologues). Thus, while these ORFs most likely represent orthologues of S288c proteins, they are located within a sequence that is highly divergent between AWRI1631 and S288c.

Table 1. AWRI1631-specific ORFs

ORF | S288c match | E value (S288c) | GenBank match | E value (GenBank) | Notes
orf_c1095_0010 | YHR211W (FLO5) | 3E−31 | XP_451520.1, Unnamed protein product (Kluyveromyces lactis) | 1E−81 | Cell wall mannoprotein
orf_c1095_0020 | YNL202W (SPS19) | 2E−10 | XP_500963.1, Hypothetical protein (Yarrowia lipolytica) | 5E−63 | Sorbitol utilization protein
orf_c1095_0030 | YKL222C | 2E−24 | NP_986434.1, AGL233Cp (Ashbya gossypii ATCC 10895) | 1E−22 | chrIX telomere – zinc cluster tfactor
orf_c1097_0010 | YER190W* (YRF1-2) | 3E−34 | AAB64717.1, Yer190wp (S. cerevisiae) | 2E−31 | chrV telomere – Y′ element
orf_c1097_0050 | YOR396W* (YRF1-8) | 2E−16 | EDN61332.1, Hypothetical protein SCY_5432 (S. cerevisiae YJM789) | 7E−14 | chrVIII telomere – Y′ element
orf_c1098_0010 | No hit | | XP_451168.1, Unnamed protein product (Kluyveromyces lactis) | 1E−32 | Contains hAT family dimerisation domain
orf_c1098_0030 | YKL215C | 0E+00 | XP_001389908.1, Hypothetical protein An03g00010 (Aspergillus niger) | 0E+00 | 5-Oxo-L-prolinase
orf_c1098_0040 | YNL274C (GOR1) | 2E−86 | XP_458332.1, Hypothetical protein DEHA0C16071g (Debaryomyces hansenii CBS767) | 1E−108 | Glyoxylate reductase
orf_c1098_0050 | YBR180W (DTR1) | 2E−29 | XP_001382795.2, AQ Resistance (Pichia stipitis CBS 6054) | 1E−153 | MFS multidrug transporter
orf_c1099_0050 | No hit | | XP_001643070.1, Hypothetical protein Kpol_458p5 (Vanderwaltozyma polyspora DSM) | 7E−12 | Function unknown
orf_c1103_0010 | YAL067C* (SEO1) | 1E−147 | EDN59792.1, Supp. of sulphoxyde ethionine resis. (Saccharomyces cerevisiae) | 1E−144 | chrI telomere
orf_c1103_0020 | YAL067C* (SEO1) | 1E−117 | NP_009333.1, Putative permease, member of the allantoate transporter subfamily | 1E−115 | chrI telomere
orf_c1103_0030 | No hit | | XP_001589206.1, Hypothetical protein SS1G_09839 (Sclerotinia sclerotiorum 1980) | 5E−56 | Neutral amino acid transporter
orf_c1104_0010 | No hit | | XP_567652.1, Early growth response protein 1 (Cryptococcus neoformans) | 3E−27 | C2H2 zinc finger domain protein
orf_c1104_0020 | YLR027C (AAT2) | 1E−97 | XP_001390014.1, Hypothetical protein An03g01120 (Aspergillus niger) | 1E−122 | Aspartate transaminase
orf_c1105_0010 | YJL216C* | 1E−176 | EDN63177.1, Hypothetical protein SCY_3086 (S. cerevisiae YJM789) | 1E−178 | chrX telomere – α glucosidase
orf_c1105_0020 | YJL216C* | 1E−117 | EDN63176.1, Hypothetical protein SCY_3085 (S. cerevisiae YJM789) | 1E−123 | chrX telomere – α glucosidase
orf_c1105_0030 | YFL052W | 1E−158 | EDN63175.1, Conserved protein (S. cerevisiae YJM789) | 1E−155 | Maltose activator protein
orf_c1108_0020 | No hit | | XP_451167.1, Unnamed protein product (Kluyveromyces lactis) | 2E−12 | Function unknown
orf_c1110c1118_0010 | No hit | | BAA95611, MPR1 (S. cerevisiae Σ1278b) | 1E−132 | Azetidine-2-carboxylic acid acetyltransferase
orf_c1110c1118_0020 | YNL332W (THI12) | 5E−16 | NP_012690.1, THI11 (S. cerevisiae) | 4E−13 | Thiamine biosynthesis protein family member
orf_c1111_0030 | YIL169C* | 0E+00 | ABZ10814.1, Putative mannoprotein precursor (S. pastorianus) | 0E+00 | chrIX telomere – cell wall mannoprotein
orf_c1112_0020 | YKL015W (PUT3) | 6E−17 | XP_663255.1, Hypothetical protein AN5651.2 (Aspergillus nidulans FGSC A4) | 7E−32 | Fungal-specific, C6 transcription factor
orf_c1113_0010 | YGR260W (TNA1) | 9E−44 | XP_453048.1, Unnamed protein product (Kluyveromyces lactis) | 1E−140 | Nicotinic acid permease – MFS transporter
orf_c1116_0010 | No hit | | XP_001384496.2, Hypothetical protein PICST_31515 (Pichia stipitis CBS 6054) | 3E−10 | Limited homology to zinc finger proteins
orf_c1121_0010 | YNR044W* (AGA1) | 2E−16 | XP_001746730.1, Predicted protein (Monosiga brevicollis MX1) | 5E−17 | chrXIV telomere – cell wall mannoprotein
orf_c1126_0010 | YAR050W* (FLO1) | 4E−10 | XP_001645487.1, Hypothetical protein Kpol_1004p1 (Vanderwaltozyma polyspora DSM) | 3E−20 | chrI telomere – cell wall mannoprotein

* S288c orthologue. S288c matches without an asterisk are S288c paralogues.

Unlike the nine ORFs that represented reciprocal best hits of an S288c protein, the 11 remaining AWRI1631-specific ORFs (out of the 20 that displayed homology to proteins from S288c) were nonreciprocal best hits, with at least one other AWRI1631 ORF showing significantly higher similarity to the same S288c sequence (Table 1; S288c paralogues). Thus, these 11 ORFs appear to encode distinct paralogues of a pair of S288c and AWRI1631 orthologues and are more closely related to proteins from species other than S. cerevisiae (Fig. 5). These proteins include a predicted aspartate transaminase homologous to the protein encoded by YLR027C from S288c (Fig. 5a) and three ORFs that are located on a single, 15.5-kb AWRI1631-specific contig, that are homologous to YBR043C (major facilitator drug transporter), YNL274C (glyoxylate reductase) and YKL215C (5-oxo-L-prolinase) from S288c, respectively (Fig. 5b). The presence of this gene cluster is particularly interesting because, while all three of these neighbouring genes show their highest homology to sequences outside S. cerevisiae, there is no clear single species to which all three genes show the greatest similarity. Thus, the origin of this cluster remains to be determined. It is impossible to ascertain whether these genes were present in the common ancestor of AWRI1631, S288c and YJM789 and then subsequently lost in all but AWRI1631, or whether the genes were acquired by gene duplication (followed by sequence divergence) or by horizontal gene transfer.

Figure 5.

 AWRI1631-specific genes that encode S288c paralogues. (a) The location of orf_c1104_0020 on contig 1104 is shown above a clustalw dendrogram produced from the amino acid alignments of orf_c1104_0020 (orange box), its 10 best blastp matches and the most similar pair of S288c and AWRI1631 orthologues (light blue and dark blue boxes, respectively). (b) A 15.5-kb region containing a cluster of five AWRI1631-specific ORFs. Three of the ORFs, orf_c1098_0030 (green arrow), orf_c1098_0040 (red arrow) and orf_c1098_0050 (yellow arrow) are predicted to be paralogues of existing S288c and AWRI1631 protein pairs. This is highlighted by clustalw dendrograms produced using each AWRI1631-specific protein (shaded according to the ORF arrow in contig 1098), its 10 best blast matches and the orthologous pair of S288c (dark blue boxes) and AWRI1631 (light blue boxes) proteins to which the protein shares homology.

The seven remaining AWRI1631-specific ORFs displayed significant homology (E < 1E−10) only to proteins not present in S288c (Table 1). While the predicted function of many of these proteins remains elusive due to a lack of characterized homologous sequences, one putative protein is predicted to have a role in neutral amino acid transport (orf_c1103_0030), while a second (orf_c1110c1118_0010) is predicted to encode a highly conserved homologue of the Mpr1 protein of S. cerevisiae strain Σ1278b (Shichiri et al., 2001). MPR1 was shown to be specific to Σ1278b when compared with S288c and has been implicated in tolerance to both ethanol and cold stress (Du & Takagi, 2005, 2007). This gene may, therefore, play a role in providing increased stress tolerance to wine yeasts when compared with laboratory strains.

Conclusion

This study has shown that while the yeast genome displays substantial conservation throughout a core set of genes, there are many regions that display high degrees of interstrain variation. These variable regions include genes with increased levels of nucleotide substitutions, in addition to entire strain-specific loci. These strain-specific and highly variable genes may provide the means for phenotypic diversification and specialization without impeding the basic growth and metabolism of the yeast cell. However, despite the large amount of data produced in this study, it is still unclear as to exactly which genomic variation is most important for the phenotypic divergence between wine yeasts and other S. cerevisiae strains.

Sequencing of additional strains of S. cerevisiae will be required to recognize the full size and scope of the ‘global’ S. cerevisiae genome. These genome sequences will also provide important points of comparison for identifying variation that can be confidently associated with desirable phenotypic traits. It is fortunate that there is now at least one large-scale study underway that is looking to broadly catalogue this variation through low-coverage sequencing of a large number of S. cerevisiae strains (http://www.sanger.ac.uk/Teams/Team71/durbin/sgrp/). However, with rapid advances in DNA sequencing technology (Bennett, 2004; Margulies et al., 2005), the ability to fully sequence large numbers of yeast genomes is now within the reach of individual laboratories, thereby providing researchers with the ability to characterize large numbers of yeast strains at the genomic level.

Acknowledgements

The Australian Wine Research Institute (AWRI), a member of the Wine Innovation Cluster in Adelaide, is supported by Australia's grapegrowers and winemakers through their investment body the Grape and Wine Research Development Corporation with matching funding from the Australian Government. Systems biology research at the AWRI was performed using resources provided as part of the National Collaborative Research Infrastructure Strategy, an initiative of the Australian Government, in addition to funds from the South Australian State Government.
