Minisatellites in Saccharomyces cerevisiae genes encoding cell wall proteins: a new way towards wine strain characterisation

Authors


*Corresponding author. Tel.: +39 (071) 2204782; Fax: +39 (071) 2204858, E-mail address: ilaria@univpm.it

Abstract

With the aim of developing new tools for the characterisation of wine yeasts, we scanned the genome of Saccharomyces cerevisiae, using databases available on-line, in search of potentially polymorphic targets. As we have previously observed for SED1, we found that other genes coding for cell wall proteins contain minisatellite-like sequences. A polymerase chain reaction (PCR) survey of SED1 and three of these others, namely AGA1, DAN4 and HSP150, in a population of wild S. cerevisiae demonstrated that these genes are highly polymorphic in length and represent a sink of unexplored genetic variability. The primer pairs designed on the gene open reading frames yield stable and repeatable amplification profiles that show a level of resolution that allows clear discrimination between different strains. They can therefore be utilised for PCR-based typing of S. cerevisiae.

1 Introduction

The study and the conservation of the biodiversity of wine yeasts have recently become objects of growing interest. The maintenance of such biological patrimony is in fact essential both to obtain starter strains that are able to fully develop the typical flavours and aromas of wines originating from different grapevine cultivars [1], and to ensure the conservation of gene pools of primary importance for the preservation of productive activities based on yeast-mediated processes.

To date, numerous molecular methods have been proposed for the identification and differentiation of wine yeasts [2]. Among these, polymerase chain reaction (PCR) DNA fingerprinting appears to have the greatest potential. Indeed, rapid methods for the preparation of template DNA to be used in PCR reactions may now be utilised as an alternative to long DNA extraction procedures [3]. Moreover, random primers [4], primers specific for the delta sequences [5], intron-splicing sites (ISS) [6], internal transcribed spacers (ITS) [7,8] and microsatellites/simple sequence repeats (SSR) [9,10] have already been used for the identification of wine and beer strains of Saccharomyces cerevisiae. However, not all of these primers have the same discriminatory power within this species. The analysis of ITS restriction fragment length polymorphism (RFLP) is a very good tool for the identification of wine yeasts at the species level, although as ribosomal regions show a low degree of polymorphism within the same species, this analysis does not allow the discrimination of different strains within the same species [7]. On the other hand, random primers, primers specific for delta sequences, ISS and SSR have all been described as being highly discriminatory within S. cerevisiae species [5,6,10]. However, some problems can be encountered in using such primers, as numerous factors, such as low annealing temperatures, variability in the number of target sequences in the genome, and distances between sites of annealing, can affect the efficacy of the amplification reactions and lead to ambiguous and non-consistent PCR profiles [11,12]. Even though these problems may be overcome, at least in part, by using more stringent PCR conditions [3], it appeared reasonable to search for alternative molecular targets yielding more reliable and more easily interpretable amplification profiles.

We have previously reported on the existence of minisatellite-like sequences in the SED1 gene of S. cerevisiae [13]. Minisatellites are tandem repetitions of 10–100 bp motifs that, functioning as recombinogenic hot spots, cause length polymorphisms and represent putative targets for the molecular identification of different individuals [14–16]. Interestingly, much evidence also indicates that other S. cerevisiae genes coding for cell wall proteins contain one or more sets of repeat sequences of various lengths [17–20]. Therefore, analogous to SED1, all these genes may be prone to length polymorphisms and constitute molecular markers for the characterisation of different individuals within the species.

Prompted by these observations, we scanned the S. cerevisiae genome in search of cell wall genes containing minisatellites, and PCR-surveyed some of them in a population of wild isolates ascribed to this species and in two reference strains. We here demonstrate that in the genes under study the presence of minisatellite-like sequences is accompanied by abundant length polymorphisms. These genes therefore represent preferential molecular targets for the characterisation of wine yeasts.

2 Materials and methods

2.1 Strains

CBS1171, the type strain for the species S. cerevisiae, and S288C were utilised as reference strains. The 29 wild S. cerevisiae were selected at random from a population of 164 yeasts of the same species isolated from grape surfaces during the 2001 vintage, as previously described [21], and identified according to Vaughan-Martini and Martini [22] utilising the dichotomic key proposed by Boulton et al. [23].

2.2 Media and growth conditions

YEPD was used for yeast cultivation (2% glucose, 1% yeast extract, 2% peptone, 1.8% agar). Liquid cultures were incubated at 28°C, with shaking (200 rpm).

2.3 In silico sequence analyses

The search for the S. cerevisiae genes encoding cell wall proteins was performed at the http://www.yeastgenome.org and http://www.proteome.com sites. The presence of minisatellite-like sequences was highlighted by means of the Etandem (http://bioweb.pasteur.fr/seqanal/interfaces/etandem.html) and TandemRepeatFinder (http://c3.biomath.mssm.edu/trf.html) software. The primer search was performed using the Prima software (http://www.ebi.ac.uk/emboss/prima/), and the alignment between the S. cerevisiae genome and the primer sequences was performed using the Fasta software (http://genome-www2.stanford.edu/cgi-bin/SGD/nph-fastasgd). The alignments between the sequences of the AGA1, DAN4 and HSP150 amplicons and the corresponding sequences deposited on-line were performed using the Align software (http://www.ebi.ac.uk/emboss/align/).
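To make the notions of repeat consensus and percentage of identity between repeats concrete, the following minimal Python sketch derives both from a tandem array whose unit length is already known. It is an illustration only: the function names and the toy sequence are hypothetical, and the scoring used here (share of bases matching a majority consensus) is an assumption that may differ from the calculations performed by Etandem or TandemRepeatFinder.

```python
from collections import Counter

def split_repeats(region, unit_len):
    """Cut a tandem-repeat region into consecutive units of unit_len bases."""
    if len(region) % unit_len != 0:
        raise ValueError("region length must be a multiple of unit_len")
    return [region[i:i + unit_len] for i in range(0, len(region), unit_len)]

def consensus(units):
    """Most frequent base at each position across all repeat units."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*units))

def percent_identity(units):
    """Percentage of bases, over all units, that match the consensus."""
    cons = consensus(units)
    matches = sum(a == b for unit in units for a, b in zip(unit, cons))
    return 100.0 * matches / (len(cons) * len(units))

# Hypothetical 3 x 10 bp array with a single mismatch in the second unit.
region = "ACGTTGCAAG" + "ACGTTGCTAG" + "ACGTTGCAAG"
units = split_repeats(region, 10)
print(consensus(units))
print(round(percent_identity(units), 1))
```

In this toy array 29 of the 30 bases agree with the consensus, so the array would comfortably exceed an 80% identity threshold.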

2.4 Nucleotide sequence accession numbers

The sequence of the amplicon obtained with HSP150 primer pairs on S288C template was deposited in GenBank under accession number AY321583.

2.5 DNA extraction, PCR conditions and restriction analyses

PCR reactions were performed both on total genomic DNA extracted from overnight liquid cultures as described by Ushinsky et al. [24], and on whole cells heat-treated as already described by Ciani et al. [3]. In case of purified DNA, approximately 20 ng of template was utilised for PCR reactions, while for the amplifications on whole cells we proceeded as follows: a limited amount of cell material for each isolate was picked from single colonies, re-suspended in 5 μl sterile distilled water in 0.2-ml tubes and heated to 95°C for 10 min. 2 μl of the above templates was added to the PCR reaction mix. The PCR reactions were performed on a Perkin Elmer Gene AMP PCR System 9700, in 25 μl reaction mixture. Inter-delta sequences were amplified as described by Ciani et al. [3]. AGA1, DAN4, HSP150 and SED1 were amplified as described below.

AGA1: the reaction mixture contained 2 μl of template, dNTPs at 0.12 mM each, 0.6 U Taq polymerase (Amersham Biosciences, Piscataway, NJ, USA), 1× Taq reaction buffer, 3 mM MgCl2, 3 pmol of each of the primers AGA1f (5′-GTGACGATAACCAAGACAAACGATGCAA-3′) and AGA1r (5′-CCGTTTCATGCATACTGGTTAATGTGCT-3′). The PCR reactions were run for 35 cycles as follows: denaturation at 94°C for 1 min, annealing at 64°C for 1 min, and elongation at 72°C for 2 min.

DAN4: the reaction mixture contained 2 μl of template, dNTPs at 0.12 mM each, 0.6 U Taq polymerase, 1× reaction buffer, 1.5 mM MgCl2, 30 pmol of each of the primers DAN4f (5′-AGCGCTTTCAAAGGATGGTATTTACA-3′) and DAN4r (5′-AAAGTAGACCCGAAGGAAGAAACAGG-3′). The reactions were run as follows: nine cycles of touch-down PCR with denaturation at 94°C for 45 s, annealing at 70°C for 30 s (with a decrease in the annealing temperature of 0.5°C for each cycle), and elongation at 72°C for 1 min; followed by 26 cycles of PCR with denaturation at 94°C for 45 s, annealing at 66°C for 30 s, and elongation at 72°C for 1 min.

HSP150: the reaction mixture contained 2 μl of template, dNTPs at 0.12 mM each, 0.6 U Taq polymerase, 1× reaction buffer, 1.5 mM MgCl2, 30 pmol of each of the primers HSP150f (5′-CACTTTGACTCCAACAGCCACTTACA-3′) and HSP150r (5′-TACCGGACAAACATTGGTAGAAGACA-3′). The reactions were run for 35 cycles as follows: denaturation at 94°C for 45 s, annealing at 65°C for 30 s, and elongation at 72°C for 1 min.

In all three cases an initial denaturation step at 94°C for 7 min and a final 7 min extension step at 72°C were also performed.
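As a worked reading of the DAN4 touch-down program, the Python sketch below lists its steps in order, including the initial denaturation and final extension. It assumes that the first touch-down cycle anneals at exactly 70°C, so that the ninth reaches 66°C; the function name and the tuple layout are illustrative choices, and ramping times between steps are ignored.

```python
def dan4_program():
    """Return the DAN4 thermal program as (step, temp_C, seconds) tuples."""
    program = [("initial denaturation", 94.0, 7 * 60)]
    # Nine touch-down cycles: annealing drops by 0.5 C per cycle.
    # Assumption: the first cycle anneals at exactly 70 C.
    for cycle in range(9):
        program += [("denaturation", 94.0, 45),
                    ("annealing", 70.0 - 0.5 * cycle, 30),
                    ("elongation", 72.0, 60)]
    # Twenty-six further cycles at a constant annealing temperature.
    for _ in range(26):
        program += [("denaturation", 94.0, 45),
                    ("annealing", 66.0, 30),
                    ("elongation", 72.0, 60)]
    program.append(("final extension", 72.0, 7 * 60))
    return program

program = dan4_program()
annealing = [temp for step, temp, _ in program if step == "annealing"]
hold_minutes = sum(seconds for _, _, seconds in program) / 60
print(annealing[:9])    # annealing temperatures of the touch-down phase
print(len(annealing))   # total number of cycles
print(hold_minutes)     # summed hold times, ramping excluded
```

Under this reading the DAN4 program runs for 35 cycles in total, the same number used for AGA1 and HSP150.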

The reaction mixture and amplification conditions utilised for the SED1 gene were as already described in [13]. The PCR products were analysed by electrophoresis on a 1.4% agarose gel in 1× TBE buffer. 25 μl of the PCR products was digested overnight with an excess of each enzyme in a final volume of 50 μl. The restriction fragments were analysed by electrophoresis on a 2.5% agarose gel in 1× TBE buffer. The gel images were visualised by means of ImageMaster VDS (Amersham-Pharmacia Biotech) and acquired with the Lyscap software (Amersham-Pharmacia Biotech).

2.6 Cluster analysis

The correlation matrix of the amplification profiles of each sample was obtained using the formula described by Upholt [25] and Nei and Li [26]. The cluster analysis was carried out with similarity estimates by using the unweighted pair-group method with arithmetic average cluster analysis (UPGMA) by means of the NTSYS-pc package, version 1.8.
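The similarity of Nei and Li [26] is commonly written as F = 2n_xy / (n_x + n_y), where n_x and n_y are the numbers of bands in two profiles and n_xy is the number of bands they share; this is the Dice coefficient. The Python sketch below computes it and then performs a small average-linkage (UPGMA) agglomeration on the distances 1 − F. It is only a stand-in for illustration, not the NTSYS-pc procedure itself, and the isolate names and band sizes are hypothetical.

```python
from itertools import combinations

def dice(bands_x, bands_y):
    """Nei and Li / Dice similarity: 2 * shared bands / (bands in x + bands in y)."""
    x, y = set(bands_x), set(bands_y)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))

def upgma(profiles):
    """Merge isolates by average Dice distance; return the merge history."""
    clusters = {name: (name,) for name in profiles}   # label -> member isolates
    dist = {frozenset(p): 1 - dice(profiles[p[0]], profiles[p[1]])
            for p in combinations(profiles, 2)}
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        new = f"({a},{b})"
        na, nb = len(clusters[a]), len(clusters[b])
        for c in clusters:
            if c not in (a, b):
                # Size-weighted mean equals the average over all member pairs.
                dist[frozenset((new, c))] = (
                    na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
                ) / (na + nb)
        merges.append((a, b, round(dist[frozenset((a, b))], 3)))
        clusters[new] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical amplicon sizes (bp) for four isolates.
profiles = {
    "iso1": {900, 1100},
    "iso2": {900, 1100},
    "iso3": {900, 1300},
    "iso4": {1500},
}
merges = upgma(profiles)
for a, b, d in merges:
    print(a, b, d)
```

Isolates with identical profiles merge at distance 0, while an isolate sharing no band with the others joins last at distance 1.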

3 Results and discussion

3.1 Selection of cell wall genes containing minisatellite-like sequences

In our previous work we have reported that the S. cerevisiae SED1 gene is characterised by abundant length polymorphisms, due to the expansion or contraction of two minisatellite-like sequences located within two distinct regions of the gene open reading frame (ORF) [13]. To our knowledge, this represented the first intentional description of polymorphic minisatellites within a S. cerevisiae cell wall gene, even though the presence of repeat sequences inside the ORFs of cell wall genes is very common and well-documented in this and other yeast species [17,19,20,27,28]. Considering that DNA regions containing minisatellites can present length polymorphisms due to the recombinogenic potential of the repeat sequences, we hypothesised that cell wall genes containing minisatellite-like sequences could be a sink of unexplored genetic variability that should be of use for the molecular characterisation of wine yeasts. To test this hypothesis, we focussed on a pool of cell wall genes containing repeat sequences, assessed whether the presence of such sequences was accompanied by gene length polymorphisms, and consequently evaluated if these genes represent preferential targets for PCR-based typing of S. cerevisiae strains.

To achieve this, we first selected the sequences of 18 S. cerevisiae genes coding for cell wall proteins by means of the on-line databases at http://www.yeastgenome.org and http://www.proteome.com. Subsequently, in order to identify those genes containing minisatellites, we scanned the sequences of the 18 ORFs by means of the Etandem and TandemRepeatFinder software, also available on-line, and observed that 15 of them contain repeat sequences of various lengths (data not shown).

We then proceeded with the selection of the putative polymorphic genes on the basis of three different criteria. First, as it is well known that the recombinogenic potential of a DNA region containing minisatellites is correlated with the degree of repeat conservation [29], we decided to focus on those genes whose minisatellites presented more than 80% identity between repeats. Second, having decided to investigate the presence of such length polymorphisms by means of PCR, we selected those genes containing minisatellites consisting of repeats longer than 20 bp, since smaller differences in length could have gone undetected on agarose gels. Third, we decided to focus on those genes coding for structural or otherwise non-enzymatic proteins rather than enzymes. The reasoning behind this was that length variations in genes coding for enzymes may impair their function, thus representing a selective disadvantage for the mutated allele. Conversely, it was plausible that length variations in genes coding for non-enzymatic proteins are more easily tolerated.
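The first two criteria can be expressed as a simple filter. The Python sketch below applies them to the repeat sets reported in Table 1 for AGA1, DAN4 and HSP150; the third criterion is represented by a set of gene names treated as non-enzymatic, which is consistent with the text but is entered by hand here. The function and variable names are illustrative only.

```python
# (gene, repeats per set, repeat length in bp, identity in %), from Table 1.
REPEAT_SETS = [
    ("AGA1", 22, 21, 81.6),
    ("AGA1", 5, 15, 78.7),
    ("DAN4", 27, 27, 77.5),
    ("DAN4", 7, 72, 98.8),
    ("DAN4", 12, 18, 65.7),
    ("DAN4", 2, 111, 81.5),
    ("DAN4", 2, 111, 81.1),
    ("HSP150", 2, 51, 88.2),
    ("HSP150", 5, 57, 91.6),
]

# Criterion 3, entered manually: genes regarded as coding for non-enzymatic proteins.
NON_ENZYMATIC = {"AGA1", "DAN4", "HSP150"}

def passes(gene, length_bp, identity_pct, min_identity=80.0, min_length=20):
    """True if a repeat set meets all three selection criteria."""
    return (gene in NON_ENZYMATIC
            and identity_pct > min_identity
            and length_bp > min_length)

candidates = [s for s in REPEAT_SETS if passes(s[0], s[2], s[3])]
for gene, n_repeats, length_bp, identity in candidates:
    print(gene, n_repeats, length_bp, identity)
```

Each of the three genes retains at least one repeat set after filtering, whereas the 15-bp, 18-bp and 27-bp sets are excluded by the identity threshold, the length threshold, or both.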

Ten of the genes satisfied all of these requirements; however, we reasoned that the analysis of a smaller number of targets would be sufficient to test our hypothesis. Thus, three of the 10 genes, namely AGA1, DAN4 and HSP150, were selected at random and subjected to PCR analyses in a population of wild S. cerevisiae in order to highlight their putative length polymorphisms. AGA1 codes for the anchoring subunit of a-agglutinin [30]; DAN4 is a member of the PAU proteins, the function of which is still unknown [31]; and HSP150 is a heat shock protein that is at least in part retained by the cell wall through β-1,3-glucan binding and/or disulphide bridges [32,33].

3.2 Designing the primer pairs specific for the selected molecular targets

Primer pairs specific for AGA1, DAN4 and HSP150 genes were designed on the corresponding sequences available on-line at http://www.yeastgenome.org. The objective was to design primer pairs able to amplify the portion of each target gene that we suspected to be polymorphic, due to the presence of minisatellites with the characteristics mentioned above.

For AGA1 we designed the primers AGA1f and AGA1r to anneal outside a set of repeats containing a 21-bp minisatellite characterised by more than 80% identity between repeats; the amplified portion of the gene measures 1198 bp in S288C (Fig. 1 and Table 1).

Figure 1.

Structures of the AGA1, DAN4 and HSP150 genes according to the corresponding S288C sequences available on-line (http://www.yeastgenome.org). Boxes represent the regions containing repeat units. Inside each box are indicated the length (bp) and the number of each repeat unit, respectively. Arrows indicate the primer sites of annealing.

Table 1. The minisatellite-like sequences contained in AGA1, DAN4 and HSP150

| Gene | Gene product | Sets of repeats (n) | Repeats per set (n) | Repeat length (bp) | Identity (%) | ORF coordinates | Consensus |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AGA1 | Cell wall component, anchorage subunit a-agglutinin | 2 | 22 | 21 | 81.6 | 488–949 | CATCTCCAAGTTCGACATCTA |
| | | | 5 | 15 | 78.7 | 1842–1916 | TTCTACAACATCTAT |
| DAN4 | Cell wall mannoprotein, unknown function | 5 | 27 | 27 | 77.5 | 371–1099 | CTACTTCTACAACTTCTACCACTTCTA |
| | | | 7 | 72 | 98.8 | 1102–1605 | CAAGTCACTTCATCCGCTGAACCTACTACTGTCAGTGAATTCACCTCTTCTGTTGAACCTACCAGGTCTAGT |
| | | | 12 | 18 | 65.7 | 1646–1861 | CTTCCAGTGAAATTACTT |
| | | | 2 | 111 | 81.5 | 2488–2709 | TTAACAACTACAGAAACTTCCACGGTCGAAACAACTATAACAACATGCCCTGGTGGTGTTTGCTGCACCCTGACTGTTCCAGTTACTACAATCACCAGCGAAGCCACTACC |
| | | | 2 | 111 | 81.1 | 2733–2954 | TAATGCTAAGGCGAACACATTAACAACTACAGAAACTTCCACGGTCGAAACAACTATAACAACATGCTCTGGTGGTGTTTGCTCGACCCTGACTGTTCCAGTTACTACAAT |
| HSP150 | Cell wall protein, partially secreted | 2 | 2 | 51 | 88.2 | 300–401 | TGCTGTCTCTCGTGATGGTCAAATTCAAGCTACCACCAAGACTACCTCTGC |
| | | | 5 | 57 | 91.6 | 481–765 | ACCGCTGCTGCTGTTTCTCAAATCGGTGATGGTCAAGTTCAAGCTACTACCAAGACT |

Note: The consensus and percentage of identity between repeats were obtained using the Etandem software.

For DAN4, which contains five sets of repeat units with variable lengths and percentages of identity (Fig. 1 and Table 1), we designed a primer pair which amplifies a portion of the gene ORF measuring 1270 bp in S288C and containing two sets of repeat sequences (Fig. 1 and Table 1). One of these, which consists of seven 72-bp repeats, was regarded as potentially polymorphic, due to the high percentage of identity between repeats. The other, whose repeats present less than 80% identity, was considered less likely to be polymorphic. However, it was included within the amplified fragment because it was not possible to design a primer specific for the portion of the gene located immediately upstream of the set of interest.

The primer pair HSP150f and HSP150r was designed to be specific for a 781-bp portion of the gene ORF containing two sets of repeat sequences (Fig. 1 and Table 1). The first includes two repeats of 51 bp and the second consists of five repeats of 57 bp. Both were considered potentially polymorphic on the basis of the criteria mentioned above.
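The rationale of placing primers on either side of a repeat array can be illustrated with a small in silico PCR sketch in Python. The primer sequences and the 57-bp consensus are those given in this work, but the template is a hypothetical toy allele (primer sites directly flanking perfect copies of the consensus, plus arbitrary end bases) and not the real HSP150 ORF; the function names are likewise illustrative. The sketch only handles exact primer matches.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (upper-case A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def amplicon_length(template, fwd, rev):
    """Amplicon size for exact primer matches, or None if a primer is absent."""
    start = template.find(fwd)
    if start < 0:
        return None
    end = template.find(revcomp(rev), start + len(fwd))
    if end < 0:
        return None
    return end + len(rev) - start

HSP150F = "CACTTTGACTCCAACAGCCACTTACA"
HSP150R = "TACCGGACAAACATTGGTAGAAGACA"
# 57-bp consensus of the second HSP150 repeat set (Table 1).
UNIT = "ACCGCTGCTGCTGTTTCTCAAATCGGTGATGGTCAAGTTCAAGCTACTACCAAGACT"

def toy_allele(n_repeats):
    """Hypothetical template: primer sites flanking n perfect repeat copies."""
    return "GGATCC" + HSP150F + UNIT * n_repeats + revcomp(HSP150R) + "GAATTC"

for n in (3, 5, 7):
    print(n, amplicon_length(toy_allele(n), HSP150F, HSP150R))
```

In this toy model each gain or loss of one repeat shifts the amplicon by 57 bp, which is the kind of difference the 20-bp minimum repeat length was chosen to keep visible on agarose gels.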

3.3 AGA1, DAN4, HSP150 and SED1 polymorphisms within the population analysed

The primer pairs specific for AGA1, DAN4 and HSP150 were used under highly stringent PCR conditions to amplify the total DNA from the two reference strains, S288C and CBS1171, and a population of wild isolates of S. cerevisiae. In addition, the primer pair specific for the SED1 gene was used on the same strains to assess a role for this gene in the identification of wine yeasts.

As expected, the four sets of primer pairs highlighted the existence of extensive length polymorphisms in each of the cell wall genes analysed (Fig. 2).

Figure 2.

Cell wall gene polymorphisms. PCR primers designed on the AGA1 (A), DAN4 (B), HSP150 (C) and SED1 (D) sequences were used to amplify the corresponding genes of the two reference strains, CBS1171 and S288C, and of the 29 grape isolates of S. cerevisiae. Lanes A–I: PCR profiles observed within the population analysed with each of the primer pairs. λ: 100 bp ladder (Amersham-Biosciences).

In the case of AGA1, each of the two reference strains produced a single amplification product, of different length in the two strains, whereas the 29 S. cerevisiae isolates yielded six different PCR profiles arising from the combination of five amplicons of variable lengths (Fig. 2).

DAN4 also proved to be highly polymorphic in the population analysed. Similar to what was observed for AGA1, each of the two reference strains produced a single amplicon. This was of the expected size for S288C. The 29 isolates yielded nine different amplification profiles due to the combination of seven amplicons of variable lengths (Fig. 2). The analysis of the sequences of the S288C amplicons obtained with primers AGA1f and AGA1r, and DAN4f and DAN4r, indicated complete identity with the corresponding sequences deposited on-line at http://www.yeastgenome.org, thus confirming the specificity of the primer pairs for the selected targets (data not shown). Moreover, restriction analyses of the AGA1 and DAN4 amplification products, carried out with the enzymes HhaI and RsaI, respectively, confirmed that the observed length variations in both genes were due to differences in length in the portion of the amplicons containing the minisatellites (data not shown).

The use of the HSP150 primer pair yielded seven PCR profiles due to the combination of five amplicons of different lengths (Fig. 2). Surprisingly, the amplicon obtained on the S288C template measured approximately 1100 bp and was therefore longer than expected (781 bp, on the basis of the HSP150 sequence deposited on-line). The amplicons of CBS1171 and of the 29 wild S. cerevisiae were also longer than the expected sizes. As previously done for AGA1 and DAN4, and in order to exclude the occurrence of non-specific amplification despite the stringent PCR conditions, we sequenced the amplicon obtained with the HSP150 primer pair on the S288C template. The BLAST analysis of the sequence obtained confirmed the specificity of the primers for the target gene (GenBank accession number AY321583). However, the sequence obtained showed some differences from the one deposited on-line. According to our results, the structure of HSP150 differs from the one described in Fig. 1 by the presence of two extra repeats. These are indicated as (57+15) bp since their sequence overlaps, for the first 57 bp, with that of the 57-bp repeat units located downstream (Fig. 3). Although explaining the discrepancies between the sequence deposited on-line and the one we obtained is beyond the scope of this work, it is plausible that the presence of repeat sequences within this gene might have caused a misinterpretation of the gene structure during the sequencing of the S. cerevisiae genome.

Figure 3.

Updating of the structure of the HSP150 gene according to the sequence obtained in the present study that is deposited in GenBank (AY321583). Filled boxes represent the regions containing repeat units. Boxes with the same fill patterns indicate identical sequences. Above each box are indicated the length (bp) of each repeat and number of repeat units, respectively.

The restriction analyses of the amplicons of the wild S. cerevisiae and CBS1171, carried out with RsaI, confirmed that the length polymorphisms observed are due to expansion or contraction of the region of the gene that contains the repeat sequences (data not shown).

The SED1f and SED1r primers utilised on the total DNA of the 29 isolates of S. cerevisiae produced six different PCR profiles. Four of these are characterised by the presence of a single amplicon, possibly due to the presence of a single length variant in individuals homozygous for this gene. The remaining two PCR profiles present two and three amplicons of different lengths, as expected in individuals heterozygous for this gene (Fig. 2). The comparison of the length variants observed in this population of isolates with those sequenced in our previous work [13] indicated that three of them, here referred to as A, B and C, have the same lengths and structures as alleles Sed1-2, Sed1-4 and Sed1-5 [13]. Conversely, the amplicons referred to in PCR profiles D, E and F are different from those already characterised, thus indicating the existence of previously undetected variability for the SED1 gene in populations of wild S. cerevisiae.

On the basis of these results, it is plausible that minisatellite-like sequences in AGA1, DAN4, HSP150 and SED1 work as recombinogenic hot spots, and that the observed gene length polymorphisms are a consequence of the different molecular mechanisms proposed for minisatellite array expansion and contraction [16,29].

3.4 Intraspecies discriminatory power of a PCR fingerprinting system based on amplification with the AGA1, DAN4, HSP150 and SED1 primer pairs

We then decided to assess the possibility of using AGA1, DAN4, HSP150 and SED1 as preferential targets for the molecular characterisation of wine strains. To do that we first evaluated the repeatability of the amplification profiles obtained by using the primers designed in the present work on DNA templates prepared according to different methods. Thus, each set of primers was utilised both on purified DNA [24] and heat-treated whole cells [3]. The results showed complete identity of the PCR profiles obtained by using either method, thus indicating that the primer pairs specific for AGA1, DAN4, HSP150 and SED1 genes yield unambiguous and repeatable amplification profiles, whatever the method utilised for the preparation of template DNA (Fig. 4).

Figure 4.

Repeatability of the amplification profiles produced by PCR primers designed on genes coding for cell wall proteins. Primer pairs specific for AGA1, DAN4, HSP150 and SED1 were utilised both on purified DNA (D) and heat-treated whole cells (C) of the two reference strains CBS1171 (1) and S288C (2). λ: Mass ruler MBI Fermentas.

Subsequently, we compared the level of resolution obtained with the four primer pairs with that obtained with primers specific for delta sequences, already described as those having the highest discriminatory power within the species S. cerevisiae [5]. Delta primers produced 10 different PCR profiles, thus making a distinction between the wild S. cerevisiae strains. However, three of the isolates never produced amplicons with these primers (data not shown). This outcome, which is not unusual in S. cerevisiae strains [3,11], may be due to several factors, including the distance between annealing sites, and represents a limit of this PCR-based system.

The data resulting from the amplification profiles were converted into binary matrices and cluster analysis was carried out. According to the resulting dendrograms, the use of the primer pairs specific for AGA1, DAN4, HSP150 and SED1 highlighted the existence of 23 clusters within the population of 29 isolates (Fig. 5). Conversely, the dendrogram resulting from PCR analysis with delta primers, and regarding the 26 isolates which yielded amplicons, consisted of 10 clusters (Fig. 5). Thus, the four primer pairs designed in the present work showed a good level of resolution for the separation of S. cerevisiae strains.
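The conversion of amplification profiles into binary matrices, and the gain in resolution obtained by combining several markers, can be sketched in Python as follows. The isolate names and band sizes are hypothetical, and grouping isolates by identical binary rows is a deliberate simplification: it counts distinct profile types rather than reproducing the dendrogram clusters reported here.

```python
def binary_matrix(profiles):
    """Rows = isolates, columns = every band size observed; 1 = band present."""
    bands = sorted({b for p in profiles.values() for b in p})
    rows = {name: [int(b in p) for b in bands] for name, p in profiles.items()}
    return bands, rows

def combine(*marker_rows):
    """Concatenate the binary rows of several markers for each isolate."""
    names = marker_rows[0].keys()
    return {n: sum((rows[n] for rows in marker_rows), []) for n in names}

def distinct_types(rows):
    """Group isolates that share an identical binary row."""
    groups = {}
    for name, row in rows.items():
        groups.setdefault(tuple(row), []).append(name)
    return list(groups.values())

# Hypothetical amplicon sizes (bp) for two markers in three isolates.
aga1 = {"iso1": {1200}, "iso2": {1200}, "iso3": {1200, 1400}}
dan4 = {"iso1": {1270}, "iso2": {1500}, "iso3": {1270}}

_, aga1_rows = binary_matrix(aga1)
_, dan4_rows = binary_matrix(dan4)
print(len(distinct_types(aga1_rows)))                      # one marker alone
print(len(distinct_types(combine(aga1_rows, dan4_rows))))  # both markers together
```

In this toy data set neither marker alone separates all three isolates, but the combined rows do, which mirrors the reasoning behind pooling the four primer pairs.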

Figure 5.

Dendrograms showing the clustering of the S. cerevisiae isolates under study. The PCR profiles obtained with the four primer pairs (A) and with the delta primers (B) were converted into binary matrices which were elaborated according to the formula described by Upholt [25] and Nei and Li [26]. Cluster analysis was carried out with similarity estimates by using the unweighted pair group method with arithmetic average cluster analysis (UPGMA) by means of the NTSYS-pc package, version 1.8 (Dice coefficient).

In conclusion, as we hypothesised, S. cerevisiae genes encoding cell wall proteins and containing minisatellites are highly polymorphic in length and appear to be a sink of unexplored genetic variability. At present, the functional consequences of the observed length polymorphisms on cell wall structure remain to be elucidated. However, the results obtained in the present work highlight a role for AGA1, DAN4, HSP150 and SED1 as preferential targets for PCR-based typing of S. cerevisiae wine strains. In fact, the primer pairs specific for these genes yield stable and unambiguous amplification profiles, show a level of resolution that allows a clear discrimination between different strains and represent either an alternative or a complement to primers targeted to other molecular markers.

Moreover, in the present work, we presented new criteria for the identification of further molecular markers suitable for PCR-based typing of wild S. cerevisiae and traced a new way towards the characterisation of wine strains.
