Telomeres are the DNA–protein complexes at the ends of linear chromosomes. In most eukaryotes, the DNA component consists of tandem copies of a short T/G-rich sequence (typically 5–8 bp) and is maintained by the enzyme telomerase (Wellinger and Sen, 1997; Weilbaecher and Lundblad, 1999). Many proteins are part of the telomere complex and function to regulate the access of telomerase to the end and to preserve the protective capping function of telomeres (Shore, 1998; McEachern et al., 2000).
Subtelomeric regions, or telomere-associated sequences, are the regions immediately adjacent to the telomeric repeat arrays (Pryde et al., 1997; Wellinger and Sen, 1997). Subtelomeric regions are often similar among multiple chromosome ends, and many aspects of the structure and function of these regions have been subjects of study in a variety of organisms, including repetitive elements, degenerate telomeric repeats, gene families, gene silencing, and retrotransposons that show affinity for chromosome ends.
The subtelomeric regions of Saccharomyces cerevisiae are the best characterized of any organism to date (Pryde et al., 1997). Two highly repetitive subtelomeric elements are known. Internal to the ∼350 bp of telomeric repeats, Y′ elements are commonly found at about half of all chromosome ends. These elements are highly conserved, typically 5.4–6.7 kb in size, and are present in 1–4 tandem copies per telomere (Louis, 1995). Internal to the Y′ element is the X element, a mosaic element that is more variable than the Y′ element. It is composed of a 475 bp consensus core and a variable outer portion composed of families of tandem degenerate repeats (Louis et al., 1994). Internal to the X element, there is often a proximal subtelomeric domain of varying similarity among the ends, extending for 2–30 kb in strain S288C. A number of different proximal domains are present, each shared between two or three telomeres.
A function that commonly has been attributed to subtelomeric regions is providing a genomic area of greater plasticity. An example of this is seen in the subtelomeric surface antigen genes of the malaria parasite P. falciparum. Recombination between these genes can occur at a high rate during meiosis to create new var genes, which enable efficient evasion of hosts' immune systems (Freitas-Junior et al., 2000). Subtelomeric gene families involved in host immune evasion are also found in a number of other pathogens (reviewed in Barry et al. (2003)). Even in non-pathogens such as S. cerevisiae, large and variable subtelomeric gene families are important for growth under particular conditions (Carlson et al., 1985; Naumov et al., 1994, 1995; Ness and Aigle, 1995; Codon et al., 1998).
Subtelomeric regions have also been shown to be prone to changes when telomere function is compromised. Recombination in subtelomeric regions can be greatly increased in K. lactis cells that lack a functional telomerase (McEachern and Blackburn, 1996). Similarly, deletions that include subtelomeric sequences are common in S. cerevisiae when telomerase is absent and telomeres become critically short (Hackett et al., 2001). In humans, mutations in subtelomeric regions are associated with certain diseases, including facioscapulohumeral muscular dystrophy (van Overveld et al., 2000) and α-thalassaemia (Flint et al., 1996).
In this study, we have cloned and characterized subtelomeric regions from 10/12 K. lactis telomeres. Our results show that K. lactis subtelomeric sequences are unique but share many similarities to those from other organisms.
Materials and methods
All cloning of subtelomeric sequences was done from K. lactis strain 7B520 (ura3 his2 trp1) (Wray et al., 1987). The comparative hybridization study was performed using GG1935 (ura3 ade2), which is isogenic with CBS2359, and strains from the Spanish Type Culture Collection (CECT) arbitrarily chosen from Belloch et al. (2002). See Table 1 for list.
A 1.5 kb BamHI–BglII fragment, from pMH3-Tel NoST, containing S. cerevisiae URA3 and 11.5 K. lactis telomeric repeats (Natarajan et al., 2003) was cloned into the BamHI site of low-copy plasmid, pWSK29 (Wang and Kushner, 1991). The resulting plasmid, pKN2 (Figure 1A), had low copy number in order to improve the stability of large, repetitive sequences in E. coli. The insert was positioned in a way that facilitated cloning of subtelomeric sequence once the plasmid integrated into a K. lactis telomere.
Uncut pKN2 plasmid was introduced into K. lactis using a protocol similar to established ones for S. cerevisiae. A culture of cells in 1 ml YPD was grown overnight, then an additional 1 ml YPD was added and the cells were grown for 1 h more. This culture was pelleted and washed with 500 µl cold water. After pelleting and resuspending in 80 µl water, 10 µl each of 10× TE buffer (pH 7.5) and 1 M lithium acetate (pH 7.5) were added to the cells. This suspension was then incubated at 30 °C for 1 h, with addition of 2.5 µl 1 M DTT after 45 min; 400 µl cold water was added before pelleting the cells at 4 °C. The cells were then washed twice in 250 µl cold water and once in 100 µl cold 1 M sorbitol, with pelleting at 4 °C between washes. The cells were suspended in 30 µl cold 1 M sorbitol, ∼1 µg pKN2 DNA was added, and the cells were electroporated at 1500 V (using an Eppendorf electroporator 2510). Then 500 µl 1 M sorbitol was added, and the cell suspension was plated onto synthetic defined (SD) plates lacking uracil and containing 1 M sorbitol (Guthrie and Fink, 1991).
Screening and hybridizations
Two single digests (EcoRI and PvuII) of yeast genomic DNA were performed to determine the location of the integrated plasmid. Digests were run on 0.8% agarose gels and stained with ethidium bromide. Southern blotting was performed using Hybond N+ membrane (Amersham Pharmacia, Piscataway, NJ). Hybridizations were carried out in Na2HPO4 and SDS (Church and Gilbert, 1984). Washes were carried out in 200 mM Na2HPO4 and 2% SDS for the oligo probe and in 100 mM Na2HPO4 and 2% SDS for all other probes. Membranes were stripped for 1 h with 0.4 N NaOH between hybridizations.
The K. lactis telomeric oligo Klac 1–25 (ACGGATTTGATTAGGTATGTGGTGT) was used in all telomeric hybridizations. Temperatures of hybridizations and washes were 50–55 °C. The vector-specific probe KN2-220 was a purified 220 bp PvuII–SacI fragment from pBLUESCRIPT SK−®. This probe was hybridized at 55 °C.
The 0.7 kb EcoRI–XbaI subtelomeric probe was isolated from K. lactis telomeric clone KL11B (McEachern and Blackburn, 1994) and hybridized at 55 °C. The specific-subtelomeric probe fragments were gel-purified from the rescued low-copy plasmids after digestion with appropriate enzymes, then ligated into prepared pBLUESCRIPT SK−® (digested and treated with shrimp alkaline phosphatase; Boehringer Mannheim, Indianapolis, IN) to enable efficient purification for probes. These ligations were introduced into XL-1 supercompetent E. coli (Stratagene, La Jolla, CA), and resulting colonies were screened for interruption of the β-galactosidase gene of the vector (blue–white screening) and checked for inserts. Inserts of positive subclones were gel-purified for probes. These were hybridized at 65 °C, then 50 °C or 55 °C if no hybridization was observed at 65 °C. All probes except the telomeric probe were prepared using PRIMEIT II labelling kit, as recommended by the manufacturer (Stratagene, La Jolla, CA).
Genomic DNA from each type of telomeric insert was digested with each restriction enzyme with a site in the polylinker of pKN2 upstream of the URA3–tel insert (see Figure 1). Hybridization data from these digests were used to determine the enzyme that would allow the cloning of the largest amount of subtelomeric sequence. Genomic DNA from each K. lactis clone with integrated pKN2 was digested with the enzyme chosen for that clone. After inactivation of the restriction enzyme, approximately 0.5–1 µg DNA from the digest was ligated overnight at 15 °C.
A portion of each ligation was introduced into XL-1 supercompetent Escherichia coli (Stratagene, La Jolla, CA). Plasmids from transformants were verified by size of insert and by hybridization to the K. lactis EcoRI–XbaI subtelomeric probe described above.
Sequencing and analysis
Automated fluorescent dideoxy sequencing was performed by the Molecular Genetics Instrumentation Facility at the University of Georgia. Primers of 18–20 nucleotides were used in primer walking to obtain all sequence reported here.
Sequences were assembled and open reading frames were identified using Sequencher™ 4.1 (Gene Codes Corporation, Ann Arbor, MI). Sequences were subjected to BLASTN and BLASTX analysis (http://www.ncbi.nlm.nih.gov:80/BLAST/) (Altschul et al., 1997) against databases to discover homologues or other features. Multiple alignments were performed using the ClustalW tool provided by the European Bioinformatics Institute (EBI: http://www.ebi.ac.uk) and adjusted by sight and use of BLAST2 pairwise alignments.
Cloning K. lactis subtelomeric regions
To clone K. lactis subtelomeric sequences, pKN2, a plasmid containing an S. cerevisiae URA3 gene and 287 bp of K. lactis telomeric sequence, was first constructed. Apart from the telomeric repeats, this plasmid contained only 3 bp of K. lactis subtelomeric DNA, so that it would integrate via recombination within the telomere, enabling cloning of adjacent subtelomeric sequences. This technique was similar to that used previously in cloning subtelomeric regions of S. cerevisiae (Louis and Borts, 1995). Previous work established that arrays of telomeric repeats are present only at chromosome ends in K. lactis (including strain 7B520) (McEachern and Blackburn, 1994; and unpublished data). This ensured that all plasmid inserts into telomeric repeat arrays would be at telomeres.
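As a quick consistency check, the 287 bp of telomeric sequence in pKN2 can be converted into a repeat count using the 25 bp length of the wild-type K. lactis telomeric repeat. The sketch below is only this arithmetic; the variable names are illustrative.

```python
# Length of the wild-type K. lactis telomeric repeat, in bp (as stated in the text).
REPEAT_LEN_BP = 25
# Length of the telomeric sequence carried by pKN2, in bp (as stated in the text).
TELOMERIC_INSERT_BP = 287

# Number of repeat units represented by the insert.
repeats = TELOMERIC_INSERT_BP / REPEAT_LEN_BP
print(round(repeats, 1))
```

The result agrees with the 11.5 telomeric repeats quoted for the BamHI–BglII fragment in Materials and methods.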
Seven independent transformations of pKN2 into K. lactis 7B520 were performed in an attempt to recover at least one integration event at each of the 12 telomeres present on the six chromosomes of the haploid genome. A total of 41 Ura+ transformants of pKN2 into 7B520 were recovered and analysed by Southern blotting. Digestion with EcoRI and PvuII was used to determine whether transformants contained the expected inserts. The PvuII digest was especially helpful, since it separated the 12 telomeric restriction fragments into 10 different-sized bands, the greatest number seen with any of a large number of different restriction enzymes (data not shown). Transformants were not identified by chromosome of integration, so the names for the chromosome ends used in this study were derived from the PvuII pattern, with P1 referring to the smallest PvuII telomeric fragment and P12 referring to the largest. This and other work (data not shown) has demonstrated that the P2 and P3 telomeric fragments migrate together as a doublet, and the P8 and P9 telomeres run together as a second doublet. The EcoRI digests confirmed the results of the PvuII digest and resolved the P2/P3 doublet in the PvuII pattern. The P8 and P9 telomeres migrated as a doublet in both digests.
Southern blots of pKN2 transformants were analysed by hybridizing telomere and vector-specific probes to genomic DNA of transformants that was digested separately with EcoRI and with PvuII. Our results showed that all of the transformants had the expected restriction patterns for inserts at native telomeres. Figure 1B shows Southern blots of PvuII-digested DNA hybridized to a telomeric probe or to a 220 bp vector-specific probe (referred to as KN2-220; Figure 1C) as well as EcoRI-digested DNA hybridized to the telomeric probe (Figure 1D). The position within the plasmid vector of the sequence used as the KN2-220 probe is indicated by the thick black box in Figure 1A. This region of the vector is the only vector-derived sequence that remained attached to subtelomeric sequences after digestion with PvuII, and therefore detected the particular subtelomeric DNA fragment from a transformant that had integrated pKN2 (see PS and ES fragments in Figure 1A). A shift of one band in the telomeric restriction pattern results from the integration of the plasmid into a telomere. PvuII digestion was expected to cut off most of the vector sequence, leaving about 0.2 kb connected to the chromosome end, whereas an EcoRI digest leaves 5 kb of vector sequence attached to the subtelomeric fragment. This accounts for the different shifts seen in the two digests. These shifted bands are also often expected to be sharper than true telomeric bands, since they are internal fragments as opposed to telomeric ends that are heterogeneous in length due to differences in length of telomeric DNA. This was clearly seen for smaller fragments such as the P2 and P3 inserts (Figure 1B). Another novel fragment expected in transformants is the terminal fragment (PT and ET in PvuII and EcoRI digests, respectively; Figure 1A) that contains the telomere and URA3. This is a 1.5 kb band in both the PvuII and the EcoRI digests. 
In the EcoRI digest (Figure 1D) the band is obscured by other telomeric fragments, but it was seen clearly with a URA3 probe (data not shown).
Hybridization with telomeric and KN2-220 probes also revealed the presence of tandem copies of pKN2 sequence at telomeres in most of the transformants. In a PvuII digest, extra copies of pKN2 were detected as a band at 2 kb (PI band marked with arrow in Figure 1B, C) and in EcoRI digests as a band at ∼6 kb (EI band in Figure 1D). This second band was present in 38 of the 41 transformants. Figure 1A shows the expected structure of a dimeric integration of pKN2. We have not determined whether the tandem inserts represent dimers or higher order multimers. However, the varied relative intensity of PI and PS bands between different transformants (Figure 1C) suggests that multimeric inserts were likely present.
Characterization of the transformants showed that 25 of them had pKN2 inserted in one of five telomeres (P2, P4, P7, P10 or P11) that belong to the group of six telomeres with the smallest EcoRI fragments. None of these transformants showed plasmid integration at the telomeric band P12, the sixth telomere in this group. Inserts into other telomeres included one at P6, nine at P3, two at P5, and two at P1. Only two were at the P8–9 doublet (labelled as P9 throughout this work). Sequence data from the telomeric end to the EcoRI site as well as restriction data using 12 restriction enzymes suggested that these two transformants represented either the same end or two indistinguishable ends.
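The distribution of integration sites can be checked against the total of 41 transformants with a simple tally. The sketch below uses only the counts given in the text; the dictionary keys are descriptive labels chosen for illustration, and the 25 inserts in the small EcoRI group are kept as one entry because they are not broken down by telomere here.

```python
# Reported numbers of pKN2 integrations per telomere or telomere group.
integrations = {
    "P2/P4/P7/P10/P11 (small EcoRI group)": 25,
    "P12": 0,
    "P6": 1,
    "P3": 9,
    "P5": 2,
    "P1": 2,
    "P8-9 doublet (labelled P9)": 2,
}

# The per-site counts should add up to the number of Ura+ transformants analysed.
total_transformants = sum(integrations.values())

# Sites at which no integration was recovered.
not_recovered = [site for site, n in integrations.items() if n == 0]

print(total_transformants)
print(not_recovered)
```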
To clone subtelomeric sequence from the transformants, 10 restriction enzymes were considered for use. Each of these enzymes cuts once within the polylinker of an integrated pKN2 and permits excision of subtelomeric sequence still attached to the plasmid vector (Figure 1A). Such fragments could then be circularized by ligation and introduced into E. coli. The choice of enzyme(s) to use for cloning of each end was based on Southern data (data not shown); transformants were digested by all 10 enzymes and hybridized to the KN2-220 probe. For each digest, the anticipated size of the extra band that resulted from the additional copy of the vector was calculated. The sizes differed for each digest due to restriction sites within URA3. This left the other band resulting from each digest to be used to estimate the amount of sequence that would be cloned using that digest. Enzymes used to recover clones in this study can be seen in Figure 2, the leftmost restriction site indicated on the clone.
The tandem integration of pKN2 in most transformants was of special concern in the plasmid rescue procedure. During plasmid rescue many circles may be created, but only those derived from the vector plasmid pKN2 would enable E. coli to grow in presence of ampicillin. However, with two or more copies of vector sequence at a given end, a functional plasmid may be either one that contains K. lactis subtelomeric sequence (e.g. the ES fragment shown in Figure 1A) or simply a recircularized copy of the original pKN2 plasmid (e.g. the EI fragment shown in Figure 1A), both of which would yield white colonies in a blue–white β-galactosidase screen. Although producing an undesirable background of unwanted E. coli transformants, this situation was of some use, since the tandem copy could serve as a positive internal control for the plasmid rescue conditions. Recovery of pKN2 without successful recovery of the expected plasmid with a particular piece of subtelomeric DNA from a K. lactis transformant could indicate that the latter plasmid might not be recoverable.
A number of plasmids from each end were examined (from six plasmids for P6, the fewest, to 70 plasmids for P1, the most) before recovering one from each transformant that appeared correct, based on size of insert and hybridization to a subtelomeric probe (EcoRI–XbaI fragment) that is known to hybridize to 11/12 telomeres in 7B520. EcoRI was used successfully as one recovery enzyme for all ends tested. The P1 clone contained an extra EcoRI fragment that later proved not to be subtelomeric in origin. EcoRI telomere fragments were in the range 0.7–3 kb; these moderate sizes probably contributed to the consistent recovery of plasmids that contained subtelomeric sequence. An average of 35% of plasmids recovered after cleavage of genomic DNA with EcoRI were found to contain the expected subtelomeric DNA (with the remainder usually being recircularized pKN2).
For 9/10 telomeres for which we had obtained pKN2 integrations, we were also able to recover substantially larger subtelomeric pieces using other restriction enzymes. Six telomeres have been recovered as fragments of 9–18 kb, and the other four had 0.7–4 kb of subtelomeric sequence cloned. For P2, a plasmid larger than the EcoRI plasmid was not recovered, even after screening 193 plasmids using the enzymes BamHI, ClaI and XmaI. In all cases described in this work, recovered subtelomeric sequences were found to match the sizes of the corresponding native terminal restriction fragments. This indicates that gross rearrangements in the cloned sequences are unlikely.
Structure of K. lactis subtelomeric regions
The four 7B520 telomeres from which we recovered 0.7–4 kb subtelomeric sequence were completely sequenced. The six telomeres recovered with much larger regions of subtelomeric DNA were in most cases just partially sequenced. However, at least 1.2 kb of the sequence immediately adjacent to the telomeric repeats was sequenced from each. One of these large telomeric clones, P10, was sequenced over its entire 9.8 kb length. The regions sequenced in our subtelomeric clones are shown in Figure 2.
Previous hybridization experiments have shown that all but one telomere from K. lactis 7B520 (the P1 telomere) share close sequence similarity in the region ∼0.6 kb immediately adjacent to the telomeric repeats (McEachern and Blackburn, 1996). Our results here confirm and extend those results. The similarity shared between subtelomeric regions of at least two telomeres is shown in Figure 2 as lightly shaded boxes. At least 8 of the 12 telomeres share >85% similarity, extending at least 1.5 kb from the telomeric repeats. This number may be as high as 11 of 12 as the P8 and P12 telomeres were not cloned, and the P2 telomere was recovered only as a fragment with ∼0.7 kb of subtelomeric DNA.
A number of the K. lactis telomeres share subtelomeric similarity that extends multiple kilobases from the telomeres. Of five telomeres with at least some sequence information from positions 8–18 kb internal from the telomeric repeats, three have highly homologous sequence that is shared with at least one of the others and extends to the internal end of the sequenced region. The telomere clone with the most available sequence, P10, contains little or no unique DNA sequence in its 9.8 kb length.
Five of the 10 cloned telomeres, P1, P3, P4, P5 and P11, had appreciable lengths of sequence that are not shared with other cloned telomeres of this study. These sequences are shown as plain black lines in Figure 2. In each case, this sequence extends to the internal border of the sequenced region and represents potentially unique sequence. The P1 subtelomeric region is unique in that it lacks strong similarity to subtelomeric regions from any of the other telomeres.
Subtelomeric sequences located within 1.5–2 kb from the base of the telomeric repeat arrays were distinguishable from more internal subtelomeric sequence in a number of respects. As noted above, they are homologous among a greater number of chromosome ends than are more internal subtelomeric regions. They also show little or no sign of containing genes, as ORFs tend to be short and have no significant homology to regions from other organisms. An especially notable characteristic of these immediate subtelomeric regions is a very pronounced strand bias in the content of pyrimidines and purines. As shown in Figure 3, the strand running 5′–3′ toward the telomere averages 70% purine over a 1.6 kb region next to the telomeric repeats. The strand bias is nearly 80% purine in the region within 400 bp from the telomere and gradually declines at more internal positions. The unique P1 telomere exhibits a very similar pattern of strand bias to other telomeres despite having very limited identity to them (Figure 3). Among the four types of subtelomeric regions we identified that were non-homologous to each other beyond ∼1.6–2 kb from the telomere (represented by the P1, P3, P5 and P10 telomeres), regions more internal than this invariably lack appreciable strand bias in purine content (Figure 3).
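The strand bias described above can be expressed as the fraction of purines (A and G) on one strand, computed in windows moving away from the telomere. The following is a minimal sketch of such a calculation; the window size, step size and toy sequence are illustrative assumptions and are not the parameters or data used for Figure 3.

```python
def purine_fraction(seq: str) -> float:
    """Return the fraction of A and G among the unambiguous bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "AG" for b in bases) / len(bases)


def windowed_purine(seq: str, window: int = 100, step: int = 50):
    """Return (start, purine fraction) for each full window along seq."""
    return [
        (start, purine_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical): a purine-only stretch followed by a
# pyrimidine-only stretch, to show the profile dropping with distance.
toy = "AG" * 50 + "CT" * 50
profile = windowed_purine(toy, window=100, step=50)
for start, frac in profile:
    print(start, frac)
```

For a real subtelomere, the input would be the strand running 5′–3′ toward the telomere, read starting from the base of the telomeric repeats.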
The sequences of the immediate subtelomeric sequence for each of the 10 telomeres we have cloned are shown aligned with one another in Figure 4. The sequences are aligned with the telomeric repeats at the start (top left of Figure 4). Numbering of the subtelomeric sequences begins with position ‘1’ as the first base pair that does not match the sequence of the telomeric repeat. The sequences shown in Figure 4 are the pyrimidine-rich strands. Sequence features present in this region include three families of short, irregular tandem repeats. A series of 8–9 bp repeats is centred ∼70 bp from the telomere, and a second series of similar repeats is present ∼180 bp from the telomere. Several copies of an unrelated 14 bp direct repeat are present near position 580. Each of these repeat families showed the same pronounced purine/pyrimidine strand bias that characterizes the region in general. Most of the larger gaps that occur in the alignment of the subtelomeric regions (excluding P1) are due to variations in the numbers of these repeats. The P1 telomere lacks the short tandem repeats present at other telomeres. Interestingly, although the P1 subtelomere is highly dissimilar from all other cloned subtelomeric regions, it does appear to have weak similarity in places to them. The region in P1 at position 280–704 (Figure 4) has 59% identity to the corresponding regions of other subtelomeres. The DNA in P1 located between this region and the telomere also has small patches of identity. Sequence that is ∼0.4 kb internal to the end of the similarity between P1 and the other subtelomeres was hybridized to 7B520 genomic DNA. The resulting Southern blot (Figure 5A) supports the claim that the P1 sequence reported in this study is unique within the 7B520 genome. A single fragment hybridized to the probe in each of eight different restriction digests that were examined.
We have named the ∼1.5 kb subtelomeric region adjacent to most of the K. lactis telomeres the R element, due to its purine-rich nature on the telomeric G-strand. The sequence data in this study and unpublished data suggest that this element is present in conserved form in 11 of 12 telomeres of 7B520. Hybridization using ∼600 bp of the R element as a probe to 7B520 DNA cut with multiple restriction enzymes is seen in Figure 5B. The P1 telomere can be said to have a smaller, highly degenerate version of the R element.
In addition to recovering subtelomeric sequences from different telomeres, our cloning experiments also recovered the adjacent telomeric repeats. Because of our cloning strategy, it was expected that some of the recovered telomeric repeats would be derived from pKN2. Those nearest to the subtelomeric sequence, however, are likely to be the native repeats, as they were present in the cell prior to pKN2 integration. Among 92 native telomeric repeats cloned from 10 subtelomeres, all but one had the wild-type 25 bp sequence. The single variant repeat observed was a 26 bp repeat resulting from a single bp insertion within three consecutive T residues (TTTTGATTAGGTATGTGGTGTACGGA, in which the extra base is one of the four leading T residues). This type of rare variant repeat has been noted previously in K. lactis (Tzfati et al., 2000).
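The relationship between the 26 bp variant and the wild-type repeat can be checked directly from the two sequences given in this paper. The sketch below takes the Klac 1–25 oligo as one permutation of the 25 bp wild-type repeat (an assumption based on its length and use as the telomeric probe), removes each base of the variant in turn, and asks whether the result is a circular permutation of the wild-type unit. The function names are illustrative.

```python
# Telomeric oligo Klac 1-25, taken here as one permutation of the 25 bp repeat.
WILD_TYPE = "ACGGATTTGATTAGGTATGTGGTGT"
# The 26 bp variant repeat reported in the text.
VARIANT = "TTTTGATTAGGTATGTGGTGTACGGA"


def rotations(unit: str) -> set:
    """All circular permutations of a repeat unit."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}


def explaining_insertions(variant: str, wild_type: str) -> list:
    """Positions (0-based) and bases whose removal restores a wild-type repeat."""
    wt_rotations = rotations(wild_type)
    return [
        (i, variant[i])
        for i in range(len(variant))
        if variant[:i] + variant[i + 1:] in wt_rotations
    ]


hits = explaining_insertions(VARIANT, WILD_TYPE)
print(hits)
```

Only removal of a T from the leading run of four restores the wild-type unit, consistent with a single T insertion into the run of three T residues. Because the four positions are equivalent, the exact inserted base cannot be assigned from sequence alone.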
Subtelomeric sequence from four of the cloned ends shows sequence match to the K. lactis retrotransposon Tkl1.1, a Ty1-like element (Neuveglise et al., 2002). Figure 6 shows a schematic of the alignment of Tkl1.1 with the sequences present in the different subtelomeric regions. P3 has the greatest amount of similarity: an ∼2 kb stretch of 96% identity with the pol region of the element. On either end of this alignment, the similarity ends, indicating that the P3 Tkl1.1 sequence is an incomplete element. A probe made from the P3 subtelomere (mapped in Figure 2), located completely within this pol similarity is shown hybridized to 7B520 DNA cut with each of multiple restriction enzymes in Figure 5C. Most of these enzymes produced only one band that hybridized to the PstI Tkl1.1 fragment probe. As the nearest sites for these enzymes reside outside of Tkl1.1 sequence, on one or both sides of the element, inserts at positions other than the specific site mapped in the P3 telomere were expected to produce additional bands on the gel. The single enzyme (NsiI) that produced two bands has a site within the fragment used as a probe. Our results therefore indicate that the full length of the sequence used as probe is present in the 7B520 genome only at the P3 subtelomeric location we have identified.
The other telomeres with Tkl1.1 sequence only have LTR sequences. P10 and P9 have an approximately 400 bp stretch of 86% identity to the 3′ LTR of the element, P4 has a 142 bp stretch at 95% identity to the 5′ LTR, and P5 has a 182 bp stretch of 78% identity to the 3′ LTR. The two LTRs of the originally reported Tkl1.1 element are only 93% identical to each other and can therefore be distinguished from one another. A hybridization probe made from a ∼1 kb EcoRI fragment from P7 (Figure 2), which includes the Tkl1.1 LTR, produced multiple bands when hybridized to 7B520 DNA (Figure 5D). These bands represent at least three subtelomeric sequences and indicate that either the LTR or the other subtelomeric sequence present in the probe is present at a number of other sites in the genome.
ORFs found in K. lactis subtelomeric regions
There are multiple regions in the K. lactis subtelomeric regions we have sequenced that have homology to genes from other organisms. Figure 2 shows positions of ORFs of > 100 amino acids and of regions with homology to ORFs from S. cerevisiae or from other organisms. A ∼2 kb region on P1 has amino acid similarity with three related genes, FLO1, FLO5 and FLO9, a subtelomeric family of S. cerevisiae genes involved in cell adhesion known as flocculation. This similarity appears to be chiefly to the region containing copies of flocculin repeat A, a 45 amino acid repeat. There are one, eight and 18 copies of this repeat in the three S. cerevisiae genes (Teunissen and Steensma, 1995), and at least nine are found in the K. lactis FLO gene. Because our sequence ends within the flocculin repeat region, the total number of repeats is not known. Of the nine we have sequenced, six complete repeats are 44–47 amino acids in length, and three are partial repeats of 16, 24 and 27 amino acids. The K. lactis flocculin repeats are generally about 50% identical to their S. cerevisiae counterparts. Downstream of the flocculin A repeats in the S. cerevisiae FLO1, 5 and 9 genes, there are several serine-rich repeats (Teunissen and Steensma, 1995). In this region of the K. lactis gene, there is also serine-rich sequence similarity that continues through the end of the alignment.
There are regions on P10 and P7 nearly 10 kb from the telomere that share amino acid similarity with the putative arsenite transport protein Arr3p. These P10 and P7 sequences are 99% identical with each other on the nucleotide level. On the protein level there is 67% identity and 81% similarity between the S. cerevisiae and K. lactis copies and 63% identity at the nucleotide level. The available K. lactis sequence ends at a point in the alignment 20 amino acids from the 3′ end of the S. cerevisiae gene.
There are four K. lactis subtelomeric sequences that appear to encode proteins with similarity to S. cerevisiae Mch2p, a protein of unknown function (Figure 2). These homologies are present on P6, P7, P9 and P10. Three of the four are in regions of incomplete sequence. Pairwise comparisons between each translated sequence and the S. cerevisiae Mch2p show ∼45% similarity. The four K. lactis sequences show 98–99% identity with each other at the nucleotide level.
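Percentage identity values such as those quoted above depend on how aligned positions are counted. As an illustration only, the sketch below computes identity over the gap-free columns of a pairwise alignment; this convention is an assumption and may differ from the definitions used by BLAST or ClustalW, and the example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical positions among columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: five gap-free columns, four of them identical.
example = percent_identity("ACGT-A", "ACCTTA")
print(example)
```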
BLASTX analysis of P11 internal sequence showed a 55 amino acid alignment (50% identity/69% similarity) containing a 27 amino acid protein kinase C-terminal domain (InterPro Accession No. IPR000961E). This domain is present in 12 yeast cAMP-dependent kinases, such as Tpk1p, and in kinases of other organisms including humans. The similarity in P11 is close to an end of available sequence.
BLASTX analysis also showed that five of the K. lactis subtelomeric sequences have regions with similarity to hypothetical oxidoreductase proteins from several organisms. These hypothetical proteins are believed to be oxidoreductases, based on their similarity to members of a dioxygenase superfamily (Aravind and Koonin, 2001). The highest score was assigned to the alignment with hypothetical protein 15E6.100 from Neurospora crassa. Two of the five K. lactis ORFs (in the P4 and P7 sequences) have internal 250 bp deletions (shown by ** in Figure 2).
There are four other open reading frames throughout the recovered sequence that would yield peptides of at least 100 amino acids (Figure 2). None of these have significant alignments to any known sequences when analysed with BLASTX. The ORF within the R element in P7 is not present as an ORF of > 100 amino acids in the other homologous regions, due to base pair changes that generate stop codons.
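The 100 amino acid threshold used for ORFs can be illustrated with a simple six-frame scan. This is a minimal sketch and not the procedure implemented in Sequencher: it assumes that an ORF runs from the first ATG in a frame to the next in-frame stop codon, and the test sequences are artificial.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100) -> list:
    """Return (strand, frame, start, end, length_aa) for ATG-to-stop ORFs.

    Coordinates are 0-based on the scanned strand; end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length_aa = (i - start) // 3
                    if length_aa >= min_aa:
                        orfs.append((strand, frame, start, i + 3, length_aa))
                    start = None
    return orfs


# Artificial ORF: ATG followed by 120 alanine codons and a stop codon.
intact = "ATG" + "GCT" * 120 + "TAA"
# Same length, but with a premature stop codon after 50 alanine codons.
interrupted = "ATG" + "GCT" * 50 + "TAA" + "GCT" * 69 + "TAA"

print(find_orfs(intact))
print(find_orfs(interrupted))
```

The second example mirrors the situation described for the R element: a base change that creates a stop codon removes the region from the list of ORFs of more than 100 amino acids, even though the surrounding sequence is otherwise unchanged.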
Probing other Kluyveromyces strains for homology to subtelomeric sequences from 7B520
A selection of strains has been used to assess whether subtelomeric sequences from K. lactis 7B520 identified in this study are present in other Kluyveromyces strains and species. The strains from the Spanish Type Culture Collection (CECT) were chosen from closely related taxa and K. lactis strains discussed in Belloch et al. (2002), a study of genetic variation within the K. marxianus group of yeasts. The positions of the DNA fragment probes made from 7B520 DNA are indicated in Figure 2. The P1 probe, 1012 bp located 1.2–2.2 kb from the telomere, hybridized to all but one of the K. lactis strains, to one of the K. marxianus strains, and to the K. wickerhamii strain (Figure 7A). EcoRI-digested DNA from five of the six K. lactis strains showed hybridization to a ∼3.5 kb band identical in size to the P1 EcoRI fragment of 7B520. The remaining K. lactis strain showed no hybridization to this probe. Additionally, one of two K. marxianus strains showed hybridization to the same ∼3.5 kb band, as well as to two additional bands above 10 kb. The hybridization of the P1 probe to genomic DNA from at least some K. marxianus strains is consistent with Génolevures BLAST data; the P1 sequence shows homology to three telomeric K. marxianus RSTs (data not shown). The 932 bp PstI fragment probe from P3, located about 3 kb from the telomere, hybridized to a single band in all K. lactis strains and to three bands in the K. dobzhanskii strain (Figure 7B).
A telomere-adjacent probe (R1-6 EcoRI–XbaI) from the R element was also used in examining these strains (Figure 7C). This hybridization showed a series of diffuse bands recognizable as telomeric bands in each K. lactis strain examined except K. lactis var. drosophilarum 10390, which showed a series of faint and generally larger bands. K. dobzhanskii showed several bands that hybridized well to the probe that were different in size to the K. lactis bands. The K. wickerhamii strain and both K. marxianus strains showed no hybridizing bands. Two DNA fragments from P10 (Figure 2) were also used as hybridization probes to the various strains. These hybridized to the same or very similar pattern of at least five bands of 1 to ∼8 kb in 7B520 and in all K. lactis strains except K. lactis var. drosophilarum 10390, which showed no hybridization (data not shown).
The overall picture from the hybridizations of 7B520 subtelomeric probes to other strains is consistent with the 5.8S rRNA phylogeny study previously performed using 39 CECT strains including the eight used in this study (Belloch et al., 2002). K. dobzhanskii appears to be the most closely related to K. lactis in both studies and K. wickerhamii, used as an outgroup previously, the most distantly related.
The K. lactis subtelomeric regions presented in this study contain several features that are shared with subtelomeric regions of other organisms. One such characteristic is the presence of two general zones of homology that are shared between multiple ends (Pryde et al., 1997). Distal domains are generally shorter, can lack functional genes, and are present at a large percentage of telomeres. Proximal domains of homology are usually longer, contain genes, and are present at fewer ends. Multiple unrelated proximal domains can be present within a given organism. An example of two zones of homology in humans is seen in hybridization analysis using the 4q subtelomeric sequence. A 15 kb region just proximal to the 4q telomere is shared with 17 other ends, and sequence internal to this region (15–60 kb from the end) was only shared with four other ends (Flint et al., 1997). The distal subdomain contains a high density of ESTs and short sequences that match other ends in a patchwork fashion (Der-Sarkissian et al., 2002). In S. cerevisiae a related picture is seen on a smaller scale. The distal domain of homology includes the X and Y′ elements. The X element is present in some form at all S. cerevisiae ends and consists of a core sequence of 472 bp and up to several hundred bp of families of variable repeats (Louis et al., 1994). The Y′ element, often present at about half of telomeres, consists of a 5–6 kb sequence that is sometimes present in more than one copy per telomere (Pryde and Louis, 1997). The larger blocks of similarity that make up the proximal domains of S. cerevisiae subtelomeres exist in 10 distinct sequence groups with common blocks of genes shared between two to three telomeres (http://www.le.ac.uk/genetics/ejl12/research/telostruc/ClustersSmall.html; Ed Louis, personal communication). Twenty-three of 32 telomeric ends are part of these proximal domain groups in S. cerevisiae S288C.
K. lactis 7B520 can also be described as having two zones of similarity among its telomeric ends. The R element, which lacks detectable homology to the X and Y′ elements of S. cerevisiae, comprises the distal domain and is present on 11/12 telomeres in conserved form and on the 12th in degenerate form. In this study, proximal domains of similarity, consisting minimally of a putative oxidoreductase gene, have been observed to be present on at least five telomeres. The limited sequencing done in this study prevents us from determining the full extent of proximal domains of similarity at telomeres in 7B520. However, it does appear likely that K. lactis will differ from S. cerevisiae in some respects. K. lactis is unlikely to have as many different types of proximal similarity domains as S. cerevisiae. Thus far, not more than a single type of proximal domain can be said with certainty to exist in the 7B520 strain. K. lactis also has a higher percentage of telomeres with the same proximal domain (at least 5/12 telomeres) than is seen in the sequenced S. cerevisiae strain.
Another feature in the subtelomeric sequences of many organisms is the presence of smaller repetitive elements. A variety of such repeats has been observed in different organisms. A well-studied example in S. cerevisiae is the family of subtelomeric repeats (STRs) found in the distal part of the X element. Four types of STR elements have been characterized. They vary in length (35–150 bp), but all are made up of short degenerate repeats. The K. lactis R element lacks homology to the STRs of S. cerevisiae, but it does have its own small repeats in the distal ∼650 bp of the R element. These are similar in size (∼6–14 bp) to those within S. cerevisiae STRs and occupy a similar relative position in the distal part of their respective element. Human subtelomeric regions have been found to be an area where minisatellite sequences are clustered (Royle et al., 1988; Wells et al., 1989; Vergnaud et al., 1991). Due to the observations of major clusters at several chromosome ends and of the suspected transposition of sequence containing a minisatellite from a terminal region to an internal region (Wong et al., 1990), it has been suggested that this region even serves as an origin for these minisatellites, which are then spread to other areas of the genome (Amarger et al., 1998).
Telomere-like repeats have also been observed in subtelomeric DNA. In human cells, the human telomeric repeat, TTAGGG, and degenerate copies of it are often located at the border between the two zones of similarity described earlier (Flint et al., 1997). S. cerevisiae subtelomeres can contain two types of telomere-like sequences: tracts of sequence matching its own variable telomeric repeats and short tracts of the vertebrate telomeric repeat (TTAGGG). TTAGGG sequences, both single and multiple copy, have been shown to be binding sites, in vivo (Koering et al., 2000) and in vitro (Brigati et al., 1993), for the essential yeast protein Tbf1p and can function as a telomere in S. cerevisiae (Henning et al., 1998; Alexander and Zakian, 2003). In K. lactis, no tandem blocks of TTAGGG repeats were found in any of the sequenced subtelomeric DNA (data not shown). Single TTAGGG sequences were found at the same location (∼900 bp from the telomere) in eight of the 10 ends reported. At one telomere (P1) a TTAGGG sequence was at a different position and in the opposite orientation to the others, and in the cloned P2 subtelomere, the sequence was not present.
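A search of this kind, for single versus tandem TTAGGG copies in either orientation, can be sketched as follows. This is only an illustration of the idea under simple assumptions (exact matches only, no degenerate copies); it is not the procedure used in this study, and the function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq, motif="TTAGGG"):
    """Return sorted (position, strand) pairs for exact motif matches.

    Positions are 0-based on the strand that was supplied. A "-" hit
    means the reverse complement of the motif occurs at that position,
    i.e. the motif lies on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        pos = seq.find(pattern)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(pattern, pos + 1)
    return sorted(hits)


def tandem_blocks(hits, motif_len=6):
    """Group same-strand hits that are exactly motif_len apart.

    Returns (start, strand, copies) for every run of two or more
    head-to-tail copies; isolated single copies are not reported.
    """
    blocks = []
    for strand in "+-":
        positions = [p for p, s in hits if s == strand]
        run_start, copies = None, 0
        for p in positions:
            if run_start is not None and p == run_start + copies * motif_len:
                copies += 1
            else:
                if copies >= 2:
                    blocks.append((run_start, strand, copies))
                run_start, copies = p, 1
        if copies >= 2:
            blocks.append((run_start, strand, copies))
    return blocks


# Toy sequence: one single copy, one opposite-orientation copy, one tandem pair.
toy = "ACGT" + "TTAGGG" + "ACAC" + "CCCTAA" + "GG" + "TTAGGG" * 2
hits = find_motif(toy)
print(hits)
print(tandem_blocks(hits))
```

Reporting the strand alongside the position makes it easy to spot a copy in the opposite orientation, as described for P1, and an empty result from `tandem_blocks` corresponds to the absence of tandem blocks.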
Strand bias in base composition is another characteristic of S. cerevisiae subtelomeric sequences that we have observed in K. lactis. In S. cerevisiae, on the strand reading 5′–3′ toward the telomere, X elements have a high G/C ratio and a somewhat low A/T ratio, and Y′ elements have relatively high G/C and A/T ratios. It has been proposed that this strand asymmetry in base composition is a result of replication-associated mutation (Gierlik et al., 2000). DNA replication has been implicated in the bias that is observed in the base composition in many bacteria, as all bacterial genomes with well-defined replication origins and termini are asymmetric in the base composition of leading and lagging strands (Lobry, 1996; Grigoriev, 1998). Strand bias in replicative mutagenesis has recently been shown to be associated with yeast origins as well (Pavlov et al., 2002). Eukaryotic telomeres may be similar to bacterial chromosomes, since in both cases leading and lagging replication strands are set due to their positions relative to the ARS. In support of this, the strand bias that exists in S. cerevisiae subtelomeric sequences is diminished internal to positions of ARS elements (Gierlik et al., 2000). This suggests that the position marking the beginning of strand bias in base composition of the R element and the degenerate P1 subtelomeric sequence of K. lactis may indicate the position of an origin of replication. Alternatively, or in addition, the strand bias in base composition of K. lactis subtelomeric sequences might be due to selection for some advantageous property, one possibility of which is discussed below.
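The G/C and A/T ratios referred to here are simple counts taken on one strand, and computing them in windows along a sequence shows where a bias begins. The sketch below is a minimal illustration; the window and step sizes are arbitrary assumptions, not values used in this study, and the function names are hypothetical.

```python
from collections import Counter


def base_ratios(seq):
    """Return (G/C, A/T) count ratios for the strand as written 5' to 3'.

    A ratio is None when its denominator base is absent from the sequence.
    """
    counts = Counter(seq.upper())
    gc = counts["G"] / counts["C"] if counts["C"] else None
    at = counts["A"] / counts["T"] if counts["T"] else None
    return gc, at


def windowed_ratios(seq, window=200, step=100):
    """Return (start, G/C, A/T) for each full window along the sequence.

    window and step are illustrative defaults only.
    """
    return [
        (start, *base_ratios(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy example: a G-rich window followed by a C-rich window.
toy = "GGGCAATT" + "GCCCATTT"
for start, gc, at in windowed_ratios(toy, window=8, step=8):
    print(start, gc, at)
```

A ratio near 1 indicates no bias on that strand, and the window at which the ratio departs from 1 marks the approximate start of the biased region. The sequence should be supplied in the orientation reading 5′–3′ toward the telomere so that values are comparable with those quoted for X and Y′ elements.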
Another feature of K. lactis subtelomeric sequences that has been seen in other organisms is the presence of transposable elements (Zou et al., 1995; Flint et al., 1996; Mefford et al., 2001). An extreme example of the association between telomeric ends and transposons is the D. melanogaster chromosome end, which is composed of, and maintained by, active retrotransposons instead of DNA synthesis by telomerase. S. cerevisiae has a retrotransposon associated with its ends; Ty5 is an element that is typically found near the X element. Active copies of this element have been shown to have a preference for insertion at this region of the genome (Zou et al., 1996). Fragments of a characterized K. lactis LTR-retrotransposon (Tkl1.1) are found near at least four telomeres of 7B520. Tkl1.1 is an incomplete Ty1-like element that has a truncated gag element (Neuveglise et al., 2002). LTR fragments from this element and from a truncated version (Tkl1.2) are believed to be present in the genome in at least 30 copies, based on random sequence tags from the Génolevures project (Neuveglise et al., 2002). Our comparisons of Génolevures BLAST data of a K. lactis telomere search and a Tkl1.1 LTR search showed that at least four of the 30+ copies in strain CLIB 210 are subtelomerically located (data not shown). Three of these sequences appear to correspond to inserts reported in this study.
Subtelomeric regions have been described as the most plastic of the genome (Pryde et al., 1997). Manifestations of this include a lack of sequence conservation between species, allelic variability within species, and the flexible nature of many of the resident genes. Genes located in subtelomeric regions often fit the definition of contingency genes, genes that have high mutation rates and can help the host organism adapt to a changed environment (Moxon et al., 1994; Barry et al., 2003). The subtelomeric surface antigen genes of P. falciparum, T. brucei (Borst and Ulbert, 2001) and other parasites are well-known examples of eukaryotic contingency genes (Barry et al., 2003). In human cells, variation is prominent among the subtelomeric members of the olfactory receptor (OR) gene family and it has been suggested that this region of the human genome serves as a ‘nursery’ for new olfactory receptor proteins (Mefford et al., 2001). Some gene families near telomeres in S. cerevisiae are clear candidates for being contingency genes (Barry et al., 2003). Yeast used in brewing, baking and wine-making have undergone strong selection for centuries and exhibit substantial variation in subtelomeric genes (Codon et al., 1998; Dequin, 2001). The SUC and RTM genes, involved in sucrose utilization and resistance to molasses, respectively (Carlson et al., 1985; Ness and Aigle, 1995), vary in copy number and telomere of location between strains, and are completely missing in some strains (Naumov et al., 1996; Denayrolles et al., 1997). Similarly, the subtelomeric MAL and MEL gene families, involved in utilization of other sugars, vary in number between strains (Naumov et al., 1994, 1995).
S. cerevisiae and K. lactis share some of the genes found in their subtelomeric DNA: FLO, ARR3 and MCH2 family genes are present there in both yeasts. The FLO and ARR3 genes have characteristics suggesting they could be contingency genes. Both genes are expected to participate in processes, flocculation and arsenite resistance, respectively, that would affect the cell's response to its environment. Flocculation in S. cerevisiae is known to involve FLO gene family members and appears to be a variable and often unstable trait (Dequin, 2001; Verstrepen et al., 2003). The function of MCH2 is unclear. It is homologous to monocarboxylic acid transporters but does not appear to play that role in S. cerevisiae (Makuc et al., 2001).
Subtelomeric gene families likely arise as a result of recombination between chromosome ends. New forms of the P. falciparum subtelomeric var gene copies have been observed resulting from increased ectopic recombination (Freitas-Junior et al., 2000). In S. cerevisiae, a constitutive mutant form of the subtelomeric MAL63 gene was derived via gene conversion (Wang and Needleman, 1996). Also, subtelomeric recombination clearly occurs between Y′ elements in wild-type strains (Louis and Haber, 1990). This probably contributes to the homogenization of sequence within these elements. In K. lactis, homogenization of subtelomeric sequence also has occurred, given the extensive similarity that exists between the ends. The P1 telomere of 7B520, with its poor sequence match to the other ends, may be an example of a sequence that has escaped the homogenization and diverged over time to become a quite distinct sequence.
Recombinational telomere maintenance provides an extreme example of how recombination can rapidly alter regions in and around telomeres. This process has been observed in the absence of telomerase activity in human cell lines (Bryan et al., 1997; Dunham et al., 2000) as well as in S. cerevisiae (Lundblad and Blackburn, 1993; Teng and Zakian, 1999) and K. lactis yeast mutants lacking telomerase (McEachern and Blackburn, 1996). The mechanisms involved in maintaining telomeres through recombination have not been fully resolved, but they are believed to include homologous recombination as a means of spreading telomeric sequences to other ends in the cell. In S. cerevisiae, Type 1 survivors show greatly amplified arrays of subtelomeric Y′ elements. In Type 2 survivors of S. cerevisiae and all K. lactis survivors examined, the amplification seen is that of telomeric sequence (McEachern and Blackburn, 1996; Teng and Zakian, 1999). In K. lactis, the multiple lengthened telomeres within a given survivor appear to arise from a single source, suggesting widespread subtelomeric sequence homogenization (Natarajan, 2002). K. lactis survivors also often showed spreading of a subtelomeric marker gene from one telomere to most or all other telomeres in the cell (McEachern and Iyer, 2001).
As mentioned above, the extent of common sequence found at multiple ends of K. lactis 7B520 implies that there has been homogenization of these ends over time. The question arises of how a cell determines which chromosome end to use as a donor in subtelomeric gene conversion events. It seems likely that certain sequence features would perform better mechanistically as a recombination donor than others. In vitro assays show that RecA, a prokaryotic DNA strand exchange protein, and its yeast equivalent, Rad51p, preferentially bind GT-rich DNA and preferentially promote strand invasion with GT-rich DNA (Dixon and Kowalczykowski, 1991; Tracy et al., 1997). It is possible that sequence features of the K. lactis R element, such as its strand bias in base composition and its families of short repeats, might favour its ability to spread to other telomeres through recombinational processes.
The availability of K. lactis subtelomeric sequence will be useful in many future studies. Interesting questions that could be examined include the extent of subtelomeric variation in natural populations and mutants affecting telomere function, the evolutionary roles of subtelomeric genes, and possible functions of the R element. Also of interest will be comparisons between 7B520 subtelomeric sequences and those from the recently sequenced K. lactis CBS2359 genome (unpublished data, Monique Bolotin-Fukuhara).
We would like to thank Will McRae and Ashley Chadha for their technical contributions to this work. We also thank Sidney Kushner for the gift of the plasmid pWSK29 and for critical reading of this manuscript. Eladio Barrio provided the yeast strains from the Spanish Type Culture Collection. This work was funded by grants from the American Cancer Society (RPG GMC-99746) and from the National Institutes of Health (GM6164501).