The budding yeast rRNA and ribosome biosynthesis (RRB) regulon contains over 200 genes


  • Christopher H. Wade,

    1. Molecular Biology and Biochemistry Department, Wesleyan University, Middletown, CT 06459, USA
    Current affiliation:
    1. Social and Behavioral Research Branch, National Human Genome Research Institute, Bethesda, MD 20893-0249, USA.
  • Mark A. Umbarger,

    1. Molecular Biology and Biochemistry Department, Wesleyan University, Middletown, CT 06459, USA
    Current affiliation:
    1. Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
  • Michael A. McAlear

    Corresponding author
    1. Molecular Biology and Biochemistry Department, Wesleyan University, Middletown, CT 06459, USA


The ribosome biogenesis pathway constitutes one of the major metabolic obligations for a dividing yeast cell and it depends upon the activity of hundreds of gene products to produce the necessary rRNA and ribosomal protein components. Previously, we reported that a set of 65 S. cerevisiae genes that function in the rRNA biosynthesis pathway are transcriptionally co-regulated as cells pass through a variety of physiological transitions. By analysing multiple microarray-based transcriptional datasets, we have extended that study and now suggest that the ribosomal and rRNA biosynthesis regulon contains over 200 genes. This regulon is distinct from the set of ribosomal protein genes, and the promoters of the expanded RRB gene set are highly enriched for the PAC and RRPE motifs. Since a similar pattern of organization and gene regulation can be recognized in C. albicans, the RRB regulon appears to be a conserved, extensive, and metabolically important group of genes. Copyright © 2006 John Wiley & Sons, Ltd.


One of the ongoing challenges of the post-genomic era in yeast is the elucidation of the functions and activities of the large number of genes that constitute the yeast genome. The development of genome-wide technologies, such as DNA microarrays, has made it possible to simultaneously monitor the expression of thousands of genes at a time, and to correlate changes in gene expression patterns with specific metabolic processes or cellular responses (DeRisi et al., 1997; Spellman et al., 1998). In this way it is becoming possible to identify and dissect the regulatory circuits and pathways that serve to coordinate gene product activities, either as components of basic aspects of cellular metabolism, or as appropriate responses to specific stimuli including changing environmental conditions (Gasch et al., 2000).

One of the most significant and highly regulated aspects of cellular metabolism in yeast is the production of ribosomes (reviewed in Warner, 1999). Rapidly dividing cells need to produce some 2000 ribosomes every minute, each of which contains 79 different ribosomal proteins and four rRNAs. These two considerable biosynthetic pathways, one for the production of ribosomal proteins and one for the production of rRNAs, depend upon a significant fraction of the roughly 6000 genes in yeast. Most of the ribosomal protein (RP) genes are duplicated in the genome, and as a group they account for a large fraction of the total number of RNA polymerase II transcripts. Expression of the 138 RP genes is tightly regulated in response to varying cellular conditions, and many of the promoters of RP genes share binding sites for the Rap1p or Abf1p transcription factors (Planta, 1997; Warner, 1999). Other proteins that have been shown to function in the regulated expression of the RP genes include Sfp1p, Fhl1p, and Ifh1p (Jorgensen et al., 2002; Lee et al., 2002; Rudra et al., 2005).

The synthesis of the four rRNAs is also complex and requires the activity of hundreds of gene products (Kressler et al., 1999; Krogan et al., 2004). The rRNA genes alone represent a significant fraction of the entire genome, and exist as a tandem array of approximately 150 rDNA repeats located on chromosome XII. RNA polymerase I is used to transcribe a 35S primary rRNA transcript that is subsequently cleaved and modified to yield the final 25S, 18S and 5.8S mature rRNAs (Planta, 1997). This complex rRNA-processing pathway depends on the activities of a multitude of factors, including RNA exonucleases, endonucleases, base modification enzymes and RNA helicases (Tollervey, 1996). The 5S rRNA is transcribed from a distinct gene in the rDNA repeat by RNA polymerase III, and it too must be processed before it is incorporated into ribosomes.

Previously, we used genome-wide expression analysis to identify and characterize a set of 65 transcriptionally co-regulated genes, many of which function in the ribosome and rRNA biosynthesis (RRB) pathway (Wade et al., 2001). This regulon is distinct from the RP regulon, as it represents a parallel pathway required for the production of rRNAs. The genes in the RRB regulon were found to encode proteins that play varied roles in the rRNA biosynthesis pathway, ranging from transcription factors for the relevant RNA polymerases I and III to RNA helicases and RNA-modifying enzymes. Sequence analysis of the promoters of the RRB genes revealed that they were highly enriched for the PAC and RRPE motifs (Dequard-Chablat et al., 1991; Hughes et al., 2000), and there is strong evidence indicating that these motifs contribute to the regulation of RRB gene expression (Beer and Tavazoie, 2004; Fingerman et al., 2003; Wade et al., 2001). The PAC element was named for its occurrence within promoters of subunits of RNA polymerases A (I) and C (III), and RRPE stands for ribosomal RNA processing element (Hughes et al., 2000).

In this study we present evidence that the RRB regulon can be expanded to include over 200 genes in budding yeast. Like the original 65 gene RRB set, the members of the expanded RRB regulon are associated with functions related to ribosome and rRNA biosynthesis. The promoters from this larger set of genes are also enriched for the PAC and RRPE motifs, and the genes are transcriptionally co-regulated under a variety of physiological conditions. Furthermore, we present evidence that elements associated with the RRB regulon from S. cerevisiae can also be recognized in C. albicans, suggesting that this metabolically important regulatory circuit may be widely conserved at least across yeast species.

Materials and methods

Microarray analysis

Strain YMM13 (MATa leu2Δ1, trp1Δ63, ura3-52) was grown to a density of 3 × 10⁷ cells/ml in yeast extract peptone dextrose (YPD, pH 5.5) at 30 °C. Cells were arrested by the addition of α-factor (5 µg/ml) for 2 h. The cells were washed and then released into fresh media, after which samples were taken every 7 min, centrifuged and flash-frozen in dry ice and ethanol. A hot acidic phenol extraction was then used to prepare total RNA from all of the samples (Ausubel et al., 1995). The RNA samples were prepared and hybridized onto Yeast Genome S98 Genechips, as directed by the manufacturer (Affymetrix), and the data were processed as described in Spellman et al. (1998) and Wade et al. (2001).

The sort and rank method to create expression similarity scores

The normalized, background-corrected and log2-transformed (Cy5 : Cy3 ratio) cDNA expression profiles of 64 (one of the original 65 RRB genes has since been deemed not to be a true open reading frame) of the previously classified RRB genes were extracted from the following four microarray datasets:

  • 1. Release from α-factor arrest: a 17 point time-course with measurements every 7 min (Spellman et al., 1998).
  • 2. Sporulation: a 7 point time-course with measurements taken at 0, 0.5, 2, 5, 7, 9 and 11.5 h (Chu et al., 1998).
  • 3. Heat shock: a 9 point time-course for cells grown in a 30 °C medium and then switched to a 37 °C medium (Gasch et al., 2000).
  • 4. Osmotic stress: this dataset contained three subsets: (a) a three point, 30 min time-course for cells initially grown at 29 °C in YPD containing 1 M sorbitol and then transferred to YPD at 33 °C; (b) a three point, 30 min time-course for cells grown at 29 °C in 1 M sorbitol and then transferred to 33 °C YPD containing 1 M sorbitol; (c) a seven point, 2 h time-course for cells initially grown in YPD and then transferred to YPD containing 1 M sorbitol (Gasch et al., 2000).

The extracted RRB gene profiles were used to calculate an average RRB profile for each of the four datasets. The function, daisy, from the S library, was utilized to calculate the Euclidean distance between the average RRB expression profile and the profile of every other gene in the yeast genome (separately for each of the four datasets) (Ripley and Venables, 1997). The Euclidean distance meets the requirements of a metric, and is defined as follows:

$$d(i,j) = \sqrt{\sum_{k} \left(x_{ik} - x_{jk}\right)^{2}}$$

where $x_{ik}$ and $x_{jk}$ denote the expression levels of genes i and j at time point k.

As a result of our distance calculations, each gene had associated with it four distances (one for each dataset). Using these four distances, genes were then ranked in ascending order, with the least distant genes receiving the rank closest to one. Consequently, four individual ranks were generated: (a) the release from α-factor arrest rank; (b) the sporulation rank; (c) the heat shock rank; (d) the osmotic stress (sorbitol exposure) rank. The four distances associated with each gene were then summed in order to generate a total distance for each gene.
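The sort and rank procedure can be illustrated with a minimal Python sketch. It is not the S code used in the study: the function names, the toy gene names and the expression values are all hypothetical, and restricting the ranking to genes present in every dataset is an assumption made here for simplicity.

```python
import math

def average_profile(profiles):
    """Point-by-point mean of equally long expression profiles."""
    n_points = len(profiles[0])
    return [sum(p[k] for p in profiles) / len(profiles) for k in range(n_points)]

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def rank_by_total_distance(datasets, reference_genes):
    """Rank genes by their summed distance to the reference average.

    datasets: list of dicts mapping gene name -> list of log2 ratios.
    reference_genes: genes whose mean defines the reference profile.
    Only genes present in every dataset are ranked (assumption).
    """
    shared = set(datasets[0])
    for data in datasets[1:]:
        shared &= set(data)
    totals = {gene: 0.0 for gene in shared}
    for data in datasets:
        ref = average_profile([data[g] for g in reference_genes if g in data])
        for gene in shared:
            totals[gene] += euclidean(data[gene], ref)
    # Smallest total distance first, so position 1 is the most similar gene.
    return sorted(totals, key=totals.get)

# Toy data with made-up values: two small "datasets".
heat = {"REF1": [0.0, -1.0, -2.0], "REF2": [0.0, -1.2, -1.8],
        "GENE_A": [0.1, -1.0, -2.1], "GENE_B": [0.0, 1.0, 2.0]}
osmo = {"REF1": [0.0, -0.5], "REF2": [0.0, -0.7],
        "GENE_A": [0.0, -0.6], "GENE_B": [0.0, 0.9]}
ranking = rank_by_total_distance([heat, osmo], ["REF1", "REF2"])
print(ranking)
```

In this toy case GENE_A tracks the reference genes and ranks just behind them, whereas GENE_B, which moves in the opposite direction, falls to the bottom of the list.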

Functionally annotating genes

A file containing the gene ontology of each budding yeast open reading frame (as of 11 April 2005) was obtained via ftp from the Saccharomyces cerevisiae Genome Database (SGD). This file was then used to annotate each of the genes in our overall rank with the following information: (1) process; (2) function. Using this functional information, we then classified each gene into one of four different categories: (1) RRB-like; (2) non-RRB-like or other; (3) unknown; (4) ribosomal protein (RP). RRB genes are defined as those genes with any association in AmiGO to rRNA metabolism functions, transcription from the RNA polymerase I promoter, or transcription from the RNA polymerase III promoter. Additional genes were included because they met the AmiGO criteria for inclusion in the RRB functional category, based upon current SGD annotations. RP genes are defined as those genes annotated as a structural constituent of the ribosome on the AmiGO annotation site. Again, genes which met the same criteria in the current SGD annotation were included in the RP gene category. Non-RRB, or other, genes are those genes which had an annotated biological pathway or molecular function which was not included in the RRB or RP functional categories. Unknown genes were defined as genes which had no annotated biological pathway or molecular function.
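The four-way classification can be sketched as a small rule-based function. This is an illustration only: the term strings below paraphrase the criteria in the text rather than exact Gene Ontology term names, an empty annotation list is taken to mean "unknown", and checking the RP criterion before the RRB criterion is an assumption, since the text does not state a precedence.

```python
# Lower-case phrases paraphrasing the criteria (not exact GO term names).
RRB_TERMS = (
    "rrna metabolism",
    "transcription from rna polymerase i promoter",
    "transcription from rna polymerase iii promoter",
)
RP_TERM = "structural constituent of ribosome"

def classify(process_terms, function_terms):
    """Assign a gene to one of the four functional categories."""
    terms = [t.lower() for t in list(process_terms) + list(function_terms)]
    if not terms:
        return "unknown"                 # no annotated process or function
    if any(RP_TERM in t for t in terms):
        return "RP"                      # assumed to take precedence
    if any(key in t for t in terms for key in RRB_TERMS):
        return "RRB-like"
    return "non-RRB-like or other"

print(classify(["rRNA metabolism"], ["RNA helicase activity"]))
```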

Determining the statistical significance of enrichment

A hypergeometric distribution (Tavazoie et al., 1999) was used to calculate the probability of observing at least k genes from a functional category within the top n genes of our overall rank, which is given by:

$$P = 1 - \sum_{i=0}^{k-1} \frac{\binom{F}{i}\binom{G-F}{n-i}}{\binom{G}{n}}$$

where F is the total number of genes within the functional category and G is the total number of genes in the genome. Functional p values were calculated in Mathematica using its discrete distributions package.
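As a hedged illustration (the study used Mathematica, not Python), the enrichment probability can be computed exactly with the standard library. The function name is hypothetical. The sketch sums the upper tail directly, which is equivalent to one minus the probability of observing fewer than k genes.

```python
from fractions import Fraction
from math import comb

def enrichment_p_value(k, n, F, G):
    """Probability of at least k category genes among the top n genes.

    F: number of genes in the functional category.
    G: number of genes in the genome (here, in the ranked list).
    Exact integer arithmetic avoids underflow for very small p values.
    """
    upper = min(n, F)
    tail = sum(comb(F, i) * comb(G - F, n - i) for i in range(k, upper + 1))
    return Fraction(tail, comb(G, n))

# Small check: drawing 2 of 4 genes, both from a category of size 2.
print(enrichment_p_value(2, 2, 2, 4))  # 1/6
```

Calling `float(enrichment_p_value(94, 188, 225, 5442))` would apply the same calculation to the RRB counts reported in the Results.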

Identifying conserved motifs in promoters

A file containing the 300 bases of 5′ DNA sequence upstream of the start of translation (promoter sequences) of each S. cerevisiae gene was obtained via ftp from the Saccharomyces cerevisiae Genome Database. The program MEME (Multiple Expectation Maximization for Motif Elicitation) was then run, using a file which contained the sequences for the top 100 genes in our ranking (Bailey and Elkan, 1994). MEME is a program that performs local multiple sequence alignments via an expectation maximization algorithm, and hence identifies common sub-strings (motifs) present in a set of strings (sequences). The motifs that were generated were then used with the computer program MAST (Motif Alignment and Search Tool; Bailey and Gribskov, 1998) to determine the genome-wide motif distribution, using an E value of 50 as a cutoff.
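MEME and MAST work with probabilistic motif models. The simplified sketch below swaps in exact string matching to show the underlying idea of locating a motif on either strand of a promoter and recording its distance from the initiator ATG. The motif string and the promoter sequence are placeholders chosen for illustration, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif(promoter, motif):
    """Return sorted (distance, strand) pairs for exact motif matches.

    promoter: upstream sequence written 5' to 3', ending just before
    the ATG. distance is the number of bases from the first base of
    the match to the ATG; strand is '+' (coding) or '-'.
    """
    promoter = promoter.upper()
    patterns = {"+": motif.upper(), "-": reverse_complement(motif.upper())}
    hits = []
    for strand, pattern in patterns.items():
        if strand == "-" and pattern == patterns["+"]:
            continue  # palindromic motif: do not count each site twice
        start = promoter.find(pattern)
        while start != -1:
            hits.append((len(promoter) - start, strand))
            start = promoter.find(pattern, start + 1)
    return sorted(hits)

# Placeholder promoter with one site on each strand.
hits = find_motif("TTGATGAGCCAACTCATCGG", "GATGAG")
print(hits)  # [(8, '-'), (18, '+')]
```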

Identifying the RRB regulon in C. albicans

Comparison resource links from the Saccharomyces cerevisiae Genome Database's web pages for genes in our top 100 positions were used to identify Candida albicans homologues. The 500 bases of 5′ untranslated DNA sequence was obtained from the CandidaDB for these C. albicans homologues and MEME was then run on these sequences using the same parameters that were used on the promoter sequences of our top 100 genes.

For the expression analysis, we chose to analyse two sets of C. albicans microarray data: (1) heat shock: a four point time-course for heat-shocked C. albicans cells, with measurements taken at 0, 10, 30 and 60 min; (2) osmotic stress: a four point time-course for C. albicans cells exposed to osmotic stress, with measurements taken at 0, 10, 30 and 60 min (Enjalbert et al., 2003). These datasets were analysed using the PAM (partitioning about medoids) function provided by the S-plus statistical analysis package (Ripley and Venables, 1997). PAM implements a variant of a k-means clustering algorithm. Before separately clustering each of the C. albicans datasets, we chose to remove those genes whose expression profiles changed relatively little over time. A cut-off level was set in order to reduce the total number of genes in each of the datasets to approximately 3000 genes. Through the use of PAM, each of these filtered datasets was clustered into 12 clusters.
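The filtering and clustering steps can be sketched as follows. This is not the S-plus PAM routine: variance is assumed here as the measure of how much a profile changes, and the clustering is a simple alternating k-medoids loop rather than the full PAM swap search. Gene names, values and starting medoids are made up.

```python
import math
from statistics import pvariance

def most_variable(profiles, keep):
    """Keep the `keep` genes with the largest variance across time points."""
    ranked = sorted(profiles, key=lambda g: pvariance(profiles[g]), reverse=True)
    return {g: profiles[g] for g in ranked[:keep]}

def dist(x, y):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def k_medoids(profiles, medoids, max_iter=100):
    """Alternate between assigning genes to their nearest medoid and
    re-picking each medoid as the member with the smallest summed
    distance to the rest of its cluster."""
    medoids = list(medoids)
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for gene, prof in profiles.items():
            nearest = min(medoids, key=lambda m: dist(prof, profiles[m]))
            clusters[nearest].append(gene)
        new_medoids = [
            min(members, key=lambda c: sum(dist(profiles[c], profiles[o])
                                           for o in members))
            if members else m  # keep the old medoid if a cluster is empty
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters

profiles = {"g1": [0.0, 1.0, 2.0], "g2": [0.0, 1.1, 2.1],
            "g3": [0.0, -1.0, -2.0], "g4": [0.0, -0.9, -2.2],
            "flat": [0.0, 0.01, 0.0]}
filtered = most_variable(profiles, keep=4)   # drops the flat profile
clusters = k_medoids(filtered, medoids=["g1", "g3"])
print(sorted(sorted(members) for members in clusters.values()))
```

With these toy values the induced genes (g1, g2) and the repressed genes (g3, g4) end up in separate clusters.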


Expression of the RRB genes increases upon release from α-factor arrest

Our previous analysis of genome-wide gene expression profile datasets led to the identification of a group of 65 yeast genes which were transcriptionally co-regulated under a variety of conditions (Wade et al., 2001). Since a majority of the characterized genes within that set functioned in the ribosome and rRNA biosynthesis pathways, it was named the RRB regulon. In that study we performed relative, quantitative, reverse transcription polymerase chain reaction (RT-PCR) assays to monitor the mRNA expression profiles of five candidate genes from within the RRB regulon. These experiments confirmed that this small subset of RRB genes did in fact share similar expression profiles under a range of conditions, including release from α-factor arrest, heat shock and tunicamycin treatment. We did notice a difference, however, in the shape of the expression profiles between our experiments and the previously published microarray data for yeast cells following release from α-factor arrest into mitosis (Spellman et al., 1998). Whereas our results indicated that gene expression levels for the candidate RRB genes rose quickly following release from α-factor arrest, the published microarray profiles showed evidence of an initial steep decline (Wade et al., 2001).

To address this discrepancy and to expand our analysis, we performed our own microarray experiments on yeast cells as they progressed into mitosis. In duplicate experiments we recovered mRNA samples every 7 min as yeast cells recovered from α-factor arrest, and these samples were processed and hybridized to Affymetrix Yeast Genome S98 Genechips. When we investigated the expression patterns of the members of the entire RRB gene set, we found that they did share similar profiles, and they confirmed the rise in expression levels following release from arrest that was seen in our previous RT-PCR analysis (Figure 1A). While we are unable to tell why the previous microarray analysis indicated that RRB gene expression fell upon release from α-factor arrest (Spellman et al., 1998), one possibility is that the handling conditions involved in releasing the cells from the arrest in that study elicited a stress response. We suggest that it would be reasonable to expect that cells would increase RRB gene expression (thereby facilitating ribosome biogenesis) as they prepared to proceed through the cell cycle.

Figure 1.

The expression of the RRB gene set rises after release from α-factor arrest. Gene expression profiles were monitored following release from α-factor arrest by microarray analysis. (A) Average expression profiles from 64 RRB genes. The error bars represent one standard deviation. (B) Expression profiles of individual genes not included in the original RRB gene set

We also searched through our microarray data for genes that, although not part of our original RRB gene set, were known to function in the rRNA biosynthesis pathway. We found that many of the rRNA-related genes that we examined did in fact have expression patterns that matched the profiles of the RRB genes (Figure 1B), and these profiles were unlike a non-RRB gene control (CLN2). This result suggested to us that the RRB regulon may have more than 65 members.

The budding yeast RRB gene set includes over 200 genes

In order to better define the membership of the RRB regulon, we adopted a more integrative approach for identifying sets of co-regulated genes. We reasoned that the initial group of RRB genes could be used as a reference set with which to compare other gene expression profiles. Genes that exhibited expression profiles similar to those of the original RRB genes would be good candidates for being RRB genes themselves. Therefore, rather than using a k-means algorithm to generate discrete clusters of transcriptionally related gene sets, we used a Euclidean metric to rank genes based on their similarity to the average RRB expression profile derived from the 65 previously identified RRB genes. In order to limit the problems associated with missing or erroneous individual microarray data points, we analysed several gene expression series, including multiple data point transitions through mitosis (Spellman et al., 1998), meiosis (Chu et al., 1998), heat shock and osmotic stress (Gasch et al., 2000). For each experimental condition, we calculated the Euclidean distance between the expression profile of the average RRB gene and that of each gene in the yeast genome. These calculated distances can be interpreted as measures of dissimilarity, with the lowest scores corresponding to the genes whose expression profiles are most similar to the average of the original RRB gene set. Consistent with our previous analysis, the original RRB genes exhibited related expression profiles as the yeast cells progressed through these varied physiological transitions (Figure 2A–D).

Figure 2.

Members of the RRB gene set are transcriptionally co-regulated across multiple growth conditions. The average mRNA expression profiles for the original RRB gene set are depicted as yeast cells progress through (A) sporulation, (B) heat shock, (C) osmotic stress and (D) release from α-factor arrest. (E) Outline of the ranking strategy used to sort the gene expression profiles

We then summed the four dissimilarity values for each gene, and ranked the overall scores by increasing values (Figure 2E). This produced a list of 5442 genes, ranging from the gene whose expression profiles were most similar to the RRB average (YNL075W or IMP4, in position 1) to the gene with the most dissimilar expression profiles (YFL014W or HSP12, in position 5442). As would be expected, the great majority of the previously classified RRB genes were found amongst those genes at the top of the list. Interspersed with these RRB genes were other genes whose expression profiles most closely matched the RRB average. To assess whether or not these other genes may be candidates for membership in the RRB gene set, we investigated their respective annotations for activities or functions related to ribosome or rRNA biosynthesis. Genes were designated as being RRB candidates if they were associated with Gene Ontology (AmiGO) annotations, including rRNA metabolism functions or transcription from the RNA polymerase I or III promoters. Genes were classified as ribosomal protein (RP) genes if they encoded structural components of the ribosome. The other genes were classified as being non-RRB or unknown, depending on whether they had been ascribed a function that was outside of the RRB or RP pathways.

Unlike the original k-means clustering approach that defined the discrete, 65 member RRB gene set, this strategy yielded a linear ranking that was based on a continuum of gene expression similarity scores. While it is difficult to unambiguously assign a single cut-off point that divides the RRB genes from the other genes in the yeast genome, we could recognize informative patterns within this ranking of the gene expression profiles (Figure 3). For example, the top 188 positions in this list included 58 of the previously identified RRB genes and another 36 genes with RRB related annotations. It also included 69 non-RRB genes, 24 genes of unknown function, and only 1 RP gene. Since in total, only 225 of the 5442 genes had GO annotations consistent with RRB activities, the probability that 94 of them would be found by chance in the set of the top 188 genes is exceedingly low (p = 5.1 × 10⁻⁸¹). Similarly, the overall ranking position of the ribosomal protein (RP) genes was highly significant, with the majority of the genes (i.e. 103 RP genes) falling in the list between positions 189 and 500 (p = 1.6 × 10⁻⁸⁶). Thus, on the whole, this sorting strategy was able to demonstrate that gene expression profiles correlated with gene activities for at least the RRB and RP gene sets. Additionally, the ranking positions revealed that, as a group, the RP genes exhibited expression profiles that were closely related, yet distinct from those of the RRB genes.

Figure 3.

Ranking of the 5442 yeast genes based on their similarities to the average RRB gene expression. The functional categorization of the relevant gene products was determined for the sets of genes falling in positions 1–188, 189–500 and 501–5442. Included are annotations of rRNA biogenesis (RRB), ribosomal proteins (RP), functions other than rRNA biogenesis (Non-RRB) and genes with undetermined functions (Unknown)

The expanded RRB gene set is enriched for the PAC and RRPE promoter motifs

Having expanded the potential membership of the RRB gene set, we then sought to investigate whether this larger gene set was also enriched for the two promoter motifs that we identified in our original analysis. The MEME computer program (Bailey and Elkan, 1994) was used to scan the 300 bp sequences immediately upstream of the initiator codons for each of the top 100 genes from our ranking. Once again, consensus sequences matching the PAC and RRPE sequences were found as the two most frequently found motifs (Figure 4A). We then used the output from the MEME program to search for occurrences of the motif sequences within the entire yeast genome by using the program MAST (Bailey and Gribskov, 1998). Briefly, MAST identifies motif occurrences within the sequences of interest that meet a defined level of statistical significance.

Figure 4.

The RRB gene promoters are enriched for the PAC and RRPE motifs. (A) MEME was used to identify two highly enriched sequence motifs (PAC and RRPE) from within the 300 bp promoter sequences of the top 100 genes in the transcriptional ranking. (B) The frequency of occurrence of the two motifs was greatest for those genes at the top of the transcriptional ranking

We used the Position-Specific Probability Matrix (PSPM) of PAC and RRPE produced by MEME as separate inputs to MAST and searched the yeast promoter database. MAST identified 297 gene promoter sequences within this database as containing PAC and 352 gene promoter sequences as containing RRPE. Of the 297 promoters containing PAC, 129 were those of genes in the top 188 of our rank. Similarly, of the 352 promoters identified as containing RRPE, 102 were those of genes in the top 188 (Figure 4B). Thus, the top genes in our ranking were heavily enriched for both of these two motifs (Table 1), with the associated probabilities of the PAC and RRPE motifs being found in this distribution by chance at 1.6 × 10⁻¹³⁰ and 2.1 × 10⁻⁷⁵, respectively. The set of genes falling within positions 189–500 of our ranking were also enriched for the PAC and RRPE motifs when compared to the large set of genes encompassing positions 501–5442. These PAC and RRPE occurrences were not associated with the promoters of the RP genes. We could identify known RRB genes in positions below the top 188 genes in the ranking and, interestingly, those that contained PAC or RRPE motifs within their promoters tended to fall at the higher positions (Table 2).

Table 1. Genes ranked 1–188 in similarity to the average RRB expression profile
Table 2. RRB genes ranked over the cut-off of 188

We then investigated the positioning and orientation of the motif occurrences within the promoters of the respective genes. There was a strong bias in the positions of the motifs, with the great majority of the occurrences falling between 50 and 200 bp upstream of the initiator ATG (Figure 5A), and typically, the PAC motif was found to be more proximal to the ATG than the RRPE motif. In those promoters that contained both motifs, there was also an orientation bias, with the majority of the motifs (53%) running 5′ to 3′ on the coding strand (Figure 5B). For the majority of the promoters, the relative spacing between the two motifs ranged from 12 to 35 bp (Figure 5C), indicating that there was a relatively narrow spacing bias. Also, there did not appear to be any obvious phasing in this distribution that would indicate that the motifs were aligned with respect to either full or half turns of the double helix.

Figure 5.

There is an orientation bias for the PAC and RRPE motifs. (A) The positions of the two motifs were highly biased with respect to the initiator ATG. (B) The relative position of the PAC and RRPE motifs are indicated for those genes that contain both motifs. (C) The relative spacing between the PAC and RRPE motifs varies from 6 to 60 bp

One interesting observation that we made was that 28 (15%) of the top 188 co-regulated genes were found to be directly adjacent to each other in the genome. While the functional significance of this unlikely distribution (p = 4.6 × 10⁻³) is not known, one possibility is that the mechanisms that govern transcriptional activity could regulate pairs of adjacent genes. This explanation would extend beyond the case of bidirectional promoters, since only one-third of the pairs of genes have orientations consistent with this possibility (Figure 6). Of the remaining convergent or tandemly arranged gene pairs, there were a number of arrangements with respect to the occurrence of the promoter motifs. On the whole, the genes within this subset were more likely (18/28, or 64%) to contain both PAC and RRPE motifs within their promoters as compared to the other genes (73/160, or 46%) within the top 188 ranking positions.
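One way to identify such adjacent pairs and their relative orientations is sketched below. The input format, a list of (gene, strand) tuples in map order along one chromosome, is an assumption, as are the function name and the toy gene names. A divergent pair, with the left gene on the minus strand and the right gene on the plus strand, is the arrangement consistent with a shared bidirectional promoter.

```python
def adjacent_pairs(chromosome_order, selected):
    """Find neighbouring genes that are both in the selected set.

    chromosome_order: list of (gene, strand) tuples in map order along
    one chromosome, with strand '+' or '-'.
    selected: set of gene names, e.g. the top-ranked genes.
    Returns (left_gene, right_gene, arrangement) tuples.
    """
    pairs = []
    for (left, left_strand), (right, right_strand) in zip(
            chromosome_order, chromosome_order[1:]):
        if left in selected and right in selected:
            if left_strand == "-" and right_strand == "+":
                arrangement = "divergent"    # promoters face each other's 5' ends
            elif left_strand == "+" and right_strand == "-":
                arrangement = "convergent"   # 3' ends face each other
            else:
                arrangement = "tandem"       # same strand
            pairs.append((left, right, arrangement))
    return pairs

# Toy chromosome with hypothetical gene names.
order = [("A", "-"), ("B", "+"), ("C", "+"), ("D", "-"), ("E", "-")]
pairs = adjacent_pairs(order, {"A", "B", "D", "E"})
print(pairs)  # [('A', 'B', 'divergent'), ('D', 'E', 'tandem')]
```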

Figure 6.

A significant fraction (15%) of the RRB genes exist as tandem pairs in the genome. The relative frequencies and positions of the PAC (P) and RRPE (R) motifs are indicated for the 28 paired RRB genes from within the top 188 gene ranking positions. Red indicates RRB functions, black indicates non-RRB functions and blue indicates unknown functions

The RRB regulon can be recognized in Candida albicans

In order to address the question as to whether or not elements associated with the RRB regulon exist in other microorganisms, we looked for similar promoter motifs and expression patterns in the yeast Candida albicans. Of the top 100 genes in our overall ranking for S. cerevisiae, 92 were identified as having homologues in C. albicans. We retrieved sequences corresponding to the 500 bp of DNA immediately upstream of the initiator codons for these genes and used the program MEME to search for any common motifs. MEME identified the two most common motifs within these sequences, the first being an A-tract that was identified multiple times within most of the promoters. The second of these two motifs was one that strongly resembles the PAC element from S. cerevisiae (Figure 7A). This motif was identified in over half of the C. albicans promoters, and it matched the S. cerevisiae PAC motif at 11/13 positions (including the core GATGAGATGAG sequence). We also investigated the location of the PAC motif occurrences within the promoters of the genes from C. albicans. As in the case from S. cerevisiae, there was a strong positional bias, with the majority of the motifs falling between 50 and 150 bp upstream of the respective initiator codons (Figure 7B).

Figure 7.

Elements of the RRB regulon can be found in C. albicans. (A) The PAC motifs were highly represented within the promoters of RRB orthologues from C. albicans. (B) The relative position of the PAC motif mirrored that which was found for S. cerevisiae. The RRB homologues from C. albicans are transcriptionally co-regulated during both heat (C) and osmotic shock (D)

Given that orthologues of the RRB genes from S. cerevisiae can be recognized in C. albicans, and these genes are enriched for the PAC promoter motif, we sought to determine whether these genes are also transcriptionally co-regulated. We analysed microarray-derived expression profiles of C. albicans cells as they progressed through heat shock and osmotic stress (Enjalbert et al., 2003). We selected from these datasets the subset of genes that varied the most during the respective time-courses (roughly 3000 genes in each case), and a partitioning about medoids (PAM) function was used to cluster the gene expression profiles into 12 groups. We then analysed the membership of the resulting clusters to determine whether or not RRB-related orthologues were grouped into sets of related expression profiles. Of the 347 genes in Cluster 4, 84 were orthologues of the genes from the top 188 positions from the S. cerevisiae ranking (p = 1.4 × 10⁻²¹). For Cluster 6 the correspondence was 47 orthologues out of 214 genes (p = 1.1 × 10⁻¹⁰). Furthermore, when we investigated the respective expression responses, both Clusters 4 and 6 exhibited similar profiles (Figure 7C). For the osmotic stress experiment (Figure 7D), 89 of the 363 genes in Cluster 1 were orthologues of genes in the top 188 positions in the S. cerevisiae ranking (p = 7.0 × 10⁻³¹). Therefore, elements associated with the RRB regulon from S. cerevisiae, namely transcriptional co-regulation of ribosome and rRNA related genes that are enriched for positionally biased PAC motifs in their promoters, can also be recognized in C. albicans.


A variety of tools and algorithms have been developed to sort through the extensive DNA microarray datasets in order to identify transcriptionally co-regulated sets of genes. The outcomes of these analyses are determined (and limited) by both the experimental data and the initial assumptions and parameters by which the datasets are analysed to calculate measures of similarity (or dissimilarity). The advantage of the approach taken in this study is that it involved no preset assumptions that would limit the number of genes that could be candidates for inclusion in the RRB gene set. A consequence of this strategy is that it does not provide an unambiguous position within the continuous scale of expression rankings with which to delineate similar from dissimilar transcriptional profiles.

Given this consideration, we used functional information in order to better define and characterize the membership of the RRB gene set. We included in the RRB category those genes with annotations associated with rRNA metabolism, or transcription from the relevant RNA polymerases I or III. This was a deliberately restrictive definition, and we noted that there were a number of genes towards the top of our transcriptional ranking that had more general annotations, such as RNA helicases or functions related to tRNA metabolism. Since these functions are closely related to rRNA and ribosome metabolism, it is reasonable to suspect that yeast cells may have evolved to co-regulate the expression of these genes as well. Although for the purposes of some of our analysis (Table 1) we chose to focus on those genes that fell into the top 188 positions in the expression ranking, there were clearly strong candidates for inclusion in the RRB regulon that fell below that cut-off (Table 2). By combining data on gene expression patterns, functional annotations and promoter motif occurrences, we suggest that the RRB regulon contains over 200 genes.

Our analysis correlates well with previous studies that demonstrated that ribosome biosynthesis factors exhibited similar expression profiles. This includes one study that followed the transcriptional regulation of a subset of 10 ribosome biogenesis factors in response to cellular stresses, such as heat shock and nutrient deprivation (Miyoshi et al., 2003). Also, of the 119 nucleolar proteins that were identified as being linked through co-immunoprecipitation experiments, half were included in our top 188 positions (Jorgensen et al., 2002). The greatest overlap is with the genes in the so-called ‘ribosome biogenesis’ or Ribi regulon, a set of 236 co-regulated genes that was enriched for factors that play roles in ribosome biogenesis as well as in tRNA synthesis and translation efficiency (Jorgensen et al., 2004). Some 52% of the Ribi genes were located in the top 188 positions of our gene ranking, and 81% were in the top 500 positions. While the exact membership of the RRB regulon is harder to define than the membership of the RP regulon, it is clear that both sets of genes are regulated in similar, yet distinct, manners. Since it would appear advantageous for the cell to co-regulate the expression of these two classes of genes together (i.e. for ribosome biogenesis), it is not clear why the RP and RRB sets of promoter configurations are distinct. One possibility is that by splitting up the regulation of the two sets of genes, the cell can respond to circumstances whereby it needs different levels of the two classes of proteins (i.e. structural proteins of the ribosome vs. enzymes).
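The overlap figures quoted above (the share of a reference gene set that falls within the top 188 or top 500 positions of a ranking) amount to a simple set calculation. The Python sketch below shows that calculation on toy data; the function name `fraction_in_top` and the gene labels are hypothetical and do not correspond to genes from the study.

```python
def fraction_in_top(ranking, reference, cutoff):
    """Fraction of `reference` genes found in the first `cutoff` entries of `ranking`."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference set is empty")
    top = set(ranking[:cutoff])
    return len(top & reference) / len(reference)


# Toy ranking, ordered from most to least similar expression profile.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
# Toy reference set; "g9" is absent from the ranking altogether.
reference = {"g2", "g3", "g7", "g9"}

for cutoff in (3, 8):
    print(cutoff, fraction_in_top(ranking, reference, cutoff))
```

Reporting the fraction at more than one cutoff, as in the text, makes it clear how sensitive the overlap is to where the boundary of the ranked list is drawn.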

The discovery and characterization of such a large class of genes not only advances our understanding of a major aspect of cellular metabolism, but also offers a strong predictive value for deciphering genes that have, as yet, no known activities. We suggest that those genes falling near the top of the gene ranking, particularly those that contain PAC or RRPE motifs in their promoters, are strong candidates for having an activity related to rRNA or ribosome biosynthesis. In this regard, it is worth noting that, of the 20 genes included in our original RRB regulon that had no known functions at that time (Wade et al., 2001), 18 have subsequently been found to have roles related to rRNA biosynthesis. By applying this strategy to other organisms for which genome-wide datasets are available, it would be possible to identify novel, species-specific ribosome biogenesis factors. These factors could represent useful targets for the development of new drugs, such as antibiotics targeted to specifically impair ribosome biogenesis. More generally, this strategy could be widely applied towards identifying functionally and transcriptionally related gene sets, thereby facilitating efforts towards obtaining genome-wide annotation datasets.

Although the DNA microarray expression profiles of the RRB genes reflect contributions from both mRNA transcript synthesis and turnover, the high enrichment and biased positioning of the PAC and RRPE motifs in the promoters of many of the RRB genes suggest that they are likely to fall under a common transcriptional control mechanism. Evolutionarily, the two motifs are conserved across yeast species (Kellis et al., 2003) and one study found that the RRPE and PAC motifs represent the fourth and fifth most highly conserved cis-sequences found within a comparison of 14 different fungal genomes (Gasch et al., 2004). Both motifs were previously recognized for being enriched within the promoters of genes involved in RNA metabolism (Tavazoie et al., 1999; Wade et al., 2001), and detailed analysis of gene expression profiles revealed that the relative spacing and orientations of the motifs impacted the coherence of expression patterns (Beer and Tavazoie, 2004). The results of two independent experiments have indicated that removing the RRPE element dramatically decreases expression of a reporter gene (Fingerman et al., 2003; Wade et al., 2001). Although a genome-wide analysis of the binding sites for the histone deacetylase Rpd3 protein revealed that genes involved in rRNA processing were bound at a significantly higher level than other categories (Kurdistani et al., 2002), to date, it is not known what factors, if any, bind these promoter motifs directly.

The unexpected co-localization of a significant fraction of the RRB genes raises the possibility that some of the genes may be regulated as pairs. Only one-third of the gene pairs have orientations consistent with the possibility of bidirectional promoters, as in the case of other yeast gene pairs (Kraakman et al., 1989; West et al., 1984). It is possible that part of the RRB gene regulation involves local changes in chromatin structure that extend beyond the boundaries of a single gene. Others have observed that adjacent genes as well as close non-adjacent genes are correlated for expression regardless of gene orientation (Cohen et al., 2000).

The observation that elements associated with the RRB regulon can be recognized in C. albicans suggests that the strategy of co-regulating genes that are involved in the rRNA and ribosome biosynthesis pathways may be generally conserved. Given that ribosome production represents such a large fraction of the total cellular economy, and that together the RRB and RP regulons include several hundred genes, it makes sense that cells would coordinate the expression of these genes as groups. Our analysis indicates that, although the RRB and RP regulons exhibit similar responses to changing conditions, their respective expression patterns and gene promoters are distinct. One possibility is that they represent two sets of targets of a signal transduction pathway (e.g. the TOR, the Ras/PKA or the environmental stress response pathway) for regulating ribosome biogenesis (Gasch et al., 2000; Jorgensen et al., 2004; Powers and Walter, 1999). The recognition that the RRPE and PAC promoter elements are important cis-acting sequences provides a starting point from which to identify and characterize the regulatory factors that control RRB gene expression.


We would like to thank Rick Jensen and Michael Rice for assistance with the computational analysis. This work was supported in part by Grant MCB-9875283 (to M.M.) from the National Science Foundation.