1. Top of page
  2. Abstract
  3. Introduction
  4. Results
  5. Discussion
  6. Experimental procedures
  7. Acknowledgements
  8. References
  9. Supporting Information

DNA sequencing of the 5′-flanking region of the transcriptome effectively identifies transcription initiation sites and also aids in identifying unknown genes. This study describes a comprehensive polling of transcription start sites and an analysis of full-length complementary DNAs derived from the genome of the pathogenic fungus Candida glabrata. A comparison of the sequence reads derived from a cDNA library prepared from cells grown under different culture conditions against the reference genomic sequence of the Candida Genome Database (CGD) revealed the expression of 4316 genes and their acknowledged transcription start sites (TSSs). In addition, this analysis predicted 59 new genes, including 22 that showed no homology to the genome of Saccharomyces cerevisiae, a genetically close relative of C. glabrata. Furthermore, comparison of the 5′-untranslated regions (5′-UTRs) and core promoters of C. glabrata with those of S. cerevisiae showed various global similarities and differences among orthologous genes. Thus, the C. glabrata transcriptome can complement the annotation of the genome database and should provide new insights into the organization, regulation, and function of genes of this important human pathogen.



Fungal species of the genus Candida generally live commensally on and in the human body, yet Candida infections can become systemic and lethal in up to 60% of immunocompromised patients (Pfaller & Diekema 2007). The pathogenic fungus Candida glabrata is the second most common causative agent of candidiasis, and systemic infections have been linked to the death of immunocompromised and immunosuppressed patients (Pfaller & Diekema 2007). Heavy use of azole antifungals, to which C. glabrata is not very susceptible, of steroids and immunosuppressive agents, and of other antibiotics that disrupt the microbial flora has led to the emergence of C. glabrata as an important opportunistic fungus (Fidel et al. 1999; Pfaller et al. 2010a,b).

The genome sequence of C. glabrata was published in 2004; 5464 genes were assigned to the 12 318 245 bp genome (Dujon et al. 2004). Since C. glabrata is haploid and is easily manipulated in genetic studies, this yeast provides a model system for studying the molecular mechanisms of fungal infection and for identifying candidate antifungal targets. Accurately annotated genomes are therefore important tools for elucidating the genomic underpinnings of this yeast as well as of other organisms. In contrast to the high quality of the genome sequence, the available transcriptional data are insufficient for these purposes. For example, we found that the transcription initiation site of YKU80 is located downstream of the first annotated initiation codon (Ueno et al. 2007). Even for the well-annotated Saccharomyces cerevisiae genome, recent reports have suggested a significant number of errors (Nagalakshmi et al. 2008; Lipson et al. 2009). Because the structures of genomes and genes differ greatly from species to species, it is difficult to make precise, uniform gene predictions using computational methods such as GenScan or GlimmerM (Burge & Karlin 1997). cDNA sequences are therefore extremely useful and provide an additional level of accuracy to the annotation endeavor.

Cap analysis gene expression (CAGE) was introduced in 2003 as a method to determine transcription start sites (TSSs) on a genome-wide scale by isolating and sequencing fragments originating from the 5′ end of RNA transcripts (Shiraki et al. 2003). Mapping these reads back to the reference genome identifies the TSSs from which the transcripts originated. In early CAGE experiments, the throughput of sequencers limited the achievable sequencing depth, such that many CAGE tags were found only once in a given experiment. More recently, the use of high-throughput next-generation sequencers (NGSs) (Mardis 2008; Ansorge 2009) has enabled deeper CAGE sequencing from a single experiment (de Hoon & Hayashizaki 2008). Furthermore, NGSs provide more accurate digital quantification by counting the sequencing reads obtained from CAGE and/or SAGE experiments (Lim et al. 2012).

To examine several aspects of the C. glabrata transcriptome, we sequenced a cDNA library prepared from cells cultured under seven different conditions (Table 1). After mapping the tags against the published C. glabrata genome, we predicted the existence of more than 50 new genes and confirmed the expression of 4316 genes listed in the public Candida Genome Database (CGD). In addition, we deduced their transcription start sites (TSSs) and analysed the core promoter regions of the deduced genes to show similarities to and divergence from S. cerevisiae, a closely related species.

Table 1. cDNA libraries were constructed from Candida glabrata cultured under different conditions

| No | Condition | Temp | Medium | Supplements and treatment |
|---|---|---|---|---|
| 1 | Rich medium | 37°C | YPD | |
| 2 | House keeping | 37°C | YNB-Glu | |
| 3 | Heat shock | 42°C | YNB-Glu | |
| 4 | Nitrogen starvation response | 37°C | YNB-Glu | Without ammonium sulfate |
| 5 | Glucose starvation response | 37°C | YNB | |
| 6 | Chromatin silencing | 37°C | 5% SC | All components at 5% of the normal condition, except 2% glucose |
| 7 | Host factors | 37°C | YNB-Glu | 20% serum |



Data summary of sequencing and mapping of C. glabrata genome

To obtain transcript data as representative as possible of the C. glabrata genome, we prepared mRNAs from the type culture strain, CBS138, cultured under seven different growth conditions: YPD medium (rich medium), synthetic minimal medium, heat shock, nitrogen starvation, carbon starvation, chromatin silencing and stimuli from host factors (detailed in Table 1). As a result, 18 697 829 reads were obtained from the mixed libraries derived from these seven growth conditions. After mapping of these C. glabrata sequence reads by rmap, 4 398 598 were successfully mapped to the reference genomic sequences at corresponding positions with perfect matches or 1–2 mismatches (details in Experimental procedures). When the CGD-annotated open reading frames (ORFs) were combined with our mapping data, 2 171 259 reads (49% of total mapped reads) were located at 102 670 sites corresponding to 5′-untranslated regions (UTRs) (positions −1000 to −1 relative to the ATG start codon), and 1 244 256 reads (28%) were located within the ORFs (Table S1 in Supporting Information). From 1 974 935 5′-UTR tags, we found that 4316 C. glabrata genes were expressed under at least one of the seven culture conditions in this study. This represents ~80% of the genes annotated in CGD.

Approximately 91% of the reads (1 974 935/2 171 259) were located within a range from −200 to −1 bp upstream of the start codon of annotated ORFs, suggesting that TSSs of almost all genes would be mapped within a 200 bp region of the ATG start codon (Fig. 1 and Table S1 in Supporting Information). These results are similar to those obtained for S. cerevisiae wherein most transcription start sites (TSSs) map within a 100-bp region upstream of the ATG start codon, based on the results of both 5′ SAGE tag experiments (Zhang & Dietrich 2005; Miura et al. 2006) and single-molecule sequencing digital gene expression (smsDGE) (Lipson et al. 2009).
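The proportion quoted above can be reproduced from a table of tag counts per position. The following is a minimal sketch, assuming a hypothetical mapping from position relative to the ATG (negative values are upstream) to read count; the function name `fraction_within` and the toy data are illustrative and are not part of the original analysis pipeline.

```python
from collections import Counter


def fraction_within(tag_counts, lo=-200, hi=-1):
    """Return the fraction of reads whose mapped position lies in [lo, hi].

    tag_counts maps a position relative to the ATG start codon
    (negative = upstream) to the number of reads mapped there.
    """
    total = sum(tag_counts.values())
    if total == 0:
        return 0.0
    inside = sum(n for pos, n in tag_counts.items() if lo <= pos <= hi)
    return inside / total


# Toy data (hypothetical): 50 reads, 47 of them within -200..-1.
toy = Counter({-950: 1, -320: 2, -150: 10, -40: 25, -5: 12})
print(fraction_within(toy))

# The same ratio using the read totals reported in the text.
print(round(1_974_935 / 2_171_259, 3))
```

With the totals reported in the text, the ratio is about 0.91, in agreement with the stated 91%.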


Figure 1. Distribution of mapped position of tags on the 5′-flanking region of each gene.


Among the 4316 genes, 435 were identified as being transcribed from a single site, as all tags for each of these genes mapped to one position. For the remaining 3881 genes, the most distal position to which a tag mapped was postulated to be the TSS, since the tag sites could not be assigned to individual culture conditions (Table S2 in Supporting Information). These results suggest that transcription of many genes might start at various sites, which would be consistent with an S. cerevisiae study revealing extensive differences in the 5′ and 3′ ends of transcripts prepared from cells cultured under different conditions (Waern & Snyder 2013).
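The TSS assignment rule described above can be sketched as follows. This is an illustration only, assuming tag positions are given relative to the ATG (negative values are upstream), so that the most distal site is the smallest value; the function name `call_tss` is hypothetical.

```python
def call_tss(tag_positions):
    """Deduce a TSS for one gene from its mapped tag positions.

    tag_positions: positions relative to the ATG (negative = upstream),
    one entry per mapped tag. Returns (tss, single_site), where tss is the
    most distal (most upstream) tagged position and single_site is True
    when every tag maps to the same position.
    """
    sites = sorted(set(tag_positions))
    if not sites:
        raise ValueError("gene has no mapped tags")
    return sites[0], len(sites) == 1


print(call_tss([-35, -35, -35]))        # all tags at one site
print(call_tss([-120, -35, -35, -80]))  # several sites, most distal is kept
```

Taking the most distal site yields the longest supported 5′-UTR for each gene, which matters for the downstream promoter and uORF analyses because they scan the sequence between the deduced TSS and the ATG.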

Location of cis-elements on core promoter regions of C. glabrata genes

In S. cerevisiae, TSSs have been reported to lie 45–120 bp downstream of a TATA element (Hampsey 1998). Recent studies have revealed that a GA element (GAE), ‘GAAAA’, is also conserved in TATA-less RNA polymerase II promoter regions (Seizl et al. 2011). Similar to the study of Zhang & Dietrich (2005), which surveyed the 5′-UTRs of 2231 S. cerevisiae genes, we surveyed TATA box and GAE (GAAAA) locations in the upstream regions (−200 to −1) of the deduced TSSs of 4316 C. glabrata genes. We analysed genes wherein the frequency of the TATA box or GAE was greater than 1, scanning upstream sequences independently for ‘TATAWAWR’ (for the TATA box) or ‘GAAAA’ (for the GAE). For C. glabrata, it was estimated that 502 genes (12%) have only a TATA box, 1231 genes (29%) have only a GAE and 679 genes (16%) have both a TATA box and a GAE. For S. cerevisiae, on the other hand, it was estimated that 212 genes (10%) have only a TATA box, 1080 genes (48%) have only a GAE and 328 genes (15%) have both. In addition, the numbers of orthologous genes that have a TATA box, a GAE, or both in both yeasts were 24, 229, and 63, respectively (Table 2), suggesting divergent regulation of orthologous genes in these two yeast species.
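The motif survey can be illustrated with a short scan that expands the IUPAC codes in ‘TATAWAWR’ (W = A or T, R = A or G) into a regular expression. This is a sketch under stated assumptions: only the given strand is scanned, a single occurrence is enough to count as a hit, and the function names and category labels are hypothetical rather than taken from the original analysis.

```python
import re

# IUPAC nucleotide codes needed for the two motifs (subset only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def compile_motif(motif):
    """Translate an IUPAC motif string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


TATA_BOX = compile_motif("TATAWAWR")
GA_ELEMENT = compile_motif("GAAAA")


def classify_promoter(upstream):
    """Classify an upstream sequence (e.g. -200 to -1 of a deduced TSS)."""
    seq = upstream.upper()
    has_tata = TATA_BOX.search(seq) is not None
    has_gae = GA_ELEMENT.search(seq) is not None
    if has_tata and has_gae:
        return "TATA+GAE"
    if has_tata:
        return "TATA"
    if has_gae:
        return "GAE"
    return "other"


for example in ("CCTATAAAAGCC", "CCGAAAACC", "TATATATAGGAAAAC", "CCCCGGGG"):
    print(example, classify_promoter(example))
```

Tallying the returned labels over all genes gives counts in the four categories used in Table 2.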

Table 2. Number of genes that harbor TATA box and/or GA element (GAE) in their core promoters

| | TATA (%) | GAE (%) | TATA + GAE (%) | Others (%) |
|---|---|---|---|---|
| Candida glabrata | 502 (11.6) | 1231 (28.5) | 679 (15.7) | 1904 (44.1) |
| Number of orthologues conserved in S. cerevisiae | 24 | 229 | 63 | 941 |
| S. cerevisiae (a) | 212 (9.5) | 1080 (48.4) | 328 (14.7) | 611 (27.4) |

(a) Analyzed Saccharomyces cerevisiae gene sequences (2231 genes) were obtained from data published by Zhang & Dietrich (2005).

Identification of a TSS consensus sequence

In S. cerevisiae, Hampsey (1998) reported that the sequence recognition pattern at the TSS is PyA(A/T)Pu, where A is the TSS. TSSs were also reported to be found in A(A rich)5PyA(A/T)NN(A rich)6 regions, where A is the TSS (Zhang & Dietrich 2005). From the TSS information obtained in this study, we selected 102 670 tags that map unambiguously to the −500 to 0 bp region upstream of the initiation codon. The sequence of ±10 bp flanking each TSS (20 bases in total) was extracted from the C. glabrata genomic sequence database (as of 09 December 2012) and analysed with WebLOGO (Fig. 2) (Crooks et al. 2004). The resulting consensus, (A/T)(A/T/G)(A/T)3N(A/T)N(G/T/A)(A/G)(A/T)AN(A/T)2 N32(A/T)3, where A/G is the TSS, implies that C. glabrata might prefer a purine base at the TSS.
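A sequence logo is drawn from per-position base frequencies, so the extraction step above can be sketched as a small position-frequency calculation. The sketch below assumes 0-based coordinates on the forward strand and a window of 10 bases upstream plus 10 bases starting at the TSS (so the TSS sits at index 10 of each window); these conventions and the function names are assumptions, and WebLOGO itself is not invoked.

```python
from collections import Counter


def flank(genome, tss, half=10):
    """Return the 2*half bases around a TSS, or None if the window is cut off.

    The window covers genome[tss-half : tss+half], so the TSS base is at
    index `half` of the returned string (0-based, forward strand only).
    """
    start, end = tss - half, tss + half
    if start < 0 or end > len(genome):
        return None
    return genome[start:end]


def column_frequencies(windows):
    """Per-column base frequencies for equal-length windows."""
    n = len(windows)
    width = len(windows[0])
    freqs = []
    for i in range(width):
        counts = Counter(w[i] for w in windows)
        freqs.append({base: counts.get(base, 0) / n for base in "ACGT"})
    return freqs


# Toy sequences (hypothetical): one TSS on A, one on G.
toy_genomes = ["CCCCCCCCCCATTTTTTTTT", "CCCCCCCCCCGTTTTTTTTT"]
windows = [flank(g, 10) for g in toy_genomes]
freqs = column_frequencies(windows)
print(freqs[10])  # base frequencies at the TSS column
```

In the toy example the TSS column is split evenly between A and G, which is the kind of purine preference the consensus above describes.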


Figure 2. Sequence motif of TSS in Candida glabrata. To predict the motif of transcription start sites (TSSs), 20 bases of sequence flanking each TSS identified in this study were extracted from the genome sequence and processed by WebLOGO.


Comparison of the length of 5′-UTRs in genes categorized by gene ontology (GO)

We next compared the length distributions of 5′-UTRs across different functional and localization categories (GO annotations). Genes involved in catalytic activity (557 bp) and enzyme regulatory activity (700 bp) had much longer 5′-UTRs than average (314 bp), whereas genes involved in DNA metabolic processes (59 bp), ribosomes (42 bp), and nucleoplasm (0 bp) had shorter 5′-UTRs (Table 3). Genes with longer UTRs therefore appear to fall into categories that require regulation, whereas genes with shorter UTRs, such as housekeeping genes, appear to fall into categories with a reduced requirement for post-transcriptional regulation. This suggests that this type of analysis could reveal significant correlations with gene function. When the same analysis was performed on the S. cerevisiae genome, significant differences in 5′-UTR length between the two yeasts were observed in several categories (Table 3; asterisks denote differences between S. cerevisiae and C. glabrata).

Table 3. Average length of 5′-UTR of genes classified according to function and localization in Candida glabrata and Saccharomyces cerevisiae

| GO category | C. glabrata | S. cerevisiae | Significant difference (b) |
|---|---|---|---|
| Transferase activity | −805.0 | −73.4 | * |
| Actin binding | −23.8 | −29.2 | |
| Lipid metabolic process | −29.2 | −73.5 | |
| Phosphoprotein phosphatase activity | −51.2 | −136.9 | * |
| Nuclear envelope | −48.0 | −80.2 | |
| Golgi apparatus | −31.2 | −103.0 | * |
| Mitochondrion organization | −99.5 | −205.5 | * |
| Nuclease activity | −46.1 | −42.4 | |
| Translation | −258.4 | −62.7 | * |
| Cell communication | −700.1 | −46.0 | * |
| Signal transduction | −24.3 | −97.7 | |
| Electron carrier activity | −120.2 | −56.3 | |
| RNA binding | −8.3 | −64.5 | * |
| Structural molecule activity | −31.2 | −81.9 | * |
| Nucleotide binding | −29.2 | −89.5 | * |
| Peptidase activity | −46.8 | −45.8 | |
| Cytoskeletal protein binding | −40.0 | −54.0 | |
| Biological process | −4.0 | −88.2 | * |
| Carbohydrate metabolic process | −805.0 | −83.6 | * |
| Transcription factor activity | −154.7 | −105.4 | |
| Protein binding | −48.0 | −51.2 | |
| DNA binding | −9.6 | −80.1 | * |
| Hydrolase activity | −134.6 | −79.5 | * |
| Lipid binding | −29.1 | −97.6 | * |
| Metabolic process | −12.5 | −100.6 | * |
| Receptor activity | −4.0 | −115.4 | |
| Endoplasmic reticulum | −628.2 | −61.8 | * |
| Cytosol | −126.2 | −123.0 | * |
| Endosome | −49.7 | −139.0 | * |
| Nucleus | −154.7 | −70.1 | * |
| Vacuole | −664.0 | −75.6 | * |
| Protein metabolic process | −126.9 | −52.0 | |
| Response to stress | −37.3 | −64.8 | * |
| Cytoplasm | −258.4 | −70.5 | * |
| Calcium ion binding | −404.3 | −45.3 | * |
| Peroxisome | −28.6 | −132.4 | * |
| Mitochondrion | −670.8 | −73.8 | * |
| Biosynthetic process | −47.6 | −98.4 | * |
| Cytoskeleton organization | −93.0 | −74.0 | * |
| Protein modification process | −161.8 | −50.0 | |
| Nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | −135.0 | −83.2 | * |
| Enzyme regulator activity | −700.1 | −30.5 | * |
| Cellular component | −150.0 | −106.3 | |
| DNA metabolic process | −59.0 | −108.0 | |
| Kinase activity | −47.6 | −69.3 | |
| Catalytic activity | −556.9 | −100.6 | * |
| Plasma membrane | −805.0 | −118.8 | * |
| Protein transport | −157.6 | −88.0 | |
| Signal transducer activity | −201.6 | −177.7 | |

(a) Probability of significant difference of P < 0.05 (marked with an asterisk).

Predicted C. glabrata genes were categorized by CGD. The information for TSSs of S. cerevisiae was also obtained from a web site.

(b) The average distance between the TSS and the first ATG for each gene ontology (GO) category was calculated, and a statistical analysis (Wilcoxon test) was performed to confirm whether there were significant differences between C. glabrata and S. cerevisiae.

Prediction of uORF

5′-UTRs sometimes harbor upstream small open reading frames (uORFs) that can attenuate translation of the main ORF by interfering with translational reinitiation at the main start codon, in prokaryotes as well as in eukaryotes, including humans (Vattem & Wek 2004). In S. cerevisiae, 252 genes have been found to harbor small conserved uORFs (Cvijović et al. 2007), but none of the orthologous C. glabrata genes did so. We therefore explored putative transcripts containing uORFs in C. glabrata using the following criteria: a length between 4 and 6 codons, a distance from the start codon of the main ORF of between 50 and 150 nucleotides, and no overlap with, and clear separation from, neighboring uORFs. As shown in Table 4, 72 genes were predicted to have uORFs upstream of the main ORF, although further analysis is needed to confirm these findings.
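The search criteria above can be expressed as a small scan over a 5′-UTR sequence. The sketch below is one possible reading of those criteria, not the authors' implementation: it assumes the input runs from the deduced TSS up to the base before the main ATG, counts codons from the uORF start codon up to but excluding its stop codon, measures distance from the end of the uORF stop codon to the main ATG, and applies the overlap filter only among candidates that already meet the length and distance limits. The function name `find_uorfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr, min_codons=4, max_codons=6, min_dist=50, max_dist=150):
    """Return (start, end) index pairs of candidate uORFs in a 5'-UTR.

    utr: sequence from the TSS to the base just before the main ATG.
    start is the index of the uORF's A; end is one past its stop codon.
    """
    utr = utr.upper()
    candidates = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        # Walk in frame until the first stop codon.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3   # stop codon not counted
                end = pos + 3
                dist = len(utr) - end           # nt between uORF and main ATG
                if (min_codons <= n_codons <= max_codons
                        and min_dist <= dist <= max_dist):
                    candidates.append((start, end))
                break
    # Discard any candidate that overlaps another candidate.
    return [c for c in candidates
            if not any(o != c and o[0] < c[1] and c[0] < o[1]
                       for o in candidates)]


# Toy UTR (hypothetical): a 4-codon uORF, then 60 nt before the main ATG.
print(find_uorfs("CC" + "ATGAAACCCGGGTAA" + "C" * 60))
```

Changing any of the assumed conventions, for example counting the stop codon or measuring distance from the uORF start, would shift which candidates pass, so the limits should be read together with the conventions stated here.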

Table 4. Predicted genes harboring upstream small open reading frames (uORF)

| Gene ID | Gene name | Sc orthologue | Description |
|---|---|---|---|
| CAGL0A01366g | EPA9 | | Putative adhesin |
| CAGL0A01716g | | PNC1 | Ortholog(s) have nicotinamidase activity, role in chromatin silencing at rDNA, chromatin silencing at telomere, replicative cell aging and cytosol, nucleus, peroxisome localization |
| CAGL0A02948g | | FRQ1 | Ortholog(s) have calcium ion binding, enzyme activator activity, role in regulation of conjugation with cellular fusion, regulation of signal transduction and Golgi membrane, cytosol, nucleus, plasma membrane localization |
| CAGL0B00726g | | GLK1 | Ortholog(s) have glucokinase activity, role in glucose import, glycolysis, mannose metabolic process and cytosol, plasma membrane enriched fraction, soluble fraction localization |
| CAGL0B01331g | | DCP1 | Ortholog(s) have enzyme activator activity, m7G(5′)pppN diphosphatase activity, mRNA binding activity, role in deadenylation-dependent decapping of nuclear-transcribed mRNA and cytoplasmic mRNA processing body localization |
| CAGL0B03245g | | SSH1 | Ortholog(s) have signal sequence binding activity, role in SRP-dependent cotranslational protein targeting to membrane and Ssh1 translocon complex, plasma membrane localization |
| CAGL0B03311g | | SAF1 | Ortholog(s) have ubiquitin-protein ligase activity, role in SCF-dependent proteasomal ubiquitin-dependent protein catabolic process and SCF ubiquitin ligase complex localization |
| CAGL0C04609g | | YUH1 | Ortholog(s) have ubiquitin-specific protease activity, role in protein deubiquitination and cytosol, nucleus localization |
| CAGL0D01298g | TKL1 | TKL1 | Putative transketolase |
| CAGL0D02354g | | MSL5 | Ortholog(s) have pre-mRNA branch point binding activity, role in nuclear mRNA splicing, via spliceosome and commitment complex localization |
| CAGL0D02860g | | IRC19 | Ortholog(s) have role in ascospore formation, mitotic recombination |
| CAGL0D03828g | | MED6 | Ortholog(s) have RNA polymerase II transcription coactivator activity involved in preinitiation complex assembly activity |
| CAGL0D04884g | | RRP9 | Ortholog(s) have snoRNA binding activity, role in cellular response to drug and 90S preribosome, box C/D snoRNP complex, small-subunit processome localization |
| CAGL0D05214g | | RPL29 | Ortholog(s) have structural constituent of ribosome activity, role in cytoplasmic translation and cytosolic large ribosomal subunit, nucleolus localization |
| CAGL0E01419g | YPS2 | MKC7 | Putative aspartic protease; predicted GPI-anchor; member of a YPS gene cluster that is required for virulence in mice; induced in response to low pH and high temperature |
| CAGL0E02761g | | LDB7 | Ortholog(s) have DNA translocase activity and role in ATP-dependent chromatin remodeling, cell wall mannoprotein biosynthetic process, nucleosome disassembly, transcription elongation from RNA polymerase II promoter |
| CAGL0E06402g | | | Uncharacterized |
| CAGL0E06424g | | MCR1 | Ortholog(s) have cytochrome-b5 reductase activity and role in cellular response to oxidative stress, ergosterol biosynthetic process |
| CAGL0F01529g | | MEF1 | Ortholog(s) have role in mitochondrial translation and mitochondrion localization |
| CAGL0F02695g | | MRP20 | Ortholog(s) have structural constituent of ribosome activity, role in mitochondrial translation, and mitochondrial large ribosomal subunit localization |
| CAGL0F04609g | | EDE1 | Ortholog(s) have ubiquitin binding activity, role in endocytosis, endoplasmic reticulum unfolded protein response and actin cortical patch, cellular bud neck, cellular bud tip, mating projection tip localization |
| CAGL0F05049g | | YLR283W | Ortholog(s) have mitochondrion localization |
| CAGL0F05687g | | YDR186C | Ortholog(s) have cytoplasm, ribosome localization |
| CAGL0F05775g | | SAS4 | Ortholog(s) have histone acetyltransferase activity, role in chromatin silencing at telomere, and SAS acetyltransferase complex, nuclear chromatin localization |
| CAGL0F08239g | | COQ6 | Ortholog(s) have role in ubiquinone biosynthetic process and mitochondrial inner membrane localization |
| CAGL0G01056g | GAS3 | GAS2 | Putative glycoside hydrolase of the Gas/Phr family; predicted GPI-anchor |
| CAGL0G06798g | | YJR005C-A | Ortholog of S. cerevisiae: YJR005C-A |
| CAGL0G09361g | | PDX1 | Ortholog(s) have structural molecule activity, role in acetyl-CoA biosynthetic process from pyruvate, filamentous growth, single-species biofilm formation on inanimate substrate and mitochondrial pyruvate dehydrogenase complex localization |
| CAGL0H01683g | | URC2 | Ortholog(s) have sequence-specific DNA binding activity and cytoplasm, nucleus localization |
| CAGL0H02475g | | PET111 | Ortholog(s) have translation regulator activity, role in mitochondrial respiratory chain complex IV biogenesis, positive regulation of mitochondrial translation and mitochondrial inner membrane localization |
| CAGL0H02717g | | PUS5 | Ortholog(s) have pseudouridylate synthase activity, role in pseudouridine synthesis, rRNA modification, and mitochondrion localization |
| CAGL0H04103g | | UFO1 | Ortholog(s) have role in SCF-dependent proteasomal ubiquitin-dependent protein catabolic process, cellular response to methylmercury, response to DNA damage stimulus and SCF ubiquitin ligase complex, cytoplasm, nucleus localization |
| CAGL0H04147g | | | Uncharacterized |
| CAGL0H05731g | | SEC62 | Ortholog(s) have role in posttranslational protein targeting to membrane, translocation and Sec62/Sec63 complex, plasma membrane localization |
| CAGL0H06963g | | YML096W | Ortholog(s) have cytosol, nucleus localization |
| CAGL0H07645g | | ZIP2 | Ortholog(s) have role in reciprocal meiotic recombination, synaptonemal complex assembly and synaptonemal complex localization |
| CAGL0H09636g | | YER010C | Ortholog of S. cerevisiae: YER010C, C. albicans SC5314: orf19.4894, C. parapsilosis CDC317: CPAR2_805070, Candida tenuis NRRL Y-1498: CANTEDRAFT_136813 and Debaryomyces hansenii CBS767: DEHA2B12342g |
| CAGL0I01386g | | UTR1 | Ortholog(s) have NAD+ kinase activity, NADH kinase activity, role in NADP biosynthetic process, cellular iron ion homeostasis and cytosol, nucleus localization |
| CAGL0I02398g | | NMD3 | Ortholog(s) have ribosomal large subunit binding activity, role in ribosomal large subunit export from nucleus and cytosol, cytosolic large ribosomal subunit, nucleus localization |
| CAGL0I05412g | | NUP120 | Ortholog(s) have structural constituent of nuclear pore activity |
| CAGL0I07969g | | ATP19 | Ortholog(s) have hydrogen ion transporting ATP synthase activity, rotational mechanism activity and role in ATP synthesis coupled proton transport, mitochondrial proton-transporting ATP synthase complex assembly |
| CAGL0I10472g | | PHB1 | Ortholog(s) have role in inner mitochondrial membrane organization, mitochondrion inheritance, mitochondrion morphogenesis, negative regulation of proteolysis, protein folding, replicative cell aging |
| CAGL0J01353g | | YTA12 | Ortholog(s) have ATP binding, ATPase activity, metallopeptidase activity, role in protein complex assembly, proteolysis, signal peptide processing and m-AAA complex, mitochondrial inner boundary membrane localization |
| CAGL0J05940g | | YCK2 | Ortholog(s) have protein serine/threonine kinase activity |
| CAGL0J07040g | | GDE1 | Ortholog(s) have glycerophosphocholine phosphodiesterase activity, role in cellular response to drug, glycerophospholipid catabolic process and cytosol, ribosome localization |
| CAGL0J09064g | | ARF1 | Ortholog(s) have GTPase activity, role in ER to Golgi vesicle-mediated transport, Golgi to plasma membrane transport, macroautophagy, and Golgi apparatus localization |
| CAGL0K00671g | | RPS14B | Ortholog(s) have SSU rRNA binding activity, role in maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA), ribosomal small subunit assembly and small-subunit processome localization |
| CAGL0K02475g | DSS4 | DSS4 | GDP/GTP exchange factor for Sec4p; protein abundance decreased in ace2 mutant cells |
| CAGL0K02783g | | CAP1 | Ortholog(s) have actin filament binding activity and role in actin cytoskeleton organization, barbed-end actin filament capping, cell cycle cytokinesis, cellular protein localization, regulation of cell shape |
| CAGL0K03927g | | YMR134W | Ortholog(s) have role in cellular iron ion homeostasis, mitochondrion organization and endoplasmic reticulum, nuclear envelope localization |
| CAGL0K06369g | | TFB3 | Ortholog(s) have role in nucleotide-excision repair, phosphorylation of RNA polymerase II C-terminal domain, regulation of cyclin-dependent protein kinase activity, transcription from RNA polymerase II promoter |
| CAGL0K06523g | | SDC1 | Ortholog(s) have histone methyltransferase activity (H3-K4 specific) activity, role in chromatin silencing at telomere, histone H3-K4 methylation and Set1C/COMPASS complex localization |
| CAGL0K07315g | | YMR244C-A | Ortholog(s) have cytoplasm, nucleus localization |
| CAGL0K08492g | | YKL100C | Has domain(s) with predicted aspartic-type endopeptidase activity and integral to membrane localization |
| CAGL0K10428g | | IGD1 | Ortholog(s) have enzyme inhibitor activity, role in negative regulation of glycogen catabolic process and cytoplasm localization |
| CAGL0K12056g | | RSM10 | Ortholog(s) have structural constituent of ribosome activity and mitochondrial small ribosomal subunit localization |
| CAGL0L00759g | HIS1 | HIS1 | ATP phosphoribosyltransferase; protein abundance increased in ace2 mutant cells |
| CAGL0L04422g | | DDP1 | Ortholog(s) have bis(5′-adenosyl)-hexaphosphatase activity, bis(5′-adenosyl)-pentaphosphatase activity, diphosphoinositol-polyphosphate diphosphatase activity, endopolyphosphatase activity |
| CAGL0L04840g | | RPS23A | Ortholog(s) have structural constituent of ribosome activity and role in maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA), regulation of translational fidelity |
| CAGL0L06094g | STR3 | STR3 | Putative cystathionine beta-lyase; gene is upregulated in azole-resistant strain |
| CAGL0L06600g | NAS6 | NAS6 | Regulatory, nonATPase subunit of the 26S proteasome |
| CAGL0L06754g | | SUP45 | Ortholog(s) have translation release factor activity, codon specific activity, role in cytokinesis, translational termination and cytosol, nucleus, translation release factor complex localization |
| CAGL0L10472g | | VPS5 | Ortholog(s) have phosphatidylinositol-3-phosphate binding, protein transporter activity and role in ascospore formation, intracellular protein transport, protein retention in Golgi apparatus, retrograde transport, endosome to Golgi |
| CAGL0L12276g | | XDJ1 | Ortholog(s) have integral to mitochondrial outer membrane, nucleus localization |
| CAGL0M00902g | | DIF1 | Ortholog(s) have cytoplasm localization |
| CAGL0M01782g | | SEC18 | Ortholog(s) have ATPase activity |
| CAGL0M02497g | | RPL33A | Ortholog(s) have structural constituent of ribosome activity, role in cytoplasmic translation, and cytosolic large ribosomal subunit localization |
| CAGL0M05819g | | | Uncharacterized |
| CAGL0M07249g | | MFM1 | Ortholog(s) have magnesium ion transmembrane transporter activity, role in mitochondrial magnesium ion transport and mitochondrial inner membrane localization |
| CAGL0M07359g | | LEE1 | Has domain(s) with predicted nucleic acid binding, zinc ion binding activity |
| CAGL0M08074g | | YJR030C | Ortholog of S. cerevisiae: YJR030C |
| CAGL0M09779g | CTS1 | CTS1 | Putative endochitinase with a predicted role in cell separation |

Prediction of new genes

To expedite the discovery of new C. glabrata genes, we explored C. glabrata gene sequences that do not share homology with S. cerevisiae. We selected mapped reads using two criteria: the read was not from the TSS of an already annotated C. glabrata ORF, and at least two reads could be found in the 5′-UTR of the putative ORF. In this way, 200 563 reads were selected and analysed as to whether the downstream sequences contained a putative ORF. The analysed downstream sequence was limited to 1000 bp, since our study revealed that the first ATG is located within 1000 bp of the TSS for most of the genes annotated in the database. The predicted amino acid sequences (more than 24 amino acids in length, downstream of 3853 TSSs) were compared by Blast against S. cerevisiae (genome sequence 2009/05/08). The predicted amino acid sequences that did not match S. cerevisiae were then compared by Blast against nr (2013/07/11). By this analysis, 59 new genes were predicted, which could be divided into two groups. One group consists of 38 genes that harbor motifs homologous to S. cerevisiae (Table 5 and ORF1–ORF9 in Table 6), and the other group consists of 21 genes (ORF10–ORF30 in Table 6) that have no homology to S. cerevisiae. Within the first group of 38 genes, 29 genes showed homology to the S. cerevisiae reference strain S288C (Table 5), four genes (ORF1–ORF4) shared homology with S. cerevisiae genes that have not been annotated in the Saccharomyces Genome Database (SGD), and five genes (ORF5–ORF9) shared homology with S. cerevisiae strains other than S288C, namely Fosters O and AWRI 1631.
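The ORF search step (first ATG within 1000 bp of a TSS, translated product longer than 24 amino acids) can be sketched as below. This is an illustration under stated assumptions: the input is the sequence downstream of a TSS in transcript orientation, the standard nuclear genetic code is used, only the first ATG is considered, and candidates without an in-frame stop codon are rejected. The function name `predict_orf` is hypothetical, and the Blast comparisons are not reproduced.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def predict_orf(downstream, max_scan=1000, min_aa=25):
    """Translate from the first ATG found within max_scan bases of the TSS.

    downstream: sequence starting at the TSS, in transcript orientation.
    Returns the predicted protein (without the stop symbol) if it has at
    least min_aa residues and ends at an in-frame stop codon, else None.
    """
    seq = downstream.upper()
    start = seq.find("ATG", 0, max_scan)
    if start == -1:
        return None
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            return "".join(protein) if len(protein) >= min_aa else None
        protein.append(aa)
    return None  # ran off the end without an in-frame stop codon


# Toy example (hypothetical): Met followed by 30 Ala codons and a stop.
print(predict_orf("CC" + "ATG" + "GCT" * 30 + "TAA"))
```

Sequences passing this filter would then be the input to the homology searches described above.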

Table 5. Candida glabrata genes showing homology to annotated S. cerevisiae genes (SGD)

| Chromosome | Strand | TSS position | Translational position: Start | Translational position: Stop | Amino acids: Length | Amino acids: Predicted sequence | S. cerevisiae homologue name | e-Value | Synteny |
|---|---|---|---|---|---|---|---|---|---|

* indicates a stop codon

Table 6. List of the predicted new genes

Groups:
Homology with S. cerevisiae S288C genes that are not listed in the SGD database
Homology with S. cerevisiae AWRI1631
Homology with S. cerevisiae Fosters O
No homology with S. cerevisiae

| ORF | Chromosome | Strand | TSS position | Translational position: Start | Translational position: Stop | Amino acids: Length | Amino acids: Sequence | Homologue gene: ID | Homologue gene: Description | Homologue gene: Amino acids length |
|---|---|---|---|---|---|---|---|---|---|---|
| ORF16 | G | + | 689802 | 689863 | 690070 | 68 | MVAAPGVQLHGKTVPFLSVHWKPSTNLKASSTFLPTGKSLMVMCLTIPLGSMMNRPLKAIPSSSMRTP* | XP_001710117.1 | Hypothetical protein GL50803_31706 [Giardia lamblia ATCC 50803] | 101 |
| ORF29 | L | + | 1446748 | 1447254 | 1447500 | 81 | MVRLVFRPYTQIRRSICTSEPLRASTRVSSGFTLFRHSSPSFGSQQLCSYSNPSEDIRIGRLCTPQVEGPNLRSLSLRVRV* | XP_003366551.1 | Conserved hypothetical protein [Trichinella spiralis] | 31 |
| ORF30 | M | − | 1202901 | 1202222 | 1202006 | 71 | MKEAKIMWTPFSTPNLKSALSFSDKAGKSTSVPGKLTPLWEQILPAFKDLTFKVLSSTTCKTSKDKTPSST* | XP_714998.1 | Hypothetical protein CaO19.11060 [Candida albicans SC5314] | 105 |

* indicates a stop codon



In S. cerevisiae, several analyses of 5′-UTRs have revealed consensus sequences within core promoter regions (Zhang & Dietrich 2005; Nagalakshmi et al. 2008; Lipson et al. 2009; Tsankov et al. 2010; Seizl et al. 2011; Waern & Snyder 2013). Relative to these studies, our data suggest significant differences between C. glabrata and S. cerevisiae core sequences (Tables 2–4, and S2 in Supporting Information). The average length of 5′-UTRs in C. glabrata is longer than that in S. cerevisiae (314 bp vs. 75 bp) when all genes from both yeasts are compared (Table 3). This supports a previous study comparing the transcriptomes of cultures growing in YPD, which showed that C. glabrata transcribes longer 5′-UTRs than S. cerevisiae (Tsui et al. 2011). This evidence may also support a recent study of nucleosome occupancy, which demonstrated that nucleosome-free regions (NFRs) at the proximal promoter regions of C. glabrata are wider than those of S. cerevisiae (Tsankov et al. 2010). The 5′-UTR lengths of genes involved in the plasma membrane, calcium ion binding, and endoplasmic reticulum were especially divergent.

In this study, we predicted that 72 genes might harbor uORFs (Table 4). However, the S. cerevisiae genes orthologous to these have not been reported to harbor uORFs, and further studies will therefore be needed to confirm our results. Genes containing uORFs, such as GCN4 (Mueller & Hinnebusch 1986), CPA1 (Gaba et al. 2001), and YAP1/YAP2 (Vilela et al. 1998), are well characterized in S. cerevisiae. This study, however, showed that the 5′-UTRs of their orthologues in the C. glabrata genome (GCN4: CAGL0L02475g, CPA1: CAGL0I09592g, YAP1/YAP2: CAGL0H04631g) harbor no uORFs. In addition, this study showed that the 5′-UTR of the PCL5 orthologue (CAGL0J10846g) has no uORFs, since the 5′-UTR length deduced here was significantly shorter than that in the study of Zhang & Dietrich (2005), which predicted three uORFs in the 5′-flanking region of CAGL0J10846g. Thus, our results imply divergence in post-transcriptional regulation between these two yeasts.

We suggest that approximately 70 genes should be added to the C. glabrata genome database. We also identified 4316 transcripts and their TSSs. Among the newly predicted genes, 20 share no regions of homology with S. cerevisiae genes. Ten of these 20 genes show homology with Candida albicans and two non-albicans pathogenic Candida species: five genes (ORF10, 12, 21, 28, and 30) show homology with C. albicans, three genes (ORF11, 15, and 26) show homology with Clavispora (Candida) lusitaniae, and two genes (ORF14 and 25) show homology with Meyerozyma (Candida) guilliermondii. The gene ORF19 shows homology to a gene of Phaeosphaeria nodorum, a major fungal pathogen of wheat. Furthermore, we predict that two ORFs (ORF16 and 29) show homology to genes belonging to the parasites Giardia lamblia and Trichinella spiralis, and that two genes (ORF17 and 24) show homology to genes of the bacterial pathogens Cronobacter sakazakii and Prevotella sp. It is therefore likely that some of these newly discovered C. glabrata genes are involved in pathogenicity. On the other hand, four genes (ORF13, 22, 23, and 27) show homology with genes found in the higher eukaryotes Zea mays, Arabidopsis, Medicago truncatula, and Drosophila, respectively. These results might imply horizontal gene transfer, since C. glabrata and these organisms can live in similar environments.

Genome databases provide comprehensive and integrated biological information and maintain up-to-date genome annotations. Continuous updates of the database are needed because experimental and computational analyses are continuously accumulating. In this study, we omitted the identification of antisense transcripts and of genes containing introns because the short nucleotide sequences obtained from Solexa sequencing hampered their accurate prediction. Such studies can be realized with longer-read sequencing technologies such as direct RNA sequencing. Our study described not only new genes with homology to those found in pathogenic fungi but also 5′-UTR regions comprising transcriptional and post-transcriptional regulatory sequences possibly required for environmental and host adaptation. Therefore, it will provide useful information to improve C. glabrata genome annotation, and may be useful for further analyses related to pathogenicity.

Experimental procedures

Strain and media

The C. glabrata strain CBS138 (ATCC 2001) was grown at 37 °C in YPD medium [1% (w/v) yeast extract, 2% (w/v) Bacto peptone, 2% (w/v) dextrose], in SD medium [0.67% (w/v) yeast nitrogen base without amino acids, 2% (w/v) dextrose], or in SD medium with or without the supplements indicated in Table 1.

Construction of the 5′-end cDNA library

CBS138 was cultivated overnight in YPD medium (10 mL) or SD medium (40 mL) at 37 °C, and then diluted to OD600 = 0.4 in fresh YPD medium (40 mL) or SD medium (200 mL), respectively. Cells cultured in YPD were harvested after 4 h of cultivation; after reaching log phase (approximately OD600 = 1.5), cultures were transferred to collection tubes (30 mL per tube), and the cells were harvested and washed in SD medium. Cells were subsequently resuspended in each medium indicated in Table 1, and each tube was cultured for a further 1 h before harvesting.

Total RNA samples were obtained as follows. Harvested cells were resuspended in 5 mL of the RNA isolation reagent RNA-bee (Tel-Test, Inc., USA) and disrupted with a Multi-Beads Shocker (Yasui Kikai, Inc., Japan), according to the manufacturer's instructions. For the construction of 5′-end cDNA libraries, 5′-end cDNAs were prepared from the total RNA by the IniTIA method (Hashimoto et al. 2004). The cDNA libraries obtained under seven growth conditions were mixed and sequenced with a Solexa sequencer (Bentley 2006). Sequencing was performed on an Illumina 1G analyzer by the Post Genome Research Center (Japan).

Mapping of sequenced reads to C. glabrata genome

We initially selected reads including ‘TCGTATGC’ because all the reads (36 bp) were theoretically followed by the ligation adaptor sequence (5′-TCGTATGCCTTCTTCTGCTTGTT-3′) at their 3′ end. To obtain more reliable data, we used the 25 bp from the 5′ end of the selected reads. Using rmap software (Smith et al. 2008), the selected reads were mapped (permitting up to 2 bp mismatches) onto the C. glabrata reference genome obtained from the Candida Genome Database (9 December 2012). The complete dataset can be found at the NCBI Gene Expression Omnibus (GEO) with accession numbers 2592882 and 2592883.
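The read selection, trimming, and mismatch-tolerant mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the naive forward-strand scan stands in for rmap and does not reproduce its algorithm or options, and the function names and toy sequences are hypothetical.

```python
# Illustrative sketch of the read-processing steps; not the rmap implementation.
ADAPTOR_TAG = "TCGTATGC"   # first 8 bases of the ligation adaptor (from the text)
TRIM_LENGTH = 25           # bases kept from the 5' end of each selected read
MAX_MISMATCHES = 2         # mismatches permitted during mapping


def select_and_trim(reads, tag=ADAPTOR_TAG, length=TRIM_LENGTH):
    """Keep reads that contain the adaptor tag; return their first `length` bases."""
    return [read[:length] for read in reads
            if tag in read and len(read) >= length]


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, genome, max_mismatches=MAX_MISMATCHES):
    """Return 0-based forward-strand positions matching within the mismatch limit.

    Assumption: a brute-force scan of one strand, adequate only for toy input.
    """
    n = len(read)
    return [i for i in range(len(genome) - n + 1)
            if mismatches(read, genome[i:i + n]) <= max_mismatches]


# Hypothetical 25-bp cDNA insert followed by the start of the adaptor (36 bp total).
insert = "ACGTTGCAAGGCTTAACCGGATCCA"
reads = [insert + "TCGTATGCCTT", "A" * 36]
genome = "GGG" + insert + "TTT"

trimmed = select_and_trim(reads)
print(trimmed)
print(map_read(trimmed[0], genome))
```

A production pipeline would also search the reverse strand and use an indexed aligner; the sketch only makes the three filtering criteria explicit.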

Comparison of 5′-UTRs between C. glabrata and S. cerevisiae

We categorized the genes predicted in this study on the basis of data disclosed on web sites. The information for TSSs of S. cerevisiae was also obtained from a web site (Zhang & Dietrich 2005). The average distance between the TSS and the first ATG of each GO gene was calculated, and statistical analysis (Wilcoxon test) was performed to determine whether there were significant differences between C. glabrata and S. cerevisiae.
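The comparison of TSS-to-ATG distances can be sketched as below. The text does not state which Wilcoxon variant was used; this sketch assumes the unpaired rank-sum (Mann-Whitney) form with a normal approximation and no tie correction, and all function names and numbers are hypothetical.

```python
import math


def utr_length(tss, first_atg, strand="+"):
    """Distance from the TSS to the first base of the first ATG."""
    return first_atg - tss if strand == "+" else tss - first_atg


def average_ranks(values):
    """Rank values from 1 upward; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 5'-UTR lengths (bp) for one GO category in each species.
cg_lengths = [utr_length(100, 150), utr_length(500, 420, "-"), 65, 120]
sc_lengths = [40, 55, 30, 70]
u, p = rank_sum_test(cg_lengths, sc_lengths)
print(u, p)
```

For samples as small as these an exact test would be preferable to the normal approximation; the sketch only makes the ranking logic explicit.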


Acknowledgements

This work was supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan (No. 24590540 to H.N. and No. 23590502 to H.C.).


Supporting Information

gtc12147-sup-0001-TableS1.pdf (application/PDF, 10K): Table S1. The number of reads in each mapping process
gtc12147-sup-0002-TableS2.pdf (application/PDF, 1267K): Table S2. Genes expressed in this study

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.