Expression Profiling of Cassava Storage Roots Reveals an Active Process of Glycolysis/Gluconeogenesis


  • Jun Yang,

    1. Shanghai Center for Cassava Biotechnology, National Laboratory of Plant Molecular Genetics, Institute of Plant Physiology & Ecology, Shanghai Institutes for Biological Sciences, the Chinese Academy of Sciences, Shanghai 200032, China
  • Dong An,

    1. Key Laboratory of Synthetic Biology, Institute of Plant Physiology & Ecology, Shanghai Institutes for Biological Sciences, the Chinese Academy of Sciences, Shanghai 200032, China
  • Peng Zhang

    Corresponding author
    1. Shanghai Center for Cassava Biotechnology, National Laboratory of Plant Molecular Genetics, Institute of Plant Physiology & Ecology, Shanghai Institutes for Biological Sciences, the Chinese Academy of Sciences, Shanghai 200032, China
    2. Key Laboratory of Synthetic Biology, Institute of Plant Physiology & Ecology, Shanghai Institutes for Biological Sciences, the Chinese Academy of Sciences, Shanghai 200032, China
    3. Shanghai Chenshan Plant Science Research Center, the Chinese Academy of Sciences, Chenshan Botanical Garden, Shanghai, 201602, China

Corresponding author
Tel: +86 21 5492 4096; Fax: +86 21 5492 4318; E-mail:


Mechanisms related to the development of cassava storage roots and starch accumulation remain largely unknown. To evaluate genome-wide expression patterns during tuberization, a 60 mer oligonucleotide microarray representing 20 840 cassava genes was designed to identify differentially expressed transcripts in fibrous roots, developing storage roots and mature storage roots. Using a random variance model and the traditional twofold change method for statistical analysis, 912 and 3 386 differentially expressed (up- or downregulated) genes, respectively, were identified across the three developmental phases. Among the 25 significantly changed pathways identified, glycolysis/gluconeogenesis was the most evident. Putative rate-limiting enzymes were identified for individual pathways, for example, enolase, L-lactate dehydrogenase and aldehyde dehydrogenase for glycolysis/gluconeogenesis, and ADP-glucose pyrophosphorylase, starch branching enzyme and glucan phosphorylase for sucrose and starch metabolism. This study revealed dynamic changes in at least 16% of the total transcripts, including transcription factors, oxidoreductases/transferases/hydrolases, hormone-related genes, and effectors of homeostasis. The reliability of these differentially expressed genes was verified by quantitative real-time reverse transcription-polymerase chain reaction. These results should facilitate our understanding of storage root formation and support cassava improvement.


Cassava (Manihot esculenta Crantz) provides the major source of dietary carbohydrates for almost 750 million people throughout the tropics (Cock 1982; Nassar and Ortiz 2010). The majority of the cassava that is produced is used for human foods, livestock feeds and starches in small-scale industries (Lopez et al. 2005). The storage root is a key organ for the direct production of cassava. In many circumstances, its yield reflects the productivity of the entire plant (Alves and Cameira 2002). The physiological significance of the storage roots is belied by their relative structural simplicity compared with other plant organs: roots largely lack some of the major metabolic pathways, such as photosynthesis, and they have a stereotypical morphology that is conserved throughout the stages of cassava root development and throughout the life cycle of individual plants. This combination of physiological relevance and structural simplicity has made storage roots excellent targets for functional genomic analyses (Jiang and Deyholos 2006). It is possible to reveal the dynamic mechanisms of cassava root development using high-throughput expression profiling technologies, such as microarray and large-scale sequencing. Despite its economic importance, especially in developing countries, this orphan crop has received little attention in the scientific community compared to other major crops, such as rice or maize (Aerni and Bernauer 2006).

Numerous tools are currently available for performing functional genomic analyses on cassava (genetic map, bacterial artificial chromosome (BAC) library, expressed sequence tags (EST) library). A transformation system has also been developed (Taylor et al. 2004), and a draft sequence of the cassava genome was recently released. Global gene expression profiling provides another useful tool to improve our ability to study biological processes in cassava. In plants, cDNA microarrays have been used to study responses to various stresses (Thimm et al. 2001; Seki et al. 2002; Oono et al. 2003; Rabbani et al. 2003), identify genes related to metabolic pathways (Guterman et al. 2002) and analyze gene expression during adventitious root development (Brinker et al. 2004). Oligonucleotide microarrays have been used to investigate lateral root development after nitrate stimulation (Liu et al. 2008). In cassava, differentially expressed genes involved in the incompatible interaction between cassava and Xanthomonas axonopodis pv. manihotis (Lopez et al. 2005), as well as the process of post-harvest physiological deterioration, have been studied using cDNA microarrays (Reilly et al. 2007). Although both cDNA and oligonucleotide microarrays can be used to analyze gene expression patterns, fundamental differences exist between these methods; for example, the oligonucleotide microarrays apparently have high precision, whereas the cDNA arrays show poor concordance (Woo et al. 2004). Short and long oligonucleotide arrays have several advantages over cDNA arrays in terms of specificity, sensitivity and reproducibility. Long oligonucleotides can provide increased signal intensity compared to short ones (Relogio et al. 2002; Shippy et al. 2004).

To increase our understanding of storage root development, a transcriptome analysis of gene expression during the process is needed. Recently, Sojikul et al. (2010) reported the characterization of differentially expressed genes in fibrous and storage roots using cDNA-amplified fragment length polymorphism (AFLP). In this study, we investigated gene expression changes in cassava roots at three developmental stages, namely fibrous roots, developing storage roots and mature storage roots, using a cassava 60 mer oligonucleotide microarray. Our study provides new insights into the molecular nature of cassava tuberous root development.


Development of a custom cassava microarray

For the RIKEN cDNA library (Sakurai et al. 2007), 35 400 sequences were assembled into 13 063 unique consensus sequences (unigenes) that contain 6 998 tentative contigs from 29 138 sequences and 6 065 singletons. Only 197 sequences were screened out in the sequence analysis (phrap with minmatch 17 and minscore 40). Sequence comparison with assemblies and singletons from The Institute for Genomic Research (TIGR) indicated that 7 629 sequences had high similarity with a subset of the 13 063 unigenes (E value < 1e-20), and 7 774 TIGR sequences showed no significant similarity with the unigenes. A total of 20 837 sequences composed of the 13 063 unigenes and the 7 774 TIGR sequences were generated for further analyses. The largest EST dataset, 71 520 ESTs from the National Center for Biotechnology Information (NCBI) including both TIGR and RIKEN results, was reduced to 11 536 contigs and 8 222 singletons with the same program and parameters mentioned above (556 sequences were screened out). Only 37 sequences showing low similarity (E value ≥ 1e-20) were found after sequence comparison with the 20 837 sequences that were generated by the previous step. This finding indicates that the combination of the TIGR and RIKEN libraries provides nearly equivalent information to the NCBI public database. Therefore, a dataset of 20 874 unigenes was used to design the oligonucleotide probes for the microarray using Agilent (Palo Alto, CA, USA) technology. Finally, a 4×44K custom oligonucleotide array was designed for the present study containing in situ-synthesized 60 mer oligonucleotides representing 20 840 unique cassava genes; there was one replicate for each probe located at another position on the slide (Agilent Technologies). A total of 20 840×2 in situ synthesized 60 mer oligonucleotide probes plus 1 264 positive control features and 153 negative control features were designed on each microarray. Four housekeeping genes (Table 1) were used as internal control genes.

Table 1. Internal control genes used on cassava microarray
Probe name | Sequence name* | Description
CUST_18705 | Contig350 | Manihot esculenta actin
CUST_7767 | Contig3493 | Manihot esculenta cytochrome P450 protein CYP71E (c15) mRNA, complete cds
CUST_10481 | Contig6723 | AT5G12250.1 TUB6
CUST_9480 | Contig6971 | AT1G07930.1 EF-1-alpha
*Contig represents riken_Mes_flcDNA.fa.Contig.
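
The sequence counts reported for the array design can be cross-checked with simple arithmetic. The following is a minimal sketch; all numbers are copied from the text, while the function and key names are hypothetical and chosen only for illustration.

```python
def design_counts():
    """Recompute the sequence totals reported for the array design."""
    riken_contigs = 6_998
    riken_singletons = 6_065
    riken_unigenes = riken_contigs + riken_singletons

    # Sequences neither assembled into contigs nor kept as singletons.
    riken_total = 35_400
    riken_in_contigs = 29_138
    riken_screened_out = riken_total - riken_in_contigs - riken_singletons

    # TIGR sequences without significant similarity to the RIKEN unigenes.
    tigr_unique = 7_774
    combined = riken_unigenes + tigr_unique

    # NCBI sequences with low similarity to the combined set.
    ncbi_only = 37
    design_set = combined + ncbi_only

    # Each probe is spotted twice, plus positive and negative controls.
    probes = 20_840
    features_per_array = probes * 2 + 1_264 + 153

    return {
        "riken_unigenes": riken_unigenes,
        "riken_screened_out": riken_screened_out,
        "combined": combined,
        "design_set": design_set,
        "features_per_array": features_per_array,
    }


if __name__ == "__main__":
    for key, value in design_counts().items():
        print(f"{key}: {value}")
```

The recomputed totals match the reported 13 063 unigenes, 197 screened-out sequences, 20 837 combined sequences and 20 874 design sequences, and the feature count per array stays within the 44K capacity of the platform.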

The signal-to-noise ratio (SNR) for each spot was calculated as described by Leiske et al. (2006). The average percentages of acceptable spots (SNR > 2.6) and high-quality spots (SNR > 10) were 94.16 ± 1.23% and 73.6 ± 2.51% in all 11 arrays. To evaluate the microarray quality, principal component analysis (PCA) of all 11 cassava arrays was conducted using the GeneSpring GX Software (Agilent Technologies), as shown in Supporting Figure S1. Technical replicates (TR-1 and TR-2) were assigned together and indicated that the entire experiment from RNA extraction to data extraction was reliable and reproducible. Obvious separation was detected among the three different types of materials analyzed, fibrous root (FR), mature storage root (MR) and technical repeat (TR). FR and MR showed a distinct boundary, while developing storage root (DR) was expected to be an intermediate phase between FR and MR (Figure 1) from a developmental point of view. Gene expression profiling may be more informative than morphological observation for the identification of different root development stages. Correlation coefficient (Supporting Figure S2) and hierarchical cluster (Supporting Figure S3) analyses also revealed that among the three biological replicates of developing storage root, DR-1 was similar to the FR group, while DR-2 and DR-3 were related to the MR group. Expression patterns of four house-keeping genes further confirmed the reliable quality of the microarray (Supporting Figure S4).
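
The two quality thresholds above can be applied to per-spot SNR values as in the following minimal sketch. It assumes the SNR values have already been computed by whatever method the array software uses; the function name and the example values are hypothetical.

```python
def spot_quality(snr_values, acceptable=2.6, high=10.0):
    """Return the fractions of spots above the acceptable and
    high-quality SNR thresholds (strict inequality, as in the text)."""
    n = len(snr_values)
    if n == 0:
        raise ValueError("at least one spot is required")
    frac_acceptable = sum(1 for s in snr_values if s > acceptable) / n
    frac_high = sum(1 for s in snr_values if s > high) / n
    return frac_acceptable, frac_high


if __name__ == "__main__":
    # Made-up SNR values for four spots, for illustration only.
    acc, hq = spot_quality([1.0, 3.0, 12.0, 50.0])
    print(f"acceptable: {acc:.0%}, high quality: {hq:.0%}")
```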

Figure 1.

Different stages of tuberous root development.
DR, developing storage root in cross-section; FR, fibrous roots; MR, mature storage root in cross-section.

When the total 20 840 sequences were compared with 27 217 arabidopsis proteins, 20 450 sequences had hits with 11 995 arabidopsis proteins, and 14 813 sequences had hits with 9 158 arabidopsis proteins at E value ≤ 1e-5. After searching the descriptions of the 912 differentially expressed genes in the NCBI non-redundant protein database, the descriptions of 847 hits were identical to The Arabidopsis Information Resource (TAIR) result (836 similar descriptions) from arabidopsis (Supporting File S3).

Gene expression profiling of developing cassava storage roots

Before normalization, the signal intensities of each feature were filtered against negative controls on the array. It was found that 72.13 ± 2.63%, 72.14 ± 2.48%, 68.91 ± 2.24% and 71.61 ± 2.10% of the genes on the chip were expressed in FR, DR, MR and TR, respectively. A comparative study was conducted by comparing gene expression profiles between each of the three selected stages (FR, DR and MR) using a twofold change cutoff (FCC). There were 777, 1 410 and 48 significantly upregulated genes (Figure 2A) and 623, 1 349 and 409 significantly downregulated genes (Figure 2B) in DR/FR, MR/FR and MR/DR, respectively, at the cut-off SNR > 2.6 and P-value < 0.05. In total, 3 386 non-redundant differentially expressed genes were defined (Figure 2C, Supporting File S1). When using the random variance model (RVM) method, 912 differentially expressed genes could be identified at a cut-off P-value < 0.05 and false discovery rate (FDR) < 0.1 (Supporting File S2). In total, 742 common genes were detected by the FCC and RVM methods (Figure 2D).
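
The twofold change cutoff described above can be sketched as a simple per-gene filter. This is an illustration under stated assumptions, not the pipeline used in the study: the `Gene` record, the function names and the treatment of values exactly at the thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    ratio: float    # expression ratio between two stages, e.g. MR/FR (> 0)
    snr: float      # signal-to-noise ratio of the spot
    p_value: float  # P-value of the comparison


def fcc_call(gene, fold=2.0, min_snr=2.6, alpha=0.05):
    """Return 'up', 'down' or None for one gene in one paired comparison."""
    # Quality and significance filters (SNR > 2.6 and P-value < 0.05).
    if gene.snr <= min_snr or gene.p_value >= alpha:
        return None
    log2_ratio = math.log2(gene.ratio)
    threshold = math.log2(fold)  # 1.0 for a twofold change
    if log2_ratio >= threshold:
        return "up"
    if log2_ratio <= -threshold:
        return "down"
    return None


def common_calls(fcc_genes, rvm_genes):
    """Genes detected by both methods (set intersection of gene names)."""
    return set(fcc_genes) & set(rvm_genes)


if __name__ == "__main__":
    example = Gene("hypothetical_contig", ratio=4.0, snr=25.0, p_value=0.01)
    print(fcc_call(example))
```

With the reported totals, an overlap of 742 genes leaves 912 − 742 = 170 genes detected only by RVM and 3 386 − 742 = 2 644 detected only by FCC.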

Figure 2.

The number of differentially expressed genes during cassava storage root development.
Upregulated (A) and downregulated (B) genes in paired comparison. DR/FR, developing storage roots/fibrous roots; MR/FR, mature storage roots/fibrous roots; MR/DR, mature storage roots/developing storage roots. (C) Non-redundant up- and downregulated genes. (D) Comparison of two methods. FCC, two-fold change cutoff; RVM, Random variance model; Red and green numbers indicate the up- and downregulated genes, respectively, validated by real time reverse transcription-polymerase chain reaction (RT-PCR) in each subset.

Clustering of and pathways represented by differentially expressed genes

Among the 912 differentially expressed genes associated with Gene IDs in the Kyoto Encyclopedia of Genes and Genomes (KEGG), Biocarta and Reactome, 54 related pathway maps were selected for further analysis. Among these pathways, 25 pathways were considered significant at a cut-off P-value < 0.05 and FDR < 0.05 (Supporting File S4). The enrichment of each pathway was calculated according to the given equation and shown in Figure 3. The FDR of most pathways was not higher than 0.01 with only one exception (the FDR of fatty acid metabolism pathway was 0.0125). These pathways are suggested to be important during the initiation and formation of the storage root in cassava; and some of them belong to glucide metabolism, which includes N-glycan degradation, the pentose phosphate pathway, fructose and mannose metabolism, glycan structure degradation, glycolysis/gluconeogenesis, and starch and sucrose metabolism.

Figure 3.

Significantly enriched pathways of the differentially expressed genes.

A global pathway net was constructed to illustrate the key pathways in the process of root development (Supporting File S5). Twenty out of 25 significant pathways were included in the pathway net (Figure 4), and the five pathways that were omitted are arachidonic acid metabolism, glycan structure degradation, limonene and pinene degradation, N-glycan degradation, and the phosphatidylinositol signaling system. Glycolysis/gluconeogenesis was considered to be the most important node in the net because the component exchanges with other pathways were strongly dependent on its existence.

Figure 4.

Significant and interacting pathways during storage root development in cassava.
The node size was correlated with the number of interacting pathways (Supporting File S5); Red nodes represent 20 significant pathways and blue nodes represent non-significant ones; five isolated significant pathways were absent.

The top hitting locus identifiers of the 11 995 non-redundant arabidopsis genes, together with the corresponding MR/FR ratio (Supporting File S6), were mapped to KEGG arabidopsis pathways using the KegArray tool (Version 1.2.1) for a detailed view of the pathways of interest (Figure 5). A subset of 2 086 identifiers corresponding to the differentially expressed cassava genes (SNR > 2.6 and P-value < 0.05) was also mapped. There were 4 158 and 691 hits, respectively, corresponding to all identified genes and differentially expressed genes. Changes in enrichment were calculated according to the equation: Re = (nf/n)/(Nf/N), where Nf and N were set as 691 and 4 158, respectively (Supporting File S7). Among the 25 significant pathways mentioned above, 17 were confirmed by the 2 086 identifiers and 14 pathways exceeded the average enrichment, which is consistent with the results of previous pathway analysis.
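
The enrichment equation can be written as a short function. The text fixes only Nf and N; this sketch assumes that nf is the number of differentially expressed hits in a given pathway and n is the number of all hits in that pathway. The function name and the example pathway counts are hypothetical.

```python
def relative_enrichment(nf, n, Nf, N):
    """Re = (nf / n) / (Nf / N).

    nf, n : differentially expressed and total hits in one pathway
            (interpretation assumed, not stated in the text)
    Nf, N : differentially expressed and total hits over all pathways
    """
    if n <= 0 or Nf <= 0 or N <= 0:
        raise ValueError("n, Nf and N must be positive")
    return (nf / n) / (Nf / N)


# Background values given in the text for the KEGG mapping.
BACKGROUND_NF = 691
BACKGROUND_N = 4158

if __name__ == "__main__":
    # Made-up pathway with 10 differentially expressed hits out of 30.
    re_value = relative_enrichment(10, 30, BACKGROUND_NF, BACKGROUND_N)
    print(f"Re = {re_value:.2f}")
```

Under this reading, Re = 1 means the pathway carries the same share of differentially expressed hits as the background (691/4 158, about 16.6%), and larger values indicate over-representation.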

Figure 5.

Regulatory changes in the pathway of glycolysis/gluconeogenesis (A) and starch and sucrose metabolism (B).
Colored rectangles correspond with the cassava genes detected on microarray. Green, downregulated; yellow, nonregulated; red, upregulated.

Short time-series clustering revealed that six significant expression patterns were involved in root development (Figure 6). There were 144, 195, 92, 38, 39 and 36 genes (Supporting File S8) clustered in profiles 0 (0, –2, –3), 4 (0, –1, –1), 11 (0, 1, 1), 3 (0, –1, –2), 15 (0, 2, 3) and 2 (0, –1, –3), respectively. Two pairs of opposite profiles were identified: profiles 0 and 15, and profiles 4 and 11. The latter pair of profiles (Figure 6B) contained more genes and was selected for further gene ontology analysis. Functional category enrichment evaluation based on the gene ontology (GO) was performed on the upregulated and downregulated genes assigned to profiles 11 and 4, respectively. Significant GO terms were found at a cut-off P-value < 0.05, and the enrichment of each GO term was calculated (Supporting File S9). Significantly enriched GO terms in profile 11 and profile 4 are illustrated in Figure 7. Some reliable GO terms are presented for profile 11 (amylopectin biosynthetic process, basipetal auxin transport, response to wounding, response to cytokinin stimulus and the starch metabolic process) and profile 4 (carbohydrate biosynthetic process, response to auxin stimulus, carbohydrate transport, carbohydrate metabolic process and plant-type cell wall loosening). There were three common GO terms considered as basic functions in profiles 11 and 4 (oligopeptide transport, regulation of transcription and its subset, DNA-dependent regulation of transcription). Some interesting relationships were found between profiles 11 and 4. For example, basipetal auxin transport was present in the former, and response to auxin stimulus appeared in the latter. Among the three components described by GO annotation (cellular component, molecular function and biological process), biological process is probably the most relevant aspect of GO with respect to root development. Therefore, only functional clusters belonging to this component are presented in the selected profiles.

Figure 6.

Cluster analyses of differentially expressed genes.
Profiles in color indicate significant ones (A). Green, upregulated; red, downregulated; profile number (upper left), trend (line) and P-value (bottom left) are indicated in each profile. The profiles of 92 upregulated (profile 11) and 195 downregulated (profile 4) genes are illustrated (B).

Figure 7.

Significantly enriched gene ontology (GO) terms in profile 11 and profile 4.
(A) Enrichment of 27 significant GO terms in profile 11.
(B) Enrichment of 40 significant GO terms in profile 4.

To verify the results of the GO analysis, 11 995 and 2 086 non-redundant arabidopsis gene locus identifiers were annotated with GO terms in TAIR, in which 3 448 and 1 507 non-redundant GO IDs were hit, respectively. Changes in enrichment were calculated according to the equation: Re = (nf/n)/(Nf/N), where Nf and N were set as 2 086 and 11 995, respectively (Supporting File S10). Among the 64 significant GO terms in profiles 11 and 4 mentioned above, 62 were found in 3 448 GO IDs and 56 were found in 1 057 GO IDs. Among the 56 GO terms, 42 exceeded the average enrichment.

When arabidopsis gene locus identifiers were mapped with KEGG, a BRITE (Biomolecular Relations in Information Transmission and Expression) (functional hierarchies and ontologies) view of all hits was constructed. Similarly, a functional categorization was presented after annotation in TAIR (Supporting File S11). The expression of many transcription factors was significantly changed during storage root development. A total of 138 genes (182 distinct gene models) were considered to be transcription factors when the 2 086 identifiers were annotated in TAIR. The number of upregulated transcription factors was nearly equal to the number of downregulated ones. The arabidopsis genome has at least 1 922 predicted transcription factors in the Database of Arabidopsis Transcription Factors (DATF) and more than 2 192 in the Plant Transcription Factor Database (PlnTFDB). These transcription factors have been classified into 65 and 83 families in DATF and PlnTFDB, respectively. The enrichment of each family was calculated by classification of non-redundant identifiers according to the two databases (Supporting File S12). Several enriched families were identified, such as ARF, C2C2-Dof, CCAAT, CPP, E2F-DP, G2-like and GRF.

Verification of gene expression patterns by quantitative real-time PCR

To validate the microarray data and evaluate the methods of selected differentially expressed genes, real time RT-PCR was performed using the RNA extracted from the three sample replicates at different developmental stages that were used in the microarray experiment. A total of 55 genes were selected for verification, including 42 differentially expressed genes identified by FCC and RVM methods, as indicated in Figure 2D, and 13 genes involved in starch and sucrose metabolism.

Cassava actin was used as the normalization standard. The fold changes (log2 ratio) in DR and MR compared with FR are presented in Table 2. In total, 94.55% of the tested genes were consistent with the microarray analysis.

Table 2. Validation of microarray-based gene expression by real time reverse transcription-polymerase chain reaction (RT-PCR) (55 selected genes in developing storage root (DR) and mature storage root (MR) in comparison with fibrous root (FR))
Sequence name* | TAIR locus | Expect | DR-FR | MR-FR
  Note: All data are shown as log2 ratios; positive and negative values indicate up- and downregulation, respectively, in DR or MR compared with FR. *Contig represents riken_Mes_flcDNA.fa.Contig. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) results shown in bold font (15 values) were dramatically higher or lower (log2 ratio difference > 4) than the array results; bold italic font indicates results that were inconsistent (six values) between microarray and qRT-PCR.
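
A simple way to score agreement between microarray and qRT-PCR log2 ratios is to compare the direction of change. The sketch below assumes that "consistent" means the two log2 ratios have the same sign, which the text does not state explicitly; the function name and the example values are hypothetical.

```python
def direction_agreement(array_log2, qpcr_log2):
    """Fraction of paired log2 ratios that point in the same direction."""
    pairs = list(zip(array_log2, qpcr_log2))
    if not pairs:
        raise ValueError("at least one pair is required")
    same = sum(
        1 for a, q in pairs
        if (a > 0) == (q > 0) and (a < 0) == (q < 0)
    )
    return same / len(pairs)


if __name__ == "__main__":
    # Made-up log2 ratios for four genes, for illustration only.
    array_values = [1.2, -2.0, 0.5, -1.0]
    qpcr_values = [2.0, -3.1, -0.4, -0.2]
    print(f"{direction_agreement(array_values, qpcr_values):.0%}")
```

The reported 94.55% corresponds to 52 of 55 genes, or equivalently to 104 of the 110 DR-FR and MR-FR values, which matches the six inconsistent values noted for Table 2.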



Development of cassava storage root has drawn increasing attention from the cassava research community due to the use of cassava as a staple food crop in the tropics and also as feedstock for bio-ethanol production in many countries (Nguyen et al. 2007a, 2007b; Jansson et al. 2009). Recently, a study related to storage root formation was conducted using AFLP-based transcript profiling (Sojikul et al. 2010). In our study, a custom-designed 4×44K long oligonucleotide (60 mer) microarray was developed and used to investigate the genome-wide gene expression profile related to storage root development in cassava cultivar TMS60444. Differentially expressed genes at different developmental stages were identified, and their potentially relevant functions were studied using pathway and GO analyses, in which 25 pathways were identified as significantly regulated. Important genes related to each individual pathway were identified. The results of real time RT-PCR also confirmed the validity of the microarray, as well as the storage root specific expression patterns. Therefore, our study sheds new light on storage root development in cassava by using the long oligonucleotide (60 mer) cassava microarray.

High-quality microarrays are a prerequisite for reliable analysis of different biological processes. Different types of microarray, including cDNA (long strands of amplified cDNA sequences), short oligonucleotide (25–30 nt), and long oligonucleotide (50–80 nt), used in transcriptome study may have different feature performances (Petersen et al. 2005; de Reynies et al. 2006; Tsai et al. 2006; Fan 2009; McHale et al. 2009). The custom long oligonucleotide array described here was generated by the Agilent SurePrint ink-jet technology, which provides a flexible platform for revising and updating oligonucleotide probes in the array without additional cost (Hughes et al. 2001; Wolber et al. 2006; Li et al. 2008). Most probes used on the current array are represented in the most recent draft of the cassava genome. In addition, the 4×44K platform used for the array design contains four independent arrays in one slide; this arrangement is cost-effective and can reduce the variation among the arrays within a slide because high background levels in an array might obscure the signal from low-expressed genes and impede accurate quantification. The average SNR of the current microarray was 602.30, which was much higher than that of most cDNA array platforms (35.1 to 38.3). High SNR will promote sufficient signal generation for the detection of even low copy genes.

Two statistical criteria have been applied in the current analysis. Several thousand differentially expressed genes were identified in a pair-comparison at P < 0.05. Because more than 20 000 genes were analyzed in this microarray experiment, it is important to control the proportion of false positives (Tsai et al. 2003). The FDR based on P-values is the expected proportion of true null hypotheses that will be rejected in relation to the total number of null hypotheses that will be rejected (Benjamini and Hochberg 1995). The FDR is a more convenient and natural scale than the P-value scale, and it can provide the probability that a gene value is a false positive (Pawitan et al. 2005). In this study, the false discovery rates of differentially expressed genes selected by the RVM method were controlled at less than 10%, which supports the reliability of the current microarray results. The FDR was also calculated in the pathway analyses, and it was used in the GO analyses for correcting P-values (Dupuy et al. 2007). Quantitative real time PCR has become the gold standard for measuring gene expression, and it is generally used to validate microarray results (Dallas et al. 2005). With a criterion of P < 0.05 in the microarray analysis, the false positives could be effectively controlled (94.55% of qRT-PCR results were consistent with microarray data). The results indicate that microarray analyses in the present study are statistically reliable and accurate.
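
The FDR definition cited above is commonly put into practice with the Benjamini-Hochberg step-up procedure. The following minimal sketch shows that procedure on a made-up list of P-values; whether the RVM analysis in this study used exactly this adjustment is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up P-values for four genes, for illustration only.
    raw = [0.01, 0.04, 0.03, 0.20]
    adjusted = benjamini_hochberg(raw)
    selected = [q < 0.1 for q in adjusted]  # FDR < 0.1, as used for RVM
    print(adjusted, selected)
```

Genes whose adjusted value falls below the chosen level (0.1 in the RVM analysis) are retained, so that the expected share of false positives among them stays below that level.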

Gene expression profiling could provide valuable information related to the biological processes during starchy storage root development, and these processes are expected to be conserved in storage root-bearing species, e.g., sweet potato and yams. Starch accumulation is obviously the main theme in the process, but how this is achieved is still unclear. In the present study, there were 25 significantly enriched pathways, and many of them are related to glucide metabolism, which is strongly correlated to starch accumulation and storage root bulking. In parallel, zeatin biosynthesis was highlighted because it is not only related to the promotion of root cell propagation, elongation and enlargement, resulting in sufficient room for starch granule accumulation and tuberization (Melis and Vanstaden 1985; Gibson 2004), but it is also needed for amyloplast formation and starch accumulation (Miyazawa et al. 2002; Bishopp et al. 2009). Lipid biosynthesis is required to support cell propagation and jasmonic acid biosynthesis. The regulation of tuberization in potato and yam by jasmonates has been reported (Koda and Kikuta 1991; Koda et al. 1991; Vandenberg and Ewing 1991; Ovono et al. 2010). For signal transduction, the inositol phosphate-calcium signaling system may play a comprehensive role in the tuberization process (Cenzano et al. 2008). Furthermore, several secondary metabolic processes, such as flavonol biosynthesis, were also found to be important for plant growth and development (Taylor et al. 2004; Besseau et al. 2007). Based on our study, a molecular model of storage root development was constructed (Figure 8).

Figure 8.

Schematic illustration representing microarray analysis of starchy storage root development related to major biological events in cassava.
Downregulated reactions are marked by dashed arrows; upregulated ones are indicated by bold arrows. Patterns of differentially expressed genes (green for downregulated and red for upregulated) in carbon flux were shown in boxes. The regulation and developmental processes of tuberization are indicated under the structural scenario. AGPase, ADP-glucose pyrophosphorylase; GP, glucan phosphorylase; SBE, starch branching enzyme.

In the two opposite development-associated expression patterns (profile 11 and profile 4), three common GO terms were identified (oligopeptide transport, regulation of transcription and DNA-dependent regulation of transcription). The array data resulted in fewer differentially expressed genes (Figure 2A, B) between DR and MR, which suggests that profile 11 and profile 4 are root development-associated. The results of the GO analysis also support this hypothesis. For example, basipetal auxin transport appeared in profile 11 (upregulated pattern), whereas response to auxin stimulus was found in profile 4 (downregulated pattern). Interestingly, metal ion transport and copper ion transport were also highlighted in profile 11, which is highly consistent with the physiological requirement of cassava for copper (Chew et al. 1978). Furthermore, defense response, response to abiotic stimulus, and response to wounding were also noticeable, possibly due to stress responses that occurred during sample collection.

When pathways of interest (glycolysis/gluconeogenesis, starch and sucrose metabolism) were examined (Figure 5), several genes that were either upregulated or downregulated based on the array were subsequently validated by qRT-PCR (Table 2). These genes included six genes involved in glycolysis/gluconeogenesis (Contig4784, AtALDH7B4; Contig6596, AtALDH2B4; Contig6973, AtLOS2; Contig6535, AtALDH2B7; CAS01_007_P13.f, Arabidopsis L-lactate dehydrogenase; DV453892, Arabidopsis Enolase) and seven genes involved in starch and sucrose metabolism (Contig2017, AtAPL3; TA9083_3983, Arabidopsis Glucan phosphorylase; TA5570_3983, AtSBE2.1; Contig5561, AtADG1; TA8522_3983, AtSBE2.2; Contig4441, Arabidopsis Pectinesterase; BM259732, AtATPME2). When we mapped the qRT-PCR ratios (DR/FR and MR/FR) of these 13 genes to KEGG pathways, upregulation of ADP-glucose pyrophosphorylase, glucan phosphorylase and starch branching enzyme was observed, indicating that these are the key enzymes required for starch accumulation in the storage root. ADP-glucose pyrophosphorylase (AGPase), which catalyzes the rate-limiting step in starch biosynthesis in plants, is strongly associated with yield in both grain and root crops (reviewed by Smith 2008 and Zeeman et al. 2010). The downregulation of pectinesterase was also considered to be important because it blocks the flow of carbon to pectate. In glycolysis/gluconeogenesis, downregulation of enolase, L-lactate dehydrogenase and aldehyde dehydrogenase (NAD+) could slow down the entry of carbon into the citrate cycle, pyruvate metabolism and propanoate metabolism, so that less α-D-glucose-6P is converted to glycerate-3P and α-D-glucose-1P. This would result in most of the α-D-glucose-6P being transported into the amyloplast for starch and sucrose metabolism. This result may also suggest that these enzymes are rate-limiting in the two important pathways in starchy root formation (Figure 8).
These findings may facilitate the engineering of cassava for enhanced starch accumulation and a deeper understanding of the biological process of interest, starchy root formation (Figure 8). In comparison with the report by Sojikul et al. (2010), in which only 157 transcript-derived fragments were identified between leaves and roots using cDNA-AFLP, the present study was able to distinguish a large number of differentially expressed genes among the three types of roots, yielding more findings related to storage root tuberization.

Several enriched transcription factor families, such as ARF, C2C2-Dof, CCAAT, CPP, E2F-DP, G2-like and GRF, were identified in the present study. C2C2-Dof (Yanagisawa 2002, 2004), G2-like (Bravo-Garcia et al. 2009) and GRF (Kim et al. 2003) are of particular interest. Although their roles in starchy storage root development have not been determined, it is possible to narrow down the gene candidates for studying the overall biological process by developing a hypothetical model. Recently, a transcription factor called RSR1, belonging to the AP2/EREBP family, was found to regulate starch biosynthesis in rice endosperm (Fu and Xue 2010), and another transcription factor, MADS1, was found to be involved in tuberous root initiation in sweet potato (Ku et al. 2008).

In conclusion, gene expression during the process of cassava starchy root formation was characterized using the newly developed cassava microarray, and putative rate-limiting enzymes in key pathways have been highlighted, which provides potential targets for cassava genetic engineering. The platform will provide a valuable resource for the scientific community to study the developmental biology, stress response, virus resistance, genetics and genomics in cassava.

Materials and Methods

Plant materials

The cultivar TMS60444, which is frequently used for genetic engineering, was used in the present study. The in vitro plants were transferred to pots for one month of growth in a greenhouse and then planted in a field. Fibrous roots (FR), developing storage roots (DR) and mature storage roots (MR, Supplemental Figure S1) were collected from three independent healthy 4-month-old cassava plants in the field and submerged into liquid nitrogen immediately. All samples were maintained in liquid nitrogen during transportation, and they were stored in an ultra-freezer (−80 °C) until RNA extraction.

Generation of a cassava oligonucleotide microarray

The microarray design was based on the sequence information from a large collection of cassava ESTs from NCBI (71 520 ESTs of cassava, released 28 March 2008) and TIGR (Manihot_esculenta_release_5, released 1 June 2007; 5 189 assemblies, 10 214 singletons) as well as a 35 400 full-length cDNA RIKEN library (Sakurai et al. 2007). Phrap and BLAT, the BLAST-like alignment tool (Kent 2002), were employed in sequence analyses. The design of the 60 mer oligonucleotide probes and the microarray was performed using Agilent technology.

Hybridization and data extraction

Total RNA was extracted from root samples, including FR, DR and MR, using the RNeasy Mini Kit (Qiagen, Valencia, CA, USA). Two RNA samples extracted from stored storage root slices were used as technical repeats (TR) for quality control. RNA quality was checked on a 1% agarose gel using an RNase-free electrophoresis system. RNA labeling and hybridization were conducted by the Shanghai Biochip Corporation (Shanghai, China) following the manufacturer's protocols. Arrays were incubated at 65 °C for 17 h in Agilent hybridization chambers (G2545A) and then washed according to the protocol at room temperature. Hybridized microarray slides were scanned at 5 μm resolution with an Agilent Technologies Scanner (G2505B), and images were saved in JPG format. Both 10% and 100% photomultiplier tube (PMT) settings were selected, and combined images were exported. The signal intensities of all spots on each image were quantified by Agilent Feature Extraction software, and data were saved as .txt files for further analysis.

Normalization and differential gene definition

Two types of statistical analyses were used. First, the signal intensity of each gene was globally normalized with the GeneSpring GX Software (Agilent Technologies) following the work flow guide (Bolstad et al. 2003). The signal-to-noise ratio (SNR) was calculated as the median signal minus the background median signal, divided by the background standard deviation (Leiske et al. 2006). Pairwise comparisons were used to analyze the normalized and averaged data from the three types of samples (FR, DR and MR). P-values from t-tests and fold changes were calculated for each gene in each comparison. Genes induced or suppressed by more than twofold (twofold change cutoff, FCC) were taken as differentially expressed when SNR > 2.6 and P-value < 0.05. Second, raw data were normalized using LOWESS within the R statistics package. The random variance model (RVM) corrected ANOVA was used to analyze the normalized data from the three different samples (FR, DR and MR). The P-values and FDR were calculated by the R program according to Benjamini and Hochberg's method (Benjamini and Hochberg 1995). Genes were taken as differentially expressed when both P-value < 0.05 and FDR < 0.1 (Wright and Simon 2003).
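The two selection rules described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used GeneSpring GX and R); the function names are hypothetical, and the Benjamini-Hochberg routine is a generic textbook implementation of the cited method rather than the exact R code used in the study.

```python
def signal_to_noise(median_signal, background_median, background_sd):
    """SNR = (median signal - background median) / background standard deviation."""
    return (median_signal - background_median) / background_sd


def passes_fcc(fold_change, snr, p_value,
               fold_cutoff=2.0, snr_cutoff=2.6, p_cutoff=0.05):
    """Twofold-change rule: more than twofold up or down, SNR > 2.6, P < 0.05."""
    changed = fold_change > fold_cutoff or fold_change < 1.0 / fold_cutoff
    return changed and snr > snr_cutoff and p_value < p_cutoff


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (FDR), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_rvm(p_value, fdr, p_cutoff=0.05, fdr_cutoff=0.1):
    """Second rule from the text: P < 0.05 and FDR < 0.1."""
    return p_value < p_cutoff and fdr < fdr_cutoff
```

For example, a spot with a median signal of 500, a background median of 100 and a background standard deviation of 50 has an SNR of 8, so it would pass the SNR filter and be called differential if its fold change and P-value also met the cutoffs.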


All non-redundant sequences that were considered to be unique cassava genes were locally searched against the TAIR protein database (27 217 Arabidopsis protein sequences) using the blastx program in the blastall package (version 2.2.9). The top hits were used for gene annotation, and the corresponding Arabidopsis gene locus identifiers were mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using the KegArray tool (Version 1.2.1). Differential genes identified using the random variance model (RVM) method were locally searched against the NCBI non-redundant protein database using the blastx program in the blastall package. All the hits were recorded to confirm the gene annotations from the Arabidopsis protein database, and accession numbers were assigned and used to search Entrez IDs using gene2accession in NCBI. Entrez IDs were recorded and used to search Gene IDs in the Gene Ontology database for functional categorization of differentially expressed genes. In addition, the Entrez IDs were converted to Gene IDs in the KEGG, Biocarta and Reactome databases. The significant pathways were identified based on KEGG, Biocarta and Reactome. Fisher's exact test (Yi 2006) was used to select the significant pathways, and the threshold of significance was defined by P-value < 0.05 and FDR < 0.05. The enrichment Re was given by the following: Re = (nf/n)/(Nf/N), where nf is the number of differential genes within a particular pathway, n is the total number of genes within the same pathway, Nf is the number of differential genes on the entire microarray, and N is the total number of genes on the microarray. Pathway-net was built according to the direct or systemic interactions between pathways in the KEGG database. Using a strategy for clustering short time-series (STC) gene expression data (Ernst et al. 2005), some unique profiles were defined.
Significant profiles that had a higher probability than expected were identified using Fisher's exact test and multiple comparisons (Miller et al. 2002; Ramoni et al. 2002). GO analysis was applied to the genes belonging to specific profiles (Ashburner et al. 2000; Harris et al. 2006). Generally, Fisher's exact test and the χ2 test were used to classify the GO category, and the FDR (Dupuy et al. 2007) was calculated to correct the P-value. The FDR was defined as FDR = 1 − Nk/T, where Nk refers to the number of Fisher's tests with P-values less than those calculated using the χ2 test. Within the significant category, the enrichment Re was given by: Re = (nf/n)/(Nf/N), where nf is the number of differential genes within the particular category, n is the total number of genes within the same category, Nf is the number of differential genes on the entire microarray, and N is the total number of genes on the microarray (Schlitt et al. 2003).
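The enrichment Re and the Fisher's exact test used for pathways and GO categories can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the one-sided (over-representation) form of the test is an assumption, since the text does not state which alternative was used.

```python
from math import comb


def enrichment(nf, n, Nf, N):
    """Re = (nf/n) / (Nf/N), with the symbols defined in the text."""
    return (nf / n) / (Nf / N)


def fisher_exact_greater(nf, n, Nf, N):
    """One-sided Fisher's exact P-value (assumed alternative: over-representation).

    Probability of drawing at least nf differential genes in a category of
    n genes, when Nf of the N genes on the array are differential.
    """
    upper = min(n, Nf)
    # Hypergeometric tail: sum the probabilities of k = nf, ..., upper.
    favourable = sum(comb(Nf, k) * comb(N - Nf, n - k)
                     for k in range(nf, upper + 1))
    return favourable / comb(N, n)
```

With invented numbers, a pathway of n = 50 genes containing nf = 10 differential genes, on an array with Nf = 1000 differential genes out of N = 20000, gives Re = (10/50)/(1000/20000) = 4, that is, a fourfold over-representation.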

Quantitative real-time PCR

To validate the array data, the expression of 55 genes of interest was confirmed using quantitative real-time PCR (qRT-PCR) with the cassava RNA samples extracted using Plant RNA Reagent (Invitrogen, Cat. No. 12322-012). DNA was removed from the samples using DNase I (TaKaRa, Cat. No. D2215) treatment according to the manufacturer's protocol. The RNA quantity and purity were determined using a NanoDrop ND-1000 spectrophotometer (Nano Drop Technologies, Wilmington, DE, USA). A 2 μg aliquot of total RNA was used to synthesize first-strand cDNA using the ReverTra Ace (TOYOBO, Code: TRT-101) in a 20 μL reaction volume. The qRT-PCR primers were designed with Primer3Plus. The PCR reactions were performed in a 20 μL volume containing a 2×SYBR Green Master Mix (TOYOBO, Code: QPK-201), 50 ng cDNA, 400 nM of forward primer, and 400 nM of reverse primer in a Bio-Rad CFX96 thermocycler. The amplification conditions were 95 °C for 1 min, followed by 40–50 cycles of 95 °C for 15 s and 61 °C for 30 s. Beta-actin was used as the internal control. All of the samples were measured in triplicate. The comparative Ct method was used to calculate the relative gene expression levels across the samples. The relative expression level of each gene in one sample (ΔCt) was calculated as follows: Ct target gene − Ct beta-actin. The relative expression of each gene in two different samples (ΔΔCt) was calculated as follows: ΔCt (sample 1) − ΔCt (sample 2).
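The comparative Ct calculation can be written out as a short Python sketch. The function names and the Ct values in the usage note are invented for illustration. The final conversion to a fold change, 2^(−ΔΔCt), is the usual convention for this method and assumes close to 100% amplification efficiency; the text itself stops at ΔΔCt.

```python
def mean(values):
    """Average of technical replicates (samples were measured in triplicate)."""
    return sum(values) / len(values)


def delta_ct(ct_target, ct_actin):
    """ΔCt = Ct of the target gene minus Ct of beta-actin, from replicate means."""
    return mean(ct_target) - mean(ct_actin)


def delta_delta_ct(dct_sample1, dct_sample2):
    """ΔΔCt = ΔCt (sample 1) minus ΔCt (sample 2)."""
    return dct_sample1 - dct_sample2


def fold_change(ddct):
    """Assumed conversion (not stated in the text): ratio = 2 ** (-ΔΔCt)."""
    return 2.0 ** (-ddct)
```

As a hypothetical example, if a gene has mean Ct values of 22.0 in DR and 24.0 in FR, with beta-actin at 18.0 in both, then ΔCt is 4.0 in DR and 6.0 in FR, ΔΔCt (DR versus FR) is −2.0, and the assumed conversion gives a DR/FR ratio of about 4.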

(Co-Editor: Hai-Chun Jing)


This work was supported by grants from the National Basic Research Program (2010CB126605), the National High Technology Research and Development Program of China (2009AA10Z102), the Earmarked Fund for Modern Agro-industry Technology Research System (nycytx-17), the Chinese Academy of Sciences (KSCX2-EW-J-12) and Shanghai Municipal Afforestation & City Appearance and Environmental Sanitation Administration (G102410). Wenzhi Zhou is acknowledged for downloading and checking the sequences of the latest cassava genome draft.

Utilization of the microarray

MIAME information about the cassava transcriptome microarray used here has been deposited in the Gene Expression Omnibus (GEO) of NCBI (Edgar and Barrett 2006; Barrett et al. 2007). The accession numbers are: Platform, GPL11271; Series, GSE25813; Samples, GSM634255-GSM634265.